Using ? with sed

I just want to extract the number from the name of a file that may or may not be gzipped. However, it appears that regular expressions in sed do not support ?. Here's what I tried:

echo 'file_1.gz'|sed -n 's/.*_\(.*\)\(\.gz\)?/\1/p'

and nothing was returned. Then I added a ? to the string being analyzed:

echo 'file_1.gz?'|sed -n 's/.*_\(.*\)\(\.gz\)?/\1/p'

and got:

1

So it looks like the ? used in most regexes is not supported in sed, right? In that case, I would just like sed to give 1 for both file_1 and file_1.gz. What's the best way to do that in a bash script if execution time is critical?


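In sed's default (basic) regular expressions an unescaped ? is a literal question mark, which is why your second test matched. The equivalent to x? is \(x\|\). Note that \| is a GNU extension in basic regular expressions; GNU sed also accepts \? directly, and the POSIX spelling is x\{0,1\}.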
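
As a minimal sketch of that equivalence, assuming GNU sed (because of the \| alternation) and a hypothetical helper named file_number that is not part of the question, the optional suffix can be written as an alternation with an empty branch. The capture is narrowed to digits here, which is an assumption about the file names, so that the group cannot absorb the suffix:

```shell
# Hypothetical helper: print the digits that follow the last underscore,
# whether or not the name ends in .gz.
# Assumes GNU sed: \| inside a basic regular expression is a GNU extension.
file_number() {
    # \(\.gz\|\) means "either .gz or nothing", i.e. an optional .gz
    printf '%s\n' "$1" | sed -n 's/.*_\([0-9][0-9]*\)\(\.gz\|\)$/\1/p'
}

file_number 'file_1.gz'
file_number 'file_1'
```

Both calls should print 1. A name with no underscore followed by digits prints nothing, because -n suppresses output unless the substitution succeeds.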

However, many versions of sed support an option to enable "extended regular expressions", which include ?. In GNU sed the flag is -r (GNU sed also accepts -E, which is the flag BSD sed uses). Note that this also changes unescaped parentheses to do grouping. For example:

echo 'file_1.gz'|sed -n -r 's/.*_(.*)(\.gz)?/\1/p'

Actually, there's another bug in your regex: the greedy .* in the parentheses will swallow the ".gz" if there is one, so the command above still prints 1.gz. sed doesn't have a non-greedy equivalent of * as far as I know, but you can use | to work around this. When both alternatives can match, | in sed (and many other regex implementations) will use the leftmost one that works, so you can do something like this:

echo 'file_1.gz'|sed -r 's/(.*_(.*)\.gz)|(.*_(.*))/\2\4/'

This tries to match with .gz, and only tries without it if that doesn't work. Only one of groups 2 and 4 will actually take part in the match (since they are on opposite sides of the same |), and the other expands to an empty string, so we just concatenate them to get the value we want.
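
Since the question mentions execution time, two simpler alternatives may be worth comparing. This is a sketch rather than a benchmark: the function names number_sed and number_bash are made up for illustration, and both assume the wanted value is whatever sits between the last underscore and an optional trailing .gz. The first sidesteps the greedy-match problem by removing the suffix and the prefix in two separate substitutions; the second uses shell parameter expansion, so no sed process is started at all.

```shell
# Variant 1: two plain substitutions using only basic regular expressions.
# Strip a trailing .gz if present, then strip everything up to the last "_".
number_sed() {
    printf '%s\n' "$1" | sed -e 's/\.gz$//' -e 's/.*_//'
}

# Variant 2: shell parameter expansion only; no external process is started.
number_bash() {
    local name=${1%.gz}          # remove a trailing .gz, if any
    printf '%s\n' "${name##*_}"  # remove the longest prefix ending in "_"
}

number_sed  'file_1.gz'
number_bash 'file_1.gz'
number_bash 'file_1'
```

Each of the three calls should print 1. In a loop over many file names the parameter-expansion variant is likely to be faster, since it avoids starting a new process for every name, but that is worth measuring on your own data.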
