Is this text extraction scenario possible in linux bash shell?

Let's say my text file looks like this:

Person1 : movie1
(space and tab) : movie 2
(space and tab) : movie 3
(space and tab) : movie 4

For a particular movie, I want to find the actor. Here is how I am going about it.

Run cat actors | grep 'movie3'

This gives me line 3, which is blank up until movie3 appears. So if I can somehow get the nearest line before this one that matches this pattern

grep '^[^ \t].' (does not start with a space or tab)

it has to be the line with the name of the actor in this movie. (I don't care about movie1 on that line.)

Is there any combination of sed/grep/awk which can help me do it in shell? I hope the question is clear.


Bill Murray <- Groundhog Day <- grep with Perl mode Magic

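It's a bit tricky, but you can use this. grep normally matches one line at a time, so with GNU grep the pattern needs -z (treat the whole file as one record) and -o (print only the matched name); single quotes keep an interactive shell from interpreting the ! characters:

grep -Pzo '(?sm)^\S+[^:\r\n]*?(?=\s*:(?:(?!^\S).)*?Groundhog Day)' mymoviefile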


  • -P activates Perl mode
  • (?sm) turns on two mode modifiers:
      • s activates DOTALL mode, allowing the dot to match across lines
      • m turns on multi-line mode, allowing ^ and $ to match on each line
  • The ^ anchor asserts that we are at the beginning of the line
  • \S+ matches one or more non-space chars
  • [^:\r\n]*? lazily matches any non-colon, non-newline chars, up to ...
  • the point where the lookahead (?=\s*:(?:(?!^\S).)*?Groundhog Day) can assert, without consuming chars, that what follows is...
  • \s*: optional spaces and a colon
  • then (?:(?!^\S).)*? lazily matches zero or more chars, none of which is a non-space char at the beginning of a line (so the match cannot run on into the next person's entry), up to...
  • Groundhog Day the movie title!
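
To try the pattern out, here is a small sketch wrapped in a shell function. The name find_actor_pcre is made up for illustration. It assumes GNU grep built with PCRE support, and it uses -z and -o so that the whole file is searched as one record and only the matched name is printed. The title is pasted into the pattern unescaped, so it should not contain regex metacharacters.

```shell
# Hypothetical helper: print the person whose block contains the given title.
# Usage: find_actor_pcre TITLE FILE
find_actor_pcre() {
    # -z reads the file as a single record so the lookahead can cross line breaks.
    # -o prints only the matched name; tr turns the NUL output separators into newlines.
    grep -Pzo '(?sm)^\S+[^:\r\n]*?(?=\s*:(?:(?!^\S).)*?'"$1"')' "$2" | tr '\0' '\n'
}

# Example call (mymoviefile is the file name used above):
# find_actor_pcre 'Groundhog Day' mymoviefile
```

Building the pattern from single-quoted pieces avoids history expansion of the ! characters when the function is pasted into an interactive shell.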


I would do it with awk, if I understood the problem correctly:

 awk -F: -v s="$search" '$1~/\S/{p=$1}$2~s{print p FS $2}' file

Test with movie 3:

kent$ cat f
Person1 : movie1
          : movie 2
          : movie 3
          : movie 4

In the file above, the continuation lines start with spaces/tabs.

kent$  awk -F: -v s="movie 3" '$1~/\S/{p=$1}$2~s{print p FS $2}' f
Person1 : movie 3
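
\S is a GNU awk extension, and matching the title as a regex can surprise when it contains characters such as ( or a dot. A hedged, more portable sketch of the same idea follows. The name find_actor is made up, and it assumes titles contain no colon. It uses a plain bracket expression to spot person lines and index() for a literal substring match:

```shell
# Hypothetical helper: print "person : movie" for a literally matched title.
# Usage: find_actor TITLE FILE
find_actor() {
    awk -F: -v s="$1" '
        # A line that does not start with a space or tab names a person.
        /^[^ \t]/ { p = $1; sub(/[ \t]+$/, "", p) }   # remember the name, trailing blanks trimmed
        # index() returns non-zero when s occurs in the movie field.
        index($2, s) { print p " :" $2 }
    ' "$2"
}

# Example call:
# find_actor 'movie 3' f
```

Because index() does a substring search, a title such as "movie 1" would also match "movie 10"; an exact comparison after trimming $2 would be stricter.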

This might work for you (GNU sed):

sed -n '/^\S/h;/movie 3/{H;x;s/:.*:/:/p}' file

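The -n switch suppresses automatic printing, which gives grep-like behaviour. Each person line is saved in the hold space (h). When the movie matches, that line is appended to the hold space (H) and the two spaces are swapped (x), so the pattern space holds the person line followed by the movie line. The substitution then removes everything between the first and last colon, and the result is printed.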
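
The hold-space steps can be spelled out as a commented script. This is only a sketch: the name find_actor_sed is made up, the title is inserted into the script as a regex (so it must not contain / or other special characters), and [^[:space:]] stands in for the GNU-only \S.

```shell
# Hypothetical helper: print "person : movie" for the given title.
# Usage: find_actor_sed TITLE FILE
find_actor_sed() {
    sed -n '
        # A line starting with a non-blank character names a person: copy it to the hold space.
        /^[^[:space:]]/h
        /'"$1"'/{
            # Append the matching line to the saved person line.
            H
            # Swap, so the pattern space now holds both lines.
            x
            # Drop everything from the first colon to the last one, then print.
            s/:.*:/:/p
        }
    ' "$2"
}

# Example call:
# find_actor_sed 'movie 3' file
```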

This is a bit obscure but gets the job done (it prints every line that belongs to a given person):

awk '/^[^ ]/{p=0} /Person1/{p=1} p'


Input file:

Person1 : movie1
    : movie 2
    : movie 3
    : movie 4
Person2 : movie 5
    : movie 6


awk '/^[^ ]/{p=0} /Person1/{p=1} p' file
Person1 : movie1
    : movie 2
    : movie 3
    : movie 4

awk '/^[^ ]/{p=0} /Person2/{p=1} p' file
Person2 : movie 5
    : movie 6

Note: on the command line the output keeps the indentation of the input file.


  1. If the line does not start with a space, set p=0
  2. If the line contains Person1, set p=1
  3. If p is non-zero, print the line (this is the obscure part: a pattern with no action defaults to printing the line)
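
The three steps can be written out with the print made explicit. This is a sketch under a couple of assumptions: the name movies_of is made up, the person is passed in as a regex through -v, and lines starting with a tab are treated like lines starting with a space.

```shell
# Hypothetical helper: print all lines belonging to one person.
# Usage: movies_of PERSON FILE
movies_of() {
    awk -v name="$1" '
        /^[^ \t]/ { p = 0 }   # step 1: a new person line clears the flag
        $0 ~ name { p = 1 }   # step 2: the wanted person sets it again
        p         { print }   # step 3: same as a bare "p", print while the flag is set
    ' "$2"
}

# Example call:
# movies_of Person2 file
```

The order of the rules matters: the flag is cleared first and set second, so the wanted person's own line is still printed.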

It can be done in Perl too:

perl -ne '$p = 0 if /^\w/; $p = 1 if /Person1/; print if $p' file

