Is this text extraction scenario possible in linux bash shell?

Let's say my text file is like this

Person1 : movie1
(space and tab) : movie 2
(space and tab) : movie 3
(space and tab) : movie 4

For a particular movie, I want to find the actor. Here is how I am going about it.

Do a grep: cat actors | grep 'movie3'

This gives me line 3, which is blank up until movie3 appears. So if I can somehow get the first line before this one that matches the pattern

grep '^[^ \t].' (does not start with a space)

it has to be the line with the actor's name for this movie. (I don't care about movie1 there.)

Is there any combination of sed/grep/awk which can help me do it in shell? I hope the question is clear.

Answers


Bill Murray <- Groundhog Day <- grep with Perl mode Magic

It's a bit tricky, but you can use this:

grep -P "(?sm)^\S+[^:\r\n]*?(?=\s*:(?:(?!^\S).)*?Groundhog Day)" mymoviefile


  • -P activates Perl mode
  • (?sm) turns on two mode modifiers:
      • s activates DOTALL mode, allowing the dot to match across lines
      • m turns on multi-line mode, allowing ^ and $ to match on each line
  • The ^ anchor asserts that we are at the beginning of the line
  • \S+ matches one or more non-space chars
  • [^:\r\n]*? lazily matches any non-colon, non-newline chars, up to ...
  • the point where the lookahead (?=\s*:(?:(?!^\S).)*?Groundhog Day) can assert, without consuming chars, that what follows is...
  • \s*: optional spaces and a colon
  • then (?:(?!^\S).)*? lazily matches zero or more chars, refusing to step onto a non-space char at the beginning of a line (so the match cannot run into the next person's entry), up to...
  • Groundhog Day the movie title!



I would do it with awk, if I understood the problem right:

 awk -F: -v s="$search" '$1~/\S/{p=$1}$2~s{print p FS $2}' file

test with movie 3:

kent$ cat f
Person1 : movie1
          : movie 2
          : movie 3
          : movie 4

In the file above, the continuation lines begin with spaces/tabs.

kent$  awk -F: -v s="movie 3" '$1~/\S/{p=$1}$2~s{print p FS $2}' f
Person1 : movie 3
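
The same idea as a small self-contained script, in case it helps to try it out. This is only a sketch: the file name movies.txt, its contents, and the helper name actor_for are made up for illustration. It also swaps \S for the bracket expression [^ \t], since \S is a GNU awk extension and other awks may not accept it.

```shell
#!/bin/sh
# Hypothetical sample data; continuation lines start with a tab and a space.
printf 'Person1 : movie1\n\t : movie 2\n\t : movie 3\n\t : movie 4\nPerson2 : movie 5\n\t : movie 6\n' > movies.txt

# actor_for PATTERN FILE  (hypothetical helper name)
# Prints "actor : movie" for every line whose movie field matches PATTERN.
actor_for() {
    awk -F: -v s="$1" '
        $1 ~ /[^ \t]/ { p = $1 }         # first field has text: remember the actor
        $2 ~ s        { print p FS $2 }  # movie matches: print the remembered actor with it
    ' "$2"
}

actor_for 'movie 3' movies.txt
actor_for 'movie 6' movies.txt
```

The search string is used as a regular expression, so 'movie 3' would also match a title such as "movie 30"; anchoring the pattern is one way around that.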

This might work for you (GNU sed):

sed -n '/^\S/h;/movie 3/{H;x;s/:.*:/:/p}' file

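Use the -n switch to suppress automatic printing, which gives grep-like behaviour. Save the person's line in the hold space and append the matching movie line to it. Then swap it back into the pattern space, remove the unwanted text between the first and last colon, and print the result.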
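
The steps described above can be spelled out one command per line. This is a sketch under assumptions: movies.txt and the helper name actor_for_sed are hypothetical, and [^[:space:]] stands in for \S, which is a GNU extension. The title is pasted into the sed script as a regex, so titles containing "/" would need escaping first.

```shell
#!/bin/sh
# Hypothetical sample data; continuation lines start with a tab and a space.
printf 'Person1 : movie1\n\t : movie 2\n\t : movie 3\n\t : movie 4\nPerson2 : movie 5\n\t : movie 6\n' > movies.txt

# actor_for_sed PATTERN FILE  (hypothetical helper name)
actor_for_sed() {
    sed -n "
        # A line starting with a non-blank is a person line: copy it to the hold space.
        /^[^[:space:]]/h
        /$1/{
            # Append the matching movie line to the hold space, after a newline.
            H
            # Swap, so the pattern space holds: person line, newline, movie line.
            x
            # Drop everything from the first colon through the last one, then print.
            s/:.*:/:/p
        }
    " "$2"
}

actor_for_sed 'movie 3' movies.txt
```

The substitution relies on the dot matching the embedded newline, which is why the person's first movie and the leading whitespace of the matched line both disappear in one step.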


This is a bit obscure but gets the job done:

awk '/^[^ ]/{p=0} /Person1/{p=1} p'

Example:

Input file:

Person1 : movie1
    : movie 2
    : movie 3
    : movie 4
Person2 : movie 5
    : movie 6

Execution:

awk '/^[^ ]/{p=0} /Person1/{p=1} p' file
Person1 : movie1
    : movie 2
    : movie 3
    : movie 4

awk '/^[^ ]/{p=0} /Person2/{p=1} p' file
Person2 : movie 5
    : movie 6

Note: on the command line, the output is indented as in the input file.

Explanation:

  1. If the line does not start with a space, set p=0
  2. If the line contains Person1, set p=1
  3. If p is nonzero, print the line: a pattern with no action makes awk print the current line (this is the obscure part)
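
The three steps can also be written with the name passed in as a variable rather than hard-coded. Again just a sketch: movies.txt and the helper name movies_of are hypothetical, and the first pattern is widened to [^ \t] so that tab-indented lines count as continuation lines too.

```shell
#!/bin/sh
# Hypothetical sample data; continuation lines start with a tab and a space.
printf 'Person1 : movie1\n\t : movie 2\n\t : movie 3\n\t : movie 4\nPerson2 : movie 5\n\t : movie 6\n' > movies.txt

# movies_of NAME FILE  (hypothetical helper name)
movies_of() {
    awk -v name="$1" '
        /^[^ \t]/ { p = 0 }   # line starts with a non-blank: a new person begins, stop printing
        $0 ~ name { p = 1 }   # line matches the wanted name: start printing
        p                     # pattern with no action: print the line while p is nonzero
    ' "$2"
}

movies_of Person2 movies.txt
```

Because the name is matched as a regex against the whole line, a stricter variant could compare only the first field if names might also appear inside movie titles.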

Can be done in perl too:

perl -ne '$p = 0 if /^\w/; $p = 1 if /Person1/; print if $p' file
