How do I parse out part of a file from the middle of a file?

You have a file, but you are only interested in the lines between two pieces of marker text.

A sample file, called "sample.txt"

    I don't care about this line
    or this one either...
    Some text I want to capture
    And this line 
    And this line also
    End of text that I want to capture
    I don't care about this line

The problem code

    #!/usr/bin/perl
    use strict;
    use warnings;

    while (<>) {
        next if (! /^Some text/ );   # skip every line that does not start with 'Some text'
        print;
    }

Now, assuming you have saved this code in a file called parse.pl and run it on the command line with perl parse.pl < sample.txt, you'll get this output...

    Some text I want to capture

...because only that one line starts with 'Some text'. What you want to see is...

    Some text I want to capture
    And this line 
    And this line also
    End of text that I want to capture

Solution 1: Use the Perl range operator ".." between two regular expressions

    #!/usr/bin/perl
    use strict;
    use warnings;

    open (my $fh, '<', 'sample.txt') or die "Could not open sample.txt: $!";
    while (<$fh>) {
        print $_ if (/^Some text/ .. /^End of text/);
    }
    close ($fh) or die "Could not close sample.txt: $!";

...which gives you this output...

    Some text I want to capture
    And this line 
    And this line also
    End of text that I want to capture

The condition in the statement:

    print $_ if (/^Some text/ .. /^End of text/);

becomes true as soon as the first pattern

    /^Some text/

matches a line, and stays true up to and including the line on which the second pattern

    /^End of text/

matches. After that it is false again, so exactly the lines you want are printed.
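
The same range test can be wrapped in subroutines for reuse. The following is only a sketch: the names lines_between and inner_lines are hypothetical, and the sample text is read through an in-memory filehandle so that the example does not depend on sample.txt. It also relies on the fact that, in scalar context, ".." returns a sequence number for each line in the range, with "E0" appended on the last one, which makes it possible to drop the two marker lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same content as sample.txt, held in a string for the sake of the example.
my $sample = <<'END_SAMPLE';
I don't care about this line
or this one either...
Some text I want to capture
And this line
And this line also
End of text that I want to capture
I don't care about this line
END_SAMPLE

# Hypothetical helper: lines from the start marker through the end marker,
# both marker lines included.
sub lines_between {
    my ($text) = @_;
    my @captured;
    open(my $fh, '<', \$text) or die "Could not open in-memory file: $!";
    while (my $line = <$fh>) {
        push @captured, $line
            if ($line =~ /^Some text/ .. $line =~ /^End of text/);
    }
    close($fh) or die "Could not close in-memory file: $!";
    return @captured;
}

# Hypothetical helper: only the lines strictly between the two markers.
sub inner_lines {
    my ($text) = @_;
    my @inner;
    open(my $fh, '<', \$text) or die "Could not open in-memory file: $!";
    while (my $line = <$fh>) {
        # Scalar assignment gives the flip-flop form of "..".
        my $seq = ($line =~ /^Some text/ .. $line =~ /^End of text/);
        next unless $seq;          # empty string: outside the range
        next if $seq == 1;         # first line of the range (start marker)
        next if $seq =~ /E0$/;     # last line of the range (end marker)
        push @inner, $line;
    }
    close($fh) or die "Could not close in-memory file: $!";
    return @inner;
}

print lines_between($sample);
print "---\n";
print inner_lines($sample);
```

One caveat with this design: each ".." remembers its state between evaluations, so if the input ends before the end marker is seen, the next call to the same subroutine starts out inside the range.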

Solution 2: Read the file in one gulp and use regular expressions to capture the text

    #!/usr/bin/perl
    use strict;
    use warnings;

    open (my $fh, '<', 'sample.txt') or die "Could not open sample.txt: $!";
    my $file = do { local $/; <$fh> };   # 'slurp' mode, limited to this block
    close ($fh) or die "Could not close sample.txt: $!";

    # The /s modifier lets '.' match newlines; '.*?' stops at the first end marker
    my ($stuff_that_interests_me) =
         ($file =~ m/(Some text.*?End of text that I want to capture)/s);
    print "$stuff_that_interests_me\n" if defined $stuff_that_interests_me;
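
The pattern in Solution 2 depends on two details: the /s modifier, which lets "." match newlines, and the non-greedy ".*?", which stops at the first end marker rather than the last. The sketch below illustrates the difference on a made-up string containing two blocks. The subroutine names and the shortened markers are assumptions for the example, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up input with two marked blocks (not the contents of sample.txt).
my $text = join '', map { "$_\n" } (
    'junk',
    'Some text A', 'middle A', 'End of text',
    'junk',
    'Some text B', 'middle B', 'End of text',
    'junk',
);

# Non-greedy: stops at the first end marker after the start marker.
sub first_block {
    my ($input) = @_;
    my ($block) = $input =~ /(Some text.*?End of text)/s;
    return $block;
}

# Greedy: runs on to the last end marker in the input.
sub greedy_block {
    my ($input) = @_;
    my ($block) = $input =~ /(Some text.*End of text)/s;
    return $block;
}

# Without /s, '.' cannot cross a newline, so nothing matches here.
sub block_without_s {
    my ($input) = @_;
    my ($block) = $input =~ /(Some text.*?End of text)/;
    return $block;
}

# With /g in list context, every non-overlapping block is returned.
sub all_blocks {
    my ($input) = @_;
    my @blocks = $input =~ /(Some text.*?End of text)/sg;
    return @blocks;
}

print "First block:\n", first_block($text), "\n\n";
print "Number of blocks found with /g: ", scalar(all_blocks($text)), "\n";
```

If a file may contain more than one marked section, the /g form is the one to reach for. For a single section, the non-greedy match in Solution 2 is enough.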