How do I use regular expressions to match numbers?

You have some text, and you want to find the numbers in it. For example, you have the following:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $text = "I have 37 balloons";

    my $num;
    # Need to make $num = 37
    print "Bob says that he has $num balloons\n";

Solution: use Regular Expressions

    #!/usr/bin/perl
    use strict;
    use warnings;
    my $text = "I have 37 balloons";

    $text =~ m/(\d+)/;
    my $num = $1;
    print "Bob says that he has $num balloons\n";

What does that actually do?

    $text =~ m/(\d+)/;

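In a regular expression, \d means a digit and + means one or more of the preceding item, so \d+ matches one or more digits in the text. The parentheses capture whatever was matched into a numbered variable, in this case $1. If the match fails, $1 is not updated, so it is worth checking that the match succeeded before using it.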
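As a minimal sketch of checking whether the match succeeded, you could do the match in list context, where a failed match returns an empty list. The helper name first_number is made up for this example and is not part of Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# first_number is a hypothetical helper name used only in this sketch.
# It returns the first run of digits in a string, or undef if there is none.
sub first_number {
    my ($text) = @_;
    # In list context a successful match returns the captured groups;
    # a failed match returns an empty list, so $num stays undefined.
    my ($num) = $text =~ m/(\d+)/;
    return $num;
}

for my $text ("I have 37 balloons", "I have no balloons") {
    my $num = first_number($text);
    if (defined $num) {
        print "Found $num\n";        # first string: Found 37
    }
    else {
        print "No number found\n";   # second string: No number found
    }
}
```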

However, if your text said:

    my $text = "I have 3 bananas and 37 balloons";

And you wanted to know how many balloons you had, the above regular expression wouldn't work. It would match the first number it found, which would be the 3 from '3 bananas'.

There are three different ways to get the number 37 out of the above text using a regular expression. You could get:

    1. The numbers directly before the word 'balloon'
    2. The second set of numbers to appear in the text
    3. The last set of numbers to appear in the text

If you chose the first case, your regular expression would become:

    $text =~ m/(\d+)\s*balloon/;

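Which is: one or more digits, followed by zero or more whitespace characters, followed by the text 'balloon'. Because the pattern does not say what comes after 'balloon', it also matches 'balloons'.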
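Putting the first case into a full script might look like the sketch below. The helper name balloon_count is made up for this example, and the fallback message is only an illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# balloon_count is a hypothetical helper name used only in this sketch.
sub balloon_count {
    my ($text) = @_;
    # Capture the digits that sit directly before the word 'balloon'.
    if ($text =~ m/(\d+)\s*balloon/) {
        return $1;
    }
    return;    # no match: gives undef in scalar context
}

my $text = "I have 3 bananas and 37 balloons";
my $num  = balloon_count($text);

if (defined $num) {
    print "Bob says that he has $num balloons\n";    # prints 37, not 3
}
else {
    print "No balloon count found\n";
}
```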

The second case would be:

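    $text =~ m/\d+[^\d]+(\d+)/;

This regular expression is explained as: one or more digits, followed by one or more non-digits, followed by one or more digits, which are captured in $1 because of the parentheses. Note that the + goes outside the square brackets. Inside them, as in [^\d*], a * would be treated as a literal asterisk and the class would match only a single character. At least one non-digit is required so that the two runs of digits really are separate numbers.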

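The reason this works is that the match starts at the leftmost position it can, so the first \d+ picks up the first number in the text, and that by default Perl regular expressions are 'greedy'. That is, each quantifier will try to match as many characters as possible, so the captured \d+ takes the whole of 37 rather than just the 3.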
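To see what 'greedy' means in practice, the short sketch below compares \d+ with its non-greedy form \d+? on the same text. The variable names are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "I have 37 balloons";

# Greedy: \d+ takes as many digits as it can.
my ($greedy) = $text =~ m/(\d+)/;

# Non-greedy: \d+? stops as soon as it has matched one digit.
my ($lazy) = $text =~ m/(\d+?)/;

print "greedy: $greedy, non-greedy: $lazy\n";    # greedy: 37, non-greedy: 3
```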

The final case would become:

    $text =~ m/(\d+)[^\d]*$/;

The $ means the end of the text. So the regular expression means: match one or more digits (capture these digits in $1), followed by zero or more non-digits, followed by the end of the text.
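A different approach to the second and last cases, not the one used above, is to collect every number in the text and then pick the one you want by position. This sketch assumes the /g modifier in list context, which returns all of the captured matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "I have 3 bananas and 37 balloons";

# With /g in list context, the match returns every captured run of digits.
my @nums = $text =~ m/(\d+)/g;    # (3, 37)

# Check the array is long enough before indexing into it.
print "Second number: $nums[1]\n" if @nums >= 2;    # Second number: 37
print "Last number: $nums[-1]\n"  if @nums;         # Last number: 37
```

This keeps the pattern itself simple, at the cost of scanning the whole text even when you only need one number.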

See also

For more information on regular expressions, see:

    perldoc perlretut
    perldoc perlre

Revision: 1.5