How can I count the separator characters in a string?

You have a string, which you are splitting. For example:

    a,b,c,,d,e,,,,

You want to split this string at each comma. But you also want to know how many commas were in the string.

Why you can't simply use split

By default (when no LIMIT argument is given), the split function drops any empty fields left at the end of the string. For example, if you used the following code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $string1 = "a,b,c,,d,e,,,,";
    my $string2 = "a,b,c,,d,e";

    my @nums1 = split(",", $string1);
    my @nums2 = split(",", $string2);

    print "The first string has " . scalar(@nums1) . " fields\n";
    print "The second string has " . scalar(@nums2) . " fields\n";

You'd get the following output:

    The first string has 6 fields
    The second string has 6 fields

But the answers you would want are 10 fields (so 9 commas) for the first string, and 6 fields (5 commas) for the second string.
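If you do want to stay with split, it keeps trailing empty fields when you pass a negative LIMIT as its third argument. The sketch below assumes the separator is a single comma. For a non-empty string the number of commas is then the number of fields minus one. Splitting an empty string always produces zero fields, whatever LIMIT you give, so that case would need separate handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string1 = "a,b,c,,d,e,,,,";
my $string2 = "a,b,c,,d,e";

# A negative LIMIT tells split to keep trailing empty fields.
my @fields1 = split(/,/, $string1, -1);
my @fields2 = split(/,/, $string2, -1);

print "The first string has " . scalar(@fields1) . " fields\n";   # 10
print "The second string has " . scalar(@fields2) . " fields\n";  # 6

# For a non-empty string there is one fewer separator than fields.
my $commas1 = @fields1 - 1;
my $commas2 = @fields2 - 1;

print "The first string has $commas1 commas\n";   # 9
print "The second string has $commas2 commas\n";  # 5
```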

Solution 1

There are a number of solutions to this problem.

The first uses the tr (transliteration) operator, bound to the string with =~:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $string1 = "a,b,c,,d,e,,,,";
    my $string2 = "a,b,c,,d,e";

    my $num1 = $string1 =~ tr/,//;
    my $num2 = $string2 =~ tr/,//;

    print "The first string has $num1 commas\n";
    print "The second string has $num2 commas\n";

This gives you the following output:

    The first string has 9 commas
    The second string has 5 commas

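The tr operator returns the number of characters it replaced or deleted. Because the replacement list here is empty and the /d modifier is not used, each comma is "replaced" by itself: the string is left unchanged, and the return value is simply the number of commas in the string.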
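The short script below illustrates that behaviour. It checks that tr/,// leaves the string untouched, that adding /d really deletes the commas while returning the same count, and that the search list may contain several characters (here commas and semicolons, an illustrative choice). One limitation to keep in mind: tr builds its table at compile time, so its lists are not interpolated, and a separator held in a variable cannot be used directly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "a,b,c,,d,e,,,,";

# Empty replacement list and no /d: each comma is "replaced" by itself,
# so $string is left unchanged and tr returns the number of commas.
my $count = ($string =~ tr/,//);
print "Commas: $count\n";        # Commas: 9
print "String: $string\n";       # String: a,b,c,,d,e,,,,

# With /d the commas really are deleted; the return value is still the count.
my $copy    = $string;
my $deleted = ($copy =~ tr/,//d);
print "Deleted: $deleted\n";     # Deleted: 9
print "Copy: $copy\n";           # Copy: abcde

# The search list can hold several characters: count commas and semicolons.
my $mixed = "a,b;c,d";
my $seps  = ($mixed =~ tr/,;//);
print "Separators: $seps\n";     # Separators: 3
```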

Solution 2

The second solution is a little more readable:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $string1 = "a,b,c,,d,e,,,,";
    my $string2 = "a,b,c,,d,e";

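    # Start both counters at 0 so they print as 0, not undef,
    # when a string contains no commas.
    my $num1 = 0;
    my $num2 = 0;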

    $num1++ while ($string1 =~ m/,/g);
    $num2++ while ($string2 =~ m/,/g);

    print "The first string has $num1 commas\n";
    print "The second string has $num2 commas\n";

This gives you the correct output, as follows:

    The first string has 9 commas
    The second string has 5 commas

The g modifier makes the match global. In scalar context, as in the while condition here, each successive match starts where the previous one left off, so the condition is true once for each comma in the string. In this example we simply increment a counter each time a comma is matched.
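A related idiom gets the count without an explicit loop. It evaluates the global match in list context and assigns the result to an empty list; a list assignment used in scalar context returns the number of elements on its right-hand side. The count_separators helper below is hypothetical, written only for illustration. It wraps the separator in \Q...\E so that regex metacharacters such as "." are matched literally, which also lets it handle multi-character separators. An empty separator is rejected, because an empty pattern would make Perl reuse the last successful pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a literal separator.
sub count_separators {
    my ($string, $sep) = @_;
    die "separator must not be empty\n" unless length $sep;

    # List-context //g returns every match; assigning that list to ()
    # in scalar context yields the number of matches.
    my $count = () = $string =~ /\Q$sep\E/g;
    return $count;
}

print count_separators("a,b,c,,d,e,,,,", ","), "\n";   # 9
print count_separators("a,b,c,,d,e", ","), "\n";       # 5
print count_separators("a::b::c", "::"), "\n";         # 2
print count_separators("1.2.3", "."), "\n";            # 2
```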

Revision: 1.4