Using the Perl split() function
Introduction
The split() function is used to split a string into smaller sections. You can split a string on a single character, a group of characters or a regular expression (a pattern).
You can also specify how many pieces to split the string into. This is better explained in the examples below.
Example 1. Splitting on a character
A common use of split() is when parsing data from a file or from another program. In this example, we will split the string on the comma ‘,’. Note that you typically should not use split() to parse CSV (comma-separated values) files, because the data itself may contain commas: use Text::CSV instead.
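A minimal sketch of splitting on a comma. The sample record is hypothetical, chosen only for illustration; the lines the script prints are shown in the trailing comments.

```perl
use strict;
use warnings;

# Hypothetical sample record (illustrative values only).
my $data = 'Becky Alcorn,25,female,Melbourne';

# Split on a literal comma; each field becomes one list element.
my @values = split /,/, $data;

print "$_\n" for @values;
# Prints:
# Becky Alcorn
# 25
# female
# Melbourne
```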
This program produces the following output:
Example 2. Splitting on a string
In the same way you use a character to split, you can use a string. In this example, the data is separated by three tildes ‘~~~’.
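A minimal sketch of splitting on a multi-character delimiter, assuming a made-up record whose fields are separated by ‘~~~’:

```perl
use strict;
use warnings;

# Hypothetical record; note that one field contains a comma.
my $data = 'Bob the Builder~~~10:30am~~~1,6~~~ABC';

# The whole three-character sequence is the delimiter.
my @values = split /~~~/, $data;

print "$_\n" for @values;
# Prints:
# Bob the Builder
# 10:30am
# 1,6
# ABC
```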
This outputs:
Example 3. Splitting on a pattern
In some cases, you may want to split the string on a pattern (regular expression) or a type of character. We’ll assume here that you know a little about regular expressions. In this example we will split on any integer, that is, any run of one or more digits:
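A minimal sketch of splitting on a pattern, assuming hypothetical data in which words are separated by integers. The pattern /\d+/ matches one or more digits:

```perl
use strict;
use warnings;

# Hypothetical data: words separated by integers.
my $data = 'Home1Work2Cafe3Work4Home';

# Every run of digits acts as a delimiter and is discarded.
my @values = split /\d+/, $data;

print "$_\n" for @values;
# Prints:
# Home
# Work
# Cafe
# Work
# Home
```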
The output of this program is:
Example 4. Splitting on an undefined value
If you split on an undefined value, the string will be split on every character:
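A minimal sketch of splitting a string into single characters. As an assumption made for this sketch, it uses the empty pattern //, which the split documentation describes as matching between every character, rather than passing a literal undef:

```perl
use strict;
use warnings;

# Hypothetical sample string.
my $data = 'Perl';

# The empty pattern matches between every pair of characters.
my @values = split //, $data;

print "$_\n" for @values;
# Prints:
# P
# e
# r
# l
```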
The results of this program are:
Example 5. Splitting on a space
If you use a single space ‘ ’ as the string to split on, split() will actually split on any run of whitespace, including newlines and tabs (much like the regular expression /\s+/, except that leading whitespace is skipped), rather than on just a space. In this example we print ‘aa’ on either side of each value so we can see where the split took place:
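A minimal sketch of the special ‘ ’ behaviour, assuming hypothetical data that contains both a space and a newline:

```perl
use strict;
use warnings;

# Hypothetical data with spaces and an embedded newline.
my $data = "one two\nthree four";

# The string ' ' is special: it splits on any run of whitespace.
my @values = split ' ', $data;

print "aa${_}aa\n" for @values;
# Prints:
# aaoneaa
# aatwoaa
# aathreeaa
# aafouraa
```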
This produces:
As you can see, it has split on the newlines that were in our data. If you really want to split on a space, use regular expressions:
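A minimal sketch of splitting on a literal space only, using the pattern / / on the same kind of hypothetical data. The newline now stays inside the second value:

```perl
use strict;
use warnings;

# Hypothetical data with spaces and an embedded newline.
my $data = "one two\nthree four";

# The pattern / / matches exactly one space character, nothing else.
my @values = split / /, $data;

print "aa${_}aa\n" for @values;
# Prints (the second value still contains the newline):
# aaoneaa
# aatwo
# threeaa
# aafouraa
```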
Example 6. Delimiter at the start of the string
If the delimiter is at the start of the string then the first element in the array of results will be empty. We’ll print fixed text with each line so that you can see the blank one:
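A minimal sketch of a leading delimiter, assuming a hypothetical string that starts with a comma. The fixed text and quotes make the empty first element visible:

```perl
use strict;
use warnings;

# Hypothetical data that begins with the delimiter.
my $data = ',test,data';

my @values = split /,/, $data;

# Quote each value so the empty leading field can be seen.
print "Field: '$_'\n" for @values;
# Prints:
# Field: ''
# Field: 'test'
# Field: 'data'
```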
The output of this program is:
Example 7. Split and context
If you do not pass in a string to split, then split() will use $_. If you do not pass an expression or string to split on, then split() will use ‘ ’:
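A minimal sketch of calling split() with no arguments, assuming a hypothetical list of lines. Inside the loop each line is in $_, so a bare split behaves like split ' ', $_:

```perl
use strict;
use warnings;

# Hypothetical input lines; the second uses a tab between the words.
my @lines = ("Becky Alcorn", "Bob\tBuilder");

my @all;    # collects every field, for illustration only
for (@lines) {
    my @values = split;    # no arguments: splits $_ on whitespace
    push @all, @values;
    print "$_\n" for @values;
}
# Prints:
# Becky
# Alcorn
# Bob
# Builder
```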
This produces:
Example 8. Limiting the split
You can limit the number of sections the string will be split into. You can do this by passing in a positive integer as the third argument. In this example, we’re splitting our data into 3 fields, even though there are 4 occurrences of the delimiter:
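A minimal sketch of the limit argument, assuming a hypothetical record with 4 commas. With a limit of 3, the last element keeps the rest of the string, delimiters included:

```perl
use strict;
use warnings;

# Hypothetical record containing 4 commas.
my $data = 'Becky Alcorn,25,female,Melbourne,Australia';

# The third argument caps the number of fields at 3.
my @values = split /,/, $data, 3;

print "$_\n" for @values;
# Prints:
# Becky Alcorn
# 25
# female,Melbourne,Australia
```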
This program produces:
Example 9. Keeping the delimiter
Sometimes, when splitting on a pattern, you want the delimiter in the result of the split. You can do this by capturing the characters you want to keep inside parentheses. Let’s do our regular expression example again, but this time we’ll keep the numbers in the result:
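A minimal sketch of keeping the delimiter, assuming the same kind of hypothetical data as the pattern example. Wrapping the pattern in capturing parentheses makes split() return each matched delimiter as its own element:

```perl
use strict;
use warnings;

# Hypothetical data: words separated by integers.
my $data = 'Home1Work2Cafe3Work4Home';

# The capture group causes the matched digits to be returned too.
my @values = split /(\d+)/, $data;

print "$_\n" for @values;
# Prints:
# Home
# 1
# Work
# 2
# Cafe
# 3
# Work
# 4
# Home
```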
The output is:
Example 10. Splitting into a hash
If you know a bit about your data, you could split it directly into a hash instead of an array:
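A minimal sketch of splitting straight into a hash, assuming hypothetical key=value pairs separated by semicolons. Splitting on either ‘=’ or ‘;’ yields an alternating list of keys and values, which the hash assignment pairs up:

```perl
use strict;
use warnings;

# Hypothetical data: key=value pairs separated by semicolons.
my $data = 'FIRSTFIELD=1;SECONDFIELD=2;THIRDFIELD=3';

# The character class matches '=' or ';', giving key, value, key, value...
my %values = split /[=;]/, $data;

# Sort the keys so the output order is predictable.
for my $key (sort keys %values) {
    print "$key: $values{$key}\n";
}
# Prints:
# FIRSTFIELD: 1
# SECONDFIELD: 2
# THIRDFIELD: 3
```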
The output of this program is:
The problem is that if the data does not contain exactly what you think, for example FIRSTFIELD=1;SECONDFIELD=2;THIRDFIELD= then, with warnings enabled, you will get an ‘Odd number of elements in hash assignment’ warning. Here is the output of the same program but with this new data:
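A minimal sketch of the failure case. By default split() drops trailing empty fields, so the data above yields only five elements and the last key is left without a value. The second half shows one possible workaround, not taken from the original text: a negative limit makes split() keep trailing empty fields.

```perl
use strict;
use warnings;

# The data from the text above: the last key has no value.
my $data = 'FIRSTFIELD=1;SECONDFIELD=2;THIRDFIELD=';

# Trailing empty fields are dropped, leaving 5 elements.
my @fields = split /[=;]/, $data;
print scalar(@fields), " fields\n";    # prints "5 fields"

# Warns: Odd number of elements in hash assignment
my %values = @fields;
print defined $values{THIRDFIELD} ? "defined\n" : "undefined\n";
# prints "undefined"

# Possible workaround: a negative limit keeps the trailing empty field.
my %fixed = split /[=;]/, $data, -1;
print "THIRDFIELD is '$fixed{THIRDFIELD}'\n";
# prints "THIRDFIELD is ''"
```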