In this tutorial you will learn how to parse some simple XML in Perl. We've provided an example XML file below, called test.xml. This is the file we used for all our testing.
<?xml version="1.0" encoding="ISO-8859-1"?>
<TEST>
<PERSON name="Melissa">
<PET>Cat</PET>
<AGE>24</AGE>
<CAR>Y</CAR>
</PERSON>
<PERSON name="Thomas">
<AGE>28</AGE>
<CAR>N</CAR>
</PERSON>
</TEST>
The Expat library, available from SourceForge, is commonly used to parse XML. The Perl module XML::Parser (and related modules), which is built on Expat, is a very powerful module for parsing XML in many different formats. However, because of the power of the module, the output can be difficult to follow. The XML::Simple module provides a simple interface to the output of this, and other, XML modules.
The script below uses XML::Simple to read the XML from test.xml into a simple hash structure. We are using Data::Dumper to show you the output easily.
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $file = 'test.xml';
my $test_data = XMLin($file);
print Dumper($test_data);
Using our test.xml this would produce the following output:
$VAR1 = {
'PERSON' => {
'Thomas' => {
'CAR' => 'N',
'AGE' => '28'
},
'Melissa' => {
'CAR' => 'Y',
'AGE' => '24',
'PET' => 'Cat'
}
}
};
So if you wanted a script that printed how old everyone was, you could write:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $file = 'test.xml';
my $test_data = XMLin($file);
foreach my $person (keys %{$test_data->{PERSON}}) {
print $person . ' is ' . $test_data->{PERSON}->{$person}->{AGE} . "\n";
}
Which would produce (the order of the lines may vary, because Perl does not guarantee the order of hash keys):
Thomas is 28
Melissa is 24
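The name-keyed hash above comes from XML::Simple folding each PERSON on its name attribute by default. If document order matters, a minimal sketch (assuming XML::Simple is installed, and with test.xml inlined as a string so the script is self-contained) is to turn that folding off with KeyAttr and keep PERSON as a list with ForceArray. The describe_people helper is a hypothetical name used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Same content as test.xml, inlined so the sketch is self-contained.
my $xml = <<'END_XML';
<?xml version="1.0" encoding="ISO-8859-1"?>
<TEST>
<PERSON name="Melissa">
<PET>Cat</PET>
<AGE>24</AGE>
<CAR>Y</CAR>
</PERSON>
<PERSON name="Thomas">
<AGE>28</AGE>
<CAR>N</CAR>
</PERSON>
</TEST>
END_XML

# KeyAttr => [] turns off folding on the name attribute;
# ForceArray => ['PERSON'] keeps PERSON an array even with a single entry.
my $test_data = XMLin($xml, KeyAttr => [], ForceArray => ['PERSON']);

# Hypothetical helper: build one line per person, in document order.
sub describe_people {
    my ($data) = @_;
    my @lines;
    foreach my $person (@{ $data->{PERSON} }) {
        # Not every PERSON has a PET, so supply a fallback.
        my $pet = defined $person->{PET} ? $person->{PET} : 'no pet';
        push @lines, "$person->{name} is $person->{AGE} ($pet)";
    }
    return @lines;
}

print "$_\n" for describe_people($test_data);
```

Because PERSON is now an array, Melissa is listed before Thomas on every run, and name appears as an ordinary key inside each person's hash.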
XML::Smart: more like XML::Difficult.
#!/usr/bin/perl
use strict;
use warnings;
use XML::Smart;
use Data::Dumper;
my $file = 'test.xml';
my $test_data = XML::Smart->new($file);
my $car = $test_data->{TEST}{PERSON}{CAR};
print "CAR: $car\n";
XML::Parser::EasyTree: easier than XML::Parser.
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;
use XML::Parser::EasyTree;
my $file = 'test.xml';
$XML::Parser::EasyTree::Noempty = 1;
my $p = XML::Parser->new(
Style => 'EasyTree'
);
my $tree = $p->parsefile($file);
print $tree->[0]->{content}->[0]->{content}->[0]->{name} . ": ";
print $tree->[0]->{content}->[0]->{content}->[0]->{content}->[0]->{content} . "\n";
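The chained indexes above work only for a known position in the tree. As a rough sketch of a more general approach, the script below walks the tree recursively and collects every text node along with its element path. To keep it runnable without XML::Parser, it uses a hand-written stand-in for part of the tree, assuming EasyTree's layout of element nodes (type 'e', with name, attrib and content) and text nodes (type 't', with content). The collect_text helper is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written stand-in for part of the EasyTree result (Noempty = 1),
# so the sketch runs without XML::Parser installed.
my $tree = [
    {
        type    => 'e',
        name    => 'TEST',
        attrib  => {},
        content => [
            {
                type    => 'e',
                name    => 'PERSON',
                attrib  => { name => 'Melissa' },
                content => [
                    { type => 'e', name => 'PET', attrib => {},
                      content => [ { type => 't', content => 'Cat' } ] },
                    { type => 'e', name => 'AGE', attrib => {},
                      content => [ { type => 't', content => '24' } ] },
                ],
            },
        ],
    },
];

# Hypothetical helper: return "path: text" for every text node found.
sub collect_text {
    my ($nodes, $path) = @_;
    my @found;
    foreach my $node (@$nodes) {
        if ($node->{type} eq 'e') {
            # Element: descend into its children, extending the path.
            push @found, collect_text($node->{content}, "$path/$node->{name}");
        }
        elsif ($node->{type} eq 't') {
            # Text node: record it against the enclosing element's path.
            push @found, "$path: $node->{content}";
        }
    }
    return @found;
}

print "$_\n" for collect_text($tree, '');
```

With a real parse, the same helper could be passed the $tree returned by parsefile instead of the stand-in.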
XML::Mini provides a pure Perl XML parser. Unlike XML::Parser, it does not require any external libraries or modules. The parse subroutine accepts a string of XML (not a filename), and the toHash function builds the XML into a hash structure much like that in XML::Simple.
The program below parses the example test.xml file and we use Data::Dumper to display the output:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Mini::Document;
use Data::Dumper;
my $file = 'test.xml';
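# Slurp the whole file into one string; parse() wants XML text, not a filename
open(my $fh, '<', $file) or die "Cannot open $file: $!";
my $xml = do { local $/; <$fh> };
close $fh;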
my $xml_doc = XML::Mini::Document->new();
$xml_doc->parse($xml);
my $test_data = $xml_doc->toHash();
print Dumper($test_data);
The output of this program would be:
$VAR1 = {
'xml' => {
'version' => '1.0',
'encoding' => 'ISO-8859-1'
},
'TEST' => {
'PERSON' => [
{
'CAR' => 'Y',
'AGE' => '24',
'PET' => 'Cat',
'name' => 'Melissa'
},
{
'CAR' => 'N',
'AGE' => '28',
'name' => 'Thomas'
}
]
}
};
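Here PERSON is a list rather than a hash keyed by name, as it was with XML::Simple. If the name-keyed shape is more convenient, one possible sketch is to re-index the list yourself. The data below is a stand-in copied from the output above so that the script runs without XML::Mini, and index_by_name is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the structure returned by toHash(), copied from the output above.
my $test_data = {
    TEST => {
        PERSON => [
            { CAR => 'Y', AGE => '24', PET => 'Cat', name => 'Melissa' },
            { CAR => 'N', AGE => '28', name => 'Thomas' },
        ],
    },
};

# Hypothetical helper: turn the list of people into a hash keyed by name.
sub index_by_name {
    my ($people) = @_;
    my %by_name;
    foreach my $person (@$people) {
        my %fields = %$person;             # shallow copy, original untouched
        my $name   = delete $fields{name}; # name becomes the key
        $by_name{$name} = \%fields;
    }
    return \%by_name;
}

my $by_name = index_by_name($test_data->{TEST}{PERSON});

# Sort the keys so the output order is the same on every run.
print "$_ is $by_name->{$_}{AGE}\n" for sort keys %$by_name;
```

This assumes every PERSON has exactly one name; a later person with the same name would overwrite an earlier one.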
Note that attributes (such as name) are treated the same as child tags of a node. For example, if we added a child tag called 'name', containing 'Extra Name', to the Melissa PERSON element, the output of the above program would be:
$VAR1 = {
'xml' => {
'version' => '1.0',
'encoding' => 'ISO-8859-1'
},
'TEST' => {
'PERSON' => [
{
'CAR' => 'Y',
'AGE' => '24',
'PET' => 'Cat',
'name' => [
'Melissa',
'Extra Name'
]
},
{
'CAR' => 'N',
'AGE' => '28',
'name' => 'Thomas'
}
]
}
};
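As this output shows, the same key can hold either a plain string or an array reference, so code reading the hash has to cope with both. A minimal sketch of one way to do that follows. The data is a stand-in copied from the output above, and as_list is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the PERSON list from toHash(), copied from the output above.
my @people = (
    { CAR => 'Y', AGE => '24', PET => 'Cat', name => [ 'Melissa', 'Extra Name' ] },
    { CAR => 'N', AGE => '28', name => 'Thomas' },
);

# Hypothetical helper: always return a list, whatever shape the value has.
sub as_list {
    my ($value) = @_;
    return () unless defined $value;                      # missing key
    return ref($value) eq 'ARRAY' ? @$value : ($value);   # array ref or scalar
}

foreach my $person (@people) {
    my @names = as_list($person->{name});
    print join(' / ', @names), " is $person->{AGE}\n";
}
```

The same helper can wrap any field that might be repeated, so the calling code never needs to check for a reference itself.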
perldoc XML::Simple
perldoc XML::Parser
perldoc XML::Smart
perldoc XML::Parser::EasyTree
perldoc Data::Dumper
perldoc XML::Mini