In this tutorial you will learn how to parse some simple XML. We've provided an example XML file below. This is the file we used for all our testing, called test.xml.
<?xml version="1.0" encoding="ISO-8859-1"?>
<TEST>
  <PERSON name="Melissa">
    <PET>Cat</PET>
    <AGE>24</AGE>
    <CAR>Y</CAR>
  </PERSON>
  <PERSON name="Thomas">
    <AGE>28</AGE>
    <CAR>N</CAR>
  </PERSON>
</TEST>
The Expat library, available from SourceForge, is commonly used to parse XML. The Perl module XML::Parser (and related modules), which is built on Expat, is a very powerful module for parsing XML in many different formats. However, because of the power of the module, the output can be difficult to follow. The XML::Simple module provides a simple interface to the output of this, and other, XML modules.
The script below uses XML::Simple to read the XML from test.xml into a simple hash structure. We are using Data::Dumper to show you the output easily.
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

my $file = 'test.xml';

# Read the XML file into a nested hash structure
my $test_data = XMLin($file);
print Dumper($test_data);
Using our test.xml this would produce the following output:
$VAR1 = {
  'PERSON' => {
    'Thomas' => {
      'CAR' => 'N',
      'AGE' => '28'
    },
    'Melissa' => {
      'CAR' => 'Y',
      'AGE' => '24',
      'PET' => 'Cat'
    }
  }
};
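In the output above, XML::Simple has used each PERSON's name attribute as a hash key, so the order of the people in the file is lost. A minimal sketch of turning that behaviour off with the KeyAttr and ForceArray options follows. It assumes XML::Simple is installed, and it passes a shortened copy of test.xml to XMLin as a string (rather than a filename) so that it is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Shortened copy of test.xml, passed as a string instead of a filename.
my $xml = <<'END_XML';
<TEST>
  <PERSON name="Melissa"><PET>Cat</PET><AGE>24</AGE><CAR>Y</CAR></PERSON>
  <PERSON name="Thomas"><AGE>28</AGE><CAR>N</CAR></PERSON>
</TEST>
END_XML

# KeyAttr => []            : do not fold elements into a hash keyed on 'name'
# ForceArray => ['PERSON'] : always return PERSON as an array reference,
#                            even when there is only one PERSON
my $test_data = XMLin($xml, KeyAttr => [], ForceArray => ['PERSON']);

# The people are now held in an array, in document order.
my @ages = map { $_->{name} . ' is ' . $_->{AGE} } @{ $test_data->{PERSON} };
print "$_\n" for @ages;
```

With these options each PERSON is an ordinary hash with a name key, which can be easier to loop over than the folded structure.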
So if you wanted a script that printed how old everyone was, you could write:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

my $file = 'test.xml';
my $test_data = XMLin($file);

# Each key under PERSON is a person's name
foreach my $person (keys %{$test_data->{PERSON}}) {
    print $person . ' is ' . $test_data->{PERSON}->{$person}->{AGE} . "\n";
}
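Which would produce the following (the order of the lines may vary, because Perl does not keep hash keys in any particular order):
Thomas is 28
Melissa is 24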
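Not every PERSON has every child tag: Thomas has no PET in test.xml. A minimal sketch of guarding against missing keys, and of sorting the names so the output order is stable, follows. To keep it self-contained, the hash is written out by hand to match the Data::Dumper output shown earlier instead of being read with XMLin, and describe_person is a hypothetical helper name, not part of XML::Simple.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written copy of the structure XMLin returned for test.xml,
# so that this sketch runs without the XML file or XML::Simple.
my $test_data = {
    PERSON => {
        Thomas  => { CAR => 'N', AGE => '28' },
        Melissa => { CAR => 'Y', AGE => '24', PET => 'Cat' },
    },
};

# Hypothetical helper: build one line of text for a person,
# mentioning the pet only when a PET key is present.
sub describe_person {
    my ($name, $details) = @_;
    my $line = "$name is $details->{AGE}";
    $line .= ' and has a ' . lc($details->{PET}) if exists $details->{PET};
    return $line;
}

# Sorting the keys gives the same order on every run.
my @lines = map { describe_person($_, $test_data->{PERSON}{$_}) }
            sort keys %{ $test_data->{PERSON} };
print "$_\n" for @lines;
```

Using exists avoids the "uninitialized value" warning that use warnings would raise if PET were printed for a person who has none.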
XML::Smart (more like XML::Difficult) is another option. The script below uses it to read a single value from test.xml:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Smart;

my $file = 'test.xml';
my $test_data = XML::Smart->new($file);

# Read the CAR element under TEST/PERSON
my $car = $test_data->{TEST}{PERSON}{CAR};
print "CAR: $car\n";
XML::Parser::EasyTree is easier to work with than XML::Parser on its own.
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;
use XML::Parser::EasyTree;

my $file = 'test.xml';

# Skip text nodes that contain only whitespace
$XML::Parser::EasyTree::Noempty = 1;

my $p = XML::Parser->new( Style => 'EasyTree' );
my $tree = $p->parsefile($file);

# TEST -> first PERSON -> first child element: print its name, then its text
print $tree->[0]->{content}->[0]->{content}->[0]->{name} . ": ";
print $tree->[0]->{content}->[0]->{content}->[0]->{content}->[0]->{content} . "\n";
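Chains of ->[0]->{content} like the ones above become hard to read as the XML gets deeper. A minimal sketch of walking the tree recursively follows. To keep it self-contained, the tree is written out by hand in the shape the script above navigates; the type and attrib fields reflect our understanding of the XML::Parser::EasyTree documentation (type 'e' for elements, 't' for text), only Melissa is included, and collect_text is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written tree: element nodes have a name and a content array,
# text nodes have a content string.
my $tree = [
    {
        type => 'e', name => 'TEST', attrib => {},
        content => [
            {
                type => 'e', name => 'PERSON', attrib => { name => 'Melissa' },
                content => [
                    { type => 'e', name => 'PET', attrib => {},
                      content => [ { type => 't', content => 'Cat' } ] },
                    { type => 'e', name => 'AGE', attrib => {},
                      content => [ { type => 't', content => '24' } ] },
                ],
            },
        ],
    },
];

# Hypothetical helper: collect a "path: text" string for every text node.
sub collect_text {
    my ($nodes, $path, $found) = @_;
    for my $node (@$nodes) {
        if ($node->{type} eq 't') {
            push @$found, "$path: $node->{content}";
        }
        elsif ($node->{type} eq 'e') {
            # Descend into the element, extending the path with its name
            collect_text($node->{content}, "$path/$node->{name}", $found);
        }
    }
    return $found;
}

my $found = collect_text($tree, '', []);
print "$_\n" for @$found;
```

The same helper could be given the tree returned by parsefile in place of the hand-written one.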
XML::Mini provides a pure Perl XML parser. Unlike XML::Parser, it does not require any external libraries or modules. The parse subroutine accepts a string of XML (not a filename), and the toHash function builds the XML into a hash structure much like that in XML::Simple.
The program below parses the example test.xml file and we use Data::Dumper to display the output:
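#!/usr/bin/perl
use strict;
use warnings;
use XML::Mini::Document;
use Data::Dumper;

my $file = 'test.xml';

# Read the whole file into one string; parse() expects XML text, not a filename
open(my $fh, '<', $file) or die "Cannot open $file: $!";
my $xml = do { local $/; <$fh> };   # local $/ switches to slurp mode inside this block only
close $fh;

my $xml_doc = XML::Mini::Document->new();
$xml_doc->parse($xml);
my $test_data = $xml_doc->toHash();
print Dumper($test_data);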
The output of this program would be:
$VAR1 = {
  'xml' => {
    'version' => '1.0',
    'encoding' => 'ISO-8859-1'
  },
  'TEST' => {
    'PERSON' => [
      {
        'CAR' => 'Y',
        'AGE' => '24',
        'PET' => 'Cat',
        'name' => 'Melissa'
      },
      {
        'CAR' => 'N',
        'AGE' => '28',
        'name' => 'Thomas'
      }
    ]
  }
};
Note that attributes (e.g. name) are treated the same as the child tags of a node. For example, if we added a child tag called 'name', containing 'Extra Name', to the Melissa PERSON, the output of the above program would be:
$VAR1 = {
  'xml' => {
    'version' => '1.0',
    'encoding' => 'ISO-8859-1'
  },
  'TEST' => {
    'PERSON' => [
      {
        'CAR' => 'Y',
        'AGE' => '24',
        'PET' => 'Cat',
        'name' => [
          'Melissa',
          'Extra Name'
        ]
      },
      {
        'CAR' => 'N',
        'AGE' => '28',
        'name' => 'Thomas'
      }
    ]
  }
};
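Because the same key can hold either a plain string or an array reference, code that reads the hash has to cope with both. A minimal sketch of one way to do that follows. The PERSON list is written out by hand to match the output above so that the sketch runs without XML::Mini, and as_list is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written copy of the PERSON list from the output above.
my $people = [
    { CAR => 'Y', AGE => '24', PET => 'Cat', name => [ 'Melissa', 'Extra Name' ] },
    { CAR => 'N', AGE => '28', name => 'Thomas' },
];

# Hypothetical helper: always return a list, whether the value is a
# plain scalar, an array reference, or missing altogether.
sub as_list {
    my ($value) = @_;
    return ()      unless defined $value;
    return @$value if ref $value eq 'ARRAY';
    return ($value);
}

my @summaries;
for my $person (@$people) {
    my @names = as_list($person->{name});
    push @summaries, join(', ', @names) . " is $person->{AGE}";
}
print "$_\n" for @summaries;
```

Normalising values like this in one place keeps the rest of the script free of ref checks.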
perldoc XML::Simple
perldoc XML::Parser
perldoc XML::Smart
perldoc XML::Parser::EasyTree
perldoc Data::Dumper
perldoc XML::Mini