Parse XML

In this tutorial you will learn how to parse some simple XML with Perl. The example file below, test.xml, is the one we used for all our testing.

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <TEST>
      <PERSON name="Melissa">
        <PET>Cat</PET>
        <AGE>24</AGE>
        <CAR>Y</CAR>
      </PERSON>
      <PERSON name="Thomas">
        <AGE>28</AGE>
        <CAR>N</CAR>
      </PERSON>
    </TEST>

XML::Simple

The Expat library, available from SourceForge, is commonly used to build and parse XML. The Perl module XML::Parser, which is built on Expat, and its related modules are very powerful tools for parsing XML in many different formats. However, because of the power of the module, the output can be difficult to follow. The XML::Simple module provides a simple interface to the output of this, and other, XML modules.

The script below uses XML::Simple to read the XML from test.xml into a simple hash structure. We are using Data::Dumper to show you the output easily.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Simple;
    use Data::Dumper;

    my $file = 'test.xml';

    my $test_data = XMLin($file);

    print Dumper($test_data);

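Using our test.xml, this would produce the following output. By default XML::Simple discards the root element (TEST) and folds elements that have a name attribute into a hash keyed by that attribute (see its KeyAttr option), which is why each PERSON appears under the person's name: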

    $VAR1 = {
              'PERSON' => {
                            'Thomas' => {
                                          'CAR' => 'N',
                                          'AGE' => '28'
                                        },
                            'Melissa' => {
                                           'CAR' => 'Y',
                                           'AGE' => '24',
                                           'PET' => 'Cat'
                                         }
                          }
            };
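The structure above is an ordinary nested hash, so it can be read with plain hash dereferencing. The sketch below is only an illustration: the data is copied by hand from the Dumper output so that no XML module is needed, and describe_person is a hypothetical helper of ours, not part of XML::Simple. It shows one way to cope with optional elements such as PET.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-copied from the Data::Dumper output above, so no XML module is needed here.
my $test_data = {
    PERSON => {
        Thomas  => { CAR => 'N', AGE => '28' },
        Melissa => { CAR => 'Y', AGE => '24', PET => 'Cat' },
    },
};

# describe_person is a hypothetical helper, not part of XML::Simple.
sub describe_person {
    my ($data, $name) = @_;
    my $person = $data->{PERSON}{$name};
    return "$name: not found" unless defined $person;

    # PET is optional in test.xml, so test for it before using it.
    my $pet = exists $person->{PET} ? $person->{PET} : 'none';
    return "$name: age $person->{AGE}, car $person->{CAR}, pet $pet";
}

# Sort the names so the output order is the same on every run.
print describe_person($test_data, $_), "\n" for sort keys %{ $test_data->{PERSON} };
```

Checking with exists avoids an "uninitialized value" warning for Thomas, who has no PET element.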

So if you wanted a script that printed how old everyone was, you could write:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Simple;

    my $file = 'test.xml';

    my $test_data = XMLin($file);

    foreach my $person (keys %{$test_data->{PERSON}}) {
        print $person . ' is ' . $test_data->{PERSON}->{$person}->{AGE} . "\n";
    }

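Which would produce the following (Perl does not guarantee the order of hash keys, so the two lines may appear in either order):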

    Thomas is 28
    Melissa is 24
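Because the people are folded into a hash keyed by name, their order in the file is lost. One possible way to keep it, sketched below, is to switch the folding off with XML::Simple's KeyAttr option and ask for PERSON to always be an array with ForceArray. The XML is inlined as a string here purely so the example runs without test.xml; XMLin accepts a string of XML as well as a filename.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# The same document as test.xml, inlined so the sketch runs on its own.
my $xml = <<'END_XML';
<?xml version="1.0" encoding="ISO-8859-1"?>
<TEST>
  <PERSON name="Melissa">
    <PET>Cat</PET>
    <AGE>24</AGE>
    <CAR>Y</CAR>
  </PERSON>
  <PERSON name="Thomas">
    <AGE>28</AGE>
    <CAR>N</CAR>
  </PERSON>
</TEST>
END_XML

# KeyAttr => []            : do not fold PERSON elements into a hash keyed by name.
# ForceArray => ['PERSON'] : always give an array ref, even for a single PERSON.
my $test_data = XMLin($xml, KeyAttr => [], ForceArray => ['PERSON']);

# PERSON is now an array ref in document order; name is an ordinary key.
foreach my $person (@{ $test_data->{PERSON} }) {
    print $person->{name} . ' is ' . $person->{AGE} . "\n";
}
```

With these options the people should be listed in the order they appear in the file, Melissa first and then Thomas.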

XML::Smart

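More like XML::Difficult. XML::Smart lets you reach into the document with hash-style access, starting from the root element. The script below reads test.xml and prints the CAR element of a PERSON; with several PERSON elements, hash-style access should return the first one.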

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Smart;

    my $file = 'test.xml';

    my $test_data = XML::Smart->new($file);

    # Hash-style access starts at the root element (TEST)
    my $car = $test_data->{TEST}{PERSON}{CAR};
    print "CAR: $car\n";

XML::Parser::EasyTree

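Easier than XML::Parser. The EasyTree style returns a tree of hash references: each element has name, type, attrib and content keys, and each text node carries its text in content. Setting Noempty to 1 drops whitespace-only text nodes. The script below prints the name and text of the first child of the first PERSON, which for test.xml should be PET: Cat.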

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Parser;
    use XML::Parser::EasyTree;

    my $file = 'test.xml';

    $XML::Parser::EasyTree::Noempty = 1;

    my $p = XML::Parser->new(
        Style => 'EasyTree'
    );

    my $tree = $p->parsefile($file);

    print $tree->[0]->{content}->[0]->{content}->[0]->{name} . ": ";
    print $tree->[0]->{content}->[0]->{content}->[0]->{content}->[0]->{content} . "\n";
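Indexing into the tree by position gets unwieldy quickly, so a small recursive walk may be easier to follow. The sketch below is an illustration only: walk_tree is a hypothetical helper of ours, and the tree is written by hand, so the example does not need the module installed. It mirrors the name and content keys used in the script above, while the type and attrib keys are assumed from the EasyTree documentation. The result of parsefile could be passed to walk_tree in place of the hand-built tree.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-built stand-in for part of the tree that parsefile() returns for
# test.xml with Noempty set; the type and attrib keys are assumed from the
# EasyTree documentation rather than shown in the script above.
my $tree = [
    {   type => 'e', name => 'TEST', attrib => {},
        content => [
            {   type => 'e', name => 'PERSON', attrib => { name => 'Melissa' },
                content => [
                    { type => 'e', name => 'PET', attrib => {},
                      content => [ { type => 't', content => 'Cat' } ] },
                    { type => 'e', name => 'AGE', attrib => {},
                      content => [ { type => 't', content => '24' } ] },
                ],
            },
        ],
    },
];

# walk_tree is a hypothetical helper: it returns one indented line per node.
sub walk_tree {
    my ($nodes, $depth) = @_;
    my @lines;
    foreach my $node (@$nodes) {
        my $indent = '  ' x $depth;
        if ($node->{type} eq 'e') {
            # Element: print its name and attributes, then recurse into children.
            my $attrib = $node->{attrib} || {};
            my $attrs  = join ' ', map { $_ . '=' . $attrib->{$_} } sort keys %$attrib;
            push @lines, $indent . $node->{name} . (length $attrs ? " ($attrs)" : '');
            push @lines, walk_tree($node->{content} || [], $depth + 1);
        }
        elsif ($node->{type} eq 't') {
            # Text node: its text is stored directly in content.
            push @lines, $indent . "'" . $node->{content} . "'";
        }
    }
    return @lines;
}

print "$_\n" for walk_tree($tree, 0);
```

Each level of nesting is indented by two spaces, so the output reads as an outline of the document.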

XML::Mini

This module provides a pure Perl XML parser. Unlike XML::Parser, it does not require any external libraries or modules. The parse method accepts a string of XML (not a filename), and the toHash method builds the XML into a hash structure much like that of XML::Simple.

The program below parses the example test.xml file and we use Data::Dumper to display the output:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Mini::Document;
    use Data::Dumper;

    my $file = 'test.xml';

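    # Read the whole file into a string; local $/ enables slurp mode
    # only inside the do block, so it does not need to be restored by hand.
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my $xml = do { local $/; <$fh> };
    close $fh;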

    my $xml_doc = XML::Mini::Document->new();
    $xml_doc->parse($xml);

    my $test_data = $xml_doc->toHash();

    print Dumper($test_data);

The output of this program would be:

    $VAR1 = {
              'xml' => {
                         'version' => '1.0',
                         'encoding' => 'ISO-8859-1'
                       },
              'TEST' => {
                          'PERSON' => [
                                        {
                                          'CAR' => 'Y',
                                          'AGE' => '24',
                                          'PET' => 'Cat',
                                          'name' => 'Melissa'
                                        },
                                        {
                                          'CAR' => 'N',
                                          'AGE' => '28',
                                          'name' => 'Thomas'
                                        }
                                      ]
                        }
            };
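Here PERSON is an array of hashes rather than a hash keyed by name. If the keyed form is more convenient, it can be rebuilt in a few lines. The sketch below is an illustration only: index_by_name is a hypothetical helper of ours, not part of XML::Mini, the data is copied by hand from the output above, and it assumes PERSON is an array reference as shown there.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copied by hand from the TEST part of the Dumper output above.
my $test_data = {
    TEST => {
        PERSON => [
            { CAR => 'Y', AGE => '24', PET => 'Cat', name => 'Melissa' },
            { CAR => 'N', AGE => '28', name => 'Thomas' },
        ],
    },
};

# index_by_name is a hypothetical helper, not part of XML::Mini.
sub index_by_name {
    my ($people) = @_;
    my %by_name;
    foreach my $person (@$people) {
        my %fields = %$person;              # shallow copy, leave the input alone
        my $name   = delete $fields{name};  # the name becomes the hash key
        $by_name{$name} = \%fields;
    }
    return \%by_name;
}

my $people = index_by_name($test_data->{TEST}{PERSON});
print "Thomas is $people->{Thomas}{AGE}\n";
```

If two people shared a name, the later one would overwrite the earlier one, so this only suits data where names are unique.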

Note that attributes (e.g. name) are treated the same as the child tags of a node. For example, if we added a child tag called 'name' to the Melissa PERSON element, the output of the above program would be:

    $VAR1 = {
              'xml' => {
                         'version' => '1.0',
                         'encoding' => 'ISO-8859-1'
                       },
              'TEST' => {
                          'PERSON' => [
                                        {
                                          'CAR' => 'Y',
                                          'AGE' => '24',
                                          'PET' => 'Cat',
                                          'name' => [
                                                      'Melissa',
                                                      'Extra Name'
                                                    ]
                                        },
                                        {
                                          'CAR' => 'N',
                                          'AGE' => '28',
                                          'name' => 'Thomas'
                                        }
                                      ]
                        }
            };
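A consequence of this is that the same key may hold either a plain string or an array reference, depending on the input. One defensive approach, sketched below, is to normalise the value into a list before using it. The helper as_list is hypothetical (our own name, not part of XML::Mini), and the data is reduced by hand to the name keys from the output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# as_list is a hypothetical helper: it turns "scalar or array ref" into a list.
sub as_list {
    my ($value) = @_;
    return ()      unless defined $value;
    return @$value if ref $value eq 'ARRAY';
    return ($value);
}

# The two PERSON entries from the output above, reduced to the name key.
my @people = (
    { name => [ 'Melissa', 'Extra Name' ] },
    { name => 'Thomas' },
);

foreach my $person (@people) {
    my @names = as_list($person->{name});
    print join(', ', @names), "\n";
}
```

The loop then handles both shapes the same way, whether a person has one name value or several.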

See also

    perldoc XML::Simple
    perldoc XML::Parser
    perldoc XML::Smart
    perldoc XML::Parser::EasyTree
    perldoc Data::Dumper
    perldoc XML::Mini
Revision: 1.3