In this tutorial you will learn how to retrieve the source for web pages. The first example covers simply retrieving the page and storing it either in a variable or a file. The second example shows the more complex possibilities available.
This first example uses the very friendly LWP::Simple module. This module allows you to request a url and either store the HTML in a variable, print it, or write it to a file.
In this example we are retrieving the HTML to a variable:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;

    # get() returns undef if the page could not be retrieved
    my $content = get('http://www.perlmeme.org');
    die 'Unable to get page' unless defined $content;

    exit 0;
The LWP::Simple module provides only a functional interface - that is, there is no object oriented interface to use.
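You can also use LWP::Simple to print the web page source directly to STDOUT. The script is the same as the previous example except that we use getprint instead of get. Note that getprint returns the HTTP status code rather than the content, so success is checked with is_success (also exported by LWP::Simple):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;

    # getprint() returns the HTTP status code, which is true even for errors
    my $rc = getprint('http://www.perlmeme.org');
    die "Unable to get page (status $rc)" unless is_success($rc);

    exit 0;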
The third example shows how to get the web page source and write it directly to a file, using LWP::Simple. It uses the getstore function, which writes the web page source directly to the given filename and, like getprint, returns the HTTP status code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;

    # getstore() returns the HTTP status code, which is true even for errors
    my $rc = getstore('http://www.perlmeme.org', 'test.html');
    die "Unable to get page (status $rc)" unless is_success($rc);

    exit 0;
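Because getstore returns an HTTP status code rather than a simple true/false value, it can help to see the check in isolation. The following is a minimal sketch, not a definitive recipe: so that it runs without a network connection it fetches a temporary local file through a file: URL, which is an assumption made purely for illustration (the temporary directory, the file names and the page text are all made up).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;               # exports get, getstore, is_success, is_error
use URI::file;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical stand-in for a web page: a small local file
my $dir    = tempdir(CLEANUP => 1);
my $source = File::Spec->catfile($dir, 'page.html');
open my $out, '>', $source or die "Cannot write $source: $!";
print {$out} "<html><body>Hello</body></html>\n";
close $out or die "Cannot close $source: $!";

my $url  = URI::file->new_abs($source)->as_string;
my $copy = File::Spec->catfile($dir, 'copy.html');

# getstore() returns the HTTP status code (200 on success)
my $rc = getstore($url, $copy);
die "Unable to get page (status $rc)" unless is_success($rc);

# get() returns the content, or undef on failure
my $content = get($url);
die 'Unable to get page' unless defined $content;

# A missing resource yields an error status, which is still a true value,
# so "getstore(...) or die" would not catch it
my $missing_rc = getstore($url . '.missing',
                          File::Spec->catfile($dir, 'none.html'));
print "Missing page status: $missing_rc\n" if is_error($missing_rc);
```

With a real site you would pass an http URL instead of the file: URL; the status checks stay the same.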
If you want to do more with the web page source than store it, you may want to consider using the full object oriented LWP::UserAgent interface. The package Bundle::LWP contains the standard LWP modules that you will need.
Firstly, to start your script:
    #!/usr/bin/perl -w
    use strict;
    use warnings;
    use LWP::UserAgent;
For the Lazy (this is a good thing), you most likely also want to use:
use HTTP::Request::Common qw(POST);
You can import the GET function instead if you do not need POST.
use HTTP::Request::Common qw(GET);
You then need to define your user agent:
my $ua = LWP::UserAgent->new;
This is the object that acts as a browser and makes requests and receives responses.
Next you need to create the request object that will be used to request the URL. Since we are using the HTTP::Request::Common module, we can use the exported POST function. It accepts a URL as its first parameter, followed by an array reference of arguments to be passed to the URL (e.g. form fields).
my $req = POST 'http://www.perlmeme.org', [];
Or passing in form arguments:
my $req = POST 'http://www.perlmeme.org', [name => 'Bob', age => 24];
The GET function is used in a similar way, but takes no form arguments:
my $req = GET 'http://www.perlmeme.org';
You can also pass header data to the GET and POST functions.
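As a rough illustration of passing header data, the sketch below builds one GET and one POST request and prints them without sending anything. The X-Example header and its value are made up for this example; Accept-Language is a standard HTTP header.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(GET POST);

# Extra key/value pairs after the URL become request headers
my $get = GET 'http://www.perlmeme.org',
    'Accept-Language' => 'en';

# For POST, headers follow the array reference of form fields
my $post = POST 'http://www.perlmeme.org',
    [name => 'Bob', age => 24],
    'X-Example' => 'demo';    # hypothetical header, for illustration only

# as_string shows the request line, headers and body that would be sent
print $get->as_string, "\n";
print $post->as_string, "\n";
```

Printing a request with as_string is a convenient way to check what will be sent before handing it to the user agent.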
Once you have defined your request object, use the UserAgent to make the request:
my $res = $ua->request($req);
The request method returns an HTTP::Response object. This object contains the status code of the response, and the content of the page if the request was successful.
You can check if the request was successful by using the is_success method:
    if ($res->is_success) {
        print $res->content;
    } else {
        print $res->status_line . "\n";
    }
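The success check can be wrapped in a small helper so that every request is handled the same way. This is only a sketch under stated assumptions: fetch_or_status is a hypothetical name, and the requests go to temporary local files through file: URLs so that the example runs offline. It also uses decoded_content, which returns the body with any content encoding and character set decoding applied.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(GET);
use URI::file;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: return the page text on success,
# or a message built from the status line on failure
sub fetch_or_status {
    my ($ua, $url) = @_;
    my $res = $ua->request(GET $url);
    return $res->is_success
        ? $res->decoded_content
        : 'Request failed: ' . $res->status_line;
}

# Made-up local file standing in for a web page
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'page.txt');
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "hello from LWP\n";
close $out or die "Cannot close $file: $!";

my $ua  = LWP::UserAgent->new;
my $url = URI::file->new_abs($file)->as_string;

my $good    = fetch_or_status($ua, $url);
my $missing = fetch_or_status($ua, $url . '.missing');

print $good;
print "$missing\n";
```

Returning the status line rather than dying leaves the decision about how to react to a failure with the caller.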
If you want your program to identify itself as a particular agent, for example Mozilla 8.0, you can set this using the agent method:
$ua->agent('Mozilla/8.0');
Or, to identify as Internet Explorer:
$ua->agent('Mozilla/4.0 (compatible; MSIE 5.0; Windows 95)');
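The agent string can also be supplied when the user agent is created, along with other options such as timeout. The sketch below assumes a made-up product name, MyCrawler/1.0; it also shows that a string ending in a space has LWP's default identifier appended to it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Options can be passed to the constructor instead of being set later
my $ua = LWP::UserAgent->new(
    agent   => 'MyCrawler/1.0',   # hypothetical product name
    timeout => 10,                # seconds; LWP's default is 180
);

# Called without arguments, agent() and timeout() return the current values
my $plain = $ua->agent;
print "Agent:   $plain\n";
print 'Timeout: ', $ua->timeout, "\n";

# A trailing space asks LWP to append its own "libwww-perl/x.y" identifier
$ua->agent('MyCrawler/1.0 ');
my $combined = $ua->agent;
print "Agent:   $combined\n";
```

Identifying your own program honestly, rather than imitating a browser, makes it easier for site operators to contact you if your script misbehaves.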
For whatever reason, you may want your requests to be made through a proxy. You can set different proxies for different protocols. Here is an example of setting a proxy for the ftp protocol:
$ua->proxy(ftp => 'http://some.proxy.com');
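To make the per-protocol behaviour concrete, here is a small sketch that sets a proxy for two protocols, excludes some hosts from proxying, and reads the settings back. The proxy.example.com host, its port and the intranet host name are placeholders, not real servers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# One proxy per protocol
$ua->proxy(ftp  => 'http://some.proxy.com');
$ua->proxy(http => 'http://proxy.example.com:8080/');   # hypothetical proxy

# Hosts that should be contacted directly, bypassing any proxy
$ua->no_proxy('localhost', 'intranet.example.com');

# Called with only a scheme, proxy() returns the current setting
for my $scheme (qw(ftp http)) {
    printf "%-4s => %s\n", $scheme, $ua->proxy($scheme);
}
```

Nothing is sent over the network here; the settings only take effect when a request for a matching protocol is made.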
Sometimes you will want your program to store the cookies set by retrieved web pages. The LWP bundle provides an HTTP::Cookies module that will handle cookies for you. You need to load this module:
use HTTP::Cookies;
And then set up a cookie_jar:
    $ua->cookie_jar(
        HTTP::Cookies->new(
            file     => 'mycookies.txt',
            autosave => 1
        )
    );
LWP::UserAgent will now automatically store the cookies in the specified file, and the cookies will be available to future requests.
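One detail worth knowing is that cookies flagged as discardable (typically session cookies sent without an expiry time) are not written to the file unless the jar is created with ignore_discard => 1. The sketch below illustrates this offline by adding two made-up cookies by hand with set_cookie; in normal use LWP fills the jar from the responses it receives. The cookie names, values and the www.example.com domain are assumptions for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Cookies;
use File::Temp qw(tempdir);
use File::Spec;

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'mycookies.txt');

my $jar = HTTP::Cookies->new(file => $file);

# Arguments: version, key, value, path, domain, port, path_spec,
#            secure, maxage, discard
# A persistent cookie that expires in one hour
$jar->set_cookie(0, 'theme', 'dark', '/', 'www.example.com',
                 undef, 1, 0, 3600, 0);
# A session cookie, marked to be discarded
$jar->set_cookie(0, 'session', 'abc123', '/', 'www.example.com',
                 undef, 1, 0, undef, 1);

# Write the jar to its file now; autosave => 1 would do this
# automatically when the jar is destroyed
$jar->save;

# A fresh jar loading the same file sees only the persistent cookie
my $reloaded = HTTP::Cookies->new(file => $file);
print $reloaded->as_string;
```

If session cookies should survive between runs of your script, pass ignore_discard => 1 when creating the jar.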
If you are requesting any URLs using the SSL protocol (for example, an https page) you will first need to install SSL support. Older versions of LWP supported two modules, Crypt::SSLeay and IO::Socket::SSL, with Crypt::SSLeay preferred. In LWP 6 and later, https support comes from the separate LWP::Protocol::https distribution, which uses IO::Socket::SSL. Once SSL support is installed, you can request SSL encrypted URLs just like other URLs.
Below is a working script that requests a url and, if successful, prints the contents to standard out.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request::Common qw(GET);
    use HTTP::Cookies;

    my $ua = LWP::UserAgent->new;

    # Define user agent type
    $ua->agent('Mozilla/8.0');

    # Cookies
    $ua->cookie_jar(
        HTTP::Cookies->new(
            file     => 'mycookies.txt',
            autosave => 1
        )
    );

    # Request object
    my $req = GET 'http://www.perlmeme.org';

    # Make the request
    my $res = $ua->request($req);

    # Check the response
    if ($res->is_success) {
        print $res->content;
    } else {
        print $res->status_line . "\n";
    }

    exit 0;
See also:
    perldoc LWP::Simple
    perldoc lwpcook
    perldoc LWP