Rohit D'souza: 2008

Thursday, May 22, 2008

Coding ftp in perl

So how do we code FTP in Perl? It's easy to get started, so read on.

Let's see what the main methods for FTP are. First of all, one should have the Net::FTP package installed to use FTP in Perl.
Below is a list, with explanations, of the important Net::FTP class methods which will get us started.

CONSTRUCTOR

* new ([ HOST ] [, OPTIONS ])

This is the constructor for a new Net::FTP object. HOST is the name of the remote host to which an FTP connection is required.

HOST is optional. If HOST is not given then it may instead be passed as the Host option described below.

OPTIONS are passed in a hash like fashion, using key and value pairs. Possible options are:

Host - FTP host to connect to. It may be a single scalar, as defined for the PeerAddr option in IO::Socket::INET, or a reference to an array with hosts to try in turn. The "host" method will return the value which was used to connect to the host.

Firewall - The name of a machine which acts as an FTP firewall. This can be overridden by an environment variable FTP_FIREWALL . If specified, and the given host cannot be directly connected to, then the connection is made to the firewall machine and the string @hostname is appended to the login identifier. This kind of setup is also referred to as an ftp proxy.

FirewallType - The type of firewall running on the machine indicated by Firewall. This can be overridden by an environment variable FTP_FIREWALL_TYPE . For a list of permissible types, see the description of ftp_firewall_type in Net::Config.

BlockSize - This is the block size that Net::FTP will use when doing transfers. (defaults to 10240)

Port - The port number to connect to on the remote machine for the FTP connection

Timeout - Set a timeout value (defaults to 120)

Debug - debug level (see the debug method in Net::Cmd)

Passive - If set to a non-zero value then all data transfers will be done using passive mode. If set to zero then data transfers will be done using active mode. If the machine is connected to the Internet directly, both passive and active mode should work equally well. Behind most firewall and NAT configurations passive mode has a better chance of working. However, in some rare firewall configurations, active mode actually works when passive mode doesn't. Some really old FTP servers might not implement passive transfers. If not specified, then the transfer mode is set by the environment variable FTP_PASSIVE or if that one is not set by the settings done by the libnetcfg utility. If none of these apply then passive mode is used.

Hash - If given a reference to a file handle (e.g., \*STDERR ), print hash marks (#) on that filehandle every 1024 bytes. This simply invokes the hash() method for you, so that hash marks are displayed for all transfers. You can, of course, call hash() explicitly whenever you'd like.

LocalAddr - Local address to use for all socket connections, this argument will be passed to IO::Socket::INET

If the constructor fails, undef will be returned and an error message will be in $@.
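
The options above can be sketched as follows. This is only a minimal illustration: the helper names ftp_options and open_ftp, the option values and the host name are assumptions made for the example, not part of Net::FTP.

```perl
use strict;
use warnings;
use Net::FTP;

# Hypothetical helper: keeps the constructor options in one place.
# The values are illustrative, not recommendations.
sub ftp_options {
    return (
        Timeout => 60,    # seconds to wait for the server
        Passive => 1,     # use passive mode for data transfers
        Debug   => 0,     # set to 1 to see the protocol conversation
    );
}

# Hypothetical helper: connect to $host or die with the reason from $@.
sub open_ftp {
    my ($host) = @_;
    my $ftp = Net::FTP->new($host, ftp_options())
        or die "Cannot connect to $host: $@\n";
    return $ftp;
}

# Usage (placeholder host name):
# my $ftp = open_ftp('ftp.example.com');
```

Keeping the options in one subroutine makes it easy to switch between passive and active mode when a firewall gets in the way.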

METHODS

Unless otherwise stated all methods return either a true or false value, with true meaning that the operation was a success. When a method states that it returns a value, failure will be returned as undef or an empty list.

* login ([LOGIN [,PASSWORD [, ACCOUNT] ] ])

Log into the remote FTP server with the given login information. If no arguments are given then the Net::FTP uses the Net::Netrc package to lookup the login information for the connected host. If no information is found then a login of anonymous is used. If no password is given and the login is anonymous then anonymous@ will be used for password.

If the connection is via a firewall then the authorize method will be called with no arguments.
* authorize ( [AUTH [, RESP]])

This is a protocol used by some firewall ftp proxies. It is used to authorise the user to send data out. If both arguments are not specified then authorize uses Net::Netrc to do a lookup.
* site (ARGS)

Send a SITE command to the remote server and wait for a response.

Returns most significant digit of the response code.
* ascii

Transfer file in ASCII. CRLF translation will be done if required
* binary

Transfer file in binary mode. No transformation will be done.

Hint: If both server and client machines use the same line ending for text files, then it will be faster to transfer all files in binary mode.
* rename ( OLDNAME, NEWNAME )

Rename a file on the remote FTP server from OLDNAME to NEWNAME . This is done by sending the RNFR and RNTO commands.
* delete ( FILENAME )

Send a request to the server to delete FILENAME .
* cwd ( [ DIR ] )

Attempt to change directory to the directory given in $dir . If $dir is ".." , the FTP CDUP command is used to attempt to move up one directory. If no directory is given then an attempt is made to change the directory to the root directory.
* cdup ()

Change directory to the parent of the current directory.
* pwd ()

Returns the full pathname of the current directory.
* restart ( WHERE )

Set the byte offset at which to begin the next data transfer. Net::FTP simply records this value and uses it during the next data transfer. For this reason this method will not return an error, but setting it may cause a subsequent data transfer to fail.
* rmdir ( DIR [, RECURSE ])

Remove the directory with the name DIR . If RECURSE is true then rmdir will attempt to delete everything inside the directory.
* mkdir ( DIR [, RECURSE ])

Create a new directory with the name DIR . If RECURSE is true then mkdir will attempt to create all the directories in the given path.

Returns the full pathname to the new directory.
* alloc ( SIZE [, RECORD_SIZE] )

The alloc command allows you to give the ftp server a hint about the size of the file about to be transferred using the ALLO ftp command. Some storage systems use this to make intelligent decisions about how to store the file. The SIZE argument represents the size of the file in bytes. The RECORD_SIZE argument indicates a maximum record or page size for files sent with a record or page structure.

The size of the file will be determined, and sent to the server automatically for normal files so that this method need only be called if you are transferring data from a socket, named pipe, or other stream not associated with a normal file.
* ls ( [ DIR ] )

Get a directory listing of DIR , or the current directory.

In an array context, returns a list of lines returned from the server. In a scalar context, returns a reference to a list.
* dir ( [ DIR ] )

Get a directory listing of DIR , or the current directory in long format.

In an array context, returns a list of lines returned from the server. In a scalar context, returns a reference to a list.
* get ( REMOTE_FILE [, LOCAL_FILE [, WHERE]] )

Get REMOTE_FILE from the server and store locally. LOCAL_FILE may be a filename or a filehandle. If not specified, the file will be stored in the current directory with the same leafname as the remote file.

If WHERE is given then the first WHERE bytes of the file will not be transferred, and the remaining bytes will be appended to the local file if it already exists.

Returns LOCAL_FILE , or the generated local file name if LOCAL_FILE is not given. If an error was encountered undef is returned.
* put ( LOCAL_FILE [, REMOTE_FILE ] )

Put a file on the remote server. LOCAL_FILE may be a name or a filehandle. If LOCAL_FILE is a filehandle then REMOTE_FILE must be specified. If REMOTE_FILE is not specified then the file will be stored in the current directory with the same leafname as LOCAL_FILE .

Returns REMOTE_FILE , or the generated remote filename if REMOTE_FILE is not given.

NOTE: If for some reason the transfer does not complete and an error is returned then the contents that had been transferred will not be removed automatically.
* put_unique ( LOCAL_FILE [, REMOTE_FILE ] )

Same as put but uses the STOU command.

Returns the name of the file on the server.
* append ( LOCAL_FILE [, REMOTE_FILE ] )

Same as put but appends to the file on the remote server.

Returns REMOTE_FILE , or the generated remote filename if REMOTE_FILE is not given.
* unique_name ()

Returns the name of the last file stored on the server using the STOU command.
* mdtm ( FILE )

Returns the modification time of the given file
* size ( FILE )

Returns the size in bytes for the given file as stored on the remote server.

NOTE: The size reported is the size of the stored file on the remote server. If the file is subsequently transferred from the server in ASCII mode and the remote server and local machine have different ideas about "End Of Line" then the size of file on the local machine after transfer may be different.
* supported ( CMD )

Returns TRUE if the remote server supports the given command.
* hash ( [FILEHANDLE_GLOB_REF],[ BYTES_PER_HASH_MARK] )

Called without parameters, or with the first argument false, hash marks are suppressed. If the first argument is true but not a reference to a file handle glob, then \*STDERR is used. The second argument is the number of bytes per hash mark printed, and defaults to 1024. In all cases the return value is a reference to an array of two: the filehandle glob reference and the bytes per hash mark.
* feature ( NAME )

Determine if the server supports the specified feature. The return value is a list of lines the server responded with to describe the options that it supports for the given feature. If the feature is unsupported then the empty list is returned.

if ($ftp->feature( 'MDTM' )) {
# Do something
}

if (grep { /\bTLS\b/ } $ftp->feature('AUTH')) {
# Server supports TLS
}
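
Put together, the methods described so far might be used for a typical download session roughly like this. It is a sketch under assumptions: fetch_file is a hypothetical helper, and the host, credentials, directory and file name are placeholders supplied by the caller.

```perl
use strict;
use warnings;
use Net::FTP;

# Hypothetical helper: download one remote file in binary mode and
# report its size and modification time as seen by the server.
sub fetch_file {
    my ($host, $user, $pass, $dir, $file) = @_;

    my $ftp = Net::FTP->new($host)
        or die "Cannot connect to $host: $@\n";
    $ftp->login($user, $pass) or die "Login failed: ", $ftp->message;
    $ftp->cwd($dir)           or die "cwd failed: ", $ftp->message;
    $ftp->binary              or die "Cannot set binary mode: ", $ftp->message;

    # size and mdtm return undef when the server does not support them.
    my $size  = $ftp->size($file);
    my $mtime = $ftp->mdtm($file);

    # get returns the local file name, or undef on error.
    my $local = $ftp->get($file)
        or die "get failed: ", $ftp->message;

    $ftp->quit;
    return ($local, $size, $mtime);
}

# Usage (placeholder values):
# my ($local, $size, $mtime) =
#     fetch_file('ftp.example.com', 'anonymous', 'me@example.com', '/pub', 'README');
```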

The following methods can return different results depending on how they are called. If the user explicitly calls either of the pasv or port methods then these methods will return a true or false value. If the user does not call either of these methods then the result will be a reference to a Net::FTP::dataconn based object.

* nlst ( [ DIR ] )

Send an NLST command to the server, with an optional parameter.
* list ( [ DIR ] )

Same as nlst but using the LIST command
* retr ( FILE )

Begin the retrieval of a file called FILE from the remote server.
* stor ( FILE )

Tell the server that you wish to store a file. FILE is the name of the new file that should be created.
* stou ( FILE )

Same as stor but using the STOU command. The name of the unique file which was created on the server will be available via the unique_name method after the data connection has been closed.
* appe ( FILE )

Tell the server that we want to append some data to the end of a file called FILE . If this file does not exist then create it.

If for some reason you want to have complete control over the data connection, this includes generating it and calling the response method when required, then the user can use these methods to do so.

However calling these methods only affects the use of the methods above that can return a data connection. They have no effect on methods get , put , put_unique and those that do not require data connections.

* port ( [ PORT ] )

Send a PORT command to the server. If PORT is specified then it is sent to the server. If not, then a listen socket is created and the correct information sent to the server.
* pasv ()

Tell the server to go into passive mode. Returns the text that represents the port on which the server is listening. This text is in a suitable form to send to another ftp server using the port method.

The following methods can be used to transfer files between two remote servers, providing that these two servers can connect directly to each other.

* pasv_xfer ( SRC_FILE, DEST_SERVER [, DEST_FILE ] )

This method will do a file transfer between two remote ftp servers. If DEST_FILE is omitted then the leaf name of SRC_FILE will be used.
* pasv_xfer_unique ( SRC_FILE, DEST_SERVER [, DEST_FILE ] )

Like pasv_xfer but the file is stored on the remote server using the STOU command.
* pasv_wait ( NON_PASV_SERVER )

This method can be used to wait for a transfer to complete between a passive server and a non-passive server. The method should be called on the passive server with the Net::FTP object for the non-passive server passed as an argument.
* abort ()

Abort the current data transfer.
* quit ()

Send the QUIT command to the remote FTP server and close the socket connection.

Methods for the adventurous

Net::FTP inherits from Net::Cmd so methods defined in Net::Cmd may be used to send commands to the remote FTP server.

* quot (CMD [,ARGS])

Send a command, that Net::FTP does not directly support, to the remote server and wait for a response.

Returns most significant digit of the response code.

WARNING This call should only be used on commands that do not require data connections. Misuse of this method can hang the connection.
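
As an illustration, quot could be used to send SYST, one of the commands Net::FTP does not wrap directly. This is a minimal sketch: system_type is a hypothetical helper, and $ftp is assumed to be a connected, logged-in Net::FTP object.

```perl
use strict;
use warnings;
use Net::Cmd;    # exports CMD_OK and the other response-class constants

# Hypothetical helper: ask the server for its system type.
# Returns the reply text, or nothing if the command was not accepted.
sub system_type {
    my ($ftp) = @_;

    # quot returns the most significant digit of the response code.
    my $digit = $ftp->quot('SYST');
    return unless defined $digit && $digit == CMD_OK;

    my $reply = $ftp->message;    # text of the last server response
    chomp $reply;
    return $reply;
}

# Usage:
# print "Server says: ", system_type($ftp), "\n";
```

SYST needs no data connection, so it stays within the limits of the warning above.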

THE dataconn CLASS

Some of the methods defined in Net::FTP return an object which will be derived from this class. The dataconn class itself is derived from the IO::Socket::INET class, so any normal IO operations can be performed. However the following methods are defined in the dataconn class and IO should be performed using these.

* read ( BUFFER, SIZE [, TIMEOUT ] )

Read SIZE bytes of data from the server and place it into BUFFER , also performing any translation necessary. TIMEOUT is optional, if not given, the timeout value from the command connection will be used.

Returns the number of bytes read before any translation.
* write ( BUFFER, SIZE [, TIMEOUT ] )

Write SIZE bytes of data from BUFFER to the server, also performing any translation necessary. TIMEOUT is optional, if not given, the timeout value from the command connection will be used.

Returns the number of bytes written before any translation.
* bytes_read ()

Returns the number of bytes read so far.
* abort ()

Abort the current data transfer.
* close ()

Close the data connection and get a response from the FTP server. Returns true if the connection was closed successfully and the first digit of the response from the server was a '2'.
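
A rough sketch of how retr and the dataconn methods fit together is shown here. slurp_remote is a hypothetical helper, and $ftp is assumed to be a connected, logged-in Net::FTP object on which neither pasv nor port has been called explicitly, so that retr returns a data connection object.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole remote file into memory through
# the data connection returned by retr.
sub slurp_remote {
    my ($ftp, $remote) = @_;

    my $conn = $ftp->retr($remote)
        or die "retr failed: ", $ftp->message;

    my $data = '';
    my $buf  = '';
    while (1) {
        my $n = $conn->read($buf, 4096);    # read up to 4096 bytes
        die "read failed\n" unless defined $n;
        last if $n == 0;                    # end of file
        $data .= $buf;
    }

    # close collects the final server response for the transfer.
    $conn->close or die "transfer did not complete cleanly\n";
    return $data;
}

# Usage:
# my $text = slurp_remote($ftp, 'README');
```

For ordinary downloads the get method is simpler. This lower-level route is mainly useful when the data should be processed as it arrives.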

UNIMPLEMENTED

The following RFC959 commands have not been implemented:

* SMNT

Mount a different file system structure without changing login or accounting information.
* HELP

Ask the server for "helpful information" (that's what the RFC says) on the commands it accepts.
* MODE

Specifies transfer mode (stream, block or compressed) for file to be transferred.
* SYST

Request remote server system identification.
* STAT

Request remote server status.
* STRU

Specifies file structure for file to be transferred.
* REIN

Reinitialize the connection, flushing all I/O and account information.

Now let's look at a simple example.

#!/usr/bin/perl
use strict;
use warnings;
use Net::FTP;    # load the ftp class

my @ERRORS;      # error messages collected for the error display routine

# This is the error printing subroutine.
sub gettheerror {
    print "Errors in your ftp commands are:\n";
    print @ERRORS;
    exit 1;
}

# First we have to define the ftp hostname and the directory to browse.
my $ftphost         = "yourftpsite.com";
my $browsedirectory = "rohitdsouza";

# Connect to the host. On failure new() returns undef and the reason is in $@.
my $ftp = Net::FTP->new($ftphost, Timeout => 350);
unless ($ftp) {
    push @ERRORS, "Can't ftp to $ftphost: $@\n";
    gettheerror();
}
print "Connected to the ftp host $ftphost\n";

# Now log in to the account.
# If no login information is given or found, a login of anonymous is used.
# If no password is given and the login is anonymous, anonymous@ is used as the password.
# If the connection is via a firewall, the authorize method is called with no arguments.
unless ($ftp->login("rohitdsouza", "password")) {
    push @ERRORS, "Can't login to $ftphost: " . $ftp->message;
    $ftp->quit;
    gettheerror();
}
print "Logged in\n";

# Now change the current working directory to the desired directory.
unless ($ftp->cwd($browsedirectory)) {
    push @ERRORS, "Can't cd to $browsedirectory: " . $ftp->message;
    $ftp->quit;
    gettheerror();
}

# Do the file listing by putting the listing lines in an array.
# dir returns an empty list on failure (an empty directory looks the same).
print "Getting file list\n";
my @files = $ftp->dir;
unless (@files) {
    push @ERRORS, "Can't get file list: " . $ftp->message;
    $ftp->quit;
    gettheerror();
}
print "Got file list\n";
foreach (@files) {
    print "$_\n";    # print the files here
}
$ftp->quit;

Monday, May 19, 2008

How to back up and restore MySQL databases

I thought I would write about this topic as there are many developers around who want to take a dump of a MySQL database from the live server to the local testing server.

If they don't know how to do it, they search the web. This article is dedicated to those people.

So let's get started.

The first step is to take a dump of the whole MySQL database using the command below on the CLI:

mysqldump --user [user name] --password=[password] [database name] > [dump file]

Let's take a look at each of the arguments that can be passed to the mysqldump utility, as shown above:

* --user [user name]: The --user flag followed by a valid MySQL username tells MySQL the username of the account that we want to use to perform the database dump. MySQL user accounts are stored in the "user" table of the "mysql" database. You can view a list of users and their permissions for your MySQL server by using the following code at the MySQL console application:

use mysql;

select * from user;
* --password=[password]: The password for the user account mentioned above.
* [database name]: The name of the database that we would like the mysqldump utility to back up. Instead of a single database name, we could pass --databases followed by several database names, or --all-databases to back up every single database on our MySQL server.
* > [dump file]: If you're familiar with DOS and batch files, then you will know that the ">" symbol specifies that we are directing output to a stream, port, or file. For the mysqldump utility, we prepend a ">" to the filename we would like our database to be backed up to. If no path is specified for the file, then it will be created in the current directory.

Let's take an example.

Let's say I want to back up a database named rohitdata on my live server into rohitdata.txt for subsequent import to my local server.

The syntax for this is as follows:

mysqldump --user='rohitdsouza' --password='rohitdsouza' rohitdata > rohitdata.txt

The only catch is that the target database must already exist on the local server before the dump is restored into it. If it doesn't exist, create it first.

For that, use the following at the MySQL console: create database rohitdata;

After we have executed the dump command successfully and have our dump file, it's just a matter of one more command on the CLI of our local server to restore the database:

mysql --user='rohit' --password='rohit' rohitdata < rohitdata.txt

The dump file should be in the current directory if no path is specified.

That's all. Keep rocking, people.
Also check my cool blog: www.rohitdsouza.blogspot.com

Saturday, May 17, 2008

Generating unique id in php

Hey people.

I went to an interview some years back and the interviewer asked me how one generates a uuid in PHP. At that time I had never used the uniqid function, or even known it existed, as I used to generate a uuid myself by combining various unique parameters from a web request. The interviewer then told me he had a good alternative to my code that takes fewer lines. I now use this function often as it suits my purpose.

So let's see how we can generate a uuid in PHP.

<?php
$getmy_uuid = md5(uniqid(rand(), true));
?>

That's all. Isn't that a kewl single line of code? To know what the uniqid parameters are,
read this article: http://in.php.net/manual/en/function.uniqid.php

Regular expressions in php

Hey folks,

Sorry for not posting articles regularly on this blog. Anyway, I am going to take you through a crash course in regular expressions in PHP.

I am really fond of regular expressions as they help me accomplish many tasks, like scraping data from the web and matching patterns in config files on my Linux box.

So let's get started.

The first set of functions one can use for pattern matching is the ereg functions.

The ereg functions require one to specify the regular expression as a string. For example, ereg('regexpression', "subject") checks if regexpression matches the subject string. One should use a single-quoted string when passing a regular expression as a literal string. We do this because several special regex characters, like the dollar and the backslash, are also special characters in double-quoted PHP strings, but not in single-quoted PHP strings.
Remember, friends: double-quoted strings in scripting languages like PHP and Perl interpolate any variables present inside the string.

int ereg (string pattern, string subject [, array groups]) returns the length of the match if the regular expression pattern matches the subject string or part of the subject string, or zero otherwise. Since zero evaluates to False and non-zero evaluates to True, you can use ereg in an if statement to test for a match. If you specify the third parameter, ereg will store the substring matched by the part of the regular expression between the first pair of round brackets in $groups[1]. $groups[2] will contain the second pair, and so on. Note that grouping-only round brackets are not supported by ereg. ereg is case sensitive. eregi is the case insensitive equivalent.
string ereg_replace (string pattern, string replacement, string subject) replaces all matches of the regex pattern in the subject string with the replacement string. You can use backreferences in the replacement string. \\0 is the entire regex match, \\1 is the first backreference, \\2 the second, etc. The highest possible backreference is \\9. ereg_replace is case sensitive. eregi_replace is the case insensitive equivalent.

Example:
<?php
// Single-quoted pattern; the dots are escaped so they only match a literal "."
if (ereg('([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})',
    $_SERVER['REMOTE_ADDR'], $regs))
{
    echo "Last 2 octets of the remote ip are: " . "$regs[3].$regs[4]";
}
else {
    echo "Invalid remote ip " . $_SERVER['REMOTE_ADDR'];
}

$string = "24/6/1982";
// changing the date to mysql date format (year-month-day)
echo ereg_replace('([0-9]+)/([0-9]+)/([0-9]+)',
    '\\3-\\2-\\1', $string);
?>

But people, I advise all developers to use preg_match, as it is faster than ereg. (The ereg functions were later deprecated in PHP 5.3 and removed in PHP 7.)

So now let's look at the preg functions. They are kewl, and their main advantage over the ereg functions is faster execution time.

All of the preg functions require you to specify the regular expression as a string using Perl syntax. In Perl, /regexpression/ defines a regular expression, while in PHP this becomes preg_match('/regexpression/', $subject). Remember that all forward slashes inside the regular expression have to be escaped with a backslash, as the forward slash marks the start and end of the regular expression.
Matching options such as case insensitivity are specified in the same way as in Perl. '/regexpression/i' applies the regexpression case insensitively. '/regexpression/s' makes the dot match all characters, including newlines. '/regexpression/m' makes the start and end of line anchors match at embedded newlines in the subject string. '/regexpression/x' turns on free spacing mode. You can specify multiple letters to turn on several options. '/regexpression/misx' turns on all four options.
A special option is the /u which turns on the unicode matching mode, instead of the default 8-bit matching mode. You should specify /u for regular expressions that use \x{FFFF}, \X or \p{L} to match Unicode characters, graphemes, properties or scripts. PHP will interpret '/regexpression/u' as a UTF-8 string rather than as an ASCII string.
Like the ereg function, int preg_match (string pattern, string subject [, array groups]) can be used directly in an if statement: it returns 1 if the regular expression pattern matches the subject string or part of the subject string, and 0 otherwise. If you specify the third parameter, preg_match will store the substring matched by the first capturing group in $groups[1]. $groups[2] will contain the text matched by the second group, and so on. If the regex pattern uses named capture, you can access the groups by name with $groups['name']. $groups[0] will hold the overall match.
int preg_match_all (string pattern, string subject, array matches, int flags) fills the array "matches" with all the matches of the regular expression pattern in the subject string. If you specify PREG_SET_ORDER as the flag, then $matches[0] is an array containing the match and backreferences of the first match, just like the $groups array filled by preg_match. $matches[1] holds the results for the second match, and so on. If you specify PREG_PATTERN_ORDER, then $matches[0] is an array with full subsequent regex matches, $matches[1] an array with the first backreference of all matches, $matches[2] an array with the second backreference of each match, etc.
array preg_grep (string pattern, array subjects) returns an array that contains all the strings in the array "subjects" that can be matched by the regular expression pattern.
mixed preg_replace (mixed pattern, mixed replacement, mixed subject [, int limit]) returns a string with all matches of the regex pattern in the subject string replaced with the replacement string. At most limit replacements are made. One key difference is that all parameters, except limit, can be arrays instead of strings. In that case, preg_replace does its job multiple times, iterating over the elements in the arrays simultaneously. You can also use strings for some parameters, and arrays for others. Then the function will iterate over the arrays, and use the same strings for each iteration. Using an array of the pattern and replacement, allows you to perform a sequence of search and replace operations on a single subject string. Using an array for the subject string, allows you to perform the same search and replace operation on many subject strings.
preg_replace_callback (mixed pattern, callback replacement, mixed subject [, int limit]) works just like preg_replace, except that the second parameter takes a callback instead of a string or an array of strings. The callback function will be called for each match. The callback should accept a single parameter. This parameter will be an array of strings, with element 0 holding the overall regex match, and the other elements the text matched by capturing groups. This is the same array you'd get from preg_match. The callback function should return the text that the match should be replaced with. Return an empty string to delete the match. Return $groups[0] to skip this match.
array preg_split (string pattern, string subject [, int limit]) works just like split, except that it uses the Perl syntax for the regex pattern.

Let's look at some examples now.

Example:-

<?php
if (preg_match('/([a-zA-Z\s]+) scored (\d+) out of (\d+) in ([a-zA-Z]+)/',
    'Rohit Dsouza scored 97 out of 100 in maths', $matches)) {

    print "The Student named ".$matches[1]." scored ".$matches[2]."/".
        $matches[3]." in ".$matches[4]."\n";

}
else {

    print "match not found";

}

if (preg_match_all('/([a-zA-Z\s]+) scored (\d+) out of (\d+) in ([a-zA-Z]+)/',
    'Rohit Dsouza scored 97 out of 100 in maths John Dsouza scored 95 out of 100 in Physics',
    $matches)) {
    #print_r($matches);
    #print "Count:".count($matches[0]);

    // $matches[0] holds the full matches, so its count is the number of matches found
    for ($c = 0; $c < count($matches[0]); $c++) {
        print "The Student named ".$matches[1][$c]." scored ".$matches[2][$c]."/".$matches[3][$c]." in ".$matches[4][$c]."\n";
    }
}
else {

    print "match not found";

}

print_r(preg_grep('/([a-zA-Z\s]+) scored (\d+) out of (\d+) in ([a-zA-Z]+)/', array('Rohit Dsouza scored 97 out of 100 in maths', 'John Dsouza scored 95 out of 100 in Physics')));

$string = 'rohit dsouza is smart';
print preg_replace('/rohit dsouza/', 'rajeev nair', $string);

$string = 'Rohit Dsouza scored 97 out of 100 in maths john Dsouza scored 93 out of 100 in physics';

echo preg_replace_callback('/([a-zA-Z\s]+) scored (\d+) out of (\d+) in ([a-zA-Z]+)/', 'call_replacement', $string);

// Called once per match; the returned text replaces the whole match
function call_replacement($matches) {

    ###################replace and return code here#####################
    return "$matches[1] is a genius";
    #####################################################################

}
#print $string;
$split_array = preg_split('/\s/', 'rohit dsouza');
print "My First name is ".$split_array[0]." and my Surname is ".$split_array[1];

?>

That's all. If you guys have any queries, do mail me at rajdsouza at yahoo dot com

Wednesday, January 16, 2008

How to start writing a web crawler in Perl

Hey People,

Crawlers are kewl. I always thought writing crawlers was a tough job, until I had to write one. I am posting a small code snippet showing how to get started writing your own crawler in Perl.

The packages which you will require are:
HTML::TreeBuilder
LWP::UserAgent
HTTP::Headers
URI::Escape

Here is how to start up.

If you want to crawl results for the search text "rohit d'souza", then you start with Google as below.

use strict;
use warnings;
use HTML::TreeBuilder;
use LWP::UserAgent;
use HTTP::Request;
use URI::Escape;

my $searchtext = uri_escape("rohit d'souza");
my $ua = LWP::UserAgent->new;    # your user agent
$ua->agent("Mozilla/5.0");
$ua->timeout(30);                # LWP timeouts are in seconds
# Create a request
my $req = HTTP::Request->new(GET => "http://www.google.co.in/search?hl=en&q=$searchtext&btnG=Google+Search&meta=");
my $res = $ua->request($req);

my @links;
if ($res->is_success) {
    my $tree = HTML::TreeBuilder->new_from_content($res->content);
    # In list context look_down returns every <a> element in the page
    my @getlinks = $tree->look_down('_tag' => 'a');

    foreach my $link (@getlinks) {
        my $href = $link->attr('href');
        # Skip missing hrefs, google/orkut links and site-relative links
        if ($href and $href !~ m{google|orkut|^/}i) {
            push(@links, $href);
            #####################################################################
            # Do repeated processing here to crawl other pages from the links extracted
            #####################################################################
        }
    }
    $tree->delete;    # free the parse tree
}
else
{
    print "got failure " . $res->status_line;
}

That's all. You can build a full-fledged crawler using this logic. Mind you, the code above is a simple one. To build a professional crawler, I suggest you build an object-oriented structure and move repeatedly called code snippets into functions.
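
As one possible step in that direction, the link filtering could be moved into a function of its own. This is only a sketch: filter_links and the %seen hash are names invented for the example, and the URLs are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only links that are new, absolute and do
# not point back at the search engine. %$seen remembers every link
# that has already been queued.
sub filter_links {
    my ($seen, @hrefs) = @_;
    my @keep;
    for my $href (@hrefs) {
        next unless defined $href && length $href;
        next if $href =~ m{google|orkut|^/}i;    # same filter as the snippet above
        next if $seen->{$href}++;                # skip links seen before
        push @keep, $href;
    }
    return @keep;
}

my %seen;
my @queue = filter_links(
    \%seen,
    'http://example.com/a',
    '/search?q=x',
    'http://www.google.com/intl',
    'http://example.com/a',
    'http://example.org/b',
);
print "$_\n" for @queue;
# prints http://example.com/a and http://example.org/b, one per line
```

Calling the same function for every page that gets fetched keeps the crawl queue free of duplicates.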

If you have any queries you can mail me at topindiancoder@gmail.com.

One who never fails never grows an inch

Rohit D'souza