Saturday, May 17, 2008

Regular expressions in PHP

Hey folks,

Sorry for not posting articles regularly on this blog. Anyway, I am gonna take you through a kind of crash course in regular expressions in PHP.

I am really fond of regular expressions, as they help me accomplish many tasks, like scraping data from the web or matching patterns in config files on my Linux box.


So let's get started.


The first set of functions one can use for pattern matching are the ereg functions.

The ereg functions for pattern matching require you to specify the regular expression as a string. For example, ereg('regexpression', "subject") checks whether regexpression matches the subject string. Use a single-quoted string when passing a regular expression as a literal string. We do this because several special characters, like the dollar sign and the backslash, are also special characters in double-quoted PHP strings. In single-quoted PHP strings the dollar sign is never special, and the backslash is only special in front of another backslash or a single quote.
Remember, friends: double-quoted strings in scripting languages like PHP and Perl interpolate any variables present inside the string.
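
Here is a small sketch of that quoting difference. The pattern and the variable name $digits are made up for illustration; the snippet only prints the strings, so you can see what the regex engine would actually receive.

```php
<?php
// Hypothetical pattern: a dollar sign followed by digits, e.g. "$100".
$single = '^\$[0-9]+$';   // single quotes: backslash and dollar stay as typed
$double = "^\$[0-9]+$";   // double quotes: PHP turns \$ into a plain $

echo $single, "\n";       // ^\$[0-9]+$
echo $double, "\n";       // ^$[0-9]+$

// Double quotes also interpolate variables.
$digits = '[0-9]';        // illustrative variable name
$interpolated = "^$digits+$";
echo $interpolated, "\n"; // ^[0-9]+$
```

The double-quoted version silently loses its backslash, so the two patterns no longer mean the same thing.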

int ereg (string pattern, string subject [, array groups]) returns the length of the match if the regular expression pattern matches the subject string or part of the subject string (or 1 if you leave out the third parameter or the match is empty), and FALSE otherwise. Since FALSE evaluates to false and non-zero evaluates to true, you can use ereg in an if statement to test for a match. If you specify the third parameter, ereg will store the overall match in $groups[0] and the substring matched by the part of the regular expression between the first pair of round brackets in $groups[1]. $groups[2] will contain the text matched by the second pair, and so on. Note that grouping-only (non-capturing) round brackets are not supported by ereg. ereg is case sensitive. eregi is the case insensitive equivalent.
string ereg_replace (string pattern, string replacement, string subject) replaces all matches of the regex pattern in the subject string with the replacement string. You can use backreferences in the replacement string. \\0 is the entire regex match, \\1 is the first backreference, \\2 the second, etc. The highest possible backreference is \\9. ereg_replace is case sensitive. eregi_replace is the case insensitive equivalent.


Example:
<?php
// The dots are escaped so they only match a literal ".", and the pattern is
// anchored so the whole address has to match.
if (ereg('^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$',
         $_SERVER['REMOTE_ADDR'], $regs))
{
    echo "Last 2 octets of the remote ip are: " . "$regs[3].$regs[4]";
}
else
{
    echo "Invalid remote ip " . $_SERVER['REMOTE_ADDR'];
}

// Changing a date to MySQL date format (year-month-day)
$string = "24/6/1982";
echo ereg_replace('([0-9]+)/([0-9]+)/([0-9]+)', '\\3-\\2-\\1', $string);
?>


But folks, I advise all developers to use preg_match instead, as it is faster than ereg. (The ereg functions were deprecated in PHP 5.3 and removed in PHP 7, so the preg functions are the only option on current PHP versions.)

So now let's look at the preg functions. They are cool, and their main advantage over the ereg functions is faster execution time.

All of the preg functions require you to specify the regular expression as a string using Perl syntax. In Perl, /regexpression/ defines a regular expression; in PHP, this becomes preg_match('/regexpression/', $subject). Remember that all forward slashes inside the regular expression have to be escaped with a backslash, as the forward slash marks the start and end of the regular expression.
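
To make the escaping rule concrete, here is a minimal sketch that matches a date containing slashes. The date value is just sample data. Note that using # as the delimiter and calling preg_quote() are not covered in the text above; they are standard PHP features I am adding as an aside.

```php
<?php
$date = '24/6/1982';

// With / as the delimiter, every literal slash in the pattern is escaped.
$withSlashes = preg_match('/^(\d+)\/(\d+)\/(\d+)$/', $date, $parts);

// PHP also accepts other delimiter characters, such as #,
// which avoids the escaping altogether.
$withHash = preg_match('#^(\d+)/(\d+)/(\d+)$#', $date);

// preg_quote() escapes regex special characters in arbitrary text; the
// second argument names the delimiter so that it gets escaped too.
$quoted = preg_quote('a/b.c', '/');

echo $withSlashes, ' ', $withHash, ' ', $parts[3], ' ', $quoted, "\n";
```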
Matching options such as case insensitivity are specified in the same way as in Perl. '/regexpression/i' applies the regexpression case insensitively. '/regexpression/s' makes the dot match all characters, including newlines. '/regexpression/m' makes the start and end of line anchors match at embedded newlines in the subject string. '/regexpression/x' turns on free spacing mode, in which whitespace and # comments inside the pattern are ignored. You can specify multiple letters to turn on several options. '/regexpression/misx' turns on all four options.
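
A quick sketch of what each option changes, using a made-up two-line subject string. Each call returns 1 or 0, noted in the comment next to it.

```php
<?php
$subject = "Hello\nWorld";   // sample two-line string

$plain    = preg_match('/hello/', $subject);        // 0: case differs
$caseless = preg_match('/hello/i', $subject);       // 1
$dot      = preg_match('/Hello.World/', $subject);  // 0: dot stops at the newline
$dotAll   = preg_match('/Hello.World/s', $subject); // 1
$anchor   = preg_match('/^World$/', $subject);      // 0: ^ only matches at the very start
$multi    = preg_match('/^World$/m', $subject);     // 1

// Free spacing mode: whitespace and # comments in the pattern are ignored.
$free = preg_match('/ hello    # the first line
                      \s world # the second line
                    /ix', $subject);                // 1

echo implode(' ', array($plain, $caseless, $dot, $dotAll, $anchor, $multi, $free)), "\n";
```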
A special option is /u, which turns on Unicode matching mode instead of the default 8-bit matching mode. You should specify /u for regular expressions that use \x{FFFF}, \X or \p{L} to match Unicode characters, graphemes, properties or scripts. With '/regexpression/u', PHP will interpret both the pattern and the subject string as UTF-8 rather than as a sequence of single bytes.
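
Here is a minimal sketch of the difference /u makes. The letter é is written as raw bytes so the example does not depend on how your editor saves the file.

```php
<?php
$e = "\xC3\xA9";  // the letter é encoded as two UTF-8 bytes

$bytes   = preg_match('/^.$/', $e);       // 0: the dot matches one byte, the string has two
$unicode = preg_match('/^.$/u', $e);      // 1: the dot matches one whole character
$letter  = preg_match('/^\p{L}$/u', $e);  // 1: é has the Unicode "letter" property

echo $bytes, ' ', $unicode, ' ', $letter, "\n";
```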
Like the ereg function, int preg_match (string pattern, string subject [, array groups]) tests whether the regular expression pattern matches the subject string or part of the subject string. It returns 1 on a match, 0 when there is no match, and FALSE if an error occurs, so you can use it directly in an if statement. If you specify the third parameter, preg_match will store the substring matched by the first capturing group in $groups[1]. $groups[2] will contain the text matched by the second group, and so on. If the regex pattern uses named capture, you can access the groups by name with $groups['name']. $groups[0] will hold the overall match.
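
Named capture is easy to miss, so here is a small sketch. The group names name and score and the sample sentence are my own choices for illustration.

```php
<?php
// (?P<label>...) gives a capturing group a name as well as a number.
$ok = preg_match('/(?P<name>[a-zA-Z ]+) scored (?P<score>\d+)/',
                 'Rohit Dsouza scored 97 out of 100', $groups);

if ($ok) {
    // The same text is available by name and by position.
    echo $groups['name'], ' -> ', $groups['score'], "\n";
    echo $groups[1], ' -> ', $groups[2], "\n";
}
```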
int preg_match_all (string pattern, string subject, array matches, int flags) fills the array "matches" with all the matches of the regular expression pattern in the subject string and returns the number of full matches. If you specify PREG_SET_ORDER as the flag, then $matches[0] is an array containing the match and backreferences of the first match, just like the $groups array filled by preg_match. $matches[1] holds the results for the second match, and so on. If you specify PREG_PATTERN_ORDER (the default), then $matches[0] is an array with all the full regex matches, $matches[1] an array with the first backreference of each match, $matches[2] an array with the second backreference of each match, etc.
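
The two orderings are easier to see side by side. This is only a sketch with a made-up subject string; the variable names $bySet and $byGroup are mine.

```php
<?php
$subject = 'maths=97 physics=95';   // sample data
$pattern = '/([a-z]+)=(\d+)/';

// One sub-array per match.
$count = preg_match_all($pattern, $subject, $bySet, PREG_SET_ORDER);
// $bySet[0] is array('maths=97', 'maths', '97')
// $bySet[1] is array('physics=95', 'physics', '95')

// One sub-array per capturing group.
preg_match_all($pattern, $subject, $byGroup, PREG_PATTERN_ORDER);
// $byGroup[0] is array('maths=97', 'physics=95')
// $byGroup[1] is array('maths', 'physics')
// $byGroup[2] is array('97', '95')

echo $count, " matches\n";
print_r($bySet);
print_r($byGroup);
```

PREG_SET_ORDER is handy when you loop over matches one by one; PREG_PATTERN_ORDER is handy when you want a whole column, such as all the scores.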
array preg_grep (string pattern, array subjects) returns an array that contains all the strings in the array "subjects" that can be matched by the regular expression pattern.
mixed preg_replace (mixed pattern, mixed replacement, mixed subject [, int limit]) returns a string with all matches of the regex pattern in the subject string replaced with the replacement string. At most limit replacements are made. One key difference from ereg_replace is that all parameters, except limit, can be arrays instead of strings. In that case, preg_replace does its job multiple times, iterating over the elements in the arrays simultaneously. You can also use strings for some parameters, and arrays for others. Then the function will iterate over the arrays, and use the same strings for each iteration. Using arrays for the pattern and replacement allows you to perform a sequence of search and replace operations on a single subject string. Using an array for the subject allows you to perform the same search and replace operation on many subject strings.
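
A short sketch of the array forms and the limit parameter. All the strings here are sample data.

```php
<?php
// Arrays for pattern and replacement: a sequence of replacements on one string.
$sequence = preg_replace(array('/rohit/', '/smart/'),
                         array('rajeev', 'clever'),
                         'rohit is smart');

// Array for the subject: the same replacement applied to each string.
$many = preg_replace('/\d+/', '#', array('a1', 'b22'));

// limit: stop after the first two replacements.
$limited = preg_replace('/a/', 'b', 'aaa', 2);

echo $sequence, "\n";
print_r($many);
echo $limited, "\n";
```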
preg_replace_callback (mixed pattern, callback replacement, mixed subject [, int limit]) works just like preg_replace, except that the second parameter takes a callback instead of a string or an array of strings. The callback function will be called for each match. The callback should accept a single parameter. This parameter will be an array of strings, with element 0 holding the overall regex match, and the other elements the text matched by capturing groups. This is the same array you'd get from preg_match. The callback function should return the text that the match should be replaced with. Return an empty string to delete the match. Return $groups[0] to skip this match.
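
Here is a minimal sketch of a callback that skips some matches by returning the matched text unchanged. The function name double_small and the rule it applies are hypothetical, just to show the idea.

```php
<?php
// Hypothetical callback: doubles numbers below 50 and leaves the rest alone.
function double_small($groups)
{
    if ((int) $groups[0] >= 50) {
        return $groups[0];             // same text back, so this match is skipped
    }
    return (string) ($groups[0] * 2);  // replacement text for this match
}

$result = preg_replace_callback('/\d+/', 'double_small', 'scores: 40 97 12');
echo $result, "\n";  // scores: 80 97 24
```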
array preg_split (string pattern, string subject [, int limit]) works just like split, except that it uses the Perl syntax for the regex pattern.

Let's look at some examples now.

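Example:

<?php
// The pattern has to stay on one line: a line break inside it would have to
// be matched literally in the subject string.
if (preg_match('/([a-zA-Z\s]+) scored (\d+) out of (\d+) in ([a-zA-Z]+)/',
               'Rohit Dsouza scored 97 out of 100 in maths', $matches)) {
    print "The Student named " . $matches[1] . " scored " . $matches[2] . "/" .
          $matches[3] . " in " . $matches[4] . "\n";
}
else {
    print "match not found";
}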

if (preg_match_all('/([a-zA-Z\s]+) scored (\d+) out of (\d+) in ([a-zA-Z]+)/',
                   'Rohit Dsouza scored 97 out of 100 in maths John Dsouza scored 95 out of 100 in Physics',
                   $matches)) {
    #print_r($matches);
    #print "Count:".count($matches[0]);

    // $matches[0] holds the full matches, so its size is the number of matches
    for ($c = 0; $c < count($matches[0]); $c++) {
        print "The Student named " . $matches[1][$c] . " scored " . $matches[2][$c] . "/" . $matches[3][$c] . " in " . $matches[4][$c] . "\n";
    }
}
else {
    print "match not found";
}

print_r(preg_grep('/([a-zA-Z\s]+) scored (\d+) out of (\d+) in ([a-zA-Z]+)/',array('Rohit Dsouza scored 97 out of 100 in maths','John Dsouza scored 95 out of 100 in Physics')));


$string='rohit dsouza is smart';
print preg_replace('/rohit dsouza/','rajeev nair',$string);


$string='Rohit Dsouza scored 97 out of 100 in maths john Dsouza scored 93 out of 100 in physics';

echo preg_replace_callback('/([a-zA-Z\s]+) scored (\d+) out of (\d+) in ([a-zA-Z]+)/', 'call_replacement', $string);

// Called once for each match; $matches[1] holds the student's name
function call_replacement($matches) {
    // build and return the replacement text here
    return "$matches[1] is a genius";
}
#print $string;
$split_array=preg_split('/\s/','rohit dsouza');
print "My First name is ".$split_array[0]." and my Surname is ".$split_array[1];

?>

That's all. If you guys have any queries, do mail me at rajdsouza at yahoo dot com
