May 31

Perl compatible regular expressions (PCRE) tutorial

Why this article?

Regular expressions are often regarded as a mystery one had better stay away from. Many people seem to prefer writing lines and lines of code with simple string functions over learning regular expressions and solving the problem in a single statement. I admit that staring at a regular expression pattern like the one below (and there are worse) for the first time may well scare someone away:

'/"[^"\\\\]*(\\\\.[^"\\\\]*)*"/'

However, regular expressions are such a powerful feature that once you have got used to them, you would probably miss them terribly if they were ever removed from PHP.

PHP supports two different flavors of regular expressions. This article focuses on Perl compatible regular expressions (PCRE for short), because they are more powerful and often said to be faster. Most of the examples are derived from questions asked in a PHP forum, so I hope they do not seem too artificial.

Content:
The PCRE functions and what they are typically used for
Writing regular expressions on your own
Evaluating the replacement argument

The PCRE functions and what they are typically used for

The following is only a brief description of what the different PCRE functions do. For a complete description including some more optional arguments, see the manual.

preg_grep (pattern, array)
This function extracts the array elements that match the given pattern. It takes the array to investigate as an argument and returns an array consisting of the matching elements, indexed with the keys from the input array.

preg_match (pattern, string[, matches])
Although it can take an optional third argument that stores the complete match (and the matches of any sub-patterns enclosed in parentheses) as an array for later use, preg_match() is most commonly used for validation, e.g. to validate user input such as email addresses. It returns 1 if a match is found, 0 otherwise.
The corresponding simple string functions would be strstr() or stristr().

preg_match_all (pattern, string, matches)
Similar to preg_match(), except that it continues searching until the end of the string and stores all matches in a multi-dimensional array. By default, the array element with index 0 holds an array of all complete matches in the order they were found, and the following elements hold arrays with the matches of the sub-patterns, counted from left to right within the pattern. It returns the number of complete matches.

preg_replace (pattern, replacement, string[, limit])
This function replaces all occurrences of the pattern in a given string or array with the replacement argument and returns the modified string or array. The pattern and replacement arguments may be arrays as well. If both are arrays, matches of the first element of the pattern array are replaced with the first element of the replacement array, and so on (this refers to the order in which the elements appear within the array, which is not necessarily the numeric index). The optional fourth argument limits the number of replacements in case you do not want all occurrences to be replaced. The corresponding simple string functions would be str_replace() or strtr().

preg_replace_callback (pattern, function, string)
Similar to preg_replace(), but it takes a function name as second argument and replaces each match with the return value of that function. You will find an example of how to use preg_replace_callback() in the section Evaluating the replacement argument.

preg_split (pattern, string)
With preg_split() you can extract the parts of the investigated string that are delimited by the pattern. It returns an array containing these substrings. The corresponding simple string function would be explode().

preg_quote (string[, delimiter])
This function inserts a backslash in front of every character that has a special meaning in the regular expression syntax. It is useful to modify a string e.g.
from user input in order to use it in a regex pattern, making sure that every character keeps its literal meaning and does not cause unexpected results or errors by being interpreted as a special character. The second argument is an additional character that needs to be escaped, which usually would be the delimiter (see below) you use. A similar simple string function would be addslashes(), which escapes quotes and backslashes.

Pattern syntax

All preg functions except preg_quote() take the pattern as first argument. The pattern consists of delimiters, the actual search pattern inside them, and optional modifiers at the end. A simple and really silly pattern passed e.g. to preg_match() might look like this:

"/apple/i"

This would match, case-insensitively, any string that contains the letters 'a', 'p', 'p', 'l', 'e' in exactly this order, so it does nothing that could not be achieved with simple string functions like stristr().

Delimiters

The delimiters enclose the actual pattern, separating it from the modifiers that may follow. Any non-alphanumeric, non-whitespace character except the backslash may be used as delimiter. Fairly common is the forward slash, as in the above example. Since you need to escape any occurrence of the delimiter within the pattern, it can sometimes be convenient to choose another delimiter that occurs less frequently in the pattern.

Search pattern

Single characters

Any single character matches exactly one occurrence of this character, unless it has a special meaning within the regex pattern. These signs with special meaning, often referred to as meta-characters, are explained below. If you want to find any of them literally, you need to escape it with a backslash.
Especially the backslash itself is a little difficult to handle: since patterns are PHP strings, in which the backslash is an escape character as well, you need to escape it once for the regex machine and then escape each of these two backslashes again within the string (this may not be 100% accurate from a technical point of view, but I found it a good way to visualize what is going on). Therefore, a regex for a Windows-style path might look like this:

"/C:\\\\Windows\\\\Temp/"

Character classes

By enclosing them in square brackets, you can define character classes that hold a set of characters and match one occurrence of any character inside the brackets. You can also use ranges such as [0-9], which is equivalent to [0123456789]. In addition, you can negate a character class by inserting a ^ right after the opening square bracket. This matches one character that is not listed.

Predefined character classes

There are some predefined character classes in PCRE:

. - the dot matches any character except the newline by default
\d - any digit (\D: anything but a digit)
\w - any "word" character, i.e. a letter, a digit or the underscore (\W: anything else)
\s - any whitespace character (\S: anything else)

You can use these predefined character classes in character classes you define yourself, e.g. [a-z\d] if you want something similar to \w, but without the underscore and with lowercase letters only. If you use the dot in a character class, it will lose its special meaning.

Quantifiers

Quantifiers define how often the preceding character or character class may occur. Whenever you want a quantifier to refer to more than one element, you have to group these elements with parentheses.

? - 0 or 1 occurrences
* - 0 or more occurrences
+ - 1 or more occurrences

Additionally, you can specify lower and upper limits directly by enclosing them in curly braces. Examples:

{2} - exactly 2 occurrences
{2,} - at least 2 occurrences
{2,4} - between 2 and 4 occurrences

Alternation

The vertical bar indicates a choice between the parts of the pattern on either side of it.
Example:

"/apple|pear|banana/"

Parentheses

Parentheses are used for two purposes. The first is to group parts of the pattern, either to apply quantifiers to a sequence of characters or character classes, or to limit alternation to a certain part of the pattern. Examples:

"/Hello+/"
"/(Hello)+/"

The first pattern would match "Helloooooo" and the second one "HelloHelloHello".

"/Hi|Hello/"
"/H(i|ello)/"

Here both patterns would match either "Hi" or "Hello", but if we omitted the parentheses in the latter, it would match "Hi" or "ello".

The second use of parentheses is to mark sub-patterns you want to reuse. This is described in the section about backreferences later on.

Assertions

These signs do not consume any characters; they only tie the pattern (or sub-patterns) to specific positions in the investigated string, like an anchor.

^ and $ - These signs mark the beginning and end of the string (or of a line if used with the m-modifier).

You may wonder whether it is necessary to escape the dollar sign when using variables in your regex pattern, e.g. like this:

"/\$var/"

The answer is no, you actually should not. This is a two-step process: before the pattern is passed to the regex compiler, the PHP parser evaluates the variable. Escaping the dollar sign would have the same effect as enclosing the pattern in single quotes: the variable would not be evaluated. The regex machine would therefore receive the pattern as it is and interpret it as "end of the string, followed by 'var'".

\b - Word boundary (unless within a character class, where it stands for a backspace character)

Unlike some other languages, PHP does not distinguish between word boundaries at the beginning and at the end of a word. A word boundary is simply the position where a "non-word character", be it a space, a comma or a special character, stands next to a "word character" or vice versa. Caution: If you develop for a non-English site, problems may occur with accented or other non-ASCII letters
being interpreted as "non-word characters".

There are a few more assertion characters that are less frequently used: \B for "not a word boundary", \A for the beginning of the string, and \z and \Z for the end of the string (these make sense in combination with the m-modifier and ^ and $, in case you need additional assertion characters that do not match at the beginning and end of each line).

Modifiers

The most common modifiers are the following:

i
Matches case-insensitively. Instead of our first example, you could just as well write either of

"/[aA][pP][pP][lL][eE]/"
"/(a|A)(p|P)(p|P)(l|L)(e|E)/"

but this does not really increase readability.

s
By default the dot matches all characters except a newline character. If the s-modifier is set, it matches the newline as well.

m
When the m-modifier is set, ^ and $ match at the beginning and end of each line in a multiline string. As said before, they are just anchors and do not consume anything, so "/$^/" would not match at a line break even with the m-modifier set, since the newline character in between would have to be consumed.

U
Matches "ungreedy", i.e. each sub-pattern consumes as little as possible to make the whole pattern match. This applies to all sub-patterns. Another way to match ungreedy is to insert a question mark after the quantifier (see the section Ungreedy matching below for an example). Since this only applies to the preceding sub-pattern, it allows more fine-grained control of greedy and ungreedy matching.

e
The e-modifier makes preg_replace() evaluate the replacement argument as PHP code and replace the pattern with the result. Since it does not apply to the other PCRE functions, it is deferred to a later section.

x
The x-modifier allows you to extend a pattern over several lines, arrange it in a pleasant way, and comment it. Our discouraging first example rewritten:

'/
" # doublequote
[^"\\\\]* # optional sequence of anything except doublequote or backslash
# (i.e. "normal" content of a text string)
( # start of sub-pattern
\\\\. # a backslash followed by any character
# (i.e. escape sequence)
[^"\\\\]* # followed by optional "normal" content again
)* # the sub-pattern is optional itself, but may repeat
" # closing doublequote
/x'
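To make the effect of the modifiers described above a little more tangible, here is a minimal sketch that runs the same kind of pattern with and without a modifier. The sample text and the variable names are made up for illustration and do not come from the original examples.

```php
<?php
// Made-up sample text with three lines.
$text = "Apple pie\napple sauce\nBanana";

// i: ignore case, so both "Apple" and "apple" are found.
$countI = preg_match_all('/apple/i', $text);

// m: ^ anchors at the start of every line instead of only the start of the string.
$countM = preg_match_all('/^apple/im', $text);
$countNoM = preg_match_all('/^apple/i', $text);

// s: lets the dot match the newline between "pie" and "apple".
$withS = preg_match('/pie.apple/s', $text);
$withoutS = preg_match('/pie.apple/', $text);

// U: makes quantifiers ungreedy, so .* stops at the first possible "e".
preg_match('/a.*e/U', 'apple pie', $ungreedy);
preg_match('/a.*e/', 'apple pie', $greedy);

printf("i: %d, m: %d vs %d, s: %d vs %d\n", $countI, $countM, $countNoM, $withS, $withoutS);
printf("U: '%s', greedy: '%s'\n", $ungreedy[0], $greedy[0]);
```

Comparing each pair of results is usually the quickest way to check whether a modifier does what you expect on your own data.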
Writing regular expressions on your own

With what we have covered up to now, you should be able to work through most regular expressions you come across, sign by sign, to understand what they are doing. When you start to write your own, it is often helpful to break the task down into pieces. Let's say we wanted to validate an email address (as a user enters it in a form, not the complete specification including alias and so on). So for a start we define:

1. a sequence of one or more letters, digits, hyphens, underscores or dots at the beginning
2. the @ sign
3. a sequence of one or more letters, digits or hyphens
4. a dot
5. a sequence of two to six letters at the end

In a regex pattern these parts could be expressed as:

^[-_.a-zA-Z0-9]+
@
[-a-zA-Z0-9]+
\.
[a-zA-Z]{2,6}$
Putting this together, we get:

"/^[-_.a-zA-Z0-9]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,6}$/"

Or slightly shorter (but possibly slightly less efficient, too) if we decide to match case-insensitively:

"/^[-_.a-z0-9]+@[-a-z0-9]+\.[a-z]{2,6}$/i"

But with this, we would reject valid addresses, because we have not yet considered subdomains or additional suffixes as in 'someone@domain.co.uk'. Therefore, we break down part 3, regarding a dot as "special" since it separates subdomains and domain, which are the "normal" parts. The pattern would be: "normal", optionally followed by "special" and "normal" again, where the optional part may repeat. Or, as our new part 3:

[-a-z0-9]+(\.[-a-z0-9]+)*

Now our pattern looks like this:

"/^[-_.a-z0-9]+@[-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]{2,6}$/i"

Treating the first part the same way, to reject email addresses with a leading dot or a sequence of dots, our pattern becomes:

"/^[-_a-z0-9]+(\.[-_a-z0-9]+)*@[-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]{2,6}$/i"
<?php
// we assume the email address a user submitted has been extracted from the request variables and assigned to $email
// since capitalization does not matter anyway, we may decide to set it to lowercase right away
$email = strtolower($email);
if (!preg_match("/^[-_a-z0-9]+(\.[-_a-z0-9]+)*@[-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]{2,6}$/", $email)) {
    echo "Sorry, invalid email address.";
} else {
    echo ":)";
}
?>
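To see how the final pattern behaves, it can help to run it against a handful of addresses. The following is only a sketch: the helper name looks_like_email() and the sample addresses are made up for illustration, while the pattern is the one developed above.

```php
<?php
// Hypothetical helper wrapping the pattern from the text.
function looks_like_email(string $email): bool
{
    $pattern = '/^[-_a-z0-9]+(\.[-_a-z0-9]+)*@[-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]{2,6}$/i';
    return preg_match($pattern, $email) === 1;
}

// Made-up test addresses: two acceptable ones and three that should be rejected.
$samples = [
    'someone@domain.co.uk',
    'Jane.Doe@example.com',
    '.leading@example.com',
    'a..b@example.com',
    'no-at-sign.example.com',
];

foreach ($samples as $sample) {
    echo $sample, ' => ', looks_like_email($sample) ? 'ok' : 'rejected', "\n";
}
```

Keep in mind that this only checks the shape of the address as defined in the text; it says nothing about whether the address actually exists.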
Another example might be extracting strings from a file that contains PHP code, or from a CSV file. We will restrict this to doublequoted strings. The first, most simplified definition of a string could be

1. a doublequote
2. any number of characters that are not a doublequote
3. a doublequote

or (note that the pattern is enclosed in singlequotes to save some escaping):

'/"[^"]*"/'

Now we need to take care of escaped quotes, which are perfectly legal in a string. Again, the second part can be broken down into a "normal" part (not a doublequote) and a "special" part (a backslash followed by a doublequote). Both the "normal" part and the "special"-"normal" sequence are optional. First try:

'/"[^"]*(\\\\"[^"]*)*"/'

This will not work yet: the backslash that escapes a doublequote is caught within the first "normal" part, leaving the doublequote out there naked, and since the "special"-"normal" sequence is optional while the doublequote at the end is not, that doublequote is interpreted as the closing one again. Therefore, we disallow both backslashes and quotes in the "normal" part.

'/"[^"\\\\]*(\\\\"[^"\\\\]*)*"/'

With what we have now, a string like "She said: \"Hello!\"" would be matched correctly, but strings that contain backslashes not followed by a doublequote, like "item 1\nitem2" or "C:\\Windows", would not be found. Thus we need to make the "special" part more general, defining it as a backslash followed by any character:

'/"[^"\\\\]*(\\\\.[^"\\\\]*)*"/'
<?php
// we assume the file has been opened and read, and its content has been assigned to $content
preg_match_all('/"[^"\\\\]*(\\\\.[^"\\\\]*)*"/', $content, $matches);
// $matches[0] holds the complete matches, i.e. the strings including their quotes
foreach ($matches[0] as $string) {
    echo $string."<br />";
}
?>
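As a quick check of the reasoning above, here is a sketch that applies the final pattern to a made-up piece of content containing an escaped quote, an escaped backslash and an empty string. The content of $content is an assumption for illustration only.

```php
<?php
// Made-up content standing in for the file; single quotes keep \" as backslash + quote.
$content = 'echo "She said: \"Hello!\""; $path = "C:\\\\Windows"; $empty = "";';

// The pattern from the text: "normal" content, then any number of
// "escape sequence followed by normal content" groups.
preg_match_all('/"[^"\\\\]*(\\\\.[^"\\\\]*)*"/', $content, $matches);

foreach ($matches[0] as $string) {
    echo $string, "\n";
}
```

Each of the three doublequoted strings should be reported as one complete match, including the one that contains escaped quotes.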
Backreferences

Backreferences come in handy when you need to reuse parts of the match, either in the pattern itself or in the replacement argument passed to preg_replace(). Within the pattern, \\1 is the syntax to reference the first sub-pattern in parentheses, \\2 the second and so on (counting opening parentheses from left to right). In the replacement argument, you can either use the same syntax or $1, $2 and so on, which is recommended. In addition, $0 holds the complete match.

In the pattern itself, you would need backreferences e.g. to find corresponding html tags. There may be tags without attributes, like <h1>, or with attributes, like <font face=...>, so we need to capture the first alphanumeric sequence after the opening angular bracket in order to reuse it in the closing tag.

<?php
// if tags are nested, this would always catch the innermost
preg_match_all("%<([a-z0-9]+)[^>]*>[^<]*</\\1>%i", $text, $matches);
?>
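A small sketch of what this pattern finds, using a made-up $text with two properly closed tags and one mismatched pair:

```php
<?php
// Made-up markup: <b>...</i> is deliberately mismatched.
$text = '<h1>Title</h1> <font face="Arial">styled</font> <b>broken</i>';

// \\1 inside a single-quoted string reaches the regex machine as \1,
// i.e. "whatever the first sub-pattern matched".
preg_match_all('%<([a-z0-9]+)[^>]*>[^<]*</\\1>%i', $text, $matches);

print_r($matches[0]); // complete matches
print_r($matches[1]); // tag names captured by the first sub-pattern
```

The mismatched pair is skipped because the closing tag name has to be identical to what the first sub-pattern captured.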
Backreferences can be useful in the replacement argument as well. A popular use is highlighting certain words in a text without changing their original capitalization. Imagine the text in question contained the word repeatedly, but capitalized differently, like this:

<?php
$text = "something TEST A test. Test something else tEsT.";
$keyword = "test";
?>

It is no problem to search case-insensitively, but if we used "test" or "TEST" in the replacement argument, we would change the original capitalization. To avoid this, we use a backreference instead.

<?php
// $0 inserts the complete match, so the original capitalization is preserved
echo preg_replace("/$keyword/i", "<b>$0</b>", $text);
?>
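If the keyword comes from user input, it may contain meta-characters. The following sketch combines the highlighting example with preg_quote() as described in the function overview; adding preg_quote() here is my own assumption about how one might harden the example, not part of the original code.

```php
<?php
$text = "something TEST A test. Test something else tEsT.";
$keyword = "test";

// preg_quote() escapes meta-characters in the keyword; the second argument
// additionally escapes the delimiter we use, here the forward slash.
$pattern = '/' . preg_quote($keyword, '/') . '/i';

// $0 is the complete match, so every occurrence keeps its own capitalization.
$highlighted = preg_replace($pattern, '<b>$0</b>', $text);
echo $highlighted;
```

Single quotes around the replacement make sure PHP itself never tries to interpret $0.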
Sometimes it is helpful to mark parentheses as "grouping only" if you do not need to reuse this part. To do this, insert a question mark and a colon right after the opening parenthesis:

"/H(?:i|ello)/"

Ungreedy matching

By default, quantifiers behave greedily, consuming as much of the string as possible. Example:

<?php
$str = "A really <b>important word</b> in a text that contains more <b>important stuff</b>";
// $1 is the content between the outermost <b> and </b> that the greedy .* captured
echo preg_replace("#<b>(.*)</b>#si", "<i>$1</i>", $str);
?>
The above code would match everything from the first opening to the last closing tag, thus changing the text to "A really <i>important word</b> in a text that contains more <b>important stuff</i>", which is probably not what we wanted. This behavior can be changed with the U-modifier or an additional question mark following the quantifier, as explained in the syntax section.

Caution: If you apply the pattern below to a non-empty string, it matches only the first character, as expected.

"/^.+?/s"

But you might expect the next pattern to match the last character only:

"/.+?$/s"

This is not the case; it still matches the complete string. This is because matching always starts at the leftmost position in the string and continues until the last requirement of the pattern, in this case the end of the string, is fulfilled. The beginning of the match would only be moved to the second character from the left if there was no match at the current position.

Is alternation greedy?

No, alternation is neither greedy nor ungreedy; the regex machine simply works its way through the branches of the alternation from left to right until it finds a match or the matching fails. Thus, "/http|https/" would never match the complete 'https', because it is already satisfied with the 'http' within 'https'. This would only be different if the pattern continued with more requirements, e.g.

"#(http|https)://#"

Again, 'http' would be found first, but then the regex machine would compare the 's' in 'https' against the colon in the pattern and, since this fails, try the other branch of the alternation. The best way in both examples would probably be to write the 'http' sequence literally, followed by an optional 's', i.e. "/https?/".

Lookaheads

Lookaheads are special regex constructs that check whether the following characters meet certain requirements without actually including them in the match. The syntax for positive and negative lookaheads is

(?=...)
(?!...)
Lookaheads can often be useful, e.g. if we set out to highlight certain words, but only if they are not within html tags. A simple approach to find something unless it is within an html tag might be: we assume that if our keyword is followed by an opening angular bracket without a closing angular bracket in between, it is not within a tag. This should be true for all keywords except 'html' in a well-formed html document. A first try to modify the highlighting example:

<?php
$text = "<body>TEXT text <img src=\"text.gif\"> Text</body>";
$keyword = "text";
echo preg_replace("/($keyword)([^>]*<)/i", "<B>$1</B>$2", $text);
?>
When running this, you will find that it fails to mark the second occurrence of the keyword in bold. This is because it was consumed by the sub-pattern [^>]* of the first match. Using a positive lookahead eliminates the problem:

<?php
echo preg_replace("/($keyword)(?=[^>]*<)/i", "<B>$0</B>", $text);
?>
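To compare both attempts directly, this sketch runs them on the same sample text and prints the results one below the other. The variable names $consuming and $lookahead are made up for illustration.

```php
<?php
$text = "<body>TEXT text <img src=\"text.gif\"> Text</body>";
$keyword = "text";

// Consuming version: the second keyword is swallowed by [^>]* of the first match.
$consuming = preg_replace("/($keyword)([^>]*<)/i", "<B>$1</B>$2", $text);

// Lookahead version: the check does not consume anything, so the
// following characters stay available for later matches.
$lookahead = preg_replace("/($keyword)(?=[^>]*<)/i", "<B>$0</B>", $text);

echo $consuming, "\n", $lookahead, "\n";
```

In both results the keyword inside the img tag stays untouched; only the lookahead version marks all three occurrences outside of tags.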
An example of a negative lookahead might be finding relative links in an html document. We assume well-formed html again to keep things simple, but the pattern could easily be modified to allow singlequotes as well, or additional spaces.

"%href=\"(?!https?://|ftp://|mailto:|news:|javascript:|#)([^\"]+)%i"

Lookbehinds

There is a syntax for lookbehinds as well, with a less-than sign inserted after the question mark:

(?<=...)
(?<!...)

This can be useful e.g. with preg_split(). Imagine you wanted to split a text into sentences, and you define a sentence as something followed by a punctuation mark, followed by whitespace at which you would like to split the string. The punctuation mark is required, but it belongs to the sentence and therefore should not be consumed by the pattern. Lookaheads will not get us far here, since a pattern like

'/(?=[.?!])\s+/'

can never match, and when we turn it the other way round, we split at the punctuation mark and not at the space as intended:

'/[.?!](?=\s+)/'

But with lookbehinds we finally achieve what we wanted:

<?php
// double quotes, because the text itself contains an apostrophe
$text = "This is a sentence. Is there more to come? I don't think so!";
$sentences = preg_split('/(?<=[.?!])\s+/', $text);
var_dump($sentences);
?>
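The three variants discussed above can be compared side by side. This is only a sketch; the variable names $never, $lossy and $sentences are chosen for illustration.

```php
<?php
$text = "This is a sentence. Is there more to come? I don't think so!";

// Lookahead in front of \s+: the next character would have to be a punctuation
// mark and whitespace at the same time, so nothing is ever split.
$never = preg_split('/(?=[.?!])\s+/', $text);

// Punctuation mark consumed, whitespace only looked at: splits, but the marks are lost.
$lossy = preg_split('/[.?!](?=\s+)/', $text);

// Lookbehind: splits at the whitespace and keeps the punctuation marks.
$sentences = preg_split('/(?<=[.?!])\s+/', $text);

print_r($never);
print_r($lossy);
print_r($sentences);
```

Only the lookbehind version returns the three sentences with their punctuation marks and without leading spaces.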
Keep in mind though that tasks which seem to require a "lookbehind" can sometimes be expressed with negative lookaheads just as well. Let's look at a simplified example with a list of elements separated by a semicolon and a space. Obviously, elements beginning with group= categorize the elements that follow.

<?php
$string = "group=fruits; apple; banana; group=music; jazz; pop; rock; folk; group=numbers; one; two; tree;";
$keyword = "rock";
?>

Now we would like to look up the group that precedes a given keyword. In other words, we want

1. the text 'group=', followed by the group name (which we capture), a semicolon and a space
2. any number of elements that do not begin with 'group='
3. the keyword

Translated into a regular expression this would be:

<?php
preg_match("/group=([^;]+);\s((?!group=)([^;]*;\s))*$keyword/i", $string, $match);
echo "Group of ".$keyword." is ".$match[1].".";
?>
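Wrapped into a function, the lookup can be tried with several keywords. This is a sketch under a few assumptions: the function name find_group() is made up, the inner parentheses are turned into "grouping only" ones, and preg_quote() is added to protect the keyword.

```php
<?php
// Hypothetical helper: returns the group name preceding $keyword, or null if not found.
function find_group(string $list, string $keyword): ?string
{
    $pattern = '/group=([^;]+);\s'            // group name, captured as $match[1]
             . '(?:(?!group=)[^;]*;\s)*'      // any number of elements not starting with group=
             . preg_quote($keyword, '/')      // the keyword itself
             . '/i';
    return preg_match($pattern, $list, $match) === 1 ? $match[1] : null;
}

$string = "group=fruits; apple; banana; group=music; jazz; pop; rock; folk; group=numbers; one; two; tree;";

foreach (['rock', 'apple', 'two', 'piano'] as $keyword) {
    $group = find_group($string, $keyword);
    echo $keyword, ': ', $group === null ? '(not found)' : $group, "\n";
}
```

The negative lookahead is what keeps the match from running across a later group= element into the wrong group.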
Evaluating the replacement argument

Say we would like to automatically enclose URLs in <a href=...> </a> tags unless they are either within html tags or already surrounded by <a href="http://..."> </a>. We have already used lookaheads to find something that is not within html tags, but adding that new requirement to the previous example would be fairly complicated, so we are going to take a look at a different technique. First we write a pattern for each unwanted case and one for the general case (note that the first one only matches if the URL is directly within the tags without any spaces; change this if you like):

<a\s[^>]+>http://\S+</a>
<[^>]+http://[^>]+>
http://\S+

To keep things simple, we assume the URL to be anything up to the next space (which often may be incorrect, but that is not what we are looking at right now). Then we combine them in an alternation, capturing both unwanted cases within parentheses, and append the e-modifier to the pattern. Since both unwanted cases would start matching at the same position in the string if they encounter the text '<a href="http://...', we need to place the one that grabs more first in order to simulate greedy behavior:

"#(<a\s[^>]+>http://\S+</a>)|(<[^>]+http://[^>]+>)|http://\S+#ie"

What actually does the trick happens in the replacement argument. The logic is: if the complete match equals one of the unwanted matches, replace it with itself (i.e. do nothing), else add the <a href=...> </a> tags around it.

<?php
$text = "http://www.domain.com this was the first url.\n";
$text .= "However there is more to come <img src=\"http://www.domain.com/pic.gif\">\n";
$text .= "image path is http://www.domain.com/pic.gif and here comes one enclosed in tags: <a href=\"http://www.domain.org\">http://www.domain.com</a>";
// note: the e-modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so this call only works on older versions
echo preg_replace("#(<a\s[^>]+>http://\S+</a>)|(<[^>]+http://[^>]+>)|http://\S+#ie",
    '"$0"=="$1" || "$0"=="$2" ? "$0" : "<a href=\"$0\">$0</a>"',
    $text);
?>
If you do not feel comfortable with the ternary operator, you can use preg_replace_callback() to have a function do the evaluation. It receives the function name as second argument. You do not need to pass any arguments yourself; each match is automatically passed to the function as an array holding the complete match and the matches of the sub-patterns.

<?php
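function check_url($matches)
{
    // sub-patterns that did not take part in the match may be missing from $matches,
    // so fall back to an empty string before comparing
    if ($matches[0] == ($matches[1] ?? '') || $matches[0] == ($matches[2] ?? '')) {
        // one of the unwanted cases: return the match unchanged
        return $matches[0];
    } else {
        return '<a href="'.$matches[0].'">'.$matches[0].'</a>';
    }
}
echo preg_replace_callback('%(<a\s[^>]+>http://\S+</a>)|(<[^>]+http://[^>]+>)|http://\S+%i',
    'check_url', $text);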
?>
Though you can do a lot of things with just one call to a PCRE function, it is sometimes easier to split the task into two. Imagine you wanted to convert a sequence of lines beginning with a hyphen and a space into the proper html list format. Example string:

<?php
$str = "some text and a list
- apples
- bananas
some more text and the second list:
- green
- red
- yellow
- purple
and more text...
But what I always wanted to tell you ...
- oh no, I forgot!";
?>

A pattern to identify the complete lists is not too hard to write, and it is easy to wrap <ul> and </ul> around them, too. However, it is difficult to reference each single line at the point where we would like to put it into <li> </li> tags. We can solve this with preg_replace_callback(). The pattern identifies the lists and passes each of them to a function that uses preg_replace() to modify the single lines within the match. This would even allow us to set up an additional requirement that a list must consist of at least two lines.

<?php
function format_list($matches)
{
return "<ul>\n".preg_replace("/^-\s(.*)/m", "<li>$1</li>", $matches[0])."\n</ul>";
}
echo preg_replace_callback("/(^-\s.*$[\r\n]*){2,}/m", 'format_list', $str);
?>
But in case we skip that extra requirement, two calls to preg_replace() would do it as well: <?php
$str = preg_replace("/(^-\s.*$[\r\n]*)+/m", "<ul>\n$0\n</ul>", $str);
$str = preg_replace("/^-\s(.*)/m", "<li>$1</li>", $str);
echo $str;
?>
When using preg_replace_callback(), you can alternatively create the function on the fly. The advantage is that you do not "waste" function names, but I would usually do this only if the function is rather simple (and not worth reusing for other purposes, of course). Here is an example that replaces matches with a text followed by a sequential number.

<?php
$input = '<img src="/path/img.gif">text text text <img name="test" src="http://www.domain.com/image.jpg" alt="test" /> text';
// note: create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so this only runs on older versions
$output = preg_replace_callback("/<img[^>]*>/i", create_function(
    '$matches',
    'static $counter = 0;
    $counter++;
    return "Image ".$counter;'
), $input);
echo $output;
?>
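On current PHP versions the same idea can be written with an anonymous function instead of create_function(). The following is a sketch of that variant, not the original author's code; the sample input is the one from the example above.

```php
<?php
$input = '<img src="/path/img.gif">text text text <img name="test" src="http://www.domain.com/image.jpg" alt="test" /> text';

// The anonymous function is created on the fly and passed directly as the callback.
// The static variable keeps its value between calls, so the counter keeps running.
$output = preg_replace_callback('/<img[^>]*>/i', function (array $matches): string {
    static $counter = 0;
    $counter++;
    return 'Image ' . $counter;
}, $input);

echo $output;
```

As with create_function(), no function name is "wasted", and the body is ordinary PHP code rather than a string.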
Troubleshooting

Compilation failure

This is often caused either by missing delimiters or by unescaped meta-characters. Choose a delimiter that is not contained in the pattern if possible, and escape it where it occurs within the pattern. Carefully escape all other meta-characters you want to match literally. Too much escaping should not hurt in most situations, though it does not increase readability. Also count unescaped parentheses, square brackets and curly braces to see whether opening and closing elements match.

No matches

If your pattern fails to match anything at all, you need to identify the point where it fails. A simple but effective way to do this is to test parts of it with preg_match_all() and print the array with the matches:

<?php
preg_match_all("$pattern", $str, $matches);
print_r($matches);
?>
Replace $pattern with the first element in your pattern. Run that and check whether it matches what you think it should. Then add the next element and so on. Of course you do not need to do this element by element; you can always use a group of elements to save time, and cut it down further only if it fails to match.

Also check the modifiers. If your pattern contains literal sequences or letters in character classes, make sure you have the i-modifier set if capitalization may differ in the investigated string. If you apply a pattern with the ^ or $ assertion characters to a multiline string, check whether you need to set the m-modifier to make your regex work. If you use the dot as a wildcard for any character, remember that it needs the s-modifier in order to match newlines.

In some situations it may be a good idea to apply your pattern to some dummy data. If it works there, but fails with data from another source, there must be a difference in the data at the point where the regex machine receives them, be it invisible characters or html entities like &nbsp; or &amp; that you are trying to match against their literal equivalent.

One huge match instead of several small ones

Most likely this results from greedy quantifiers. If it is possible in your particular situation, use negated character classes instead of .* or .+, or switch to ungreedy matching by using the U-modifier or inserting a question mark after the quantifier.

Some common mistakes

Wrong use of negated character classes

Sometimes negated character classes are mistaken for negative lookaheads, like this:

"#^[^http://]#i"

This does not mean that a string does not begin with 'http://'; it only says that the first character is not 'h', 't', 'p', a colon or a slash.
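The difference becomes obvious when both patterns are applied to the same input. The sketch below compares the mistaken pattern with a negative lookahead, which is what was presumably intended; the sample strings and the $results array are made up for illustration.

```php
<?php
// Made-up sample strings: one absolute URL and two relative paths.
$candidates = ['http://example.com', 'page.html', 'index.php'];
$results = [];

foreach ($candidates as $candidate) {
    // Negated character class: only inspects the first character.
    $negatedClass = preg_match('#^[^http://]#i', $candidate);
    // Negative lookahead: checks that the string does not start with "http://".
    $lookahead = preg_match('#^(?!http://)#i', $candidate);

    $results[$candidate] = [$negatedClass, $lookahead];
    printf("%-20s negated class: %d, lookahead: %d\n", $candidate, $negatedClass, $lookahead);
}
```

Note how 'page.html' is rejected by the negated character class merely because it begins with a 'p', while the lookahead accepts it.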
Missing escapes

Say you wanted to detect whether a given filename has the extension htm or html, but forgot to escape the dot:

"/.html?$/i"

This would match all valid filenames, but it would match file.phtml or files/htm just as well. Escaping the dot fixes this: "/\.html?$/i"

All elements optional

An example of this may be finding numbers, allowing the formats "46" and "45.999", but ".999" as well. One could be tempted to write the pattern like this:

"/\d*(\.\d+)?/"

That does match all numbers, but unfortunately it matches anything else too, including empty strings, because none of the parts is required. One way out is to spell out the alternatives so that each of them requires at least one digit, e.g. "/\d+(\.\d+)?|\.\d+/".

Further reading

Unfortunately I am not aware of many online tutorials on PCRE apart from the PHP manual. Though some find it hard to read, the manual definitely is a good and most accurate source of information. And especially if you would like to know how regex machines work internally, and learn about optimizations and efficiency, "Mastering Regular Expressions" by Jeffrey Friedl is a great book.