-
Literals
- All characters are literals
- Except: ".", "|", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^", "$" and "\"
-
Special Characters
-
Wildcard
-
.
- matches any single character
-
Repeat
-
*
-
repeated any number of times including zero
- "ba*" will match all of "b", "ba", "baaa" etc
-
+
-
repeated any number of times, but at least once
- "ba+" will match "ba" or "baaaa" for example but not "b"
-
?
-
repeated zero or one time only
- "ba?" will match "b" or "ba"
-
{}
-
bounds operator, minimum and maximum number of repeats
- "a{2}" is the letter "a" repeated exactly twice
- "a{2,4}" the letter "a" repeated between 2 and 4 times
- "a{2,}" the letter "a" repeated at least twice with no upper limit
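The repeat operators above can be checked with a short sketch. It assumes Python's `re` module, which this outline does not name but which shares this part of the syntax; the `cases` table and the `matching` helper are illustrative names only.

```python
import re

# Assumption: Python's re module stands in for the regex engine.
# fullmatch requires the whole string to match, which makes the
# repeat counts easy to see.
cases = {
    r"ba*": ["b", "ba", "baaa"],
    r"ba+": ["b", "ba", "baaaa"],
    r"ba?": ["b", "ba", "baa"],
    r"a{2}": ["a", "aa", "aaa"],
    r"a{2,4}": ["a", "aaa", "aaaaa"],
    r"a{2,}": ["a", "aa", "aaaaaa"],
}

def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

results = {p: matching(p, c) for p, c in cases.items()}
for pattern, matched in results.items():
    print(pattern, "->", matched)
```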
-
Parentheses ( )
-
To group items together into a sub-expression
- "p(hp)*" will match all of "p", "php", "phphp" etc
- "<b>(.*)</b>" will match the characters surrounded by "<b>" and "</b>"
- To mark what generated the match
-
Non-Marking Parentheses
- "(?:abc)*" creates no sub-expressions
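A minimal sketch of grouping, marking, and non-marking parentheses, assuming Python's `re` module (an assumption, since no engine is named here). The sample strings are made up for illustration.

```python
import re

# Grouping: the sub-expression "hp" is repeated as a unit.
grouped = [s for s in ["p", "php", "phphp", "ph"] if re.fullmatch(r"p(hp)*", s)]

# Marking: group(1) reports what the parenthesised part matched.
marked = re.search(r"<b>(.*)</b>", "some <b>bold</b> text")
inner = marked.group(1)

# Non-marking: (?:...) groups without creating a sub-expression.
capturing = re.compile(r"(abc)*")
non_marking = re.compile(r"(?:abc)*")

print(grouped, inner, capturing.groups, non_marking.groups)
```

In Python the compiled pattern's `groups` attribute counts the marked sub-expressions, so it shows the difference between the two forms directly.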
-
Forward Lookahead Asserts
-
Positive
- "(?=abc)" matches zero characters only if they are followed by the expression "abc"
-
Negative
- "(?!abc)" matches zero characters only if they are not followed by the expression "abc"
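The two lookahead forms can be compared side by side. This is a sketch assuming Python's `re` module; the sample text and the leading "x" in each pattern are illustrative.

```python
import re

text = "xabc xabd"

# Positive: match "x" only when "abc" follows; "abc" itself is not consumed.
positive = [m.start() for m in re.finditer(r"x(?=abc)", text)]

# Negative: match "x" only when "abc" does not follow.
negative = [m.start() for m in re.finditer(r"x(?!abc)", text)]

# The assertion matches zero characters, so the match is just "x".
consumed = re.search(r"x(?=abc)", text).group(0)

print(positive, negative, consumed)
```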
-
Sets [ ]
-
Character literals
- "[^abc]" match any character other than "a", "b", or "c"
- "[abc]" match either of "a", "b", or "c"
-
Character ranges
- "[a-z]" match any character in the range "a" to "z"
- "[^A-Z]" match any character other than those in the range "A" to "Z"
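A short sketch of character literals and ranges inside sets, assuming Python's `re` module. The sample text is made up; each result joins the single characters that the set matched.

```python
import re

text = "Cab 42 Zed"

# Character literals: any one of the listed characters, or any one not listed.
one_of = "".join(re.findall(r"[abc]", text))
none_of = "".join(re.findall(r"[^abc]", text))

# Character ranges: any character inside, or outside, the range.
lower = "".join(re.findall(r"[a-z]", text))
not_upper = "".join(re.findall(r"[^A-Z]", text))

print(repr(one_of), repr(none_of), repr(lower), repr(not_upper))
```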
-
Character classes, "[:classname:]"
-
alnum
- Any alphanumeric character
-
alpha
- Any alphabetical character a-z and A-Z
-
blank
- Any blank character, either a space or a tab
-
cntrl
- Any control character
-
digit
- Any digit 0-9
- \d
-
graph
- Any graphical character
-
lower
- Any lower case character
- \l
-
print
- Any printable character
-
punct
- Any punctuation character
-
space
- Any whitespace character
- \s
-
upper
- Any upper case character A-Z
- \u
-
xdigit
- Any hexadecimal digit character
-
word
- Any word character
- \w
-
unicode
- Any character whose code is greater than 255
-
Line anchors
-
^
- matches the null string at the start of a line
-
$
- matches the null string at the end of a line
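A sketch of the line anchors, assuming Python's `re` module. One engine-specific detail: in Python, "^" and "$" match at every line only when the `re.MULTILINE` flag is set. Other engines may behave that way by default.

```python
import re

text = "alpha one\nbeta two\ngamma three"

# With re.MULTILINE, ^ and $ match at the start and end of every line.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)
last_words = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(first_words, last_words)
```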
-
Back references
-
"\" followed by a digit "1" to "9"
- "(.*)\1" matches any string that is repeated about its mid-point
- example "abcabc" or "xyzxyz"
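The back reference example can be tried as follows. This is a sketch assuming Python's `re` module; the candidate strings beyond "abcabc" and "xyzxyz" are added for contrast.

```python
import re

candidates = ["abcabc", "xyzxyz", "abcabd", "aa"]

# \1 must repeat exactly what the first marked sub-expression matched.
pattern = re.compile(r"(.*)\1")
repeated = [s for s in candidates if pattern.fullmatch(s)]

# group(1) is the half that was repeated.
halves = {s: pattern.fullmatch(s).group(1) for s in repeated}

print(repeated, halves)
```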
-
Word operators
- Provided for compatibility with the GNU regular expression library
-
\w
- matches any single character that is a member of the "word" character class
- identical to the expression "[[:word:]]"
-
\W
- matches any single character that is not a member of the "word" character class
- identical to the expression "[^[:word:]]"
-
\<
- matches the null string at the start of a word
-
\>
- matches the null string at the end of a word
-
\b
- matches the null string at either the start or the end of a word
-
\B
- matches a null string within a word
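A sketch of the word operators, assuming Python's `re` module. That module supports \w, \W, \b and \B but not \< and \>, so the last two are not shown.

```python
import re

text = "cat concat cats"

# \b matches the null string at either edge of a word.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \B matches only inside a word, so this "cat" must follow a word character.
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]

# \w and \W split the text into word and non-word characters.
words = re.findall(r"\w+", text)
separators = re.findall(r"\W+", text)

print(whole_word, inside_word, words, separators)
```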
-
Buffer operators
-
\`
- matches the start of a buffer
-
\A
- matches the start of the buffer
-
"\'"
- matches the end of a buffer
-
\z
- matches the end of a buffer
-
\Z
- matches the end of a buffer, or possibly one or more new line characters followed by the end of the buffer
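The difference between buffer anchors and line anchors shows up on multi-line text. This sketch assumes Python's `re` module, which only partly overlaps with the list above. It supports \A, and its \Z matches strictly at the end of the text, which is closer to \z as described here. The backtick and apostrophe forms are not available there.

```python
import re

text = "first line\nsecond line"

# Line anchors match at every line when re.MULTILINE is set.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

# Buffer anchors match only at the very start or end of the whole text.
buffer_start = re.findall(r"\A\w+", text, flags=re.MULTILINE)
buffer_end = re.findall(r"\w+\Z", text, flags=re.MULTILINE)  # Python's strict end

print(line_starts, line_ends, buffer_start, buffer_end)
```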
-
Escape operator
-
\
- "\*" represents a literal "*" rather than the repeat operator
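A small sketch of the escape operator, assuming Python's `re` module. It contrasts an escaped "*" with the same character used as the repeat operator; the sample text is made up.

```python
import re

text = "2*3 and 2***3"

# Escaped: "\*" is a literal star, so only the exact text "2*3" matches.
literal_star = re.findall(r"2\*3", text)

# Unescaped: "2*" means zero or more "2", so each bare "3" matches.
repeat_star = re.findall(r"2*3", text)

print(literal_star, repeat_star)
```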
-
Single character escape sequences
-
\a, 0x07
- Bell character
-
\f, 0x0C
- Form feed
-
\n, 0x0A
- Newline character
-
\r, 0x0D
- Carriage return
-
\t, 0x09
- Tab character
-
\v, 0x0B
- Vertical tab
-
\e, 0x1B
- ASCII Escape character
-
\0dd, 0dd
- An octal character code, where dd is one or more octal digits
-
\xXX, 0xXX
- A hexadecimal character code, where XX is one or more hexadecimal digits
-
\x{XX}, 0xXX
- A hexadecimal character code, where XX is one or more hexadecimal digits, optionally a Unicode character
-
\cZ, z-@
- An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for '@'
-
Miscellaneous escape sequences (Perl compatibility)
-
\w
- Equivalent to [[:word:]].
-
\W
- Equivalent to [^[:word:]].
-
\s
- Equivalent to [[:space:]].
-
\S
- Equivalent to [^[:space:]].
-
\d
- Equivalent to [[:digit:]].
-
\D
- Equivalent to [^[:digit:]].
-
\l
- Equivalent to [[:lower:]].
-
\L
- Equivalent to [^[:lower:]].
-
\u
- Equivalent to [[:upper:]].
-
\U
- Equivalent to [^[:upper:]].
-
\C
- Any single character, equivalent to '.'.
-
\X
- Matches any Unicode combining character sequence, for example "a\x{0301}" (a letter a with an acute accent).
-
\Q
- The begin quote operator, everything that follows is treated as a literal character until a \E end quote operator is found.
-
\E
- The end quote operator, terminates a sequence begun with \Q.
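Not every engine supports the quote operators. Python's `re` module, for instance, does not. The sketch below gets a similar effect there by passing the literal text through `re.escape`. This is a substitute technique, not the \Q ... \E operators themselves, and the sample strings are illustrative.

```python
import re

literal = "a.b*(c)"

# Stand-in for \Qa.b*(c)\E\d+ : escape every special character in the
# literal part, then append the rest of the expression unescaped.
quoted = re.compile(re.escape(literal) + r"\d+")

hit = quoted.search("see a.b*(c)42 here")
miss = quoted.search("see axb(c)42 here")

print(hit.group(0), miss)
```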