Boost::Regex Syntax

Literals
1. All characters are literals
2. Except: ".", "|", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^", "$" and "\"
Special Characters
1. Wildcard
  1. .
    1. matches any single character
2. Repeat
  1. *
    1. repeated any number of times including zero
      1. "ba*" will match all of "b", "ba", "baaa" etc
  2. +
    1. repeated any number of times, but at least once
      1. "ba+" will match "ba" or "baaaa" for example but not "b"
  3. ?
    1. repeated zero or one time only
      1. "ba?" will match "b" or "ba"
  4. {}
    1. bounds operator, minimum and maximum number of repeats
      1. "a{2}" is the letter "a" repeated exactly twice
      2. "a{2,4}" the letter "a" repeated between 2 and 4 times
      3. "a{2,}" the letter "a" repeated at least twice with no upper limit
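The wildcard and repeat operators above can be checked with a short sketch. It uses `std::regex` as a stand-in for `boost::regex` (an assumption: both accept these operators with the same meaning, and swapping `std::` for `boost::` with `<boost/regex.hpp>` should give the same results). The helper names `full_match` and `check_repeat_examples` are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` matches `pattern`.
// Assumption: std::regex stands in for boost::regex here.
bool full_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Each assertion restates one example from the list above.
void check_repeat_examples() {
    assert(full_match("bat", "b.t"));        // "." matches any single character
    assert(full_match("b", "ba*"));          // zero repeats are allowed
    assert(full_match("baaa", "ba*"));
    assert(!full_match("b", "ba+"));         // at least one "a" is required
    assert(full_match("baaaa", "ba+"));
    assert(full_match("ba", "ba?"));         // zero or one "a"
    assert(!full_match("baa", "ba?"));
    assert(full_match("aa", "a{2}"));        // exactly twice
    assert(!full_match("aaaaa", "a{2,4}"));  // five repeats exceed the bound
    assert(full_match("aaaaaaa", "a{2,}"));  // no upper limit
}
```

Calling `check_repeat_examples()` from `main` runs every check; an assertion failure would point at the example that does not hold.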
Parentheses ( )
1. To group items together into a sub-expression
  1. "p(hp)*" will match all of "p", "php", "phphp" etc
  2. "<b>(.*)</b>" will match the characters enclosed by "<b>" and "</b>"
2. To mark what generated the match
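Both roles of parentheses, grouping and marking, can be seen in a small sketch. It uses `std::regex` in place of `boost::regex` (an assumption: the match-results interface is the same in both), and the helper `first_capture` is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the text recorded by the first marked sub-expression,
// or an empty string when `pattern` is not found in `text`.
// Assumption: std::regex stands in for boost::regex here.
std::string first_capture(const std::string& text, const std::string& pattern) {
    std::smatch what;
    if (std::regex_search(text, what, std::regex(pattern)) && what.size() > 1) {
        return what[1].str();
    }
    return std::string();
}

void check_group_examples() {
    // Grouping: the "*" applies to the whole sub-expression "hp".
    assert(std::regex_match("p", std::regex("p(hp)*")));
    assert(std::regex_match("phphp", std::regex("p(hp)*")));
    // Marking: what[1] holds the text matched inside the parentheses.
    assert(first_capture("say <b>hello</b> now", "<b>(.*)</b>") == "hello");
}
```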
Non-Marking Parentheses
1. "(?:abc)*" creates no sub-expressions
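One way to see the difference is to count the marked sub-expressions of each form. This sketch assumes `std::regex` as a stand-in for `boost::regex`; both expose `mark_count()`.

```cpp
#include <cassert>
#include <regex>

void check_non_marking_group() {
    // Assumption: std::regex stands in for boost::regex here.
    std::regex marking("(abc)*");
    std::regex non_marking("(?:abc)*");
    assert(marking.mark_count() == 1u);      // one marked sub-expression
    assert(non_marking.mark_count() == 0u);  // grouping only, nothing recorded
    assert(std::regex_match("abcabc", non_marking));  // still repeats as a unit
}
```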
Forward Lookahead Asserts
1. Positive
  1. "(?=abc)" matches zero characters only if they are followed by the expression "abc"
2. Negative
  1. "(?!abc)" matches zero characters only if they are not followed by the expression "abc"
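A lookahead tests the text that follows without consuming it, which the sketch below illustrates by inspecting the matched text. It assumes `std::regex` as a stand-in for `boost::regex`; the helper `found` and the sample patterns are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the matched text, or an empty string if nothing matched.
// Assumption: std::regex stands in for boost::regex here.
std::string found(const std::string& text, const std::string& pattern) {
    std::smatch what;
    if (std::regex_search(text, what, std::regex(pattern))) {
        return what.str();
    }
    return std::string();
}

void check_lookahead_examples() {
    // Positive: "foo" matches only when "bar" follows; "bar" is not consumed.
    assert(found("foobar", "foo(?=bar)") == "foo");
    assert(found("foobaz", "foo(?=bar)").empty());
    // Negative: "foo" matches only when "bar" does not follow.
    assert(found("foobaz", "foo(?!bar)") == "foo");
    assert(found("foobar", "foo(?!bar)").empty());
}
```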
Sets [ ]
1. Character literals
  1. "[^abc]" matches any character other than "a", "b", or "c"
  2. "[abc]" matches any one of "a", "b", or "c"
2. Character ranges
  1. "[a-z]" matches any character in the range "a" to "z"
  2. "[^A-Z]" matches any character other than those in the range "A" to "Z"
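The four set forms above can be exercised one character at a time. This is a minimal sketch that assumes `std::regex` as a stand-in for `boost::regex`; `matches_one` is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the single-character string `text` matches the set `pattern`.
// Assumption: std::regex stands in for boost::regex here.
bool matches_one(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

void check_set_examples() {
    assert(matches_one("b", "[abc]"));    // listed literal
    assert(!matches_one("d", "[abc]"));
    assert(matches_one("d", "[^abc]"));   // negated list
    assert(matches_one("q", "[a-z]"));    // inside the range
    assert(!matches_one("Q", "[a-z]"));
    assert(matches_one("q", "[^A-Z]"));   // outside the negated range
    assert(!matches_one("Q", "[^A-Z]"));
}
```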
Character classes, "[:classname:]"
1. alnum
  1. Any alphanumeric character
2. alpha
  1. Any alphabetical character a-z and A-Z
3. blank
  1. Any blank character, either a space or a tab
4. cntrl
  1. Any control character
5. digit
  1. Any digit 0-9
  2. \d
6. graph
  1. Any graphical character
7. lower
  1. Any lower case character
  2. \l
8. print
  1. Any printable character
9. punct
  1. Any punctuation character
10. space
  1. Any whitespace character
  2. \s
11. upper
  1. Any upper case character A-Z
  2. \u
12. xdigit
  1. Any hexadecimal digit character
13. word
  1. Any word character
  2. \w
14. unicode
  1. Any character whose code is greater than 255
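A class name is written inside a set, so `[:digit:]` appears in a pattern as `[[:digit:]]`. The sketch below tries a few of the classes; it assumes `std::regex` as a stand-in for `boost::regex` and therefore sticks to class names both are known to accept (the "word" and "unicode" classes are left out because `std::regex` is not guaranteed to provide them).

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` matches `pattern`.
// Assumption: std::regex stands in for boost::regex here.
bool class_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

void check_class_examples() {
    assert(class_match("abc123", "[[:alnum:]]+"));
    assert(!class_match("abc 123", "[[:alnum:]]+"));  // space is not alphanumeric
    assert(class_match("2024", "[[:digit:]]+"));
    assert(class_match(" \t", "[[:space:]]+"));
    assert(class_match("ABC", "[[:upper:]]+"));
    assert(!class_match("Abc", "[[:upper:]]+"));
    assert(class_match("1aF", "[[:xdigit:]]+"));
    // Classes combine like any other set.
    assert(class_match("a1!", "[[:alpha:]][[:digit:]][[:punct:]]"));
}
```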
Line anchors
1. ^
  1. matches the null string at the start of a line
2. $
  1. matches the null string at the end of a line
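A small sketch of the two anchors on single-line input, assuming `std::regex` as a stand-in for `boost::regex`. Multi-line input is deliberately not shown, since how `^` and `$` treat embedded newlines depends on the library and its flags.

```cpp
#include <cassert>
#include <regex>

void check_anchor_examples() {
    // Assumption: std::regex stands in for boost::regex here.
    assert(std::regex_search("hello world", std::regex("^hello")));   // at the start
    assert(!std::regex_search("hello world", std::regex("^world")));  // not at the start
    assert(std::regex_search("hello world", std::regex("world$")));   // at the end
    assert(!std::regex_search("hello world", std::regex("hello$")));  // not at the end
}
```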
Back references
1. "\" followed by a digit "1" to "9"
  1. "(.*)\1" matches any string that is repeated about its mid-point
  2. example "abcabc" or "xyzxyz"
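The back-reference example above can be sketched as follows, assuming `std::regex` as a stand-in for `boost::regex`. The helper name `is_doubled` is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when `text` consists of some string followed by an exact copy of itself.
// Assumption: std::regex stands in for boost::regex here.
bool is_doubled(const std::string& text) {
    static const std::regex doubled(R"((.*)\1)");  // \1 repeats whatever group 1 matched
    return std::regex_match(text, doubled);
}

void check_back_reference_examples() {
    assert(is_doubled("abcabc"));
    assert(is_doubled("xyzxyz"));
    assert(!is_doubled("abcabd"));  // second half differs from the first
}
```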
Word operators
1. Provided for compatibility with the GNU regular expression library
2. \w
  1. matches any single character that is a member of the "word" character class
  2. identical to the expression "[[:word:]]"
3. \W
  1. matches any single character that is not a member of the "word" character class
  2. identical to the expression "[^[:word:]]"
4. \<
  1. matches the null string at the start of a word
5. \>
  1. matches the null string at the end of a word
6. \b
  1. matches the null string at either the start or the end of a word
7. \B
  1. matches a null string within a word
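The sketch below covers `\w`, `\W`, `\b` and `\B` only. It assumes `std::regex` as a stand-in for `boost::regex`; `\<` and `\>` are omitted because `std::regex` does not provide them. The helper `found_word` is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the matched text, or an empty string if nothing matched.
// Assumption: std::regex stands in for boost::regex here.
std::string found_word(const std::string& text, const std::string& pattern) {
    std::smatch what;
    if (std::regex_search(text, what, std::regex(pattern))) {
        return what.str();
    }
    return std::string();
}

void check_word_operator_examples() {
    // \b: "cat" must start and end at a word boundary.
    assert(found_word("a cat sat", R"(\bcat\b)") == "cat");
    assert(found_word("concatenate", R"(\bcat\b)").empty());
    // \B: "cat" must sit inside a word on both sides.
    assert(found_word("concatenate", R"(\Bcat\B)") == "cat");
    // \w and \W: members and non-members of the "word" class.
    assert(std::regex_match("word_1", std::regex(R"(\w+)")));
    assert(!std::regex_match("two words", std::regex(R"(\w+)")));
    assert(std::regex_match(" ,;", std::regex(R"(\W+)")));
}
```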
Buffer operators
1. \`
  1. matches the start of a buffer
2. \A
  1. matches the start of a buffer
3. \'
  1. matches the end of a buffer
4. \z
  1. matches the end of a buffer
5. \Z
  1. matches the end of a buffer, or possibly one or more new line characters followed by the end of the buffer
Escape operator
1. \
  1. "\*" represents a literal "*" rather than the repeat operator
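When arbitrary text must be matched literally, every special character listed under Literals needs the escape operator in front of it. A minimal sketch of such a helper follows; `escape_literal` is a hypothetical function, not part of any library, and `std::regex` stands in for `boost::regex` (an assumption).

```cpp
#include <cassert>
#include <regex>
#include <string>

// Prefixes each special character with "\" so it is treated as a literal.
// Hypothetical helper; the character list is the one given under Literals.
std::string escape_literal(const std::string& text) {
    static const std::string special = ".|*?+(){}[]^$\\";
    std::string out;
    for (char c : text) {
        if (special.find(c) != std::string::npos) {
            out += '\\';
        }
        out += c;
    }
    return out;
}

void check_escape_examples() {
    assert(escape_literal("a*b") == "a\\*b");
    // Assumption: std::regex stands in for boost::regex here.
    assert(std::regex_match("a*b", std::regex(escape_literal("a*b"))));
    assert(!std::regex_match("aab", std::regex(escape_literal("a*b"))));  // "*" no longer repeats
    assert(std::regex_match("1+1=2?", std::regex(escape_literal("1+1=2?"))));
}
```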
Single character escape sequences
1. \a, 0x07
  1. Bell character
2. \f, 0x0C
  1. Form feed
3. \n, 0x0A
  1. Newline character
4. \r, 0x0D
  1. Carriage return
5. \t, 0x09
  1. Tab character
6. \v, 0x0B
  1. Vertical tab
7. \e, 0x1B
  1. ASCII Escape character
8. \0dd, 0dd
  1. An octal character code, where dd is one or more octal digits
9. \xXX, 0xXX
  1. A hexadecimal character code, where XX is one or more hexadecimal digits
10. \x{XX}, 0xXX
  1. A hexadecimal character code, where XX is one or more hexadecimal digits, optionally a unicode character
11. \cZ, Z-@
  1. An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for '@'
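A few of these escapes can be tried directly. The sketch assumes `std::regex` as a stand-in for `boost::regex` and so is limited to `\f`, `\n`, `\r`, `\t`, `\v` and the two-digit `\xXX` form; the remaining sequences in the list are not exercised here.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` matches `pattern`.
// Assumption: std::regex stands in for boost::regex here.
bool escape_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

void check_character_escape_examples() {
    // The patterns are raw strings, so the regex engine sees the backslash.
    assert(escape_match("a\tb", R"(a\tb)"));    // tab, 0x09
    assert(escape_match("line\n", R"(line\n)"));  // newline, 0x0A
    assert(escape_match("\r\n", R"(\r\n)"));    // carriage return, newline
    assert(escape_match("\f\v", R"(\f\v)"));    // form feed, vertical tab
    assert(escape_match("A", R"(\x41)"));       // hexadecimal code 0x41 is "A"
}
```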
Miscellaneous escape sequences (Perl compatibility)
1. \w
  1. Equivalent to [[:word:]].
2. \W
  1. Equivalent to [^[:word:]].
3. \s
  1. Equivalent to [[:space:]].
4. \S
  1. Equivalent to [^[:space:]].
5. \d
  1. Equivalent to [[:digit:]].
6. \D
  1. Equivalent to [^[:digit:]].
7. \l
  1. Equivalent to [[:lower:]].
8. \L
  1. Equivalent to [^[:lower:]].
9. \u
  1. Equivalent to [[:upper:]].
10. \U
  1. Equivalent to [^[:upper:]].
11. \C
  1. Any single character, equivalent to '.'.
12. \X
  1. Matches any Unicode combining character sequence, for example "a" followed by U+0301 (a letter a with an acute accent).
13. \Q
  1. The begin quote operator; everything that follows is treated as a literal character until a \E end quote operator is found.
14. \E
  1. The end quote operator, terminates a sequence begun with \Q.
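The equivalences for `\d`, `\D`, `\s` and `\S` can be checked by running each shorthand next to its character-class form. This sketch assumes `std::regex` as a stand-in for `boost::regex`; `\l`, `\u`, `\C`, `\X`, `\Q` and `\E` are left out because `std::regex` does not provide them.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` matches `pattern`.
// Assumption: std::regex stands in for boost::regex here.
bool shorthand_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

void check_shorthand_examples() {
    // Each pair: the escape sequence, then its character-class equivalent.
    assert(shorthand_match("12345", R"(\d+)"));
    assert(shorthand_match("12345", "[[:digit:]]+"));
    assert(shorthand_match("abc", R"(\D+)"));
    assert(shorthand_match("abc", "[^[:digit:]]+"));
    assert(shorthand_match(" \t\n", R"(\s+)"));
    assert(shorthand_match(" \t\n", "[[:space:]]+"));
    assert(shorthand_match("abc", R"(\S+)"));
    assert(shorthand_match("abc", "[^[:space:]]+"));
}
```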