author | Peter Thoeny <web-hurd@gnu.org> | 2000-08-18 08:47:58 +0000 |
---|---|---|
committer | Peter Thoeny <web-hurd@gnu.org> | 2000-08-18 08:47:58 +0000 |
commit | 4e0e35394f188e012e8d73ae68f4dede2c7268a1 | |
tree | b92581b9603a8994f1ee9f0508b47c58a88fa2f5 | |
parent | 972d1c023d10025d02fe165f4fb2fa3c0279a232 | |
none
-rw-r--r-- | TWiki/RegularExpression.mdwn | 141 |
1 files changed, 141 insertions, 0 deletions
diff --git a/TWiki/RegularExpression.mdwn b/TWiki/RegularExpression.mdwn
new file mode 100644
index 00000000..db340de9
--- /dev/null
+++ b/TWiki/RegularExpression.mdwn
@@ -0,0 +1,141 @@
+Regular expressions allow more specific queries than a simple query.
+
+**Examples**
+
+<table>
+  <tr>
+    <td> compan(y|ies) </td>
+    <td> Search for <em>company</em>, <em>companies</em></td>
+  </tr>
+  <tr>
+    <td> (peter|paul) </td>
+    <td> Search for <em>peter</em>, <em>paul</em></td>
+  </tr>
+  <tr>
+    <td> bug* </td>
+    <td> <em>bu</em> followed by zero or more <em>g</em>: finds <em>bug</em>, <em>bugs</em>, <em>bugfix</em> (and also <em>bus</em>)</td>
+  </tr>
+  <tr>
+    <td> [Bb]ag </td>
+    <td> Search for <em>Bag</em>, <em>bag</em></td>
+  </tr>
+  <tr>
+    <td> b[aiueo]g </td>
+    <td> Second letter is a vowel. Matches <em>bag</em>, <em>bug</em>, <em>big</em></td>
+  </tr>
+  <tr>
+    <td> b.g </td>
+    <td> Second character is any character. Also matches <em>b&amp;g</em></td>
+  </tr>
+  <tr>
+    <td> [a-zA-Z] </td>
+    <td> Matches any one letter (not a digit or a symbol) </td>
+  </tr>
+  <tr>
+    <td> [^0-9a-zA-Z] </td>
+    <td> Matches any one symbol (not a digit or a letter) </td>
+  </tr>
+  <tr>
+    <td> [A-Z][A-Z]* </td>
+    <td> Matches one or more uppercase letters </td>
+  </tr>
+  <tr>
+    <td> [0-9][0-9][0-9]-[0-9][0-9]- <br /> [0-9][0-9][0-9][0-9] </td>
+    <td valign="top"> US social security number, e.g. 123-45-6789 </td>
+  </tr>
+</table>
+
+Here is stuff for our UNIX freaks: <br /> (copied from 'man grep')
+
+    \c   A backslash (\) followed by any special character is a
+         one-character regular expression that matches the
+         special character itself. The special characters are:
+
+             `.', `*', `[', and `\' (period, asterisk, left
+             square bracket, and backslash, respectively),
+             which are always special, except when they
+             appear within square brackets ([]).
+
+             `^' (caret or circumflex), which is special at
+             the beginning of an entire regular expression,
+             or when it immediately follows the left of a
+             pair of square brackets ([]).
+
+             $ (currency symbol), which is special at the
+             end of an entire regular expression.
+
+    .    A `.' (period) is a one-character regular expression
+         that matches any character except NEWLINE.
+
+    [string]
+         A non-empty string of characters enclosed in square
+         brackets is a one-character regular expression that
+         matches any one character in that string. If, however,
+         the first character of the string is a `^' (a circumflex
+         or caret), the one-character regular expression matches
+         any character except NEWLINE and the remaining
+         characters in the string. The `^' has this special
+         meaning only if it occurs first in the string. The `-'
+         (minus) may be used to indicate a range of consecutive
+         ASCII characters; for example, [0-9] is equivalent to
+         [0123456789]. The `-' loses this special meaning if it
+         occurs first (after an initial `^', if any) or last in
+         the string. The `]' (right square bracket) does not
+         terminate such a string when it is the first character
+         within it (after an initial `^', if any); that is,
+         []a-f] matches either `]' (a right square bracket) or
+         one of the letters a through f inclusive. The four
+         characters `.', `*', `[', and `\' stand for themselves
+         within such a string of characters.
+
+    The following rules may be used to construct regular
+    expressions:
+
+    *    A one-character regular expression followed by `*' (an
+         asterisk) is a regular expression that matches zero or
+         more occurrences of the one-character regular
+         expression. If there is any choice, the longest leftmost
+         string that permits a match is chosen.
+
+    ^    A circumflex or caret (^) at the beginning of an entire
+         regular expression constrains that regular expression
+         to match an initial segment of a line.
+
+    $    A currency symbol ($) at the end of an entire regular
+         expression constrains that regular expression to match
+         a final segment of a line.
+
+    *    A regular expression (not just a one-character regular
+         expression) followed by `*' (an asterisk) is a regular
+         expression that matches zero or more occurrences of that
+         regular expression. If there is any choice, the longest
+         leftmost string that permits a match is chosen.
+
+    +    A regular expression followed by `+' (a plus sign) is a
+         regular expression that matches one or more occurrences
+         of that regular expression. If there is any choice, the
+         longest leftmost string that permits a match is chosen.
+
+    ?    A regular expression followed by `?' (a question mark)
+         is a regular expression that matches zero or one
+         occurrence of that regular expression. If there is any
+         choice, the longest leftmost string that permits a match
+         is chosen.
+
+    |    Alternation: two regular expressions separated by `|' or
+         NEWLINE match either a match for the first or a match
+         for the second.
+
+    ()   A regular expression enclosed in parentheses matches a
+         match for the regular expression.
+
+    The order of precedence of operators at the same parenthesis
+    level is `[ ]' (character classes), then `*' `+' `?'
+    (closures), then concatenation, then `|' (alternation) and
+    NEWLINE.
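The examples table and the bracket and precedence rules in this page can be tried out with a short sketch. It assumes Python 3.9+ and its standard `re` module as a stand-in for TWiki's search and for grep. `matches` and `EXAMPLES` are hypothetical names used only for this illustration. Note that in grep itself, the `+`, `?`, `|` and `()` operators belong to the extended syntax (`egrep` / `grep -E`). Python's `re` accepts them directly.

```python
import re


# Hypothetical helper (not part of TWiki or grep): list every
# non-overlapping match of `pattern` in `text`. Python's re module
# stands in for the search engine here.
def matches(pattern: str, text: str, flags: int = 0) -> list[str]:
    # finditer + group(0) returns the whole match even when the pattern
    # contains a group, e.g. compan(y|ies).
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


EXAMPLES = [
    # Rows of the examples table.
    (r"compan(y|ies)", "one company, two companies"),
    (r"(peter|paul)", "peter and paul"),
    (r"bug*", "bug bugs bugfix bus"),  # "bu" then zero or more "g"
    (r"[Bb]ag", "Bag bag BAG"),
    (r"b[aiueo]g", "bag beg big bog bug byg"),
    (r"b.g", "bag b&g b g"),  # "." matches any character, even a space
    (r"[A-Z][A-Z]*", "TWiki and GNU"),
    (r"[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]", "SSN 123-45-6789"),
    # Bracket-expression rules from the [string] entry.
    (r"[]a-f]", "x]ag"),  # a leading ] is literal
    (r"[a-]", "b-a"),  # a trailing - is literal
    (r"[^0-9a-zA-Z]", "a1 b&"),  # a leading ^ negates the set
    # Precedence: closure binds tighter than concatenation,
    # and concatenation binds tighter than alternation.
    (r"ab*", "abbb a"),  # a(b*), not (ab)*
    (r"ab|cd", "abd acd"),  # (ab)|(cd)
]

if __name__ == "__main__":
    for pattern, text in EXAMPLES:
        print(f"{pattern!r:50} {text!r:30} -> {matches(pattern, text)}")
    # ^ and $ anchor to line boundaries only with re.MULTILINE;
    # grep applies them to each input line separately.
    print(matches(r"^b.g$", "bag\nbig bug\nbog", re.MULTILINE))
```

One caveat: Python's `re` is a backtracking engine. For alternation it takes the first alternative that succeeds rather than the longest one, so `matches(r"a|ab", "ab")` returns `['a']`. The "longest leftmost" rule quoted above would choose `ab` instead. For the single-character closures in the table, both approaches give the same result.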