author Peter Thoeny <web-hurd@gnu.org> 2000-08-18 08:47:58 +0000
committer Peter Thoeny <web-hurd@gnu.org> 2000-08-18 08:47:58 +0000
commit 4e0e35394f188e012e8d73ae68f4dede2c7268a1
tree b92581b9603a8994f1ee9f0508b47c58a88fa2f5
parent 972d1c023d10025d02fe165f4fb2fa3c0279a232
-rw-r--r-- TWiki/RegularExpression.mdwn 141
1 files changed, 141 insertions, 0 deletions
diff --git a/TWiki/RegularExpression.mdwn b/TWiki/RegularExpression.mdwn
new file mode 100644
index 00000000..db340de9
--- /dev/null
+++ b/TWiki/RegularExpression.mdwn
@@ -0,0 +1,141 @@
+Regular expressions allow more specific queries than a simple query.
+
+**Examples**
+
+<table>
+ <tr>
+ <td> compan(y|ies) </td>
+ <td> Search for <em>company</em> , <em>companies</em></td>
+ </tr>
+ <tr>
+ <td> (peter|paul) </td>
+ <td> Search for <em>peter</em> , <em>paul</em></td>
+ </tr>
+ <tr>
+ <td> bug* </td>
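+ <td> Matches <em>bu</em> followed by zero or more <em>g</em>; as a substring search it finds <em>bug</em> , <em>bugs</em> , <em>bugfix</em> (and also <em>bus</em>)</td>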
+ </tr>
+ <tr>
+ <td> [Bb]ag </td>
+ <td> Search for <em>Bag</em> , <em>bag</em></td>
+ </tr>
+ <tr>
+ <td> b[aiueo]g </td>
+ <td> Second letter is a vowel. Matches <em>bag</em> , <em>bug</em> , <em>big</em></td>
+ </tr>
+ <tr>
+ <td> b.g </td>
+ <td> Second character is any character. Also matches <em>b&amp;g</em></td>
+ </tr>
+ <tr>
+ <td> [a-zA-Z] </td>
+ <td> Matches any one letter (not a digit or a symbol) </td>
+ </tr>
+ <tr>
+ <td> [^0-9a-zA-Z] </td>
+ <td> Matches any one symbol (not a digit or a letter) </td>
+ </tr>
+ <tr>
+ <td> [A-Z][A-Z]* </td>
+ <td> Matches one or more uppercase letters </td>
+ </tr>
+ <tr>
+ <td> [0-9][0-9][0-9]-[0-9][0-9]- <br /> [0-9][0-9][0-9][0-9] </td>
+ <td valign="top"> US social security number, e.g. 123-45-6789 </td>
+ </tr>
+</table>
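
The table entries can be checked mechanically. The following is a minimal sketch that assumes Python's `re` module rather than the grep-style search the topic describes; the patterns come from the table, while the names `examples`, `all_match`, `results`, and `bug_hits` are hypothetical and exist only for this illustration.

```python
import re

# Patterns taken from the table above; the sample words are illustrative.
examples = {
    r"compan(y|ies)": ["company", "companies"],
    r"(peter|paul)": ["peter", "paul"],
    r"[Bb]ag": ["Bag", "bag"],
    r"b[aiueo]g": ["bag", "bug", "big"],
    r"b.g": ["bag", "b&g"],
    r"[A-Z][A-Z]*": ["A", "ABC"],
    r"[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]": ["123-45-6789"],
}


def all_match(pattern, words):
    """Return True if the pattern matches every word in full."""
    return all(re.fullmatch(pattern, w) is not None for w in words)


results = {p: all_match(p, ws) for p, ws in examples.items()}

# `bug*` means "bu" followed by zero or more "g", so a substring search
# with it also finds words such as "bus".
bug_hits = [w for w in ["bug", "bugs", "bugfix", "bus"] if re.search(r"bug*", w)]
```

`re.fullmatch` requires the whole word to match, whereas `re.search` behaves like a line-oriented search and accepts a match anywhere in the text. That difference is why `bug*` finds far more than the three words listed.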
+
+Here is stuff for our UNIX freaks: <br /> (copied from 'man grep')
+
+ \c A backslash (\) followed by any special character is a
+ one-character regular expression that matches the spe-
+ cial character itself. The special characters are:
+
+ + `.', `*', `[', and `\' (period, asterisk,
+ left square bracket, and backslash, respec-
+ tively), which are always special, except
+ when they appear within square brackets ([]).
+
+ + `^' (caret or circumflex), which is special
+ at the beginning of an entire regular expres-
+ sion, or when it immediately follows the left
+ of a pair of square brackets ([]).
+
+ + $ (currency symbol), which is special at the
+ end of an entire regular expression.
+
+ . A `.' (period) is a one-character regular expression
+ that matches any character except NEWLINE.
+
+ [string]
+ A non-empty string of characters enclosed in square
+ brackets is a one-character regular expression that
+ matches any one character in that string. If, however,
+ the first character of the string is a `^' (a circum-
+ flex or caret), the one-character regular expression
+ matches any character except NEWLINE and the remaining
+ characters in the string. The `^' has this special
+ meaning only if it occurs first in the string. The `-'
+ (minus) may be used to indicate a range of consecutive
+ ASCII characters; for example, [0-9] is equivalent to
+ [0123456789]. The `-' loses this special meaning if it
+ occurs first (after an initial `^', if any) or last in
+ the string. The `]' (right square bracket) does not
+ terminate such a string when it is the first character
+ within it (after an initial `^', if any); that is,
+ []a-f] matches either `]' (a right square bracket) or
+ one of the letters a through f inclusive. The four
+ characters `.', `*', `[', and `\' stand for themselves
+ within such a string of characters.
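
The bracket rules quoted above can be tried out as follows. This is a hedged sketch using Python's `re` module, which is an assumption and not the grep the man page documents; the variable names are hypothetical. Python differs in two respects: a negated set such as `[^0-9]` also matches a newline, and a backslash inside brackets is still an escape character.

```python
import re

# "]" placed first inside the brackets is a literal, as in []a-f].
first_bracket = re.findall(r"[]a-f]", "x]yaz")

# "-" placed last is a literal minus rather than a range operator.
literal_minus = re.findall(r"[0-9-]", "a-1")

# A leading "^" negates the set: here, every character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

# "." and "*" stand for themselves inside brackets.
specials = re.findall(r"[.*]", "a.b*c")

# [0-9] is equivalent to spelling out [0123456789].
same_range = re.findall(r"[0-9]", "a1b22") == re.findall(r"[0123456789]", "a1b22")
```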
+
+ The following rules may be used to construct regular expres-
+ sions:
+
+ * A one-character regular expression followed by `*' (an
+ asterisk) is a regular expression that matches zero or
+ more occurrences of the one-character regular expres-
+ sion. If there is any choice, the longest leftmost
+ string that permits a match is chosen.
+
+ ^ A circumflex or caret (^) at the beginning of an entire
+ regular expression constrains that regular expression
+ to match an initial segment of a line.
+
+ $ A currency symbol ($) at the end of an entire regular
+ expression constrains that regular expression to match
+ a final segment of a line.
+
+ * A regular expression (not just a one-
+ character regular expression) followed by `*'
+ (an asterisk) is a regular expression that
+ matches zero or more occurrences of the one-
+ character regular expression. If there is
+ any choice, the longest leftmost string that
+ permits a match is chosen.
+
+ + A regular expression followed by `+' (a plus
+ sign) is a regular expression that matches
+ one or more occurrences of the one-character
+ regular expression. If there is any choice,
+ the longest leftmost string that permits a
+ match is chosen.
+
+ ? A regular expression followed by `?' (a ques-
+ tion mark) is a regular expression that
+ matches zero or one occurrences of the one-
+ character regular expression. If there is
+ any choice, the longest leftmost string that
+ permits a match is chosen.
+
+ | Alternation: two regular expressions
+ separated by `|' or NEWLINE match either a
+ match for the first or a match for the
+ second.
+
+ () A regular expression enclosed in parentheses
+ matches a match for the regular expression.
+
+ The order of precedence of operators at the same parenthesis
+ level is `[ ]' (character classes), then `*' `+' `?'
+ (closures), then concatenation, then `|' (alternation) and
+ NEWLINE.
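
The closure, anchor, and precedence rules can be illustrated with a short sketch. It assumes Python's `re` module, whose handling of `*`, `+`, `?`, `^`, `$`, and `|` agrees with the quoted text for these simple cases; all variable names are hypothetical.

```python
import re

# "*" is greedy: the longest leftmost string that permits a match is chosen.
star = re.search(r"ab*", "xabbbz").group()

# "+" needs at least one occurrence; "?" allows zero or one.
plus_needs_one = re.fullmatch(r"ab+", "a") is None
optional = [w for w in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", w)]

# "^" and "$" anchor the match to the start and end of the line.
anchored = re.search(r"^bug$", "bug") is not None
not_anchored = re.search(r"^bug$", "a bug") is None

# Closure binds tighter than concatenation, which binds tighter than "|":
# "ab*|c" reads as "(a(b*))|c", not "(ab)*|c" or "a(b*|c)".
precedence = [w for w in ["a", "abb", "c", "abab", "ac"] if re.fullmatch(r"ab*|c", w)]
```

Parentheses override the default grouping: `(ab)*` repeats the pair, so it accepts `abab`, which `ab*` rejects.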