Regular Expressions
Regexp new: pattern &flags: flags → Regexp | pattern is-a?: String | flags is-a?: String
Creates a Regexp with the given pattern, compiled with the given flags.
The macro-quoter r is an easier way of writing a regular expression, though it cannot be used programmatically.
Note that all regular expressions match strings as UTF-8; there is no flag for this.
Accepted flags (see the PCRE manual for more information):
m (multiline)
By default, PCRE treats the subject string as consisting of a single line of characters (even if it actually contains newlines). The start of line metacharacter (^) matches only at the start of the string, while the end of line metacharacter ($) matches only at the end of the string, or before a terminating newline (unless dollar_endonly is set). This is the same as Perl.
When this flag is set, the start of line and end of line constructs match immediately following or immediately before internal newlines in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m option, and it can be changed within a pattern by a (?m) option setting. If there are no newlines in a subject string, or no occurrences of ^ or $ in a pattern, setting this flag has no effect.
s (dotall)
If this flag is set, a dot metacharacter in the pattern matches all characters, including those that indicate newline. Without it, a dot does not match when the current position is at a newline. This option is equivalent to Perl's /s option, and it can be changed within a pattern by a (?s) option setting. A negative class such as [^a] always matches newline characters, independent of the setting of this option.
i (caseless)
Equivalent to Perl's /i option. If set, letters in the pattern match both upper-case and lower-case letters.
x (extended)
If this flag is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class. Whitespace does not include the VT character (code 11). In addition, characters between an unescaped # outside a character class and the next newline, inclusive, are also ignored. This is equivalent to Perl's /x option, and it can be changed within a pattern by a (?x) option setting.
This option makes it possible to include comments inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern.
a (anchored)
If this flag is set, the pattern is forced to be anchored, that is, it is constrained to match only at the first matching point in the string that is being searched (the subject string). This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl.
G (ungreedy)
This option inverts the greediness of the quantifiers so that they are not greedy by default, but become greedy if followed by ?. It is not compatible with Perl. It can also be set by a (?U) option setting within the pattern.
e (dollar_endonly)
If this flag is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this option, a dollar also matches immediately before a newline at the end of the string (but not before any other newlines). This flag is ignored if multiline is set. There is no equivalent to this option in Perl, and no way to set it within a pattern.
f (firstline)
If this flag is set, an unanchored pattern is required to match before or at the first newline in the subject string, though the matched text may continue over the newline.
C (no_auto_capture)
If this option is set, it disables the use of numbered capturing parentheses in the pattern. Any opening parenthesis that is not followed by ? behaves as if it were followed by ?:, but named parentheses can still be used for capturing (and they acquire numbers in the usual way). There is no equivalent of this option in Perl.
The above flag descriptions are based on man pcreapi, written by Philip Hazel, 2007.
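The multiline, dotall, caseless, and extended behaviours described above can be tried out in any engine that follows Perl's semantics for these options. The following is a minimal sketch in Python rather than Atomo; Python's re module is a different engine from PCRE, and the flag names re.MULTILINE, re.DOTALL, re.IGNORECASE, and re.VERBOSE are Python's, assumed here only as stand-ins for m, s, i, and x.

```python
import re

text = "alpha\nbeta"

# Without multiline, ^ matches only at the very start of the subject.
plain_starts = re.findall(r"^\w+", text)
# With multiline, ^ also matches immediately after each internal newline.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Without dotall, a dot does not match a newline; with it, the dot does.
dot_default = re.search(r"alpha.beta", text)
dot_all = re.search(r"alpha.beta", text, re.DOTALL)

# Caseless matching treats upper-case and lower-case letters alike.
caseless = re.search(r"ALPHA", text, re.IGNORECASE)

# Extended mode ignores unescaped whitespace and "#" comments in the pattern.
extended = re.search(r"""
    (\w+)   # first word
    \n      # the newline between the two lines
    (\w+)   # second word
""", text, re.VERBOSE)

print(plain_starts)               # ['alpha']
print(multiline_starts)           # ['alpha', 'beta']
print(dot_default)                # None
print(dot_all.group(0) == text)   # True
print(caseless.group(0))          # alpha
print(extended.groups())          # ('alpha', 'beta')
```

The anchored, ungreedy, dollar_endonly, firstline, and no_auto_capture flags have no direct Python flag and are left out of this sketch.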
needle matches?: haystack → Boolean | needle is-a?: Regexp | haystack is-a?: String
Test if a regular expression matches a given string.
r match: s → in?: [@(ok: RegexpMatch), @none] | r is-a?: Regexp | s is-a?: String
Attempt a regular expression match, yielding @none if there is no match, or @ok: wrapping a RegexpMatch if it is successful.
Example:
> r"\d+" match: "123"
@(ok: <object (delegates to 1 object)> captures := [] before := "" bindings := <object> after := "" match := "123")
> r"\d+" match: "abc"
@none
This is often more cleanly used with case-of?:, where the matching case provides implicit bindings, and the non-matching case continues on to the next test.
haystack replace: needle with: replacement → String | haystack is-a?: String | needle is-a?: Regexp | (replacement is-a?: Block) || (replacement is-a?: String)
Replace the first occurrence of needle in haystack.
If replacement is a block, it is called with the bindings in context.
If it is a string, it is parsed as a replacement format (at runtime), looking for bindings denoted by $. For example, to use the first subcapture, use "$1". To refer to a named capture, use "$(foo)".
Example:
> "abc 123 foo" replace: r{abc (?<num>\d+)} with: { \num .. "!" }
"123! foo"
> "abc 123 foo" replace: r{abc (?<num>\d+)} with: "$(num)!"
"123! foo"
> "123 456" replace: r{(?<num>\d+)} with: "($(num))"
"(123) 456"
haystack replace-all: needle with: replacement → String | haystack is-a?: String | needle is-a?: Regexp | (replacement is-a?: Block) || (replacement is-a?: String)
Similar to replace:with:, but replaces all occurrences of the match.
Example:
> "123 456" replace-all: r{(?<num>\d+)} with: "($(num))"
"(123) (456)"
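The difference between replacing the first occurrence and replacing all of them can also be sketched outside Atomo. The Python below is only an analogy: re.sub with count=1 stands in for replace:with:, re.sub without a count stands in for replace-all:with:, and Python's \g<num> template syntax plays the role of "$(num)". None of these names are part of Atomo's API.

```python
import re

haystack = "123 456"
needle = re.compile(r"(?P<num>\d+)")

# Replace only the first occurrence (count=1), analogous to replace:with:.
first_only = needle.sub(r"(\g<num>)", haystack, count=1)

# Replace every occurrence, analogous to replace-all:with:.
every = needle.sub(r"(\g<num>)", haystack)

# A callable replacement receives the match object, loosely comparable to
# passing a block that is called with the bindings in context.
shouted = needle.sub(lambda m: m.group("num") + "!", haystack)

print(first_only)  # (123) 456
print(every)       # (123) (456)
print(shouted)     # 123! 456!
```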
RegexpMatch → Object
A throw-away object containing information about a regular expression match.
m match → String | m is-a?: RegexpMatch
Yields the matched text.
m captures → List | m is-a?: RegexpMatch
Yields the captured matches.
m before → String | m is-a?: RegexpMatch
Yields the text preceding a match.
m after → String | m is-a?: RegexpMatch
Yields the text following a match.
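To make the relationship between match, captures, before, and after concrete, here is a minimal sketch in Python rather than Atomo. It assumes that before and after are the parts of the subject string on either side of the matched text, as described above; the variable names simply mirror the RegexpMatch methods and are not Atomo code.

```python
import re

subject = "abc 123 foo"
m = re.search(r"(\d)(\d+)", subject)

matched = m.group(0)            # the matched text
captures = list(m.groups())     # the captured subgroups, in order
before = subject[:m.start()]    # the text preceding the match
after = subject[m.end():]       # the text following the match

print(matched)    # 123
print(captures)   # ['1', '23']
print(repr(before), repr(after))  # 'abc ' ' foo'

# The three pieces always reassemble into the original subject.
assert before + matched + after == subject
```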
RegexpBindings → Object
A throw-away object containing the bindings from a regular expression match, as methods preceded by a backslash. Captures are named from their offset, and named matches are bound by their name.
m bindings → RegexpBindings | m is-a?: RegexpMatch
Yields the RegexpBindings for a match.