Regular Expressions

Regexp → Object

A compiled PCRE-style regular expression, usually created via the r macro-quoter.

Example:

> r"foo bar" is-a?: Regexp
True
> r"(\d{4,})"
r{(\d\{4,\})}

Regexp new: pattern &flags: flags → Regexp
  | pattern is-a?: String
  | flags is-a?: String

Creates a Regexp with a given pattern, compiled with the given flags.

The macro-quoter r is an easier way of calling this, though it cannot be used to build a pattern programmatically.

Note that all regular expressions match strings as UTF-8; there is no flag for this.

Example:

> Regexp new: "\\d+" &flags: "m"
r{\d+}m
> Regexp new: "\\d+" &flags: "isa"
r{\d+}isa

Accepted flags (see the PCRE manual for more information):

m (multiline)

By default, PCRE treats the subject string as consisting of a single line of characters (even if it actually contains newlines). The start of line metacharacter (^) matches only at the start of the string, while the end of line metacharacter ($) matches only at the end of the string, or before a terminating newline (unless dollar_endonly is set). This is the same as Perl.

When this flag is set, the start of line and end of line constructs match immediately following or immediately before internal newlines in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m option, and it can be changed within a pattern by a (?m) option setting. If there are no newlines in a subject string, or no occurrences of ^ or $ in a pattern, setting this flag has no effect.
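
As a rough illustration of the behaviour described above, here is a sketch using Python's re module, whose MULTILINE flag follows the same Perl /m semantics. Python is only a stand-in here; the subject string and variable names are made up for the example.

```python
import re

subject = "first\nsecond"

# Without the flag, ^ matches only at the very start of the subject.
default_hits = re.findall(r"^\w+", subject)

# With MULTILINE, ^ also matches immediately after each internal newline.
multiline_hits = re.findall(r"^\w+", subject, flags=re.MULTILINE)

print(default_hits)    # ['first']
print(multiline_hits)  # ['first', 'second']
```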

s (dotall)

If this flag is set, a dot metacharacter in the pattern matches all characters, including those that indicate newline. Without it, a dot does not match when the current position is at a newline. This option is equivalent to Perl's /s option, and it can be changed within a pattern by a (?s) option setting. A negative class such as [^a] always matches newline characters, independent of the setting of this option.
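
The difference can be sketched with Python's re module, whose DOTALL flag corresponds to Perl's /s. This is an analogy in another language, not output from this library, and the names below are illustrative.

```python
import re

subject = "a\nb"

# By default the dot refuses to match the newline between "a" and "b".
without_dotall = re.search(r"a.b", subject)

# With DOTALL the dot matches any character, newline included.
with_dotall = re.search(r"a.b", subject, flags=re.DOTALL)

# A negated class matches the newline regardless of the flag.
negated_class = re.search(r"a[^x]b", subject)

print(without_dotall)          # None
print(with_dotall.group(0))    # 'a\nb'
print(negated_class.group(0))  # 'a\nb'
```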

i (caseless)

Equivalent to Perl's /i option. If set, letters in the pattern match both uppercase and lowercase letters.

x (extended)

If this flag is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class. Whitespace does not include the VT character (code 11). In addition, characters between an unescaped # outside a character class and the next newline, inclusive, are also ignored. This is equivalent to Perl's /x option, and it can be changed within a pattern by a (?x) option setting.

This option makes it possible to include comments inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern.
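
A commented pattern in this style might look like the following sketch. It uses Python's re.VERBOSE flag, which is close to Perl's /x but is not this library; the date-like pattern is invented for the example.

```python
import re

# Whitespace and #-comments outside character classes are ignored.
pattern = re.compile(
    r"""
    (\d{4})   # four-digit year
    -         # literal hyphen
    (\d{2})   # two-digit month
    [ ]       # a space inside a character class is still significant
    end
    """,
    flags=re.VERBOSE,
)

m = pattern.search("2007-03 end")
print(m.groups())  # ('2007', '03')
```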

a (anchored)

If this flag is set, the pattern is forced to be anchored, that is, it is constrained to match only at the first matching point in the string that is being searched (the subject string). This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl.
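
Python's re module has no anchored flag, but its match method is anchored at the start of the subject in a comparable way, so it can serve as a rough sketch of the idea. The subjects below are illustrative.

```python
import re

digits = re.compile(r"\d+")

# search scans forward until it finds a match anywhere in the subject.
unanchored = digits.search("abc 123")

# match only succeeds if the pattern matches at the first position.
anchored_miss = digits.match("abc 123")
anchored_hit = digits.match("123 abc")

print(unanchored.group(0))    # '123'
print(anchored_miss)          # None
print(anchored_hit.group(0))  # '123'
```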

G (ungreedy)

This option inverts the greediness of the quantifiers so that they are not greedy by default, but become greedy if followed by ?. It is not compatible with Perl. It can also be set by a (?U) option setting within the pattern.
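
Python's re module has no ungreedy flag, so the sketch below only contrasts a greedy quantifier with its ? form. Going by the description above, setting this flag would swap which of the two spellings gives which result.

```python
import re

subject = "<a><b>"

# Greedy: .+ takes as much as it can, so the match runs to the last ">".
greedy = re.search(r"<.+>", subject).group(0)

# Non-greedy: .+? takes as little as it can, stopping at the first ">".
lazy = re.search(r"<.+?>", subject).group(0)

print(greedy)  # '<a><b>'
print(lazy)    # '<a>'
```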

e (dollar_endonly)

If this flag is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this option, a dollar also matches immediately before a newline at the end of the string (but not before any other newlines). This flag is ignored if multiline is set. There is no equivalent to this option in Perl, and no way to set it within a pattern.
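
The default dollar behaviour can be seen in Python's re module, which follows Perl here. Python has no dollar_endonly flag; its \Z assertion, which matches only at the very end of the subject, is used below as an approximation of the stricter behaviour.

```python
import re

subject = "abc\n"

# By default, $ also matches just before a newline that ends the subject.
dollar = re.search(r"abc$", subject)

# \Z matches only at the very end, so the trailing newline blocks it.
strict_end = re.search(r"abc\Z", subject)

print(dollar.group(0))  # 'abc'
print(strict_end)       # None
```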

f (firstline)

If this flag is set, an unanchored pattern is required to match before or at the first newline in the subject string, though the matched text may continue over the newline.

C (no_auto_capture)

If this option is set, it disables the use of numbered capturing parentheses in the pattern. Any opening parenthesis that is not followed by ? behaves as if it were followed by ?: but named parentheses can still be used for capturing (and they acquire numbers in the usual way). There is no equivalent of this option in Perl.
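
Python's re module has no equivalent flag, so this sketch writes the ?: by hand to show the effect described above: the plain group stops capturing, while the named group still captures and is numbered. Note that Python spells named groups (?P<name>...).

```python
import re

# Default behaviour: both groups capture.
default = re.search(r"a(b)(?P<see>c)", "abc")

# With the first group made non-capturing, only the named group remains,
# and it becomes capture number 1.
no_auto = re.search(r"a(?:b)(?P<see>c)", "abc")

print(default.groups())      # ('b', 'c')
print(no_auto.groups())      # ('c',)
print(no_auto.group("see"))  # 'c'
```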

The above flag descriptions are based on man pcreapi, written by Philip Hazel, 2007.

needle matches?: haystack → Boolean
  | needle is-a?: Regexp
  | haystack is-a?: String

Test if a regular expression matches a given string.

Example:

> r"abc" matches?: "abc"
True
> r"^\d+$" matches?: "123"
True
> r"^\d+$" matches?: "123abc"
False

a =~ b → Boolean

Alias for matches?:, allowing the roles to be swapped.

Example:

> r"\d+" matches?: "1"
True
> "1" matches?: r"\d+"
False

r match: s → in?: [@(ok: RegexpMatch), @none]
  | r is-a?: Regexp
  | s is-a?: String

Attempt a regular expression match, yielding @none if there is no match, or @ok: wrapping a RegexpMatch if it is successful.

Example:

> r"\d+" match: "123"
@(ok: <object (delegates to 1 object)>
        captures := []
        before := ""
        bindings := <object>
        after := ""
        match := "123")
> r"\d+" match: "abc"
@none

This is often cleaner with case-of:, where the matching case provides implicit bindings and a non-matching case falls through to the next test:

Example:

> "123" case-of: { r"\w+" -> @foo; r"(\d+)" -> this }
@foo

haystack replace: needle with: replacement → String
  | haystack is-a?: String
  | needle is-a?: Regexp
  | (replacement is-a?: Block) || (replacement is-a?: String)

Replace the first occurrence of needle in haystack.

If replacement is a block, it is called with the bindings in context.

If it is a string, it is parsed as a replacement format (at runtime), looking for bindings denoted by $. For example, to use the first subcapture, you use "$1". To match a named capture, you use "$(foo)".

Example:

> "abc 123 foo" replace: r{abc (?<num>\d+)} with: { \num .. "!" }
"123! foo"
> "abc 123 foo" replace: r{abc (?<num>\d+)} with: "$(num)!"
"123! foo"
> "123 456" replace: r{(?<num>\d+)} with: "($(num))"
"(123) 456"

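For comparison, the two replacement styles (a string template and a callable that receives the match) can be sketched with Python's re.sub, limited to the first occurrence with count=1. This is an analogy only: Python writes the template as \g<num> rather than $(num), and the names below are illustrative.

```python
import re

needle = re.compile(r"abc (?P<num>\d+)")

# String template: \g<num> refers to the named capture.
with_template = needle.sub(r"\g<num>!", "abc 123 foo", count=1)

# Callable: receives the match object and returns the replacement text.
with_callback = needle.sub(lambda m: m.group("num") + "!", "abc 123 foo", count=1)

# count=1 replaces only the first occurrence.
first_only = re.sub(r"(?P<num>\d+)", r"(\g<num>)", "123 456", count=1)

print(with_template)  # '123! foo'
print(with_callback)  # '123! foo'
print(first_only)     # '(123) 456'
```
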
haystack replace-all: needle with: replacement → String
  | haystack is-a?: String
  | needle is-a?: Regexp
  | (replacement is-a?: Block) || (replacement is-a?: String)

Similar to replace:with:, but replaces all occurrences of the match.

Example:

> "123 456" replace-all: r{(?<num>\d+)} with: "($(num))"
"(123) (456)"

RegexpMatch → Object

A throw-away object containing information about a regular expression match.

Example:

> (r"a(b)(?<see>c)" match: "abc") match: { @(ok: m) -> m }
<object (delegates to 1 object)>
  captures := ["b", "c"]
  before := ""
  bindings := <object>
  after := ""
  match := "abc"

m match → String
  | m is-a?: RegexpMatch

Yields the matched text.

Example:

> (r"b" match: "abc") match: { @(ok: m) -> m match }
"b"

m captures → List
  | m is-a?: RegexpMatch

Yields the captured matches.

Example:

> (r"a(b)c" match: "abc") match: { @(ok: m) -> m captures }
["b"]

m before → String
  | m is-a?: RegexpMatch

Yields the text preceding a match.

Example:

> (r"b" match: "abc") match: { @(ok: m) -> m before }
"a"

m after → String
  | m is-a?: RegexpMatch

Yields the text following a match.

Example:

> (r"b" match: "abc") match: { @(ok: m) -> m after }
"c"

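To make the relationship between these four fields concrete, here is a hypothetical Python sketch that derives the same pieces from a match. MatchInfo and match_info are invented names for illustration, not part of this library.

```python
import re
from typing import List, NamedTuple, Optional


class MatchInfo(NamedTuple):
    before: str          # text preceding the match
    match: str           # the matched text itself
    after: str           # text following the match
    captures: List[str]  # captured groups, in order


def match_info(pattern: str, subject: str) -> Optional[MatchInfo]:
    """Return the pieces of the first match, or None if there is none."""
    m = re.search(pattern, subject)
    if m is None:
        return None
    return MatchInfo(
        before=subject[: m.start()],
        match=m.group(0),
        after=subject[m.end():],
        captures=list(m.groups()),
    )


info = match_info(r"a(b)c", "xabcy")
print(info)
```

Concatenating before, match and after always reproduces the original subject.
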
RegexpBindings → Object

A throw-away object containing the bindings from a regular expression match, as methods preceded by a backslash. Captures are named from their offset, and named matches are bound by their name.

Example:

> (r"a(b)(?<see>c)" match: "abc") match: { @(ok: m) -> m bindings }
<object (delegates to 1 object)>
  \0 := "abc"
  \1 := "b"
  \2 := "c"
  \see := "c"
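
The naming scheme (numbered captures by offset, plus named captures under their own name) can be mimicked in Python with a plain dictionary. This is a sketch of the idea rather than this library's implementation, and Python spells the named group (?P<see>...).

```python
import re

m = re.search(r"a(b)(?P<see>c)", "abc")

# Offset-based bindings: 0 is the whole match, 1..n are the captures.
bindings = {i: m.group(i) for i in range(m.re.groups + 1)}

# Named captures are bound by name in addition to their offset.
bindings.update(m.groupdict())

print(bindings)  # {0: 'abc', 1: 'b', 2: 'c', 'see': 'c'}
```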

m bindings → RegexpBindings
  | m is-a?: RegexpMatch

Yields the RegexpBindings for a match.

Example:

> (r"(b)(?<see>c)" match: "abc") match: { @(ok: m) -> m bindings }
<object (delegates to 1 object)>
  \0 := "bc"
  \1 := "b"
  \2 := "c"
  \see := "c"
