15.11. Regular Expression Functions
All of the regular expression functions use the Java pattern syntax, with a few notable exceptions:
- When using multi-line mode (enabled via the
(?m)
flag), only\n
is recognized as a line terminator. Additionally, the(?d)
flag is not supported and must not be used. - Case-insensitive matching (enabled via the
(?i)
flag) is always performed in a Unicode-aware manner. However, context-sensitive and local-sensitive matching is not supported. Additionally, the(?u)
flag is not supported and must not be used. - Surrogate pairs are not supported. For example,
\uD800\uDC00
is not treated asU+10000
and must be specified as\x{10000}
. - Boundaries (
\b
) are incorrectly handled for a non-spacing mark without a base character. \Q
and\E
are not supported in character classes (such as[A-Z123]
) and are instead treated as literals.- Unicode character classes (
\p{prop}
) are supported with the following differences:- All underscores in names must be removed. For example, use
OldItalic
instead ofOld_Italic
. - Scripts must be specified directly, without the
Is
,script=
orsc=
prefixes. Example:\p{Hiragana}
- Blocks must be specified with the
In
prefix. Theblock=
andblk=
prefixes are not supported. Example:\p{Mongolian}
- Categories must be specified directly, without the
Is
,general_category=
orgc=
prefixes. Example:\p{L}
- Binary properties must be specified directly, without the
Is
. Example:\p{NoncharacterCodePoint}
- All underscores in names must be removed. For example, use
-
regexp_extract_all
(string, pattern) → array<varchar> Returns the substring(s) matched by the regular expression
pattern
instring
:SELECT regexp_extract_all('1a 2b 14m', '\d+'); -- [1, 2, 14]
-
regexp_extract_all
(string, pattern, group) → array<varchar> Finds all occurrences of the regular expression
pattern
instring
and returns the capturing group numbergroup
:SELECT regexp_extract_all('1a 2b 14m', '(\d+)([a-z]+)', 2); -- ['a', 'b', 'm']
-
regexp_extract
(string, pattern) → varchar Returns the first substring matched by the regular expression
pattern
instring
:SELECT regexp_extract('1a 2b 14m', '\d+'); -- 1
-
regexp_extract
(string, pattern, group) → varchar Finds the first occurrence of the regular expression
pattern
instring
and returns the capturing group numbergroup
:SELECT regexp_extract('1a 2b 14m', '(\d+)([a-z]+)', 2); -- 'a'
-
regexp_like
(string, pattern) → boolean Evaluates the regular expression
pattern
and determines if it is contained withinstring
.This function is similar to the
LIKE
operator, expect that the pattern only needs to be contained withinstring
, rather than needing to match all ofstring
. In other words, this performs a contains operation rather than a match operation. You can match the entire string by anchoring the pattern using^
and$
:SELECT regexp_like('1a 2b 14m', '\d+b'); -- true
-
regexp_replace
(string, pattern) → varchar Removes every instance of the substring matched by the regular expression
pattern
fromstring
:SELECT regexp_replace('1a 2b 14m', '\d+[ab] '); -- '14m'
-
regexp_replace
(string, pattern, replacement) → varchar Replaces every instance of the substring matched by the regular expression
pattern
instring
withreplacement
. Capturing groups can be referenced inreplacement
using$g
for a numbered group or${name}
for a named group. A dollar sign ($
) may be included in the replacement by escaping it with a backslash (\$
):SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 '); -- '3ca 3cb 14m'
-
regexp_split
(string, pattern) → array<varchar> Splits
string
using the regular expressionpattern
and returns an array. Trailing empty strings are preserved:SELECT regexp_split('1a 2b 14m', '\s*[a-z]+\s*'); -- [1, 2, 14, ]