============================ Regular Expression Functions ============================ All of the regular expression functions use the `Java pattern`_ syntax, with a few notable exceptions: * When using multi-line mode (enabled via the ``(?m)`` flag), only ``\n`` is recognized as a line terminator. Additionally, the ``(?d)`` flag is not supported and must not be used. * Case-insensitive matching (enabled via the ``(?i)`` flag) is always performed in a Unicode-aware manner. However, context-sensitive and local-sensitive matching is not supported. Additionally, the ``(?u)`` flag is not supported and must not be used. * Surrogate pairs are not supported. For example, ``\uD800\uDC00`` is not treated as ``U+10000`` and must be specified as ``\x{10000}``. * Boundaries (``\b``) are incorrectly handled for a non-spacing mark without a base character. * ``\Q`` and ``\E`` are not supported in character classes (such as ``[A-Z123]``) and are instead treated as literals. * Unicode character classes (``\p{prop}``) are supported with the following differences: * All underscores in names must be removed. For example, use ``OldItalic`` instead of ``Old_Italic``. * Scripts must be specified directly, without the ``Is``, ``script=`` or ``sc=`` prefixes. Example: ``\p{Hiragana}`` * Blocks must be specified with the ``In`` prefix. The ``block=`` and ``blk=`` prefixes are not supported. Example: ``\p{Mongolian}`` * Categories must be specified directly, without the ``Is``, ``general_category=`` or ``gc=`` prefixes. Example: ``\p{L}`` * Binary properties must be specified directly, without the ``Is``. Example: ``\p{NoncharacterCodePoint}`` .. _Java pattern: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html .. _capturing group number: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#gnumber .. _Capturing groups: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#cg .. function:: regexp_extract_all(string, pattern) -> array Returns the substring(s) matched by the regular expression ``pattern`` in ``string``:: SELECT regexp_extract_all('1a 2b 14m', '\d+'); -- [1, 2, 14] .. function:: regexp_extract_all(string, pattern, group) -> array Finds all occurrences of the regular expression ``pattern`` in ``string`` and returns the `capturing group number`_ ``group``:: SELECT regexp_extract_all('1a 2b 14m', '(\d+)([a-z]+)', 2); -- ['a', 'b', 'm'] .. function:: regexp_extract(string, pattern) -> varchar Returns the first substring matched by the regular expression ``pattern`` in ``string``:: SELECT regexp_extract('1a 2b 14m', '\d+'); -- 1 .. function:: regexp_extract(string, pattern, group) -> varchar Finds the first occurrence of the regular expression ``pattern`` in ``string`` and returns the `capturing group number`_ ``group``:: SELECT regexp_extract('1a 2b 14m', '(\d+)([a-z]+)', 2); -- 'a' .. function:: regexp_like(string, pattern) -> boolean Evaluates the regular expression ``pattern`` and determines if it is contained within ``string``. This function is similar to the ``LIKE`` operator, expect that the pattern only needs to be contained within ``string``, rather than needing to match all of ``string``. In other words, this performs a *contains* operation rather than a *match* operation. You can match the entire string by anchoring the pattern using ``^`` and ``$``:: SELECT regexp_like('1a 2b 14m', '\d+b'); -- true .. function:: regexp_replace(string, pattern) -> varchar Removes every instance of the substring matched by the regular expression ``pattern`` from ``string``:: SELECT regexp_replace('1a 2b 14m', '\d+[ab] '); -- '14m' .. function:: regexp_replace(string, pattern, replacement) -> varchar Replaces every instance of the substring matched by the regular expression ``pattern`` in ``string`` with ``replacement``. `Capturing groups`_ can be referenced in ``replacement`` using ``$g`` for a numbered group or ``${name}`` for a named group. A dollar sign (``$``) may be included in the replacement by escaping it with a backslash (``\$``):: SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 '); -- '3ca 3cb 14m' .. function:: regexp_split(string, pattern) -> array Splits ``string`` using the regular expression ``pattern`` and returns an array. Trailing empty strings are preserved:: SELECT regexp_split('1a 2b 14m', '\s*[a-z]+\s*'); -- [1, 2, 14, ]