regexp: Regular expressions in SQLite
The sqlean-regexp
extension provides regexp search and replace functions. Supports all major regular expression features (see the section on syntax below).
Supports Unicode in character classes (like \w
) and assertions (like \b
).
Reference • Supported syntax • Acknowledgements • Installation and usage
Reference
regexp • like • substr • capture • replace
REGEXP statement
Checks if the source string matches the pattern.
select true where 'the year is 2021' regexp '[0-9]+';
-- 1
regexp_like
regexp_like(source, pattern)
Checks if the source string matches the pattern.
select regexp_like('the year is 2021', '[0-9]+');
-- 1
select regexp_like('the year is 2021', '2k21');
-- 0
regexp_substr
regexp_substr(source, pattern)
Returns a substring of the source string that matches the pattern.
select regexp_substr('the year is 2021', '[0-9]+');
-- 2021
select regexp_substr('the year is 2021', '2k21');
-- (null)
regexp_capture
regexp_capture(source, pattern [, n])
Finds a substring of the source string that matches the pattern and returns the n
th matching group within that substring. Group numbering starts at 1. n = 0
(default) returns the entire substring.
select regexp_capture('years is 2021', '\d\d(\d\d)', 0);
-- 2021
select regexp_capture('years is 2021', '\d\d(\d\d)', 1);
-- 21
regexp_replace
regexp_replace(source, pattern, replacement)
Replaces all matching substrings with the replacement string.
select regexp_replace('the year is 2021', '[0-9]+', '2050');
-- the year is 2050
select regexp_replace('the year is 2021', '2k21', '2050');
-- the year is 2021
Supports backreferences to captured groups $1
trough $9
in the replacement string:
select regexp_replace('the year is 2021', '([0-9]+)', '$1 or 2050');
-- the year is 2021 or 2050
Supported syntax
Basic expressions:
. any character except newline
a the character a
ab the string ab
a|b a or b
\ escapes a special character
Quantifiers:
* 0 or more
+ 1 or more
? 0 or 1
{n} exactly n
{n,m} between n and m
{n,} n or more
Quantifiers are greedy by default (i.e., match as much text as possible). Add ?
to make them lazy (i.e., match as little text as possible):
*? 0 or more (lazy)
+? 1 or more (lazy)
Groups:
(...) capturing group
(?:...) non-capturing group
(?>...) atomic group
\N match the Nth captured group
Character classes:
[ab-d] one character of: a, b, c, d
[^ab-d] one character except: a, b, c, d
\d one digit
\D one non-digit
\s one whitespace
\S one non-whitespace
\w one word character
\W one non-word character
Assertions:
^ start of string
$ end of string
\b word boundary
\B non-word boundary
(?=...) positive lookahead
(?!...) negative lookahead
Options:
(?i) case-insensitive matching
select regexp_substr('the YEAR is 2021', 'year');
-- (null)
select regexp_substr('the YEAR is 2021', '(?i)year');
-- YEAR
Acknowledgements
Based on the PCRE2 engine.
Installation and usage
SQLite command-line interface:
sqlite> .load ./regexp
sqlite> select regexp_like('abcdef', 'b.d');
See How to install an extension for usage with IDE, Python, etc.