fuzzy: Fuzzy string matching and phonetics in SQLite
The sqlean-fuzzy extension provides fuzzy-matching helpers:
- Measure distance between two strings.
- Compute the phonetic code of a string.
- Transliterate a string.
If you want a ready-to-use mechanism to search a large vocabulary for close matches, see the spellfix extension instead.
String distances • Phonetic codes • Transliteration • Acknowledgements • Installation and usage
String distances
These functions measure the distance between two strings.
Only ASCII strings are supported. Use the translit function to convert the input string from UTF-8 to plain ASCII.
damlev • editdist • hamming • jarowin • leven • osadist
fuzzy_damlev
fuzzy_damlev(x, y)
Calculates the Damerau-Levenshtein distance.
select fuzzy_damlev('awesome', 'aewsme');
-- 2
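To see where the 2 comes from, here is a minimal pure-Python sketch of the unrestricted Damerau-Levenshtein distance (insertions, deletions, substitutions, and transpositions of adjacent characters). It is an illustration only, not the extension's C implementation, and the name damerau_levenshtein is hypothetical.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance (illustrative sketch)."""
    max_dist = len(a) + len(b)
    last_row = {}  # last row (1-based) in which each character of `a` was seen
    # The table has an extra border row and column filled with max_dist.
    d = [[0] * (len(b) + 2) for _ in range(len(a) + 2)]
    d[0][0] = max_dist
    for i in range(len(a) + 1):
        d[i + 1][0] = max_dist
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[0][j + 1] = max_dist
        d[1][j + 1] = j

    for i in range(1, len(a) + 1):
        last_col = 0  # last column in this row where the characters matched
        for j in range(1, len(b) + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,                             # substitution or match
                d[i + 1][j] + 1,                            # insertion
                d[i][j + 1] + 1,                            # deletion
                d[k][l] + (i - k - 1) + 1 + (j - l - 1),    # transposition
            )
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]


print(damerau_levenshtein("awesome", "aewsme"))  # 2: swap "we" to "ew", drop "o"
```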
fuzzy_editdist
fuzzy_editdist(x, y)
Calculates the spellcheck edit distance.
select fuzzy_editdist('awesome', 'aewsme');
-- 215
fuzzy_hamming
fuzzy_hamming(x, y)
Calculates the Hamming distance.
select fuzzy_hamming('awesome', 'aewsome');
-- 2
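The Hamming distance counts the positions at which two equal-length strings differ. A minimal Python sketch follows; the name hamming is hypothetical, and raising an error for strings of different lengths is an assumption of this sketch, not documented behavior of the extension.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    if len(a) != len(b):
        # Assumption of this sketch: unequal lengths are rejected.
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming("awesome", "aewsome"))  # 2: positions 2 and 3 are swapped
```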
fuzzy_jarowin
fuzzy_jarowin(x, y)
Calculates the Jaro-Winkler distance. The result is a similarity score between 0 and 1, where a higher value means a closer match.
select fuzzy_jarowin('awesome', 'aewsme');
-- 0.907142857142857
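The value above can be reproduced with the textbook Jaro-Winkler formula, where scores close to 1 indicate similar strings. The sketch below is pure Python with hypothetical function names; it assumes the common defaults of a 0.1 prefix scaling factor and a prefix capped at 4 characters, and is not the extension's own code.

```python
def jaro(a: str, b: str) -> float:
    """Jaro similarity in the range [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters match only if they are at most `window` positions apart.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, scaling: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * scaling * (1 - score)


print(round(jaro_winkler("awesome", "aewsme"), 6))  # 0.907143
```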
fuzzy_leven
fuzzy_leven(x, y)
Calculates the Levenshtein distance.
select fuzzy_leven('awesome', 'aewsme');
-- 3
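Unlike Damerau-Levenshtein, the plain Levenshtein distance has no transposition step, so swapping two adjacent characters costs two substitutions. A minimal Python sketch of the classic dynamic-programming approach is shown below; the name levenshtein is hypothetical and this is not the extension's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep on a match
            ))
        prev = cur
    return prev[-1]


print(levenshtein("awesome", "aewsme"))  # 3: two substitutions plus one deletion
```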
fuzzy_osadist
fuzzy_osadist(x, y)
Calculates the Optimal String Alignment distance.
select fuzzy_osadist('awesome', 'aewsme');
-- 3
Phonetic codes
These functions compute phonetic string codes.
Only ASCII strings are supported. Use the translit function to convert the input string from UTF-8 to plain ASCII.
caver • phonetic • soundex • rsoundex
fuzzy_caver
fuzzy_caver(x)
Calculates the Caverphone code.
select fuzzy_caver('awesome');
-- AWSM111111
fuzzy_phonetic
fuzzy_phonetic(x)
Calculates the spellcheck phonetic code.
select fuzzy_phonetic('awesome');
-- ABACAMA
fuzzy_soundex
fuzzy_soundex(x)
Calculates the Soundex code.
select fuzzy_soundex('awesome');
-- A250
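For intuition, here is a minimal Python sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent duplicates, and pad to four characters. The name soundex is hypothetical, and the extension may handle edge cases (such as empty input) differently.

```python
# Consonant groups and their Soundex digits; vowels, H, W and Y have no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Classic four-character Soundex code (illustrative sketch)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""  # assumption of this sketch for input with no letters
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate equal codes
            prev = digit
    return (code + "000")[:4]


print(soundex("awesome"))  # A250
```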
fuzzy_rsoundex
fuzzy_rsoundex(x)
Calculates the Refined Soundex code.
select fuzzy_rsoundex('awesome');
-- A03080
Transliteration
fuzzy_translit(str)
Transliteration converts the input string from UTF-8 into plain ASCII by replacing each non-ASCII character with some combination of ASCII characters.
The distance and phonetic functions are ASCII only, so to work with a Unicode string, you should first transliterate it:
select fuzzy_translit('sí señor');
-- si senor
select fuzzy_translit('привет');
-- privet
Some characters have no ASCII equivalent and are replaced with a question mark:
select fuzzy_translit('oh my 😅');
-- oh my ?
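A rough way to picture the idea is accent folding with Python's unicodedata module. The sketch below is only an approximation under stated assumptions: it strips combining marks and replaces everything else with a question mark, so unlike fuzzy_translit it cannot map other scripts such as Cyrillic to Latin letters. The name ascii_fold is hypothetical.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    """Strip accents and replace other non-ASCII characters with '?'."""
    out = []
    # NFKD splits accented letters into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFKD", text):
        if ord(ch) < 128:
            out.append(ch)
        elif unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter already emitted
        else:
            out.append("?")
    return "".join(out)


print(ascii_fold("sí señor"))  # si senor
print(ascii_fold("oh my 😅"))  # oh my ?
```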
Acknowledgements
Adapted from libstrcmp by Ross Bayer and spellfix.c by D. Richard Hipp.
Installation and usage
SQLite command-line interface:
sqlite> .load ./fuzzy
sqlite> select fuzzy_soundex('hello');
See How to install an extension for usage with IDE, Python, etc.
↓ Download the extension.
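From Python, the standard sqlite3 module can load the extension in a similar way. This is a minimal sketch that assumes the compiled extension file sits at ./fuzzy relative to the working directory and that your Python build allows extension loading; the helper name connect_with_fuzzy is hypothetical.

```python
import sqlite3


def connect_with_fuzzy(db_path: str = ":memory:",
                       extension_path: str = "./fuzzy") -> sqlite3.Connection:
    """Open a connection and load the fuzzy extension into it."""
    conn = sqlite3.connect(db_path)
    try:
        # Extension loading is off by default; enable it only while loading.
        conn.enable_load_extension(True)
        try:
            conn.load_extension(extension_path)
        finally:
            conn.enable_load_extension(False)
    except Exception:
        conn.close()
        raise
    return conn
```

With the extension in place, conn = connect_with_fuzzy() followed by conn.execute("select fuzzy_soundex('hello')").fetchone() runs the same query as the command-line example above.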