vsv: CSV files as virtual tables in SQLite

The sqlean-vsv extension provides virtual table for working directly with CSV files, without importing data into the database. Useful for very large datasets.

ExampleParametersAcknowledgementsInstallation and usage

Example

For the people.csv file with the following data:

11,Diane,London
22,Grace,Berlin
33,Alice,Paris

The vsv virtual table could look like this:

.load ./vsv

create virtual table people using vsv(
    filename=people.csv,
    schema="create table people(id integer, name text, city text)",
    columns=3,
    affinity=integer
);
select * from people;
┌────┬───────┬────────┐
│ id │ name  │  city  │
├────┼───────┼────────┤
│ 11 │ Diane │ London │
│ 22 │ Grace │ Berlin │
│ 33 │ Alice │ Paris  │
└────┴───────┴────────┘

Parameters

The parameters to the vsv module (the vsv(...) part) are as follows:

filename=STRING     the filename, passed to the Operating System
data=STRING         alternative data
schema=STRING       Alternate Schema to use
columns=N           columns parsed from the VSV file
header=BOOL         whether or not a header row is present
skip=N              number of leading data rows to skip
rsep=STRING         record separator
fsep=STRING         field separator
dsep=STRING         decimal separator
validatetext=BOOL   validate UTF-8 encoding of text fields
affinity=AFFINITY   affinity to apply to each returned value
nulls=BOOL          empty fields are returned as NULL

If schema is given, then columns is also required.

Defaults

filename / data     nothing.  You must provide one or the other
                    it is an error to provide both or neither

schema              nothing.  If not provided then one will be
                    generated for you from the header, or if no
                    header is available then autogenerated using
                    field names manufactured as cX where X is the
                    column number

columns             nothing.  If not specified then the number of
                    columns is determined by counting the fields
                    in the first record of the VSV file (which
                    will be the header row if header is specified),
                    the number of columns is not parsed from the
                    schema even if one is provided

header=no           no header row in the VSV file
skip=0              do not skip any data rows in the VSV file
fsep=','            default field separator is a comma
rsep='\n'           default record separator is a newline
dsep='.'            default decimal separator is a point
validatetext=no     do not validate text field encoding
affinity=none       do not apply affinity to each returned value
nulls=off           empty fields returned as zero-length

Options

The validatetext setting will cause the validity of the field encoding (not its contents) to be verified. It effects how fields that are supposed to contain text will be returned to the SQLite3 library in order to prevent invalid utf8 data from being stored or processed as if it were valid utf8 text.

The nulls option will cause fields that do not contain anything to return NULL rather than an empty result. Two separators side-by-each with no intervening characters at all will be returned as NULL if nulls is true and if nulls is false or the contents are explicity empty ("") then a 0 length blob (if affinity=blob) or 0 length text string.

For the affinity setting, the following processing is applied to each value returned by the VSV virtual table:

Parameter types

The separator string containing exactly one character, or a valid escape sequence. Recognized escape sequences are:

\t horizontal tab, ascii character 9 (0x09)
\n linefeed, ascii character 10 (0x0a)
\v vertical tab, ascii character 11 (0x0b)
\f form feed, ascii character 12 (0x0c)
\xhh specific byte where hh is hexadecimal

Acknowledgements

Adapted from vsv.c by Keith Medcalf.

Installation and usage

SQLite command-line interface:

sqlite> .load ./vsv
sqlite> create virtual table temp.vsv using vsv(...);
sqlite> select * from vsv;

See How to install an extension for usage with IDE, Python, etc.

Download the extension.

Explore other extensions.

Subscribe to stay on top of new features.