Cassandra
Cassandra is a distributed NoSQL database designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure. Anyquery provides a way to connect to Cassandra tables and run queries on them.
Getting started
Section titled “Getting started”To connect a Cassandra database to Anyquery, you need to provide the connection details. Open a shell, and run the following command:
anyquery connection add
The CLI will ask you for:
- a connection name (the schema name to reference the Cassandra database)
- the type of the connection (Cassandra)
- the connection string (the format is
cassandra://user:password@host:port
, e.g.cassandra://myuser:mypassword@127.0.0.1:9042
). Refer to the DSN documentation for more information on the connection string. - a filter. It is a CEL script that filters the tables and views to import. For example, to import only the tables of the
cassandra
keyspace, you can use the following filter:table.schema == 'cassandra'
. If you want to import all tables, you can leave this field empty. Refer to the CEL syntax documentation for more information.
Press enter to add the connection. On the next startup, Anyquery will fetch the list of tables and views from the database and import them automatically.
Congratulations 🎉! You can now run queries on the tables and views from the Cassandra database. The table name will follow this format connection_name.[schema_name]_table_name
(e.g. mycassandra.information_schema_tables
)
-- Join a table from Cassandra with a table from an APISELECT *FROM mycassandra.my_table mJOIN github_stargazers_from_repository('julien040/anyquery') gON m.id = g.login
Additional information
Section titled “Additional information”The Data Source Name (DSN) for a Cassandra connection is a string that specifies the connection details. The format is cassandra://user:password@host:port?option1=value1&option2=value2
.
The user
and password
are the credentials to connect to the Cassandra database. They often default to empty strings or cassandra:cassandra
.
The host
is the address of one of the Cassandra nodes in the cluster. You can specify multiple hosts separated by commas (e.g. 192.168.1.1:9042, 172.16.0.1:9043
).
Options
Section titled “Options”You can specify additional options in the DSN to configure the connection. Here are the options:
consistency
: The consistency level for the queries. It can be one of the following values:ANY
,ONE
,TWO
,THREE
,QUORUM
,LOCAL_QUORUM
,EACH_QUORUM
,SERIAL
,LOCAL_SERIAL
, orALL
.tls
orssl
: If set totrue
, the connection will use TLS/SSL. The default isfalse
. It also enables thetls_ca_cert
,tls_cert
, andtls_key
options.tls_ca_cert
: The path to the CA certificate file for TLS/SSL.tls_cert
: The path to the client certificate file for TLS/SSL.tls_key
: The path to the client private key file for TLS/SSL.
cassandra://myuser:mypassword@cassandra1.datacenter1:9042,cassandra2.datacenter1:9042?consistency=QUORUM&tls=true&tls_ca_cert=/path/to/ca.crt&tls_cert=/path/to/client.crt&tls_key=/path/to/client.key
Functions
Section titled “Functions”When querying a Cassandra database, functions aren’t passed to Cassandra. Instead, they are executed in the SQLite database. It might often result in a performance hit.
Performance
Section titled “Performance”Cassandra is designed for high performance. While this integration tries to transfer the least amount of data possible, any query that requires a join or a WHERE invoquing a function will result in all the rows transferred to Anyquery, which can be slow.
Also, Cassandra is designed around SSTable. You can only filter by the primary key or clustering columns. Any other WHERE clause will result in a full table scan, transferring all the rows to Anyquery, which can be slow.
Anyquery is based on SQLite, which has a limited set of types. When querying a Cassandra database, the types are converted to SQLite types. Here is the mapping:
Cassandra Type | SQLite Type | Additional Information |
---|---|---|
ascii | TEXT | |
bigint | INTEGER | |
blob | BLOB | |
boolean | BOOLEAN | |
counter | INTEGER | |
date | DATE | |
decimal | REAL | |
double | REAL | |
float | REAL | |
inet | TEXT | 0.0.0.0 representation |
int | INTEGER | |
smallint | INTEGER | |
text | TEXT | |
timestamp | DATETIME | |
timeuuid | TEXT | UUID representation |
tinyint | INTEGER | |
uuid | TEXT | UUID representation |
varint | INTEGER | |
list<type> | JSON | |
set<type> | JSON | |
map<key_type, value_type> | JSON | |
tuple<type1, type2, ...> | Not supported | The column will be ignored |
udt<type> | JSON | Not supported. Might return {} |
frozen<type> | JSON | Frozen is ignored, the type is used instead |
duration | INTEGER | Duration in nanoseconds. A month is 30 days. |
Limitations
Section titled “Limitations”- Tuples are not supported. The column will be ignored.
- User-defined types (UDT) are not supported. The column will be filled with an empty JSON object
{}
. - Frozen types are not supported. The type is used instead.
- Insert, update and delete queries are not supported. Only SELECT queries are supported.