guile-wiredtiger

Status

The database bindings are stable but incomplete. The higher level abstractions are in flux.

Build your own database

WiredTiger is similar to LevelDB, RocksDB and bsddb. The creator of WiredTiger previously wrote bsddb, which is now part of Oracle.

It only works on 64-bit systems.

At the very core, it's a configurable ordered key/value store, column aware, with global transactions.

It's not a cache database like Redis.

It's more powerful than the RDBMS model and can be used to implement it.

The low level API lets you create tables (!) with two kinds of columns: key columns and value columns (somewhat like Cassandra). You can then look up entries in the database by key using search procedures. There are two kinds of search procedures: one does an exact match on the whole key, the other does an approximate match, which lets you look up a key prefix (this is actually very useful). Once you have a pointer to an entry in the table, you can navigate quickly using the next and previous procedures (remember that the table is ordered).

There are higher level abstractions, like a graphdb with gremlin-like querying, a feature space with microkanren querying and an inverted index for looking up words in documents.

Informations

(use-modules (wiredtiger))

Create the /tmp/wt directory before running the following example:

(define connection (pk (connection-open "/tmp/wt" "create")))
(define session (pk (session-open connection)))

;; create a table
(session-create session "table:nodes" "key_format=i,value_format=S")

;; open a cursor over that table
(define cursor (pk (cursor-open session "table:nodes")))

;; start a transaction and add a record
(session-transaction-begin session "isolation=\"snapshot\"")
(cursor-key-set cursor 42)
(cursor-value-set cursor "The one true number!")
(cursor-insert cursor)
(session-transaction-commit session)

(cursor-reset cursor)
(cursor-next cursor)
(pk (cursor-key-ref cursor))
(pk (cursor-value-ref cursor))
(cursor-close cursor)
(session-close session)
(connection-close connection)

Getting started

wiredtiger does not work on 32-bit architectures.

It was tested with wiredtiger 2.9.1.

On guix you can simply install wiredtiger with:

guix package -i wiredtiger

Otherwise I recommend installing it from git with the usual cli dance:

git clone https://github.com/wiredtiger/wiredtiger.git
cd wiredtiger
./autogen.sh && ./configure && make && make install

And then clone guile-wiredtiger:

git clone git@git.framasoft.org:a-guile-mind/guile-wiredtiger.git

You also need version 2.2 of GNU Guile.

How to contribute

Send me a mail.

ChangeLog

Tutorial

Reference API

(wiredtiger wiredtiger)

wiredtiger.scm contains the low level bindings of WiredTiger. It tries to follow the semantics offered by WiredTiger as closely as possible, without flourish. As a matter of fact, it's recommended to read the WiredTiger manual, starting with “Writing WiredTiger applications” and “Schema, Columns, Column Groups, Indices and Projections”.

WiredTiger has three primary concepts, each represented as a Scheme record:

  1. <connection> represents a single connection to a WiredTiger database. ACID doesn't work across instances of <connection>.
  2. <session> has a <connection> as parent. It's not thread-safe, but it can be passed from thread to thread as long as only one thread accesses it at a time.
  3. <cursor> has a <session> as parent.
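
The threading rules above can be sketched as follows: the connection is shared, and each thread opens a session of its own. This is only a sketch. It assumes the bindings are imported under the module name used in the first example of this document, and spawn-worker is a hypothetical helper that is not part of guile-wiredtiger.

```scheme
(use-modules (wiredtiger)      ; module name as in the first example
             (ice-9 threads))

;; Hypothetical helper, not part of guile-wiredtiger: run PROC in a new
;; thread with a session of its own. CONNECTION is shared between threads;
;; the session is opened and closed by the thread that uses it.
(define (spawn-worker connection proc)
  (call-with-new-thread
   (lambda ()
     (let* ((session (session-open connection))
            (result (proc session)))
       (session-close session)
       result))))
```

A caller would typically keep the thread object and wait for it with join-thread, for example (join-thread (spawn-worker connection (lambda (session) ...))).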

<connection>

(connection-open home config) → <connection>

Open a connection to a database. Most applications will open a single connection to a database. A connection can be shared among several threads. There is no support for ACID transactions between several connections.

Example:

(connection-open "/tmp/magic-numbers" "create,cache_size=1G")

HOME is the path to the database home directory; it must exist. CONFIG is a configuration string as described in the WiredTiger documentation.

(connection-close connection [config])

Close the connection. Any open sessions will be closed. CONFIG is an optional argument; it can be leak_memory to skip freeing memory during close.
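
Since closing the connection also closes its sessions, it can be convenient to tie the lifetime of a connection to a block of code. The following is a minimal sketch; call-with-connection is a hypothetical helper that is not part of guile-wiredtiger, and the "create" configuration is just the one used in the examples of this document.

```scheme
(use-modules (wiredtiger))     ; module name as in the first example

;; Hypothetical helper: open a connection on HOME, call PROC with it and
;; close the connection afterwards, even if PROC exits with an error.
(define (call-with-connection home proc)
  (let ((connection (connection-open home "create")))
    (dynamic-wind
      (lambda () #t)
      (lambda () (proc connection))
      (lambda () (connection-close connection)))))
```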

<session>

(session-open connection [config]) → <session>

Open a session.

All data operations are performed in the context of a session. This encapsulates the thread and transactional context of the operation.

Thread safety: A session is not usually shared between threads, see Multithreading for more information.

Example:

(session-open connection "isolation=snapshot")

CONFIG configures the isolation level: read-uncommitted, read-committed or snapshot.

If you don't know what you are doing, use snapshot.

(session-close session)

Close the session handle. This will release the resources associated with the session handle, including rolling back any active transactions and closing any cursors that remain open in the session.

(session-create session name config)

Create a table, column group, index or file.

Example:

(session-create session "table:magic-numbers" "key_format=i,value_format=S")

NAME is the URI of the object to create, such as table:magic-numbers.

CONFIG configures the object, see the WiredTiger documentation for more information.

(session-transaction-begin session [config])

Start a transaction.

(session-transaction-commit session [config])

Commit the current transaction. A transaction must be in progress when this method is called.

(session-transaction-rollback session [config])

Roll back the current transaction. A transaction must be in progress when this method is called. All cursors are reset.
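
The three transaction procedures are usually combined in the same way: begin, do the work, then commit, or roll back if the work failed. The sketch below shows one way to package that pattern. call-with-transaction is a hypothetical helper that is not part of guile-wiredtiger, and the sketch assumes that the bindings report failures by throwing an exception.

```scheme
(use-modules (wiredtiger))     ; module name as in the first example

;; Hypothetical helper: run THUNK inside a transaction on SESSION.
;; If THUNK throws, the transaction is rolled back and the exception is
;; thrown again; otherwise the transaction is committed.
(define (call-with-transaction session thunk)
  (session-transaction-begin session "isolation=snapshot")
  (let ((result (catch #t
                  thunk
                  (lambda (key . args)
                    (session-transaction-rollback session)
                    (apply throw key args)))))
    (session-transaction-commit session)
    result))
```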

<cursor>

(cursor-open session uri [config]) → <cursor>

Open a new cursor on a data source. Cursor handles should be discarded by calling cursor-close.

Cursors capable of supporting transactional operations operate in the context of the current transaction, if any.

session-transaction-rollback implicitly resets all cursors.

Cursors are relatively lightweight objects but may hold references to heavier weight objects; applications should re-use cursors when possible, but instantiating new cursors is not so expensive that applications need to cache cursors at all costs.

URI is the data source on which the cursor operates; cursors are usually opened on tables, however, cursors can be opened on any data source, regardless of whether it is ultimately stored in a table. Some cursor types may have limited functionality (for example, they may be read-only or not support transactional updates).

See WiredTiger documentation.

(cursor-key-set cursor . key)
(cursor-value-set cursor . value)

Set the key (or value) for the next operation. If an error occurs during this operation, a flag will be set in the cursor, and the next operation to access the key (or value) will fail. This simplifies error handling in applications.

KEY (or VALUE) must match the key (or value) format of the current object.
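
Because both setters take a rest argument, a table with several value columns is filled by passing one argument per column. The sketch below assumes a hypothetical table:articles table with one string key column and two value columns (a string and an unsigned 64-bit integer, S and Q in WiredTiger format strings); the table, its column names and both helpers are made up for illustration, and whether the bindings support every WiredTiger format character is not covered by this document.

```scheme
(use-modules (wiredtiger))     ; module name as in the first example

;; Hypothetical table: one key column (slug) and two value columns.
(define (create-articles-table session)
  (session-create session "table:articles"
                  "key_format=S,value_format=SQ,columns=(slug,title,published_at)"))

;; One argument per column, in the order given by the format strings.
(define (add-article! cursor slug title published-at)
  (cursor-key-set cursor slug)
  (cursor-value-set cursor title published-at)
  (cursor-insert cursor))
```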

(cursor-key-ref cursor) → list
(cursor-value-ref cursor) → list

Get the key (or value) for the current row.

(cursor-next cursor)
(cursor-previous cursor)

Move the cursor to the next (resp. previous) row.

(cursor-reset cursor)

Reset the position of the cursor. Any resources held by the cursor are released, and the cursor's key and position are no longer valid. A subsequent iteration with cursor-next will move to the first record, and one with cursor-previous will move to the last record.
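
Resetting and then stepping with cursor-next gives a full scan of a table in key order. The sketch below collects every record into a list. cursor->list is a hypothetical helper, and the way the end of the table is detected is an assumption: this document does not say how cursor-next reports that there is no next row, so the sketch treats both an exception and a #f return value as the end.

```scheme
(use-modules (wiredtiger))     ; module name as in the first example

;; Hypothetical helper: return all records of CURSOR's table, in key
;; order, as a list of (key . value) pairs.
;; ASSUMPTION: cursor-next throws or returns #f when there is no next row.
(define (cursor->list cursor)
  (cursor-reset cursor)
  (let loop ((out '()))
    (if (false-if-exception (cursor-next cursor))
        (loop (cons (cons (cursor-key-ref cursor)
                          (cursor-value-ref cursor))
                    out))
        (begin
          (cursor-reset cursor)   ; release the position when done
          (reverse out)))))
```

This is only suitable for small tables, since the whole content ends up in memory.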

(cursor-search cursor)

On success move the cursor to the row matching the key. The key must first be set.

To minimize cursor resources, the cursor-reset method should be called as soon as the record has been retrieved and the cursor no longer needs that position.

(cursor-search-near cursor) → -1, 0, 1

Move the cursor to the row matching the key if it exists, or to an adjacent row. An adjacent row is either the smallest record larger than the key or the largest record smaller than the key (in other words, a logically adjacent key). The key must first be set. The return value is 0 for an exact match, -1 if the cursor landed on a smaller key and 1 if it landed on a larger key.

On success, the cursor ends positioned at the returned record; to minimize cursor resources, the cursor-reset method should be called as soon as the record has been retrieved and the cursor no longer needs that position.
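
A common use of cursor-search-near is to position the cursor on the first record at or after a given key, which is the starting point of a range or prefix scan. The sketch below assumes that the return value follows WiredTiger's convention (negative when the cursor landed on a smaller key). seek-ge is a hypothetical helper, and it does not handle the case where no such record exists (empty table, or key past the last record).

```scheme
(use-modules (wiredtiger))     ; module name as in the first example

;; Hypothetical helper: position CURSOR on the first record whose key is
;; greater than or equal to KEY and return that record's key.
;; ASSUMPTION: -1 means the cursor landed on a smaller key.
(define (seek-ge cursor . key)
  (apply cursor-key-set cursor key)
  (let ((comparison (cursor-search-near cursor)))
    ;; Landed just before the wanted position: step forward once.
    (when (= comparison -1)
      (cursor-next cursor))
    (cursor-key-ref cursor)))
```

From there, cursor-next walks the following records in key order until the keys leave the range of interest.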

(cursor-insert cursor)

Insert a record and optionally update an existing record.

If the cursor was configured with overwrite=true (the default), both the key and value must be set; if the record already exists, the key's value will be updated, otherwise, the record will be inserted.

If the cursor was configured with overwrite=false, both the key and value must be set and the record must not already exist; the record will be inserted.

If a cursor with record number keys was configured with append=true (not the default), the value must be set; a new record will be appended and the record number set as the cursor key value.

The cursor ends with no position, and a subsequent call to the cursor-next (cursor-previous) method will iterate from the beginning (end) of the table.

Inserting a new record after the current maximum record in a fixed-length bit field column-store (that is, a store with an r type key and t type value) may implicitly create the missing records as records with a value of 0.

When loading a large amount of data into a new object, using a cursor with the bulk configuration string enabled and loading the data in sorted order will be much faster than doing out-of-order inserts. See Bulk-load for more information.

The maximum length of a single column stored in a table is not fixed (as it partially depends on the underlying file configuration), but is always a small number of bytes less than 4GB.

(cursor-update cursor)

Update an existing record and optionally insert a record.

If the cursor was configured with overwrite=true (the default), both the key and value must be set; if the record already exists, the key's value will be updated, otherwise, the record will be inserted.

If the cursor was configured with overwrite=false, both the key and value must be set and the record must already exist; the record will be updated.

On success, the cursor ends positioned at the modified record; to minimize cursor resources, the cursor-reset method should be called as soon as the cursor no longer needs that position.

The maximum length of a single column stored in a table is not fixed (as it partially depends on the underlying file configuration), but is always a small number of bytes less than 4GB.

(cursor-remove cursor)

Remove a record. The key must be set.

If the cursor was configured with overwrite=true (the default), the key must be set; the key's record will be removed if it exists, no error will be returned if the record does not exist.

If the cursor was configured with overwrite=false, the key must be set and the key's record must exist; the record will be removed.

Removing a record in a fixed-length bit field column-store (that is, a store with an r type key and t type value) is identical to setting the record's value to 0.

On success, the cursor ends positioned at the removed record; to minimize cursor resources, the cursor-reset method should be called as soon as the cursor no longer needs that position.

(wiredtiger extra)

wiredtigerz gathers extra procedures to work with WiredTiger; they aim to make the main workflow more obvious.

It implements a declarative API to create tables with their indices and open cursors on them. It provides a few helpers for common patterns.

Declarative API

Several procedures take a declarative specification of tables, called CONFIG in wiredtigerz. The syntax of this configuration list is the following:

(table-name
    (key assoc as (column-name . column-type))
    (value assoc as (column-name . column-type))
    ((list indices as (indexed-name (indexed keys) (projections column name)))))

column-type is a verbose name for a column type, such as raw, string or unsigned-integer.

An example of a configuration that defines a posts table with uid, title, body, published-at fields and one index on published-at with a projection on uid column will look like the following:

(define posts '(posts
                ((uid . raw))
                ((title . string) (body . string) (published-at . unsigned-integer))
                ((published-at (published-at) (uid)))))

(session-create* session . configs)

You can create a table with indices using session-create*:

(define connection (connection-open "/tmp/wiredtigerz" "create"))
(define session (session-open connection))
(session-create* session posts)
(session-close session)

(cursor-open* session . configs)

cursor-open* will open all the cursors related to a given CONFIGS as an assoc:

(define connection (connection-open "/tmp/wt" "create"))
(define session (session-open connection))
(define cursors (cursor-open* session posts))

cursors is an assoc that maps table names and index names, as symbols, to their cursors. An extra “append” cursor is created if the table has a single record column as key. Index and append cursors are prefixed with the table name, which means that the above cursors will contain the keys posts and posts-published-at.

Mind the fact that keys are symbols. Also, the posts-published-at cursor has uid as its value since the index has a projection.

(wiredtiger-open path . configs)

Open a database at PATH, create tables using CONFIGS and return a pair of connection and session.

This procedure is useful when you don't plan to use threads.

(wiredtiger-close database)

Shortcut procedure to close a database, where DATABASE is a pair of connection and session.
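
Put together, wiredtiger-open, cursor-open* and wiredtiger-close cover the single-threaded workflow. The following is a sketch under several assumptions: the module name is taken from the title of this section, the pair returned by wiredtiger-open is assumed to be (connection . session) in that order, and call-with-posts is a hypothetical helper. The posts specification is the one from the declarative API example.

```scheme
(use-modules (wiredtiger extra))   ; ASSUMPTION: module named after this section

(define posts
  '(posts
    ((uid . raw))
    ((title . string) (body . string) (published-at . unsigned-integer))
    ((published-at (published-at) (uid)))))

;; Hypothetical helper: open the database at HOME, call PROC with the
;; table cursor and the index cursor, then close everything.
(define (call-with-posts home proc)
  (let* ((database (wiredtiger-open home posts))
         (session (cdr database))  ; ASSUMPTION: (connection . session)
         (cursors (cursor-open* session posts))
         (result (proc (assoc-ref cursors 'posts)
                       (assoc-ref cursors 'posts-published-at))))
    (wiredtiger-close database)
    result))
```

The cursors must not be used after call-with-posts returns, since closing the database closes them too.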

(env-open* path configs dbconfig)

For the case where you want to use the database with multiple threads, there are a few helpers, starting with env-open*.

This introduces another record type called <env>. It will create a database at PATH using DBCONFIG as the connection configuration string. CONFIGS is the table specification, following the declarative API described above.

(with-context env body ...)

This form will create a database context (see below) and set it as current database context using a fluid so that you don't have to pass around the context everywhere.

This form should be called once per thread. Doing otherwise will lead to unspecified behavior.

(call-with-cursor cursor proc)

Retrieve the current context and the associated cursor named CURSOR and call PROC with it.

Context

A context is made of a <session> and a cursors assoc. This is useful in multithreaded settings if you don't need to open multiple cursors for the same table.

(context-open connection . configs)

Sister procedure of cursor-open* that opens a session and a cursors assoc and returns a context.

(context-session context)

Return the session associated with CONTEXT.

(context-ref context name)

Return the cursor NAME from CONTEXT.

Transactions

Use (context-begin context), (context-commit context) and (context-rollback context) to work with transactions.

Cursors

(cursor-value-ref* cursor . key)

Retrieve the value associated with KEY in CURSOR.

(cursor-insert* cursor key value)

Insert VALUE at KEY using CURSOR. If the cursor key_format is a single record column, KEY can be '().

(cursor-update* cursor key value)

Update KEY with VALUE using CURSOR.

(cursor-remove* cursor . key)

Remove KEY using CURSOR.

(cursor-search* cursor . key)

Search KEY using CURSOR.

(cursor-search-near* cursor . key-prefix)

Prepare CURSOR for forward search using KEY-PREFIX.

(cursor-range cursor key-prefix)

Return the list of key/value pairs that match KEY-PREFIX using CURSOR.
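
The starred procedures bundle the set-key, operate and read steps of the low level API. The sketch below uses two of them on a table with an integer key and a string value, such as the table:magic-numbers table created earlier. It assumes that cursor-insert* takes KEY and VALUE as lists of columns (KEY can be '() according to the description above, which suggests a list), that the modules are named as shown, and remember! and recall are hypothetical helpers.

```scheme
(use-modules (wiredtiger)          ; module name as in the first example
             (wiredtiger extra))   ; ASSUMPTION: module named after its section

;; ASSUMPTION: KEY and VALUE are passed to cursor-insert* as lists,
;; one element per column.
(define (remember! cursor number text)
  (cursor-insert* cursor (list number) (list text)))

;; cursor-value-ref* takes the key columns as rest arguments.
(define (recall cursor number)
  (cursor-value-ref* cursor number))
```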

(wiredtiger feature-space)

feature-space is perhaps the most schemey database built using wiredtiger. It's not persistent, but it exposes a very simple API with a low barrier to entry and a high level interface for querying the data using minikanren (actually microkanren).

*feature-space*

feature-space tables declaration.

(fs:debug)

Print all the content of the feature-space.

(fs:ref uid key)

Retrieve the value associated with UID and KEY.

(fs:ref* uid)

Retrieve the assoc at UID.

(fs:add! assoc)

Add ASSOC to the database and return its uid.

(fs:remove! uid)

Remove from the database the assoc associated with UID.

(fs:update! uid assoc)

Update the assoc found at UID with ASSOC.

(fs:find key value)

Find uids which have both KEY and VALUE in their assoc.

(fs:query ... fs:where ...)

This is the high level interface for querying the feature-space. Here is an example:

(fs:query uid? fs:where ((uid? 'age 32) (uid? 'firstname "amirouche")))

Intermediate variables must have two ? as suffix. This is useful when you want to do some kind of join.

(wiredtiger grf3)

grf3 is a graph database interface built on a navigational stream interface to query the graph, similar to Gremlin from TinkerPop. It's my preferred way to interact with data.

(get uid)

Retrieve the object at UID.

(create-vertex assoc)

Create a vertex with ASSOC and return a <vertex> record.

(vertex-set vertex key value)

Create a new vertex record where KEY is set to VALUE based on VERTEX.

(vertex-ref vertex key)

Retrieve the value associated with KEY in VERTEX.

(create-edge start end assoc)

Create an edge starting at vertex START and ending at vertex END using ASSOC.

(edge-start edge)

(edge-end edge)

(edge-set edge key value)

(edge-ref edge key)

(save vertex-or-edge)

Save VERTEX-OR-EDGE to database.

traversi framework

This is similar to the stream module, except that it's faster and you can backtrack.

The traversi framework can be used outside the grf3 library if you want.

There are a few helpers that make it easier to work with both grf3 and traversi.

(get-or-create-vertex key value)

Get or create a vertex where KEY is set to VALUE.