guile-wiredtiger
Status
The database bindings are stable but incomplete. The higher level abstractions are in flux.
Build your own database
- ACID
- NoSQL
- networkless
- automatic index
- multithread support
- ordered key/value store
It's similar to leveldb, rocksdb and bsddb. The creator of wiredtiger previously wrote bsddb, which is now part of Oracle.
It only works on 64-bit systems.
At the very core, it's a configurable ordered key/value store, column aware, with global transactions.
It's not a cache database like Redis.
It's more powerful than the RDBMS model and can implement it.
The low level API lets you create tables (!) with two kinds of columns: key columns and value columns (somewhat like cassandra). You can then look up entries in the database by key using the search procedures. There are two kinds of search procedure. One does an exact match on the whole key; the other does an approximate match, where you look up a key prefix (this is actually very useful). Once a cursor points to an entry in the table, you can navigate quickly using the next and previous procedures (remember the table is ordered).
There are higher level abstractions such as a graphdb with gremlin-like querying, a feature space with microkanren querying and an inverted index for looking up words in documents.
Information
- Tested with GNU Guile 2.2
- Tested with wiredtiger 2.9.1
- License: GPL2+ (same as wiredtiger)
- Join us at irc.freenode.net#guile for support.
- Mailing list: guile-users
- Author: amirouche
(use-modules (wiredtiger))
Create the /tmp/wt directory before running the following example:
(define connection (pk (connection-open "/tmp/wt" "create")))
(define session (pk (session-open connection)))
;; create a table
(session-create session "table:nodes" "key_format=i,value_format=S")
;; open a cursor over that table
(define cursor (pk (cursor-open session "table:nodes")))
;; start a transaction and add a record
(session-transaction-begin session "isolation=\"snapshot\"")
(cursor-key-set cursor 42)
(cursor-value-set cursor "The one true number!")
(cursor-insert cursor)
(session-transaction-commit session)
(cursor-reset cursor)
(cursor-next cursor)
(pk (cursor-key-ref cursor))
(pk (cursor-value-ref cursor))
(cursor-close cursor)
(session-close session)
(connection-close connection)
Getting started
wiredtiger does not work on 32-bit architectures.
It was tested with wiredtiger 2.9.1.
On guix you can simply install wiredtiger with:
guix package -i wiredtiger
Otherwise I recommend installing it from git with the usual cli dance:
git clone https://github.com/wiredtiger/wiredtiger.git
cd wiredtiger
./autogen.sh && ./configure && make && make install
And then clone guile-wiredtiger:
git clone git@git.framasoft.org:a-guile-mind/guile-wiredtiger.git
You also need version 2.2 of GNU Guile.
How to contribute
Send me a mail.
ChangeLog
- 0.6.1 (2017/05)
  - Fix licence headers
  - Move tests to their own modules
  - grf3 now uses feature space
  - Move all the documentation into README.md
  - Create a website
- 0.6 (2017/05)
  - ...
- 0.5
  - wiredtiger
    - support raw column
    - support custom collator (unstable)
  - wiredtigerz
    - fix cursor-range (upstream bug)
    - add with-context form which sets the *context* fluid. The immediate consequence is that you don't need to spawn sessions yourself and pass them around.
    - add (with-cursor cursor proc) procedure which unwraps the current context and retrieves the cursor named CURSOR. This is useful when you use with-context, to avoid manually retrieving the fluid etc.
- 0.3
  - add graphitisay which provides graph primitives on top of wiredtiger
  - uav: API change: (uav-update! uid assoc)
  - uav: API change: uav-ref* doesn't include the uid in the returned assoc
  - uav: add a simple database server
  - wiredtigerz: cursor-insert* returns the row record number when applicable
Tutorial
Reference API
(wiredtiger wiredtiger)
wiredtiger.scm contains the low level bindings of WiredTiger. It tries to follow as closely as possible the semantics offered by WiredTiger, without flourish. As a matter of fact, it's recommended to read the WiredTiger manual, starting with “Writing WiredTiger applications” and “Schema, Columns, Column Groups, Indices and Projections”.
WiredTiger has three primary concepts, each represented as a Scheme record:
- <connection> represents a single connection to a WiredTiger database. ACID doesn't work across instances of <connection>.
- <session> has a <connection> as parent. It's not thread-safe, but it can be passed from thread to thread as long as a single thread accesses it at a time.
- <cursor> has a <session> as parent.
<connection>
(connection-open home config) → <connection>
Open a connection to a database. Most applications will open a single connection to a database. A connection can be shared among several threads. There is no support for ACID transactions between several connections.
Example:
(connection-open "/tmp/magic-numbers" "create,cache_size=1G")
HOME is the path to the database home directory; it must exist. CONFIG is a configuration string as described in the WiredTiger documentation.
(connection-close connection [config])
Close connection. Any open sessions will be closed. CONFIG is an optional argument that can be leak_memory to not free memory during close.
<session>
(session-open connection [config]) → <session>
Open a session.
All data operations are performed in the context of a session. This encapsulates the thread and transactional context of the operation.
Thread safety: A session is not usually shared between threads, see Multithreading for more information.
Example:
(session-open connection "isolation=snapshot")
CONFIG configures the isolation level:
- read-uncommitted: transactions can see changes made by other transactions before those transactions are committed. Dirty reads, non-repeatable reads and phantoms are possible.
- read-committed: transactions cannot see changes made by other transactions before those transactions are committed. Dirty reads are not possible. Committed changes from concurrent transactions become visible when no cursor is positioned in the read-committed transaction.
- snapshot: transactions read the versions of records committed before the transaction started. Dirty reads and non-repeatable reads are not possible; phantoms are possible.
If you don't know what you are doing, use snapshot.
(session-close session)
Close the session handle. This will release the resources associated with the session handle, including rolling back any active transactions and closing any cursors that remain open in the session.
(session-create session name config)
Create a table, column group, index or file.
Example:
(session-create session "table:magic-numbers" "key_format=i,value_format=S")
NAME is the URI of the object to create, such as table:magic-numbers.
CONFIG configures the object, see the WiredTiger documentation for more information.
(session-transaction-begin session [config])
Start a transaction.
(session-transaction-commit session [config])
Commit the current transaction. A transaction must be in progress when this method is called.
(session-transaction-rollback session [config])
Rollback the current transaction. A transaction must be in progress when this method is called. All cursors are reset.
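As a rough sketch of how the three transaction procedures fit together, the following commits one insert and rolls back another. It assumes the bindings are loaded from the (wiredtiger wiredtiger) module; the /tmp/wt-transactions home directory and the table:numbers table are made up for the illustration, and an empty config string is passed to cursor-open.

```scheme
(use-modules (wiredtiger wiredtiger)) ; assumed module name

;; The home directory must exist before opening the connection.
(define home "/tmp/wt-transactions")  ; hypothetical path
(unless (file-exists? home) (mkdir home))

(define connection (connection-open home "create"))
(define session (session-open connection))
(session-create session "table:numbers" "key_format=i,value_format=S")
(define cursor (cursor-open session "table:numbers" ""))

;; First transaction: committed, so the record is kept.
(session-transaction-begin session "isolation=snapshot")
(cursor-key-set cursor 1)
(cursor-value-set cursor "one")
(cursor-insert cursor)
(session-transaction-commit session)

;; Second transaction: rolled back, so this insert is discarded
;; and every cursor of the session is reset.
(session-transaction-begin session "isolation=snapshot")
(cursor-key-set cursor 2)
(cursor-value-set cursor "two")
(cursor-insert cursor)
(session-transaction-rollback session)

;; Read back the committed record; values come back as a list.
(cursor-key-set cursor 1)
(cursor-search cursor)
(define committed (cursor-value-ref cursor))
(pk committed)

(cursor-close cursor)
(session-close session)
(connection-close connection)
```

After the rollback only the record with key 1 should remain in the table. How a lookup of the discarded key 2 reports the missing record is not covered here, so the sketch does not try it.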
<cursor>
(cursor-open session uri config) → <cursor>
Open a new cursor on a data source. Cursor handles should be discarded by calling cursor-close.
Cursors capable of supporting transactional operations operate in the context of the current transaction, if any. session-transaction-rollback implicitly resets all cursors.
Cursors are relatively lightweight objects but may hold references to heavier weight objects; applications should re-use cursors when possible, but instantiating new cursors is not so expensive that applications need to cache cursors at all costs.
URI is the data source on which the cursor operates; cursors are usually opened on tables, however, cursors can be opened on any data source, regardless of whether it is ultimately stored in a table. Some cursor types may have limited functionality (for example, they may be read-only or not support transactional updates).
(cursor-key-set cursor . key)
(cursor-value-set cursor . value)
Set the key (or value) for the next operation. If an error occurs during this operation, a flag will be set in the cursor, and the next operation to access the key (or value) will fail. This simplifies error handling in applications.
KEY (or VALUE) must be coherent with the key (or value) format of the current object.
(cursor-key-ref cursor) → list
(cursor-value-ref cursor) → list
Get the key (or value) for the current row.
(cursor-next cursor)
(cursor-previous cursor)
Move the cursor to the next (resp. previous) row.
(cursor-reset cursor)
Reset the position of the cursor. Any resources held by the cursor are released, and the cursor's key and position are no longer valid. A subsequent iteration with cursor-next will move to the first record, or with cursor-previous will move to the last record.
(cursor-search cursor)
On success, move the cursor to the row matching the key. The key must first be set.
To minimize cursor resources, cursor-reset should be called as soon as the record has been retrieved and the cursor no longer needs that position.
(cursor-search-near cursor) → -1, 0, 1
Return the row matching the key if it exists, or an adjacent row. An adjacent row is either the smallest record larger than the key or the largest record smaller than the key (in other words, a logically adjacent key). The key must first be set.
On success, the cursor ends positioned at the returned record; to minimize cursor resources, cursor-reset should be called as soon as the record has been retrieved and the cursor no longer needs that position.
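A minimal sketch of a cursor-search-near lookup follows. It assumes the returned number follows WiredTiger's convention (-1 when the cursor lands on a smaller key, 0 on an exact match, 1 on a larger key) and that the bindings live in the (wiredtiger wiredtiger) module; the home directory and table name are made up for the illustration.

```scheme
(use-modules (wiredtiger wiredtiger)) ; assumed module name

(define home "/tmp/wt-search-near")   ; hypothetical path, must exist
(unless (file-exists? home) (mkdir home))

(define connection (connection-open home "create"))
(define session (session-open connection))
(session-create session "table:numbers" "key_format=i,value_format=S")
(define cursor (cursor-open session "table:numbers" ""))

;; Populate the table with three ordered keys.
(for-each (lambda (key value)
            (cursor-key-set cursor key)
            (cursor-value-set cursor value)
            (cursor-insert cursor))
          '(10 20 30)
          '("ten" "twenty" "thirty"))

;; 25 is not in the table, so the cursor lands on a neighbour: 20 or 30.
(cursor-key-set cursor 25)
(define comparison (cursor-search-near cursor))

;; Assumed meaning: a negative result means the cursor sits on a smaller
;; key, so step forward once to reach the first key >= 25.
(when (< comparison 0)
  (cursor-next cursor))

;; Keys come back as a list, one element per key column.
(define first-at-or-after (cursor-key-ref cursor))
(pk comparison first-at-or-after)

;; Release the position as soon as it is no longer needed.
(cursor-reset cursor)
(cursor-close cursor)
(session-close session)
(connection-close connection)
```

Stepping forward after a negative result is the usual way to turn the approximate match into a "first key at or after" lookup, which is the basis of prefix scans.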
(cursor-insert cursor)
Insert a record and optionally update an existing record.
If the cursor was configured with overwrite=true (the default), both the key and value must be set; if the record already exists, the key's value will be updated, otherwise, the record will be inserted.
If the cursor was configured with overwrite=false, both the key and value must be set and the record must not already exist; the record will be inserted.
If a cursor with record number keys was configured with append=true (not the default), the value must be set; a new record will be appended and the record number set as the cursor key value.
The cursor ends with no position, and a subsequent call to cursor-next (cursor-previous) will iterate from the beginning (end) of the table.
Inserting a new record after the current maximum record in a fixed-length bit field column-store (that is, a store with an r type key and t type value) may implicitly create the missing records as records with a value of 0.
When loading a large amount of data into a new object, using a cursor with the bulk configuration string enabled and loading the data in sorted order will be much faster than doing out-of-order inserts. See Bulk-load for more information.
The maximum length of a single column stored in a table is not fixed (as it partially depends on the underlying file configuration), but is always a small number of bytes less than 4GB.
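The difference between the two overwrite settings can be sketched as follows. The config string is passed as the third argument of cursor-open. How the binding reports a rejected insert is not stated here, so the sketch only assumes that a Guile exception, if any is raised, can be caught with catch; the module name, path and table name are also assumptions.

```scheme
(use-modules (wiredtiger wiredtiger)) ; assumed module name

(define home "/tmp/wt-overwrite")     ; hypothetical path, must exist
(unless (file-exists? home) (mkdir home))

(define connection (connection-open home "create"))
(define session (session-open connection))
(session-create session "table:numbers" "key_format=i,value_format=S")

;; Default cursor (overwrite=true): inserting an existing key updates it.
(define cursor (cursor-open session "table:numbers" ""))
(cursor-key-set cursor 1)
(cursor-value-set cursor "one")
(cursor-insert cursor)
(cursor-key-set cursor 1)
(cursor-value-set cursor "uno")
(cursor-insert cursor)

;; Strict cursor (overwrite=false): key 1 already exists, so this insert
;; must not go through. Any exception is turned into a plain value.
(define strict (cursor-open session "table:numbers" "overwrite=false"))
(define outcome
  (catch #t
    (lambda ()
      (cursor-key-set strict 1)
      (cursor-value-set strict "eins")
      (cursor-insert strict)
      'no-exception)
    (lambda (key . args)
      (cons 'rejected key))))
(pk outcome)

;; The value written through the default cursor is still in place.
(cursor-key-set cursor 1)
(cursor-search cursor)
(define stored (cursor-value-ref cursor))
(pk stored)

(session-close session)
(connection-close connection)
```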
(cursor-update cursor)
Update an existing record and optionally insert a new one.
If the cursor was configured with overwrite=true (the default), both the key and value must be set; if the record already exists, the key's value will be updated, otherwise, the record will be inserted.
If the cursor was configured with overwrite=false, both the key and value must be set and the record must already exist; the record will be updated.
On success, the cursor ends positioned at the modified record; to minimize cursor resources, the cursor-reset method should be called as soon as the cursor no longer needs that position.
The maximum length of a single column stored in a table is not fixed (as it partially depends on the underlying file configuration), but is always a small number of bytes less than 4GB.
(cursor-remove cursor)
Remove a record. The key must be set.
If the cursor was configured with overwrite=true (the default), the key must be set; the key's record will be removed if it exists, no error will be returned if the record does not exist.
If the cursor was configured with overwrite=false, the key must be set and the key's record must exist; the record will be removed.
Removing a record in a fixed-length bit field column-store (that is, a store with an r type key and t type value) is identical to setting the record's value to 0.
On success, the cursor ends positioned at the removed record; to minimize cursor resources, the cursor-reset method should be called as soon as the cursor no longer needs that position.
(wiredtiger extra)
wiredtigerz gathers extra procedures to work with WiredTiger; it aims to make the main workflow more obvious.
It implements a declarative API to create tables with their indices and open cursors on them. It provides a few helpers for common patterns.
Declarative API
Several procedures in wiredtigerz take a declarative specification of tables called CONFIG. The syntax of this configuration list is the following:
(table-name
 (key assoc as (column-name . column-type))
 (value assoc as (column-name . column-type))
 ((list indices as (indexed-name (indexed keys) (projections column name)))))
column-type is a verbose name for column types:
- record
- string
- unsigned-integer
- integer
- raw
An example of a configuration that defines a posts table with uid, title, body, published-at fields and one index on published-at with a projection on the uid column looks like the following:
(define posts '(posts
                ((uid . raw))
                ((title . string) (body . string) (published-at . unsigned-integer))
                ((published-at (published-at) (uid)))))
(session-create* session . configs)
You can create a table with indices using session-create*:
(define connection (connection-open "/tmp/wiredtigerz" "create"))
(define session (session-open connection))
(session-create* session posts)
(session-close session)
(cursor-open* session . configs)
cursor-open* will open all the cursors related to the given CONFIGS and return them as an assoc:
(define connection (connection-open "/tmp/wt" "create"))
(define session (session-open connection))
(define cursors (cursor-open* session posts))
cursors is an assoc that maps the table name and its indices, as symbols, to their cursor. An extra “append” cursor will be created if the table has a single key record column. Index and append cursors are prefixed with the table name, which means that the above cursors will contain the following keys:
posts
posts-append
posts-published-at
Mind the fact that keys are symbols. Also, the posts-published-at cursor has uid as its value since it has a projection.
(wiredtiger-open path . configs)
Open a database at PATH, create tables using CONFIGS and return a pair:
- (connection . session)
- cursors assoc as returned by cursor-open*
This procedure is useful when you don't plan to use threads.
(wiredtiger-close database)
Shortcut procedure to close a database where DATABASE is a pair of connection and session.
(env-open* path configs dbconfig)
In the case where you want to use the database with multiple threads, there are a few helpers, starting with env-open*.
This introduces another record type called <env>. It will create a database at PATH using DBCONFIG as the connection configuration string. CONFIGS is the tables specification, dubbed declarative API, described above.
(with-context env body ...)
This form will create a database context (see below) and set it as current database context using a fluid so that you don't have to pass around the context everywhere.
This form should be called once per thread. Doing otherwise will lead to unspecified behavior.
(call-with-cursor cursor proc)
Retrieve the current context and the associated cursor named CURSOR and call PROC with it.
Context
Context is made of a <session> and a cursors assoc. This is useful in multithread settings if you don't need to open multiple cursors for the same table.
(context-open connection . configs)
cursor-open* sister procedure that will open a session and a cursors assoc and return a context.
(context-session context)
Return the session associated with CONTEXT.
(context-ref context name)
Return the cursor NAME from CONTEXT.
Transactions
Use (context-begin context), (context-commit context) and (context-rollback context) to work with transactions.
Cursors
(cursor-value-ref* cursor . key)
Retrieve the value associated with KEY in CURSOR.
(cursor-insert* cursor key value)
Insert VALUE at KEY using CURSOR. If the cursor key_format is a single record column, KEY can be '().
(cursor-update* cursor key value)
Update KEY with VALUE using CURSOR.
(cursor-remove* cursor . key)
Remove KEY using CURSOR.
(cursor-search* cursor . key)
Search KEY using CURSOR.
(cursor-search-near* cursor . key-prefix)
Prepare CURSOR for forward search using KEY-PREFIX.
(cursor-range cursor key-prefix)
Return the list of key/value pairs that match KEY-PREFIX using CURSOR.
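The starred cursor helpers above can be combined roughly as follows. This is a sketch under several assumptions: the procedures are exported by (wiredtiger extra), an empty index list is accepted in a table specification, and KEY and VALUE are given to cursor-insert* and cursor-update* as lists with one element per column. The numbers table and the /tmp/wt-extra path are made up for the illustration.

```scheme
(use-modules (wiredtiger wiredtiger)  ; assumed module names
             (wiredtiger extra))

;; Hypothetical table: one integer key column, one string value column,
;; and no index.
(define numbers '(numbers
                  ((number . integer))
                  ((description . string))
                  ()))

(define home "/tmp/wt-extra")         ; hypothetical path, must exist
(unless (file-exists? home) (mkdir home))

(define connection (connection-open home "create"))
(define session (session-open connection))
(session-create* session numbers)

;; The cursors assoc is keyed by symbols; the table cursor uses the table name.
(define cursors (cursor-open* session numbers))
(define cursor (assoc-ref cursors 'numbers))

;; Insert, read, update, read again, then remove the record.
(cursor-insert* cursor '(42) '("The one true number!"))
(define before (cursor-value-ref* cursor 42))
(cursor-update* cursor '(42) '("forty-two"))
(define after (cursor-value-ref* cursor 42))
(pk before after)
(cursor-remove* cursor 42)

(session-close session)
(connection-close connection)
```

Note the asymmetry in the signatures: cursor-value-ref* and cursor-remove* take the key columns as rest arguments, while cursor-insert* and cursor-update* take the key as a single argument.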
(wiredtiger feature-space)
feature-space is perhaps the most schemey database built using wiredtiger. It's not persistent, but it exposes a very simple API with a low barrier to entry and a high level interface for querying the data using minikanren (actually microkanren).
*feature-space*
The feature-space tables declaration.
(fs:debug)
Print all the content of the feature-space.
(fs:ref uid key)
Retrieve the value associated with UID and KEY.
(fs:ref* uid)
Retrieve the assoc at UID.
(fs:add! assoc)
Add ASSOC to the database and return its uid.
(fs:remove! uid)
Remove from the database the assoc associated with UID.
(fs:update! uid assoc)
Update the assoc found at UID with ASSOC.
(fs:find key value)
Find uids which have both KEY and VALUE in their assoc.
(fs:query ... fs:where ...)
This is the high level interface for querying the feature-space. Here is an example use:
(fs:query uid? fs:where ((uid? 'age 32) (uid? 'firstname "amirouche")))
Intermediate variables must have two ? as suffix. This is useful in the context where you want to do some kind of join.
(wiredtiger grf3)
grf3 is a graph database interface with a navigational stream interface to query the graph, similar to Gremlin from Tinkerpop. It's my preferred way to interact with data.
(get uid)
Retrieve the object at UID.
(create-vertex assoc)
Create a vertex with ASSOC and return a <vertex> record.
(vertex-set vertex key value)
Create a new vertex record based on VERTEX where KEY is set to VALUE.
(vertex-ref vertex key)
Retrieve the value associated with KEY in VERTEX.
(create-edge start end assoc)
Create an edge starting at vertex START and ending at vertex END using ASSOC.
(edge-start edge)
(edge-end edge)
(edge-set edge key value)
(edge-ref edge key)
(save vertex-or-edge)
Save VERTEX-OR-EDGE to database.
traversi framework
This is similar to the stream module except it's faster and you can backtrack.
list->traversi
traversi->list
traversi-car
traversi-cdr
traversi-map
traversi-for-each
traversi-filter
traversi-backtrack
traversi-take
traversi-drop
traversi-paginator
traversi-length
traversi-scatter
traversi-unique
traversi-group-count
The traversi framework can be used outside the grf3 library if you want.
There are a few helpers that make it easier to work with both grf3 and traversi.
vertices
edges
from
where?
key
key?
incomings
outgoings
start
end
(get-or-create-vertex key value)
Get or create a vertex where KEY is set to VALUE.