diff --git a/README.md b/README.md index 287b3bc9..e54ef57a 100644 --- a/README.md +++ b/README.md @@ -55,15 +55,29 @@ provided implementations can be considered stable. ###Idiomatic API API inspired by [BoltDB](https://github.com/boltdb/bolt) with automatic -commit/rollback of transactions. The goal of lmdb-go is to provide idiomatic, -safe database interactions without compromising the flexibility of the C API. +commit/rollback of transactions. The goal of lmdb-go is to provide idiomatic +database interactions without compromising the flexibility of the C API. + +**NOTE:** While the lmdb package tries hard to make LMDB as easy to use as +possible there are compromises, gotchas, and caveats that application +developers must be aware of when relying on LMDB to store their data. All +users are encouraged to fully read the +[documentation](https://godoc.org/github.com/bmatsuo/lmdb-go/lmdb) so they are +aware of these caveats. + +Where the lmdb package and its implementation decisions do not meet the needs +of application developers in terms of safety or operational use the lmdbsync +package has been designed to wrap lmdb and safely fill in additional +functionality. Consult the +[documentation](https://godoc.org/github.com/bmatsuo/lmdb-go/exp/lmdbsync) for +more information about the lmdbsync package. ###API coverage The lmdb-go project aims for complete coverage of the LMDB C API (within reason). Some notable features and optimizations that are supported: -- Idiomatic subtransactions ("sub-updates") that do not disrupt thread locking. +- Idiomatic subtransactions ("sub-updates") that allow the batching of updates. - Batch IO on databases utilizing the `MDB_DUPSORT` and `MDB_DUPFIXED` flags. @@ -119,6 +133,11 @@ questions of why to use one database or the other. - As a pure Go package bolt can be easily cross-compiled using the `go` toolchain and `GOOS`/`GOARCH` variables. 
+- Its simpler design and implementation in pure Go mean it is free of many + caveats and gotchas which are present using the lmdb package. For more + information about caveats with the lmdb package, consult its + [documentation](https://godoc.org/github.com/bmatsuo/lmdb-go/lmdb). + ###Advantages of LMDB - Keys can contain multiple values using the DupSort flag. diff --git a/lmdb/env.go b/lmdb/env.go index 50357bec..d3b4105e 100644 --- a/lmdb/env.go +++ b/lmdb/env.go @@ -378,18 +378,22 @@ func (env *Env) SetMaxDBs(size int) error { // calling either its Abort or Commit methods to ensure that its resources are // released. // +// BeginTxn does not call runtime.LockOSThread. Unless the Readonly flag is +// passed goroutines must call runtime.LockOSThread before calling BeginTxn and +// the returned Txn must not have its methods called from another goroutine. +// Failure to meet these restrictions can have undefined results that may +// include deadlocking your application. +// +// Instead of calling BeginTxn users should prefer calling the View and Update +// methods, which assist in management of Txn objects and provide OS thread +// locking required for write transactions. +// // A finalizer detects unreachable, live transactions and logs them to // standard error. The transactions are aborted, but their presence should be // interpreted as an application error which should be patched so transactions // are terminated explicitly. Unterminated transactions can adversely affect // database performance and cause the database to grow until the map is full. // -// BeginTxn does not attempt to serialize write transaction operations to an OS -// thread and without care its use for write transactions can have undefined -// results. -// -// Instead of BeginTxn users should call the View, Update, RunTxn methods. -// // See mdb_txn_begin. 
func (env *Env) BeginTxn(parent *Txn, flags uint) (*Txn, error) { txn, err := beginTxn(env, parent, flags) @@ -404,9 +408,10 @@ func (env *Env) BeginTxn(parent *Txn, flags uint) (*Txn, error) { // Because RunTxn terminates the transaction goroutines should not retain // references to it or its data after fn returns. // -// RunTxn does not lock the thread of the calling goroutine. Unless the -// Readonly flag is passed the calling goroutine should ensure it is locked to -// its thread. +// RunTxn does not call runtime.LockOSThread. Unless the Readonly flag is +// passed the calling goroutine should ensure it is locked to its thread and +// any goroutines started by fn must not call methods on the Txn object it is +// passed. // // See mdb_txn_begin. func (env *Env) RunTxn(flags uint, fn TxnOp) error { @@ -417,6 +422,10 @@ func (env *Env) RunTxn(flags uint, fn TxnOp) error { // environment and passes it to fn. View terminates its transaction after fn // returns. Any error encountered by View is returned. // +// Unlike with Update transactions, goroutines created by fn are free to call +// methods on the Txn passed to fn provided they are synchronized in their +// accesses (e.g. using a mutex or channel). +// // Any call to Commit, Abort, Reset or Renew on a Txn created by View will // panic. func (env *Env) View(fn TxnOp) error { @@ -427,9 +436,22 @@ func (env *Env) View(fn TxnOp) error { // if fn returns a nil error otherwise Update aborts the transaction and // returns the error. // -// Update locks the calling goroutine to its thread and unlocks it after fn -// returns. The Txn must not be used from multiple goroutines, even with -// synchronization. +// Update calls runtime.LockOSThread to lock the calling goroutine to its +// thread until fn returns and the transaction has been terminated, at +// which point runtime.UnlockOSThread is called. 
 If the calling goroutine is +// already known to be locked to a thread, use UpdateLocked instead to avoid +// premature unlocking of the goroutine. +// +// Neither Update nor UpdateLocked can be called safely from a goroutine +// where it isn't known if runtime.LockOSThread has been called. In such +// situations writes must either be done in a newly created goroutine which can +// be safely locked, or through a worker goroutine that accepts updates to +// apply and delivers transaction results using channels. See the package +// documentation and examples for more details. +// +// Goroutines created by the operation fn must not use methods on the Txn +// object that fn is passed. Doing so would have undefined and unpredictable +// results for your program (likely including data loss, deadlock, etc). // // Any call to Commit, Abort, Reset or Renew on a Txn created by Update will // panic. @@ -441,6 +463,17 @@ func (env *Env) Update(fn TxnOp) error { // its thread. UpdateLocked should be used if the calling goroutine is already // locked to its thread for another purpose. // +// Neither Update nor UpdateLocked can be called safely from a goroutine +// where it isn't known if runtime.LockOSThread has been called. In such +// situations writes must either be done in a newly created goroutine which can +// be safely locked, or through a worker goroutine that accepts updates to +// apply and delivers transaction results using channels. See the package +// documentation and examples for more details. +// +// Goroutines created by the operation fn must not use methods on the Txn +// object that fn is passed. Doing so would have undefined and unpredictable +// results for your program (likely including data loss, deadlock, etc). +// // Any call to Commit, Abort, Reset or Renew on a Txn created by UpdateLocked // will panic. 
func (env *Env) UpdateLocked(fn TxnOp) error { diff --git a/lmdb/example_test.go b/lmdb/example_test.go index e0c1aea7..e79d28ee 100644 --- a/lmdb/example_test.go +++ b/lmdb/example_test.go @@ -6,6 +6,7 @@ import ( "fmt" "log" "os" + "runtime" "time" "github.com/bmatsuo/lmdb-go/lmdb" @@ -25,8 +26,10 @@ var err error var stop chan struct{} // These values can be used as no-op placeholders in examples. -func doUpdate(txn *lmdb.Txn) error { return nil } -func doView(txn *lmdb.Txn) error { return nil } +func doUpdate(txn *lmdb.Txn) error { return nil } +func doUpdate1(txn *lmdb.Txn) error { return nil } +func doUpdate2(txn *lmdb.Txn) error { return nil } +func doView(txn *lmdb.Txn) error { return nil } // This example demonstrates a complete workflow for an lmdb environment. The // Env is first created. After being configured the Env is mapped to memory. @@ -80,6 +83,73 @@ func Example() { } } +// This example demonstrates the simplest (and most naive) way to issue +// database updates from a goroutine for which it cannot be known ahead of time +// whether runtime.LockOSThread has been called. +func Example_threads() { + // Create a function that wraps env.Update and sends the resulting error + // over a channel. Because env.Update is called our update function will + // call runtime.LockOSThread to safely issue the update operation. + update := func(res chan<- error, op lmdb.TxnOp) { + res <- env.Update(op) + } + + // ... + + // Now, in a goroutine where we cannot determine if we are locked to a + // thread, we can create a new goroutine to process the update(s) we want. + res := make(chan error) + go update(res, func(txn *lmdb.Txn) (err error) { + return txn.Put(dbi, []byte("thisUpdate"), []byte("isSafe"), 0) + }) + err = <-res + if err != nil { + panic(err) + } +} + +// This example demonstrates a more sophisticated way to issue database updates +// from a goroutine for which it cannot be known ahead of time whether +// runtime.LockOSThread has been called. 
+func Example_worker() { + // Wrap operations in a struct that can be passed over a channel to a + // worker goroutine. + type lmdbop struct { + op lmdb.TxnOp + res chan<- error + } + worker := make(chan *lmdbop) + update := func(op lmdb.TxnOp) error { + res := make(chan error) + worker <- &lmdbop{op, res} + return <-res + } + + // Start issuing update operations in a goroutine in which we know + // runtime.LockOSThread can be called and we can safely issue transactions. + go func() { + runtime.LockOSThread() + defer runtime.UnlockOSThread() + + // Perform each operation as we get to it. Because this goroutine is + // already locked to a thread, env.UpdateLocked is called to avoid + // premature unlocking of the goroutine from its thread. + for op := range worker { + op.res <- env.UpdateLocked(op.op) + } + }() + + // ... + + // In another goroutine, where we cannot determine if we are locked to a + // thread already. + err = update(func(txn *lmdb.Txn) (err error) { + // This function will execute safely in the worker goroutine, which is + // locked to its thread. + return txn.Put(dbi, []byte("thisUpdate"), []byte("isSafe"), 0) + }) +} + // This example demonstrates how an application typically uses Env.SetMapSize. // The call to Env.SetMapSize() is made before calling env.Open(). Any calls // after calling Env.Open() must take special care to synchronize with other @@ -159,6 +229,69 @@ func ExampleEnv_Copy() { }(time.Tick(time.Hour)) } +// This example shows the basic use of Env.Update, the primary method lmdb-go +// provides to store data in an Env. +func ExampleEnv_Update() { + // It is not safe to call runtime.LockOSThread here because Env.Update + // would later cause premature unlocking of the goroutine. If an + // application requires that goroutines be locked to threads before + // starting an update on the Env then you must use Env.UpdateLocked + // instead of Env.Update. 
+ + err = env.Update(func(txn *lmdb.Txn) (err error) { + // Write several keys to the database within one transaction. If + // either write fails and this function returns an error then readers + // in other transactions will not see either value because Env.Update + // aborts the transaction if an error is returned. + + err = txn.Put(dbi, []byte("x"), []byte("hello"), 0) + if err != nil { + return err + } + err = txn.Put(dbi, []byte("y"), []byte("goodbye"), 0) + if err != nil { + return err + } + return nil + }) + if err != nil { + panic(err) + } +} + +// In this example, another C library requires the application to lock a +// goroutine to its thread. When writing to the Env this goroutine must use +// the method Env.UpdateLocked to prevent premature unlocking of the goroutine. +// +// Note that there is no way for a goroutine to determine if it is locked to a +// thread, so failure to call Env.UpdateLocked in a scenario like this can lead +// to unspecified and hard to debug failure modes for your application. +func ExampleEnv_UpdateLocked() { + runtime.LockOSThread() + defer runtime.UnlockOSThread() + + // ... Do something that requires the goroutine be locked to its thread. + + // Create a transaction that will not interfere with thread locking and + // issue some writes with it. + err = env.UpdateLocked(func(txn *lmdb.Txn) (err error) { + err = txn.Put(dbi, []byte("x"), []byte("hello"), 0) + if err != nil { + return err + } + err = txn.Put(dbi, []byte("y"), []byte("goodbye"), 0) + if err != nil { + return err + } + return nil + }) + if err != nil { + panic(err) + } + + // ... Do something requiring the goroutine still be locked to its thread. +} + // This example shows the general workflow of LMDB. An environment is created // and configured before being opened. 
After the environment is opened its // databases are created and their handles are saved for use in future diff --git a/lmdb/lmdb.go b/lmdb/lmdb.go index 0c7e649b..214780f3 100644 --- a/lmdb/lmdb.go +++ b/lmdb/lmdb.go @@ -4,7 +4,10 @@ fairly low level and are designed to provide a minimal interface that prevents misuse to a reasonable extent. When in doubt refer to the C documentation as a reference. - http://symas.com/mdb/doc/group__mdb.html + http://www.lmdb.tech/doc/ + http://www.lmdb.tech/doc/starting.html + http://www.lmdb.tech/doc/modules.html + Environment @@ -17,6 +20,12 @@ of creation. On filesystems that support sparse files this should not adversely affect disk usage. Resizing an environment is possible but must be handled with care when concurrent access is involved. +Note that the package lmdb forces all Env objects to be opened with the NoTLS +(MDB_NOTLS) flag. Without this flag LMDB would not be practically usable in Go +(in the author's opinion). However, even for environments opened with this +flag there are caveats regarding how transactions are used (see Caveats below). + + Databases A database in an LMDB environment is an ordered key-value store that holds @@ -34,6 +43,7 @@ closed but it is not required. Typically, applications acquire handles for all their databases immediately after opening an environment and retain them for the lifetime of the process. + Transactions View (readonly) transactions in LMDB operate on a snapshot of the database at @@ -50,6 +60,26 @@ transactions do not require explicit calling of Abort/Commit and are provided through the Env methods Update, View, and RunTxn. The BeginTxn method on Env creates an unmanaged transaction but its use is not advised in most applications. + + +Caveats + +Write transactions (those created without the Readonly flag) must be created in +a goroutine that has been locked to its thread by calling the function +runtime.LockOSThread. 
 Furthermore, all methods on such transactions must be +called from the goroutine which created them. This is a fundamental limitation +of LMDB even when using the NoTLS flag (which the package always uses). The +Env.Update method assists the programmer by calling runtime.LockOSThread +automatically but it cannot sufficiently abstract write transactions to make +them completely safe in Go. + +A goroutine must never create a write transaction if the application programmer +cannot determine whether the goroutine is locked to an OS thread. This is a +consequence of goroutine restrictions on write transactions and limitations in +the runtime's thread locking implementation. In such situations updates +desired by the goroutine in question must be proxied by a goroutine with a +known state (i.e. "locked" or "unlocked"). See the included examples for more +details about dealing with such situations. */ package lmdb diff --git a/lmdb/txn.go b/lmdb/txn.go index 8347b118..bd7c948b 100644 --- a/lmdb/txn.go +++ b/lmdb/txn.go @@ -244,8 +244,10 @@ func (txn *Txn) Drop(dbi DBI, del bool) error { // nil error is returned by fn and otherwise aborts it. Sub returns any error // it encounters. // -// Sub may only be called on an Update (a Txn created without the Readonly -// flag). Calling Sub on a View transaction will return an error. +// Sub may only be called on an Update Txn (one created without the Readonly +// flag). Calling Sub on a View transaction will return an error. Sub assumes +// the calling goroutine is locked to an OS thread and will not call +// runtime.LockOSThread. // // Any call to Abort, Commit, Renew, or Reset on a Txn created by Sub will // panic.