Go implementation of Bitcask - A Log-Structured Hash Table for Fast Key/Value Data, as defined in this paper and with help from this repo.
A learning venture into database development. Special thanks go to the amazing Ben Johnson for pointing me in the right direction and being as helpful as he was.
- Low latency per item read or written
- High throughput, especially when writing an incoming stream of random items
- Ability to handle datasets much larger than RAM w/o degradation
- Crash friendliness, both in terms of fast recovery and not losing data
- Ease of backup and restore
- A relatively simple, understandable (and thus supportable) code structure and data format
- Predictable behavior under heavy access load or large volume
- Data files are rotated based on the user-defined data file size (2GB default)
- A license that allowed for easy use
- CRC check for detecting data corruption (see the sketch after this list)
- GoCask does not implement any buffer cache in-memory. Instead, it depends on the filesystem’s cache. Adjusting the caching characteristics of your filesystem can impact performance.
- GoCask stores all keys in memory, which means your system needs enough RAM to hold your entire keyspace
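For context, the sketch below shows the general idea behind the CRC check: each Bitcask-style entry carries a CRC32 checksum over its header and payload, which is recomputed on read to detect corruption. This is an illustration of the technique, not necessarily GoCask's exact on-disk layout.

```go
// Illustration only: a generic Bitcask-style entry layout with a CRC32
// checksum. GoCask's actual on-disk format may differ; see the source
// for the real encoding.
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// encodeEntry packs key/value into: [crc32][tstamp][ksz][vsz][key][value].
func encodeEntry(tstamp uint32, key, val []byte) []byte {
	buf := make([]byte, 16+len(key)+len(val))
	binary.LittleEndian.PutUint32(buf[4:], tstamp)
	binary.LittleEndian.PutUint32(buf[8:], uint32(len(key)))
	binary.LittleEndian.PutUint32(buf[12:], uint32(len(val)))
	copy(buf[16:], key)
	copy(buf[16+len(key):], val)

	// The CRC covers everything after the checksum field itself.
	binary.LittleEndian.PutUint32(buf[0:], crc32.ChecksumIEEE(buf[4:]))
	return buf
}

// verifyEntry recomputes the checksum on read and reports corruption.
func verifyEntry(entry []byte) bool {
	stored := binary.LittleEndian.Uint32(entry[0:4])
	return stored == crc32.ChecksumIEEE(entry[4:])
}

func main() {
	e := encodeEntry(1700000000, []byte("somekey"), []byte("someval"))
	fmt.Println("valid:", verifyEntry(e)) // valid: true

	e[20] ^= 0xff // flip a byte in the record to simulate corruption
	fmt.Println("valid:", verifyEntry(e)) // valid: false
}
```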
There are two ways to use gocask:
GoCask can be used similarly to bolt or badger as an embedded db.
go get github.com/aneshas/gocask/cmd/gocask
and use the API. See the docs.
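A minimal sketch of what embedded usage might look like. The root import path and the Open/Put/Get/Close names below are assumptions for illustration; consult the package docs for the actual API.

```go
// Sketch of embedded usage; names and import path are assumptions.
package main

import (
	"fmt"
	"log"

	"github.com/aneshas/gocask" // assumed root package path
)

func main() {
	// Open (or create) a database.
	db, err := gocask.Open("mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Store and read back a key/value pair.
	if err := db.Put([]byte("somekey"), []byte("someval")); err != nil {
		log.Fatal(err)
	}

	val, err := db.Get([]byte("somekey"))
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(string(val)) // someval
}
```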
If you have Go installed:
go install github.com/aneshas/gocask/cmd/gocask@latest
go install github.com/aneshas/gocask/cmd/gccli@latest
Then run gocask, which will run the db engine itself, open the default db, and start a gRPC (Twirp) server on localhost:8888. (Run gocask -help to see the config options and their defaults.)
While the server is running, you can interact with it via the gccli binary:
- gccli keys - lists stored keys
- gccli put somekey someval - stores the key/value pair
- gccli get somekey - retrieves the value stored under the key
- gccli del somekey - deletes the value stored under the key
gccli is just meant as a simple probing tool. If you want to generate your own client, you can use the included .proto definition (or use the pre-generated Go client).
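As a rough sketch of the pre-generated client route: Twirp clients are constructed from a server URL and a standard *http.Client. All generated identifiers below (the package, service, and message names) are hypothetical placeholders; use the names produced from the included .proto definition or shipped with the pre-generated client.

```go
// Sketch of calling the server via a Twirp-generated Go client.
// gocaskpb, NewGoCaskProtobufClient, PutRequest, and GetRequest are
// hypothetical names used only for illustration.
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"

	"github.com/aneshas/gocask/gocaskpb" // hypothetical generated package
)

func main() {
	// Point the client at the running gocask server.
	client := gocaskpb.NewGoCaskProtobufClient("http://localhost:8888", &http.Client{})

	// Store a key/value pair.
	_, err := client.Put(context.Background(), &gocaskpb.PutRequest{
		Key:   "somekey",
		Value: []byte("someval"),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Read it back.
	resp, err := client.Get(context.Background(), &gocaskpb.GetRequest{Key: "somekey"})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(string(resp.Value))
}
```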
If you don't have Go installed, you can download the latest release from the releases page and go through the same process as above.
The primary motivation for this repo was learning more about how db engines work, so although it can already be used, it's far from production ready. That being said, I do plan to maintain and extend it in the future.
Some things that are on my mind:
- Support for multiple processes and write locking
- Current key deletion is a soft delete (implement merging and hint files)
- Key durability/expiry
- Fold over keys
- Double down on tests (fuzz?)
- Add benchmarks
- Make it distributed
- An eventstore spin-off (use gocask instead of sqlite)