Bitcask is a storage engine for key/value data built on a log-structured hash table. It keeps keys and metadata in memory, with the values on disk. Retrieving a value is fast because it never requires more than a single disk seek.
The key benefits are:
- Low latency per item read or written
- Consistent performance
- Handles datasets larger than RAM
- Small design specification
The main drawback is:
- All your keys must fit in RAM
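That in-memory table (the Bitcask paper calls it a keydir) maps each key to just enough metadata to find the value on disk in one seek. Here's a rough sketch of an entry, using the same field names that show up in the bitcask-lite code later in this post:

```go
// A sketch of the per-key metadata kept in memory.
// The value itself stays on disk; these fields are
// enough to find it with a single seek.
type Item struct {
	file      string // log file that holds the value
	valuePos  int    // byte offset of the value in that file
	valueSize int    // length of the value in bytes
	expire    int    // expiry as a Unix millisecond timestamp
}
```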
Over the weekend, I implemented part of Bitcask's design specification in a project called bitcask-lite — a key/value database and server using the Go standard library.
I needed to serve some largish values for a side project, and instead of building an MVP with something like SQLite, I yak shaved a database.
In bitcask-lite, keys and metadata live in a concurrent map based on orcaman/concurrent-map: a map of map shards. Go doesn't allow a map to be read and written concurrently, so access has to be locked. Locking each shard independently avoids funneling all of bitcask-lite's requests through a single lock, which would limit concurrency to one request at a time.
```go
type ConcurrentMap[V any] []*MapShard[V]

type MapShard[V any] struct {
	items map[string]V
	mu    *sync.Mutex
}
```
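Shard selection is just a hash of the key modulo the shard count. Here's a minimal sketch using the standard library's hash/fnv (orcaman/concurrent-map hand-rolls a similar FNV hash), plus an AccessShard helper shaped to match how the hot path below uses it:

```go
import "hash/fnv"

// GetShard picks the shard responsible for a key.
func (m ConcurrentMap[V]) GetShard(key string) *MapShard[V] {
	h := fnv.New32()
	h.Write([]byte(key))
	return m[uint(h.Sum32())%uint(len(m))]
}

// AccessShard locks a key's shard and returns it; the caller
// unlocks when done reading or writing (see StreamGet below).
func (m ConcurrentMap[V]) AccessShard(key string) *MapShard[V] {
	shard := m.GetShard(key)
	shard.mu.Lock()
	return shard
}

// Unlock releases the shard's lock.
func (s *MapShard[V]) Unlock() { s.mu.Unlock() }
```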
The database is a directory with one or more log files. Items are written to the active log file with the schema `expire,keySize,valueSize,key,value,` (I'm a big fan of human-readable data).
An item with a key of `a` and a value of `b` that expires on 1 Oct 2025 looks like this:
```
1759300313415,1,1,a,b,
```
Log files are append-only and don't need to be locked for getting values. When setting values, though, there's a lock on the active log file to ensure the database stays correct relative to the order of incoming requests. Unfortunately, this means write-heavy workloads will perform worse than read-heavy ones.
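Here's a sketch of that write path. The mutex, the `active` file handle, and the keydir bookkeeping are my assumptions about the shape of the code, not bitcask-lite's exact implementation:

```go
// Set appends an item to the active log file and records where
// the value landed so future gets can seek straight to it.
func (logStore *LogStore) Set(key string, expire int, value []byte) error {
	logStore.mu.Lock() // one writer at a time on the active log file
	defer logStore.mu.Unlock()

	// The record starts wherever the file currently ends
	offset, err := logStore.active.Seek(0, io.SeekEnd)
	if err != nil {
		return err
	}

	// Schema: expire,keySize,valueSize,key,value,
	header := fmt.Sprintf("%d,%d,%d,%s,", expire, len(key), len(value), key)
	if _, err := logStore.active.WriteString(header + string(value) + ","); err != nil {
		return err
	}

	// Point the in-memory entry at the value's position on disk
	access := logStore.keys.AccessShard(key)
	logStore.keys.Set(key, Item{
		file:      logStore.active.Name(),
		valuePos:  int(offset) + len(header),
		valueSize: len(value),
		expire:    expire,
	})
	access.Unlock()
	return nil
}
```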
I really like Go's API for reading/writing to files. For me, it's sensible, consistent, and obvious. It can be verbose (especially error handling) but I'm in the that's-a-feature camp. Sometimes it's better to be clear.
The following snippet details the hot path for handling `/get` requests. I've added some extra comments and trimmed the error handling.
```go
// StreamGet gets a value from a log store
func (logStore *LogStore) StreamGet(key string, w io.Writer) (bool, error) {
	// Lock the map shard
	access := logStore.keys.AccessShard(key)
	defer access.Unlock()

	item, found := logStore.keys.Get(key)
	if !found {
		// Key not found
		return false, nil
	} else if int(time.Now().UnixMilli()) >= item.expire {
		// Key found but it's expired..
		// so we can clean it up (aka lazy garbage collection!)
		logStore.keys.Delete(key)
		return false, nil
	}

	f, err := os.Open(item.file)
	// ..
	defer f.Close()

	// Set the offset for the upcoming read
	_, err = f.Seek(int64(item.valuePos), 0)
	// ..

	// Pipe it to the HTTP response
	_, err = io.CopyN(w, f, int64(item.valueSize))
	// ..

	return true, nil
}
```
The trickiest part of this project was parsing the log files, largely due to off-by-one errors. The algorithm I used is fairly naive: it makes too many system calls, but I wanted to ship rather than optimize early. My gut says that reading into a buffer and parsing from there would be faster, but the real performance win would be parsing log files in parallel, so if startup time ever bothers me I'll fix that first!
```go
// Parsing a bitcask-lite log file
// Loop:
// - expire = ReadBytes(COMMA)
//   - EOF error? return
// - keySize = ReadBytes(COMMA)
// - valueSize = ReadBytes(COMMA)
// - key = Read(keySize + 1)
// - value = Discard(valueSize + 1)
```
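Fleshed out into runnable Go, that loop could look something like this (`addItem` is a hypothetical callback that fills in the keydir):

```go
// parseLog scans one log file, rebuilding the in-memory keydir.
// A sketch of the algorithm above, assuming the record format
// expire,keySize,valueSize,key,value, with trailing commas.
func parseLog(path string, addItem func(key string, expire, valuePos, valueSize int)) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	r := bufio.NewReader(f)
	offset := 0 // bytes consumed so far, used to compute valuePos
	for {
		expireBytes, err := r.ReadBytes(',')
		if err == io.EOF {
			return nil // clean end of file
		} else if err != nil {
			return err
		}
		keySizeBytes, err := r.ReadBytes(',')
		if err != nil {
			return err
		}
		valueSizeBytes, err := r.ReadBytes(',')
		if err != nil {
			return err
		}

		// Atoi errors skipped here for brevity
		expire, _ := strconv.Atoi(string(expireBytes[:len(expireBytes)-1]))
		keySize, _ := strconv.Atoi(string(keySizeBytes[:len(keySizeBytes)-1]))
		valueSize, _ := strconv.Atoi(string(valueSizeBytes[:len(valueSizeBytes)-1]))

		key := make([]byte, keySize+1) // key plus its trailing comma
		if _, err := io.ReadFull(r, key); err != nil {
			return err
		}

		headerLen := len(expireBytes) + len(keySizeBytes) + len(valueSizeBytes) + keySize + 1
		addItem(string(key[:keySize]), expire, offset+headerLen, valueSize)

		// Skip the value and its trailing comma; we only need its position
		if _, err := r.Discard(valueSize + 1); err != nil {
			return err
		}
		offset += headerLen + valueSize + 1
	}
}
```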
HTTP API
I started this project with the API sketched out on a single sticky note.
```
/get?key=a
/delete?key=c
/set?key=b&expire=1759300313415

- HTTP body is read as the value
- expire is optional (default is infinite)
```
After finishing it, I saw there were more lines of test code than other code — which says a lot about software complexity in general. I can describe this API in a single breath but it took me a good few hours to cover every edge case with tests and fixtures.
The server code lives in `func main()` and uses plain old net/http.
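A sketch of that wiring; `OpenLogStore`, the handler bodies, and the port are my assumptions for illustration:

```go
func main() {
	logStore, err := OpenLogStore("data") // hypothetical constructor
	if err != nil {
		log.Fatal(err)
	}

	http.HandleFunc("/get", func(w http.ResponseWriter, r *http.Request) {
		key := r.URL.Query().Get("key")
		found, err := logStore.StreamGet(key, w) // streams the value into the response
		if err != nil {
			w.WriteHeader(500)
		} else if !found {
			w.WriteHeader(404) // matches the end-to-end test below
		}
	})
	// /set reads the request body as the value; /delete is shown later

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```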
Since `func main()` can't be called from Go tests, I went with a test pattern I've used in side projects before: an end-to-end test script written in Python that spawns a server process, hits it, and asserts things.
```python
# dev command to start server
proc = subprocess.Popen(
    ["go", "run", "."],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)

# test getting a missing key
g = requests.get(f"{addr}/get?key=z")
print(g.status_code, g.text)
assert g.status_code == 404
# ^ failing asserts exit with a non-zero exit code
# which fails any continuous integration (CI) process
# or other test runner
```
The Missing Parts
Bitcask's design specification also describes how the database can be cleaned up over time. Currently, bitcask-lite grows and grows. Expired keys live on disk forever.
Bitcask can merge several log files into a more compact form, producing hintfiles for faster startup times and deleting expired keys in the process. Merging comes with a slight performance hit, so to cope with a high volume of writes without degrading daytime performance, you'd restrict merging to a window of off-peak hours when traffic is low.
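bitcask-lite doesn't implement any of this, but a merge pass might be shaped something like the following (`Keys()` and the other helpers here are hypothetical):

```go
// merge re-appends only live items into a compact file and repoints
// the keydir at it. A sketch only; hintfiles omitted.
func merge(logStore *LogStore, compactPath string) error {
	now := int(time.Now().UnixMilli())
	out, err := os.Create(compactPath)
	if err != nil {
		return err
	}
	defer out.Close()

	offset := 0
	for _, key := range logStore.keys.Keys() { // hypothetical helper
		access := logStore.keys.AccessShard(key)
		item, found := logStore.keys.Get(key)
		if !found || now >= item.expire {
			access.Unlock()
			continue // expired (and deleted) keys don't survive the merge
		}

		// Copy the live value out of its old log file
		value := make([]byte, item.valueSize)
		old, err := os.Open(item.file)
		if err != nil {
			access.Unlock()
			return err
		}
		_, err = old.ReadAt(value, int64(item.valuePos))
		old.Close()
		if err != nil {
			access.Unlock()
			return err
		}

		// Re-append it in the usual record format and repoint the keydir
		record := fmt.Sprintf("%d,%d,%d,%s,%s,", item.expire, len(key), item.valueSize, key, value)
		if _, err := out.WriteString(record); err != nil {
			access.Unlock()
			return err
		}
		item.file = compactPath
		item.valuePos = offset + len(record) - item.valueSize - 1
		logStore.keys.Set(key, item)
		offset += len(record)
		access.Unlock()
	}
	return nil // old log files can be deleted once nothing points at them
}
```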
I also skipped adding CRCs (cyclic redundancy checks), and instead of writing tombstone values for my delete operation I just pretend that a key has been set to zero bytes with an expire of 1970.
```go
// Set takes key, expire, value
err := logStore.Set(key, 0, []byte(""))
if err != nil {
	log.Printf("couldn't delete %s: %s", key, err)
	w.WriteHeader(500)
	return
}
```
I'm happy with the shortcuts I took. And so far, my toy database has been humming along just fine.
Sure, using SQLite with a single table would have gotten me up and going much quicker. But I mostly took this database detour for fun. I do wonder about the performance comparison between bitcask-lite and SQLite+server. I was about to set up some benchmarks when I couldn't figure out if you can stream individual SQLite values to a client. If you know, let me know!