The world’s simplest database
With their sophisticated query engines, transaction management, and indexing algorithms, databases are often regarded as complex systems. But what if I told you that writing a database from scratch is surprisingly simple? In this post, we’ll use nothing but simple bash functions to reduce a database to its bare essentials: storing, retrieving, and modifying data.
Whether you’re looking for a fun weekend project or simply want to understand the fundamental ideas behind databases, this lightweight approach will give you a new perspective. Let’s get started!
Imagine the world’s simplest database. We can build it with just two bash functions:
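#!/bin/bash

# db_set KEY VALUE: append "KEY,VALUE" as a new line at the end of the file "database".
db_set () {
    echo "$1,$2" >> database
}

# db_get KEY: print the most recently written value for KEY.
db_get () {
    grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}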
These two functions use a plain text file to implement a basic key-value store.
db_set adds key-value pairs to the store. It takes two arguments, $1 and $2, and appends them, separated by a comma, as a new line in a file called database. $1 serves as the key and $2 as the value. You can use nearly anything as the key and value; for instance, a JSON document could be used as the value. The main restrictions are that neither may contain a newline, because each record occupies exactly one line, and the key should not contain a comma, because the comma is the only thing separating the key from the value.
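The word “nearly” matters here. The short experiment below redefines the same two functions in a scratch directory (so an existing database file is left alone) and shows that a comma inside the value is harmless, while a comma inside the key corrupts lookups for other keys. The keys 42, a,b and a are arbitrary examples chosen for illustration.

```shell
#!/bin/bash
# Work in a throwaway directory so no existing "database" file is touched.
cd "$(mktemp -d)" || exit 1

db_set () { echo "$1,$2" >> database; }
db_get () { grep "^$1," database | sed -e "s/^$1,//" | tail -n 1; }

# A comma inside the value is fine: sed only strips the leading "key," prefix.
db_set 42 'red,green,blue'
db_get 42        # prints red,green,blue

# A comma inside the key is not: the stored line "a,b,1" also starts with "a,",
# so a lookup for the unrelated key "a" wrongly returns "b,1".
db_set 'a,b' 1
db_get a         # prints b,1
```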
db_get retrieves the most recent value for a given key from the store. It takes a single argument, $1, which is the key to search for. grep locates every line in the file that begins with the specified key followed by a comma (^$1,). The sed command then strips that key and comma, leaving only the value. If there is more than one entry for the same key, tail -n 1 guarantees that the last one, which is the most recently written, is returned (simulating an update mechanism).
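To make the three stages of db_get concrete, the sketch below recreates the three-record file from this post in a scratch directory and runs the pipeline for key 101 one stage at a time, using the same grep, sed and tail commands as db_get.

```shell
#!/bin/bash
# Recreate the sample file in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
    '100,{"title": "Designing Data-Intensive Applications", "author": "Martin Kleppmann"}' \
    '101,{"title": "The Pragmatic Programmer", "author": "Andrew Hunt"}' \
    '101,{"title": "The Pragmatic Programmer", "author": "David Thomas & Andrew Hunt"}' \
    > database

echo "1) grep: every line that starts with '101,'"
grep "^101," database

echo "2) grep | sed: the same lines with the '101,' prefix removed"
grep "^101," database | sed -e "s/^101,//"

echo "3) grep | sed | tail -n 1: only the last, most recent value"
grep "^101," database | sed -e "s/^101,//" | tail -n 1
```

Stages 1 and 2 each print two lines, one per stored version of key 101; only stage 3 narrows the result to the newer record with "David Thomas & Andrew Hunt" as the author.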
$ db_set 100 '{"title": "Designing Data-Intensive Applications", "author": "Martin Kleppmann"}'
$ db_set 101 '{"title": "The Pragmatic Programmer", "author": "Andrew Hunt"}'
$ db_get 101
{"title": "The Pragmatic Programmer", "author": "Andrew Hunt"}
The logic behind all this is very simple: each line of the file stores one key-value pair, separated by a comma (many storage details are ignored here). Each db_set call appends the pair to the end of the file, so when a key is updated several times, older versions are not overwritten; they all remain in the file. This is why db_get uses tail -n 1: the last matching line holds the current value.
$ db_set 101 '{"title": "The Pragmatic Programmer", "author": "David Thomas & Andrew Hunt"}'
$ db_get 101
{"title": "The Pragmatic Programmer", "author": "David Thomas & Andrew Hunt"}
$ cat database
100,{"title": "Designing Data-Intensive Applications", "author": "Martin Kleppmann"}
101,{"title": "The Pragmatic Programmer", "author": "Andrew Hunt"}
101,{"title": "The Pragmatic Programmer", "author": "David Thomas & Andrew Hunt"}
In this article, we looked at how to build a simple database from just two bash functions. Because db_set only appends to the end of a file and never modifies existing records, writing data is very fast. Many real-world databases rely on the same idea, using an append-only log file to store new records quickly.
Reading data is where things get complicated. To find the most recent value for a key, db_get scans the entire file from top to bottom, and it cannot stop at the first match, because a newer version of the key may appear further down. The cost of a lookup therefore grows linearly with the number of records (O(n)): the more records the file holds, the longer it takes to find any one of them.
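One way to observe this linear cost is a rough timing experiment like the sketch below. It bulk-loads files of growing size in the same key,value format that db_set writes (the value-N values are made up for the test) and times the same lookup each time. Timings depend entirely on your machine, so no numbers are shown here.

```shell
#!/bin/bash
# Rough, machine-dependent experiment: the same lookup on files of growing size.
cd "$(mktemp -d)" || exit 1

db_get () { grep "^$1," database | sed -e "s/^$1,//" | tail -n 1; }

for n in 10000 100000 1000000; do
    # Bulk-load n records as "key,value" lines. Calling db_set n times from a
    # loop would work too, but would itself take a long time.
    awk -v n="$n" 'BEGIN { for (i = 1; i <= n; i++) print i ",value-" i }' > database
    echo "records: $n"
    # Key 1 sits on the very first line, yet db_get still reads every line,
    # because a newer version of key 1 could appear further down.
    time db_get 1 > /dev/null
done
```

Even though the key being looked up never changes position, the reported time should grow roughly in step with the number of records.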
To speed up retrieval, databases use a structure called an index. An index works like a signpost or shortcut: it points directly at the data, so a lookup does not need to search the whole file. Although indexes can speed up searches significantly, they have a drawback: the index must be updated every time new data is written, and this extra step slows down writes.
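One simple kind of index is an in-memory hash map from each key to the byte offset of its most recent record in the file. The sketch below is a hypothetical extension of our two functions, not part of the original design: the associative array index is an illustrative name, it requires bash 4 or later, and it lives only in the current shell’s memory, so it is lost when the shell exits.

```shell
#!/bin/bash
# Hypothetical in-memory hash index: key -> byte offset of the key's newest record.
declare -A index=()

db_set () {
    local offset=0
    # The new record will start where the file currently ends.
    if [ -f database ]; then
        offset=$(wc -c < database)
    fi
    echo "$1,$2" >> database
    # The extra work on every write: point the index at the newest record.
    # $((...)) also drops any padding spaces some wc implementations print.
    index["$1"]=$((offset))
}

db_get () {
    local offset=${index["$1"]-} line
    # Key never written: print nothing and return a failure status.
    [ -n "$offset" ] || return 1
    # Jump to the record's offset (tail -c +N starts output at byte N,
    # counting from 1) and read only that one line.
    line=$(tail -c +"$((offset + 1))" database | head -n 1)
    # Strip the literal "key," prefix, leaving only the value.
    printf '%s\n' "${line#"$1",}"
}
```

The trade-off is visible in the code: db_set now does extra work on every write (measuring the file and updating index), while db_get reads a single record instead of matching every line. A real implementation would also need to rebuild the index when it starts, for example by scanning the file once.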
This is one of the main trade-offs in database design: indexes speed up reads but slow down writes. For this reason, databases do not automatically index everything. Instead, developers choose what to index based on the queries their applications run most often. The goal is to strike the right balance between fast reads and efficient writes.
By building this simple database, we gained insight into the basic ideas underlying real-world databases. Despite its extreme simplicity, our version illustrates problems that database systems have to solve at much larger scales.
Thanks for reading! I hope this post gave you a fun and insightful look into how databases work under the hood.
Some of the content in this post is based on Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann.