
Unpacking Storage Engines!

When it comes to storage, the first notion that comes to mind is a database. A database should store data when given and read it back when asked. But wait, as a developer,

Why should we care?

We know SQL databases and the popular NoSQL ones, and there is a vast number of different databases available out there. Each of them is optimized for a different type of workload and is built on a different storage engine. FYI, a storage engine is the software module a database management system uses to create, read, update, and delete data in a database. In order to tune a storage engine, you have to understand the kind of workload you are going to serve and what the storage engine is doing under the hood.
There are different kinds of storage engines, e.g. log-structured storage engines and page-oriented storage engines.

Stuff to care for when it comes to databases

A database often deals with concurrent reads and writes, so it ought to have a solid concurrency-control mechanism. Depending on the storage engine underneath, it should also perform defragmentation without hurting performance. In short, a DBMS should obey the ACID properties.

Simple Storage Engine
     
Let's consider a storage engine where every write appends data to the end of a file. Even a modification writes the new value at the end of the file. These are sequential writes, which perform better than random writes. For every read, we return the latest occurrence of the data, i.e. { key: value }. But how do we find the data on disk efficiently? We need some sort of indexing, which helps us locate the data on disk.
An index, in general terms, is some metadata kept aside that can be leveraged later for efficient retrieval of the on-disk data. But this means maintaining secondary data alongside the primary data, and it incurs the overhead of keeping the two consistent. This is a classic trade-off in storage systems.

Indexing: Hash Indexes

Hash indexes store key-value pairs and are generally implemented using a hash map. Continuing our example: if we have a storage engine based on an append-only (logging) mechanism, we can maintain an in-memory hash map where each key maps to the file offset of its value on disk. Whenever a read request comes in, we look the key up in the in-memory hash map, perform a disk seek in O(1) using the offset associated with the key, and return the value. This drastically improves read performance, at the cost of maintaining an in-memory hash map entry for every key in the database. Not a bad approach at all!
For every write, we append the data at the end of the on-disk file and update the offset in the in-memory hash map. But wait, what about disk space?
To tackle the disk-space problem, we break the on-disk file into segments of a certain size. All data is appended to the current segment (the active segment); when the active segment reaches a size threshold, we create a new segment, which becomes the active one, and the previous segment is marked as an old segment. Old segments are kept immutable.
To solve the disk-space problem, we run a compaction and merge algorithm: the old segments are iterated over and compacted, i.e. duplicate keys are replaced with the latest occurrence of each key, and the results are merged into a single segment. This process runs periodically. While it is in progress, reads are still served from the old segments; once compaction and merging complete, the old segments are deleted and the new segment takes their place, which requires updating the file offsets in the in-memory hash map. Appending and merging are sequential write operations, which are generally much faster than random writes.

Riak: Bitcask

Well, this is exactly how Bitcask works. Bitcask is the default storage engine used by the Riak key-value store. It serves well when the workload involves lots of key-value updates. Bitcask provides highly performant reads and writes thanks to, of course, the in-memory hash map, which also incurs the overhead of keeping all the keys in memory.

Problems with Bitcask:
  1. Bitcask requires all keys to be in memory.
  2. Range queries are not efficient (keys are not sorted) and take much time.
Bitcask is cool and simple. Let's explore an even cooler technique, SSTables, in the next blog!
