
Posts

Partitioning | Key Partitioning, Hash Partitioning and Request Routing

Concept: In distributed systems, data is commonly replicated to other nodes to provide resiliency and fault tolerance. But storing all of the data on a single node and merely keeping copies of it on the others can increase query time and hurt query throughput. When working with a large amount of data, the data needs to be distributed, or partitioned, across the nodes to increase query throughput. This makes it easy to leverage the compute power of all the nodes involved. A node that holds all the data, and so takes a disproportionate share of the load, is often called a "hot spot". When the data is queried, the query is served by one of the nodes, depending on the query type and the type of replication, i.e. leader-follower or leader-leader replication. If a node holds all the data, query throughput suffers because the node has to go through all of that data unnecessarily. That's where partitioning comes to the rescue. Key-Range
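To make the idea of spreading keys across nodes concrete, here is a minimal sketch of the two schemes named in the post title, key-range and hash partitioning. The boundary values, the function names and the choice of MD5 are all assumptions made for illustration; they do not come from the post.

```python
import bisect
import hashlib

# Hypothetical split points for key-range partitioning (an assumption).
# They define four partitions: [.. g), [g .. n), [n .. t), [t ..]
RANGE_BOUNDS = ["g", "n", "t"]


def range_partition(key: str) -> int:
    """Return the index of the key-range partition that owns `key`."""
    # bisect_right counts how many split points are <= key,
    # which is exactly the index of the owning range.
    return bisect.bisect_right(RANGE_BOUNDS, key)


def hash_partition(key: str, num_partitions: int) -> int:
    """Return a partition index derived from a stable hash of `key`."""
    # A stable digest is used instead of Python's built-in hash(),
    # which is randomized per process for strings.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    for k in ["apple", "mango", "zebra"]:
        print(k, range_partition(k), hash_partition(k, 4))
```

Key-range partitioning keeps neighbouring keys together, which helps range scans but can concentrate load on one range. Hashing tends to spread keys more evenly, at the cost of losing key order.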

Configuring systems in Multiple VLANs | concepts, use-case, conclusion

Preface: When it came to networking, I always thought it was boring and never gave a thought to how cool it can be! As humans, we all tend to have biases, towards clothes, brands, things, opinions, judgments, emotions, the language to code in :P, etc., which contribute to stubborn prejudices in our minds, which in turn create conjectures. In my opinion, conjectures aren't healthy (subjective), as you tend to live in a hypothetical world without giving a damn about facts and proofs. The only way out is to cross-examine every decision, opinion, and judgment you make. Well, this isn't a psychology blog, so let's leave it here. :) This is my experience of getting over my prejudice against networking. While working on a network feature, I came across this amazing use case and thought of writing it down once I was done with it. Now I think it's time. Grab your chair and popcorn, here it goes! Con

SSTables and Its Wonders!

Recap: In the last blog, we discussed the log-structured storage engine, where a storage segment is a sequence of key-value pairs. These pairs appear in the order in which they were written, and values later in the log take precedence over values for the same key earlier in the log. A step ahead... what if we change the storage format and store the data sorted by key? We call this format a Sorted String Table, or SSTable for short. But wait! What about the sequential writes that made Bitcask fast? You might have this doubt. Let's keep that doubt as it is for now. Benefits of the approach: This would, undoubtedly, improve the performance of compaction (the process of discarding duplicate keys and keeping only the most recent value) and merging of the segments. Compaction becomes much like the merge step in the famous mergesort. While merging the segments, if we come across the same keys, then we cho
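The mergesort-style merge described in the excerpt can be sketched as follows. This is only an illustration under stated assumptions: each segment is modelled as an in-memory list of (key, value) tuples already sorted by key, each key appears at most once per segment, and when both segments contain the same key the value from the newer segment is kept, following the "later values take precedence" rule from the recap. The function name `merge_segments` is hypothetical.

```python
def merge_segments(older, newer):
    """Merge two key-sorted lists of (key, value) pairs.

    Assumes each list has unique keys. On a duplicate key, the pair
    from `newer` wins, since later writes take precedence.
    """
    merged = []
    i = j = 0
    while i < len(older) and j < len(newer):
        old_key, new_key = older[i][0], newer[j][0]
        if old_key < new_key:
            merged.append(older[i])
            i += 1
        elif old_key > new_key:
            merged.append(newer[j])
            j += 1
        else:
            # Same key in both segments: keep the newer value, drop the older.
            merged.append(newer[j])
            i += 1
            j += 1
    # At most one of these tails is non-empty.
    merged.extend(older[i:])
    merged.extend(newer[j:])
    return merged


if __name__ == "__main__":
    older = [("a", 1), ("c", 3)]
    newer = [("b", 20), ("c", 30)]
    print(merge_segments(older, newer))
```

Because both inputs are sorted, the merge reads each segment once from start to end, which is why the sorted layout suits sequential I/O on real segment files.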

Unpacking Storage Engines!

When it comes to storage, the first thing that comes to mind is a database. A database should store the data when given and read it back when asked. But wait, as developers, why should we care? We know the databases, SQL and the popular NoSQL ones, and there is a vast number of different databases out there. Each of them is optimized for a different type of workload and is based on a different storage engine. FYI, a storage engine is the software module that a database management system uses to create, read, update and delete data in a database. In order to tune a storage engine, you have to understand the type of workload you are going to serve and what the storage engine is doing under the hood. There are different kinds of storage engines, e.g. log-structured storage engines and page-oriented storage engines. Stuff to care for when it comes to databases: The database often deals with concurrent reads/