Skip to main content

Posts

System Design #1: Designing Live Commenting!

All of us surely have come across bunch of systems that supports live commenting. For example, facebook live commenting, twitch/youtube live stream commenting, reddit live stream commenting etc. Lets deep dive in the system that support the live commenting feature. Requirements: User should be able to see active real time comments on the post/video/stream across the globe. System should be highly available, fault tolerant.  Due to CAP theorem, we will need to trade the consistency. Consider our system to be eventually consistent. If the comment is made, its okay for us if it takes few seconds to appear everywhere else. Goal: to build a system to sync live comments across the demographies & data centers to build a system that supports the real time pushing of comments to the web/mobile clients. Estimation: Consider 100M Daily Active Users (DAU), 400M daily posts/videos/streams on the system and daily 10B comments being made on different streams/videos/posts.  To support suc...
Recent posts

Behind the "Multiplexing of user threads over kernel threads" | Goroutines & Green Threads

Introduction I have been working on Golang for quite a time now. I explored a lot of features. The few that caught up my eye was 'Scalability' & 'Concurrency'. Scalability & Concurrency have been some of the major objectives behind the design of Golang. Let's dive in a bit. Threads  A thread is the unit of execution within a process. A process can have anywhere from just one thread to many threads. On a machine, we have multiple processes running and in these processes, we have independent or dependent threads aggregating computations.  Contextually, these threads are further broken down into two types, namely  User-Level Threads and Kernel Level Threads . The basic difference between these threads is that the kernel-level threads are managed, operated, and scheduled by the operating system(kernel), and user-level threads are managed, operated, and scheduled by the application layer.  Just to have more understanding about them, let's list dow...

Decoding Json Web Tokens (JWTs) | Purpose, Solution and application

Well, I have used a bunch of user authentication and authorization web applications in my tenure on the Internet. And its the time, while working on one of the related projects, I was introduced to this amazing term "JWT". And this is how my journey of exploration began! What is JSON Web Token(JWT)? As defined on the official website, on an abstract level,  JWT  is a standard that defines a compact and self-contained way for securely transmitting information between parties as a JSON object. 😐 If you got it, skip the blog! If you are still reading, buckle up the belts, time to dissect it further! Let's understand why do we need it in the first place! HTTP as per the design is a stateless protocol. This means the server while serving a request does not know anything about the previous request. Thus, for applications which majorly relies on user authentication and authorization suffers a big problem. Pre-context | Authorization vs Authentication...

A case study on Dynamo | Highly available key-value store by Amazon

How e-commerce giants like Amazon , eBay scales? was the question I had for a long time. And Meanwhile, I stumbled upon this amazing paper  by Amazon. The paper highlights the design and implementation of Dynamo, a highly available key-value store that some of Amazon's core services use to provide a seamless user experience.  When it comes to building applications at large scale,  Reliability and scalability  are the biggest challenges these services face in their day-to-day business. Amazon operates on an infrastructure of tens of thousands of servers and networking components located across many data centers worldwide. Designing such applications starts right from the understanding of the requirements and objectives of the business.  Requirements: Dynamo is mainly for applications that need an ' always writable ' datastore where no updates are rejected due to failures or concurrent writes. Dynamo is built for infrastructure within a single a...

DI : The Buzzword

On the last Monday morning, when I was reading " this blog " I came across this term 'injector' for enough time to give it a thought and dig into it. and That's how I got started with it. DI: stands for Dependency Injection (ugh! what a fancy and confusing name!) It is a very famous code pattern to make the codebase more cohesive and loosely coupled. Often, while coding, we write some classes which internally initialize the objects of other classes. And thus the earlier class become dependent on the object creation of the later class. But, thoughtfully speaking, a class should be cohesive and should do nothing more than it's a purpose. For example, we have a class Employee and a class Address . Where an object of class Address is aggregated inside Employee class. Thus, Employee class, while providing a blueprint of an Employee object, now manages the creation of Address object too. This adds up the dependency and makes the class, less cohesive. ...

The stuff you should know about InnoDB | MySQL storage engine

It's been quite a while after the first blog about Storage Engines . But after that blog, the thing that hit me was how the databases like the great MySQL and the legend PostgreSQL works(subjective). While exploring MySQL I came across the famous, and default storage engine of MySQL , i.e. InnoDB . Whenever you create a table without mentioning 'ENGINE' attribute in a query, you are telling MySQL to go and use InnoDB to create the table. Well, there are many amazing/awesome/mind-forking storage engines that can be used instead of InnoDB . But, as InnoDB is the default, we should not hesitate to explore it. What is InnoDB?               InnoDB is the general-purpose storage engine that balances high reliability and high performance. Reliability is the fault tolerance quotient of the system. In MySQL 8.0 , InnoDB is the default MySQL storage engine, unless you configure it with other storage engines. What the hell InnoDB has...

Partitioning | Key Partitioning, Hash Partitioning and Request Routing

Concept       In distributed systems, it is often normal that data is replicated on other nodes to provide resiliency and high fault tolerance.  But at the same time, storing all of the data on a single node and keeping its copies across others might increase the query time and might affect query throughput. Well, While working with a large amount of data, in order to increase the query throughput, data needs to be distributed or partitioned across the nodes. This makes it easy to leverage the compute power of all the nodes involved.          A node having all the data is often known to be a "hot-spot". If the data is being queried, it will get queried from one of the nodes, depending upon the query type and type of replication i.e. leader-follower replication or leader-leader replication. If a node has all the data, then query throughput will suffer as it will have to refer all of the data unnecessarily. And that's where partition...