
Behind the "Multiplexing of user threads over kernel threads" | Goroutines & Green Threads

Introduction

I have been working with Golang for quite some time now and have explored a lot of its features. The two that caught my eye were 'Scalability' and 'Concurrency'; both have been major objectives behind the design of Golang. Let's dive in a bit.

Threads 

A thread is the unit of execution within a process. A process can have anywhere from just one thread to many. On a machine, we have multiple processes running, and within these processes, we have independent or dependent threads carrying out computations.

Contextually, these threads are broken down into two types, namely user-level threads and kernel-level threads. The basic difference between them is that kernel-level threads are managed, operated, and scheduled by the operating system (the kernel), while user-level threads are managed, operated, and scheduled by the application layer.

To understand them better, let's list their advantages and disadvantages!

Advantages of Kernel Threads
  • The kernel knows the whereabouts of kernel threads, so it can schedule them optimally; for example, the scheduler can give priority to a process with a large number of threads over a process with comparatively fewer threads.
  • Kernel threads are secure and are managed by native OS.
Disadvantages of Kernel Threads
  • Kernel threads are inefficient to context-switch: a switch involves changing a large set of processor registers that define the current memory map and permissions, and it also evicts some or all of the processor cache.
  • Kernel threads are slow to create. They are spawned by the kernel via system calls, which are expensive to execute; thus, kernel threads are slow to start and stop.
Advantages of User Threads
  • Well, well, first things first: you can implement your own user threads even when the native OS does not support any concurrency.
  • User threads are comparatively very cheap to spawn and consume less memory than kernel threads. Creating a new thread, switching between threads, and synchronizing them are all done via procedure calls, with no kernel involvement. Thus, user threads are faster than kernel threads.
Disadvantages of User Threads
  • User-level threads lose out on scheduling: the kernel scheduler is far more optimized than most custom, application-level schedulers.
Looking at the pros and cons, what if we could leverage the speed of user-level threads and the scheduling capability of kernel-level threads? To get the best of both worlds, we multiplex user-level threads (lightweight and easy to create, but unknown to the kernel and hence poorly scheduled) over kernel-level threads (good at concurrency and scheduling, but inefficient to create, maintain, and context-switch).
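To make the payoff concrete, here is a minimal Go sketch (the count of 100,000 is arbitrary, chosen purely for illustration) that spawns far more goroutines than any machine could sustain as kernel threads; the runtime quietly multiplexes them over a handful of OS threads:

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        const n = 100000 // arbitrary; far more than an OS could run as kernel threads
        var wg sync.WaitGroup
        wg.Add(n)
        for i := 0; i < n; i++ {
            go func() {
                // Each goroutine does trivial work; the Go runtime
                // multiplexes all of them over a few kernel threads.
                wg.Done()
            }()
        }
        wg.Wait()
        fmt.Println("all", n, "goroutines finished")
    }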

Under the Hood of Goroutine Concurrency:

  • P: number of processors
  • M: number of OS-level threads (machine threads)
  • G: number of goroutines (user-level, green threads)
There are M threads running on P processors, and G goroutines are multiplexed over those M kernel-level threads.

Thus, G goroutines need to be scheduled onto M OS threads, which are in turn scheduled over P processors. In Golang, the GOMAXPROCS environment variable sets the number of processors (cores in the system) that will contribute to executing these threads, so P <= GOMAXPROCS. Note that setting GOMAXPROCS to 1 means no parallelism.
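A minimal sketch of inspecting these values at runtime, using the standard library's runtime.NumCPU and runtime.GOMAXPROCS (the printed values naturally depend on your machine):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // NumCPU reports the number of logical CPUs; calling
        // GOMAXPROCS with 0 queries the current setting without changing it.
        fmt.Println("logical CPUs:", runtime.NumCPU())
        fmt.Println("GOMAXPROCS:  ", runtime.GOMAXPROCS(0))

        // Setting GOMAXPROCS to 1 forces all goroutines onto a single
        // processor: concurrency without parallelism.
        runtime.GOMAXPROCS(1)
    }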

Every processor has a Local Run Queue (LRQ); the scheduler picks goroutines from the LRQ one by one and runs them on the processor that owns the queue. Above this, there is a Global Run Queue (GRQ), shared across all threads. Because an LRQ is local and accessed by only one thread, the scheduler needs no lock over it, i.e. an LRQ does not need to be synchronized. The GRQ, on the other hand, needs locking, as this queue of tasks is shared across all the processor threads.

Whenever the scheduler does not find any goroutine in its LRQ, it performs work stealing: it randomly picks the LRQ of another processor (now that LRQ does need locking) and steals half of its workload.

If there are no goroutines to steal from other LRQs, the scheduler takes work from the GRQ.
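These queues are observable in practice: the runtime's GODEBUG=schedtrace option periodically prints scheduler state, including the GRQ length and each P's LRQ length. A minimal sketch to try it (the workload here is arbitrary, just enough to keep the run queues populated; the sample output line is illustrative):

    package main

    import "time"

    // Build, then run with the scheduler trace enabled:
    //   go build -o prog main.go
    //   GODEBUG=schedtrace=1000 ./prog
    // Every 1000ms the runtime prints a line such as:
    //   SCHED 1004ms: gomaxprocs=8 idleprocs=0 threads=10 ... runqueue=3 [5 2 4 1 0 6 2 3]
    // where runqueue is the GRQ length and the bracketed list holds each P's LRQ length.
    func main() {
        for i := 0; i < 50; i++ {
            go func() {
                for { // busy loop so the run queues stay populated
                }
            }()
        }
        time.Sleep(5 * time.Second)
    }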

Advantages of Goroutines:

  • Less memory consumption (a few kilobytes per goroutine)
  • Less setup and teardown cost (they are user-space threads)
  • Lower context-switch cost, since the scheduling is cooperative and non-preemptive. In cooperative scheduling there is no scheduler time slice; instead, goroutines yield control when they are idle or logically blocked, so that multiple goroutines can run concurrently. The switch between goroutines happens only at well-defined points, when an explicit call is made into the Go runtime scheduler (see the sketch after this list). Those points are:
    • Send/receive operations on channels, since these are blocking calls
    • The go statement, though there is no guarantee that the new goroutine will be scheduled immediately
    • Blocking syscalls, such as file or network operations
    • After being stopped for a garbage collection cycle
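A minimal sketch of the first (and most common) yield point, a blocking channel operation; everything here is standard Go:

    package main

    import "fmt"

    // A send or receive on an unbuffered channel is a blocking call and
    // therefore a natural yield point: the sender parks until a receiver
    // is ready, and the scheduler runs another goroutine in the meantime.
    func main() {
        ch := make(chan int)

        go func() {
            for i := 0; i < 3; i++ {
                ch <- i // blocks here, yielding control to the scheduler
            }
            close(ch)
        }()

        for v := range ch { // each receive is likewise a yield point
            fmt.Println("received", v)
        }
    }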

Fun fact:
  • Assuming ~2KB per goroutine and 8GB of RAM, you can run roughly:
    • 500 goroutines per 1MB
    • 500,000 goroutines per 1GB
    • 4,000,000 goroutines per 8GB
  • Given this calculation, on 8GB of RAM we can have around four million goroutines running.
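For the curious, the per-goroutine figure can be sanity-checked with a rough sketch like the one below (the count and the use of MemStats.Sys as the metric are my own assumptions; exact numbers vary by Go version and platform):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        const n = 100000 // illustrative count
        block := make(chan struct{})

        var before, after runtime.MemStats
        runtime.GC()
        runtime.ReadMemStats(&before)

        for i := 0; i < n; i++ {
            go func() { <-block }() // each goroutine parks immediately
        }

        runtime.GC()
        runtime.ReadMemStats(&after)
        // Sys is the total memory obtained from the OS; the delta divided
        // by n gives a rough per-goroutine footprint.
        fmt.Printf("approx bytes per goroutine: %d\n", (after.Sys-before.Sys)/n)
        close(block)
    }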

