Multi-Processing vs Multi-Threading
Multi-processing vs multi-threading is a question that comes up frequently. The short answer: if your application is network bound, use threading; if it is compute bound, use multi-processing. Let's look a little deeper and see why, with a simplified view of how an application works. When you run your application, your code passes through a series of virtualization layers (JVM, Docker, etc.) and eventually makes its way onto the CPU.
CPUs are fairly straightforward: they execute a sequence of instructions (store, load, add, etc.). If your application is doing a lot of math, those instructions must all be executed one after another on the CPU. It follows logically that if you want more compute, you add more CPUs. Most modern CPUs consist of multiple cores, each executing its own sequence of instructions. Spreading your application's work across these cores is multi-processing.
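For instance, in Python a compute-bound job can be spread across cores with the standard multiprocessing module. A minimal sketch (count_primes is just a hypothetical stand-in for any CPU-heavy function):

```python
from multiprocessing import Pool

def count_primes(limit):
    # Deliberately naive, CPU-heavy work: count primes below `limit`.
    count = 0
    for n in range(2, limit):
        if all(n % d != 0 for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    limits = [50_000, 60_000, 70_000, 80_000]
    # Each chunk of math runs in its own process, and therefore on its own core.
    with Pool(processes=4) as pool:
        results = pool.map(count_primes, limits)
    print(results)
```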
What happens when you make a network request? The CPU executes a loop that looks something like: 1/ check whether the response has returned, 2/ if not, sleep and repeat step 1, 3/ otherwise continue. A CPU that is sleeping instead of doing other things is wasted. Operating systems and applications have built mechanisms to handle this by executing other work during the time the CPU would otherwise sleep; these units of work are known as threads. Now, instead of sleeping, the CPU starts executing another thread. If our application spends most of its time waiting, it can wait for many things in parallel by using threads. On a single core, only one thread is executing at any instant because of the sequential nature of processing instructions; however, since these threads are "waiting", they can all wait in parallel, as the sketch below shows.
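In Python, handing the waiting off to a thread pool looks something like this. A minimal sketch, assuming any slow HTTP endpoint (the URL here is a placeholder):

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url):
    # The thread spends almost all of its time here, waiting on the network,
    # so the core is free to run the other threads in the meantime.
    with urlopen(url) as response:
        return response.status

urls = ["https://example.com"] * 5

# Five requests wait in parallel instead of one after another.
with ThreadPoolExecutor(max_workers=5) as executor:
    statuses = list(executor.map(fetch, urls))
print(statuses)
```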
Modern operating systems (and virtualization environments) have gotten very good at optimizing resources. They can detect that your application is waiting and run another application in the meantime. Instead of a CPU running a single process forever, a core runs Application A, then Application B, then Application A again. But that only helps across applications, not within one. Some modern runtimes, like Java's, help here automatically as well, for instance by moving garbage collection onto its own threads: Java does cleanup in the "background" during cycles the application would otherwise have wasted.
What about so-called "single threaded" languages like Python and JavaScript? These languages have simply built their own special support for waiting. Thread management (deciding which thread executes when) is generally handled at the operating-system level and is relatively expensive in time and compute. JavaScript handles this scheduling for you inside the engine; rather than threads, the units of work are called promises. A promise does something, typically waiting for something like a network call to complete, then returns. The JavaScript engine notices that a promise is waiting and checks whether there is other work to be done (more promises). This is a lower-overhead alternative to threads, and comes with other benefits such as promise chaining, which I will not delve into here.
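Python's analogue to promises is async/await on an event loop. A minimal sketch, with asyncio.sleep standing in for a slow network call:

```python
import asyncio

async def fake_fetch(name):
    # While this coroutine is "waiting", the event loop runs the others.
    await asyncio.sleep(1)
    return f"{name} done"

async def main():
    # All three waits overlap, so this takes ~1 second, not ~3.
    results = await asyncio.gather(
        fake_fetch("a"), fake_fetch("b"), fake_fetch("c")
    )
    print(results)

asyncio.run(main())
```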
TL;DR - If you need more compute, use multiple processes. If you are waiting for things (such as network calls) in a single-threaded language, make sure to use its concurrency controls (promises/await). If you are waiting for things in other languages, make sure to use threads.
PS - This is a gross oversimplification of how CPU architecture works. However, it is true enough, and provides enough detail, to justify why using threads for network-bound operations works. We could go into greater detail, such as CPU interrupts, L1/L2/L3 caches, CPU core locality, vector processors, DMA, GPUs, Map-Reduce, etc., but perhaps those are topics for another day.