Factorio
I tend not to write about video games, but it's astounding how good a video game can be for career development. I just got back from parental leave, and a game that I played quite a bit was Factorio. You crash-land on an alien planet and need to build a factory to research science and launch a rocket. But it's really just a large logistical problem. How do you manage resources? If we look at it from a slightly different perspective, it's really just a distributed systems problem. If I ever had to teach a class on software engineering or distributed systems, I would have my students play this game.
I am not going to fully describe the game, but you do need a little bit of background to understand the post. You mine resources to collect ore like copper or iron. Then you move those resources on belts. Then you smelt the ore into plates. Then you assemble the plates into products, either final products or intermediate products. All sounds pretty easy, right? Except you might mine iron at 0.5 ore per second and smelt it at 0.3125 plates per second, but you want to build a product which needs 10 iron plates per second. How many miners and furnaces do you need for this to be balanced? If it is not balanced, what happens? Oh, and while you are thinking about that, let's not forget you need to move all of these items around on belts, which have their own limits. It is a giant constraint problem which, interestingly enough, looks very similar to the types of problems I solve every day. The game makes it fun by throwing aliens at you to make it harder, and by changing how these things work with speed modules that alter the ratios and with fluid systems, but let's keep it simple for this post.
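To make the balancing question concrete, here is a minimal sketch of the arithmetic using the rates quoted above. The function names are mine, and the belt capacity of 15 items per second is an assumption for illustration, not a number from this post.

```python
import math

def machines_needed(target_rate: float, rate_per_machine: float) -> int:
    """Smallest whole number of machines that meets the target rate."""
    return math.ceil(target_rate / rate_per_machine)

def line_throughput(miners: int, furnaces: int,
                    mine_rate: float = 0.5, smelt_rate: float = 0.3125) -> float:
    """A production line only runs as fast as its slowest stage."""
    return min(miners * mine_rate, furnaces * smelt_rate)

TARGET = 10.0          # iron plates per second the product needs
BELT_CAPACITY = 15.0   # items per second per belt (assumed value)

miners = machines_needed(TARGET, 0.5)       # 20 miners
furnaces = machines_needed(TARGET, 0.3125)  # 32 furnaces
belts = machines_needed(TARGET, BELT_CAPACITY)

# Unbalanced case: two furnaces short, so the furnaces become the bottleneck
# and ore backs up on the belt behind them.
starved = line_throughput(miners=20, furnaces=30)  # 9.375 plates per second

print(miners, furnaces, belts, starved)
```

The answer to "what happens if it is not balanced" falls out of the `min`: the slowest stage sets the rate for the whole line, and everything upstream of it backs up.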
In software systems, there are largely three types of scaling you can do. You can vertically scale by buying a larger instance. You can horizontally scale by adding more instances. Or you can stamp out “cells”. Each has its pros and cons, but it really takes experience to learn them. Unfortunately, that experience is best gained by being woken up in the middle of the night, or by going down during a peak and having to explain to your director/VP what happened. In Factorio you have three tiers of factories, and each tier simply gets faster: a perfect example of vertical scaling. You can vertically scale, but you will quickly hit limits, as there is a maximum factory size (and it's expensive in terms of resources). You can simply build more factories to consume resources: a perfect example of horizontal scaling. Except you hit limits on how many items can be consumed from or placed onto a belt, just like with load balancers or many dependencies. There is a special technique in Factorio, commonly used for megabases, called city blocks. These are nothing more than cells. You take a good horizontally and vertically scaled block, connect it to a network, then stamp out copies of these “blocks”. That is exactly the same thing as a cellular architecture. Depending on where your service is, or where your factory is, you will want to employ some or all of these techniques in different scenarios.
The nature of Factorio leads to a lot of asynchronous processing. In software systems, you generally have two paradigms: push and pull. When you build your base you will probably use trains, and you will need to decide how to operate them. Guess what? You have to choose between push-based and pull-based. Which one is right probably depends on what you are doing. You can build a factory which produces circuits and consumes iron and copper. You can disable the trains when you have a large enough buffer of iron and copper to produce circuits (pull-based). Or you can simply run trains constantly to ensure that you always have enough iron and copper (push-based). Notice any similarities? If you consume faster than the producer produces, push-based is nice. If you consume slower than the producer produces, then pull-based is nice. Just like in software, where some systems adhere to one pattern or the other, some of your components will adhere to one pattern or the other. I didn’t even get into discussions of 1-N vs N-1 vs M-N, and we already have parallels.
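Here is a toy sketch of the two train policies, under assumptions of my own: one train may arrive per tick, a train that does not fit in the station buffer is counted as blocked, and all the numbers are made up. It is not how the game schedules trains, just the shape of the trade-off when the consumer is slower than the producer.

```python
def simulate(policy: str, ticks: int = 10, train_load: int = 100,
             consume_per_tick: int = 20, capacity: int = 200,
             threshold: int = 50) -> tuple[int, int]:
    """Return (deliveries, blocked_trains) for a single station buffer."""
    buffer = 0
    deliveries = 0
    blocked = 0
    for _ in range(ticks):
        # Push: a train shows up every tick regardless of need.
        # Pull: a train is only requested when the buffer runs low.
        train_arrives = policy == "push" or buffer < threshold
        if train_arrives:
            if buffer + train_load <= capacity:
                buffer += train_load
                deliveries += 1
            else:
                blocked += 1  # train sits at the station waiting to unload
        # The factory consumes from the buffer at its own pace.
        buffer = max(0, buffer - consume_per_tick)
    return deliveries, blocked

push_result = simulate("push")
pull_result = simulate("pull")
print("push:", push_result, "pull:", pull_result)
```

With these numbers both policies deliver the same amount of material, but the push policy leaves trains queued at a full station most of the time, which is the rail equivalent of an overflowing queue.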
Buffers, caches, and quotas are all very similar. This can manifest in a couple of different ways, but this one gets me every time. You hit a stage in the game where you have lasers which can defend your base, but they take A LOT of power. This power demand is very bursty: you only fire lasers when a wave of enemies is attacking your base, and you don’t need that power after they all die. So you need a power generation system which is perfectly scalable. Perfectly scalable. How laughable. Is anything perfectly scalable? There are different forms of power: you can use solar, oil, coal, or nuclear. All have their pros and cons, but how do you handle these large bursts? Well, if you don't, you might brown your system out. All of your factories operate on reduced power and reduce their throughput accordingly. You might also black your system out: if you use electricity to load coal into a boiler and you run out of electricity, you can't load coal, and the boiler stops producing power. Just like in software systems, there are different ways to solve this problem. You can isolate workloads, or in this case isolate power networks, so your critical systems remain online no matter what happens. You can tune your buffer so that these types of things don’t happen, until that buffer no longer works or scales because the customer/aliens change things up on you. Maybe you implement load shedding so that only a few groups of lasers operate at a time and you do not take your system out. The possibilities are as endless as the parallels to our reality.
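The load shedding idea can be sketched in a few lines. This is a hypothetical illustration: the group names, the power figures, and the `shed_load` function are all invented here, and in the game you would wire this up with circuit conditions rather than code.

```python
def shed_load(available_kw: float,
              groups: list[tuple[str, float]]) -> tuple[list[str], float]:
    """Enable laser groups in priority order until the power budget is spent.

    `groups` is a list of (name, demand_kw), highest priority first.
    Groups that do not fit in the remaining budget stay switched off.
    """
    enabled: list[str] = []
    used = 0.0
    for name, demand in groups:
        if used + demand <= available_kw:
            enabled.append(name)
            used += demand
    return enabled, used

# Hypothetical numbers: three laser walls, but only enough spare power for two.
laser_groups = [("north", 400.0), ("east", 400.0), ("south", 400.0)]
enabled, used = shed_load(1000.0, laser_groups)
print(enabled, used)
```

The design choice is the same one you make in a service: decide ahead of time which work is dropped first, so that a burst degrades part of the system instead of browning out all of it.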
I could go on and on with a couple more examples of how Factorio parallels software engineering, like circuit breakers or throttling, but that isn't the purpose of this post. The purpose of this post is to spark your curiosity. A common question I get from my mentees and my team is: how could I have known about this before it happened? That is a really hard question to answer. Learning from others is not easy. Building a distributed system big enough and successful enough that you actually need to consider a cellular architecture is rare. You certainly could do a lot of this in mock setups in AWS, but that's both hard and extremely expensive. Or you could go play a sim which will force you to learn these lessons the hard way. I know, I was on break and decided to play a game simulating my work. The irony is not lost on me (or on my mother, who constantly made fun of me).