Scaling a microservice with its database
The solution is rather simple: just do it.
If you recall what I wrote recently, you will know I made a prototype of a microservice system privately. The whole system contained two PHP applications made with the Lumen micro framework: one for handling Contracts, and another for Clients, each with its own database inside.
While the prototype worked, there was one problem: it wasn’t elastic. In other words, we couldn’t automatically spawn more microservices to serve more requests in a reliable way.
Let’s say we spin up a second and third instance of the Contracts microservice because too many people are signing up. Since each microservice contains its own database, creating a new instance also creates a new, empty database, and the proxy that manages connections to the Contracts microservices will route new requests to these new instances that contain nothing, since their databases also spawn anew.
Destroying the new instances once the load goes down would also destroy the data. And if microservices shared the same database volume, where the database information is saved, there would be no way to let each database engine know what the others are doing, since each manages its own cache and buffers.
In other words, it gets complicated.
There is a solution: separate the database further. Just make the database another microservice!
Since most of the microservices in my prototype deal with saving data somewhere, making another microservice for the database seems like a good solution. While an application microservice scales up and down depending on the load, the database instance stays the same for all of them and the data is kept consistent.
Now, the problem is the database itself. There are two ways to make a microservice elastic while keeping the database’s data integrity.
One database for each microservice
One simple solution is to just create a database for each microservice that needs it. Imagine spawning two Docker containers with the apps, and two more, each containing a database for one of them. If you’re using Kubernetes, then we would have a “Contracts Pod” and a “Contracts Database Pod”.
- Each database is exclusive to the microservice that governs it.
- Each microservice deals with its own RDBMS.
- Scaling the database is cumbersome but can be done.
- Multiple RDBMS instances can eat up resources.
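As a sketch, a docker-compose file for this layout could look like the following. Everything here is hypothetical: the image names, credentials, and service names are placeholders, not my actual setup.

```yaml
# Hypothetical layout: each app container gets its own database container.
services:
  contracts-app:
    image: my-registry/contracts:latest   # hypothetical image name
    environment:
      DB_HOST: contracts-db               # each app only sees its own DB
  contracts-db:
    image: mariadb:10.11
    environment:
      MARIADB_ROOT_PASSWORD: example      # placeholder credential
    volumes:
      - contracts-data:/var/lib/mysql     # data survives container restarts

  clients-app:
    image: my-registry/clients:latest     # hypothetical image name
    environment:
      DB_HOST: clients-db
  clients-db:
    image: mariadb:10.11
    environment:
      MARIADB_ROOT_PASSWORD: example
    volumes:
      - clients-data:/var/lib/mysql

volumes:
  contracts-data:
  clients-data:
```

Note how each app container only knows about its own database service; nothing stops `contracts-app` from reaching `clients-db` at the network level, but the configuration never points there.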
The second point is what makes this approach very good. For example, the Clients service is mostly hammered with data retrieval, especially as part of the authentication flow, and write queries are rarely done. In that case we can just swap MariaDB for SQLite, since it thrives in read operations. Meanwhile, the Payments service deals with its own MariaDB microservice with some optimizations, because it needs to handle a lot of writes, especially at the end of the month; everyone likes to pay on the last day, en masse.
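In Lumen, that engine swap is mostly a configuration change rather than a code change. A sketch of the `.env` for the read-heavy Clients service might look like this (the database path is a hypothetical example):

```ini
; Clients service: read-heavy, so SQLite is enough.
; The file path below is a hypothetical placeholder.
DB_CONNECTION=sqlite
DB_DATABASE=/var/data/clients.sqlite
```

The Payments service would keep `DB_CONNECTION=mysql` pointed at its own MariaDB instance; as long as both services go through the framework’s query builder, the application code barely notices which engine is underneath.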
About scalability, it’s clear that if you have 20 microservices going, like I’m (hopefully) about to have, that means 20 database instances. If each instance asks for 128MB to work properly, then we have 2.5GB sitting around on databases alone. Counting the 0.5GB for the system itself, that leaves our 4GB server with just 1GB for the rest of the microservices. We could tell each database to use less, like 64MB or even 32MB per instance, and that could theoretically work. If we can do that, it leaves room to shard up to four or eight instances if needed, assuming I can pull it off. The point is to optimize the databases’ footprint on the system enough to make them scalable on demand, instead of shoving in big database instances and calling it a day.
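To give an idea of what “tell each database to use less” means in practice, here is a `my.cnf` fragment for a deliberately small MariaDB instance. The values are illustrative only, not a recommendation; measure your own workload before committing to anything like them.

```ini
# my.cnf fragment for a deliberately small MariaDB instance.
# All values are illustrative; tune against real measurements.
[mysqld]
innodb_buffer_pool_size = 32M   # the main memory consumer for InnoDB
innodb_log_buffer_size  = 4M
tmp_table_size          = 8M
max_heap_table_size     = 8M
max_connections         = 25    # each connection costs a few MB on its own
performance_schema      = OFF   # saves a surprising amount of memory
```

The trade-off is obvious: a tiny buffer pool means more disk reads, so a setup like this only makes sense for services with small working sets.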
By the way, there are calculators and walkthroughs to tune a database engine down to the lowest acceptable footprint.
One big shared Database
The other solution is just to put a whole RDBMS in a beefier server, or assign it more resources, and leave each microservice to deal directly with this database instance:
- Each microservice can check other database tables and validate data.
- All microservices must deal with the same RDBMS and schema.
- A single RDBMS instance makes better use of the server resources.
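A docker-compose sketch makes the contrast with the per-service layout visible: every app points at the same database service. As before, image names and credentials are hypothetical placeholders.

```yaml
# Hypothetical layout: all apps share one database instance.
services:
  contracts-app:
    image: my-registry/contracts:latest   # hypothetical image name
    environment:
      DB_HOST: shared-db                  # everyone points at the same host
  clients-app:
    image: my-registry/clients:latest     # hypothetical image name
    environment:
      DB_HOST: shared-db
  shared-db:
    image: mariadb:10.11
    environment:
      MARIADB_ROOT_PASSWORD: example      # placeholder credential
    volumes:
      - shared-data:/var/lib/mysql
volumes:
  shared-data:
```

One container less to run per service, but now `shared-db` is a single point of failure and a single schema that every app depends on.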
One clear advantage of using the same database is that all microservices can check other tables and validate data. The problem with that approach, which bypasses the other microservice entirely, is that one change to the schema means changing every microservice hooked to it. Is that good or bad? Well, I think it’s bad, but it depends on the application itself: there may be performance gains when bypassing a second microservice and going directly to the database, but that depends on how complex what you’re doing is.
The third point is debatable. The database engine will need to be tuned for an all-scenarios kind of performance, as some services will demand more writing than reading and vice versa. Additionally, you will have to do the math on the number of connections all microservices need, so you have enough at all times.
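That connection math is simple but worth writing down. A quick sketch with hypothetical numbers — 20 services, a pool of 10 connections each, and 20% headroom for admin and monitoring sessions:

```python
# Back-of-the-envelope sizing for max_connections on a shared RDBMS.
# All numbers are hypothetical placeholders, not recommendations.
services = 20           # microservices talking to the shared database
pool_per_service = 10   # connection pool size of each service
headroom = 0.2          # margin for admin and monitoring sessions

needed = services * pool_per_service
max_connections = int(needed * (1 + headroom))

print(max_connections)  # prints 240
```

If any pool can burst above its nominal size, count the burst size instead; running out of connections tends to fail in ugly ways right when the load is highest.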
I personally find a single RDBMS more manageable in terms of resource usage, since the data lives in one place and there is no duplication of processes. For example, it makes database partitioning or sharding easier. Trying to do the same at a per-microservice scale is not impossible, but it can be difficult or cumbersome, at least for me; it’s not like you push a database engine to its knees every day.
As far as I know, most of the time your application will be the culprit when managing the tsunami of data coming from your database, rather than the other way around, unless you’re executing a very unoptimized query multiple times or concurrently.
I think that “one database to rule them all” is not a silver bullet, but if two dozen different databases in your microservice architecture sounds unmanageable, you may want to give a single one a try.
For my microservices, I’m just using the first approach: one database per microservice, highly tuned for a small footprint. With proper diagnostic and monitoring tools, I can tell when the database for a particular microservice becomes a problem; I think it will happen with the Cart History microservice. I will consider properly sharding that database across more instances if I fail at tuning it or giving it more resources. Sharding a database while keeping it “small” is a whole rabbit hole I’ll try to avoid until it’s unavoidable.
The above doesn’t mean I will rule out the “One Big Shared Database”. There is still a place for this kind of approach for a group of highly related microservices. For example, the Cart, Orders, Payments and Refunds microservices could all work from the same database. I only have to be careful to keep the data consistent across the database and the microservices.