Laravel: Avoiding cache data-races
Multiple processes hate this!
When you are not dealing with multiple processes pointing at the same data, everything is fine. Each PHP request runs in a single thread, which takes a lot of the headaches out of concurrency. That’s why using the Laravel Cache to keep data at hand is wise, especially to avoid things that take a little too long, like a complex SQL query or a request to a slow HTTP server.
One of the things I hate when using a cache is data races, and you probably do too. If two or more processes want to save data into the cache store, they will keep replacing each other’s data in no particular order.
This usually poses a big problem.
Bob and Ana don’t get along too well
A simple example of a data race is two users editing an article. Imagine that at 10:00 AM Bob retrieves the draft he had been writing, starts updating it at 10:05, and stores his finished article at 11:00 AM.
Meanwhile Ana, his editor, retrieves the article at 10:30 AM. After making some small corrections, she saves it at 10:45, expecting Bob to pick up what she wrote.
What Bob does at 11:00 AM is overwrite Ana’s corrections without ever knowing. The article is published, and Ana yells at Bob for ignoring her fixes and comments.
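The timeline can be replayed in a few lines of plain PHP. This is only an illustration: the array stands in for the cache store, and the key name and article text are made up.

```php
<?php

// A plain array stands in for the cache store (hypothetical key and text).
$cache = ['article' => 'Draft'];

$bob = $cache['article'];                    // 10:00 Bob retrieves the draft
$ana = $cache['article'];                    // 10:30 Ana retrieves the same draft

$cache['article'] = $ana . ' + Ana fixes';   // 10:45 Ana saves her corrections
$cache['article'] = $bob . ' + Bob ending';  // 11:00 Bob saves over them

echo $cache['article'], PHP_EOL;             // prints "Draft + Bob ending"
```

Nothing failed and nothing warned: the last writer simply won.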
In real applications this happens within seconds rather than hours, but there is a simple solution.
Invalidate and regenerate
To avoid newer data being replaced by older data, we need to know whether the data in the cache is “fresher” than what we hold right now. In other words, we need a timestamp.
Our timestamp will keep track of when the data stopped being “valid”, and we will set it only once: when the data we received from a source (cache or database) has changed.
We do this because data that was never invalidated is still equal to what is in the cache, so there is no need to even store it.
For the cache itself, we will use two keys instead of one. The first will hold the data itself, while the second will hold the time it was persisted.
With these two bits of information, we can now “regenerate” the data when what we hold is fresher than the data stored in the cache:
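What follows is a minimal sketch of that check, not necessarily the original snippet. To keep it runnable on its own, it uses a hypothetical `InMemoryCache` class with `get()` and `put()` methods; in a Laravel app, `Cache::get()` and `Cache::put()` would play that role. The `regenerate()` function, the `:time` key suffix, and passing the current time in as an argument are all assumptions made for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical in-memory stand-in so the sketch runs without a framework.
// In a Laravel app, Cache::get() and Cache::put() would play this role.
final class InMemoryCache
{
    private array $items = [];

    public function get(string $key, mixed $default = null): mixed
    {
        return $this->items[$key] ?? $default;
    }

    public function put(string $key, mixed $value): void
    {
        $this->items[$key] = $value;
    }
}

/**
 * Save $data under $key only when it is fresher than the cached copy.
 * $invalidatedAt is when our copy stopped matching its source (null if it never did).
 * Returns true when the cache was written.
 */
function regenerate(
    InMemoryCache $cache,
    string $key,
    mixed $data,
    ?DateTimeImmutable $invalidatedAt,
    DateTimeImmutable $now
): bool {
    // Second key: the time the cached data was persisted.
    $persistedAt = $cache->get($key . ':time');

    // Fresher when nothing is cached yet, or when the cached copy
    // was persisted before we invalidated ours.
    $fresher = $persistedAt === null
        || ($invalidatedAt !== null && $persistedAt < $invalidatedAt);

    if ($fresher) {
        $cache->put($key, $data);
        $cache->put($key . ':time', $now);
    }

    return $fresher;
}
```

Passing `$now` explicitly keeps the sketch deterministic; in an application you would more likely read the clock at the moment of saving.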
We can translate the code above into two conditions:
- If the time key doesn’t exist in the cache, there is nothing in the cache, so we will save the data.
- If there is a time, we will save only if the cached data was persisted before the moment we invalidated ours.
This can be extended not only to a database, but to anything where you can store both data and timestamps.
If we want to go the extra mile, we can create a class that handles this for us.
Such a class is very easy to use. Let’s fix the problem that Bob and Ana have:
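Here is a minimal sketch of what such a class and its use could look like; it is not the package's actual code. `InMemoryCache` and `Regenerator` are hypothetical names, the store is an in-memory stand-in for Laravel's `Cache::get()` and `Cache::put()`, and the date, the 09:00 initial save, and Ana's 10:35 edit time are invented to fill in the timeline.

```php
<?php

declare(strict_types=1);

// Hypothetical in-memory stand-in so the sketch runs without a framework.
// In a Laravel app, Cache::get() and Cache::put() would play this role.
final class InMemoryCache
{
    private array $items = [];

    public function get(string $key, mixed $default = null): mixed
    {
        return $this->items[$key] ?? $default;
    }

    public function put(string $key, mixed $value): void
    {
        $this->items[$key] = $value;
    }
}

// Hypothetical helper: holds one copy of the data and remembers when it changed.
final class Regenerator
{
    private ?DateTimeImmutable $invalidatedAt = null;

    public function __construct(
        private InMemoryCache $cache,
        private string $key,
        private mixed $data
    ) {
    }

    // Replace the held data and record, only the first time, when it changed.
    public function change(mixed $data, DateTimeImmutable $now): void
    {
        $this->data = $data;
        $this->invalidatedAt ??= $now;
    }

    // Save data and timestamp if the cache is empty or older than our change.
    public function regenerate(DateTimeImmutable $now): bool
    {
        $persistedAt = $this->cache->get($this->key . ':time');

        $fresher = $persistedAt === null
            || ($this->invalidatedAt !== null && $persistedAt < $this->invalidatedAt);

        if ($fresher) {
            $this->cache->put($this->key, $this->data);
            $this->cache->put($this->key . ':time', $now);
            $this->invalidatedAt = null; // Our copy matches the cache again.
        }

        return $fresher;
    }
}

// Helper to build times on one (made-up) day.
$at = fn (string $time) => new DateTimeImmutable('2024-01-01 ' . $time);

$cache = new InMemoryCache();
$cache->put('article', 'Draft');
$cache->put('article:time', $at('09:00'));

$bob = new Regenerator($cache, 'article', $cache->get('article')); // 10:00
$bob->change('Draft + Bob ending', $at('10:05'));

$ana = new Regenerator($cache, 'article', $cache->get('article')); // 10:30
$ana->change('Draft + Ana fixes', $at('10:35'));

$anaSaved = $ana->regenerate($at('10:45')); // true: cached at 09:00, changed at 10:35
$bobSaved = $bob->regenerate($at('11:00')); // false: cached at 10:45, changed at 10:05

echo $cache->get('article'), PHP_EOL; // prints "Draft + Ana fixes"
```

Bob’s write is refused instead of silently replacing Ana’s. What he does next, such as reloading the article and reapplying his changes, is left to the application.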
This is just a silly example, but the point is: as long as you know “when” the data becomes different from its origin, you can avoid data races between two or more separate processes by checking which one has the fresher data to save.
You can find this trait and other helpers in my Laratraits package. Give it a chance if you find something useful for your project.