Laravel: Are soft deletes really slow?

Well, they CAN be slow if you don’t use the power of indexes

One of the key features that Eloquent ORM offers is soft-deletion. This mechanism treat records as “deleted” by just pointing out when they were deleted, instead of destroying the record forever from the database. A Model with Soft Deletes will exclude them automatically every time is queried.

The table for the soft-deletable model looks like this. Note the deleted_at column, that is added by default, but you can change it nonetheless.

|-----------------------------------------------------|
| colors |
|----|---------|----------|-----|---------------------|
| id | user_id | color | ... | deleted_at |
|----|---------|----------|-----|---------------------|
| 1 | 3 | red | ... | 2020-01-01 13:00:00 |
| 2 | 3 | gray | ... | 2020-01-01 13:00:00 |
| 3 | 4 | blue | ... | null |
| 4 | 5 | red | ... | null |
| 5 | 6 | blue | ... | 2019-12-03 00:45:00 |

For example, if you call Colors::all() you won’t get the records with id equal to 1, 2 and 5, since these have been marked as deleted, even if they still exists in the database table.

That’s a nice way to have something like a “virtual trash can”. You can restore a deleted record if for some reason you f*ck up, or you need to hold the record in the database to avoid dangling references from other tables.

There is a problem with this approach, though. After doing some benchmarks with thousands of records, performance for this table started to slightly degrade. Suspecting it was the deleted_at column’s fault, I asked something related in StackExchange and I came back with some answers.

To my surprise, it wasn’t.

Word of advice: I will use SQLite for this article for because is braindead easy to work with. Most SQL database engines should have the same behaviour.

Primary key is awesome

When retrieving an Eloquent Model using soft deletion, Eloquent ORM will append WHERE deleted_at IS NULL to the SQL query using a global scope behind the scenes.

When trying to retrieve values by their primary key (the id in this case), the queries executed are as follows:

// For a single item
SELECT * FROM colors WHERE id = ? AND deleted_at IS NULL
// For multiple items
SELECT * FROM colors WHERE id IN (?, ?, ?) AND deleted_at IS NULL

Most of the databases will use the index of the primary key. I asked SQLite to explain how it would get multiple items just to be sure:

EXPLAIN QUERY PLAN 
SELECT * FROM colors
WHERE id IN (1, 2, 4) AND deleted_at NOT NULL
>> SEARCH TABLE colors USING INTEGER PRIMARY KEY (rowid=?)

SQLite said that it would search for the records in the table using the primary key, which is basically the identifier of the row. That’s cool.

For those who don’t know, a primary key is basically an index that is unique, but also identifies a single record in the whole database. This is created automatically when using auto-increment keys. More info in Use the Index, Luke.

We are still using the index even if we do a simple where X = X, since the databases enforces it to gain speed. If it wasn’t, a full table sweep would have to be done.

If it was using the index, then what was holding down the query performance?

Relationships: When things go south

This was were my troubled performance diminished: relationships. As you can see, the table has an user_id column, meaning, each record belongs to an User and there may be more than one of them. This is One to Many relationship, but a One to One may also apply here.

Because the user_id column is not part of any index, most databases will have to look the entire table. The more entries the table has, the slower will be the retrieval:

EXPLAIN QUERY PLAN 
SELECT * FROM colors WHERE user_id = 3 AND deleted_at IS NULL
>> SCAN TABLE colors

This can be fixed by creating an index with the user_id — in the case of One-to-one relationships, an unique index would be preferred, while letting the user to restore the record instead of creating a new one.

Each time the database checks for records for a given user_id, it will use the index instead of looking the entire table, even if you add the deleted_at into the mix.

We go the Index way

When executing the migrations, we can simply instruct the application to create an index on the user_id column.

Schema::create('colors', function (Blueprint $table) {
$table->bigIncrements('id');
$table->unsignedBigInteger('user_id');
...
$table->softDeletes();
$table->index('user_id');
});

Alternatively, if you already have your database running, you can directly execute this command in your database console or issue a statement. Obviously, replace the table and index name with your own.

CREATE INDEX colors_user_id_index ON colors(user_id)

After that, queries will use the index if the statement includes the indexed column. In this case, when selecting multiple records by the given user_id:

EXPLAIN QUERY PLAN 
SELECT * FROM colors
WHERE user_id = 3 AND deleted_at IS NULL
>> SEARCH TABLE colors USING INDEX colors_user_id_index (user_id=?)

Of course there are some drawbacks when having more than one index. Inserting a new record, updating one, or force-deleting, will take some of milliseconds more, since these records need to update each index status accordingly, but nothing you could note in a normal application.

Storage

Laravel uses by default an UNSIGNED BIGINT AUTOINCREMENT for the table id. In layman’s terms, a 64-bit numeric column that by each insertion is incremented automatically by the database, and that can hold up to 2⁶⁴ -1 insertions. That is ‭18.446.744.073.709.551.615‬ records! Quite insane, but let’s go on with it since you can change it afterwards.

When declaring the column that would hold the relationship, which in this case is the user_id column, we need to save is as UNSIGNED BIGINT which amounts to 64 bits. In total, that means 8 bytes for each index entry.

A very broad calculation on 1.000.000 rows would mean an index of around 8 MB, something that can perfectly fit in your server memory if supported. If you’re worried, you can use a 4 byte integer to hold up to 4.294.967.295 insertions.

If you’re using UUID, which uses 128 bits, one million rows takes the size up to 16MB, which is totally fine when you consider sacrificing that amount for a faster retrieval of records.

Summing it up, with a million rows, primary key with the user_id can take between 4MB and 16MB depending on your choice. Practically nothing.

If you’re worried by the storage, there is a more aggressive alternative.

Thinning the index, the filtered way

Filtered indexes (also called Partial Indexes) are indexes that contain a portion of rows of the table by a given condition. To do that, the index is declared using that condition, and only when it the condition is true the entry will be part of the index.

This is a very good way to thin the index, specially if you plan to have a lot of soft-deleted models. Imagine that half of the index is composed of records soft-deleted: half of the index is just wasted data that barely gets queried. Unacceptable!

MySQL and MariaDB still doesn’t support filtered indexes, but PostgreSQL, SQL Server and even SQLite do. The syntax may vary but the concept is still the same:

CREATE INDEX colors_user_id_deleted_at_is_null_index 
ON colors(user_id)
WHERE deleted_at IS NULL

Practically speaking, when you retrieve a record (or many of them) without pointing the primary key, like when using relationships, this will happen:

  • Not soft-deleted: it will use the index.
EXPLAIN QUERY PLAN
SELECT * FROM colors WHERE user_id = 5 AND deleted_at IS NULL
>> SEARCH TABLE colors USING INDEX colors_user_id_deleted_at_is_null_index (user_id=?)
  • Soft-deleted: it will swipe the table.
EXPLAIN QUERY PLAN
SELECT * FROM colors WHERE user_id = 5 AND deleted_at NOT NULL
>> SCAN TABLE colors
  • Without disclosing soft-deletion: it also will swipe the table.
EXPLAIN QUERY PLAN
SELECT * FROM colors WHERE user_id = 5
>> SCAN TABLE colors

Why in the last two it needs to swipe the table? Because the index only holds non-soft-deleted records. The moment you don’t disclose that explicitly, a full swipe will be done since the engine won’t know if there are soft-deleted models or not.

If querying trashed records is extensively done in the application, this may be a not a good trade-off to think about when thinning the index.

In summary, answering the title question: No, soft-deleted models are not slow.

The slowness comes from relationships, which queries a column not indexed. Thus, adding an index on the relationship column will accelerate the retrieval of records.

This won’t avoid sweeping the table entirely on most complex scenarios, since these strategies are not silver bullet, but retrieving a single (or multiple) records should avoid that sweep, which in some cases may save precious seconds on the request.

By the way, you should totally and definitely read Use the Index, Luke.

Graphic Designer graduate. Full Stack Web Developer. Retired Tech & Gaming Editor.