February 20 2025
Cache and Consistency in Real Systems
How to think about cache as a copy with a consistency cost, not as a magic patch for slow reads.
Andrews Ribeiro
Founder & Engineer
5 min · Intermediate · Systems
Track
System Design Interviews - From Basics to Advanced
Step 5 / 19
The problem
Cache often enters the conversation by reflex.
The read is slow?
- add cache
The endpoint is struggling?
- add cache
The database is expensive?
- add cache
That sounds practical.
But a lot of the time it is just a neat way to move the problem somewhere else.
Because the real question is not:
- “Can we add cache?”
The real question is:
- “Are we speeding up the right read or hiding a weak design?”
If the query is bad, the index is missing, the ORM is causing N+1, or the screen asks for too much data, cache may improve the benchmark while making the system harder to understand.
And worse:
it may improve the benchmark while serving stale data to the user.
Mental model
Think about it like this:
Cache is a temporary copy of the truth, created so the system does not have to fetch the original every time.
That sentence already clears up half the confusion.
If cache is a copy, the conversation changes.
You have to answer:
- how stale can this copy be?
- who updates it?
- when does it stop being trustworthy?
- what happens if the user sees that delay?
So cache is not only about performance.
It is performance bought with consistency risk.
Breaking it down
Good cache speeds up repeated and expensive reads
Cache usually makes a lot of sense when you have a pattern like this:
- the same read happens all the time
- fetching the source is expensive
- a small delay is acceptable
- the update policy is understandable
Common examples:
- product page content
- public configuration
- rankings that can lag a little
- heavily read lists with rare updates
In those cases, the temporary copy often pays for itself.
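To make that concrete, here is a minimal cache-aside sketch in Python. The in-memory cache, `load_product`, and the TTL value are all hypothetical; the point is only the shape of the pattern: check the copy, fall back to the source, store the copy with an expiry.

```python
import time

_cache = {}   # hypothetical in-memory cache: key -> (value, expires_at)

def load_product(product_id):
    # Stand-in for the expensive read (database query, remote call, ...).
    return {"id": product_id, "description": "..."}

def get_product(product_id, ttl_seconds=300):
    key = f"product:{product_id}"
    entry = _cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                                  # hit: serve the temporary copy
    value = load_product(product_id)                     # miss: go back to the source
    _cache[key] = (value, time.time() + ttl_seconds)     # store the copy with an expiry
    return value
```

Every question from the mental model still applies: this copy can be up to five minutes old, and nothing here updates it when the product changes.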
Bad cache becomes makeup for a misunderstood cause
This is the most common mistake.
The read is slow, but the cause may be:
- bad query shape
- missing index
- N+1
- excessive join cost
- unnecessary select *
- a screen asking for too much data
If you throw cache on top without understanding that, you may:
- preserve the original cause
- increase complexity
- make debugging harder
- create stale reads
So maturity here starts with one simple question:
- “Why is this read expensive right now?”
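To make the N+1 case concrete, here is a hedged sketch using an in-memory SQLite database with an invented schema. The slow version issues one extra query per order; the fix is a single join. Caching the slow version would hide the extra round trips instead of removing them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Bruno');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 2, 7.5);
""")

# N+1 shape: one query for the orders, then one extra query per order.
orders = conn.execute("SELECT id, customer_id, total FROM orders").fetchall()
slow = [
    (order_id,
     conn.execute("SELECT name FROM customers WHERE id = ?", (customer_id,)).fetchone()[0],
     total)
    for order_id, customer_id, total in orders
]

# The actual fix: the same result in a single query, no cache involved.
fast = conn.execute("""
    SELECT o.id, c.name, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
""").fetchall()

assert sorted(slow) == sorted(fast)
```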
TTL is not a full strategy
Many people think the cache design is done when they define:
TTL = 5 minutes
But TTL is only one part.
It answers:
- how long the copy can live before forced expiration
It does not answer everything else.
You still need to think about:
- what if the data changes before that?
- what if many copies expire at the same time?
- what if this read cannot wait that long?
TTL helps.
It does not replace invalidation thinking.
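One way to see the gap, as a rough sketch with invented names: the write path drops the copy instead of waiting for the TTL. Expiration and invalidation are two different mechanisms, and TTL only gives you the first.

```python
import time

_db = {"42": 19.90}          # stand-in for the source of truth
_cache = {}                  # hypothetical cache: key -> (value, expires_at)
TTL_SECONDS = 300            # the TTL stays, but only as a safety net

def read_price(product_id):
    key = f"price:{product_id}"
    entry = _cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                                   # serve the copy
    price = _db[product_id]                               # miss: read the source
    _cache[key] = (price, time.time() + TTL_SECONDS)
    return price

def update_price(product_id, new_price):
    _db[product_id] = new_price                           # write the truth
    _cache.pop(f"price:{product_id}", None)               # invalidate now, do not wait 5 minutes
```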
Different data deserves different tolerance
This gets ignored far too often.
A small delay may be acceptable for:
- product description
- avatar
- ranking
- like counts
The same delay may be terrible for:
- balance
- tight inventory
- permissions
- payment state
If you use the same cache strategy for everything, the cache stops being an optimization and starts becoming a nicely packaged lie.
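A hedged sketch of what different tolerance can look like in practice: a freshness budget per kind of data instead of one global TTL. The numbers and names here are invented; the point is that the decision is made per data class, not once for the whole system.

```python
# Hypothetical staleness budgets, in seconds, decided per kind of data.
FRESHNESS_POLICY = {
    "product_description": 3600,   # can age for an hour
    "avatar":              3600,
    "ranking":              300,
    "like_count":            60,
    "inventory":              0,   # 0 means: do not serve a copy, read the source
    "account_balance":        0,
    "permissions":            0,
}

def max_staleness(data_class):
    # Unknown data defaults to "no staleness allowed" rather than a silent guess.
    return FRESHNESS_POLICY.get(data_class, 0)
```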
Invalidation is a product decision too
Another common mistake is treating invalidation as only an infrastructure detail.
It is not.
The question “when should this cache die?” often depends on:
- user impact
- risk of stale data
- update flow
- read frequency
That is why good invalidation does not come only from the tool.
It comes from understanding what truth the user expects on that screen or in that flow.
Using cache at the wrong layer also hurts
Sometimes the issue is not whether to cache.
It is where to cache.
You can cache:
- the database query
- the API response
- a fragment of the page
- an object in the application
- an edge or CDN response
Each layer has different trade-offs.
Caching the whole page when only the price changes quickly may be worse than caching only the stable part.
Caching per user when most of the content is shared may waste memory.
So besides deciding whether to use cache, you also need to decide where.
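A small sketch of the per-user point, with invented loaders and keys: when most of the content is shared, caching per user copies the same bytes for every user, while a shared key plus a tiny per-user overlay keeps one copy of the heavy part.

```python
_cache = {}

def cached(key, loader):
    # Minimal cache helper without TTL, just to show where the keys differ.
    if key not in _cache:
        _cache[key] = loader()
    return _cache[key]

def load_catalog():
    # Heavy shared content, identical for every user.
    return [{"id": i, "name": f"item {i}"} for i in range(1000)]

def load_favorites(user_id):
    # Tiny per-user content.
    return [user_id % 7]

# Wasteful: one full copy of the catalog per user.
def catalog_per_user(user_id):
    return cached(f"catalog:user:{user_id}",
                  lambda: {"items": load_catalog(), "favorites": load_favorites(user_id)})

# Better here: cache the shared part once, keep only the small overlay per user.
def catalog_shared_plus_overlay(user_id):
    items = cached("catalog:shared", load_catalog)
    favorites = cached(f"favorites:{user_id}", lambda: load_favorites(user_id))
    return {"items": items, "favorites": favorites}
```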
Misses and stampedes are part of the design too
This is another place where mature answers stand out:
cache is not only about the happy path where every read is a hit.
There is also:
- expensive miss
- synchronized expiration
- bursts of recomputation
If many requests find the copy expired at the same time, you can push the whole load back to the origin at the worst possible moment.
So cache design also needs to consider what happens when cache stops helping.
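Two common mitigations, sketched here with invented names: add jitter so copies do not expire in sync, and let only one caller recompute a missing value while the others wait for it. A real system would use a distributed lock or request coalescing in the cache layer rather than a process-local lock.

```python
import random
import threading
import time

_cache = {}                           # hypothetical cache: key -> (value, expires_at)
_recompute_lock = threading.Lock()

def ttl_with_jitter(base_seconds):
    # Spread expirations out so thousands of copies do not die in the same second.
    return base_seconds * random.uniform(0.9, 1.1)

def get_with_single_flight(key, compute, base_ttl=300):
    entry = _cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]
    # Only one thread pays for the recomputation; the others wait and reuse it.
    with _recompute_lock:
        entry = _cache.get(key)                   # re-check after acquiring the lock
        if entry and entry[1] > time.time():
            return entry[0]
        value = compute()                         # the expensive miss path
        _cache[key] = (value, time.time() + ttl_with_jitter(base_ttl))
        return value
```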
Simple example
Imagine a product page with:
- product description
- images
- current stock
- current price
Caching the full page may reduce load a lot.
But if stock changes quickly during a promotion, a user may see “in stock,” click buy, and then fail at checkout because the cached page was old.
So the real discussion is not “cache or no cache.”
It is deciding which parts can tolerate delay and which parts must stay fresh.
For example:
- description and images may tolerate more delay
- stock and price may need a much fresher path
That is a much more useful cache conversation.
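Here is a rough sketch of that split, with hypothetical loaders: the stable parts come from the cache, the parts that must not lie come from a fresh read on every request.

```python
import time

_cache = {}   # hypothetical cache: key -> (value, expires_at)

def cached(key, loader, ttl_seconds):
    entry = _cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]
    value = loader()
    _cache[key] = (value, time.time() + ttl_seconds)
    return value

# Hypothetical source reads; stand-ins for real queries or service calls.
def load_description(pid): return f"Description of product {pid}"
def load_images(pid): return [f"/img/{pid}/1.jpg"]
def load_stock(pid): return 3
def load_price(pid): return 19.90

def build_product_page(product_id):
    return {
        # Stable parts: a copy that is a few minutes old is acceptable.
        "description": cached(f"desc:{product_id}",
                              lambda: load_description(product_id), ttl_seconds=600),
        "images": cached(f"imgs:{product_id}",
                         lambda: load_images(product_id), ttl_seconds=600),
        # Parts that must not lie during a promotion: read a fresh path every time.
        "stock": load_stock(product_id),
        "price": load_price(product_id),
    }
```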
Common mistakes
- Adding cache before proving where the real bottleneck is.
- Acting as if invalidation is a small detail to solve later.
- Treating all data as if it had the same tolerance for staleness.
- Forgetting that user trust and perceived consistency are part of the product.
How a senior thinks
A strong senior engineer does not just ask, “Where do we put the cache?”
They ask:
Which read really needs to be cheaper, and how much staleness can the business accept before we start lying to the user?
That question changes the maturity of the decision immediately.
What the interviewer wants to see
In interviews, cache separates shallow performance talk from actual system thinking.
They want to see whether you can:
- describe cache as a trade-off, not a free bonus
- bring up invalidation and freshness early
- connect consistency directly to user experience
Cache speeds up reads, but it also creates distance from the truth.
If you still do not know when the copy stops being valid, the cache design is not finished.
Quick summary
What to keep in your head
- Cache is a temporary copy of the truth, so it always brings questions about freshness, invalidation, and stale reads.
- If a read is slow because of a bad query, missing index, or N+1 problem, cache may hide the symptom without fixing the cause.
- Not every kind of data tolerates the same strategy. Product description can age more than balance, stock, or permissions.
- In interviews, a strong answer shows where cache helps, which trade-off it buys, and how you avoid lying to the user.
Practice checklist
Use this when you answer
- Can I explain cache as a copy with a cost instead of as a magic fix?
- Do I know when to investigate query shape, indexing, or modeling before adding cache?
- Can I separate data that tolerates delay from data that does not?
- Can I discuss TTL and invalidation as product and architecture decisions, not just infrastructure settings?