February 20 2025
Cache and Consistency in Real Systems
How to think about cache as a copy with a consistency cost, not as a magic patch for slow reads.
Andrews Ribeiro
Founder & Engineer
5 min · Intermediate · Systems
Track
System Design Interviews - From Basics to Advanced
Step 5 / 19
The problem
Cache often enters the conversation by reflex.
The read is slow?
- add cache
The endpoint is struggling?
- add cache
The database is expensive?
- add cache
That sounds practical.
But a lot of the time it is just a neat way to move the problem somewhere else.
Because the real question is not:
- “Can we add cache?”
The real question is:
- “Are we speeding up the right read or hiding a weak design?”
If the query is bad, the index is missing, the ORM is causing N+1, or the screen asks for too much data, cache may improve the benchmark while making the system harder to understand.
And worse:
it may improve the benchmark while serving stale data to the user.
Mental model
Think about it like this:
Cache is a temporary copy of the truth, created so the system does not have to fetch the original every time.
That sentence already clears up half the confusion.
If cache is a copy, the conversation changes.
You have to answer:
- how stale can this copy be?
- who updates it?
- when does it stop being trustworthy?
- what happens if the user sees that delay?
So cache is not only about performance.
It is performance bought with consistency risk.
Breaking it down
Good cache speeds up repeated and expensive reads
Cache usually makes a lot of sense when you have a pattern like this:
- the same read happens all the time
- fetching the source is expensive
- a small delay is acceptable
- the update policy is understandable
Common examples:
- product page content
- public configuration
- rankings that can lag a little
- heavily read lists with rare updates
In those cases, the temporary copy often pays for itself.
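To make that concrete, here is a minimal cache-aside sketch in Python. The in-memory cache, `load_product`, and the TTL value are all hypothetical; the point is only the shape of the pattern: check the copy, fall back to the source, store the copy with an expiry.

```python
import time

_cache = {}   # hypothetical in-memory cache: key -> (value, expires_at)

def load_product(product_id):
    # Stand-in for the expensive read (database query, remote call, ...).
    return {"id": product_id, "description": "..."}

def get_product(product_id, ttl_seconds=300):
    key = f"product:{product_id}"
    entry = _cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                                  # hit: serve the temporary copy
    value = load_product(product_id)                     # miss: go back to the source
    _cache[key] = (value, time.time() + ttl_seconds)     # store the copy with an expiry
    return value
```

Every question from the mental model still applies: this copy can be up to five minutes old, and nothing here updates it when the product changes.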
Bad cache becomes makeup for a misunderstood cause
This is the most common mistake.
The read is slow, but the cause may be:
- bad query shape
- missing index
- N+1
- excessive join cost
- unnecessary select *
- a screen asking for too much data
If you throw cache on top without understanding that, you may:
- preserve the original cause
- increase complexity
- make debugging harder
- create stale reads
So maturity here starts with one simple question:
- “Why is this read expensive right now?”
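To make the N+1 case concrete, here is a hedged sketch using an in-memory SQLite database with an invented schema. The slow version issues one extra query per order; the fix is a single join. Caching the slow version would hide the extra round trips instead of removing them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Bruno');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 2, 7.5);
""")

# N+1 shape: one query for the orders, then one extra query per order.
orders = conn.execute("SELECT id, customer_id, total FROM orders").fetchall()
slow = [
    (order_id,
     conn.execute("SELECT name FROM customers WHERE id = ?", (customer_id,)).fetchone()[0],
     total)
    for order_id, customer_id, total in orders
]

# The actual fix: the same result in a single query, no cache involved.
fast = conn.execute("""
    SELECT o.id, c.name, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
""").fetchall()

assert sorted(slow) == sorted(fast)
```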
TTL is not a full strategy
Many people think the cache design is done when they define:
TTL = 5 minutes
But TTL is only one part.
It answers:
- how long the copy can live before forced expiration
It does not answer everything else.
You still need to think about:
- what if the data changes before that?
- what if many copies expire at the same time?
- what if this read cannot wait that long?
TTL helps.
It does not replace invalidation thinking.
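One way to see the gap, as a rough sketch with invented names: the write path drops the copy instead of waiting for the TTL. Expiration and invalidation are two different mechanisms, and TTL only gives you the first.

```python
import time

_db = {"42": 19.90}          # stand-in for the source of truth
_cache = {}                  # hypothetical cache: key -> (value, expires_at)
TTL_SECONDS = 300            # the TTL stays, but only as a safety net

def read_price(product_id):
    key = f"price:{product_id}"
    entry = _cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                                   # serve the copy
    price = _db[product_id]                               # miss: read the source
    _cache[key] = (price, time.time() + TTL_SECONDS)
    return price

def update_price(product_id, new_price):
    _db[product_id] = new_price                           # write the truth
    _cache.pop(f"price:{product_id}", None)               # invalidate now, do not wait 5 minutes
```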
Different data deserves different tolerance
This gets ignored far too often.
A small delay may be acceptable for:
- product description
- avatar
- ranking
- like counts
The same delay may be terrible for:
- balance
- tight inventory
- permissions
- payment state
If you use the same cache strategy for everything, the cache stops being an optimization and starts becoming a nicely packaged lie.
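A hedged sketch of what different tolerance can look like in practice: a freshness budget per kind of data instead of one global TTL. The numbers and names here are invented; the point is that the decision is made per data class, not once for the whole system.

```python
# Hypothetical staleness budgets, in seconds, decided per kind of data.
FRESHNESS_POLICY = {
    "product_description": 3600,   # can age for an hour
    "avatar":              3600,
    "ranking":              300,
    "like_count":            60,
    "inventory":              0,   # 0 means: do not serve a copy, read the source
    "account_balance":        0,
    "permissions":            0,
}

def max_staleness(data_class):
    # Unknown data defaults to "no staleness allowed" rather than a silent guess.
    return FRESHNESS_POLICY.get(data_class, 0)
```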
Invalidation is a product decision too
Another common mistake is treating invalidation as only an infrastructure detail.
It is not.
The question “when should this cache die?” often depends on:
- user impact
- risk of stale data
- update flow
- read frequency
That is why good invalidation does not come only from the tool.
It comes from understanding what truth the user expects on that screen or in that flow.
Using cache at the wrong layer also hurts
Sometimes the issue is not whether to cache.
It is where to cache.
You can cache:
- the database query
- the API response
- a fragment of the page
- an object in the application
- an edge or CDN response
Each layer has different trade-offs.
Caching the whole page when only the price changes quickly may be worse than caching only the stable part.
Caching per user when most of the content is shared may waste memory.
So besides deciding whether to use cache, you also need to decide where.
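A small sketch of the per-user point, with invented loaders and keys: when most of the content is shared, caching per user copies the same bytes for every user, while a shared key plus a tiny per-user overlay keeps one copy of the heavy part.

```python
_cache = {}

def cached(key, loader):
    # Minimal cache helper without TTL, just to show where the keys differ.
    if key not in _cache:
        _cache[key] = loader()
    return _cache[key]

def load_catalog():
    # Heavy shared content, identical for every user.
    return [{"id": i, "name": f"item {i}"} for i in range(1000)]

def load_favorites(user_id):
    # Tiny per-user content.
    return [user_id % 7]

# Wasteful: one full copy of the catalog per user.
def catalog_per_user(user_id):
    return cached(f"catalog:user:{user_id}",
                  lambda: {"items": load_catalog(), "favorites": load_favorites(user_id)})

# Better here: cache the shared part once, keep only the small overlay per user.
def catalog_shared_plus_overlay(user_id):
    items = cached("catalog:shared", load_catalog)
    favorites = cached(f"favorites:{user_id}", lambda: load_favorites(user_id))
    return {"items": items, "favorites": favorites}
```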
Misses and stampedes are part of the design too
This is another place where mature answers stand out:
cache is not only about the happy path where every read is a hit.
There is also:
- expensive miss
- synchronized expiration
- bursts of recomputation
If many requests find the copy expired at the same time, you can push the whole load back to the origin at the worst possible moment.
So cache design also needs to consider what happens when cache stops helping.
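Two common mitigations, sketched here with invented names: add jitter so copies do not expire in sync, and let only one caller recompute a missing value while the others wait for it. A real system would use a distributed lock or request coalescing in the cache layer rather than a process-local lock.

```python
import random
import threading
import time

_cache = {}                           # hypothetical cache: key -> (value, expires_at)
_recompute_lock = threading.Lock()

def ttl_with_jitter(base_seconds):
    # Spread expirations out so thousands of copies do not die in the same second.
    return base_seconds * random.uniform(0.9, 1.1)

def get_with_single_flight(key, compute, base_ttl=300):
    entry = _cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]
    # Only one thread pays for the recomputation; the others wait and reuse it.
    with _recompute_lock:
        entry = _cache.get(key)                   # re-check after acquiring the lock
        if entry and entry[1] > time.time():
            return entry[0]
        value = compute()                         # the expensive miss path
        _cache[key] = (value, time.time() + ttl_with_jitter(base_ttl))
        return value
```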
Simple example
Imagine a product page with:
- product description
- images
- current stock
- current price
Caching the full page may reduce load a lot.
But if stock changes quickly during a promotion, a user may see “in stock,” click buy, and then fail at checkout because the cached page was old.
So the real discussion is not “cache or no cache.”
It is deciding which parts can tolerate delay and which parts must stay fresh.
For example:
- description and images may tolerate more delay
- stock and price may need a much fresher path
That is a much more useful cache conversation.
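Here is a rough sketch of that split, with hypothetical loaders: the stable parts come from the cache, the parts that must not lie come from a fresh read on every request.

```python
import time

_cache = {}   # hypothetical cache: key -> (value, expires_at)

def cached(key, loader, ttl_seconds):
    entry = _cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]
    value = loader()
    _cache[key] = (value, time.time() + ttl_seconds)
    return value

# Hypothetical source reads; stand-ins for real queries or service calls.
def load_description(pid): return f"Description of product {pid}"
def load_images(pid): return [f"/img/{pid}/1.jpg"]
def load_stock(pid): return 3
def load_price(pid): return 19.90

def build_product_page(product_id):
    return {
        # Stable parts: a copy that is a few minutes old is acceptable.
        "description": cached(f"desc:{product_id}",
                              lambda: load_description(product_id), ttl_seconds=600),
        "images": cached(f"imgs:{product_id}",
                         lambda: load_images(product_id), ttl_seconds=600),
        # Parts that must not lie during a promotion: read a fresh path every time.
        "stock": load_stock(product_id),
        "price": load_price(product_id),
    }
```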
Common mistakes
- Adding cache before proving where the real bottleneck is.
- Acting as if invalidation is a small detail to solve later.
- Treating all data as if it had the same tolerance for staleness.
- Forgetting that user trust and perceived consistency are part of the product.
How a senior thinks
A strong senior engineer does not just ask, “Where do we put the cache?”
They ask:
Which read really needs to be cheaper, and how much staleness can the business accept before we start lying to the user?
That question changes the maturity of the decision immediately.
What the interviewer wants to see
In interviews, cache separates shallow performance talk from actual system thinking.
They want to see whether you can:
- describe cache as a trade-off, not a free bonus
- bring up invalidation and freshness early
- connect consistency directly to user experience
Cache speeds up reads, but it also creates distance from the truth.
If you still do not know when the copy stops being valid, the cache design is not finished.
Quick summary
What to keep in your head
- Cache is a temporary copy of the truth, so it always brings questions about freshness, invalidation, and stale reads.
- If a read is slow because of a bad query, missing index, or N+1 problem, cache may hide the symptom without fixing the cause.
- Not every kind of data tolerates the same strategy. Product description can age more than balance, stock, or permissions.
- In interviews, a strong answer shows where cache helps, which trade-off it buys, and how you avoid lying to the user.
Practice checklist
Use this when you answer
- Can I explain cache as a copy with a cost instead of as a magic fix?
- Do I know when to investigate query shape, indexing, or modeling before adding cache?
- Can I separate data that tolerates delay from data that does not?
- Can I discuss TTL and invalidation as product and architecture decisions, not just infrastructure settings?