Patterns of Distributed Systems in C# and .NET: A New Series for People Who Ship Real Systems

Distributed systems do not fail because you missed a feature. They fail because responsibility is unclear. Two nodes act, both think they are right, and your data becomes a debate. This series is my pushback against cargo cult architecture. We are going to talk about the small, repeatable techniques that stop outages, not the buzzwords that decorate slides.

Unmesh Joshi’s Patterns of Distributed Systems collects those techniques into a catalog you can apply. Each pattern names a recurring problem and proposes a practical move. My goal in this series is to translate those moves into C# and .NET code you can lift into production with minimal ceremony.

Catalog link: https://martinfowler.com/articles/patterns-of-distributed-systems/

Series Purpose

This series exists for one reason: to make the behavior of distributed systems explicit. You will see how the patterns shape ownership, ordering, and acknowledgement, so your system stops relying on luck.

Across the posts, you will learn how to:

  • Choose a single decision maker when your business rules assume one
  • Keep state consistent when nodes and networks misbehave
  • Make time and ordering explicit instead of trusting clocks
  • Build recovery paths that do not create new failures

Each post focuses on one pattern. You will get the intent, the failure story that pattern prevents, and C# examples you can adapt directly. I will also include a link back to the catalog entry so you can cross-check details.

Series Pattern List

The table below lists the patterns I plan to cover. After each post is published, I will replace the placeholder with a link to that post.

  • Leader and Followers: one node decides for a group; the others replicate the result
  • Leader Election: select a leader and replace it safely after failure
  • Lease: grant time-bound ownership that expires unless it is renewed
  • Fencing Token: reject writes from a stale leader after a failover
  • Generation Clock: track leadership terms so every decision carries an epoch
  • Quorum: use a majority for reads and writes to survive failures
  • Compare and Swap: conditional writes that prevent lost updates
  • Heartbeat: publish liveness and detect silence early
  • Phi Accrual Failure Detector: suspicion-based failure detection instead of fixed thresholds
  • Gossip Dissemination: share membership state without a central coordinator
  • Write Ahead Log: persist intent before applying state changes
  • Segmented Log: manage retention and compaction by splitting logs into segments
  • High Water Mark: define what is committed and safe for clients to read
  • Low Water Mark: define what can be deleted because no consumer needs it
  • Idempotent Receiver: make duplicate delivery harmless
  • Transactional Outbox: publish events reliably without losing them
  • Saga: long-running workflows with compensations
  • Retry: recover from transient failure without causing a stampede
  • Timeout: bound waiting so failures do not spread
  • Circuit Breaker: stop hammering a dependency that is already failing
  • Request Waiting List: track pending requests until their completion criteria are met
  • Read Repair: heal replica drift during reads
  • Hinted Handoff: buffer writes for down replicas and replay them later
  • Anti Entropy: background reconciliation to reduce drift
  • Merkle Tree: find differences quickly by comparing hashes range by range
  • Lamport Clock: ordering without trusting wall clocks
  • Hybrid Logical Clock: causality plus physical time for easier diagnosis
  • Version Vector: detect concurrent writes so you do not overwrite silently
  • Two Phase Commit: coordinate a commit decision across participants

What I Am Not Going to Do

I am not going to treat these patterns as optional decoration. If your system runs on more than one node, you are already living with these problems. You can solve them intentionally, or you can solve them at 2 a.m. while production is on fire.

A Short Story About Two Leaders and One Pizza Order

A team once had a nightly invoicing job that everyone described as “single instance.” That was not a design, it was a wish. Then the service gained a second node for availability. One night, both nodes started at the same moment. Both ran the job. Customers woke up to duplicate invoices. Finance woke up to a phone queue that sounded like a denial-of-service attack.

The fix was embarrassingly simple once the team stopped negotiating with reality. We made one node the leader for that job using a lease, and we fenced writes with a term so the old leader could not keep invoicing after a takeover. The next morning, the only duplicates were the pizza slices we ordered to celebrate not being on a call.
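To make the shape of that fix concrete, here is a minimal in-memory sketch of a lease whose term doubles as a fencing token. It is illustrative only: all the type and member names are mine, and in production the lease store would live in a shared system such as a database row, etcd, or ZooKeeper, not in process memory.

```csharp
using System;
using System.Collections.Generic;

// A lease with a monotonically increasing term. Every successful
// acquisition, including a takeover after expiry, gets a larger term.
public sealed class LeaseStore
{
    private string? _holder;
    private long _term;            // the fencing token
    private DateTime _expiresAt;

    // Try to acquire the lease; returns the new term on success, null otherwise.
    public long? TryAcquire(string node, TimeSpan duration, DateTime now)
    {
        if (_holder is not null && now < _expiresAt && _holder != node)
            return null;           // someone else still holds a live lease

        _holder = node;
        _term++;                   // strictly larger term on every acquisition
        _expiresAt = now + duration;
        return _term;
    }
}

// A downstream writer that rejects any write carrying a term older than
// the largest one it has seen, so a deposed leader cannot keep invoicing.
public sealed class FencedInvoiceWriter
{
    private long _highestTermSeen;
    public List<string> Written { get; } = new();

    public bool TryWrite(long term, string invoice)
    {
        if (term < _highestTermSeen) return false;  // stale leader: fenced out
        _highestTermSeen = term;
        Written.Add(invoice);
        return true;
    }
}
```

In the duplicate-invoice scenario, node A acquires term 1 and writes; its lease expires during a pause; node B acquires term 2 and writes; when node A wakes up and retries with term 1, the writer rejects it. The key design choice is that the resource being protected, not the leader, enforces the fence.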

That is why these patterns matter. They turn “we assume only one thing happens” into “only one thing can happen.”
