Low Latency Trading Insights

The Cloudflare Outage and the Rust Marketing Problem

What Happens When Language Safety Promises Meet Reality

Henrique Bucher
Nov 19, 2025

On November 18, 2025, a significant portion of the internet went offline. X, ChatGPT, Canva, Letterboxd, and countless other services became inaccessible. The culprit? A single unwrap() call in Rust code that panicked across Cloudflare’s 330+ datacenters. The incident has sparked a predictable wave of commentary: some defending Rust’s actual guarantees, others claiming vindication that “Rust isn’t special,” and most people somewhere in between, confused about what just happened and why.

The real story here isn’t a technical failure. It’s a marketing failure. Not Cloudflare’s marketing, but Rust’s.

What Actually Happened: The Mundane Chain of Failure

Let’s start with the facts, because they matter more than the narrative.

On November 18 at 11:05 UTC, Cloudflare engineers made a database permission change to their ClickHouse cluster. The change was intended to improve security and reliability by allowing distributed queries to run under individual user accounts instead of a shared system account. Sensible, defensive engineering.

But this change had an unintended consequence. A query that generated the feature file for Cloudflare’s Bot Management machine learning model (metadata about how to detect malicious traffic) began returning results from both the “default” database and the underlying “r0” database. Without explicit filtering by database, the query concatenated these results, creating duplicate entries in the feature file.

The feature file normally contained a modest number of entries. On November 18, it suddenly doubled in size, exceeding a hardcoded limit of 200 features.

Here’s where the Rust code enters the story. Cloudflare was in the middle of migrating from their old proxy engine (FL) to a new one called FL2, written in Rust. When the oversized feature file propagated to all FL2 machines, this code ran:

Code Block 1

Actually, it was slightly more subtle. The panic came from an unwrap() call on a Result that failed:

Code Block 2

The thread panicked: called Result::unwrap() on an Err value. By 11:20 UTC, FL2 workers across Cloudflare’s network were crashing. HTTP requests returned 5xx errors. The internet experienced a distributed denial of service—not from attackers, but from its own infrastructure.
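
To make the failure mode concrete, here is a minimal sketch of the pattern, not Cloudflare’s actual code. Every name in it (MAX_FEATURES, LoadError, load_features, load_or_keep) is hypothetical, and the 120-entry baseline is an arbitrary number chosen only so that doubling it crosses the 200-feature limit. It contrasts calling unwrap() on the Result with handling the Err by keeping the last known-good configuration.

```rust
/// Hypothetical cap mirroring the 200-feature limit described above.
const MAX_FEATURES: usize = 200;

/// Hypothetical error type for a rejected feature file.
#[derive(Debug, PartialEq)]
enum LoadError {
    TooManyFeatures { got: usize, max: usize },
}

/// Validates a feature list against the hardcoded limit.
/// Returns Err instead of panicking when the list is too long.
fn load_features(names: &[String]) -> Result<Vec<String>, LoadError> {
    if names.len() > MAX_FEATURES {
        return Err(LoadError::TooManyFeatures {
            got: names.len(),
            max: MAX_FEATURES,
        });
    }
    Ok(names.to_vec())
}

/// One possible alternative to unwrap(): log the error and keep
/// serving with the previous, known-good feature set.
fn load_or_keep(previous: Vec<String>, names: &[String]) -> Vec<String> {
    match load_features(names) {
        Ok(fresh) => fresh,
        Err(err) => {
            eprintln!("rejecting feature file: {err:?}");
            previous
        }
    }
}

fn main() {
    // Arbitrary baseline size, assumed for illustration only.
    let normal: Vec<String> = (0..120).map(|i| format!("feature_{i}")).collect();
    // Duplicate rows double the entry count past the limit.
    let doubled: Vec<String> = normal.iter().chain(normal.iter()).cloned().collect();

    let active = load_or_keep(Vec::new(), &normal);
    let active = load_or_keep(active, &doubled);
    println!("active features: {}", active.len());

    // By contrast, this line would panic and take the worker down:
    // let features = load_features(&doubled).unwrap();
}
```

In this sketch the compiler’s guarantees are intact in both versions. The difference is a policy choice: unwrap() says “an error here is impossible, so crash if it happens,” while the match arm forces a decision about what the process should do when the input is bad.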

The incident lasted roughly three hours until engineers could roll back the database change and deploy patches.

The Code That Broke the Internet

Let’s be explicit about what happened, because this is where the Rust conversation gets interesting.

© 2025 Henrique Bucher