Cloudflare Failure: What led to the massive internet outage that affected ChatGPT and X.

19 November 2025

On November 18, 2025, websites that rely on Cloudflare began returning widespread HTTP 5xx errors, disrupting a significant portion of the internet. The outage was global and affected everything from core content delivery to security and authentication services.

While the immediate suspicion for any incident of this scale often lands on a major cyberattack, Cloudflare has confirmed that the true cause was far simpler and far more frustrating: a single configuration error that cascaded through its network.

Here is a deep dive into what happened, the technical root cause, and the steps Cloudflare is taking to ensure it never happens again.

1. What Happened? (The Incident Snapshot)

Starting at 11:20 UTC on November 18, 2025, Cloudflare’s network began failing to deliver traffic, and customers received generic HTTP 5xx error pages. The failure originated inside Cloudflare’s own systems, not from an external compromise.

The critical insight? This wasn’t a malicious attack, but a bug triggered by an internal configuration change to a database permissions system.

2. The Core Technical Failure: Duplicate Data, Fatal Size Limit

The root cause was traced back to a seemingly innocuous change in how a database permissions system was handled. This triggered an unexpected and catastrophic chain reaction:

Duplicate Data: The permissions change caused the underlying database (Cloudflare uses ClickHouse for this system) to produce duplicate rows when querying configuration data for the Bot Management system.
Configuration File Bloat: This duplicate data effectively doubled the size of a key configuration file, called the “feature file,” which is used by Cloudflare’s core proxy to determine if traffic is human or bot.
Crossing the Threshold: The feature file, which typically contains around 60 features, suddenly swelled, exceeding a hardcoded internal limit of 200 allowed features.
System Crash: When this oversized, invalid file was distributed to thousands of Cloudflare edge servers, the Core Proxy (FL / FL2) systems responsible for processing it panicked and crashed, causing the 5xx errors and widespread outage.
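
To make the chain above concrete, here is a minimal Python sketch. It is not Cloudflare’s code: the row layout, the database names, and the function names are invented for illustration. Only the figures of roughly 60 features and a limit of 200 come from the incident description, and the four copies of each row are an assumption chosen so that the sketch crosses the limit.

```python
# Illustrative sketch only. Everything except the numbers 60 and 200 is hypothetical.
MAX_FEATURES = 200  # hard limit on features, as described in the article


class FeatureLimitExceeded(Exception):
    """Raised when a feature file lists more features than the loader allows."""


def build_feature_file(rows):
    """Turn metadata query rows into a feature list, one entry per row (no dedup)."""
    return [name for (_database, name) in rows]


def load_feature_file(features):
    """Mimic a loader that rejects an oversized file outright."""
    if len(features) > MAX_FEATURES:
        raise FeatureLimitExceeded(f"{len(features)} features > limit {MAX_FEATURES}")
    return features


# Normal case: each feature appears once, in a single (hypothetical) database.
normal_rows = [("db_a", f"feature_{i}") for i in range(60)]

# After the permissions change: the same query also returns copies of each row.
# Four copies is an assumption for this sketch, not a figure from the incident.
duplicated_rows = [
    (db, f"feature_{i}") for db in ("db_a", "db_b", "db_c", "db_d") for i in range(60)
]

print(len(load_feature_file(build_feature_file(normal_rows))))  # 60

try:
    load_feature_file(build_feature_file(duplicated_rows))
except FeatureLimitExceeded as exc:
    print("rejected:", exc)  # rejected: 240 features > limit 200
```

In this sketch the oversized file is rejected with an exception that the caller catches. In the real incident the equivalent failure was not handled, which is why the proxy crashed instead of carrying on with its previous configuration.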

3. The Confusing Fluctuation: Good-Bad-Good

For engineers trying to diagnose the issue, the outage was initially erratic and confusing.

The configuration file was being regenerated every 5 minutes. Sometimes the system would generate a correct (small) file, and sometimes it would generate the duplicate-filled (large) file. As a result, the network kept failing and recovering, which led engineers to suspect a large-scale DDoS attack.

Eventually, every database node involved in generating the file was producing the bad version, and the network settled into a persistent failure before the root cause was identified.
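
The flip-flopping can be reproduced with a toy simulation. The sketch below assumes, in line with Cloudflare’s post-incident report, that the permissions change was rolled out gradually across the database nodes and that a generation run only produced a bad file when it landed on a node that had already been updated. The node count, the rollout pace of one node per cycle, and the random node selection are assumptions made purely for illustration.

```python
import random


def simulate_rollout(num_nodes=10, cycles=12, seed=0):
    """Simulate file generation while a change rolls out across database nodes.

    Assumptions for this sketch: one more node is updated per cycle, and each
    cycle's query lands on one node picked at random.
    """
    rng = random.Random(seed)
    updated = [False] * num_nodes
    history = []
    for cycle in range(cycles):
        if cycle < num_nodes:
            updated[cycle] = True  # the gradual rollout reaches the next node
        node = rng.randrange(num_nodes)  # node that happens to run the query
        history.append("bad" if updated[node] else "good")
    return history


history = simulate_rollout()
# Each entry stands for one five-minute generation cycle.
print(" -> ".join(history))
```

Early cycles can go either way, which is the good-bad-good pattern engineers saw. Once every node has been updated, every cycle yields a bad file and the failure becomes permanent.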

4. Cloudflare’s Response & Resolution Timeline

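Cloudflare’s team took several hours to understand and resolve the core issue: about three hours passed between the first errors at 11:20 UTC and the main fix at 14:30 UTC, and full restoration took until 17:06 UTC.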

Time (UTC)	Action Taken
13:05	Partial mitigation by bypassing the failing proxy for Workers KV and Access, reducing immediate impact.
14:24	Stopped the creation of the bad configuration files and tested a known good version.
14:30	The correct, valid feature file was successfully deployed across all servers — leading to major recovery.
17:06	All services were reported as fully restored.
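
For a sense of scale, the elapsed time at each milestone can be worked out from the timestamps in the table and the 11:20 UTC start time. The short Python snippet below does only that arithmetic; the shortened labels are paraphrases of the table rows.

```python
from datetime import datetime

FMT = "%H:%M"
START = datetime.strptime("11:20", FMT)  # first errors, per the article

milestones = [
    ("13:05", "partial mitigation"),
    ("14:24", "bad file generation stopped"),
    ("14:30", "good file deployed"),
    ("17:06", "all services restored"),
]


def elapsed(hhmm):
    """Time between the start of the incident and a same-day UTC timestamp."""
    return datetime.strptime(hhmm, FMT) - START


for hhmm, label in milestones:
    print(f"{hhmm} UTC  +{elapsed(hhmm)}  {label}")
# 13:05 UTC  +1:45:00  partial mitigation
# 14:24 UTC  +3:04:00  bad file generation stopped
# 14:30 UTC  +3:10:00  good file deployed
# 17:06 UTC  +5:46:00  all services restored
```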

5. Services Affected

The cascading failure didn’t just hit websites; it impacted almost every major Cloudflare offering:

Core CDN & Security: Widespread 5xx error pages.
Turnstile: Failed to load, blocking user logins and forms on customer sites.
Workers KV: High error rates and failed requests.
Access: Authentication failures prevented users from logging into applications.
Dashboard: Users couldn’t log in due to the Turnstile dependency.
Email Security: Spam accuracy was temporarily degraded.

6. Commitment to Prevention

Cloudflare has openly acknowledged that this was its worst outage since 2019 and has committed to significant improvements in system resilience:

Better Internal Validation: Implementing stronger checks to validate internally generated configuration files before deployment.
Improved Error Handling: Ensuring that one bad configuration file cannot crash the core traffic processing (proxy) module.
Emergency Kill Switches: Adding immediate “kill switches” for non-essential internal systems.
Strengthening Core Proxies: Increasing the safeguards in the core traffic modules to handle unexpected inputs more gracefully.
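
As a rough illustration of what the first three commitments could look like in practice, here is a small Python sketch of a module that validates a new feature file before accepting it, keeps its last known good file when validation fails, and exposes a kill switch. This is a generic pattern, not Cloudflare’s implementation; the class, method, and field names are all hypothetical, and only the limit of 200 comes from the article.

```python
MAX_FEATURES = 200  # limit mentioned in the article


def validate_features(features):
    """Return a list of problems found in a candidate feature list (empty if none)."""
    problems = []
    if not features:
        problems.append("file is empty")
    if len(features) > MAX_FEATURES:
        problems.append(f"{len(features)} features exceeds limit of {MAX_FEATURES}")
    if len(set(features)) != len(features):
        problems.append("duplicate feature names")
    return problems


class BotModule:
    """Hypothetical module that keeps its last known good configuration."""

    def __init__(self, features):
        self.features = list(features)
        self.enabled = True  # hypothetical kill switch

    def apply_update(self, candidate):
        """Swap in the candidate only if it validates; otherwise keep the old file."""
        problems = validate_features(candidate)
        if problems:
            return problems  # reject and keep serving with the old file
        self.features = list(candidate)
        return []

    def handle_request(self):
        """With the kill switch off, traffic passes through without bot scoring."""
        return "scored" if self.enabled else "passed unscored"


module = BotModule([f"feature_{i}" for i in range(60)])
bad_file = [f"feature_{i % 60}" for i in range(240)]  # duplicated entries
print(module.apply_update(bad_file))  # list of reasons the file was rejected
print(len(module.features))  # still 60: the bad file never became active
```

The key design choice is that a bad file results in a rejected update and an error report, not a crash, so the traffic path keeps running on the previous configuration.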

Conclusion: A Simple Error, A Global Impact

The 2025 Cloudflare outage serves as a stark reminder of how fragile complex, globally scaled systems can be. The cause was a single change with an unforeseen side effect: a change to a database permissions system led to duplicate data, which led to a configuration file that was too big, which ultimately led to the failure of a vast portion of the internet.

It was not a hack, but a technical failure triggered by unexpected behavior during an internal update. Cloudflare has accepted responsibility and the internet will be watching as they implement the necessary engineering improvements to prevent history from repeating itself.
