On November 18, 2025, a significant portion of the internet experienced turbulence as websites reliant on Cloudflare began showing widespread HTTP 5xx errors. The outage was massive, affecting everything from core content delivery to security and authentication services globally.
While the immediate suspicion for any incident of this scale often lands on a major cyberattack, Cloudflare has confirmed the true cause was far simpler, and far more frustrating: a single configuration change whose effects cascaded across its network.
Here is a deep dive into what happened, the technical root cause, and the steps Cloudflare is taking to ensure it never happens again.
1. What Happened? (The Incident Snapshot)
Starting at 11:20 UTC on November 18, 2025, Cloudflare’s network began failing to serve traffic, and customers started receiving generic HTTP 5xx error pages. The failure came from within Cloudflare’s own systems, not from an external compromise.
The critical insight? This wasn’t a malicious attack, but a bug triggered by an internal configuration change to a database permissions system.
2. The Core Technical Failure: Duplicate Data, Fatal Size Limit
The root cause was traced back to a seemingly innocuous change in how a database permissions system was handled. This triggered an unexpected and catastrophic chain reaction:
- Duplicate Data: The permissions change caused the underlying database (Cloudflare uses ClickHouse for this system) to produce duplicate rows when querying configuration data for the Bot Management system.
- Configuration File Bloat: This duplicate data effectively doubled the size of a key configuration file, called the “feature file,” which is used by Cloudflare’s core proxy to determine if traffic is human or bot.
- Crossing the Threshold: The feature file, which typically contains around 60 features, suddenly swelled, exceeding a hardcoded internal limit of 200 allowed features.
- System Crash: When this oversized, invalid file was distributed to thousands of Cloudflare edge servers, the Core Proxy (FL / FL2) systems responsible for processing it panicked and crashed, causing the 5xx errors and widespread outage. A simplified sketch of this failure mode follows the list.
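To make the failure mode concrete, here is a minimal, hypothetical Rust sketch (Cloudflare’s core proxies are written in Rust, but this is not their code; the constant, function, and feature names are illustrative). It shows how a hardcoded entry limit combined with an unrecoverable error path turns one oversized configuration file into a process crash:

```rust
// Hypothetical sketch only: illustrates a hardcoded limit plus a fatal error
// path, not Cloudflare's actual implementation.

const MAX_FEATURES: usize = 200; // hardcoded limit on entries in the feature file

/// Parse the feature file into a list of feature names, refusing oversized input.
fn load_feature_file(raw: &str) -> Result<Vec<String>, String> {
    let features: Vec<String> = raw.lines().map(|line| line.trim().to_string()).collect();
    if features.len() > MAX_FEATURES {
        return Err(format!(
            "feature file has {} entries, exceeding the limit of {}",
            features.len(),
            MAX_FEATURES
        ));
    }
    Ok(features)
}

fn main() {
    // Normal case: roughly 60 features, comfortably under the limit.
    let good: String = (0..60).map(|i| format!("feature_{i}\n")).collect();
    println!("good file: {} features", load_feature_file(&good).unwrap().len());

    // Failure case: duplicate rows inflate the file well past the limit.
    // Unwrapping the error panics and takes the whole process down -- the
    // equivalent of the proxy crashing and returning 5xx for live traffic.
    let bad: String = good.repeat(4); // 240 entries
    let _ = load_feature_file(&bad).unwrap();
}
```

The point is not the specific limit but the error handling: treating an out-of-bounds config as a fatal condition meant the blast radius was the entire proxy, not just the bot-scoring feature.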
3. The Confusing Fluctuation: Good-Bad-Good
For engineers trying to diagnose the issue, the outage was initially erratic and confusing.
The faulty configuration file was regenerated every five minutes, and each run could produce either a correct (small) file or a duplicate-filled (large) one, depending on whether the query happened to run against a part of the database cluster that had already received the permissions update. As a result, the network kept failing and recovering in waves, which initially led to suspicion that a large-scale DDoS attack was targeting Cloudflare.
Eventually, every part of the cluster had been updated, so every regeneration produced the bad file and the network settled into a sustained failure until the root cause was identified.
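The flip-flop behavior can be pictured with a small simulation. This is an assumption-laden sketch, not Cloudflare’s tooling: it models a cluster where only some database nodes have received the permissions update, so each scheduled rebuild produces a clean file or a duplicate-filled one depending on which node answers the query (the feature names are placeholders):

```rust
// Illustrative simulation of the good-bad-good flapping; the cluster model
// and all names are assumptions, not Cloudflare internals.

struct Node {
    has_new_permissions: bool, // whether this node has received the permissions update
}

/// Rebuild the feature file by querying a single node. Nodes that already have
/// the new permissions expose the same columns through an extra database, so
/// every feature comes back twice (duplicate rows).
fn rebuild_feature_file(node: &Node, base_features: &[&str]) -> Vec<String> {
    let mut rows: Vec<String> = base_features.iter().map(|f| f.to_string()).collect();
    if node.has_new_permissions {
        rows.extend(base_features.iter().map(|f| f.to_string()));
    }
    rows
}

fn main() {
    // A two-node cluster mid-rollout: one node updated, one not.
    let cluster = [
        Node { has_new_permissions: false }, // rebuild here yields a good file
        Node { has_new_permissions: true },  // rebuild here yields a duplicate-filled file
    ];
    let base = ["feature_a", "feature_b", "feature_c"]; // placeholder feature names

    // Each scheduled rebuild ("every five minutes") may land on a different node,
    // so the output alternates between good and bad until the rollout completes.
    for (cycle, node) in cluster.iter().cycle().take(6).enumerate() {
        let file = rebuild_feature_file(node, &base);
        let status = if file.len() > base.len() { "BAD (duplicate rows)" } else { "good" };
        println!("rebuild {}: {} rows -> {}", cycle, file.len(), status);
    }
}
```

Once every node has the update, which is the state the cluster eventually reached, every rebuild produces the bad file and the failure becomes constant.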
4. Cloudflare’s Response & Resolution Timeline
Cloudflare’s team took several hours to understand and resolve the core issue.
| Time (UTC) | Action Taken |
| --- | --- |
| 13:05 | Partial mitigation by bypassing the failing proxy for Workers KV and Access, reducing immediate impact. |
| 14:24 | Stopped the creation of the bad configuration files and tested a known good version. |
| 14:30 | The correct, valid feature file was successfully deployed across all servers — leading to major recovery. |
| 17:06 | All services were reported as fully restored. |
5. Services Affected
The cascading failure didn’t just hit websites; it impacted almost every major Cloudflare offering:
- Core CDN & Security: Widespread 5xx error pages.
- Turnstile: Failed to load, blocking user logins and forms on customer sites.
- Workers KV: High error rates and failed requests.
- Access: Authentication failures prevented users from logging into applications.
- Dashboard: Users couldn’t log in due to the Turnstile dependency.
- Email Security: Spam-detection accuracy was temporarily degraded.
6. Commitment to Prevention
Cloudflare has openly acknowledged that this was its worst outage since 2019 and has committed to significant improvements in system resilience:
- Better Internal Validation: Implementing stronger checks to validate internally generated configuration files before deployment.
- Improved Error Handling: Ensuring that a single bad configuration file cannot crash the core traffic-processing (proxy) module (see the sketch after this list).
- Emergency Kill Switches: Adding immediate “kill switches” for non-essential internal systems.
- Strengthening Core Proxies: Increasing the safeguards in the core traffic modules to handle unexpected inputs more gracefully.
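As an illustration of the first two points, here is a hedged sketch of a validate-then-fall-back pattern: reject a generated config that is oversized or contains duplicates, and keep serving with the last known good version rather than crashing. It is a generic pattern under assumed names, not Cloudflare’s actual remediation code:

```rust
// Generic validate-then-fall-back sketch; names and limits are illustrative.

use std::collections::HashSet;

const MAX_FEATURES: usize = 200;

#[derive(Clone)]
struct FeatureConfig {
    features: Vec<String>,
}

/// Reject configs that are oversized or contain duplicate entries.
fn validate(config: &FeatureConfig) -> Result<(), String> {
    if config.features.len() > MAX_FEATURES {
        return Err(format!("too many features: {}", config.features.len()));
    }
    let mut seen = HashSet::new();
    for f in &config.features {
        if !seen.insert(f) {
            return Err(format!("duplicate feature: {f}"));
        }
    }
    Ok(())
}

/// Apply a new config only if it validates; otherwise log and keep the old one.
fn apply_or_keep(current: FeatureConfig, candidate: FeatureConfig) -> FeatureConfig {
    match validate(&candidate) {
        Ok(()) => candidate,
        Err(reason) => {
            eprintln!("rejecting new config ({reason}); keeping last known good");
            current
        }
    }
}

fn main() {
    let good = FeatureConfig {
        features: (0..60).map(|i| format!("feature_{i}")).collect(),
    };
    // A duplicate-filled candidate, like the faulty feature file.
    let bad = FeatureConfig {
        features: good.features.iter().cloned().chain(good.features.iter().cloned()).collect(),
    };
    let active = apply_or_keep(good.clone(), bad);
    println!("active config has {} features", active.features.len()); // still 60
}
```

The design goal this pattern captures is containment: a bad input degrades one subsystem, which keeps using yesterday’s feature list, instead of taking down the proxy that carries all traffic.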
Conclusion: A Simple Error, A Global Impact
The 2025 Cloudflare outage serves as a stark reminder of how fragile complex, globally scaled systems can be. The cause was a single, simple mistake—a change in a database permission system led to duplicate data, which led to a configuration file that was too big, which ultimately led to the failure of a vast portion of the internet.
It was not a hack, but a technical failure triggered by unexpected behavior during an internal update. Cloudflare has accepted responsibility, and the internet will be watching as it implements the necessary engineering improvements to prevent history from repeating itself.


