Cloudflare error

WBahn

Joined Mar 31, 2012
32,703
Yep. So much for "Check back in a few minutes."

At least now we can get our AAC fix!

I don't know which is more troubling, that the forums were unavailable for a few hours, or that some of us (myself obviously included) are more bothered by it than we probably should be. Can anyone say, "Addiction!"?
 

WBahn

Joined Mar 31, 2012
32,703
Hello,

The Cloudflare error has been worldwide.
AAC was viewable again for a short time, and then the error returned.

More info on the cloudflare site:
https://www.cloudflarestatus.com/

Bertus
I must admit, they seem pretty good about posting updated status messages, even if just to say that they are still investigating. Much better than leaving people swinging in the wind.

Also, when you think about it, identifying and fixing a worldwide issue in three hours is pretty incredible. Clearly a lot of experienced and well-trained people were involved who were able to respond well in a crisis.
 

Thread Starter

MrChips

Joined Oct 2, 2009
34,628
Initial interruption on 2025.11.18 11:48 UTC
Some services restored at 14:42 UTC

Nov 18, 2025 - 17:44 UTC
Update - Cloudflare services are currently operating normally. We are no longer observing elevated errors or latency across the network.

Our engineering teams continue to closely monitor the platform and perform a deeper investigation into the earlier disruption, but no configuration changes are being made at this time.

At this point, it is considered safe to re-enable any Cloudflare services that were temporarily disabled during the incident. We will provide a final update once our investigation is complete.
Nov 18, 2025 - 17:44 UTC
:
:

Monitoring - A fix has been implemented and we believe the incident is now resolved. We are continuing to monitor for errors to ensure all services are back to normal.
Nov 18, 2025 - 14:42 UTC
Update - We've deployed a change which has restored dashboard services. We are still working to remediate broad application services impact.
Nov 18, 2025 - 14:34 UTC
 

WBahn

Joined Mar 31, 2012
32,703
Initial interruption on 2025.11.18 11:48 UTC
Some services restored at 14:42 UTC

Nov 18, 2025 - 17:44 UTC
Update - Cloudflare services are currently operating normally. We are no longer observing elevated errors or latency across the network.

Our engineering teams continue to closely monitor the platform and perform a deeper investigation into the earlier disruption, but no configuration changes are being made at this time.

At this point, it is considered safe to re-enable any Cloudflare services that were temporarily disabled during the incident. We will provide a final update once our investigation is complete.
Nov 18, 2025 - 17:44 UTC
:
:

Monitoring - A fix has been implemented and we believe the incident is now resolved. We are continuing to monitor for errors to ensure all services are back to normal.
Nov 18, 2025 - 14:42 UTC
Update - We've deployed a change which has restored dashboard services. We are still working to remediate broad application services impact.
Nov 18, 2025 - 14:34 UTC
I had a brief outage about 1.5 hours ago. I don't know when it started, but all I had to do was try again a few seconds later and it worked. That's something I see from time to time, so it might not be related to this incident at all.
 

nsaspook

Joined Aug 27, 2009
16,250
Cloudflare CTO speaks out:
I won’t mince words: earlier today we failed our customers and the broader Internet when a problem in Cloudflare's network impacted large amounts of traffic that rely on us. The sites, businesses, and organizations that rely on Cloudflare depend on us being available and I apologize for the impact that we caused. Transparency about what happened matters, and we plan to share a breakdown with more details in a few hours. In short, a latent bug in a service underpinning our bot mitigation capability started to crash after a routine configuration change we made. That cascaded into a broad degradation to our network and other services. This was not an attack. That issue, the impact it caused, and the time to resolution are unacceptable. Work is already underway to make sure it does not happen again, but I know it caused real pain today. The trust our customers place in us is what we value the most and we are going to do what it takes to earn that back.
 

joeyd999

Joined Jun 6, 2011
6,204
Yes, I know that you could have single-handedly identified the problem, developed a solution, and implemented the solution within at most a couple of minutes, but the rest of humanity isn't judged against the most superior standard that your skills would obviously set.
You should consider the possibility that you may have misinterpreted the motivation behind my post.
 

nsaspook

Joined Aug 27, 2009
16,250
Sure, a C kernel patch, to fix Rust code (the solution for memory errors ;) ), for a database permissions problem, caused by an 'AI' agent. Do better next time, but thanks for playing.

https://blog.cloudflare.com/18-november-2025-outage/

One of those modules, Bot Management, was the source of today’s outage.

Cloudflare’s Bot Management includes, among other systems, a machine learning model that we use to generate bot scores for every request traversing our network. Our customers use bot scores to control which bots are allowed to access their sites — or not.

...
When the bad file with more than 200 features was propagated to our servers, this limit was hit — resulting in the system panicking. The FL2 Rust code that makes the check and was the source of the unhandled error is shown below:

[code that generated the error]

This resulted in the following panic which in turn resulted in a 5xx error:

thread fl2_worker_thread panicked: called `Result::unwrap()` on an `Err` value
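
For anyone who hasn't seen that failure mode before, here's a minimal sketch of it in Rust. This is not Cloudflare's actual FL2 code (the blog only shows a fragment); every name and the file format below are made up for illustration. The only real details carried over are the 200-feature limit and the `unwrap()`-on-`Err` panic:

```rust
// Hypothetical sketch of the unwrap-induced panic, loosely modeled on the
// incident description. All names here are invented, not from FL2.

const MAX_FEATURES: usize = 200; // the configured limit mentioned in the blog

// Parse a propagated feature file (one feature per line, for illustration)
// into a feature list, enforcing the limit.
fn load_features(file_contents: &str) -> Result<Vec<String>, String> {
    let features: Vec<String> = file_contents
        .lines()
        .map(|line| line.to_string())
        .collect();
    if features.len() > MAX_FEATURES {
        // Returning Err here is the right move -- the bug is in how the
        // caller reacts to it.
        return Err(format!(
            "too many features: {} > {}",
            features.len(),
            MAX_FEATURES
        ));
    }
    Ok(features)
}

fn main() {
    // A "bad file" with more entries than the limit allows.
    let bad_file: String = (0..250).map(|i| format!("feature_{i}\n")).collect();

    // This is the failure mode: .unwrap() on an Err aborts the thread with
    // "called `Result::unwrap()` on an `Err` value" instead of degrading
    // gracefully, and anything depending on this thread starts throwing 5xx.
    let _features = load_features(&bad_file).unwrap();

    // A safer caller would match on the Result (or propagate with `?`) and
    // fall back to a known-good feature set:
    // match load_features(&bad_file) {
    //     Ok(f) => serve_with(f),
    //     Err(e) => log_and_use_last_good_config(e),
    // }
}
```

The point is that the limit check itself worked; it was the caller treating "limit exceeded" as impossible that turned a bad config file into a crash.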


 

Futurist

Joined Apr 8, 2025
721
Sure, a C kernel patch, to fix Rust code (the solution for memory errors ;) ), for a database permissions problem, caused by an 'AI' agent. Do better next time, but thanks for playing.

https://blog.cloudflare.com/18-november-2025-outage/

One of those modules, Bot Management, was the source of today’s outage.

Cloudflare’s Bot Management includes, among other systems, a machine learning model that we use to generate bot scores for every request traversing our network. Our customers use bot scores to control which bots are allowed to access their sites — or not.

...
When the bad file with more than 200 features was propagated to our servers, this limit was hit — resulting in the system panicking. The FL2 Rust code that makes the check and was the source of the unhandled error is shown below:

[code that generated the error]

This resulted in the following panic which in turn resulted in a 5xx error:

thread fl2_worker_thread panicked: called `Result::unwrap()` on an `Err` value



What's a "panic"? Is that a Linux term? Real operating systems never panic; they act rationally and decisively at all times. Only losers panic.
;)

 