
Global IT outages have hit airlines and businesses worldwide


Recommended Posts

WWW.CNBC.COM

Crowdstrike global IT outage hits the heart of the global supply chain, with air freight, rails, ports and trucking in the U.S. and beyond down.

 

Quote

 

The CrowdStrike software bug that crashed Microsoft operating systems and caused the largest IT outage in history caused disruptions at U.S. and global ports, with highly complex air freight systems suffering the heaviest hit, according to logistics experts, as global airlines grounded flights.

 

“Planes and cargo are not where they are supposed to be and it will take days or even weeks to fully resolve,” Niall van de Wouw, chief air freight officer at supply chain consulting firm Xeneta, said in a statement shared with CNBC. “This is a reminder of how vulnerable our ocean and air supply chains are to IT failure.”

 

 

  • Sad 1
Link to comment
Share on other sites

2 minutes ago, Commissar SFLUFAN said:
WWW.CNBC.COM

Crowdstrike global IT outage hits the heart of the global supply chain, with air freight, rails, ports and trucking in the U.S. and beyond down.

 

 


One of the problems with our hyper-efficient supply chains, thanks to JIT manufacturing, is that holy moly do things get all out of whack in relatively short order, and the dominoes don't stop falling for quite some time.

Link to comment
Share on other sites

1 minute ago, sblfilms said:


One of the problems with our hyper-efficient supply chains, thanks to JIT manufacturing, is that holy moly do things get all out of whack in relatively short order, and the dominoes don't stop falling for quite some time.

 

JIT ensures that the logistics chain operates on the knife's edge of disaster, with little-to-no redundancy.

Link to comment
Share on other sites

What I want to know is why this patch was distributed everywhere without question. Why doesn't each client using this software do a local test of a new patch before deploying it into prod?

 

I mean, CrowdStrike is deserving of blame and should probably go under for this. But why did none of its users think to test updates before applying them?
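
For what it's worth, the gate being asked about here doesn't have to be elaborate. Below is a minimal sketch, in Python, of a "canary first" check: apply the update to one disposable test machine, let it soak, and only approve the fleet rollout if the box is still healthy. The host name and the install/health-check commands are placeholders invented for illustration; this is not how CrowdStrike or any specific vendor distributes updates.

```python
# Minimal sketch of a "canary first" update gate. CANARY_HOST and both commands
# are placeholders for illustration only; nothing here is CrowdStrike-specific.
import subprocess
import time

CANARY_HOST = "canary-vm-01"                                # hypothetical disposable test VM
INSTALL_CMD = ["ssh", CANARY_HOST, "apply-agent-update"]    # placeholder install command
HEALTH_CMD = ["ssh", CANARY_HOST, "systemctl", "is-system-running"]  # placeholder health check

def canary_passes(soak_minutes: int = 30) -> bool:
    """Apply the update to one host, let it soak, then confirm the host is still healthy."""
    if subprocess.run(INSTALL_CMD).returncode != 0:
        return False
    time.sleep(soak_minutes * 60)   # give the canary time to reboot (or crash) if it's going to
    return subprocess.run(HEALTH_CMD).returncode == 0

if __name__ == "__main__":
    if canary_passes():
        print("Canary healthy: safe to schedule the fleet rollout.")
    else:
        print("Canary failed: hold the update and investigate.")
```

Even a gate this crude catches the failure mode everyone hit here: a machine that never comes back after the update.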

Link to comment
Share on other sites

7 minutes ago, legend said:

I mean, CrowdStrike is deserving of blame and should probably go under for this. But why did none of its users think to test updates before applying them?

 

That's what big tech firms usually do: engineering tests it locally for a few weeks before deploying.

Link to comment
Share on other sites

16 minutes ago, legend said:

What I want to know is why this patch was distributed everywhere without question. Why doesn't each client using this software do a local test of a new patch before deploying it into prod?

 

I mean, CrowdStrike is deserving of blame and should probably go under for this. But why did none of its users think to test updates before applying them?

 

Wouldn't be the first time. I haven't dug into this one too much since it didn't affect our office, but I remember about a decade ago when McAfee bricked Windows desktops. In that case, it wasn't an untested update to the software itself, but to the security definitions. There are more and more applications that self-update without IT intervention because the update is supposedly limited to the application itself. Edge and Chrome fall into this category, and security or virus definitions (whitelists and blacklists along with threat hashes) don't normally get manual approval.

Link to comment
Share on other sites

3 minutes ago, Ghost_MH said:

 

Wouldn't be the first time. I haven't dug into this one too much since it didn't affect our office, but I remember about a decade ago when McAfee bricked Windows desktops. In that case, it wasn't an untested update to the software itself, but to the security definitions. There are more and more applications that self-update without IT intervention because the update is supposedly limited to the application itself. Edge and Chrome fall into this category, and security or virus definitions (whitelists and blacklists along with threat hashes) don't normally get manual approval.

 

CrowdStrike is ring 0.

Link to comment
Share on other sites

1 hour ago, legend said:

What I want to know is why this patch was distributed everywhere without question. Why doesn't each client using this software do a local test of a new patch before deploying it into prod?

 

I mean, CrowdStrike is deserving of blame and should probably go under for this. But why did none of its users think to test updates before applying them?

 

In the modern world of software, customers ARE the QA testers.

  • Sad 1
  • True 2
Link to comment
Share on other sites

The people at fault are the ones who keep investing in a single point of failure. Car dealerships learned that lesson last month.

 

This really shouldn't have been as big as it was, but the drive to lower costs just cost many of these companies potentially billions in lost revenue.

Link to comment
Share on other sites

13 minutes ago, Link200 said:

The people at fault are the ones who keep investing in a single point of failure. Car dealerships learned that lesson last month.

 

This really shouldn't have been as big as it was, but the drive to lower costs just cost many of these companies potentially billions in lost revenue.

So once all these companies perform a post-mortem, what are the lessons learned?

Link to comment
Share on other sites

2 hours ago, Ghost_MH said:

 

Wouldn't be the first time. I haven't dug into this one too much since it didn't affect our office, but I remember about a decade ago when McAfee bricked Windows desktops. In that case, it wasn't an untested update to the software itself, but to the security definitions. There are more and more applications that self-update without IT intervention because the update is supposedly limited to the application itself. Edge and Chrome fall into this category, and security or virus definitions (whitelists and blacklists along with threat hashes) don't normally get manual approval.


I’m suddenly in favor of government regulation of IT :p 

Link to comment
Share on other sites

2 hours ago, SuperSpreader said:

 

That's what big tech firms usually do: engineering tests it locally for a few weeks before deploying.


Yeah, this is the absolutely sane thing to do. I think my group's AI research experiment platform has more safeguards than, apparently, a lot of big companies with high-risk production systems.

Link to comment
Share on other sites

22 hours ago, Commissar SFLUFAN said:

Visualization of how the outage impacted US air traffic:

 


I know it's not exactly the same, but I remember seeing something like that when 9/11 was happening; they had to route inbound planes to Canada or wherever else they could.

Link to comment
Share on other sites

I work in tech, and this made for a really fun day of work yesterday. :silly:  I spent the better part of my day dealing with about 100 systems that were (temporarily) bricked by this. GG, CrowdStrike. 

 


 

Link to comment
Share on other sites

22 hours ago, legend said:

I’m suddenly in favor of government regulation of IT :p 

 

Looking into it more, this was an update to CrowdStrike's logic engine. This is not an update that would normally be vetted, because the tools for vetting them locally are expensive and few corporations are willing to pay for a full QA infrastructure. On top of that, I do believe CrowdStrike is directly updated. That is, the updates are pulled directly from them rather than going through an IT-controlled update server.

 

I'd take some government regulation here if it forced companies to provide free tools for staging updates or to provide updates on a regular schedule. On the Microsoft front, I know they'll release updates on the second Tuesday of the month. That means if I just update on the third Tuesday of the month, I can be pretty confident I won't pull anything bad, at no extra cost to me.
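
The second-Tuesday cadence described here is easy to encode. Below is a small sketch that just does the calendar arithmetic for Patch Tuesday and a one-week deferral; it is not an official Microsoft schedule API, and the dates are only examples.

```python
# Sketch of the "patch a week behind Patch Tuesday" idea: compute the second
# Tuesday of a month, then a deferred install date one week later.
import calendar
from datetime import date, timedelta

def second_tuesday(year: int, month: int) -> date:
    """Return the second Tuesday of the given month."""
    cal = calendar.Calendar()
    tuesdays = [d for d in cal.itermonthdates(year, month)
                if d.month == month and d.weekday() == calendar.TUESDAY]
    return tuesdays[1]

def deferred_install_date(year: int, month: int, defer_days: int = 7) -> date:
    """Install date a fixed number of days after Patch Tuesday."""
    return second_tuesday(year, month) + timedelta(days=defer_days)

print(second_tuesday(2024, 7))          # 2024-07-09 (Patch Tuesday)
print(deferred_install_date(2024, 7))   # 2024-07-16 (install a week later)
```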

Link to comment
Share on other sites

19 minutes ago, Ghost_MH said:

 

Looking into it more, this was an update to CrowdStrike's logic engine. This is not an update that would normally be vetted, because the tools for vetting them locally are expensive and few corporations are willing to pay for a full QA infrastructure. On top of that, I do believe CrowdStrike is directly updated. That is, the updates are pulled directly from them rather than going through an IT-controlled update server.

 

I'd take some government regulation here if it forced companies to provide free tools for staging updates or to provide updates on a regular schedule. On the Microsoft front, I know they'll release updates on the second Tuesday of the month. That means if I just update on the third Tuesday of the month, I can be pretty confident I won't pull anything bad, at no extra cost to me.

 

Not sure I'm following why it can't be vetted and why it requires expensive tools. Just install it on a single system and see if it starts? Are you saying the updates are pulled automatically in the background? Because if that's the case, maybe we should stop doing that for system-critical software.

Link to comment
Share on other sites

1 hour ago, ApatheticSarcasm said:

I know it's not exactly the same, but I remember seeing something like that when 9/11 was happening; they had to route inbound planes to Canada or wherever else they could.

 

This is the 9/11 airspace closure visualization with timeline:

 

  • Shocked 1
Link to comment
Share on other sites

14 minutes ago, legend said:

Not sure I'm following why it can't be vetted and why it requires expensive tools. Just install it on a single system and see if it starts? Are you saying the updates are pulled automatically in the background? Because if that's the case, maybe we should stop doing that for system-critical software.

 

Yup, just pulled automatically in the background. It seems CrowdStrike told some clients to just reboot their systems dozens of times until a fix was downloaded, but I don't know of anyone for whom that actually worked.

 

Many security apps are like this. AV definitions aren't normally vetted. This is especially true for logic engines in security suites. Think of these as machine learning tools for keeping systems safe.

 

I'm more intimately familiar with McAfee's similar outage from about a decade ago. That one had their AV definitions accidentally flag a Windows system file as malicious, which bricked Windows as soon as the AV quarantined the essential DLL. I'm also pretty familiar with Qualys. I previously used Qualys for managing security and updates, and their tools were automatically updated by DEFAULT. This is part of the problem. The reason I say it's expensive is that you'd need parallel hardware, and companies already view IT as a net negative on corporate profits. You can't just test things on one virtual machine and call it a day. Have some physical database cluster? Well, now you need a second, similar cluster. Have an entire virtual environment for your engineers? Well, if you really want to test things, you need a second, equally complicated engineering environment. If you don't, you have to accept that you're not fully testing things, and I've never met a CFO who was OK with funding partial tests that can't guarantee anything.

 

My cheap solution to this was always to push all updates off by a week and then pay attention to news reports about faulty updates. That's obviously not an option for everyone, though; if everyone delays their updates by a week, then we're back where we started. Also, all of these companies tell you best practice is to stay updated and on time. If you don't and you get bitten by a zero-day during that update gap, it's your policy that caused the outage and you wind up with the full blame.

 

It sucks, but that's how it is. I've personally gotten drilled by a CEO who was upset with me over updates that weren't installed yet, per my own policies, even though we weren't negatively affected. There was big news about some zero-day, he randomly saw me walking by his office, called me in, and asked if we were patched against the exploit. When he heard we weren't, because those updates weren't scheduled to go out for another week, he really wasn't happy. I ended up pushing an out-of-band update for just that one zero-day and left everything else as is.

 

I like my job, but working in IT often sucks.
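
The "hold everything for a week, with a manual out-of-band escape hatch" policy described above boils down to a date comparison plus an exception list. Here is a minimal sketch of that idea; the update names, dates, and allowlist entry are made up for illustration, and real patch-management tools express this as deferral rings or policies rather than a script.

```python
# Sketch of a week-long update deferral with an explicit out-of-band exception list.
from datetime import date, timedelta

DEFER_DAYS = 7
OUT_OF_BAND_ALLOWLIST = {"KB500XXXX-zeroday-fix"}   # hypothetical emergency exception

def should_install(update_id: str, published: date, today: date) -> bool:
    """Install only updates that have aged past the deferral window, unless allowlisted."""
    if update_id in OUT_OF_BAND_ALLOWLIST:
        return True                                  # pushed early by explicit decision
    return today - published >= timedelta(days=DEFER_DAYS)

today = date(2024, 7, 20)
print(should_install("agent-update-291", date(2024, 7, 19), today))        # False: too fresh
print(should_install("KB500XXXX-zeroday-fix", date(2024, 7, 19), today))   # True: exception
```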

  • Thanks 1
Link to comment
Share on other sites

16 minutes ago, Ghost_MH said:

 

Yup, just pulled automatically in the background. It seems CrowdStrike told some clients to just reboot their systems dozens of times until a fix was downloaded, but I don't know of anyone for whom that actually worked.

 

Many security apps are like this. AV definitions aren't normally vetted. This is especially true for logic engines in security suites. Think of these as machine learning tools for keeping systems safe.

 

I'm more intimately familiar with McAfee's similar outage from about a decade ago. That one had their AV definitions accidentally flag a Windows system file as malicious, which bricked Windows as soon as the AV quarantined the essential DLL. I'm also pretty familiar with Qualys. I previously used Qualys for managing security and updates, and their tools were automatically updated by DEFAULT. This is part of the problem. The reason I say it's expensive is that you'd need parallel hardware, and companies already view IT as a net negative on corporate profits. You can't just test things on one virtual machine and call it a day. Have some physical database cluster? Well, now you need a second, similar cluster. Have an entire virtual environment for your engineers? Well, if you really want to test things, you need a second, equally complicated engineering environment. If you don't, you have to accept that you're not fully testing things, and I've never met a CFO who was OK with funding partial tests that can't guarantee anything.

 

My cheap solution to this was always to push all updates off by a week and then pay attention to news reports about faulty updates. That's obviously not an option for everyone, though; if everyone delays their updates by a week, then we're back where we started. Also, all of these companies tell you best practice is to stay updated and on time. If you don't and you get bitten by a zero-day during that update gap, it's your policy that caused the outage and you wind up with the full blame.

 

It sucks, but that's how it is. I've personally gotten drilled by a CEO who was upset with me over updates that weren't installed yet, per my own policies, even though we weren't negatively affected. There was big news about some zero-day, he randomly saw me walking by his office, called me in, and asked if we were patched against the exploit. When he heard we weren't, because those updates weren't scheduled to go out for another week, he really wasn't happy. I ended up pushing an out-of-band update for just that one zero-day and left everything else as is.

 

I like my job, but working in IT often sucks.

 

Leadership generally views IT as lesser-than, and also not required. Until they can't print a weird PDF.

 

 

Link to comment
Share on other sites

We currently have no Infrastructure Manager in IT (also responsible for security) because leadership won't pay the position enough to attract good talent. We just fired the last person we hired during her probation because she basically lied about her skills. If we'd had CrowdStrike... we'd be fucked. We are a team of 18 people and support around 8,000 Windows laptops that we just reimaged and deployed into schools.

  • Hugs 1
Link to comment
Share on other sites

I don't know how someone can run any software company without a full QA department that certifies builds/updates before they go live. I also don't understand how any large-scale company can use/trust any software that auto-updates without allowing you to control the rollout schedule. When I ran a SaaS startup, even we had a dedicated QA team.
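
On the rollout-control side of this point, the usual alternative to "push to everyone at once" is ring-based staging: each ring only receives the update after the previous ring has reported healthy. Below is a sketch of that idea with invented ring names, sizes, and a stubbed health check; it is not any particular vendor's actual process.

```python
# Sketch of a ring-based (staged) rollout: each ring gets the update only if
# every earlier ring reported healthy after receiving it.
from typing import Callable, List, Tuple

RINGS: List[Tuple[str, float]] = [
    ("internal dogfood", 0.001),
    ("canary customers", 0.01),
    ("early adopters",   0.10),
    ("general rollout",  1.00),
]

def staged_rollout(fleet_size: int, ring_is_healthy: Callable[[str], bool]) -> None:
    for name, fraction in RINGS:
        hosts = max(1, int(fleet_size * fraction))
        print(f"Deploying to {name}: ~{hosts} hosts")
        if not ring_is_healthy(name):
            print(f"Halting rollout: {name} reported failures")
            return
    print("Rollout complete")

# Example run with a stubbed health check that fails after the first ring.
staged_rollout(1_000_000, lambda ring: ring == "internal dogfood")
```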

  • Halal 1
Link to comment
Share on other sites

2 minutes ago, chakoo said:

I don't know how someone can run any software company without a full QA department that certifies builds/updates before they go live. I also don't understand how any large-scale company can use/trust any software that auto-updates without allowing you to control the rollout schedule. When I ran a SaaS startup, even we had a dedicated QA team.

 

Trying to cut costs. Lots of layoffs this past year.

Link to comment
Share on other sites

7 minutes ago, chakoo said:

I don't know how someone can run any software company without a full QA department that certifies builds/updates before they go live. I also don't understand how any large-scale company can use/trust any software that auto-updates without allowing you to control the rollout schedule. When I ran a SaaS startup, even we had a dedicated QA team.

 

4 minutes ago, SuperSpreader said:

 

Trying to cut costs. Lots of layoffs this past year.

 

Exactly this. It's the bigger companies, the ones that actually have the budget, that refuse to adequately fund this kind of stuff. They'll often bring in some MBA to manage IT and cut costs, and that MBA will decide that nobody working for them knows better: if you use a third-party tool to manage certain risks, you should abide by that party's best practices, which usually means letting them manage their own updates because it frees up internal resources to run leaner and more efficiently.

 

Last company I worked for, tens of thousands of employees...

Innocent old me: What's our DR plan here?

IT management: We have good backups.

Me: That's not a real plan. What do we do if our main datacenter becomes a crater? 

Them: We can recover from tape.

Me: Well, I didn't realize the business could run if we were offline for a couple of months.

 

Just a fully dysfunctional mess. Years ago, I got into a very real argument with our CFO over moving expenses. I wanted to hire movers, and he thought that was silly since we could just get a truck and move all our servers ourselves. I told him that if that was the route he wanted to go, I wanted him to drive the truck and have his own personal insurance cover the damages if some asshole with no insurance t-boned us and we were out hundreds of thousands of dollars in equipment.

  • Thanks 1
Link to comment
Share on other sites

1 hour ago, Ghost_MH said:

MBA to manage IT and cut costs, and that MBA will decide that nobody working for them knows better

 

This is the problem with all of tech right now, including management. Big applications get managed like they're a small two-week college project.

  • True 1
Link to comment
Share on other sites

I think what happened is that tech started hiring engineers only from places like Stanford, and then Stanford combined some engineering with generic management/MBA coursework with no functional training or application. They invented a bunch of BS management styles that only make sense in a college dorm, and then tech hired these dummies.

Link to comment
Share on other sites
