
Global IT outages have hit airlines and businesses worldwide


Recommended Posts

WWW.CNBC.COM

Crowdstrike global IT outage hits the heart of the global supply chain, with air freight, rails, ports and trucking in the U.S. and beyond down.

 

Quote

 

The CrowdStrike software bug that crashed Microsoft operating systems and caused the largest IT outage in history caused disruptions at U.S. and global ports, with highly complex air freight systems suffering the heaviest hit, according to logistics experts, as global airlines grounded flights.

 

“Planes and cargo are not where they are supposed to be and it will take days or even weeks to fully resolve,” Niall van de Wouw, chief air freight officer at supply chain consulting firm Xeneta, said in a statement shared with CNBC. “This is a reminder of how vulnerable our ocean and air supply chains are to IT failure.”

 

 

  • Sad 1
Link to comment
Share on other sites

2 minutes ago, Commissar SFLUFAN said:
WWW.CNBC.COM

Crowdstrike global IT outage hits the heart of the global supply chain, with air freight, rails, ports and trucking in the U.S. and beyond down.

 

 


One of the problems with our hyper-efficient supply chains, thanks to JIT manufacturing, is that holy moly do things get all out of whack in relatively short order, and the dominoes don't stop falling for quite some time.

Link to comment
Share on other sites

1 minute ago, sblfilms said:


One of the problems with our hyper-efficient supply chains, thanks to JIT manufacturing, is that holy moly do things get all out of whack in relatively short order, and the dominoes don't stop falling for quite some time.

 

JIT ensures that the logistics chain operates on the knife's edge of disaster, with little-to-no redundancy.

Link to comment
Share on other sites

What I want to know is why this patch was distributed everywhere without question. Why doesn't each client using this software do a local test of a new patch before deploying it into prod?

 

I mean, CrowdStrike is deserving of blame and should probably go under for this. But why did none of its users think to test updates before applying them?
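
For what it's worth, the gate being asked about here doesn't have to be elaborate. Below is a minimal sketch, in Python, of a "canary first" check: apply the update to one disposable test machine, let it soak, and only approve the fleet rollout if the box is still healthy. The host name and the install/health-check commands are placeholders invented for illustration; this is not how CrowdStrike or any specific vendor distributes updates.

```python
# Minimal sketch of a "canary first" update gate. CANARY_HOST and both commands
# are placeholders for illustration only; nothing here is CrowdStrike-specific.
import subprocess
import time

CANARY_HOST = "canary-vm-01"                                # hypothetical disposable test VM
INSTALL_CMD = ["ssh", CANARY_HOST, "apply-agent-update"]    # placeholder install command
HEALTH_CMD = ["ssh", CANARY_HOST, "systemctl", "is-system-running"]  # placeholder health check

def canary_passes(soak_minutes: int = 30) -> bool:
    """Apply the update to one host, let it soak, then confirm the host is still healthy."""
    if subprocess.run(INSTALL_CMD).returncode != 0:
        return False
    time.sleep(soak_minutes * 60)   # give the canary time to reboot (or crash) if it's going to
    return subprocess.run(HEALTH_CMD).returncode == 0

if __name__ == "__main__":
    if canary_passes():
        print("Canary healthy: safe to schedule the fleet rollout.")
    else:
        print("Canary failed: hold the update and investigate.")
```

Even a gate this crude catches the failure mode everyone hit here: a machine that never comes back after the update.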

Link to comment
Share on other sites

7 minutes ago, legend said:

I mean, CrowdStrike is deserving of blame and should probably go under for this. But why did none of its users think to test updates before applying them?

 

That's what big tech firms usually do: engineering tests it locally for a few weeks before deploying.

Link to comment
Share on other sites

16 minutes ago, legend said:

What I want to know is why this patch was distributed everywhere without question. Why doesn't each client using this software do a local test of a new patch before deploying it into prod?

 

I mean, CrowdStrike is deserving of blame and should probably go under for this. But why did none of its users think to test updates before applying them?

 

Wouldn't be the first time. I haven't dug into this one too much since it didn't affect our office, but I remember about a decade ago when McAfee bricked Windows desktops. In that case, it wasn't an untested update to the software itself, but to the security definitions. There are more and more applications that self-update without IT intervention because the update is supposedly limited to the application itself. Edge and Chrome fall into this category, and security or virus definitions (whitelists and blacklists along with threat hashes) don't normally get manual approval.

Link to comment
Share on other sites

3 minutes ago, Ghost_MH said:

 

Wouldn't be the first time. I haven't dug into this one too much since it didn't affect our office, but I remember about a decade ago when McAfee bricked Windows desktops. In that case, it wasn't an untested update to the software itself, but to the security definitions. There are more and more applications that self-update without IT intervention because the update is supposedly limited to the application itself. Edge and Chrome fall into this category, and security or virus definitions (whitelists and blacklists along with threat hashes) don't normally get manual approval.

 

CrowdStrike is ring 0.

Link to comment
Share on other sites

1 hour ago, legend said:

What I want to know is why this patch was distributed everywhere without question. Why doesn't each client using this software do a local test of a new patch before deploying it into prod?

 

I mean, CrowdStrike is deserving of blame and should probably go under for this. But why did none of its users think to test updates before applying them?

 

In the modern world of software, customers ARE the QA testers.

  • Sad 1
  • True 2
Link to comment
Share on other sites

The people at fault are the ones who keep investing in a single point of failure. Car dealerships learned that lesson last month.

 

This really shouldn't have been as big as it was, but the drive to lower costs just cost many of these companies potentially billions in lost revenue.

Link to comment
Share on other sites

13 minutes ago, Link200 said:

The people at fault are the ones who keep investing in a single point of failure. Car dealerships learned that lesson last month.

 

This really shouldn't have been as big as it was, but the drive to lower costs just cost many of these companies potentially billions in lost revenue.

So once all these companies perform a post-mortem, what are the lessons learned?

Link to comment
Share on other sites

2 hours ago, Ghost_MH said:

 

Wouldn't be the first time. I haven't dug into this one too much since it didn't affect our office, but I remember about a decade ago when McAfee bricked Windows desktops. In that case, it wasn't an untested update to the software itself, but to the security definitions. There are more and more applications that self-update without IT intervention because the update is supposedly limited to the application itself. Edge and Chrome fall into this category, and security or virus definitions (whitelists and blacklists along with threat hashes) don't normally get manual approval.


I’m suddenly in favor of government regulation of IT :p 

Link to comment
Share on other sites

2 hours ago, SuperSpreader said:

 

That's what big tech firms usually do: engineering tests it locally for a few weeks before deploying.


Yeah, this is the absolutely sane thing to do. I think my group's AI research experiment platform has more safeguards than, apparently, a lot of big companies with high-risk production systems.

Link to comment
Share on other sites

22 hours ago, Commissar SFLUFAN said:

Visualization of how the outage impacted US air traffic:

 


I know it's not exactly the same, but I remember seeing something like that when 9/11 was happening; they had to route inbound planes to Canada or wherever else they could.

Link to comment
Share on other sites

I work in tech, and this made for a really fun day of work yesterday. :silly:  I spent the better part of my day dealing with about 100 systems that were (temporarily) bricked by this. GG, CrowdStrike. 

 


 

Link to comment
Share on other sites

22 hours ago, legend said:

I’m suddenly in favor of government regulation of IT :p 

 

Looking into it more, this was an update to CrowdStrike's logic engine. This is not an update that would normally be vetted, because the tools for vetting them locally are expensive and few corporations are willing to pay for a full QA infrastructure. On top of that, I do believe CrowdStrike is directly updated. That is, the updates are pulled directly from them rather than going through an IT-controlled update server.

 

I'd take some government regulation here if it forced companies to provide free tools for staging updates or to provide updates on a regular schedule. On the Microsoft front, I know they'll release updates on the second Tuesday of the month. That means if I just update on the third Tuesday of the month, I can be pretty confident I won't pull anything bad, at no extra cost to me.
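
The second-Tuesday cadence described here is easy to encode. Below is a small sketch that just does the calendar arithmetic for Patch Tuesday and a one-week deferral; it is not an official Microsoft schedule API, and the dates are only examples.

```python
# Sketch of the "patch a week behind Patch Tuesday" idea: compute the second
# Tuesday of a month, then a deferred install date one week later.
import calendar
from datetime import date, timedelta

def second_tuesday(year: int, month: int) -> date:
    """Return the second Tuesday of the given month."""
    cal = calendar.Calendar()
    tuesdays = [d for d in cal.itermonthdates(year, month)
                if d.month == month and d.weekday() == calendar.TUESDAY]
    return tuesdays[1]

def deferred_install_date(year: int, month: int, defer_days: int = 7) -> date:
    """Install date a fixed number of days after Patch Tuesday."""
    return second_tuesday(year, month) + timedelta(days=defer_days)

print(second_tuesday(2024, 7))          # 2024-07-09 (Patch Tuesday)
print(deferred_install_date(2024, 7))   # 2024-07-16 (install a week later)
```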

Link to comment
Share on other sites

19 minutes ago, Ghost_MH said:

 

Looking into it more, this was an update to CrowdStrike's logic engine. This is not an update that would normally be vetted, because the tools for vetting them locally are expensive and few corporations are willing to pay for a full QA infrastructure. On top of that, I do believe CrowdStrike is directly updated. That is, the updates are pulled directly from them rather than going through an IT-controlled update server.

 

I'd take some government regulation here if it forced companies to provide free tools for staging updates or to provide updates on a regular schedule. On the Microsoft front, I know they'll release updates on the second Tuesday of the month. That means if I just update on the third Tuesday of the month, I can be pretty confident I won't pull anything bad, at no extra cost to me.

 

Not sure I'm following why it can't be vetted and why it requires expensive tools. Just install it on a single system and see if it starts? Are you saying the updates are pulled automatically in the background? Because if that's the case, maybe we should stop doing that for system-critical software.

Link to comment
Share on other sites

1 hour ago, ApatheticSarcasm said:

I know it's not exactly the same, but I remember seeing something like that when 9/11 was happening; they had to route inbound planes to Canada or wherever else they could.

 

This is the 9/11 airspace closure visualization with timeline:

 

  • Shocked 1
Link to comment
Share on other sites

14 minutes ago, legend said:

Not sure I'm following why it can't be vetted and why it requires expensive tools. Just install it on a single system and see if it starts? Are you saying the updates are pulled automatically in the background? Because if that's the case, maybe we should stop doing that for system-critical software.

 

Yup, just pulled automatically in the background. It seems CrowdStrike told some clients to just reboot their systems dozens of times until a fix was downloaded, but I don't know of anyone for whom that actually worked.

 

Many security apps are like this. AV definitions aren't normally vetted. This is especially true for logic engines in security suites. Think of these as machine learning tools for keeping systems safe.

 

I'm more intimately familiar with McAfee's similar outage from about a decade ago. That one had their AV definitions accidentally flag a Windows system file as malicious, which bricked Windows as soon as the AV quarantined the essential DLL. I'm also pretty familiar with Qualys. I previously used Qualys for managing security and updates, and their tools were automatically updated by DEFAULT. This is part of the problem. The reason I say it's expensive is that you'd need parallel hardware, and companies already view IT as a net negative on corporate profits. You can't just test things on one virtual machine and call it a day. Have some physical database cluster? Well, now you need a second, similar cluster. Have an entire virtual environment for your engineers? Well, if you really want to test things, you need a second, equally complicated engineering environment. If you don't, you have to accept that you're not fully testing things, and I've never met a CFO who was OK with funding partial tests that can't guarantee anything.

 

My cheap solution to this was always to push all updates off by a week and then pay attention to news reports about faulty updates. That's obviously not an option for everyone, though; if everyone delays their updates by a week, then we're back where we started. Also, all of these companies tell you best practice is to stay updated and on time. If you don't and you get bitten by a zero-day during that update gap, it's your policy that caused the outage and you wind up with the full blame.

 

It sucks, but that's how it is. I've personally gotten drilled by a CEO who was upset with me over updates that weren't installed yet, per my own policies, even though we weren't negatively affected. There was big news about some zero-day, he randomly saw me walking by his office, called me in, and asked if we were patched against the exploit. When he heard we weren't, because those updates weren't scheduled to go out for another week, he really wasn't happy. I ended up pushing an out-of-band update for just that one zero-day and left everything else as is.

 

I like my job, but working in IT often sucks.
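
The "hold everything for a week, with a manual out-of-band escape hatch" policy described above boils down to a date comparison plus an exception list. Here is a minimal sketch of that idea; the update names, dates, and allowlist entry are made up for illustration, and real patch-management tools express this as deferral rings or policies rather than a script.

```python
# Sketch of a week-long update deferral with an explicit out-of-band exception list.
from datetime import date, timedelta

DEFER_DAYS = 7
OUT_OF_BAND_ALLOWLIST = {"KB500XXXX-zeroday-fix"}   # hypothetical emergency exception

def should_install(update_id: str, published: date, today: date) -> bool:
    """Install only updates that have aged past the deferral window, unless allowlisted."""
    if update_id in OUT_OF_BAND_ALLOWLIST:
        return True                                  # pushed early by explicit decision
    return today - published >= timedelta(days=DEFER_DAYS)

today = date(2024, 7, 20)
print(should_install("agent-update-291", date(2024, 7, 19), today))        # False: too fresh
print(should_install("KB500XXXX-zeroday-fix", date(2024, 7, 19), today))   # True: exception
```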

  • Thanks 1
Link to comment
Share on other sites

16 minutes ago, Ghost_MH said:

 

Yup, just pulled automatically in the background. It seems CrowdStrike told some clients to just reboot their systems dozens of times until a fix was downloaded, but I don't know of anyone for whom that actually worked.

 

Many security apps are like this. AV definitions aren't normally vetted. This is especially true for logic engines in security suites. Think of these as machine learning tools for keeping systems safe.

 

I'm more intimately familiar with McAfee's similar outage from about a decade ago. That one had their AV definitions accidentally flag a Windows system file as malicious, which bricked Windows as soon as the AV quarantined the essential DLL. I'm also pretty familiar with Qualys. I previously used Qualys for managing security and updates, and their tools were automatically updated by DEFAULT. This is part of the problem. The reason I say it's expensive is that you'd need parallel hardware, and companies already view IT as a net negative on corporate profits. You can't just test things on one virtual machine and call it a day. Have some physical database cluster? Well, now you need a second, similar cluster. Have an entire virtual environment for your engineers? Well, if you really want to test things, you need a second, equally complicated engineering environment. If you don't, you have to accept that you're not fully testing things, and I've never met a CFO who was OK with funding partial tests that can't guarantee anything.

 

My cheap solution to this was always to push all updates off by a week and then pay attention to news reports about faulty updates. That's obviously not an option for everyone, though; if everyone delays their updates by a week, then we're back where we started. Also, all of these companies tell you best practice is to stay updated and on time. If you don't and you get bitten by a zero-day during that update gap, it's your policy that caused the outage and you wind up with the full blame.

 

It sucks, but that's how it is. I've personally gotten drilled by a CEO who was upset with me over updates that weren't installed yet, per my own policies, even though we weren't negatively affected. There was big news about some zero-day, he randomly saw me walking by his office, called me in, and asked if we were patched against the exploit. When he heard we weren't, because those updates weren't scheduled to go out for another week, he really wasn't happy. I ended up pushing an out-of-band update for just that one zero-day and left everything else as is.

 

I like my job, but working in IT often sucks.

 

Leadership generally views IT as lesser-than, and also not required. Until they can't print a weird PDF.

 

 

Link to comment
Share on other sites

We currently have no Infrastructure Manager in IT (also responsible for security) because leadership won't pay the position enough to attract good talent. We just fired the last person we hired during her probation because she basically lied about her skills. If we'd had CrowdStrike... we'd be fucked. We are a team of 18 people and support around 8,000 Windows laptops that we just reimaged and deployed into schools.

  • Hugs 1
Link to comment
Share on other sites

I don't know how someone can run any software company without a full QA department that certifies builds/updates before they go live. I also don't understand how any large-scale company can use/trust any software that auto-updates without allowing you to control the rollout schedule. When I ran a SaaS startup, even we had a dedicated QA team.
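
On the rollout-control side of this point, the usual alternative to "push to everyone at once" is ring-based staging: each ring only receives the update after the previous ring has reported healthy. Below is a sketch of that idea with invented ring names, sizes, and a stubbed health check; it is not any particular vendor's actual process.

```python
# Sketch of a ring-based (staged) rollout: each ring gets the update only if
# every earlier ring reported healthy after receiving it.
from typing import Callable, List, Tuple

RINGS: List[Tuple[str, float]] = [
    ("internal dogfood", 0.001),
    ("canary customers", 0.01),
    ("early adopters",   0.10),
    ("general rollout",  1.00),
]

def staged_rollout(fleet_size: int, ring_is_healthy: Callable[[str], bool]) -> None:
    for name, fraction in RINGS:
        hosts = max(1, int(fleet_size * fraction))
        print(f"Deploying to {name}: ~{hosts} hosts")
        if not ring_is_healthy(name):
            print(f"Halting rollout: {name} reported failures")
            return
    print("Rollout complete")

# Example run with a stubbed health check that fails after the first ring.
staged_rollout(1_000_000, lambda ring: ring == "internal dogfood")
```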

  • Halal 1
Link to comment
Share on other sites

2 minutes ago, chakoo said:

I don't know how someone can run any software company without a full QA department that certifies builds/updates before they go live. I also don't understand how any large-scale company can use/trust any software that auto-updates without allowing you to control the rollout schedule. When I ran a SaaS startup, even we had a dedicated QA team.

 

Trying to cut costs. Lots of layoffs this past year.

Link to comment
Share on other sites

7 minutes ago, chakoo said:

I don't know how someone can run any software company without a full QA department that certifies builds/updates before they go live. I also don't understand how any large-scale company can use/trust any software that auto-updates without allowing you to control the rollout schedule. When I ran a SaaS startup, even we had a dedicated QA team.

 

4 minutes ago, SuperSpreader said:

 

Trying to cut costs. Lots of layoffs this past year.

 

Exactly this. It's the bigger companies, the ones that actually have the budget, that refuse to adequately fund this kind of stuff. They'll often bring in some MBA to manage IT and cut costs, and that MBA will decide that nobody working for them knows better: if you use a third-party tool to manage certain risks, you should abide by that party's best practices, which usually means letting them manage their own updates because it frees up internal resources to run leaner and more efficiently.

 

Last company I worked for, tens of thousands of employees...

Innocent old me: What's our DR plan here?

IT management: We have good backups.

Me: That's not a real plan. What do we do if our main datacenter becomes a crater? 

Them: We can recover from tape.

Me: Well, I didn't realize the business could run if we were offline for a couple of months.

 

Just a fully dysfunctional mess. Years ago, I got into a very real argument with our CFO over moving expenses. I wanted to hire movers, and he thought that was silly since we could just get a truck and move all our servers ourselves. I told him that if that was the route he wanted to go, I wanted him to drive the truck and have his own personal insurance cover the damages if some asshole with no insurance t-boned us and we were out hundreds of thousands of dollars in equipment.

  • Thanks 1
Link to comment
Share on other sites

1 hour ago, Ghost_MH said:

MBA to manage IT and cut costs, and that MBA will decide that nobody working for them knows better

 

This is the problem with all of tech right now, including management. Big applications get managed like they're a small two-week college project.

  • True 1
Link to comment
Share on other sites

I think what happened is that tech started hiring engineers only from places like Stanford, and then Stanford combined some engineering with generic management/MBA coursework with no functional training or application. They invented a bunch of BS management styles that only make sense in a college dorm, and then tech hired these dummies.

Link to comment
Share on other sites
