ELI5

How do Supply-Chain Attacks Work? Examples from Software Development

Did you know software is made up of hundreds of tiny pieces of software called libraries? Attackers sure do. Nowadays, they prey on developers' cognitive load to infiltrate our most trusted applications. Let's unpack the new phenomenon of supply chain attacks.

Pierre-Paul Ferland

07 Jun 2023 • 6 min read

In December 2021, the world learned about a "once in a generation" vulnerability reported by Alibaba's cloud security team: log4shell. It scored a perfect 10 on the CVSS severity scale. A single line, which anybody could inject into a vulnerable system without authentication, was enough to gain full access.

The community rightfully freaked out. People worked around the clock to apply the security update. And, perhaps more importantly, the "zero-day" hit the mainstream headlines. Such a phenomenon can only mean one thing for security specialists: brace for impact.

Here's the problem: log4shell hit an open-source software component, log4j, maintained by a handful of volunteers.

Hundreds of enterprise applications in all industries relied on this simple library, which now threatened their whole business. This is where things started to go awry. Higher-ups in many large organizations suddenly realized this fact. Not only that, but they discovered their developers were using hundreds, if not thousands, of these little free, community-maintained modules. Fortune 100 banks sent strongly-worded letters to open-source contributors as if they were business partners they could push around. Little did they know that their ivory towers rested on frail clay blocks... The xkcd comic "Dependency" (#2347) says it all.

This fragile structure is also called the software supply chain. The log4shell vulnerability affected one of these bricks because of an accidental software bug. But attackers are now realizing something: what if they could deliberately infiltrate one of these tiny building blocks? What if they could masquerade as a useful open-source piece of code, only to climb up into these banks' systems?

Supply chain attacks are one of the most important threats that any company faces these days. This week, I will explain to you like you're five how threat actors corrupt software through the supply chain, which techniques they use, and how to protect against them.

So... what is a software library?

A library is a pre-written piece of code that carries out some basic functions. Think of it like pre-made pie dough. Sure, you can make the dough yourself, but it requires much more time, ingredients, and skill. And developers aren't paid to make one pie: they need to cater the whole buffet.

Using external libraries is not only a way to be more productive; sometimes it is also the more secure move! Think of cryptography. It turns out it is more secure to use OpenSSL, which is actively reviewed for implementation flaws by hundreds of highly skilled academics, than to build your own cryptography from scratch.

Remember: any development you do outside of your core business is potentially "non-differentiated" code. Healthcare developers should focus on health-related business functions. Why should they build a framework from scratch just to arrange the visual layout of their web applications?

Machine learning libraries are essential to allow data scientists to come up with the amazing models we now know. Frameworks such as PyTorch and TensorFlow handle the complex math behind these models while ensuring optimal computing resource allocations to carry out the calculations.

Just like a cook shouldn't have to bother with building their own appliances, software libraries provide a higher level of functionality to build more sophisticated applications. Appliances... applications... apps... see what I did?

But wait, it gets deeper. Any good cook will tell you that while they shouldn't have to build their own stove, they must know, to a certain level, how they are made to achieve better results (and not burn themselves).

And this is where, sometimes, developers can be caught off-guard.

Typosquatting attacks

Libraries can be found in various "marketplaces", called package registries, which developers reach through package managers. Some of the best-known ones are npm (JavaScript), PyPI (Python), and RubyGems (Ruby). Think of them like the app store on your phone: you can get all sorts of packages, and the "store" keeps track of their versions and lets you know when to update them.

The biggest difference lies in the interface. You are used to interacting with apps using your fingers and a screen. Software packages do not work like this; rather, they are meant to interface with other programs using commands. This means when humans do want to interact with the package, they must use a command line interface (CLI). This is the terminal that you see in films when they want to show a hacker.

Here's where typosquatting comes into play. Since humans type package names into the command line or into their project files, typos do happen. Say your project depends on a cryptography library called crypto-js. If malicious individuals register cryptojs, keeping the same functionality as the original but adding malware, many developers who mistype the name will notice only when it is too late...
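To make this concrete, here is a minimal Python sketch of the kind of check a team could run before adding a dependency: compare the requested name against a list of packages you have already vetted, and flag near-misses. The allow-list and the 0.8 similarity cutoff are assumptions for illustration; real registries and scanners use their own, more elaborate heuristics.

```python
import difflib
from typing import Optional, Sequence

# Hypothetical allow-list: the package names your team has already vetted.
KNOWN_PACKAGES = ["crypto-js", "requests", "numpy", "lodash"]


def possible_typosquat(name: str, known: Sequence[str] = KNOWN_PACKAGES) -> Optional[str]:
    """Return the vetted package that `name` closely resembles, or None."""
    if name in known:
        return None  # exact match: this is the package we meant to install
    # The 0.8 similarity cutoff is an arbitrary choice for this sketch.
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.8)
    return matches[0] if matches else None


if __name__ == "__main__":
    for candidate in ["cryptojs", "crypto-js", "reqeusts", "flask"]:
        lookalike = possible_typosquat(candidate)
        if lookalike:
            print(f"{candidate!r} looks a lot like {lookalike!r}: double-check before installing")
        else:
            print(f"{candidate!r}: no suspicious look-alike found")
```

A look-alike warning doesn't prove anything is malicious; it just forces a human to slow down at exactly the moment a typo would slip through.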

Having a malicious library running in an application lets attackers do almost anything! These scripts can become keyloggers, ransomware or, more commonly, crypto miners. It's no coincidence the practice of infecting libraries boomed during the crypto bubble and focused on cryptocurrency-related libraries.

But it gets worse...

Ghost packages haunt you

Remember what I said about components being maintained by volunteers? Well, people get married, have kids, or retire. Guess what happens to the projects they maintain in their spare time?

Software evolves blazingly fast, and many components are abandoned without notice. However, if a developer pulls a dependency by its name alone (rather than by an exact identifier), malicious actors can, on registries that allow it, register the abandoned name and host an infected package under it.
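The safer habit is to pin each dependency to an exact version and to a cryptographic fingerprint of the file you reviewed, so a package that reappears under the same name with different contents gets rejected. pip supports this through its hash-checking mode (--require-hashes), and npm records an integrity hash in package-lock.json. The sketch below only illustrates the idea in plain Python; the PINNED table and the example-lib name are hypothetical.

```python
import hashlib

# Hypothetical lock-file entry: exact name and version, plus the SHA-256 digest
# of the release file recorded when the team first reviewed it.
REVIEWED_RELEASE = b"contents of example-lib 1.2.3 as reviewed"
PINNED = {
    ("example-lib", "1.2.3"): hashlib.sha256(REVIEWED_RELEASE).hexdigest(),
}


def verify_artifact(name: str, version: str, data: bytes) -> bool:
    """Accept a downloaded file only if it matches the pinned fingerprint."""
    expected = PINNED.get((name, version))
    if expected is None:
        return False  # never pinned: refuse instead of trusting the name alone
    return hashlib.sha256(data).hexdigest() == expected


if __name__ == "__main__":
    # The original, reviewed release passes.
    print(verify_artifact("example-lib", "1.2.3", REVIEWED_RELEASE))
    # Same name and version, but someone else's contents: rejected.
    print(verify_artifact("example-lib", "1.2.3", b"same name, new owner, new payload"))
```

The point of the design: trust is attached to the exact bytes you reviewed, not to a name that someone else can claim once the original maintainer walks away.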

The average application depends on more than 500 open-source libraries, and open-source components make up upwards of 75% of the code in a typical application.


This is why we keep deconstructing the image of the programmer who spends most of their days writing line upon line of code. It would be more accurate to say they assemble other people's code into a patchwork.

It's beyond human cognition to keep track of so many sub-systems.

Package managers, unlike app stores, favour a "laissez-faire" style rooted in open-source ideology: knowledge is freely shared, community-driven, and decentralized. It's up to every developer to uphold their responsibilities, and all hackers have to do is wait for a human to make a mistake... which they can also trigger because...

It's easy to hide malware in plain sight

Let's say you are a mindful and competent developer who examines any third-party code before you install the package. You will catch the infected files, right? RIGHT?

Threat actors have perfected obfuscation techniques to hide their misdeeds. In the early days, they would simply put the malicious code faaaaaaaaaaaaaaaar away on the page (as if you put information in the column ZZZ in Excel) so no human would think to scroll over there while reading the code.
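That old padding trick is easy to spot mechanically. Here's a minimal sketch that flags abnormally long lines or huge runs of whitespace; the 200-character limit and 100-space threshold are arbitrary assumptions, not values any particular scanner uses.

```python
import re
from typing import List

# Arbitrary thresholds chosen for this sketch.
MAX_LINE_LENGTH = 200
LONG_WHITESPACE_RUN = re.compile(r"[ \t]{100,}")


def suspicious_lines(source: str) -> List[int]:
    """Return 1-based line numbers that are unusually long or heavily padded."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH or LONG_WHITESPACE_RUN.search(line):
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    # A harmless-looking line with a (placeholder) call pushed far off-screen.
    sample = "import os\n" + "x = 1" + " " * 300 + "; payload()\n"
    print(suspicious_lines(sample))
```

This only reads the text, which is exactly why attackers moved on to the next trick.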

Now, hackers inject compiled malware into the packages. What does that mean? Without digging into specifics, there exists a type of code called bytecode that bridges the gap between human-readable source code and pure machine code.

When the injected code isn't human-readable, often only a runtime analysis (i.e. launching the software and seeing what pops up) can uncover its unexpected behaviours.

Even more problematic: most mainstream scanners that help developers uncover malicious packages rely on static analysis (i.e. they read the code, they don't run it).

This stealth technique illustrates the kind of arms race we security professionals are locked in.

So is all hope lost? Not quite...

As a consumer, it is impossible for you to know whether an app you are about to get has a supply-chain vulnerability. You only have one option, which I would call your "bullshit radar". If an app offers benefits too good to be true, then it probably makes its money ripping you off. If an extension requires weird privileges, it's a red flag. If you notice a slowdown of your computer, it's probably not planned obsolescence.

If you are purchasing on behalf of an enterprise, then it's another story altogether! You can (and should) use your bargaining power to require an examination of your supplier's software development practices and methodologies, their commitment to developers' skill development, and their dependency management. And if a supplier tells you they do not use any open-source software, they are either bullshitting or selling a ridiculously overpriced product.

Supply chain attacks prey on the very nature of software as being, uh, soft (as in: malleable). It's simply too easy to add dependencies. (And I'm not just writing that out of wishful thinking about having a job forever.) Truly secure products make peace with this fact and build processes to ensure infected packages do not impact their users.