Learn Ethical Hacking (#66) - Reporting and Documentation - The Professional Difference

What will I learn
- Why reporting is the most important pentest deliverable -- the report is what the client pays for, not the hack;
- Report structure -- executive summary, methodology, findings, evidence, remediation, and appendices;
- Writing findings that drive action -- CVSS scoring, risk ratings, business impact language;
- Evidence collection -- screenshots, command output, traffic captures, and chain of evidence;
- Writing for different audiences -- executives who want risk, managers who want priorities, engineers who want fix instructions;
- Common reporting mistakes -- jargon overload, missing reproduction steps, and findings without business context;
- Reporting tools -- Serpico, PlexTrac, Ghostwriter, and Markdown-to-PDF pipelines;
- Defense: using pentest reports to drive remediation programs and track risk reduction over time.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- Understanding of the full pentest lifecycle from this series;
- Basic document writing skills;
- The ambition to learn ethical hacking and security research.
Difficulty
- Intermediate
Curriculum (of the Learn Ethical Hacking Series):
- Learn Ethical Hacking (#1) - Why Hackers Win
- Learn Ethical Hacking (#2) - Your Hacking Lab
- Learn Ethical Hacking (#3) - How the Internet Actually Works - For Attackers
- Learn Ethical Hacking (#4) - Reconnaissance - The Art of Not Being Noticed
- Learn Ethical Hacking (#5) - Active Scanning - Mapping the Attack Surface
- Learn Ethical Hacking (#6) - The AI Slop Epidemic - Why AI-Generated Code Is a Security Disaster
- Learn Ethical Hacking (#7) - Passwords - Why Humans Are the Weakest Cipher
- Learn Ethical Hacking (#8) - Social Engineering - Hacking the Human
- Learn Ethical Hacking (#9) - Cryptography for Hackers - What Protects Data (and What Doesn't)
- Learn Ethical Hacking (#10) - The Vulnerability Lifecycle - From Discovery to Patch to Exploit
- Learn Ethical Hacking (#11) - HTTP Deep Dive - Request Smuggling and Header Injection
- Learn Ethical Hacking (#12) - SQL Injection - The Bug That Won't Die
- Learn Ethical Hacking (#13) - SQL Injection Advanced - Extracting Entire Databases
- Learn Ethical Hacking (#14) - Cross-Site Scripting (XSS) - Injecting Code Into Browsers
- Learn Ethical Hacking (#15) - XSS Advanced - Bypassing Filters and CSP
- Learn Ethical Hacking (#16) - Cross-Site Request Forgery - Making Users Attack Themselves
- Learn Ethical Hacking (#17) - Authentication Bypass - Getting In Without a Password
- Learn Ethical Hacking (#18) - Server-Side Request Forgery - Making Servers Betray Themselves
- Learn Ethical Hacking (#19) - Insecure Deserialization - Code Execution via Data
- Learn Ethical Hacking (#20) - File Upload Vulnerabilities - When Users Upload Weapons
- Learn Ethical Hacking (#21) - API Security - The New Attack Surface
- Learn Ethical Hacking (#22) - Business Logic Flaws - When the Code Works But the Logic Doesn't
- Learn Ethical Hacking (#23) - Client-Side Attacks - Beyond XSS
- Learn Ethical Hacking (#24) - Content Management Systems - Hacking WordPress and Friends
- Learn Ethical Hacking (#25) - Web Application Firewalls - Bypassing the Guards
- Learn Ethical Hacking (#26) - The Full Web Pentest - Methodology and Reporting
- Learn Ethical Hacking (#27) - Bug Bounty Hunting - Getting Paid to Hack the Web
- Learn Ethical Hacking (#28) - The AI Web Attack Surface - AI Features as Vulnerabilities
- Learn Ethical Hacking (#29) - Network Sniffing - Seeing Everything on the Wire
- Learn Ethical Hacking (#30) - Wireless Network Attacks - Breaking Wi-Fi
- Learn Ethical Hacking (#31) - Privilege Escalation - Linux
- Learn Ethical Hacking (#32) - Privilege Escalation - Windows
- Learn Ethical Hacking (#33) - Active Directory Attacks - The Crown Jewels
- Learn Ethical Hacking (#34) - Pivoting and Lateral Movement - Spreading Through Networks
- Learn Ethical Hacking (#35) - Cloud Security - AWS Attack and Defense
- Learn Ethical Hacking (#36) - Cloud Security - Azure and GCP
- Learn Ethical Hacking (#37) - Container Security - Docker and Kubernetes Attacks
- Learn Ethical Hacking (#38) - Infrastructure as Code - Securing the Automation
- Learn Ethical Hacking (#39) - Email Security - Phishing Infrastructure and Defense
- Learn Ethical Hacking (#40) - DNS Attacks - Exploiting the Internet's Foundation
- Learn Ethical Hacking (#41) - Exploitation Frameworks - Metasploit and Cobalt Strike
- Learn Ethical Hacking (#42) - Custom Exploit Development - Writing Your Own
- Learn Ethical Hacking (#43) - Exploit Development Advanced - Modern Mitigations and Bypasses
- Learn Ethical Hacking (#44) - Reverse Engineering - Understanding Binaries
- Learn Ethical Hacking (#45) - Supply Chain Attacks - Poisoning the Source
- Learn Ethical Hacking (#46) - The Human Factor - Why Security Training Fails
- Learn Ethical Hacking (#47) - Physical Security and OSINT - The Forgotten Attack Vectors
- Learn Ethical Hacking (#48) - Insider Threats - When the Call Is Coming from Inside the House
- Learn Ethical Hacking (#49) - Deepfakes and AI Deception - The New Social Engineering
- Learn Ethical Hacking (#50) - Red Team Operations - Simulating Real Attacks
- Learn Ethical Hacking (#51) - Incident Response - When Things Go Wrong
- Learn Ethical Hacking (#52) - Threat Intelligence - Knowing Your Enemy
- Learn Ethical Hacking (#53) - Security Architecture - Designing Systems That Resist Attack
- Learn Ethical Hacking (#54) - Compliance and Governance - The Business of Security
- Learn Ethical Hacking (#55) - Privacy and Data Protection - GDPR, CCPA, and Beyond
- Learn Ethical Hacking (#56) - Cryptocurrency Security - Attacking and Defending Digital Assets
- Learn Ethical Hacking (#57) - IoT and Embedded Security - Hacking the Physical World
- Learn Ethical Hacking (#58) - The AI Security Landscape - Attacking and Defending AI Systems
- Learn Ethical Hacking (#59) - Python for Pentesters - Automating Everything
- Learn Ethical Hacking (#60) - Zig for Security Tools - When Speed and Memory Matter
- Learn Ethical Hacking (#61) - Writing Custom Scanners - Beyond Off-the-Shelf
- Learn Ethical Hacking (#62) - C2 Frameworks - Building Command and Control
- Learn Ethical Hacking (#63) - Payload Generation and Evasion - Defeating Antivirus
- Learn Ethical Hacking (#64) - Fuzzing - Finding Bugs at Machine Speed
- Learn Ethical Hacking (#65) - OSINT Automation - Large-Scale Intelligence Gathering
- Learn Ethical Hacking (#66) - Reporting and Documentation - The Professional Difference (this post)
Solutions to Episode 65 Exercises
Exercise 1: Subdomain enumeration pipeline (abbreviated).
Domain: mydomain.com (owned, authorized)
subfinder: 42 subdomains (passive, fast)
amass: 67 subdomains (includes overlap + DNS brute)
crt.sh: 38 subdomains (CT logs, some wildcards)
Combined: 78 unique subdomains (after dedup)
Live (httpx): 31 responding to HTTP/HTTPS
Overlap: 60% of subdomains found by 2+ tools. crt.sh found 4
unique subdomains not in subfinder or amass (old certs for
decommissioned services). amass found 12 unique via DNS brute
force that no passive source had.
Surprise: staging-old.mydomain.com still resolves to a server
running an outdated Nginx with directory listing enabled.
The overlap percentage tells you something important about your pipeline's coverage. When 60% of findings appear in 2+ tools, it means the tools are corroborating each other -- good for confidence, but it also means the remaining 40% are unique to a single tool. Those unique findings are the ones you would MISS by running only one tool. The staging-old finding from amass's DNS brute force is the kind of discovery that justifies running the full pipeline: a forgotten staging server with directory listing is a genuine finding that could end up in your pentest report (we will get to exactly how to write that finding today).
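A minimal sketch of how that overlap analysis could be computed, assuming each tool's output has already been normalized into a set of lowercase hostnames. The function name and the sample data are illustrative assumptions, not the actual Exercise 1 results:

```python
from collections import Counter


def overlap_report(results: dict[str, set[str]]) -> dict:
    """Summarise how much the enumeration tools corroborate each other."""
    # Count how many tools reported each hostname.
    seen = Counter(host for hosts in results.values() for host in hosts)
    combined = set(seen)
    corroborated = {host for host, count in seen.items() if count >= 2}
    # Hostnames reported by exactly one tool, grouped by that tool.
    unique = {
        tool: {host for host in hosts if seen[host] == 1}
        for tool, hosts in results.items()
    }
    pct = round(100 * len(corroborated) / len(combined), 1) if combined else 0.0
    return {
        "combined": len(combined),
        "corroborated_pct": pct,
        "unique_per_tool": unique,
    }


# Made-up sample data (hypothetical hostnames).
results = {
    "subfinder": {"www.example.com", "api.example.com"},
    "amass": {"www.example.com", "api.example.com", "staging-old.example.com"},
    "crt.sh": {"www.example.com", "legacy.example.com"},
}
report = overlap_report(results)
```

The `unique_per_tool` entry is the interesting part: it lists exactly the hosts you would have missed by dropping that tool from the pipeline.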
Exercise 2: CT monitoring (abbreviated).
Monitor ran for 24 hours on test domain.
Hour 18: issued Let's Encrypt cert for newtest.mydomain.com
Monitor detected new certificate at hour 19 (crt.sh propagation
delay: ~45 minutes after cert issuance).
Detection latency: 45-90 minutes (depends on crt.sh crawl cycle).
For real-time monitoring, certstream (WebSocket stream of CT logs)
provides sub-second detection but requires more infrastructure.
The 45-90 minute detection latency through crt.sh polling is acceptable for defensive monitoring (you want to know about unauthorized certificates within the same business day, not the same second), but it is too slow for offensive CT monitoring during a red team engagement where you are watching for the target to deploy new infrastructure. For the offensive use case, certstream is the right tool -- it provides a WebSocket feed of every certificate logged to any CT log in real time. The tradeoff is infrastructure: you need a process running continuously, consuming the stream, filtering for your target domain, and alerting on matches. The crt.sh polling approach from episode 65 is simpler to deploy and sufficient for most use cases.
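The core of the polling approach is just a diff between two snapshots. A rough sketch, assuming you already have the certificate names from the previous and current poll as sets of strings (how you fetch them is left out, and the function names here are my own, not the episode 65 code):

```python
def normalise(name: str) -> str:
    """Lowercase a certificate name and drop a leading wildcard label."""
    return name.strip().lower().removeprefix("*.")


def new_in_scope(previous: set[str], current: set[str], domain: str) -> set[str]:
    """Return names that appeared since the last poll and belong to the domain."""
    domain = domain.lower()
    known = {normalise(name) for name in previous}
    fresh = {normalise(name) for name in current} - known
    # Match the domain itself or a true subdomain, not a lookalike suffix.
    return {name for name in fresh if name == domain or name.endswith("." + domain)}


previous_poll = {"www.mydomain.com"}
current_poll = {
    "www.mydomain.com",
    "NewTest.mydomain.com",
    "*.mydomain.com",
    "evilmydomain.com",
}
alerts = new_in_scope(previous_poll, current_poll, "mydomain.com")
```

The same filter function would work on a streaming source as well; only the way names arrive changes, not the scoping logic.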
Exercise 3: Full OSINT assessment report (abbreviated).
{
"target": "fictional-corp.com",
"risk_score": 62,
"risk_level": "HIGH",
"sections": {
"subdomains": {"total": 78, "live": 31},
"breaches": {"exposed_accounts": 12, "total_breaches": 34},
"github": {"total_findings": 3, "repos_affected": 2},
"certificates": {"new_certs": 5}
}
}
The risk score of 62 (HIGH) is driven primarily by the 12 exposed accounts across 34 breaches and the 3 GitHub secret findings. The executive summary for this fictional assessment would lead with: "12 employee email addresses appear in known data breaches, and 3 instances of exposed credentials were found in public GitHub repositories. These findings represent immediate unauthorized access risk without requiring any technical exploitation." That last sentence is what makes the executive care -- you are telling them that an attacker does not need to hack anything, they just need to log in with credentials that are already public. The risk scoring model is deliberately simple (linear weights, capped per category), but the executive summary is where the analysis happens. Numbers without context are useless. Numbers with business impact drive remediation budgets.
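To make "linear weights, capped per category" concrete, here is a small sketch of that kind of model. The weights, caps, and level thresholds are hypothetical values chosen for illustration; they are not the ones that produced the score of 62 above:

```python
# Hypothetical weights and per-category caps (assumptions, not the series' values).
WEIGHTS = {"exposed_accounts": 3, "github_findings": 10, "live_subdomains": 1, "new_certs": 1}
CAPS = {"exposed_accounts": 40, "github_findings": 30, "live_subdomains": 20, "new_certs": 10}


def risk_score(counts: dict[str, int]) -> int:
    """Linear score: count * weight per category, capped, summed, max 100."""
    total = 0
    for category, weight in WEIGHTS.items():
        total += min(counts.get(category, 0) * weight, CAPS[category])
    return min(total, 100)


def risk_level(score: int) -> str:
    """Map a score to a label using hypothetical thresholds."""
    if score >= 75:
        return "CRITICAL"
    if score >= 50:
        return "HIGH"
    if score >= 25:
        return "MEDIUM"
    return "LOW"


example = {"exposed_accounts": 5, "github_findings": 1, "live_subdomains": 10, "new_certs": 2}
score = risk_score(example)
level = risk_level(score)
```

The caps are what keep one noisy category (say, hundreds of subdomains) from drowning out the categories that represent direct access risk.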
Episode 65 was about OSINT automation -- scaling intelligence gathering from manual searches to systematic pipelines that enumerate subdomains, monitor Certificate Transparency logs, check breach databases, scan GitHub for leaked secrets, and correlate findings into consolidated risk reports. We built Python tools for each of these tasks and discussed how the same automation that attackers use for reconnaissance is equally valuable for defenders monitoring their own attack surface. The closing point was that raw OSINT data is only useful if it gets communicated effectively to the people who can act on it.
Today we tackle that communication problem head-on. This episode is about the skill that separates amateur pentesters from professionals who get hired back: reporting and documentation. All the OSINT automation, all the exploits, all the custom scanners and C2 frameworks we have built across 65 episodes -- none of it matters if you cannot communicate what you found, why it matters, and what the client should do about it. We touched on reporting briefly in episode 26 (the full web pentest methodology), but today we go deep.
The Report Is the Product
Here is the uncomfortable truth that most aspiring pentesters do not want to hear: the hack is not the deliverable. The report is. A client does not pay you to get Domain Admin. They pay you to tell them what is broken, how bad it is, and how to fix it. If your report is unclear, incomplete, or technically accurate but unreadable, the engagement fails -- regardless of how brilliant your exploitation was.
I have seen pentesters who can pop shells in minutes but cannot write a coherent paragraph about what they found. And I have seen pentesters with average technical skills who write reports so clear and actionable that clients implement every recommendation within a month. Guess which one gets invited back? ;-)
The economic reality is straightforward. A pentest engagement costs the client somewhere between $10,000 and $100,000+ depending on scope. What they receive for that money is a PDF. That PDF needs to justify the spend, communicate the risk in terms the board understands, give the engineering team enough detail to actually fix the problems, and serve as a baseline for measuring improvement over time. If the PDF is bad -- vague findings, missing evidence, "just patch it" remediation -- the client's security posture does not improve, and they hire a different firm next year.
Report Structure
Every professional pentest report follows the same general structure. The specifics vary between firms (some use custom templates, some use reporting platforms like PlexTrac or Ghostwriter), but the sections are universal:
1. EXECUTIVE SUMMARY (1-2 pages)
Audience: C-suite, board members, non-technical stakeholders
Content:
- Engagement overview (scope, dates, methodology)
- Overall risk rating (Critical / High / Medium / Low)
- Key findings summary (3-5 most important issues)
- Business impact statement (what an attacker could DO)
- Strategic recommendations (not technical -- business language)
2. SCOPE AND METHODOLOGY (1-2 pages)
- What was tested (IP ranges, domains, applications)
- What was NOT tested (exclusions)
- Testing methodology (OWASP, PTES, MITRE ATT&CK)
- Tools used
- Testing dates and duration
- Limitations and caveats
3. FINDINGS (bulk of the report)
Each finding includes:
- Title (clear, specific)
- Severity (Critical/High/Medium/Low + CVSS score)
- Affected system(s)
- Description (what the vulnerability is)
- Business impact (what an attacker could achieve)
- Evidence (screenshots, request/response, command output)
- Reproduction steps (step-by-step, anyone can follow)
- Remediation (specific, actionable fix)
- References (CVE, CWE, OWASP)
4. RISK SUMMARY (1 page)
- Finding count by severity
- Risk matrix or heat map
- Trend comparison (if repeat engagement)
5. APPENDICES
- Full tool output (nmap scans, vulnerability scan reports)
- Raw evidence not included in findings
- Credential dumps (sanitized)
- Network diagrams
The executive summary is the most important section in the entire report and the one that pentesters write worst. The C-suite is not going to read your 80-page findings section. They are going to read the executive summary, decide whether security needs more budget, and move on. If your executive summary reads like a technical document ("we identified a blind SQL injection vulnerability in the POST /api/v1/auth endpoint allowing extraction of the users table"), the executive stops reading. If it reads like a business risk assessment ("an unauthenticated attacker can access all 150,000 customer records, triggering GDPR notification obligations and potential fines of up to 4% of annual revenue"), the executive picks up the phone and asks how fast it can be fixed.
The scope and methodology section protects both you and the client. "We tested the external perimeter of 203.0.113.0/24 between March 1 and March 14 using automated scanning and manual exploitation techniques" establishes exactly what was covered. If the client later asks "why didn't you find the vulnerability in the internal network?", the scope section is your answer: it was explicitly excluded. This also matters for compliance -- auditors (episode 54) need to verify that the testing methodology was appropriate for the regulatory framework.
Writing a Finding That Drives Action
The difference between a finding that gets fixed and a finding that gets ignored is specificity. Compare:
BAD finding (common mistake):
Title: SQL Injection
Severity: High
Description: Found SQL injection on the login page.
Fix: Patch it.
This is useless. No evidence. No reproduction steps. No business
impact. No specific fix. The developer reading this has no idea
what to do.
GOOD finding:
Title: Blind SQL Injection in Login Form (POST /api/v1/auth)
Severity: Critical (CVSS 9.8)
Affected: https://app.target.com/api/v1/auth
Description:
The username parameter in the authentication endpoint is
vulnerable to blind SQL injection. An unauthenticated attacker
can extract the entire database contents, including user
credentials, personal information, and payment data.
Business Impact:
An attacker can exfiltrate the full customer database (estimated
150,000 records) containing names, email addresses, hashed
passwords, and billing addresses. This would trigger GDPR breach
notification obligations and potential fines of up to 4% of
annual revenue.
Evidence:
Request:
POST /api/v1/auth HTTP/1.1
Content-Type: application/json
{"username":"admin' AND SLEEP(5)-- -","password":"test"}
Response time: 5.03 seconds (baseline: 0.12 seconds)
This confirms time-based blind SQL injection.
Database extracted (proof of concept -- first 5 rows only):
admin | [email redacted] | $2b$12$hash...
jsmith | [email redacted] | $2b$12$hash...
...
Reproduction Steps:
1. Navigate to https://app.target.com/login
2. Intercept the login request with Burp Suite
3. Modify the username parameter to: admin' AND SLEEP(5)-- -
4. Observe 5-second delay in response (vs ~0.1s baseline)
5. Use sqlmap for automated extraction:
sqlmap -u "https://app.target.com/api/v1/auth" --data \
'{"username":"*","password":"test"}' --dbms=mysql --dump
Remediation:
1. Use parameterized queries (prepared statements) for all
database interactions. Replace string concatenation with
placeholders:
BEFORE: query = f"SELECT * FROM users WHERE name='{username}'"
AFTER: cursor.execute("SELECT * FROM users WHERE name=%s",
(username,))
2. Implement input validation (alphanumeric only for usernames)
3. Deploy a WAF rule to block common SQLi patterns as a
temporary mitigation until code fix is deployed
References:
- CWE-89: Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')
- OWASP: https://owasp.org/www-community/attacks/SQL_Injection
The good finding has everything: a specific title that identifies the exact endpoint, a CVSS score, evidence showing the exact request and response, reproduction steps that any developer can follow, remediation with actual code showing the before-and-after, and references for further reading. The developer receiving this finding knows EXACTLY what to do. No guessing, no interpretation, no back-and-forth emails asking "which page was the SQL injection on?"
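If you keep findings as structured data instead of free text, you can catch the BAD pattern before the report ships. A minimal sketch, assuming a hypothetical `Finding` record with the sections from the template above (the field names and the list of "vague" phrases are my own choices):

```python
from dataclasses import dataclass, field, fields


@dataclass
class Finding:
    title: str
    severity: str
    cvss: float
    affected: str
    description: str
    business_impact: str
    evidence: str
    reproduction_steps: list[str] = field(default_factory=list)
    remediation: str = ""
    references: list[str] = field(default_factory=list)


# Remediation text that technically exists but tells the developer nothing.
VAGUE_REMEDIATION = {"patch it", "fix the code"}


def missing_sections(finding: Finding) -> list[str]:
    """Return the sections that are empty or too vague to act on."""
    problems = []
    for spec in fields(finding):
        if spec.name == "cvss":
            continue  # a numeric score is checked elsewhere, 0.0 is not "empty text"
        if not getattr(finding, spec.name):
            problems.append(spec.name)
    if finding.remediation.strip().rstrip(".").lower() in VAGUE_REMEDIATION:
        problems.append("remediation")
    return problems


bad = Finding(
    title="SQL Injection",
    severity="High",
    cvss=0.0,
    affected="",
    description="Found SQL injection on the login page.",
    business_impact="",
    evidence="",
    remediation="Patch it.",
)
gaps = missing_sections(bad)
```

A check like this does not make a finding good, it only flags the ones that are obviously incomplete. The specificity still has to come from you.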
Notice how the business impact section translates technical severity into business consequences. "Blind SQL injection" means nothing to the CFO. "150,000 customer records exposed, GDPR fines of up to 4% of annual revenue" means everything. This is the paragraph that gets copied into the board presentation. Make it count.
CVSS Scoring
CVSS (Common Vulnerability Scoring System) provides a standardized way to communicate vulnerability severity. Version 3.1 is still the version most reports and tools use, though 4.0 (published in November 2023) is slowly being adopted:
CVSS v3.1 base metrics:
Attack Vector: Network (N) / Adjacent (A) / Local (L) / Physical (P)
Attack Complexity: Low (L) / High (H)
Privileges Req: None (N) / Low (L) / High (H)
User Interaction: None (N) / Required (R)
Scope: Unchanged (U) / Changed (C)
Confidentiality: None (N) / Low (L) / High (H)
Integrity: None (N) / Low (L) / High (H)
Availability: None (N) / Low (L) / High (H)
Example: SQL injection (unauthenticated, over network, full DB access)
AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H = CVSS 9.8 (Critical)
Example: Reflected XSS (requires user to click link)
AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N = CVSS 6.1 (Medium)
Example: Local privilege escalation (requires shell access first)
AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H = CVSS 7.8 (High)
Calculator: https://www.first.org/cvss/calculator/3.1
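The three example scores above can be reproduced with a short calculator. This is a sketch of the CVSS v3.1 base score equations only (no temporal or environmental metrics, no input validation), with the metric weights taken from the v3.1 specification:

```python
import math

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on Scope (U = unchanged, C = changed).
PRIVILEGES = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal, using the integer method from the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector such as 'AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H'."""
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    iss = 1 - math.prod(1 - WEIGHTS["CIA"][m[key]] for key in ("C", "I", "A"))
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PRIVILEGES[m["S"]][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total, 10)) if changed else roundup(min(total, 10))


sqli = base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
xss = base_score("AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N")
lpe = base_score("AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H")
```

For anything that goes into a client report, cross-check against the official calculator linked above rather than trusting a homegrown script.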
Having said that, CVSS measures technical severity, not business risk. A "Medium" CVSS finding on the CEO's workstation may be more urgent than a "Critical" finding on a test server that holds no real data. Your report should include CVSS scores for standardization (clients expect them, compliance frameworks reference them), but the prioritization should be driven by business context. I argue that the best reports include both: the CVSS score for the auditors, and a separate "business priority" rating that accounts for the actual value of the affected system, the sensitivity of the data it processes, and the exposure level.
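One way such a separate business priority rating could work is to scale the CVSS score by a context factor. The 1-5 scales and the formula below are purely hypothetical, a sketch to show the idea rather than an established scoring method:

```python
from dataclasses import dataclass


@dataclass
class RatedFinding:
    title: str
    cvss: float
    asset_value: int       # 1 (lab/test box) .. 5 (crown jewels) -- hypothetical scale
    data_sensitivity: int  # 1 (no real data) .. 5 (regulated data) -- hypothetical scale
    exposure: int          # 1 (isolated) .. 5 (internet-facing) -- hypothetical scale

    @property
    def business_priority(self) -> float:
        # Context factor between 0.2 and 1.0, multiplied into the CVSS score.
        context = (self.asset_value + self.data_sensitivity + self.exposure) / 15
        return round(self.cvss * context, 1)


findings = [
    RatedFinding("Critical finding on a test server with no real data", 9.8, 1, 1, 2),
    RatedFinding("Medium finding on the CEO's workstation", 6.1, 5, 5, 3),
]
by_priority = sorted(findings, key=lambda f: f.business_priority, reverse=True)
```

With these made-up inputs the Medium finding sorts above the Critical one, which is exactly the situation described above. Report both numbers side by side so the auditors and the engineering managers each get the one they need.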
Writing for Different Audiences
The same vulnerability needs different presentations depending on who is reading:
Executive audience:
"An attacker can access all customer financial data without
authentication. This creates regulatory exposure under GDPR
and potential liability estimated at EUR 2-5 million."
Manager audience:
"The authentication API has a critical SQL injection vulnerability
(CVSS 9.8). Priority: immediate. Estimated fix: 2-4 developer
hours. Temporary mitigation: WAF rule deployment (30 minutes)."
Engineer audience:
"POST /api/v1/auth is vulnerable to time-based blind SQL injection
via the username parameter. The application uses string concatenation
to build SQL queries in auth_controller.py line 47. Replace with
parameterized queries using cursor.execute() with %s placeholders.
See reproduction steps and sqlmap command in the evidence section."
Same finding. Three presentations. Each audience gets what they
need to take action at their level.
The executive version mentions no technology. No "SQL injection," no "POST endpoint," no "parameterized queries." Just: what can go wrong, and how much will it cost. The manager version adds just enough technical context to prioritize and schedule the work. The engineer version is pure implementation detail -- the exact file, the exact line, the exact fix. A professional report contains all three levels, layered in the structure: executive summary for the board, risk summary for managers, detailed findings for engineers.
This multi-audience approach is something we touched on in episode 54 (compliance and governance) from the regulatory side. The same principle applies here: security information needs to flow to the right people in the right format. A 50-page technical report handed to the CEO accomplishes nothing. A 1-page executive summary handed to the developer accomplishes nothing. Both need their own version.
Evidence Standards
Every finding needs evidence. Evidence must be:
1. Reproducible
Another tester should be able to follow your steps and get
the same result. "I found SQLi" is a claim. A curl command
with the exact payload that triggers a 5-second delay is proof.
2. Timestamped
Screenshots must show date/time. Request/response captures
must include timestamps. This proves the finding was valid
during the engagement window.
3. Sanitized
Redact real passwords if you captured them (show partial hashes).
Redact PII if you accessed customer data (show row counts, not
actual records). Do NOT include full credential dumps in the
report -- reference them in the appendix with restricted access.
4. Minimal but sufficient
Show enough evidence to prove the vulnerability. Not every
screenshot from your 3-day engagement. Select the evidence
that demonstrates the issue most clearly.
Screenshot checklist:
- Show the URL bar (proves you are testing the correct target)
- Show the full request and response (not just the response)
- Highlight the relevant part (box, arrow, annotation)
- Include the terminal command that produced the output
- Use consistent screenshot naming: finding-01-evidence-a.png
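The naming convention and the timestamp requirement are easy to automate. A small sketch, assuming you want one manifest entry per evidence file with a UTC capture time and a SHA-256 hash (the manifest layout is my own assumption, not a standard):

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def evidence_name(finding_no: int, index: int, ext: str = "png") -> str:
    """Build a name like finding-01-evidence-a.png (index 0 -> 'a')."""
    if not 0 <= index < 26:
        raise ValueError("index must be between 0 and 25")
    letter = chr(ord("a") + index)
    return f"finding-{finding_no:02d}-evidence-{letter}.{ext}"


def manifest_entry(
    finding_no: int,
    index: int,
    data: bytes,
    captured_at: Optional[datetime] = None,
) -> dict:
    """Record file name, content hash, and capture time for one piece of evidence."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return {
        "file": evidence_name(finding_no, index),
        "sha256": hashlib.sha256(data).hexdigest(),
        "captured_at_utc": captured_at.isoformat(),
    }


entry = manifest_entry(1, 0, b"", datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc))
```

The hash lets you show later that the screenshot in the report is the same file you captured during the engagement window.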
The sanitization requirement is critical and often overlooked. If you pop the database during a pentest and extract 150,000 customer records, you do NOT put those records in the report. You put the row count, maybe the first 3-5 rows with PII redacted (partial email addresses, hashed passwords with most characters masked), and a statement confirming the full dataset was accessible. The report itself becomes a security-sensitive document -- if it leaks, you do not want it to contain the very data the vulnerability would have exposed. Think about it: your report proving that customer data is at risk should not itself be a customer data leak if it falls into the wrong hands.
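A sketch of what that redaction step could look like before rows go into the report. The masking rules (first character of the email kept, first 7 characters of the hash kept so the algorithm prefix stays visible) are assumptions; adjust them to whatever your rules of engagement say:

```python
def mask_email(email: str) -> str:
    """Keep the first character and the domain: jsmith@example.com -> j***@example.com."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def mask_hash(value: str, keep: int = 7) -> str:
    """Keep only a short prefix of a password hash and note how much was removed."""
    removed = max(len(value) - keep, 0)
    return f"{value[:keep]}[{removed} chars redacted]"


def sanitized_sample(rows: list[dict], limit: int = 3) -> dict:
    """Return the row count plus a few redacted rows as proof of access."""
    sample = [
        {
            "username": row["username"],
            "email": mask_email(row["email"]),
            "password_hash": mask_hash(row["password_hash"]),
        }
        for row in rows[:limit]
    ]
    return {"row_count": len(rows), "sample": sample}


# Made-up rows standing in for extracted data.
rows = [
    {"username": f"user{i}", "email": f"user{i}@example.com", "password_hash": "$2b$12$" + "x" * 53}
    for i in range(10)
]
evidence = sanitized_sample(rows)
```

Run the redaction at capture time, not at report-writing time, so the raw records never sit in your notes folder longer than necessary.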
The reproducibility requirement is equally important for a different reason: fix verification. After the client remediates the finding, someone (either you on a retest, or the client's internal team) needs to verify the fix works. If your reproduction steps are "I found SQLi on the login page," nobody can verify anything. If your reproduction steps are "send this exact HTTP request, observe this exact response difference," verification is straightforward -- send the same request, confirm the response no longer shows the vulnerability. The reproduction steps you write during the engagement are the test cases used to verify the remediation. Write them as if someone else will run them six weeks later (because they will).
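For a time-based finding like the SQLi example, the reproduction steps boil down to a timing comparison that can be re-run after the fix. A minimal sketch, assuming the response times have already been measured (the measuring itself, the tolerance, and the verdict strings are my assumptions):

```python
from statistics import median


def timing_verdict(
    baseline_s: list[float],
    injected_s: list[float],
    sleep_s: float = 5.0,
    tolerance_s: float = 1.0,
) -> str:
    """Compare median response times with and without the SLEEP payload."""
    extra_delay = median(injected_s) - median(baseline_s)
    if extra_delay >= sleep_s - tolerance_s:
        return "VULNERABLE"
    return "NOT REPRODUCED"


# During the engagement: the payload adds roughly five seconds.
before_fix = timing_verdict([0.12, 0.11, 0.13], [5.03, 5.01, 5.10])
# Six weeks later, on retest: the same payload no longer changes the timing.
after_fix = timing_verdict([0.12, 0.14, 0.11], [0.13, 0.12, 0.15])
```

Using the median over several requests keeps one slow response from producing a false positive on a busy server.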
Reporting Tools
Ghostwriter (https://github.com/GhostManager/Ghostwriter)
- Full engagement management platform
- Finding database with templates
- Report generation from findings
- Client and project tracking
- Free, self-hosted
PlexTrac (commercial)
- Cloud-based pentest management
- Collaborative report writing
- Finding libraries with CVSS auto-calculation
- Client portal for report delivery
Serpico (https://github.com/SerpicoProject/Serpico)
- Report generation from templates
- Finding database
- Word/PDF export
- Free, self-hosted
DIY: Markdown + Pandoc
- Write findings in Markdown
- Use pandoc to generate PDF with company template
- Version control with git
- Best for solo operators who want full control
pandoc report.md -o report.pdf --template company.latex \
--pdf-engine=xelatex --toc
The choice between these tools comes down to team size and engagement volume. Solo operators and small teams can get away with the Markdown + Pandoc pipeline -- write in your editor of choice, store in git, generate PDFs with consistent formatting. The major advantage is version control: every edit to the report is tracked, diffs are clean, and you can branch for different report versions (draft, client review, final). The major disadvantage is that you build your own finding library from scratch and there is no built-in collaboration.
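A sketch of the glue script for that pipeline: sort findings by severity, render them to Markdown, and build the pandoc invocation shown above. The finding dictionary layout and the function names are hypothetical; the pandoc flags are the ones from the command above:

```python
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def render_report(title: str, findings: list[dict]) -> str:
    """Render findings as Markdown, most severe first (ties broken by CVSS)."""
    ordered = sorted(
        findings,
        key=lambda item: (SEVERITY_ORDER[item["severity"]], -item["cvss"]),
    )
    lines = [f"# {title}", ""]
    for number, item in enumerate(ordered, 1):
        lines += [
            f"## {number}. {item['title']}",
            "",
            f"Severity: {item['severity']} (CVSS {item['cvss']})",
            "",
            item["body"],
            "",
        ]
    return "\n".join(lines)


def pandoc_command(
    source: str = "report.md",
    output: str = "report.pdf",
    template: str = "company.latex",
) -> list[str]:
    """Build the argument list for the pandoc call."""
    return ["pandoc", source, "-o", output, "--template", template,
            "--pdf-engine=xelatex", "--toc"]


findings = [
    {"title": "Directory listing on staging server", "severity": "Medium",
     "cvss": 5.3, "body": "Details go here."},
    {"title": "Blind SQL Injection in Login Form", "severity": "Critical",
     "cvss": 9.8, "body": "Details go here."},
]
markdown = render_report("Penetration Test Report", findings)
```

To actually build the PDF you would write `markdown` to `report.md` and pass `pandoc_command()` to `subprocess.run(..., check=True)`, which requires pandoc and a LaTeX engine to be installed.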
For teams running 10+ engagements per year, Ghostwriter is the practical choice among the free options. Its finding database means you write a "SQL Injection" finding template once (with customizable fields for the specifically affected endpoint, evidence, and remediation), and reuse it across engagements. Over time, your finding library becomes a substantial asset -- hundreds of pre-written findings covering every common vulnerability, each refined through multiple engagements. A new team member can produce a professional-quality report on their first engagement by selecting findings from the library and filling in the engagement-specific details.
PlexTrac adds a client portal (the client logs in and sees their findings, tracks remediation progress, compares across engagements) and collaborative editing (multiple pentesters writing findings simultaneously). These features justify the commercial price for larger firms where report delivery and client management are significant overhead.
Common Reporting Mistakes
1. All jargon, no business context
"RCE via deserialization of untrusted OGNL expressions"
means nothing to the CEO. Translate:
"An attacker can take complete control of the web server
and access all data it processes."
2. Missing reproduction steps
If the client's team cannot reproduce the finding, they
cannot verify the fix. Every finding needs step-by-step
instructions that a junior developer can follow.
3. Remediation says "patch it" or "fix the code"
This is not remediation. Remediation is:
"Replace the call to strcpy() on line 142 of parser.c with
strncpy() with a maximum length of sizeof(buffer)-1, and set
buffer[sizeof(buffer)-1] = '\0' afterwards (strncpy() does not
null-terminate on truncation). Recompile with
-fstack-protector-strong. Deploy to production via the
standard CI/CD pipeline."
4. No severity prioritization
50 findings all marked "High" is not helpful. The client needs
to know: fix THIS one first, THAT one second, these 10 can wait.
Use CVSS + business context to create a prioritized list.
5. Report delivered, never discussed
Always do a report walkthrough with the client. The report is
a conversation starter, not a final deliverable. Answer questions.
Explain attack chains. Help them understand what matters most.
6. Findings without attack chains
Individual findings are useful. But showing how they CHAIN
together is devastating. "We used the SSRF (finding 3) to
access the internal metadata endpoint, extracted AWS credentials,
used those to access S3 buckets containing database backups,
and extracted admin credentials from the backups." THAT is what
makes the board sit up.
Mistake number 6 deserves emphasis. We covered attack chaining in episode 50 (red team operations) from the offensive perspective -- combining multiple moderate-severity findings into a critical-severity attack path. In the report, that attack chain is your most powerful narrative tool. An SSRF rated Medium, an exposed metadata endpoint rated Low, and an S3 bucket with overly permissive access rated Medium -- individually, each looks manageable. But the chain (SSRF -> metadata -> credentials -> S3 -> database backups -> admin credentials) is Critical, and the report needs to show that chain explicitly. Draw it out. Show each step. Make the reader understand that fixing ANY link in the chain breaks the entire attack path -- and then recommend which link to fix first.
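The "fixing any link breaks the path" claim can be checked mechanically if you model the chain as a small directed graph. A sketch, with node names and edge labels that I made up to mirror the SSRF example from the list above:

```python
from collections import deque

Edge = tuple[str, str, str]  # (from, to, finding that enables the step)


def reachable(edges: list[Edge], start: str, goal: str) -> bool:
    """Breadth-first search: can the attacker get from start to goal?"""
    graph: dict[str, list[str]] = {}
    for src, dst, _label in edges:
        graph.setdefault(src, []).append(dst)
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def breaking_links(edges: list[Edge], start: str, goal: str) -> list[str]:
    """Findings whose fix alone would cut every path from start to goal."""
    return [
        edge[2]
        for i, edge in enumerate(edges)
        if not reachable(edges[:i] + edges[i + 1:], start, goal)
    ]


CHAIN: list[Edge] = [
    ("internet", "metadata_endpoint", "SSRF (finding 3)"),
    ("metadata_endpoint", "aws_credentials", "Exposed metadata endpoint"),
    ("aws_credentials", "s3_backups", "Over-permissive S3 access"),
    ("s3_backups", "admin_credentials", "Credentials stored in backups"),
]
fixes = breaking_links(CHAIN, "internet", "admin_credentials")
```

For a purely linear chain every link shows up in `fixes`. As soon as there is a second route to the same asset (a leaked access key, for example), the list shrinks to the links that all routes share, and those are the ones to recommend fixing first.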
Defense: Using Pentest Reports Effectively
For organizations receiving pentest reports -- and this is just as important as writing them, because a brilliant report that sits unread in someone's inbox achieves exactly nothing:
1. Triage immediately
Critical and High findings get remediation tickets within 48 hours.
Medium findings within 2 weeks. Low within 90 days.
2. Verify fixes
After remediation, re-test each finding. Do NOT assume the
fix worked. Request a re-test from the pentester if budget allows.
3. Track remediation metrics
- Time from report to fix (by severity)
- Percentage of findings remediated within SLA
- Recurring findings across engagements (these indicate
systemic issues, not individual bugs)
4. Use findings for training
Share (sanitized) findings with the development team. Real
vulnerabilities found in YOUR code are more impactful than
generic security training.
5. Compare across engagements
Is the total finding count going down? Are critical findings
decreasing? Are the same issues recurring? This trend data
proves whether your security program is improving.
The recurring findings metric is the most telling indicator of organizational security maturity. If the same SQL injection pattern appears in three consecutive annual pentests, the problem is not the individual bug -- it is the development process. The developers are not using parameterized queries because nobody taught them to, or because the framework makes it easy to write raw SQL, or because there is no code review catching it. The remediation for a recurring finding is not "fix this instance" -- it is "fix the process that creates this class of bug." That might mean developer training (episode 46 covered why generic security training fails -- use the actual pentest findings as case studies instead), static analysis tools in the CI/CD pipeline, or architectural changes that make the vulnerable pattern impossible.
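Both the SLA metric and the recurring findings metric are a few lines of code once the findings are tracked as data. A sketch, assuming hypothetical fix deadlines per severity and findings tagged with a class identifier such as a CWE ID:

```python
from datetime import date
from typing import Optional

# Hypothetical fix deadlines in days; use whatever your organization agreed on.
SLA_DAYS = {"Critical": 7, "High": 30, "Medium": 60, "Low": 90}


def days_to_fix(reported: date, fixed: Optional[date]) -> Optional[int]:
    """Days from report delivery to fix, or None if still open."""
    return (fixed - reported).days if fixed else None


def sla_compliance(findings: list[dict], reported: date) -> float:
    """Percentage of findings fixed within the SLA for their severity."""
    if not findings:
        return 0.0
    in_time = 0
    for item in findings:
        elapsed = days_to_fix(reported, item["fixed"])
        if elapsed is not None and elapsed <= SLA_DAYS[item["severity"]]:
            in_time += 1
    return round(100 * in_time / len(findings), 1)


def recurring(engagements: list[set[str]]) -> set[str]:
    """Finding classes that show up in every engagement."""
    return set.intersection(*engagements) if engagements else set()


reported = date(2024, 3, 15)
findings = [
    {"severity": "Critical", "fixed": date(2024, 3, 20)},
    {"severity": "High", "fixed": date(2024, 5, 1)},
    {"severity": "Medium", "fixed": None},
    {"severity": "Low", "fixed": date(2024, 4, 1)},
]
compliance = sla_compliance(findings, reported)
systemic = recurring([{"CWE-89", "CWE-79"}, {"CWE-89", "CWE-22"}, {"CWE-89"}])
```

Anything that ends up in `systemic` is a candidate for a process fix rather than another one-off ticket.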
This is where pentest reports connect to the broader security program. Individual findings fix individual bugs. Trend analysis across engagements reveals systemic weaknesses. And fixing systemic weaknesses is how organizations actually improve their security posture over time -- not by playing whack-a-mole with individual vulnerabilities, but by eliminating the root causes that produce them. The best pentest firms include a "strategic recommendations" section in their executive summary that addresses exactly this: not just "fix these 47 bugs" but "here is why you keep getting these bugs, and here is what to change in your development and deployment process to stop producing them." That strategic perspective is where the real value of regular penetration testing lives -- and it is something that automated scanners (no matter how sophisticated) cannot provide, because they find individual instances without understanding the organizational patterns behind them.
The AI Slop Connection
AI can generate pentest report text. And it shows. AI-generated findings are generic, lack specificity, and often include remediation advice that does not apply to the target technology stack. "Use parameterized queries" is correct remediation for SQLi -- but if the target uses an ORM that should already prevent SQLi, the real finding is "why is the ORM being bypassed?" and the remediation is "fix the raw query in auth_controller.py line 47."
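To make the "raw query" point concrete, here is a minimal sketch of the difference between a query built by string formatting and a parameterized one. It uses Python's built-in sqlite3 module as a stand-in. The table, the data, and the function names are hypothetical, and it does not reproduce the contents of auth_controller.py, which this episode does not show.

```python
import sqlite3

# In-memory database with hypothetical test data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password_hash TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "hash-a"), ("bob", "hash-b")],
)


def find_user_raw(conn, username):
    # Vulnerable: user input is pasted into the SQL string, which is
    # the kind of hand-written query that bypasses an ORM's protection.
    query = f"SELECT username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_parameterized(conn, username):
    # Safe: the driver passes the value separately from the SQL text.
    query = "SELECT username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


if __name__ == "__main__":
    payload = "' OR '1'='1"
    print("raw:", len(find_user_raw(conn, payload)), "rows")
    print("parameterized:", len(find_user_parameterized(conn, payload)), "rows")
```

With the injection payload, the raw version matches every row in the table while the parameterized version matches none. A finding that names the specific raw query tells the engineer exactly which line to change, which generic advice cannot do.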
AI reports also tend to inflate findings. A generic "information disclosure" finding that AI generates for every HTTP server returning a Server header is noise, not intelligence. The professional pentester knows what matters and what does not. The AI treats everything equally because it cannot assess business context. A Server: Apache/2.4.52 header on a public-facing web server is not a finding -- it is a fact. A Server: Apache/2.2.8 header (a version from 2008 with dozens of known CVEs) on a server processing credit card transactions is a finding. The difference is judgment, and judgment is what the client is paying for.
The report is also where AI-generated content is most dangerous to the pentester's reputation. If a client reads a finding that says "remediation: use parameterized queries" for a Django application (where the ORM already uses parameterized queries), they know the pentester did not actually understand what they found. They lose trust in the entire report. Every finding needs to be specific to the target -- the actual technology, the actual code, the actual deployment. Generic advice that could apply to any application is a red flag that nobody actually analyzed this system in depth.
Write your own reports. Use AI to check grammar if you like. Never use AI to generate findings you did not verify yourself.
Exercises
Exercise 1: Write a full finding for a vulnerability you discovered in your lab (any vulnerability from this series). Follow the GOOD finding template from this episode: title, CVSS score, affected system, description, business impact, evidence (with screenshots or request/response), reproduction steps, remediation, and references. Have someone else (or yourself after a week-long break) attempt to reproduce the finding using ONLY your documentation. If they cannot reproduce it, your documentation is insufficient -- rewrite and repeat until the finding is independently reproducible. Save to ~/lab-notes/pentest-finding.md.
Exercise 2: Write a 1-page executive summary for a fictional pentest engagement. The engagement tested a web application and found: 2 Critical (SQLi, RCE), 3 High (XSS, IDOR, broken auth), 5 Medium, and 8 Low findings. Domain Admin was achieved from an unauthenticated starting point through an attack chain of SQLi -> credential extraction -> lateral movement -> AD compromise. Write the summary using business language -- no technical jargon. Include the overall risk rating, the business impact of the attack chain, and the top 3 strategic recommendations. Have a non-technical person read it and tell you what they understood -- if they cannot explain the risk in their own words, rewrite until they can. Save to ~/lab-notes/executive-summary.md.
Exercise 3: Set up Ghostwriter (https://github.com/GhostManager/Ghostwriter) or Serpico in your lab. Create a project, add 3 findings from your lab work, and generate a PDF report. Compare the output against a manually written Markdown report (use pandoc report.md -o report.pdf --toc for the manual version). Document: (a) time to create a report with the tool vs manually, (b) quality of the generated output, (c) whether you would use the tool for real engagements, (d) how the finding library feature would scale across 10 engagements vs writing from scratch each time. Save your comparison to ~/lab-notes/reporting-tools.md.