Learn Ethical Hacking (#4) - Reconnaissance - The Art of Not Being Noticed

What will I learn

  • The difference between passive and active reconnaissance;
  • OSINT techniques: Google dorking, Shodan, certificate transparency, WHOIS;
  • DNS enumeration: finding subdomains, mail servers, zone transfer attempts;
  • How social media leaks more than people realize;
  • Building a target profile from public information alone;
  • Writing a Python OSINT collector script.

Requirements

  • A working modern computer running macOS, Windows or Ubuntu;
  • Your hacking lab from Episode 2;
  • Python 3 with requests and beautifulsoup4 installed (from Episode 2 setup);
  • The ambition to learn ethical hacking and security research.

Difficulty

  • Beginner

Curriculum (of the Learn Ethical Hacking series):

Solutions to Episode 3 Exercises

Exercise 1 -- Wireshark HTTP vs HTTPS comparison:

The HTTP capture shows everything in plaintext: the POST request body containing username=admin&password=password, all response headers, the full HTML body. The HTTPS capture shows: the TLS handshake (Client Hello with SNI showing the domain name, Server Hello with certificate), then encrypted Application Data blocks. You can see the domain name (via SNI), the certificate details, and the data volume -- but NOT the URL path, headers, body, or any content.

The key insight: HTTPS protects content and credentials from network observers, but still leaks metadata -- which server you're connecting to, when, and how much data flows.

Exercise 2 -- Python banner grabber:

import socket

def grab_banner(ip, port):
    """Connect to a TCP port and return whatever the service announces."""
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(3)
        s.connect((ip, port))
        # HTTP servers say nothing until asked; FTP, SSH, and SMTP
        # volunteer their banner immediately on connect
        if port == 80:
            s.send(b"GET / HTTP/1.1\r\nHost: " + ip.encode() + b"\r\n\r\n")
        banner = s.recv(1024).decode(errors='replace')
        s.close()
        return banner
    except Exception as e:
        return f"Error: {e}"

target = "192.168.56.101"
with open("/root/lab-notes/banners.txt", "w") as f:
    for port in [21, 22, 25, 80]:
        banner = grab_banner(target, port)[:500]
        f.write(f"=== Port {port} ===\n{banner}\n\n")
        print(f"Port {port}: {banner[:80]}...")

Output (abbreviated):

Port 21: 220 (vsFTPd 2.3.4)
Port 22: SSH-2.0-OpenSSH_4.7p1 Debian-8ubuntu1
Port 25: 220 metasploitable.localdomain ESMTP Postfix (Ubuntu)
Port 80: HTTP/1.1 200 OK...Server: Apache/2.2.8 (Ubuntu)...

The key insight: most services volunteer their exact name and version on connect. This is a design decision from a friendlier internet era -- and it hands attackers their CVE search terms on a silver platter.

Exercise 3 -- DNS recon of hive.blog:

Results vary over time, but a typical dig shows: A records pointing to Hetzner/OVH servers, MX records revealing Google Workspace (if Gmail-based), NS records showing the DNS provider, and TXT records containing SPF rules (which IPs can send email as @hive.blog), DKIM verification, and DMARC policy. This reveals: their hosting provider, email infrastructure, and whether they enforce strict email authentication.

The key insight: DNS is the most information-dense passive recon source. A few dig queries (or a single dig ANY, where servers still honor it -- many now return minimal answers per RFC 8482) reveal hosting, email, CDN, security posture, and sometimes internal naming conventions.


Learn Ethical Hacking (#4) - Reconnaissance - The Art of Not Being Noticed

In the military, they call it intelligence gathering. In cybersecurity, we call it reconnaissance -- and it's the most important phase of any engagement. More important than the exploit. More important than the payload. More important than any technical trick you'll learn in this series.

Why? Because if you don't know what you're attacking, you can't attack it effectively. And if you're LOUD about your recon, you get caught before you even start.

Professional pentesters spend 50-70% of their time on reconnaissance. Not hacking. Not exploiting. Researching. The actual exploitation, when it comes, is often trivially easy -- because the recon made it obvious where the holes are. I've seen it firsthand on real engagements: hours of recon, followed by one command that pops a shell. All the work is in knowing which command to run, not how to run it.

There are two fundamentally different types of recon:

Passive reconnaissance -- you gather information WITHOUT touching the target. No packets sent, no connections made, nothing for the target to detect. You're reading public records, searching the internet, analyzing what's already out there. Completely legal for public information.

Active reconnaissance -- you interact with the target directly. Scanning ports, probing services, testing inputs. The target CAN detect this (IDS alerts, firewall logs, web application logs). This is where authorization matters -- doing this without permission is potentially illegal.

Today we focus on passive. We'll get to active scanning soon enough.

Google Dorking: Search Engine Hacking

Google indexes vastly more than most people realize. Server configuration files, database dumps, login pages, internal documents -- if a misconfigured web server exposes them and Google's crawler finds them, they're in the index. Sometimes forever.

Google dorks are special search operators that let you dig deep into what Google has found. This is not some exotic hacker tool -- it's literally just advanced search syntax. The fact that it works as well as it does says more about how badly organizations configure their web servers than it does about Google.

site:target.com                     # only results from this domain
site:target.com filetype:pdf        # all PDFs on the domain
site:target.com filetype:xlsx       # spreadsheets (often contain data)
site:target.com inurl:admin         # admin pages
site:target.com inurl:login         # login portals
site:target.com intitle:"index of"  # open directory listings
site:target.com ext:sql             # SQL database dumps
site:target.com ext:env             # environment files (often contain secrets!)
site:target.com "password" filetype:log  # log files mentioning passwords

You can combine these for more targeted searches:

# Find exposed configuration files
site:target.com ext:conf OR ext:cfg OR ext:ini

# Find error messages that reveal internal paths
site:target.com "Warning:" "on line" filetype:php

# Find publicly exposed Git repositories
site:target.com inurl:".git"

# Find backup files (developers often leave these lying around)
site:target.com ext:bak OR ext:old OR ext:backup
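
Automating the Google queries themselves is a bad idea (it violates Google's terms of service and earns you CAPTCHAs quickly), but generating the search URLs for a target is fair game. A minimal sketch using only the standard library -- the dork list here is a small illustrative sample:

#!/usr/bin/env python3
"""Generate Google dork URLs for a target domain -- open them manually."""
import sys
from urllib.parse import quote_plus

DORKS = [
    'site:{d} filetype:pdf',
    'site:{d} inurl:admin',
    'site:{d} intitle:"index of"',
    'site:{d} ext:sql OR ext:env OR ext:bak',
    'site:{d} "password" filetype:log',
]

domain = sys.argv[1] if len(sys.argv) > 1 else 'target.com'
for dork in DORKS:
    query = dork.format(d=domain)
    # Print the raw dork plus a ready-to-open search URL
    print(f"{query:45s} https://www.google.com/search?q={quote_plus(query)}")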

The Google Hacking Database (GHDB) at exploit-db.com/google-hacking-database contains thousands of pre-built dorks organized by category. It's basically an encyclopedia of things Google has found that shouldn't be public. Categories include "Files Containing Passwords", "Sensitive Directories", "Vulnerable Servers", "Error Messages" -- you get the idea. Hours of browsing material in there.

A real-world example: in 2023, a security researcher used the dork site:*.mil filetype:env and found US military .env files containing database credentials, API keys, and SMTP passwords. On publicly indexed servers. Found via Google. The US Department of Defense runs a bug bounty program, so this was legal and the researcher got paid. But the files had been sitting there, indexed, for months before anyone noticed.

Think about that. The most powerful military in the world, with a cybersecurity budget in the billions, had database passwords indexed by Google. And it took a civilian with a search query to find them.

Having said that, there's a caveat: Google's indexing is not exhaustive. Many pages are behind authentication, many servers have robots.txt files that tell Google not to index certain paths, and Google sometimes de-indexes sensitive content when notified. So Google dorking is a starting point, not a complete picture. But it's a very good starting point.

Shodan: The Search Engine for Hackers

Shodan (shodan.io) doesn't index web pages -- it indexes internet-connected devices. Servers, cameras, routers, industrial control systems, IoT devices, databases. It continuously scans the entire IPv4 address space and records what responds on every port.

Think of it as Google, but for infrastructure instead of content.

# Search for Apache servers in the Netherlands
apache country:NL

# Find MySQL servers with no authentication required
mysql "authentication" port:3306

# MongoDB databases exposed to the internet
mongodb port:27017

# Webcams (yes, really)
"Server: webcamXP"

# Industrial control systems (SCADA)
port:502 "Schneider"

# Servers running specific software with known vulnerabilities
"OpenSSH 4.7" country:US

Shodan reveals how exposed an organization's infrastructure really is. We've all heard "don't expose your database to the internet." Shodan shows you just how many organizations do exactly that. It's depressing.

The free tier gives you basic search capabilities. A paid membership (a one-time $49 fee, or free for students/educators) unlocks more results, filters, and the API -- which lets you integrate Shodan into your own scripts. And yes, we'll do exactly that later in the series when we build more advanced recon automation.
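
For a taste of what that integration looks like, here's a minimal sketch using the official shodan Python package (pip install shodan). The API key is a placeholder -- you'd use your own, and search queries require at least a membership key:

#!/usr/bin/env python3
"""Minimal Shodan API example -- requires `pip install shodan` and an API key."""
import shodan

API_KEY = "YOUR_API_KEY_HERE"   # placeholder -- substitute your own key
api = shodan.Shodan(API_KEY)

try:
    # Same query syntax as the web interface
    results = api.search('apache country:NL')
    print(f"Total results: {results['total']}")
    for match in results['matches'][:5]:
        print(f"{match['ip_str']}:{match['port']} -- {match.get('org', 'n/a')}")
except shodan.APIError as e:
    print(f"Shodan error: {e}")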

One particularly interesting Shodan feature is historical data. Shodan keeps records of previous scans, so you can see what services a particular IP was running six months ago, a year ago. If an organization patched a vulnerability last month, you can still see that they had it -- which tells you something about their patching cadence. (Slow patchers tend to be slow across the board.)

There's also Censys (censys.io) and ZoomEye (zoomeye.org) -- similar search engines with different scan coverage and query syntax. Professional recon uses all three, because each one finds things the others miss.

Certificate Transparency Logs

Since 2018, all major Certificate Authorities (CAs) must log every SSL/TLS certificate they issue to public Certificate Transparency (CT) logs. These logs are searchable. And they reveal every subdomain that has or had an SSL certificate.

Use crt.sh to search:

https://crt.sh/?q=%25.target.com

The %25 is URL-encoded % -- a wildcard. This returns every certificate ever issued for any subdomain of target.com. And what you'll typically find is gold:

  • staging.target.com (staging environments, often less secured than production)
  • api-internal.target.com (internal APIs exposed to get certificates)
  • jenkins.target.com (CI/CD servers -- high-value targets for supply chain attacks)
  • old-app.target.com (abandoned applications, probably unpatched)
  • dev-john.target.com (developer environments with people's names in them)
  • vpn.target.com (VPN entry points -- remember Colonial Pipeline?)

This is completely passive -- you're querying a public log, not touching the target's infrastructure at all. The information is there by design (certificate transparency exists to prevent rogue certificates), and the side effect is a massive OSINT goldmine.

I want to draw attention to something specific here: abandoned subdomains. When an organization decommissions a service but doesn't revoke the certificate or remove the DNS record, the subdomain still exists and still points somewhere. If the DNS record points to a cloud service the organization no longer controls (like an old Heroku app or a decommissioned AWS S3 bucket), an attacker can potentially claim that service and take over the subdomain. This is called a subdomain takeover attack, and it's shockingly common. Bug bounty programs pay well for these -- often $500-2000 per subdomain.
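
A quick way to triage takeover candidates from a crt.sh list: check each subdomain for a CNAME whose target no longer resolves. A rough sketch reusing the dig-via-subprocess pattern from this episode's collector -- the subdomain list here is a hypothetical placeholder, and a dead CNAME target is a lead to investigate, not proof of a takeover:

#!/usr/bin/env python3
"""Flag subdomains whose CNAME targets no longer resolve -- takeover leads."""
import subprocess

def resolves(name, rtype='A'):
    """Return dig +short output for name/rtype, empty string if nothing."""
    out = subprocess.run(['dig', '+short', name, rtype],
                         capture_output=True, text=True, timeout=10)
    return out.stdout.strip()

subdomains = ['old-app.target.com', 'staging.target.com']  # e.g. from crt.sh
for sub in subdomains:
    cname = resolves(sub, 'CNAME').splitlines()
    if cname and not resolves(cname[0]):
        print(f"[!] {sub} -> {cname[0]} (target does not resolve -- investigate)")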

WHOIS and Domain Intelligence

The WHOIS protocol reveals domain registration information:

whois target.com

This shows: registrar, creation date, expiration date, name servers, and sometimes registrant name, email, and organization. GDPR has pushed many registrars to redact personal info (especially for European domains), but the technical fields -- name servers, registrar, creation and expiry dates -- are always visible.

Why it matters: domain registration dates tell you how established the organization is. Expiration dates that are coming up soon could indicate a domain that might not get renewed -- and an attacker who registers an expired domain inherits all its email, trust, and SEO history. Name servers reveal the DNS provider, which tells you what infrastructure they trust most. And if WHOIS privacy is NOT enabled, you might get the registrant's actual name and email -- which is social engineering material.

A small trick I use: search the registrant email address across multiple domains. People who register multiple domains often use the same email. This lets you map out all domains owned by the same person or organization -- including ones they might not want publicly associated with each other.

The Wayback Machine (web.archive.org) stores historical snapshots of websites. Want to see what target.com looked like in 2019? What pages existed that were later removed? What technology stack they used three years ago? The Wayback Machine remembers. Organizations delete sensitive pages thinking they're gone -- but the archive has a copy.

How often does this actually matter? More than you'd think. I've seen cases where companies removed pages listing their technology partners, internal tool names, or employee directories. The pages are gone from the live site. But the Wayback Machine snapshots from 2020 still have everything. If you know what URL to check, the information is recoverable. This is why "security through obscurity" (hiding something and hoping nobody finds it) is not a real strategy.
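
The archive also exposes a simple JSON API for finding the snapshot closest to a given date, which makes this easy to script. A minimal sketch against the public availability endpoint:

#!/usr/bin/env python3
"""Check the Wayback Machine for the snapshot closest to a given date."""
import json
import urllib.request
from urllib.parse import quote

def wayback_snapshot(url, timestamp='20190101'):
    api = f"https://archive.org/wayback/available?url={quote(url)}&timestamp={timestamp}"
    with urllib.request.urlopen(api, timeout=15) as resp:
        data = json.loads(resp.read())
    # Returns a dict with 'url' and 'timestamp', or None if nothing archived
    return data.get('archived_snapshots', {}).get('closest')

snap = wayback_snapshot('target.com')
if snap:
    print(f"Closest snapshot: {snap['timestamp']} -> {snap['url']}")
else:
    print("No snapshot found")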

DNS Enumeration: Deeper Than dig

In episode 3 we covered DNS from the protocol perspective -- how queries work, what record types exist, what they leak. Now let's turn that into a systematic enumeration technique.

Beyond basic dig lookups, there are several DNS recon techniques that every pentester should know:

Subdomain brute-forcing -- try thousands of common subdomain names against the target's DNS:

# Using a wordlist of common subdomain names -- adjust the path to your setup
# (on Kali, the seclists package provides one under /usr/share/seclists/Discovery/DNS/)
for sub in $(cat /usr/share/wordlists/subdomains-top1million-5000.txt); do
    result=$(dig +short $sub.target.com A)
    if [ -n "$result" ]; then
        echo "$sub.target.com -> $result"
    fi
done

This is technically active recon (you're querying the target's DNS servers), but it's such low-impact traffic that it's rarely detected or flagged. DNS queries are the background noise of the internet -- every device on the network makes thousands of them per day.
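
For bigger wordlists, a threaded Python version is much faster than the shell loop. A sketch with a tiny inline wordlist -- swap in a real wordlist file for actual use, and note it resolves through whatever resolver your system is configured with:

#!/usr/bin/env python3
"""Threaded subdomain brute-forcer using the system resolver."""
import socket
from concurrent.futures import ThreadPoolExecutor

WORDS = ['www', 'mail', 'vpn', 'dev', 'staging', 'api', 'jenkins']  # demo list
DOMAIN = 'target.com'

def resolve(sub):
    fqdn = f"{sub}.{DOMAIN}"
    try:
        return f"{fqdn} -> {socket.gethostbyname(fqdn)}"
    except socket.gaierror:
        return None  # NXDOMAIN or no A record

with ThreadPoolExecutor(max_workers=20) as pool:
    for hit in pool.map(resolve, WORDS):
        if hit:
            print(hit)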

Reverse DNS lookups on IP ranges -- if you know the target owns a /24 block (256 IPs), you can reverse-lookup all of them:

# Sweep a /24 range for PTR records
for i in $(seq 1 254); do
    result=$(dig +short -x 203.0.113.$i)
    if [ -n "$result" ]; then
        echo "203.0.113.$i -> $result"
    fi
done

This often reveals internal hostnames that the organization didn't intend to make public. I've seen PTR records like mail01.internal.corp.target.com, jenkins-prod.target.com, backup-nas.target.com -- each one a breadcrumb that maps the internal network from the outside.

Zone transfers -- as we briefly mentioned in episode 3, a zone transfer (AXFR) requests a complete copy of all DNS records for a domain. Properly configured DNS servers refuse these requests from unauthorized sources. Improperly configured ones hand over the entire zone file -- every subdomain, every IP, every MX record, everything.

# Get the name servers first
dig +short target.com NS

# Attempt zone transfer against each name server
dig @ns1.target.com target.com AXFR
dig @ns2.target.com target.com AXFR

If this works, you just got the complete DNS map of the target. It's like they handed you the blueprints to the building. Shocking how often it still works in 2026 -- I'd estimate maybe 5-10% of DNS servers you test will allow zone transfers to anonymous queries. That's millions of domains worldwide.
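
If you'd rather attempt this from Python, the dnspython package (pip install dnspython) can do the same AXFR. A sketch, reusing the hypothetical ns1.target.com from the dig example above:

#!/usr/bin/env python3
"""Attempt a DNS zone transfer (AXFR) -- requires `pip install dnspython` (2.x)."""
import dns.query
import dns.resolver
import dns.zone

NAMESERVER = 'ns1.target.com'   # hypothetical, from the dig example above
DOMAIN = 'target.com'

try:
    # dnspython wants an IP address as the transfer target, so resolve the NS first
    ns_ip = str(dns.resolver.resolve(NAMESERVER, 'A')[0])
    zone = dns.zone.from_xfr(dns.query.xfr(ns_ip, DOMAIN))
    for name in zone.nodes.keys():
        print(name)  # record names, relative to the zone origin
except Exception as e:
    print(f"Zone transfer refused or failed: {e}")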

Social Media: The Accidental OSINT Source

People leak more on social media than any technical system ever could. This is not an exaggeration. The average corporate employee's LinkedIn profile tells you more about their employer's technology stack than any port scan ever will.

LinkedIn is the single most valuable OSINT source for corporate targets:

  • Employee names and roles (for spear phishing -- "Hi Jan, I'm from IT, we need to verify your credentials...")
  • Technology skills listed on profiles (reveals the tech stack: "Experience with Kubernetes, AWS, Terraform, Datadog")
  • Job postings (reveal exactly what technologies they're hiring for -- and therefore using, or planning to use)
  • Organization chart inference (who reports to whom, which teams exist)
  • Employee count and growth rate (tells you about their security budget -- small teams, big infrastructure = overstretched)

GitHub is another treasure trove. Developers do a few things that make recon easy:

  • Push code with hardcoded credentials (API keys, database passwords, AWS secret keys). Tools like truffleHog and git-secrets scan for these, but the keys are in the git history even after the commit is reverted. git log --diff-filter=D shows deleted files; git log -p shows every change ever made. If a password was committed and then removed, it's still in the repo history forever (unless they force-push a rewritten history, which breaks everyone's clones). A minimal history-scanning sketch follows this list.
  • Share configuration files that reveal internal hostnames, database endpoints, IP ranges.
  • Star and fork repositories that indicate which tools they use internally.
  • Create issues and pull requests that discuss internal architecture decisions.
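
Here's what scanning that history can look like in practice -- a minimal sketch, run from inside a cloned repo, with two illustrative regex patterns (real tools like truffleHog ship hundreds):

#!/usr/bin/env python3
"""Grep a repository's entire git history for credential-like strings.
Run from inside a cloned repo -- a toy version of what truffleHog does."""
import re
import subprocess

PATTERNS = {
    'AWS access key ID': r'AKIA[0-9A-Z]{16}',
    'password/secret/api-key assignment': r'(?i)(?:password|secret|api[_-]?key)\s*[:=]\s*\S{8,}',
}

# Every diff ever committed, across all branches -- deleted files included
history = subprocess.run(['git', 'log', '-p', '--all'],
                         capture_output=True, text=True, errors='replace').stdout

for label, pattern in PATTERNS.items():
    hits = {m.group(0) for m in re.finditer(pattern, history)}
    for hit in sorted(hits):
        print(f"[{label}] {hit}")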

Twitter/X and Stack Overflow contribute too:

  • Developers asking questions about specific versions of internal tools ("How do I configure Splunk 8.2 forwarding to...?" -- now you know they use Splunk 8.2)
  • Screenshot shares that accidentally include browser tabs, terminal prompts with hostnames, or notification bars with email previews
  • Conference talk slides that describe internal architecture "anonymized" but with enough detail to identify the company

A seasoned social engineer can build a complete organizational profile -- names, roles, email format, technology stack, physical location, office layout, badge design -- entirely from LinkedIn, job postings, and Google Street View. No technical skills required. No laws broken. And from there, a phishing campaign practically writes itself.

Building the Python OSINT Collector

Here we go -- let's build a script that automates several passive recon techniques. This is where the Learn Python Series pays off -- everything here uses concepts we've covered: subprocess management, JSON parsing, HTTP requests, string processing.

#!/usr/bin/env python3
"""
Simple OSINT collector - passive reconnaissance only.
No connections to the target's infrastructure.
"""
import subprocess
import json
import sys

def dns_recon(domain):
    """Collect DNS records via dig."""
    records = {}
    for rtype in ['A', 'AAAA', 'MX', 'NS', 'TXT', 'CNAME', 'SOA']:
        result = subprocess.run(
            ['dig', '+short', domain, rtype],
            capture_output=True, text=True, timeout=10
        )
        entries = [line.strip() for line in result.stdout.strip().split('\n') if line.strip()]
        if entries:
            records[rtype] = entries
    return records

def whois_recon(domain):
    """Extract key WHOIS fields."""
    result = subprocess.run(
        ['whois', domain],
        capture_output=True, text=True, timeout=15
    )
    info = {}
    for line in result.stdout.split('\n'):
        line = line.strip()
        for field in ['Registrar:', 'Creation Date:', 'Registry Expiry Date:',
                       'Name Server:', 'Registrant Organization:']:
            if line.startswith(field):
                key = field.rstrip(':').lower().replace(' ', '_')
                info.setdefault(key, []).append(line.split(':', 1)[1].strip())
    return info

def subdomain_from_crtsh(domain):
    """Query certificate transparency logs via crt.sh."""
    import urllib.request
    url = f"https://crt.sh/?q=%25.{domain}&output=json"
    try:
        req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        resp = urllib.request.urlopen(req, timeout=15)
        data = json.loads(resp.read())
        subdomains = set()
        for entry in data:
            name = entry.get('name_value', '')
            for sub in name.split('\n'):
                sub = sub.strip().lower()
                # require an exact match or a true subdomain (avoids e.g. "nothive.blog")
                if (sub == domain or sub.endswith('.' + domain)) and '*' not in sub:
                    subdomains.add(sub)
        return sorted(subdomains)
    except Exception as e:
        return [f"Error: {e}"]

def check_email_security(domain):
    """Analyze SPF, DKIM, and DMARC from DNS TXT records."""
    results = {}

    # SPF check
    spf_result = subprocess.run(
        ['dig', '+short', domain, 'TXT'],
        capture_output=True, text=True, timeout=10
    )
    spf_records = [l for l in spf_result.stdout.split('\n') if 'v=spf1' in l]
    results['spf'] = spf_records[0].strip() if spf_records else None

    # DMARC check
    dmarc_result = subprocess.run(
        ['dig', '+short', f'_dmarc.{domain}', 'TXT'],
        capture_output=True, text=True, timeout=10
    )
    dmarc = dmarc_result.stdout.strip()
    results['dmarc'] = dmarc if dmarc else None

    return results

def main():
    if len(sys.argv) < 2:
        print("Usage: python3 osint_collector.py <domain>")
        sys.exit(1)

    domain = sys.argv[1]
    print(f"[*] OSINT Collection for: {domain}")
    print(f"{'='*50}")

    print("\n[+] DNS Records:")
    dns = dns_recon(domain)
    for rtype, values in dns.items():
        for v in values:
            print(f"    {rtype:6s} -> {v}")

    print("\n[+] WHOIS Information:")
    whois = whois_recon(domain)
    for field, values in whois.items():
        for v in values:
            print(f"    {field}: {v}")

    print("\n[+] Email Security:")
    email_sec = check_email_security(domain)
    if email_sec['spf']:
        print(f"    SPF: FOUND")
        print(f"      {email_sec['spf'][:100]}")
    else:
        print(f"    SPF: MISSING (spoofing risk!)")

    if email_sec['dmarc']:
        print(f"    DMARC: FOUND")
        print(f"      {email_sec['dmarc'][:100]}")
        if 'p=reject' in email_sec['dmarc']:
            print("      Policy: REJECT (strong)")
        elif 'p=quarantine' in email_sec['dmarc']:
            print("      Policy: QUARANTINE (moderate)")
        elif 'p=none' in email_sec['dmarc']:
            print("      Policy: NONE (weak - spoofed email gets delivered)")
    else:
        print(f"    DMARC: MISSING (spoofing risk!)")

    print("\n[+] Subdomains (from Certificate Transparency):")
    subs = subdomain_from_crtsh(domain)
    for sub in subs[:30]:  # limit output
        print(f"    {sub}")
    if len(subs) > 30:
        print(f"    ... and {len(subs) - 30} more")

    print(f"\n[*] Total: {len(dns)} DNS record types, "
          f"{sum(len(v) for v in whois.values())} WHOIS fields, "
          f"{len(subs)} subdomains")

if __name__ == '__main__':
    main()

Save this as ~/pentest-tools/osint_collector.py on your Kali VM. Run it:

python3 osint_collector.py hive.blog

Sample output (results will vary depending on when you run it):

[*] OSINT Collection for: hive.blog
==================================================

[+] DNS Records:
    A      -> 135.181.37.31
    NS     -> ns1.openprovider.nl.
    NS     -> ns2.openprovider.be.
    NS     -> ns3.openprovider.eu.
    TXT    -> "v=spf1 include:_spf.google.com ~all"

[+] WHOIS Information:
    registrar: OpenProvider B.V.
    creation_date: 2019-12-27T21:08:16Z
    name_server: ns1.openprovider.nl

[+] Email Security:
    SPF: FOUND
      "v=spf1 include:_spf.google.com ~all"
    DMARC: MISSING (spoofing risk!)

[+] Subdomains (from Certificate Transparency):
    api.hive.blog
    hive.blog
    images.hive.blog
    ... and 4 more

[*] Total: 4 DNS record types, 3 WHOIS fields, 7 subdomains

And just like that, you have a DNS map, WHOIS data, email security posture, and a subdomain list -- all from public sources, all legal, all without touching the target's servers. Not bad for a Python script you wrote yourself, right? ;-)

This script is a foundation. Professional OSINT tools like SpiderFoot, Maltego, and recon-ng do the same thing but at much larger scale, with dozens of data sources, correlation engines, and graph visualization. We'll explore those later in the series. But understanding what they do under the hood -- because you built your own version first -- makes you a better operator when you eventually use them. It's the same teaching philosophy as the Learn Python Series: build it yourself, then use the library.

The Recon Methodology

Professional pentesters don't just fire random queries and hope something sticks. They follow a structured methodology. Before any exploitation begins, you should have a reconnaissance report that answers six fundamental questions:

  1. What is the target's public infrastructure? (domains, IPs, hosting providers, CDN, cloud services)
  2. What services are exposed? (web servers, email servers, VPN endpoints, databases, remote access)
  3. What is their technology stack? (programming languages, frameworks, databases, operating systems)
  4. Who are the key people? (IT staff, developers, executives -- for social engineering vectors)
  5. What is their security posture? (DMARC/SPF, patching cadence, bug bounty program, security team size)
  6. What are the most promising attack vectors? (outdated services, large attack surface, weak email security, abandoned subdomains)

This report becomes the foundation for everything that follows. Good recon means you walk into the exploitation phase knowing where the vulnerabilities likely are, rather than spraying attacks and hoping something sticks. The difference between an amateur and a professional isn't the tools they use -- it's the preparation they do before using them.

One more thing worth mentioning (and this trips up beginners constantly): document as you go. Don't collect all your recon data in your head or in a pile of terminal windows. Write it down. Use a structured format. Save command outputs to files. Timestamp everything. During a real engagement, you'll need to produce a report that shows exactly what you found, when you found it, and how. Your future self will thank you.

I keep a simple directory structure per target:

~/lab-notes/recon/
  target-name/
    dns.txt           # dig output
    whois.txt         # whois output
    subdomains.txt    # crt.sh results
    shodan.txt        # shodan search results
    social-media.md   # linkedin/github findings
    email-security.txt # SPF/DKIM/DMARC analysis
    report.md         # summary and analysis

Nothing fancy. Text files. Markdown for the summary. The point is to have it organized and searchable. When you discover that staging.target.com runs an old version of nginx three weeks into an engagement, you want to be able to look up your initial CT log search and confirm you identified that subdomain on day one.
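
To make the "timestamp everything" habit painless, a small helper can run a command and append its output, timestamped, to the right file in that structure. A sketch, assuming the directory layout above:

#!/usr/bin/env python3
"""Run a recon command and append timestamped output to a notes file."""
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def log_cmd(target, filename, cmd):
    outdir = Path.home() / 'lab-notes' / 'recon' / target
    outdir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    stamp = datetime.now(timezone.utc).isoformat()
    # Append, never overwrite -- the history is the whole point
    with open(outdir / filename, 'a') as f:
        f.write(f"\n=== {stamp} === {' '.join(cmd)}\n{result.stdout}")

# Example usage
log_cmd('target-name', 'dns.txt', ['dig', '+short', 'target.com', 'ANY'])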

We'll put all of this together into a proper scanning methodology soon. The distinction between passive (what we covered today) and active (what's coming next) recon is important but artificial -- in practice, you move fluidly between them. Passive recon identifies targets; active recon verifies them. Together, they give you the complete picture.

Exercises

Exercise 1: Run the osint_collector.py script against three different domains: one major tech company (e.g., github.com), one media site (e.g., cnn.com), and one government domain (e.g., nasa.gov). Save the output for each. Compare: which has the most subdomains? Which reveals the most about their technology stack via DNS TXT records? Which has the most restrictive WHOIS information? Write a one-paragraph analysis for each.

Exercise 2: Using ONLY Google dorks (no tools, just the Google search engine), find at least 5 interesting things about a public university's website of your choice. Use at least three different dork operators (site:, filetype:, inurl:, intitle:, ext:). Document each dork you used and what it found. Note: universities are excellent targets for this exercise because they typically have large, diverse, and often poorly managed web presences.

Exercise 3: Enhance the osint_collector.py script with a new function: check_headers(domain) that makes an HTTP HEAD request to the domain and reports security-relevant headers. Check for: Server (version disclosure), X-Powered-By (technology disclosure), Strict-Transport-Security (HSTS), X-Frame-Options (clickjacking protection), Content-Security-Policy (XSS protection), and X-Content-Type-Options (MIME sniffing protection). For each header, report whether it's present or missing, and briefly explain the security implication. Test it against at least 3 domains.

def check_headers(domain):
    """Check HTTP response headers for security indicators."""
    import urllib.request
    import ssl

    security_headers = {
        'Strict-Transport-Security': 'HSTS - forces HTTPS connections',
        'X-Frame-Options': 'Clickjacking protection',
        'Content-Security-Policy': 'XSS and injection protection',
        'X-Content-Type-Options': 'MIME sniffing protection',
    }
    disclosure_headers = ['Server', 'X-Powered-By']

    ctx = ssl.create_default_context()
    try:
        url = f"https://{domain}/"
        req = urllib.request.Request(url, method='HEAD',
                                     headers={'User-Agent': 'Mozilla/5.0'})
        resp = urllib.request.urlopen(req, timeout=10, context=ctx)
        headers = resp.headers  # keep the Message object: its .get() is case-insensitive

        print(f"\n  Disclosure headers:")
        for h in disclosure_headers:
            val = headers.get(h)
            if val:
                print(f"    {h}: {val} (EXPOSED - version info leaked)")
            else:
                print(f"    {h}: not set (good)")

        print(f"\n  Security headers:")
        for h, desc in security_headers.items():
            val = headers.get(h)
            if val:
                print(f"    {h}: PRESENT ({desc})")
            else:
                print(f"    {h}: MISSING ({desc})")

    except Exception as e:
        print(f"  Error: {e}")

Enjoy, and see you soon!

@scipio


