Learn Ethical Hacking (#48) - Insider Threats - When the Call Is Coming from Inside the House

What will I learn

  • What insider threats are and why they are harder to detect than external attacks;
  • Insider threat categories -- malicious insiders, negligent insiders, and compromised insiders;
  • Data exfiltration techniques -- how insiders steal data using legitimate access;
  • Behavioral indicators -- the warning signs that precede insider incidents;
  • User and Entity Behavior Analytics (UEBA) -- using baseline detection to identify anomalous insider behavior;
  • DLP (Data Loss Prevention) -- technical controls to prevent data from leaving the organization;
  • Case studies -- Edward Snowden, Tesla saboteur, Twitter insider abuse, and others;
  • Defense: least privilege, separation of duties, monitoring, and building a culture where people report concerns.

Requirements

  • A working modern computer running macOS, Windows or Ubuntu;
  • Understanding of the human factor from Episode 46;
  • Understanding of physical security and OSINT from Episode 47;
  • The ambition to learn ethical hacking and security research.

Difficulty

  • Intermediate

Curriculum (of the Learn Ethical Hacking Series):

Found: 47 emails, 12 subdomains, 8 hosts, 3 employees on LinkedIn

Targeting profile:
  Who to phish: Sarah Chen (HR Director) -- access to employee data,
    likely processes sensitive documents daily, listed on company website
  Pretext: resume submission with macro-enabled doc
    (HR expects to receive documents from strangers -- it's their job)
  Technology to attack: WordPress (found at blog.megacorpone.com),
    Exchange (found via MX records), VPN (vpn.megacorpone.com)
  Entry point: WordPress blog is likely lowest-hanging fruit --
    check for outdated plugins, then pivot internally


The targeting profile transforms raw OSINT data into an actionable attack plan. The key insight is that HR is almost always the best initial phishing target -- they receive documents from strangers as part of their normal job function. An email with a resume attached does NOT look suspicious to an HR director. It looks like Tuesday. Contrast that with phishing the IT team (who are trained to be skeptical) or the CEO (who has executive phishing protection). HR is the soft spot because their job description literally requires them to open attachments from unknown senders.

**Exercise 3:** Badge cloning research.

```text
125kHz (HID Prox, EM4100):
  - No encryption, no authentication
  - Trivially cloneable with $50 reader/writer
  - Read range: 1-3 feet (covert reading is easy)
  - Defense: migrate to 13.56MHz encrypted cards

13.56MHz (MIFARE Classic):
  - Broken crypto (Crypto-1, defeated 2008)
  - Cloneable with Proxmark3 in seconds
  - Defense: migrate to MIFARE DESFire or iCLASS SE

13.56MHz (MIFARE DESFire EV2/EV3):
  - AES-128 encryption, mutual authentication
  - Not trivially cloneable (no known practical attack)
  - Currently the recommended standard

Physical defenses: RFID-shielded badge holders, multi-factor
physical access (badge + PIN), cameras at all access points,
anti-passback (cannot badge in twice without badging out).
```

The badge cloning landscape is a perfect example of security economics in action. The technology to make access cards uncloneable exists (DESFire EV3, HID SEOS) and has existed for years. The reason most buildings still use cloneable 125kHz cards is not ignorance -- it's budget allocation. A campus-wide card system upgrade costs $200,000-$500,000 when you factor in readers, cards, installation, and downtime. The person who authorizes that budget (facilities manager, CFO) doesn't think in terms of attack probability. They think in terms of "the current system works and nobody has cloned a badge yet" -- which is the same reasoning that keeps every unpatched server running until the breach.


Learn Ethical Hacking (#48) - Insider Threats - When the Call Is Coming from Inside the House

Episode 47 covered the physical dimension of security -- the attacks that happen in the real world, not through a terminal. We went through lock picking (where a $30 pick set and a weekend of practice opens most commercial doors), badge cloning with the Proxmark3 (where 65-80% of buildings still use trivially cloneable 125kHz cards), USB drop attacks (with 60% plug-in rates even among security-trained employees), tailgating scenarios that achieve 85-95% success rates by exploiting basic social politeness, rogue device deployment with credit-card-sized Raspberry Pi Zeros, and then the full OSINT toolkit -- Google dorking, LinkedIn intelligence extraction, theHarvester, SpiderFoot, Sherlock, Shodan, breach data correlation, and metadata analysis that reveals internal username formats from public PDF files. Physical security and OSINT represent some of the most reliable attack vectors in real engagements, precisely because organizations invest millions in digital defenses while leaving a $12/hour guard and an unencrypted badge reader as the last line of physical defense.

Today we turn the threat model inside out. Every attack we've covered so far -- all 47 episodes worth -- assumes the attacker is outside trying to get in. Firewalls, WAFs, authentication, access controls, encryption -- all of these are perimeter defenses designed to keep outsiders out. But what happens when the attacker is already inside? When they have a badge, a VPN connection, a corporate laptop, and legitimate credentials? When the threat is a person who sits three desks away from the security team?

Here we go.

The Insider Advantage

External attackers need to find vulnerabilities, exploit them, establish persistent access, move laterally, and eventually reach the data they want. That chain has dozens of points where detection can happen -- network IDS triggers, endpoint detection fires, SIEM correlates suspicious events, a security analyst notices something odd in the logs.

Insiders skip ALL of that. They already have credentials. They already know the network layout. They know where the valuable data lives because they work with it every day. They know which systems are monitored and which are not (because they've seen the alerts, or lack thereof). And the firewall? The firewall does not protect you from someone who is already inside.

The numbers tell the story: according to the Ponemon Institute's 2022 Cost of Insider Threats report, insider incidents cost an average of $15.4 million per year per organization. The average time to contain an insider incident? 85 days. Compare that to external attacks, which often trip an alert within hours or days (thanks to IDS, EDR, and SIEM tools that are specifically designed to catch external intrusions). Insiders operate for months because the tools designed to catch attackers are looking for attacker BEHAVIOR -- and insiders behave like employees, because they ARE employees.

#!/usr/bin/env python3
"""insider_risk_score.py -- basic insider risk scoring concept"""

# This is a simplified model of how UEBA platforms score user risk.
# Real systems use ML models trained on months of behavioral data.
# This is the conceptual framework underneath them.

from datetime import datetime, timedelta

def calculate_risk_score(user_events, baseline):
    """Score a user's recent activity against their baseline."""
    risk_score = 0
    risk_factors = []

    # Factor 1: unusual data volume
    daily_downloads = sum(
        1 for e in user_events
        if e['action'] == 'file_download'
        and e['timestamp'] > datetime.now() - timedelta(hours=24)
    )
    if daily_downloads > baseline['avg_daily_downloads'] * 3:
        risk_score += 30
        risk_factors.append(
            f"Downloads: {daily_downloads} vs baseline "
            f"{baseline['avg_daily_downloads']}"
        )

    # Factor 2: access outside normal hours
    off_hours = [
        e for e in user_events
        if e['timestamp'].hour < 6 or e['timestamp'].hour > 22
    ]
    if len(off_hours) > baseline['avg_off_hours_events'] * 2:
        risk_score += 20
        risk_factors.append(
            f"Off-hours events: {len(off_hours)} vs baseline "
            f"{baseline['avg_off_hours_events']}"
        )

    # Factor 3: new system access
    accessed_systems = set(e['system'] for e in user_events)
    new_systems = accessed_systems - set(baseline['normal_systems'])
    if new_systems:
        risk_score += 15 * len(new_systems)
        risk_factors.append(
            f"New systems accessed: {new_systems}"
        )

    # Factor 4: USB device usage (if policy restricts)
    usb_events = [
        e for e in user_events if e['action'] == 'usb_connect'
    ]
    if usb_events:
        risk_score += 25
        risk_factors.append(
            f"USB devices connected: {len(usb_events)}"
        )

    return {
        'score': min(risk_score, 100),
        'level': 'HIGH' if risk_score >= 60
                 else 'MEDIUM' if risk_score >= 30
                 else 'LOW',
        'factors': risk_factors
    }

# Example baseline (built from 90 days of observation)
baseline = {
    'avg_daily_downloads': 22,
    'avg_off_hours_events': 3,
    'normal_systems': [
        'email', 'sharepoint', 'jira', 'confluence', 'gitlab'
    ]
}

# Example: suspicious day of activity, timestamped relative to today so
# the 24-hour download window in calculate_risk_score() applies
night = datetime.now().replace(hour=2, minute=30, second=0, microsecond=0)

suspicious_events = [
    # 152 SharePoint downloads starting at 2:30 AM, five seconds apart
    {'action': 'file_download', 'system': 'sharepoint',
     'timestamp': night + timedelta(seconds=5 * i)}
    for i in range(152)
] + [
    {'action': 'file_download', 'system': 'hr_database',
     'timestamp': night + timedelta(minutes=15)},  # NEW system!
    {'action': 'usb_connect', 'system': 'workstation',
     'timestamp': night + timedelta(minutes=30)},
]

result = calculate_risk_score(suspicious_events, baseline)
print(result['level'], result['score'])
# HIGH 100 -- mass downloads at 2 AM, new systems, USB connected

That scoring model is deliberately simplified, but it captures the core idea behind every UEBA platform: you establish what normal looks like for each individual user, then you flag when reality deviates from that normal. The important detail is "for each individual user" -- a DBA downloading 500 records from the production database at 3 AM might be doing scheduled maintenance. A marketing coordinator doing the same thing is almost certainly not. Context matters, and context comes from baselines.
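The baseline dictionary in the example above was written by hand. A minimal sketch of how such a baseline might be derived from a single user's historical events, assuming the same event shape (`action`, `system`, `timestamp`). The function name `build_baseline` and the rule that a system must show up on at least `min_days` distinct days to count as normal are illustrative assumptions, not how any particular UEBA product works:

```python
from collections import defaultdict
from datetime import datetime


def build_baseline(history, min_days=3):
    """Derive a baseline from ONE user's historical events (hypothetical helper)."""
    downloads_per_day = defaultdict(int)
    off_hours_per_day = defaultdict(int)
    days_seen_per_system = defaultdict(set)
    active_days = set()

    for e in history:
        day = e['timestamp'].date()
        active_days.add(day)
        if e['action'] == 'file_download':
            downloads_per_day[day] += 1
        # Same off-hours window as the scoring example above
        if e['timestamp'].hour < 6 or e['timestamp'].hour > 22:
            off_hours_per_day[day] += 1
        days_seen_per_system[e['system']].add(day)

    n_days = max(len(active_days), 1)  # avoid dividing by zero on empty history
    return {
        'avg_daily_downloads': sum(downloads_per_day.values()) / n_days,
        'avg_off_hours_events': sum(off_hours_per_day.values()) / n_days,
        # A system only becomes "normal" once it is used on several days
        'normal_systems': sorted(
            s for s, days in days_seen_per_system.items()
            if len(days) >= min_days
        ),
    }


# Three ordinary days plus one late-night visit to hr_database
history = []
for d in (1, 2, 3):
    history.append({'action': 'file_download', 'system': 'sharepoint',
                    'timestamp': datetime(2026, 3, d, 10, 0)})
    history.append({'action': 'file_download', 'system': 'sharepoint',
                    'timestamp': datetime(2026, 3, d, 11, 0)})
    history.append({'action': 'login', 'system': 'email',
                    'timestamp': datetime(2026, 3, d, 9, 0)})
history.append({'action': 'login', 'system': 'hr_database',
                'timestamp': datetime(2026, 3, 1, 23, 30)})

print(build_baseline(history))
```

The one-off `hr_database` visit stays out of `normal_systems`, so a later visit would still register as new system access in the scoring function.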

Three Categories of Insider Threats

Not all insiders are created equal. The motivations, methods, and detection strategies differ significantly across categories, and getting the category wrong means applying the wrong defenses.

Malicious Insiders

People who intentionally steal data, sabotage systems, or sell access. Motivations vary: financial gain, revenge after being passed over for a promotion, ideological conviction, or recruitment by a foreign intelligence service or competitor. These are the insiders everyone thinks about, but they're actually the MINORITY of insider incidents (roughly 25-30% of cases). They're just the ones that make the news.

Case study: Tesla saboteur (2018)
An employee modified the Tesla Manufacturing Operating System
source code and exported large amounts of highly sensitive data
to unknown third parties. The employee had been passed over for
a promotion and was apparently motivated by resentment.

Key detail: the employee used their NORMAL access credentials.
No privilege escalation, no exploitation, no hacking. They just
used what they already had permission to use -- and used it to
steal and sabotage.

Case study: Anthony Levandowski (Waymo vs Uber, 2017)
Downloaded 14,000 proprietary files from Google's self-driving
car project before leaving to found Otto, which Uber acquired
months later. Used his company-issued laptop to pull the files
and copy them to an external drive. Google's DLP was either
nonexistent for this data or configured to allow it.

Result: Uber settled with Waymo for equity valued at roughly
$245 million. Levandowski was sentenced to 18 months for trade
secret theft.

Pattern: most malicious insiders are employees who recently
received negative performance reviews, were denied promotions,
or are planning to leave for a competitor. The behavioral
indicators often appear weeks or months before the actual
exfiltration. The question is whether anyone is watching.

Negligent Insiders

People who cause security incidents through carelessness, not malice. This is the majority of insider incidents -- roughly 56% according to Ponemon. Lost laptops, misconfigured S3 buckets, emailing sensitive data to personal accounts "for convenience", clicking phishing links, using weak passwords, leaving screens unlocked. Not evil. Just human.

#!/usr/bin/env python3
"""dlp_email_scanner.py -- simplified email DLP check"""

import re

# Patterns that DLP systems scan for in outbound email
SENSITIVE_PATTERNS = {
    'ssn': r'\b\d{3}-\d{2}-\d{4}\b',
    'credit_card': r'\b(?:\d{4}[-\s]?){3}\d{4}\b',
    'api_key': r'(?:api[_-]?key|apikey)\s*[:=]\s*["\']?[\w-]{20,}',
    'aws_key': r'AKIA[0-9A-Z]{16}',
    'private_key': r'-----BEGIN (?:RSA |EC )?PRIVATE KEY-----',
    'password_field': r'(?:password|passwd|pwd)\s*[:=]\s*\S+',
}

def scan_email_content(subject, body, attachments):
    """Check outbound email for sensitive data patterns."""
    findings = []
    full_text = f"{subject}\n{body}"

    for pattern_name, regex in SENSITIVE_PATTERNS.items():
        matches = re.findall(regex, full_text, re.IGNORECASE)
        if matches:
            findings.append({
                'type': pattern_name,
                'count': len(matches),
                'action': 'BLOCK',
                'reason': f'Sensitive data detected: {pattern_name}'
            })

    # Check attachment names for suspicious patterns
    for att in attachments:
        if att.endswith(('.sql', '.bak', '.dump', '.csv')):
            findings.append({
                'type': 'suspicious_attachment',
                'file': att,
                'action': 'QUARANTINE',
                'reason': f'Database/export file: {att}'
            })

    return findings

# Example: employee accidentally sends customer data to Gmail
test_body = """
Hey, can you take a look at these records? I'll work on them
from home tonight.

John Smith, 123-45-6789, account #4532-0981-2233-4455
Jane Doe, 987-65-4321, account #5678-1234-9012-3456
"""

results = scan_email_content("Customer list", test_body, ['export.csv'])
for r in results:
    print(f"[{r['action']}] {r['reason']}")
# [BLOCK] Sensitive data detected: ssn
# [BLOCK] Sensitive data detected: credit_card
# [QUARANTINE] Database/export file: export.csv

That DLP scanner catches the obvious cases -- SSNs and credit card numbers in plaintext. But the negligent insider who renames customer_database.xlsx to meeting-notes.xlsx before emailing it to their personal Gmail? Pattern matching misses that completely. This is why DLP is necessary but not sufficient. You need multiple layers: pattern matching AND content classification AND behavioral analytics AND egress monitoring.
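Renaming a file does not change its bytes, so one complementary layer is content fingerprinting: hash known-sensitive files ahead of time and compare outbound attachments by content instead of by name. A minimal sketch, assuming attachments are available as raw bytes. `register_sensitive` and `check_attachment` are hypothetical helper names, and exact-hash matching only catches unmodified copies (an edited or re-saved file would need partial or fuzzy matching, which this sketch does not attempt):

```python
import hashlib

# Fingerprints of files that must not leave the organization.
# In this sketch the registry is just an in-memory dict.
SENSITIVE_FINGERPRINTS = {}


def fingerprint(data: bytes) -> str:
    """SHA-256 of the file content; the filename plays no role."""
    return hashlib.sha256(data).hexdigest()


def register_sensitive(label: str, data: bytes) -> None:
    """Remember the content hash of a known-sensitive file."""
    SENSITIVE_FINGERPRINTS[fingerprint(data)] = label


def check_attachment(filename: str, data: bytes):
    """Return a finding dict if the content matches a registered file."""
    label = SENSITIVE_FINGERPRINTS.get(fingerprint(data))
    if label is None:
        return None
    return {
        'type': 'fingerprint_match',
        'file': filename,
        'matches': label,
        'action': 'BLOCK',
    }


# The customer export is registered once...
customer_db = b"id,name,ssn\n1,John Smith,123-45-6789\n"
register_sensitive('customer_database.xlsx', customer_db)

# ...and is still recognized after being renamed.
print(check_attachment('meeting-notes.xlsx', customer_db))
print(check_attachment('meeting-notes.xlsx', b"Agenda: Q3 planning\n"))
```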

Common negligent insider scenarios:

1. "I'll just email it to myself so I can work from home"
   - Employee sends customer data to personal Gmail
   - No malicious intent -- genuinely wants to be productive
   - Data now lives on an unmanaged personal device
   - If that Gmail account gets compromised (password reuse
     from a breach), the company's customer data leaks

2. "I need to share this with the vendor"
   - Employee creates a public SharePoint link instead of
     a restricted one because the sharing UI is confusing
   - Document is now accessible to anyone with the URL
   - Google indexes the publicly accessible SharePoint page
   - Three months later, someone finds it via dorking

3. "It's just a test environment"
   - Developer copies production database to staging server
     "for realistic testing"
   - Staging server has no access controls, no encryption,
     no monitoring
   - Production data with real customer records sits on an
     unprotected system for months

Compromised Insiders

Legitimate users whose accounts have been taken over by external attackers. This is the endpoint of a successful phishing attack -- the attacker has the employee's password and (if MFA was bypassed via session hijacking, SIM swapping, or MFA fatigue) an authenticated session. From the organization's perspective, every action the attacker takes looks like a legitimate employee doing their job.

#!/usr/bin/env python3
"""session_anomaly_detector.py -- detect compromised accounts"""

from datetime import datetime

def check_impossible_travel(logins):
    """Detect logins from geographically impossible locations.

    If a user logs in from Amsterdam at 14:00 and from Tokyo
    at 14:30, they did not fly 9,200 km in 30 minutes.
    Someone else has their credentials.
    """
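    alerts = []
    # Sort by user first so consecutive entries belong to the same account
    sorted_logins = sorted(
        logins, key=lambda x: (x['user'], x['timestamp'])
    )

    for i in range(1, len(sorted_logins)):
        prev = sorted_logins[i - 1]
        curr = sorted_logins[i]
        if prev['user'] != curr['user']:
            continue  # never compare logins across different users

        time_diff_hours = (
            curr['timestamp'] - prev['timestamp']
        ).total_seconds() / 3600

        # Distance from the previous login, precomputed upstream
        # (e.g. great-circle distance between GeoIP locations)
        distance_km = curr.get('distance_from_prev_km', 0)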

        if time_diff_hours > 0:
            required_speed = distance_km / time_diff_hours
        else:
            required_speed = float('inf')

        # Commercial aircraft: ~900 km/h max
        if required_speed > 1000:
            alerts.append({
                'type': 'impossible_travel',
                'user': curr['user'],
                'from': prev['location'],
                'to': curr['location'],
                'time_gap_hours': round(time_diff_hours, 1),
                'distance_km': distance_km,
                'required_speed_kmh': round(required_speed),
                'severity': 'CRITICAL'
            })

    return alerts

# Example: compromised account
logins = [
    {'user': 'jsmith', 'timestamp': datetime(2026, 6, 4, 14, 0),
     'location': 'Amsterdam, NL', 'distance_from_prev_km': 0},
    {'user': 'jsmith', 'timestamp': datetime(2026, 6, 4, 14, 30),
     'location': 'Lagos, NG', 'distance_from_prev_km': 5200},
]

for alert in check_impossible_travel(logins):
    print(f"[{alert['severity']}] {alert['type']}: "
          f"{alert['user']} from {alert['from']} "
          f"then {alert['to']} within {alert['time_gap_hours']}h "
          f"({alert['required_speed_kmh']} km/h required)")
# [CRITICAL] impossible_travel: jsmith from
# Amsterdam, NL then Lagos, NG within 0.5h (10400 km/h required)
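
The detector above reads a precomputed `distance_from_prev_km` field. A minimal sketch of how that distance could be derived from GeoIP coordinates with the haversine formula. The coordinates below are approximate city centres and the function name `haversine_km` is an assumption for illustration:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate coordinates (assumed values for the example)
amsterdam = (52.37, 4.90)
lagos = (6.52, 3.38)

distance = haversine_km(*amsterdam, *lagos)
print(f"Amsterdam -> Lagos: about {distance:.0f} km")
```

GeoIP locations are often only accurate to the city or region, so the result is best treated as a rough figure and compared against a generous speed threshold.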

"Impossible travel" detection is one of the simplest and most effective indicators of a compromised account. Microsoft Sentinel, Splunk, and most enterprise SIEMs have this rule built in. Having said that, it only catches the lazy cases -- a competent attacker using a VPN exit node in the same city as the victim won't trigger it. That's where the deeper behavioral baselines come in: typing speed, mouse movement patterns, the specific sequence of applications a user typically accesses, and the time gaps between actions ;-)

How Insiders Exfiltrate Data

This is where it gets interesting from a defensive perspective, because every exfiltration channel requires a different detection strategy. There's no single tool that catches everything:

Channel 1: Email (personal accounts)
  Method: forward sensitive docs to personal Gmail/Yahoo,
    use BCC to avoid detection in sent folder, rename files
    to look innocuous ("meeting-notes.xlsx")
  Detection: network DLP scanning outbound SMTP, keyword
    matching on attachments, blocking personal email domains
    from corporate endpoints
  Bypass difficulty: LOW (just rename the file)

Channel 2: Cloud storage (personal)
  Method: upload to personal Dropbox, Google Drive, iCloud
  Detection: web proxy logs showing uploads to cloud storage,
    cloud access security broker (CASB) policies
  Bypass: often passes DLP if the cloud service uses TLS and
    the org doesn't do TLS inspection

Channel 3: USB drives
  Method: copy files to removable media, walk out
  Detection: endpoint DLP agent, USB device whitelisting,
    Group Policy disabling USB mass storage
  Bypass difficulty: depends on endpoint controls -- many
    organizations STILL haven't disabled USB ports

Channel 4: Screenshots and photos
  Method: photograph screens with personal phone
  Detection: NONE with current technology
  Bypass difficulty: zero -- this is the ultimate airgap bypass
  Mitigation: camera detection in sensitive areas (expensive,
    privacy concerns), visible watermarks on screen content

Channel 5: Printing
  Method: print documents and walk out
  Detection: print server monitoring, watermarking printed docs,
    secure print (badge required at printer to release job)
  Bypass: print to PDF first, then use channel 1-3

Channel 6: Steganography
  Method: hide data inside images, audio files, or documents.
    Upload "vacation photos" that contain embedded source code
  Detection: steganalysis tools (but rarely deployed)
  Detection difficulty: VERY HIGH -- nearly impossible to detect
    without specialized tools that most orgs don't have

Channel 7: Encrypted containers
  Method: VeraCrypt volume that looks like random data,
    upload to personal cloud storage
  Detection: file entropy analysis (encrypted data has high
    entropy, but so do compressed files -- false positive city)
  Detection difficulty: HIGH

#!/usr/bin/env python3
"""usb_monitor.py -- detect USB mass storage connections"""

import subprocess
import json

def check_usb_events():
    """Parse system logs for USB mass storage events (Linux)."""
    try:
        # journalctl for USB storage events
        result = subprocess.run(
            ['journalctl', '-k', '--since', '24 hours ago',
             '--no-pager', '-o', 'json'],
            capture_output=True, text=True, timeout=10
        )

        usb_events = []
        for line in result.stdout.strip().split('\n'):
            if not line:
                continue
            try:
                entry = json.loads(line)
                msg = entry.get('MESSAGE', '')
                if 'usb-storage' in msg.lower() or \
                   'mass storage' in msg.lower():
                    usb_events.append({
                        'timestamp': entry.get(
                            '__REALTIME_TIMESTAMP', ''
                        ),
                        'message': msg,
                        'hostname': entry.get('_HOSTNAME', '')
                    })
            except json.JSONDecodeError:
                continue

        return usb_events

    except (subprocess.TimeoutExpired, FileNotFoundError):
        # journalctl timed out or is not installed (non-systemd host)
        return []

def alert_on_usb(events, policy='block'):
    """Generate alerts for USB storage connections."""
    for event in events:
        severity = 'HIGH' if policy == 'block' else 'MEDIUM'
        print(f"[{severity}] USB storage detected: "
              f"{event['message'][:80]}")
        print(f"  Host: {event['hostname']}")
        print(f"  Time: {event['timestamp']}")
        print(f"  Policy action: {policy.upper()}")
        print()

# In a real deployment, this runs as a daemon or
# integrates with the organization's SIEM via syslog
events = check_usb_events()
if events:
    alert_on_usb(events, policy='block')
else:
    print("No USB storage events in last 24 hours")

The uncomfortable truth about Channel 4 (screenshots and photos) is that no endpoint agent, DLP rule, or network sensor can see it. A person can photograph their screen with a personal phone, walk out of the building, and nobody will ever know. This is why defense-in-depth for insider threats includes controls that seem unrelated to technology: mandatory background checks, financial wellness programs (reducing the motivation for data theft), positive workplace culture (reducing the motivation for sabotage), and exit procedures that include device inspection. You cannot stop someone from taking a photo. You CAN reduce the number of people who WANT to take a photo.
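Channel 7 in the list above names file entropy analysis as the detection method for encrypted containers. A minimal sketch of the underlying measurement, Shannon entropy over byte frequencies. The function name and the sample inputs are illustrative assumptions, and as the list notes, compressed data scores high as well, so this is a signal to feed into a score and not a verdict:

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 (one repeated byte) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(
        (n / total) * math.log2(total / n)
        for n in Counter(data).values()
    )


plain = b"quarterly revenue forecast for the sales team\n" * 2000
random_like = os.urandom(len(plain))   # stands in for an encrypted volume

for name, blob in [('plain text', plain), ('random bytes', random_like)]:
    print(f"{name:12s} {shannon_entropy(blob):.2f} bits/byte")
```

Plain text sits well below the 8 bits/byte ceiling, while random-looking data lands very close to it.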

Case Study: Edward Snowden (2013)

The most famous insider threat case in history, and still the most instructive:

Access: NSA contractor (Booz Allen Hamilton) with sysadmin
  privileges across multiple NSA systems
Method: used admin access to download classified documents
  from NSA systems onto USB drives. Walked out with them.
Volume: estimated 1.5 million classified documents
Duration: several months of data collection before discovery
Detection: NONE during exfiltration. Only discovered after
  Snowden contacted journalists from Hong Kong.

What failed:
1. No DLP on sysadmin workstations -- the people with the
   MOST access had the LEAST monitoring
2. No behavioral analytics -- downloading thousands of files
   from systems he wasn't assigned to went unnoticed
3. Sysadmins had unrestricted access with no separation of
   duties -- a single person could access everything
4. USB ports were not disabled on classified systems
5. No audit trail linking specific file access to specific users
6. No two-person integrity rule -- nuclear launch codes require
   two officers, but NSA classified files required one admin

What would have caught it:
- UEBA: anomalous download volume (thousands vs baseline ~50)
- DLP: classified markers in content leaving the network
- USB monitoring: mass copy to removable media
- Separation of duties: no single admin accesses all systems
- Two-person integrity: two authorized personnel required for
  access to the most sensitive categories

The Snowden case changed everything about insider threat programs. Post-Snowden, the NSA (and the broader intelligence community) implemented continuous evaluation, enhanced monitoring of privileged users, two-person integrity for sensitive operations, and behavioral analytics. These controls didn't exist before because the assumption was that cleared employees with sysadmin access could be trusted. That assumption was wrong exactly once, and once was enough.
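Two-person integrity from the list above can be sketched in a few lines: access to a sensitive category is only granted when two different authorized people sign off, and every attempt lands in an audit log tied to both identities. The roster, resource name, and function name below are hypothetical, not taken from any real access-control product:

```python
from datetime import datetime, timezone

AUTHORIZED = {'admin_a', 'admin_b', 'officer_c'}   # hypothetical roster
AUDIT_LOG = []


def request_access(resource, requester, co_signer, authorized=AUTHORIZED):
    """Grant access only when two different authorized people sign off."""
    granted = (
        requester != co_signer
        and requester in authorized
        and co_signer in authorized
    )
    # Every attempt is recorded, granted or not, tied to both identities
    AUDIT_LOG.append({
        'time': datetime.now(timezone.utc).isoformat(),
        'resource': resource,
        'requester': requester,
        'co_signer': co_signer,
        'granted': granted,
    })
    return granted


print(request_access('archive-tier-1', 'admin_a', 'admin_a'))     # False
print(request_access('archive-tier-1', 'admin_a', 'contractor'))  # False
print(request_access('archive-tier-1', 'admin_a', 'officer_c'))   # True
```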

Case Study: Twitter Insider Abuse (2020)

What happened: Attackers phoned Twitter employees, phished
  their credentials, and used them to reach internal admin
  tools. They took over high-profile accounts (Barack Obama,
  Joe Biden, Elon Musk, Apple) and posted cryptocurrency
  scam tweets ("send Bitcoin to this address, I'll double it")

Stolen: ~$120,000 in Bitcoin (surprisingly low for the access)

Key insight: the employees were compromised insiders, not
  malicious ones. The attack ran entirely on their legitimate
  access, obtained through phone phishing and coordinated by
  a 17-year-old.

What failed:
1. Internal tools had no additional authentication beyond
   employee login -- any employee could reset ANY account
2. No rate limiting on admin actions (multiple account
   takeovers in rapid succession)
3. No separation of duties -- a single employee could
   both reset password AND disable MFA
4. Audit logging existed but wasn't monitored in real time
5. No automated alerts on mass admin actions

What would have caught it:
- Tiered admin access: password reset requires approval from
  a second employee for verified/high-profile accounts
- Rate limiting: >3 account resets in an hour triggers lockout
- Real-time alerting: admin action on verified accounts
  automatically notifies the security operations center
- Behavioral analytics: employee suddenly resetting accounts
  they've never touched before = instant flag

The Twitter case is interesting because it demonstrates something that security professionals have known for a long time but that companies refuse to internalize: internal tools are attack surface. Every admin panel, every support tool, every internal dashboard that lets employees modify user data is a potential weapon. And most internal tools are built with zero security controls because the assumption is "only our employees use this." That assumption fails the moment an employee goes rogue or gets phished ;-)
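The rate-limiting control from the list above (more than 3 account resets in an hour triggers a lockout) amounts to a sliding-window counter per employee. A minimal sketch, with hypothetical class and method names and timestamps passed in explicitly so the behaviour is easy to follow:

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class AdminActionLimiter:
    """Lock an employee out after too many sensitive actions in a window."""

    def __init__(self, max_actions=3, window=timedelta(hours=1)):
        self.max_actions = max_actions
        self.window = window
        self.history = defaultdict(deque)   # employee -> recent timestamps
        self.locked = set()

    def record_reset(self, employee, when):
        """Return True if the reset is allowed, False if blocked."""
        if employee in self.locked:
            return False
        recent = self.history[employee]
        # Drop timestamps that have aged out of the window
        while recent and when - recent[0] >= self.window:
            recent.popleft()
        recent.append(when)
        if len(recent) > self.max_actions:
            self.locked.add(employee)        # stays locked until reviewed
            return False
        return True


limiter = AdminActionLimiter()
start = datetime(2020, 7, 15, 15, 0)
for i in range(5):
    allowed = limiter.record_reset(
        'support_agent_7', start + timedelta(minutes=2 * i)
    )
    print(i + 1, 'allowed' if allowed else 'BLOCKED')
```

The first three resets go through, the fourth trips the limit, and everything after that is refused until someone reviews and clears the lockout.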

UEBA -- User and Entity Behavior Analytics

We touched on UEBA in the risk scoring example earlier. Here's how it works at scale in production environments:

Phase 1: Learning (30-90 days)
  The UEBA system observes every user's behavior and builds
  individual baselines:
  - What files does User X normally access?
  - What hours does User X normally work?
  - What systems does User X authenticate to?
  - How much data does User X typically download per day?
  - What is User X's normal network traffic pattern?
  - Which other users does User X normally interact with?

Phase 2: Scoring (continuous)
  Every action is compared against the baseline. Deviations
  increase the user's risk score:
  - User X downloaded 2,000 files today (baseline: 25)
    -> risk score +40
  - User X logged in at 2:30 AM from a new IP address
    -> risk score +20
  - User X accessed the HR database for the first time ever
    -> risk score +15
  - User X connected a USB device (policy: prohibited)
    -> risk score +25
  - Combined risk score: 100 -> ALERT: investigate NOW

Phase 3: Correlation
  UEBA doesn't just look at individual events -- it correlates:
  - User X's manager submitted a negative performance review
    last week (from HR system feed)
  - User X updated their LinkedIn profile (from OSINT feed)
  - User X's badge access shows them coming in on weekends
    (from physical access system feed)
  - Individually: maybe nothing. Combined: textbook
    pre-exfiltration pattern.

Major UEBA platforms:
  - Microsoft Sentinel with UEBA module (Azure native)
  - Splunk User Behavior Analytics
  - Exabeam Advanced Analytics
  - Securonix UEBA
  - Varonis DatAlert (data-centric, file access focus)

#!/usr/bin/env python3
"""siem_insider_rules.py -- SIEM detection rules for insider threats"""

# These rules would be implemented in Splunk SPL, Sentinel KQL,
# or Elastic Query DSL. Python pseudocode for clarity.

DETECTION_RULES = [
    {
        'name': 'Mass file download',
        'description': 'User downloads >100 files in 1 hour',
        'data_source': 'file_access_logs',
        'logic': '''
            source=file_server action=download
            | stats count by user, span=1h
            | where count > 100
        ''',
        'false_positive_rate': 'MEDIUM',
        'notes': 'Exclude known batch processes and backup accounts',
        'response': 'Alert SOC, check user risk score, call manager'
    },
    {
        'name': 'Off-hours sensitive data access',
        'description': 'Classified data accessed outside 7AM-8PM',
        'data_source': 'endpoint_dlp + auth_logs',
        'logic': '''
            source=dlp classification=confidential
            | where hour < 7 OR hour >= 20
            | stats count by user
            | where count > 0
        ''',
        'false_positive_rate': 'LOW',
        'notes': 'Whitelist shift workers and on-call rotations',
        'response': 'Alert + ticket, review within 4 hours'
    },
    {
        'name': 'Email to personal domain with attachment',
        'description': 'Outbound email to personal domain, file >1MB',
        'data_source': 'email_gateway_logs',
        'logic': '''
            source=email_gw direction=outbound
            recipient_domain IN (gmail.com, yahoo.com, hotmail.com,
                                 protonmail.com, outlook.com)
            attachment_size > 1048576
        ''',
        'false_positive_rate': 'HIGH',
        'notes': 'Many legitimate uses -- score, do not auto-block',
        'response': 'Add to risk score, alert if combined HIGH'
    },
    {
        'name': 'Impossible travel',
        'description': 'Logins >500km apart within 1h',
        'data_source': 'auth_logs + geoip',
        'logic': '''
            source=auth action=login
            | geoip src_ip
            | streamstats current=f last(city) as prev_city,
              last(_time) as prev_time, last(lat) as prev_lat,
              last(lon) as prev_lon by user
            | eval distance = haversine(lat, lon, prev_lat, prev_lon)
            | eval time_diff = (_time - prev_time) / 3600
            | where distance > 500 AND time_diff < 1
        ''',
        'false_positive_rate': 'LOW',
        'notes': 'VPN exit node changes may cause false positives',
        'response': 'Force re-auth, alert SOC immediately'
    },
    {
        'name': 'USB on restricted endpoint',
        'description': 'USB mass storage device connected to a restricted workstation',
        'data_source': 'endpoint_logs',
        'logic': '''
            source=endpoint EventType=DeviceConnect
            DeviceClass=MassStorage
        ''',
        'false_positive_rate': 'LOW',
        'notes': 'Should be zero if USB blocked by policy',
        'response': 'Block device, alert SOC, investigate immediately'
    }
]

# Print rules as a reference card
for rule in DETECTION_RULES:
    print(f"Rule: {rule['name']}")
    print(f"  FP rate: {rule['false_positive_rate']}")
    print(f"  Response: {rule['response']}")
    print()
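
The `haversine(...)` call in the impossible-travel rule is pseudocode; most SIEM query languages do not ship it under that name. As a minimal sketch of what that rule computes, here is the same check in plain Python. The function names, the event dict layout (`lat`, `lon`, `time` in epoch seconds) and the default thresholds are assumptions for illustration, taken from the rule's 500 km / 1 hour description.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_impossible_travel(prev, curr, max_km=500, max_hours=1):
    """prev/curr: hypothetical login events with 'lat', 'lon', 'time' keys."""
    distance = haversine_km(prev['lat'], prev['lon'], curr['lat'], curr['lon'])
    hours = (curr['time'] - prev['time']) / 3600
    return distance > max_km and hours < max_hours

# Two logins 30 minutes apart, Amsterdam then New York
amsterdam = {'lat': 52.3676, 'lon': 4.9041, 'time': 0}
new_york = {'lat': 40.7128, 'lon': -74.0060, 'time': 1800}
print(is_impossible_travel(amsterdam, new_york))
```

GeoIP coordinates are approximate, so treat the distance as a rough signal rather than a measurement. That is also why the VPN exit node caveat in the rule notes matters.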

The false positive problem is the central challenge of insider threat detection. A rule that fires on "user downloads >100 files" will catch a data-stealing insider, but it will also fire on the marketing team downloading assets for a campaign, the legal team pulling documents for a lawsuit, and the data engineering team doing their normal job. If your SOC analyst investigates 50 false positives before seeing one real incident, they stop investigating alerts. This is called alert fatigue, and it's the reason that many insider incidents go undetected even when detection rules exist and fire correctly. The alert was there. Nobody looked at it.

The solution is layered scoring (as in the UEBA model) rather than binary alerting. A single anomalous event adds to a risk score. Multiple anomalous events in combination escalate the score past a threshold that demands human attention. The UEBA system says "this user has been behaving oddly across FIVE different dimensions over the past week" rather than "this user downloaded a file." The former gets investigated. The latter gets ignored.
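
To make the difference between binary alerting and scoring concrete, here is a minimal sketch that turns rule hits into a per-user risk score. The weights, the threshold and the function names are assumptions picked for illustration, not values from any particular UEBA product. The one design choice worth noting: a noisy rule (HIGH false positive rate) contributes less than a quiet one, and the same rule firing repeatedly for one user only counts once.

```python
from collections import defaultdict

# Assumed weights: the noisier the rule, the less a single hit is worth
WEIGHT_BY_FP_RATE = {'LOW': 40, 'MEDIUM': 20, 'HIGH': 10}
ESCALATION_THRESHOLD = 60  # assumed cut-off for human review

def score_users(alerts, fp_rate_by_rule):
    """alerts: iterable of (user, rule_name) pairs from the rule engine."""
    seen = set()
    scores = defaultdict(int)
    for user, rule_name in alerts:
        if (user, rule_name) in seen:
            continue  # repeated hits of one rule do not stack
        seen.add((user, rule_name))
        scores[user] += WEIGHT_BY_FP_RATE[fp_rate_by_rule[rule_name]]
    return dict(scores)

def users_to_escalate(scores, threshold=ESCALATION_THRESHOLD):
    return sorted(user for user, score in scores.items() if score >= threshold)

fp_rates = {
    'Mass file download': 'MEDIUM',
    'Off-hours sensitive data access': 'LOW',
    'Email to personal domain with attachment': 'HIGH',
}
alerts = [
    ('alice', 'Mass file download'),
    ('alice', 'Email to personal domain with attachment'),
    ('alice', 'Off-hours sensitive data access'),
    ('bob', 'Mass file download'),
    ('bob', 'Mass file download'),
]
scores = score_users(alerts, fp_rates)
print(scores, users_to_escalate(scores))
```

With these assumed weights, bob's repeated bulk downloads stay below the threshold on their own, while alice's combination of three different signals crosses it.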

DLP -- Data Loss Prevention

DLP is the technical control layer that attempts to prevent sensitive data from leaving the organization. It operates at three deployment points:

1. Endpoint DLP
   Agent installed on every workstation/laptop.
   Monitors: clipboard operations, USB access, print jobs,
     screen captures, file access patterns, application usage
   Can block: copying PII to USB, printing classified docs,
     pasting source code into unauthorized applications
   Tools: Microsoft Purview, Symantec DLP, Digital Guardian,
     Code42 Incydr
   Limitation: requires agent on every device. BYOD and
     unmanaged devices are blind spots.

2. Network DLP
   Inspects network traffic for sensitive data patterns.
   Monitors: email (SMTP), web uploads (HTTP/HTTPS), FTP,
     cloud storage traffic, DNS tunneling
   Can block: emailing SSNs, uploading source code to Pastebin
   Limitation: CANNOT inspect encrypted traffic without TLS
     interception (MITM proxy), which breaks some applications
     and raises privacy concerns in some jurisdictions

3. Cloud DLP
   API-based integration with SaaS platforms.
   Monitors: Microsoft 365, Google Workspace, Salesforce,
     Slack, Teams, Box, Dropbox
   Can block: sharing files externally, posting PII in Slack
     channels, creating public links to sensitive documents
   Advantage: sees content INSIDE the cloud service (after
     TLS termination), so encryption is not a barrier
   Limitation: only works with supported SaaS platforms

DLP detection methods:
  - Regex pattern matching (SSNs, credit cards, IBAN numbers)
  - Keyword matching ("confidential", "trade secret", "NDA")
  - Document fingerprinting (hash of sensitive documents --
    detect copies even if renamed)
  - ML classification (trained to distinguish
    sensitive from normal content)
  - Exact data matching (hash every record in a database,
    detect any record appearing outside the database)
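
Two of these detection methods are simple enough to sketch in a few lines of Python: regex pattern matching (with a Luhn checksum to cut down on false card matches) and exact-hash document fingerprinting. The patterns and function names below are illustrative assumptions, not how any specific DLP product implements them. Note that a plain SHA-256 fingerprint only catches byte-identical copies (renamed files, for example); commercial tools typically use partial or fuzzy fingerprinting to also catch edited copies.

```python
import hashlib
import re

SSN_RE = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')        # US SSN layout only
CARD_RE = re.compile(r'\b\d(?:[ -]?\d){12,18}\b')    # 13-19 digits, optional separators

def luhn_valid(number):
    """Luhn checksum over the digits in `number` (separators ignored)."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:       # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0

def scan_text(text):
    """Return (kind, matched_text) findings for SSN-like and card-like strings."""
    findings = [('ssn', m.group()) for m in SSN_RE.finditer(text)]
    findings += [('card', m.group()) for m in CARD_RE.finditer(text)
                 if luhn_valid(m.group())]
    return findings

def fingerprint(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def is_known_sensitive(content: bytes, known_fingerprints) -> bool:
    return fingerprint(content) in known_fingerprints

print(scan_text("SSN 123-45-6789, card 4111 1111 1111 1111."))
```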

Defense: Building an Insider Threat Program

Technical controls are necessary but insufficient. A complete insider threat program combines technical, process, and cultural controls:

Technical controls:
  - Least privilege: users access ONLY what their job requires.
    Review quarterly. Revoke automatically when role changes.
  - Separation of duties: no single person controls a complete
    critical process. The person who approves payments cannot
    also create vendors. The sysadmin who manages backups cannot
    also delete audit logs.
  - DLP at endpoint, network, and cloud layers
  - UEBA for behavioral anomaly detection
  - USB port control via Group Policy or endpoint management
  - Watermark sensitive documents (traceable if leaked)
  - Comprehensive audit logging on all sensitive data access
  - Privileged access management (PAM) -- admin credentials
    checked out from a vault, sessions recorded, time-limited

Process controls:
  - Background checks for positions with sensitive access
  - Exit procedures: immediate access revocation on last day,
    device return, exit interview, review of recent file access
  - Regular access reviews: "does this user still need access
    to this system?" (quarterly minimum)
  - Mandatory vacations: forced time off reveals single-point-
    of-failure roles and ongoing fraud. If Bob is the only one
    who can do X, and Bob never takes vacation, ask why.
  - Transfer procedures: access from old role revoked before
    new role access granted (prevent privilege accumulation)

Cultural controls:
  - Anonymous reporting mechanism (ethics hotline, anonymous
    web form, direct to security team without going through
    management chain)
  - Non-punitive reporting culture: reward reporting, do NOT
    punish the messenger. If people fear retaliation for
    reporting concerns, they will not report.
  - Regular check-ins between managers and employees --
    especially after performance reviews, reorgs, layoffs
  - Address grievances before they become motives. The
    employee who is heard does not become the employee
    who steals data.
  - Financial wellness programs -- reduce the financial
    pressure that makes people susceptible to bribery or
    data theft for money
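
Separation of duties is one of the few controls on this list that you can check mechanically during an access review. A minimal sketch, assuming you can export each user's entitlements as a list of strings; the entitlement names and the conflict pairs below are hypothetical and simply mirror the two examples given above:

```python
# Hypothetical pairs of entitlements that one person should not hold together
CONFLICTING_ENTITLEMENTS = [
    ('approve_payments', 'create_vendors'),
    ('manage_backups', 'delete_audit_logs'),
]

def find_sod_violations(user_entitlements, conflicts=CONFLICTING_ENTITLEMENTS):
    """user_entitlements: dict mapping user -> list of entitlement names."""
    violations = []
    for user, entitlements in sorted(user_entitlements.items()):
        held = set(entitlements)
        for first, second in conflicts:
            if first in held and second in held:
                violations.append((user, first, second))
    return violations

access_export = {
    'bob': ['approve_payments', 'create_vendors', 'read_reports'],
    'carol': ['manage_backups'],
}
print(find_sod_violations(access_export))
```

Running a check like this after every role transfer also catches the privilege accumulation problem: old-role access that was never revoked shows up as a conflict with the new role.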

The AI Slop Connection

AI is creating a new and genuinely novel category of insider threat: the AI-assisted negligent insider. Employees paste proprietary source code into ChatGPT. They upload confidential documents to AI summarization tools. They share customer data with AI assistants for analysis. Each interaction sends company data to a third-party provider whose data retention policies may not align with the company's security requirements.

Samsung banned ChatGPT internally after engineers pasted proprietary semiconductor source code into it -- three times in a single month. But banning the tool does not eliminate the behavior. It pushes usage underground. Employees find alternative AI services, use personal devices, or access AI tools through personal accounts. The productivity gain from AI is real (studies suggest 30-50% improvement for some coding tasks), which means the incentive to use it despite the ban is enormous.

The defense: approved AI tools with data retention agreements and enterprise privacy controls, DLP rules that detect paste operations to known AI service domains, network policies that block unauthorized AI endpoints, clear data classification policies that specify which categories can and cannot be shared with AI services, and -- perhaps most importantly -- providing a sanctioned AI tool that is BETTER than the unauthorized alternatives. If the company's approved AI tool is slower, less capable, or harder to use than ChatGPT, employees will use ChatGPT. Remove the temptation by providing a superior alternative.
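
The network-policy part of that defense boils down to a hostname decision: allow the sanctioned tool, block known unsanctioned AI endpoints, let everything else through. A minimal sketch of that decision follows. Every hostname in it is a placeholder (reserved `.example` names), and in a real deployment the lists would come from your proxy or secure web gateway vendor's category feed rather than a hand-maintained set.

```python
# Placeholder hostnames -- assumptions for illustration only
SANCTIONED_AI_HOSTS = {'ai.corp.example'}
UNSANCTIONED_AI_DOMAINS = ('ai-chat.example', 'summarizer.example')

def classify_request(host):
    """Return 'allow', 'block', or 'pass' for an outbound hostname."""
    host = host.lower().rstrip('.')
    if host in SANCTIONED_AI_HOSTS:
        return 'allow'
    for domain in UNSANCTIONED_AI_DOMAINS:
        # Match the domain itself or any subdomain, not mere suffix lookalikes
        if host == domain or host.endswith('.' + domain):
            return 'block'
    return 'pass'

for h in ['ai.corp.example', 'api.ai-chat.example', 'www.example.org']:
    print(h, classify_request(h))
```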

What Comes Next

Insider threats represent a fundamentally different security challenge than external attacks because the attacker is already past every perimeter defense. Detection shifts from "find the intruder" to "find the anomaly" -- identifying when a legitimate user starts behaving in ways that deviate from their established baseline. The combination of UEBA, DLP, and process controls (separation of duties, least privilege, mandatory vacations) creates a defense-in-depth model specifically designed for the insider scenario.

The next phase of this arc moves into territory where the line between real and fake gets blurry. Emerging technologies are enabling entirely new categories of deception -- convincing video and audio that can impersonate real people, fabricated evidence that is indistinguishable from genuine content, and social engineering at a scale that was previously impossible. The tools we covered in earlier episodes (social engineering in episode 8, OSINT in episode 47) are about to get a very significant force multiplier. And after that, we shift perspective entirely from the attacker's view to the defender's -- simulating real attacks end-to-end, responding when attacks succeed, and building intelligence programs that give defenders the advantage.

Exercises

Exercise 1: Design an insider threat detection rule set for a SIEM. Create 5 detection rules that would identify potential insider threats. For each rule, specify: (a) the data source (endpoint logs, network traffic, authentication logs, file access logs), (b) the detection logic (what pattern triggers the alert), (c) the expected false positive rate (high/medium/low) and why, (d) the recommended response action. At least one rule should target malicious insiders, one should target negligent insiders, and one should target compromised accounts. Save to ~/lab-notes/insider-detection-rules.md.

Exercise 2: Research the Tesla insider sabotage case (2018) and the Twitter internal-tool abuse (2020, where attackers social-engineered Twitter employees to gain access to internal admin tools and then hijacked high-profile accounts for a Bitcoin scam). For each case, document: (a) the insider's motivation and category (malicious/negligent/compromised), (b) what access was abused and how it was obtained, (c) how the incident was eventually detected, (d) what specific technical and process controls would have prevented the incident. Compare the two cases: what do they have in common, and what makes them different? Save to ~/lab-notes/insider-threat-cases.md.

Exercise 3: Evaluate DLP bypass techniques. For each of the 7 exfiltration channels listed in this episode (email, cloud storage, USB, screenshots/photos, printing, steganography, encrypted containers), rate on a scale of 1-5: (a) how detectable it is with current DLP technology, (b) how much data can be exfiltrated per incident (1 = a few pages, 5 = entire databases), (c) what the single most effective countermeasure is. Identify which channel is the hardest to defend against and explain why. Then propose a layered defense strategy that addresses all 7 channels simultaneously. Save to ~/lab-notes/dlp-bypass-analysis.md.


Thanks, and until next time!

@scipio


