How IP Threat Detection Really Works


IP threat detection starts with data, and most providers won't tell you where theirs comes from. They'll show you an API response with fields like is_vpn and threat_score, but the methodology behind those fields stays hidden. At IPGeolocation.io, we take a different approach. We classify over 15.5 million IP addresses using four detection layers, and this article explains how each one works.

TL;DR

IPGeolocation.io uses four detection layers: VPN/proxy endpoint enumeration, real-time connection behavior analysis, self-hosted honeypots, and proprietary blacklists.
Our scrapers monitor 160+ VPN providers around the clock, collecting exit IPs as they rotate.
Live behavioral analysis combines client-side checks and server-side network observations to detect timezone mismatches, DNS leaks, MTU anomalies, OS discrepancies, and other signals from additional advanced tests.
Our honeypots have recorded over 100,000 attacker IPs from real brute-force attempts and scans against our decoy systems.
Blacklists consolidate all signals into daily-updated threat classifications covering VPNs, proxies, Tor, bots, spam, and known attackers.

Every IP security API claims to detect threats. The difference is in how the underlying data is collected, verified, and scored. We built this system because detection accuracy depends on data freshness and source diversity, not just database size.

Why Detection Methodology Matters

Every IP security provider claims high accuracy. Few explain how they arrive at it. The problem is that VPN and proxy detection is not a static lookup problem. VPN providers rotate their exit nodes. Residential proxy networks recruit new devices constantly. Attackers cycle through IP ranges faster than monthly database dumps can track.

According to Security.org's 2026 VPN usage research, over 40% of internet users in the U.S. now use VPNs, with adoption rates above 50% in countries like Indonesia. That's a massive surface area of IPs that rotate between legitimate and anonymized traffic daily. A detection API that relies on stale data or a single detection method will miss a significant portion of these connections.

If you're evaluating a VPN detection API or any IP threat intelligence provider, methodology transparency is one of the strongest practical signals of data quality, because it shows how freshness, coverage, and false-positive handling are managed. We wrote a detailed framework for this in our guide on how to evaluate a VPN and proxy detection API.

Layer 1: VPN Detection Through Endpoint Enumeration

The foundation of our detection data is active endpoint discovery. Our automated scrapers run continuously, monitoring over 160 commercial VPN providers and proxy services. When a provider adds new exit nodes, rotates IP ranges, or launches new server locations, our system picks it up.

This isn't a one-time scan. VPN providers change their infrastructure regularly. A detection database that updates monthly will have blind spots within weeks. Our scrapers connect to provider APIs and service endpoints around the clock, collecting fresh IP data as it becomes available.

1. Tracking Commercial VPN Providers

For major VPN services (NordVPN, ExpressVPN, Surfshark, and dozens of smaller providers), we monitor their publicly accessible server lists and connection endpoints. Each provider's infrastructure has a fingerprint, such as the data centers it operates from and the protocols it supports.

When a VPN provider spins up new servers in a region, those IPs enter our classification system within the same update cycle. The 160+ providers we track cover the vast majority of commercial VPN traffic worldwide.
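The core of one enumeration cycle can be sketched as a diff between consecutive server-list snapshots. This is a minimal illustration, assuming each scrape returns the full set of exit IPs a provider currently lists; the ExitNodeIndex class and its method are hypothetical names, not part of any IPGeolocation.io API. IPs that disappear from a list are reported as dropped but keep their old last-seen date, so they age out instead of vanishing instantly.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ExitNodeIndex:
    """Hypothetical in-memory index of VPN exit IPs built from scraped server lists."""

    # ip -> (provider name, last date the IP appeared in that provider's list)
    last_seen: dict[str, tuple[str, date]] = field(default_factory=dict)

    def ingest_snapshot(self, provider: str, ips: set[str], seen_on: date) -> dict[str, set[str]]:
        """Merge one scraped snapshot and report which exit IPs were added or dropped."""
        previous = {ip for ip, (owner, _) in self.last_seen.items() if owner == provider}
        added = ips - previous
        dropped = previous - ips
        for ip in ips:
            # New and re-confirmed IPs both get a fresh last-seen date.
            self.last_seen[ip] = (provider, seen_on)
        # Dropped IPs are left in place with their old date so they can expire later.
        return {"added": added, "dropped": dropped}
```

For example, ingesting {"203.0.113.10", "203.0.113.11"} on one day and {"203.0.113.11", "203.0.113.12"} the next reports 203.0.113.12 as added and 203.0.113.10 as dropped.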

2. The Hard Problem: Residential and Mobile Proxies

Datacenter VPNs are relatively simple to identify. The IPs belong to hosting providers, and the ASN data gives them away. Residential proxies are a different challenge entirely.

Residential proxy networks route traffic through real household IP addresses, assigned by legitimate ISPs. Proxyway's 2025 proxy market research found that residential proxies have become up to 70% cheaper over the past two years, driving rapid adoption. These IPs look identical to regular consumer traffic on the surface.

Our approach combines multiple signals. Endpoint enumeration catches known residential proxy provider infrastructure, while behavioral analysis (Layer 2) flags the traffic patterns that distinguish residential proxy connections from genuine users. Neither layer alone is sufficient. Together, they catch what single-method detection misses.

IP risk detection workflow using enumeration, behavioral analysis, honeypots, blacklist lookup, and threat classification.

Layer 2: Real-Time Connection Behavior Analysis

Static databases tell you what an IP was doing previously. Real-time behavioral analysis helps identify what a connection is doing at the moment of access.

The same behavioral analysis system powers our separate Real-Time Proxy and VPN Detection solution. The IP Security API does not require customers to install this script for normal IP lookups, but signals from this live-detection system can enrich the broader IP Security database and improve classifications over time.

When visitors reach websites operated by our partners where our detection script is deployed, the system runs a series of connection-level tests in real time. These tests analyze the connection itself, not user behavior. No mouse tracking, no keystroke logging. The focus is on technical signals that reveal whether a connection is being routed through a VPN, proxy, or other anonymization layer. When validated, these signals help enrich the IP Security database used by the IP Security API.

1. See the Detection Methodology in Action

Watch our live VPN and proxy detection demo to see how real-time connection signals are tested against Browsec VPN, Surfshark VPN, Proton VPN, IPRoyal residential proxy, and Cloudflare WARP.


2. What the Behavioral Tests Check

Timezone-location mismatch. If an IP geolocates to New York but the browser reports a Tokyo timezone, something is off. VPN and proxy connections frequently create these mismatches because the user's real location differs from the IP's apparent location.

MTU size analysis. Maximum Transmission Unit values vary depending on how traffic is routed. VPN tunnels typically reduce the MTU because encapsulation headers consume part of each packet. A connection that claims to be direct but shows VPN-characteristic MTU values is a red flag.

DNS leak detection. Properly configured VPNs route DNS queries through their own servers. When DNS requests leak to a different resolver than expected for the apparent IP location, it indicates the connection path isn't what it appears to be.

IP leak identification. WebRTC and similar protocols can expose a user's real IP address even when they're connected through a VPN. Detecting a mismatch between the visible IP and a leaked real IP is one of the strongest individual signals.

OS fingerprinting. The operating system reported by the browser's user agent string should match the OS characteristics visible at the network level. Discrepancies suggest the connection is being manipulated or proxied.

TCP/IP fingerprinting. TCP/IP fingerprinting uses server-side network observations and connection metadata such as TTL-like behavior, MSS-related patterns, timing, and transport characteristics. These are combined with client-side signals rather than read directly from browser JavaScript.

HTTP/proxy headers. HTTP headers can expose forwarding behavior, proxy chains, or altered client details. Headers such as Proxy-Authenticate, Via, and Proxy-Connection may reveal that traffic is passing through an intermediary rather than following a direct connection path from the user.
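Several of these checks reduce to small comparisons once the raw observations are collected. The sketch below is illustrative only and assumes the caller already has the browser's UTC offset, the TCP MSS observed server-side in the SYN, the WebRTC candidate IPs gathered client-side, and the request headers. All function names, the MTU threshold, and the header list are hypothetical. DNS leak and OS fingerprint checks are omitted because they need extra infrastructure, such as a controlled authoritative DNS server.

```python
import ipaddress

# Hypothetical cut-off: inferred MTUs below this are treated as tunnel-like.
# WireGuard interfaces are commonly set to 1420; PPPoE links use 1492 and pass.
TUNNEL_MTU_THRESHOLD = 1440

# Illustrative list of request headers that can reveal an intermediary.
PROXY_HEADERS = ("via", "forwarded", "x-forwarded-for", "proxy-connection")


def timezone_mismatch(browser_offset_min: int, geo_offset_min: int) -> bool:
    """Compare UTC offsets expressed in minutes east of UTC.

    JavaScript's Date.prototype.getTimezoneOffset() returns UTC minus local
    time, so its sign must be flipped before calling this function.
    """
    return browser_offset_min != geo_offset_min


def inferred_mtu(mss: int, ip_version: int = 4) -> int:
    """Infer the path MTU from the SYN's MSS by adding IP + TCP header bytes."""
    header_bytes = 40 if ip_version == 4 else 60
    return mss + header_bytes


def mtu_suggests_tunnel(mss: int, ip_version: int = 4) -> bool:
    return inferred_mtu(mss, ip_version) < TUNNEL_MTU_THRESHOLD


def webrtc_leak(visible_ip: str, candidate_ips: list[str]) -> bool:
    """True if WebRTC exposed a different public IP of the same family."""
    visible = ipaddress.ip_address(visible_ip)
    for raw in candidate_ips:
        try:
            candidate = ipaddress.ip_address(raw)
        except ValueError:
            continue  # e.g. mDNS-obfuscated "<uuid>.local" host candidates
        # Comparing only within one IP family avoids flagging dual-stack users.
        if (candidate.version == visible.version
                and candidate.is_global
                and candidate != visible):
            return True
    return False


def proxy_header_hits(headers: dict[str, str]) -> list[str]:
    """Return the intermediary-revealing headers present (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in PROXY_HEADERS if name in present]
```

For the New York and Tokyo example above, a browser reporting +540 minutes (Tokyo) against a geolocated offset of -240 minutes (New York during daylight saving time) makes timezone_mismatch return True.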

Behavioral signal map showing live connection tests used to detect VPN behavior, proxy behavior, anonymization layers, and risk confidence.

These tests are supplemented by advanced live tests and proprietary detection layers that are not publicly disclosed, allowing us to identify VPN and proxy behavior with greater reliability across different network conditions. The combination of multiple behavioral signals produces a confidence score that feeds into the overall threat classification.
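A simple way to turn several boolean signals into a 0 to 100 confidence score is a capped weighted sum. The signal names and weights below are hypothetical. The real scoring model, including the undisclosed advanced tests, is not published. The WebRTC leak carries the largest weight here only because it is described above as one of the strongest individual signals.

```python
# Hypothetical weights; the production model and its weights are not public.
SIGNAL_WEIGHTS = {
    "webrtc_ip_leak": 40,
    "dns_leak": 25,
    "proxy_headers": 20,
    "timezone_mismatch": 15,
    "tunnel_mtu": 15,
    "os_mismatch": 10,
}


def behavioral_confidence(signals: dict[str, bool]) -> int:
    """Sum the weights of the signals that fired, capped at 100."""
    score = sum(weight for name, weight in SIGNAL_WEIGHTS.items() if signals.get(name))
    return min(score, 100)
```

A capped sum is easy to reason about but treats signals as additive. A production system might instead learn weights from labeled traffic.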

3. Why Behavioral Analysis Matters for Accuracy

Behavioral analysis closes gaps that static databases can't cover. A brand-new VPN server that launched an hour ago won't be in any enumeration database yet. But when traffic from that server hits a site with our behavioral detection, the timezone mismatch, MTU anomaly, and DNS leak tests can still flag it.

This is also the primary mechanism for catching residential proxies in real time. The IP itself looks clean (it's a real ISP-assigned address), but the connection behavior reveals the routing.

Layer 3: Self-Hosted Honeypots

Endpoint enumeration catches known services. Honeypots catch unknown attackers.

We operate multiple dedicated honeypot servers that mimic vulnerable services, attracting brute-force login attempts, port scans, and automated exploit probes. Every IP that interacts with the honeypots gets logged, analyzed, and classified.

So far, these servers have captured over 100,000 unique attacker IPs. These IPs were actively engaged in malicious behavior, including brute-force SSH attempts, credential stuffing, vulnerability scanning, and other automated attacks. The data feeds directly into our blacklist classification system.

Honeypots are particularly good at catching threats that endpoint enumeration misses. An attacker using a compromised residential IP address won't appear in any VPN provider's database. But when that IP attempts to access our honeypot using a brute-force pattern, it gets flagged as a known attacker.

Honeypot coverage depends on where the servers are deployed and what types of attack traffic they attract. Our honeypot network gives us direct visibility into active brute-force attempts, scans, and automated probes, but it is one signal among several. That is why this layer works alongside endpoint enumeration, behavioral analysis, and blacklist classification rather than standing alone.

1. Redacted Honeypot Log Samples

Below is a redacted sample from our self-hosted SSH honeypot logs, showing scanner activity, credential attempts, brute-force behavior, and post-login reconnaissance patterns captured during one observation window.

{
  "source": "self-hosted SSH honeypot",
  "protocol": "ssh",
  "observation_window": "2026-05-08 UTC",
  "redaction_note": "Source IPs are partially masked. Exact source IPs, destination IPs, sensor names, session IDs, UUIDs, usernames, passwords, SSH fingerprints, and internal file paths have been redacted.",
  "samples": [
    {
      "category": "scanner",
      "events": [
        {
          "eventid": "honeypot.session.connect",
          "src_ip": "139.199.x.x",
          "dst_ip": "redacted",
          "dst_port": 2222,
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:04:45Z"
        },
        {
          "eventid": "honeypot.client.version",
          "src_ip": "139.199.x.x",
          "client_version": "SSH-2.0-libssh-0.2",
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:04:45Z"
        },
        {
          "eventid": "honeypot.session.closed",
          "src_ip": "139.199.x.x",
          "duration_seconds": 0.4,
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:04:45Z"
        }
      ],
      "interpretation": "Short-lived SSH probe that connected, revealed a client version, and disconnected without a meaningful authentication sequence."
    },
    {
      "category": "credential_stuffing",
      "events": [
        {
          "eventid": "honeypot.session.connect",
          "src_ip": "45.148.x.x",
          "dst_ip": "redacted",
          "dst_port": 2222,
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:00:21Z"
        },
        {
          "eventid": "honeypot.client.version",
          "src_ip": "45.148.x.x",
          "client_version": "SSH-2.0-Go",
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:00:21Z"
        },
        {
          "eventid": "honeypot.login.failed",
          "src_ip": "45.148.x.x",
          "username": "redacted",
          "password": "redacted",
          "attempt_count": 1,
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:00:21Z"
        },
        {
          "eventid": "honeypot.session.closed",
          "src_ip": "45.148.x.x",
          "duration_seconds": 1.1,
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:00:22Z"
        }
      ],
      "interpretation": "Automated login attempt using a common username/password pattern, followed by quick session closure."
    },
    {
      "category": "brute_force",
      "events": [
        {
          "eventid": "honeypot.session.connect",
          "src_ip": "2.57.x.x",
          "dst_ip": "redacted",
          "dst_port": 2222,
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:08:03Z"
        },
        {
          "eventid": "honeypot.client.version",
          "src_ip": "2.57.x.x",
          "client_version": "SSH-2.0-libssh2_1.9.0",
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:08:03Z"
        },
        {
          "eventid": "honeypot.login.failed",
          "src_ip": "2.57.x.x",
          "username": "redacted",
          "password": "redacted",
          "attempt_count": 5,
          "protocol": "ssh",
          "timestamp_range": "2026-05-08T00:08:03Z to 2026-05-08T00:08:07Z"
        },
        {
          "eventid": "honeypot.session.closed",
          "src_ip": "2.57.x.x",
          "duration_seconds": 5.4,
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:08:08Z"
        }
      ],
      "interpretation": "Repeated failed login attempts from the same source during a short SSH session, consistent with brute-force behavior."
    },
    {
      "category": "post_login_behavior",
      "events": [
        {
          "eventid": "honeypot.session.connect",
          "src_ip": "124.117.x.x",
          "dst_ip": "redacted",
          "dst_port": 2222,
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:05:02Z"
        },
        {
          "eventid": "honeypot.client.version",
          "src_ip": "124.117.x.x",
          "client_version": "SSH-2.0-Go",
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:05:02Z"
        },
        {
          "eventid": "honeypot.login.success",
          "src_ip": "124.117.x.x",
          "username": "redacted",
          "password": "redacted",
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:05:03Z"
        },
        {
          "eventid": "honeypot.command.input",
          "src_ip": "124.117.x.x",
          "command": "uname -s -m",
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:05:03Z"
        },
        {
          "eventid": "honeypot.session.closed",
          "src_ip": "124.117.x.x",
          "duration_seconds": 1.7,
          "protocol": "ssh",
          "timestamp": "2026-05-08T00:05:03Z"
        }
      ],
      "interpretation": "The attacker entered the honeypot and immediately ran a basic system reconnaissance command."
    }
  ]
}
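The category labels in this sample can be reproduced with a few rules over each session's events. This is a minimal sketch that assumes events are already grouped per session, as in the events arrays above, and uses the event IDs shown in the redacted sample. The threshold of 3 failed attempts for brute force is a hypothetical choice.

```python
# Hypothetical threshold: this many failed logins in one session counts as brute force.
BRUTE_FORCE_THRESHOLD = 3


def classify_session(events: list[dict]) -> str:
    """Assign one of the sample's categories to a single SSH session."""
    event_ids = {event["eventid"] for event in events}
    failed_attempts = sum(
        event.get("attempt_count", 1)
        for event in events
        if event["eventid"] == "honeypot.login.failed"
    )
    # An authenticated session that issued commands is the most severe case.
    if {"honeypot.login.success", "honeypot.command.input"} <= event_ids:
        return "post_login_behavior"
    if failed_attempts >= BRUTE_FORCE_THRESHOLD:
        return "brute_force"
    if failed_attempts > 0:
        return "credential_stuffing"
    # Connected (and perhaps sent a client version) but never tried to log in.
    return "scanner"
```

Applied to each of the four samples above, classify_session(sample["events"]) returns that sample's category value.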

Layer 4: Threat Classification, Blocklists, and Reputation Scoring

The blacklist is where everything converges. Data from endpoint enumeration, honeypot captures, and behavioral analysis feeds into a unified classification system that labels every IP with specific threat types and confidence scores.

1. How Classification Works

Each IP in our database receives one or more threat labels: VPN, proxy, residential proxy, Tor exit node, bot, spam source, known attacker, relay, or cloud provider. The classification isn't binary. Each VPN and proxy label comes with a confidence score and a "last seen" timestamp.

An IP classified as a VPN with 95% confidence and last seen 1 day ago is a stronger signal than the same classification last seen 30 days ago. The last-seen dates let API consumers make their own risk decisions based on how recently the IP address was observed.
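On the consumer side, one way to weigh confidence against recency is exponential decay. This sketch is a hypothetical client-side policy, not how IPGeolocation.io computes its scores. The 14-day half-life is an arbitrary assumption: after each half-life, a classification counts half as much.

```python
from datetime import date

HALF_LIFE_DAYS = 14  # hypothetical decay rate chosen by the API consumer


def effective_confidence(confidence: float, last_seen: date, today: date,
                         half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Discount a confidence score by how long ago the label was last confirmed."""
    age_days = max((today - last_seen).days, 0)
    return confidence * 0.5 ** (age_days / half_life_days)
```

With these assumptions, a 95% VPN label seen yesterday keeps about 90 points, while the same label seen 30 days ago drops to about 21.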

2. Beyond VPNs and Proxies: Other Threat Categories

VPN and proxy detection get the most attention, but the classification system covers a wider range of IP threat types. Each category has its own data sources and detection logic.

Tor exit nodes. The Tor network publishes its own exit node list through a DNS-based service (DNSEL), which we poll continuously. Published lists are strongest for currently active, publicly listed exit nodes, so we refresh this data frequently and combine it with reputation checks before exposing the is_tor classification.
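A single DNSEL lookup can be sketched with the standard library alone. This assumes the TorDNSEL convention in which the IPv4 octets are reversed and queried under dnsel.torproject.org, with an answer of 127.0.0.2 meaning the address is a listed exit. Treat the zone and answer value as assumptions to confirm against the Tor Project's documentation. The function names are hypothetical.

```python
import ipaddress
import socket

# Assumed TorDNSEL zone and "listed exit" answer (see lead-in).
DNSEL_ZONE = "dnsel.torproject.org"
LISTED_EXIT_ANSWER = "127.0.0.2"


def dnsel_query_name(ip: str) -> str:
    """Build the DNSEL query name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # raises on non-IPv4 input
    return ".".join(reversed(octets)) + "." + DNSEL_ZONE


def is_listed_tor_exit(ip: str) -> bool:
    """Resolve the DNSEL name; an NXDOMAIN answer means 'not listed'.

    This sketch also treats resolver failures as 'not listed'; a production
    system would distinguish the two and retry on errors.
    """
    try:
        answer = socket.gethostbyname(dnsel_query_name(ip))
    except OSError:
        return False
    return answer == LISTED_EXIT_ANSWER
```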

Known attackers. This label comes primarily from Layer 3 (honeypots). When an IP runs brute-force SSH attempts, credential stuffing, or automated vulnerability probes against our honeypot infrastructure, it earns the is_known_attacker classification. These IPs are often compromised machines or rented servers being used as attack platforms. The 100,000+ attacker IPs we've captured so far are high-confidence entries because the evidence is direct and based on attack behavior we observed firsthand.

Bots. Bot classification draws from multiple signals. At the IP level, we look at ASN reputation (datacenter ASNs with high bot-traffic ratios), request-pattern analysis, and cross-referencing against known bot infrastructure. Not all bots are malicious. Search engine crawlers, uptime monitors, and API integrations are legitimate automated traffic. The is_bot flag focuses on non-legitimate automated activity, including scraping, credential testing, inventory hoarding, and similar patterns where the traffic source is automated and the intent is adversarial or policy-violating.
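Request-pattern analysis often starts with something as simple as a per-IP sliding-window rate. The sketch below flags IPs that exceed a request budget. The window and limit are hypothetical, and so is the class name. Because rate alone cannot tell a search engine crawler from a scraper, verified crawlers and monitors would need to be allowlisted before this check runs. That allowlisting step is not shown.

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Flags IPs whose request count in a sliding window exceeds a limit."""

    def __init__(self, window_seconds: float = 60.0, max_requests: int = 120):
        self.window = window_seconds
        self.max_requests = max_requests
        self._hits: dict[str, deque] = defaultdict(deque)

    def record(self, ip: str, timestamp: float) -> bool:
        """Record one request and return True if the IP is over the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop requests that have fallen out of the sliding window.
        while hits and hits[0] <= timestamp - self.window:
            hits.popleft()
        return len(hits) > self.max_requests
```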

Spam sources. Spam classification relies on our honeypot captures (IPs that attempt SMTP abuse or send unsolicited traffic to trap addresses), combined with curated threat intelligence feeds that track IP ranges associated with bulk spam distribution. An IP flagged as is_spam has been observed participating in spam campaigns or operating from infrastructure with a documented spam history.

Relay services. Relays are a newer category distinct from traditional VPNs. Services like Apple's iCloud Private Relay and Cloudflare WARP route user traffic through intermediary servers for privacy, but they aren't designed to hide identity for malicious purposes. Apple publishes its Private Relay IP ranges, and relay traffic from Cloudflare's network carries identifiable characteristics. We track these published ranges and map them to the is_relay flag with the associated relay_provider_name, so API consumers can distinguish privacy-oriented relay traffic from evasive VPN or proxy use.
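Mapping published relay ranges to the is_relay and relay_provider_name fields can be sketched as follows. This assumes the range list is a CSV whose first column is an IP prefix, with rows shaped like prefix,country,region,city, which is the general layout of Apple's published egress list. Confirm the format before relying on it. The function names are hypothetical, and the example uses documentation address ranges rather than real relay prefixes.

```python
import csv
import io
import ipaddress


def load_relay_ranges(csv_text: str) -> list:
    """Parse 'prefix,country,region,city' rows into network objects (assumed layout)."""
    networks = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        networks.append(ipaddress.ip_network(row[0].strip(), strict=False))
    return networks


def relay_fields(ip: str, networks: list, provider_name: str) -> dict:
    """Produce is_relay / relay_provider_name values for one address."""
    addr = ipaddress.ip_address(ip)
    # Membership tests across IPv4/IPv6 families simply return False.
    hit = any(addr in net for net in networks)
    return {"is_relay": hit, "relay_provider_name": provider_name if hit else ""}
```

A linear scan is fine for a sketch. With hundreds of thousands of prefixes, a prefix trie or sorted interval search would be the usual choice.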

Cloud and hosting providers. An IP address assigned to AWS, Google Cloud, Azure, DigitalOcean, or any of thousands of smaller hosting providers is not inherently malicious, but it's rarely a real end user. Legitimate consumer traffic rarely originates directly from datacenter IP ranges. We identify hosting IPs through ASN mapping against regional internet registry (RIR) allocation data, published IP ranges from major cloud providers, and reverse DNS patterns that reveal hosting infrastructure. The is_cloud_provider flag with its accompanying cloud_provider_name lets API consumers set their own policy, whether they want to block all datacenter traffic or simply weigh it into a risk score.
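Of the three signals above, the reverse DNS check is the easiest to illustrate. The sketch matches PTR hostnames against a couple of well-known cloud naming schemes, such as AWS's "ec2-….compute.amazonaws.com" and Google Cloud's "….bc.googleusercontent.com". The pattern table and provider labels are illustrative assumptions, not the production list, and PTR records are often missing, which is why this only complements ASN and published-range data.

```python
import re
import socket

# Illustrative PTR patterns only; a production list would be far larger.
HOSTING_PTR_PATTERNS = {
    "Amazon Web Services": re.compile(r"\.compute(-\d+)?\.amazonaws\.com$"),
    "Google Cloud": re.compile(r"\.bc\.googleusercontent\.com$"),
}


def cloud_provider_from_ptr(hostname: str) -> str:
    """Return a provider label if the PTR hostname matches a known pattern."""
    host = hostname.lower().rstrip(".")
    for provider, pattern in HOSTING_PTR_PATTERNS.items():
        if pattern.search(host):
            return provider
    return ""


def reverse_dns(ip: str) -> str:
    """Look up the PTR hostname for an IP, or return '' if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ""
```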

When multiple layers flag the same IP, confidence goes up. An IP identified through endpoint enumeration as belonging to a known VPN provider AND flagged by behavioral analysis for DNS leaks carries a higher confidence score than one flagged by a single layer. The system weights corroborating signals, though we don't publish the exact weights.
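Because the exact weights are not published, the sketch below substitutes a standard technique, the noisy-OR rule, to show why corroboration raises confidence. It treats each layer's confidence as a probability in [0, 1] and assumes the layers are independent, which real detection signals only approximately are. The layer names are illustrative.

```python
from math import prod


def corroborated_confidence(layer_confidences: dict[str, float]) -> float:
    """Noisy-OR combination of per-layer confidences in [0, 1].

    The combined confidence is the probability that not every layer is
    wrong at the same time, assuming the layers err independently.
    """
    all_wrong = prod(1.0 - min(max(c, 0.0), 1.0) for c in layer_confidences.values())
    return 1.0 - all_wrong
```

With enumeration at 0.8 and behavioral analysis at 0.6, the combined value is 1 - (0.2 × 0.4) = 0.92, higher than either layer alone.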

Threat category table mapping Tor, attackers, bots, spam, relays, and cloud hosting providers to detection signals and IP Security API fields.

3. Update Frequency

The whole database updates daily. New IPs from our scrapers, honeypot captures, and behavioral detections are integrated into the classification system within each update cycle. IPs that are no longer associated with threats get reclassified or removed.

This matters because IP addresses change hands. A datacenter IP that hosted a VPN endpoint last month might be reassigned to a legitimate business this month. Stale blacklists produce false positives. Daily updates minimize that window.
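A daily rebuild that merges fresh observations and expires stale entries can be sketched as follows. The record shape, the function name, and the 30-day retention window are all hypothetical. The text above only states that entries no longer associated with threats are reclassified or removed, not which rule decides that.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # hypothetical expiry window


def daily_rebuild(existing: dict[str, dict], observations: dict[str, list[str]],
                  today: date, retention_days: int = RETENTION_DAYS) -> dict[str, dict]:
    """Merge today's observations into the index, then drop expired entries."""
    merged = {
        ip: {"labels": set(rec["labels"]), "last_seen": rec["last_seen"]}
        for ip, rec in existing.items()
    }
    for ip, labels in observations.items():
        rec = merged.setdefault(ip, {"labels": set(), "last_seen": today})
        rec["labels"] |= set(labels)
        rec["last_seen"] = today  # re-observed today, so the entry is fresh
    cutoff = today - timedelta(days=retention_days)
    return {ip: rec for ip, rec in merged.items() if rec["last_seen"] >= cutoff}
```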

4. Scale

The current classification database covers over 15.5 million IP addresses, including hosting IP ranges. Threat categories span VPNs, proxies (datacenter and residential), Tor exit nodes, botnets, spam sources, known attackers, cloud providers, and relay services.

How the Four Layers Work Together

No single detection method catches everything. Endpoint enumeration identifies known commercial VPN and proxy services, but cannot determine whether the IPs are associated with attackers or spam. Honeypots catch active attackers but only see traffic directed at their specific servers. Behavioral analysis works in real time but requires client-side integration. The blacklist consolidates everything but depends on the other layers for fresh data.

The strength of this system is source diversity, which makes the VPN detection API more reliable across different anonymization and threat patterns. When endpoint enumeration flags an IP as belonging to NordVPN, that's a high-confidence classification based on infrastructure data. When a honeypot captures an IP running brute-force attacks, that's behavioral evidence of malicious intent. When client-side analysis detects a timezone mismatch and DNS leak on a residential IP, that's real-time evidence of proxy routing.

Each layer compensates for the others' blind spots. The combined result is more accurate than any individual method, and the daily update cycle keeps classifications current as IP assignments change.

What This Means for the IP Security API Response

When you query the IP Security API, many response fields map primarily to one detection layer, while aggregate fields such as threat_score and is_anonymous combine signals from multiple layers.

{
  "security": {
    "threat_score": 45,
    "is_tor": false,
    "is_proxy": true,
    "proxy_provider_names": [
      "Evomi Proxy",
      "Zyte Proxy",
      "NetNut",
      "Oxy Labs",
      "DataImpulse"
    ],
    "proxy_confidence_score": 99,
    "proxy_last_seen": "2026-04-30",
    "is_residential_proxy": true,
    "is_vpn": true,
    "vpn_provider_names": [
      "Air VPN"
    ],
    "vpn_confidence_score": 99,
    "vpn_last_seen": "2026-04-28",
    "is_relay": false,
    "relay_provider_name": "",
    "is_anonymous": true,
    "is_known_attacker": false,
    "is_bot": false,
    "is_spam": false,
    "is_cloud_provider": true,
    "cloud_provider_name": "M247 Ltd."
  }
}

The threat_score is a 0 to 100 IP risk score that aggregates signals from all four layers. Individual boolean fields (is_vpn, is_proxy, is_tor, is_bot, is_spam, is_known_attacker) reflect specific classifications. Together, these fields represent the full IP reputation profile for any address. Confidence scores (vpn_confidence_score, proxy_confidence_score) indicate how certain the system is about each label, and the last-seen dates (vpn_last_seen, proxy_last_seen) tell you how recently the classification was confirmed.
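A consumer of this response might turn it into an access decision along these lines. The sketch reads only field names that appear in the sample response above; the thresholds and the block, challenge, review, and allow policy are hypothetical choices, not recommendations from the API documentation.

```python
from datetime import date

# Hypothetical policy thresholds; tune these to your own risk tolerance.
BLOCK_THREAT_SCORE = 80
MIN_CONFIDENCE = 90
MAX_AGE_DAYS = 7


def _fresh_label(sec: dict, flag: str, conf_key: str, seen_key: str, today: date) -> bool:
    """True if a VPN/proxy label is set, high-confidence, and recently seen."""
    if not sec.get(flag) or sec.get(conf_key, 0) < MIN_CONFIDENCE:
        return False
    seen = sec.get(seen_key)
    if not seen:
        return False
    return (today - date.fromisoformat(seen)).days <= MAX_AGE_DAYS


def decide(response: dict, today: date) -> str:
    sec = response["security"]
    if sec.get("is_known_attacker") or sec.get("threat_score", 0) >= BLOCK_THREAT_SCORE:
        return "block"
    if (_fresh_label(sec, "is_vpn", "vpn_confidence_score", "vpn_last_seen", today)
            or _fresh_label(sec, "is_proxy", "proxy_confidence_score", "proxy_last_seen", today)):
        return "challenge"
    if sec.get("is_anonymous"):
        return "review"  # anonymized, but the evidence is stale or low-confidence
    return "allow"
```

Under these assumptions, the sample response yields "challenge" when evaluated on 2026-05-01, because both labels are at 99 confidence and seen within a week. Evaluated two months later, it falls back to "review".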

For a deeper look at what these fields mean for different use cases, see our comparison of the best VPN and proxy detection APIs. If you're comparing providers, the confidence scores and last-seen timestamps are the fields most APIs don't offer. They exist because our methodology produces graded assessments, not just yes/no flags.

Full field documentation is available in the database documentation. Pricing details for the Security Pro tier are on our pricing page.

Annotated IP security API response mapping proxy, VPN, anonymization, attacker, bot, spam, and cloud provider fields to risk detection layers.

What We Don't Detect (and Why That Matters)

No detection system is perfect. Being transparent about limitations is part of being transparent about methodology.

Brand-new VPN services. A VPN provider that launched yesterday won't be in our enumeration database until our scrapers discover it. Behavioral analysis can flag traffic from unknown VPNs in real time, but only when the client-side script is deployed.

False positives happen. An IP recently released by a VPN provider and reassigned to a legitimate user may carry a stale classification for up to 24 hours. The daily update cycle minimizes this, but it doesn't eliminate it. Confidence scores and last-seen timestamps help handle cases like this: an old, low-confidence classification is a weaker signal than a recent, high-confidence one.

We publish these limitations because they're real. Any provider claiming 100% detection accuracy either isn't measuring carefully or isn't being honest.

FAQ

How does VPN detection work?

VPN detection combines multiple data sources: endpoint enumeration of known VPN provider IP ranges, real-time behavioral analysis of connection characteristics like MTU size and DNS routing, and historical classification data from honeypots and threat feeds. No single method catches everything, which is why multi-layered approaches produce better results.

What is an IP threat score?

An IP threat score is a numeric value (typically 0-100) that aggregates multiple risk signals into a single indicator. At IPGeolocation.io, the threat score reflects data from endpoint enumeration, honeypot intelligence, behavioral analysis, and blacklist classification. Higher scores indicate more risk signals detected across more layers.

Can residential proxies be detected?

Residential proxies are the hardest category to detect because they use real ISP-assigned IP addresses. They can still be detected through enumeration, but their rapid IP rotation makes static lists insufficient on their own. The best results come from combining enumeration with real-time behavioral and IP reputation analysis. Accuracy is lower than for datacenter VPNs, and sophisticated residential proxy setups can still evade detection entirely.

How often is the threat database updated?

The IPGeolocation.io threat classification database updates daily. Endpoint enumeration scrapers run continuously, collecting new VPN and proxy IPs around the clock. Behavioral analysis provides real-time detection when the client-side script is deployed. The combination of continuous collection and daily classification updates keeps data current while minimizing false positives from stale entries.