Background

I have been with GoFreight for five years. During the first two or three years, I was assigned to develop the crawler system for our tracking service. At first, everything went well, and we were able to crawl the information we needed from carrier websites. Over time, however, more and more carriers deployed anti-bot solutions on their websites, and we started running into problems: CAPTCHAs, Google reCAPTCHA, CDN-level protections, and more. Bypassing these detections took more and more of our time, and without doing so we couldn't meet our service SLA. In this post, I want to recap the challenges we faced during this period, up until we started retrieving data directly from our vendors and carriers.

TLS Handshake

First, a brief introduction to the TLS handshake. Transport Layer Security (TLS) is a widely adopted security protocol designed to ensure privacy and data security for communications over the internet. One of its primary uses is encrypting traffic between clients (such as browsers) and web servers. The following figure shows how the client and server establish a secure connection before sending or receiving any application data. This process is called the TLS handshake. The client opens it with a ClientHello message listing the TLS versions, cipher suites and extensions it supports, and the server answers with a ServerHello that selects among them.

Figure: The TLS handshake (source: https://www.cloudflare.com/learning/ssl/what-happens-in-a-tls-handshake/)
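Python's standard ssl module performs this handshake for you, which makes it easy to watch from code. The minimal sketch below lists the cipher suites the default client context is configured to offer. It also includes a helper that, given network access, reports which TLS version and cipher suite a server selects. The helper names are my own and are not part of any library.

```python
import socket
import ssl
from typing import List, Tuple


def offered_cipher_suites(ctx: ssl.SSLContext) -> List[str]:
    """Names of the cipher suites this context is configured to offer."""
    return [cipher["name"] for cipher in ctx.get_ciphers()]


def negotiate(host: str, port: int = 443, timeout: float = 5.0) -> Tuple[str, str]:
    """Run a full TLS handshake and return (negotiated version, selected cipher)."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        # wrap_socket sends the ClientHello (with SNI = host) and completes the handshake
        with ctx.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            version = tls_sock.version() or "unknown"
            cipher = tls_sock.cipher()  # (name, protocol, secret_bits) or None
            return version, cipher[0] if cipher else "unknown"


if __name__ == "__main__":
    default_ctx = ssl.create_default_context()
    suites = offered_cipher_suites(default_ctx)
    print(f"Client offers {len(suites)} cipher suites, e.g. {suites[:3]}")
    print("Allowed versions:", default_ctx.minimum_version, "to", default_ctx.maximum_version)
    # Needs network access:
    # print(negotiate("www.cloudflare.com"))
```

The offered list depends on the local OpenSSL build and on Python's defaults, so it differs from what a browser sends.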

Fingerprint

Next, I will briefly explain what a fingerprint is, using a simple example. A fingerprint is an identifier, ideally unique, generated from a combination of device, browser, and network characteristics. It is used to recognize users or bots even when:

  • Cookies are disabled
  • IP addresses change
  • Users switch to incognito or private mode

Think of it as a digital ID for your device or browser session. The technique is widely used by MarTech and CDN companies for various purposes, including improving ad conversion rates and detecting bots. In the following example, we use the open-source FingerprintJS library (v4) to demonstrate how it works.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Bot Detection with FingerprintJS v4</title>
</head>
<body>
    <h1>Bot Detection Demo</h1>
    <p>Click the button below to check if you're a bot!</p>
    <button id="checkBtn">Check Me</button>
    <p id="result"></p>

    <!-- Load FingerprintJS as a module -->
    <script type="module">
        import FingerprintJS from 'https://openfpcdn.io/fingerprintjs/v4';

        async function checkFingerprint() {
            const fp = await FingerprintJS.load();
            // get() needs no options here; extendedResult is a Fingerprint Pro setting
            const result = await fp.get();

            console.log("Fingerprint ID:", result.visitorId);
            console.log("Detailed Components:", result.components);
            console.log("Confidence Score:", result.confidence.score);
        }


        // Attach event listener after DOM loads
        document.getElementById("checkBtn").addEventListener("click", checkFingerprint);
    </script>
</body>
</html>

Save the code as an HTML file and open it in your browser. After you click the button, the developer console shows the fingerprint ID (visitorId), the individual components it was derived from, and a confidence score. Reviewing the components shows which attributes can affect the fingerprint. If you open the same file in another tab, you will typically get the same fingerprint ID, because the components have not changed. At this point, you should have a basic understanding of how fingerprinting works.
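Why do unchanged components yield the same ID? The rough Python sketch below shows the underlying idea: serialize the components in a canonical form, then hash them. This is not FingerprintJS's actual algorithm, which uses its own component list and hash function. The function name and the component values are made up for illustration.

```python
import hashlib
import json
from typing import Any, Dict


def fingerprint_id(components: Dict[str, Any]) -> str:
    """Derive a stable ID from browser components (illustrative, not FingerprintJS's algorithm)."""
    # Canonical JSON: sorted keys and fixed separators, so the same components
    # always serialize to the same string regardless of insertion order
    canonical = json.dumps(components, sort_keys=True, separators=(",", ":"))
    # Truncate the SHA-256 digest to 32 hex characters for a compact ID
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]


# Hypothetical component values, loosely modeled on what a fingerprinting library collects
tab_one = {
    "platform": "Win32",
    "timezone": "Asia/Taipei",
    "languages": ["en-US", "zh-TW"],
    "screenResolution": [1920, 1080],
    "hardwareConcurrency": 8,
}
# Same components, collected in a different order (e.g. in a second tab)
tab_two = dict(reversed(list(tab_one.items())))
# Same browser after the user changes the system timezone
traveling = {**tab_one, "timezone": "Europe/Berlin"}

if __name__ == "__main__":
    print(fingerprint_id(tab_one) == fingerprint_id(tab_two))    # True: identical components
    print(fingerprint_id(tab_one) == fingerprint_id(traveling))  # False: one component changed
```

The flip side is that a single changed component, such as the timezone, produces a completely different ID. Real libraries weigh components by how stable they are, for exactly this reason.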

Bot Detection

Imagine you run a website that many bots visit, crawling everything. Sometimes the load is enough to crash the site. What can you do? In the past, the usual answer was rate limiting by IP address. With the rise of proxy services, however, anyone can easily obtain new IP addresses from around the world, which makes IP-based limits much less effective today. As you might have guessed, this is where fingerprinting comes in. By computing a fingerprint ID from the browser's components, we can tell whether requests come from the same browser instance, even when its IP address changes.
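To make the contrast concrete, here is a minimal sliding-window rate limiter in Python. The class, the sample IPs (from the 203.0.113.0/24 documentation range) and the fingerprint value are hypothetical. The same limiter is keyed once by IP and once by fingerprint. A simulated bot then rotates its IP on every request while keeping the same browser fingerprint.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Forget requests that have slid out of the time window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: reject without recording
        hits.append(now)
        return True


def simulate(n_requests: int = 10) -> Tuple[int, int]:
    """Hypothetical bot: a new proxy IP for every request, but the same browser fingerprint.

    Returns how many requests the IP-keyed and fingerprint-keyed limiters let through.
    """
    by_ip = SlidingWindowLimiter(limit=3, window=1.0)
    by_fingerprint = SlidingWindowLimiter(limit=3, window=1.0)
    allowed_ip = allowed_fp = 0
    for i in range(n_requests):
        t = i * 0.1  # all requests arrive within one second
        allowed_ip += int(by_ip.allow(f"203.0.113.{i}", now=t))      # rotating IP
        allowed_fp += int(by_fingerprint.allow("fp-3f9a1c", now=t))  # stable fingerprint
    return allowed_ip, allowed_fp


if __name__ == "__main__":
    print(simulate())  # (10, 3)
```

The IP-keyed limiter lets all ten requests through because every IP is "new". The fingerprint-keyed limiter stops the bot after three.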
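
To try this, we extend the previous page so that it also sends the fingerprint, along with a few browser attributes, to a small Flask backend:
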
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Bot Detection with FingerprintJS v4</title>
</head>
<body>
    <h1>Bot Detection Demo</h1>
    <p>Click the button below to check if you're a bot!</p>
    <button id="checkBtn">Check Me</button>
    <p id="result"></p>

    <!-- Load FingerprintJS as a module -->
    <script type="module">
        import FingerprintJS from 'https://openfpcdn.io/fingerprintjs/v4';

        async function checkFingerprint() {
            const fp = await FingerprintJS.load();
            // get() needs no options here; extendedResult is a Fingerprint Pro setting
            const result = await fp.get();

            console.log("Fingerprint ID:", result.visitorId);
            console.log("Detailed Components:", result.components);
            console.log("Confidence Score:", result.confidence.score);

            const fingerprint = result.visitorId;

            // Send to backend
            fetch('http://127.0.0.1:5000/check_bot', {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({ 
                    fingerprint: fingerprint,
                    userAgent: navigator.userAgent,
                    screenSize: `${screen.width}x${screen.height}`,
                    languages: navigator.languages
                })
            })
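            .then(response => response.json())
            .then(data => {
                document.getElementById("result").innerText = data.message;
            })
            .catch(err => {
                // e.g. the Flask backend is not running
                document.getElementById("result").innerText = `Request failed: ${err}`;
            });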
        }


        // Attach event listener after DOM loads
        document.getElementById("checkBtn").addEventListener("click", checkFingerprint);
    </script>
</body>
</html>

The Flask backend that receives this data and applies a few simple bot checks:

from flask import Flask, request, jsonify
from flask_cors import CORS
import time

app = Flask(__name__)
CORS(app)

# Store fingerprints to detect rapid, repetitive requests
fingerprint_tracker = {}

# Known bot-like user-agents
BOT_USER_AGENTS = [
    "HeadlessChrome", "bot", "crawl", "spider", "Googlebot", "Bingbot", "Yahoo! Slurp", "DuckDuckBot"
]

# Fake screen resolutions (some bots report unusual screen sizes)
UNREALISTIC_SCREENS = ["0x0", "1x1", "1024x1024"]

# Check if the request is likely from a bot
def is_bot(fingerprint, user_agent, screen_size, languages):
    current_time = time.time()

    # 1. Detect rapid repeated requests (rate limiting)
    if fingerprint in fingerprint_tracker:
        last_request_time = fingerprint_tracker[fingerprint]
        if current_time - last_request_time < 2:  # Less than 2 seconds between requests
            return True, "Suspicious rapid requests detected"
    
    # Update last request time
    fingerprint_tracker[fingerprint] = current_time

    # 2. Check for known bot user-agents (case-insensitive substring match)
    ua = user_agent.lower()
    if any(bot.lower() in ua for bot in BOT_USER_AGENTS):
        return True, "Bot-like User-Agent detected"

    # 3. Check for unusual screen sizes (some bots use default headless sizes)
    if screen_size in UNREALISTIC_SCREENS:
        return True, "Unrealistic screen resolution detected"

    # 4. Check if the language list is empty (bots often don't send language data)
    if not languages:
        return True, "No language data found"

    return False, "Looks like a human"

@app.route('/check_bot', methods=['POST'])
def check_bot():
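    # silent=True returns None instead of raising on a missing or invalid JSON body
    data = request.get_json(silent=True) or {}

    fingerprint = data.get("fingerprint")
    if not fingerprint:
        return jsonify({"is_bot": True, "message": "Missing fingerprint"}), 400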
    user_agent = data.get("userAgent", "")
    screen_size = data.get("screenSize", "")
    languages = data.get("languages", [])

    bot, message = is_bot(fingerprint, user_agent, screen_size, languages)
    
    return jsonify({"is_bot": bot, "message": message})

if __name__ == '__main__':
    app.run(debug=True)

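The backend enables CORS through flask_cors. By default, browsers do not let a page read responses from a different origin (the Same-Origin Policy). With CORS(app) in place, the test page can call the Flask backend on localhost even when it is opened from a local file or served from another port.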

Now let's look at the backend code. Once you have the fingerprint, you can build a profile for it. The example keeps this profile in an in-memory dictionary (fingerprint_tracker). In practice you would store it in a database: each time a request with a given fingerprint arrives, you query its history to decide whether it looks like a bot. Besides this request-rate check, is_bot applies three simple heuristics: known bot User-Agent substrings, unrealistic screen sizes, and an empty language list. These rules are only a basic illustration of how a fingerprint can feed into bot detection.
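Below is a minimal sketch of that database-backed profile, using Python's built-in sqlite3 module. Several pieces are assumptions for illustration, not part of the Flask example above: the table layout, the function names, and the "one fingerprint, many User-Agents" signal.

```python
import sqlite3
import time
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS fingerprint_requests (
    fingerprint TEXT NOT NULL,
    user_agent  TEXT,
    ts          REAL NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_fp_ts ON fingerprint_requests (fingerprint, ts);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_request(conn: sqlite3.Connection, fingerprint: str, user_agent: str,
                   ts: Optional[float] = None) -> None:
    conn.execute(
        "INSERT INTO fingerprint_requests (fingerprint, user_agent, ts) VALUES (?, ?, ?)",
        (fingerprint, user_agent, time.time() if ts is None else ts),
    )
    conn.commit()


def requests_in_window(conn: sqlite3.Connection, fingerprint: str, window: float,
                       now: Optional[float] = None) -> int:
    """Number of requests from this fingerprint in the last `window` seconds."""
    now = time.time() if now is None else now
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM fingerprint_requests WHERE fingerprint = ? AND ts > ?",
        (fingerprint, now - window),
    ).fetchone()
    return count


def distinct_user_agents(conn: sqlite3.Connection, fingerprint: str) -> int:
    """How many different User-Agents one fingerprint has reported (many = suspicious)."""
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT user_agent) FROM fingerprint_requests WHERE fingerprint = ?",
        (fingerprint,),
    ).fetchone()
    return count


if __name__ == "__main__":
    store = open_store()
    for i in range(5):
        record_request(store, "fp-3f9a1c", "Mozilla/5.0 (Windows NT 10.0)", ts=100.0 + i)
    record_request(store, "fp-3f9a1c", "python-requests/2.32.3", ts=106.0)
    print(requests_in_window(store, "fp-3f9a1c", window=10, now=106.0))  # 6
    print(distinct_user_agents(store, "fp-3f9a1c"))                     # 2
```

Inside check_bot, you would call record_request for every request. You would then combine these counts with the existing heuristics.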

TLS Fingerprint

Next, we will introduce TLS fingerprinting. As mentioned before, browser fingerprinting computes its value from browser components. TLS fingerprinting instead identifies clients by the characteristics of their Transport Layer Security (TLS) handshake. The server observes these characteristics before any page content is served or any JavaScript runs. When a client (e.g., a browser or a bot) connects to a server, it opens the TLS handshake with a ClientHello message that includes:

| TLS Parameter | What It Reveals | How It's Used for Fingerprinting |
| --- | --- | --- |
| TLS Version | e.g., TLS 1.2 or TLS 1.3 | Some bots use outdated versions |
| Cipher Suites | List of supported encryption algorithms | Bots often have limited options |
| Extensions | Features like ALPN, SNI, GREASE | Unique combinations per browser |
| Elliptic Curves | Supported key exchange methods | Unusual curves = suspicious |
| Signature Algorithms | Authentication methods used | Differ per OS/browser |
| Order of Fields | The sequence of cipher suites/extensions | Different per client type |
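JA3 is one widely used way to turn these fields into a single value. It joins several ClientHello fields as decimal numbers, in the order sent, skipping GREASE values: the version, cipher suites, extensions, elliptic curves, and point formats. It then takes the MD5 hash of the result. The minimal Python sketch below assumes the ClientHello has already been parsed into integer lists; parsing the raw bytes is out of scope. The helper names are my own. The sample values come from the ja3 field of the tls.peet.ws response shown later in this post.

```python
import hashlib
from typing import Iterable, List


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) have the form 0x?a?a with equal bytes: 0x0a0a ... 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _dash_join(values: Iterable[int]) -> str:
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version: int, ciphers: List[int], extensions: List[int],
               curves: List[int], point_formats: List[int]) -> str:
    """SSLVersion,Ciphers,Extensions,EllipticCurves,EllipticCurvePointFormats."""
    return ",".join([
        str(version),
        _dash_join(ciphers),
        _dash_join(extensions),
        _dash_join(curves),
        _dash_join(point_formats),
    ])


def ja3_hash(ja3: str) -> str:
    # MD5 serves only as a compact identifier here, not for security
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Values from the ja3 field in the tls.peet.ws response (an OpenSSL-based client)
CIPHERS = [4866, 4867, 4865, 49195, 49199, 49196, 49200, 52393, 52392, 158, 159, 52394,
           49187, 49191, 49161, 49171, 49188, 49192, 49162, 49172, 103, 107, 156, 157,
           60, 61, 47, 53]
EXTENSIONS = [65281, 0, 11, 10, 35, 16, 22, 23, 13, 43, 45, 51]
CURVES = [29, 23, 30, 25, 24, 256, 257, 258, 259, 260]
POINT_FORMATS = [0, 1, 2]

if __name__ == "__main__":
    ja3 = ja3_string(771, CIPHERS, EXTENSIONS, CURVES, POINT_FORMATS)  # 771 = 0x0303
    print(ja3)
    print(ja3_hash(ja3))
```

The version is 771 (0x0303) even for TLS 1.3 clients, because JA3 uses the ClientHello's legacy version field. Skipping GREASE values matters because browsers insert random ones that would otherwise change the fingerprint on every connection.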

Let's look at an example. Suppose two clients connect to your server (the values are simplified for illustration):

Legitimate User (Chrome on Windows)
TLS Version: 1.3
Cipher Suites: [TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384]
Extensions: [server_name, supported_versions, key_share, psk_key_exchange_modes]
Elliptic Curves: [X25519, secp256r1]
Signature Algorithms: [rsa_pss_rsae_sha256, ecdsa_secp256r1_sha256]

Suspicious Bot (Python Requests Library)
TLS Version: 1.2
Cipher Suites: [TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384]
Extensions: [server_name]
Elliptic Curves: [secp256r1]
Signature Algorithms: [rsa_pss_rsae_sha256]

Based on this information, we can tell that the second client is not a modern browser and is most likely a script built on Python Requests. We may want to block it as a suspicious bot.
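Here is a minimal rule-based sketch of that decision, applied to the two simplified profiles above. The data class, the thresholds and the set of expected extensions are hypothetical choices for illustration. Real systems more commonly compare full JA3/JA4 values against known browser fingerprints.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClientHelloSummary:
    label: str
    max_version: str
    cipher_suites: List[str]
    extensions: List[str]


# Hypothetical rules for illustration only
MIN_CIPHER_SUITES = 2
EXPECTED_EXTENSIONS = {"server_name", "supported_versions", "key_share"}


def suspicious_reasons(hello: ClientHelloSummary) -> List[str]:
    """Return the reasons this ClientHello looks unlike a modern browser (empty = looks fine)."""
    reasons: List[str] = []
    if hello.max_version != "TLS 1.3":
        reasons.append(f"no TLS 1.3 support ({hello.max_version})")
    if len(hello.cipher_suites) < MIN_CIPHER_SUITES:
        reasons.append(f"only {len(hello.cipher_suites)} cipher suite(s) offered")
    missing = EXPECTED_EXTENSIONS - set(hello.extensions)
    if missing:
        reasons.append(f"missing extensions: {', '.join(sorted(missing))}")
    return reasons


chrome = ClientHelloSummary(
    "Chrome on Windows", "TLS 1.3",
    ["TLS_AES_128_GCM_SHA256", "TLS_AES_256_GCM_SHA384"],
    ["server_name", "supported_versions", "key_share", "psk_key_exchange_modes"],
)
python_requests = ClientHelloSummary(
    "Python Requests", "TLS 1.2",
    ["TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"],
    ["server_name"],
)

if __name__ == "__main__":
    for hello in (chrome, python_requests):
        reasons = suspicious_reasons(hello)
        print(hello.label, "->", "block" if reasons else "allow", reasons)
```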

To get around this, I used mitmproxy to patch the TLS information of Python requests. mitmproxy terminates the client's TLS connection and opens its own connection to the target server. First, create a debug_hello.py addon that logs each ClientHello mitmproxy receives. This lets us confirm that our request actually goes through the proxy.

from mitmproxy import tls

EXTENSION_NAMES = {
    # IANA TLS ExtensionType values
    0: "server_name",
    5: "status_request",
    10: "supported_groups",
    11: "ec_point_formats",
    13: "signature_algorithms",
    16: "application_layer_protocol_negotiation",
    18: "signed_certificate_timestamp",
    21: "padding",
    22: "encrypt_then_mac",
    23: "extended_master_secret",
    27: "compress_certificate",
    35: "session_ticket",
    41: "pre_shared_key",
    43: "supported_versions",
    45: "psk_key_exchange_modes",
    49: "post_handshake_auth",
    51: "key_share",
    65281: "renegotiation_info",
}

def readable_extensions(extensions):
    return [
        EXTENSION_NAMES.get(ext_id, f"unknown({ext_id})")
        for ext_id, _ in extensions
    ]

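def tls_clienthello(data: tls.ClientHelloData):
    # mitmproxy event hook: called when a client's ClientHello reaches the proxy
    hello = data.client_hello
    print("JA3 Debug:")
    print(f"  - Client: {data.context.client.peername}")
    print(f"  - SNI: {hello.sni}")
    # Print cipher suite IDs in hex, as they appear in the TLS registries
    print(f"  - Cipher Suites: {[f'0x{c:04x}' for c in hello.cipher_suites]}")
    print(f"  - Extensions: {readable_extensions(hello.extensions)}")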

Save it as debug_hello.py, then start mitmproxy locally as a regular proxy on port 8082 with the addon loaded:

$ mitmdump --mode regular@8082 -s debug_hello.py --set tls_client_hello=chrome_120

Next, we'll write a short Python script that sends a request through the proxy to tls.peet.ws, a service that echoes back the TLS fingerprint it observes. Run it with and without the proxy and compare the results. Pay particular attention to the ciphers section, which changes significantly.

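import requests
from pprint import pprint

# Route both HTTP and HTTPS traffic through the local mitmproxy instance
proxies = {
    "http": "http://127.0.0.1:8082",
    "https": "http://127.0.0.1:8082",
}

# mitmproxy re-signs server certificates with its own CA, so verification is disabled
# for this local test (alternatively, point verify at mitmproxy's CA certificate)
response = requests.get("https://tls.peet.ws/api/all", proxies=proxies, verify=False, timeout=30)
# Without the proxy, for comparison:
# response = requests.get("https://tls.peet.ws/api/all", timeout=30)

pprint(response.json())

The response looks like this:

{'donate': 'Please consider donating to keep this API running. Visit '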
           'https://tls.peet.ws',
 'http1': {'headers': ['Host: tls.peet.ws',
                       'User-Agent: python-requests/2.32.3',
                       'Accept-Encoding: gzip, deflate, br, zstd',
                       'Accept: */*',
                       'Connection: keep-alive']},
 'http_version': 'HTTP/1.1',
 'ip': '103.234.230.84:61132',
 'method': 'GET',
 'tcpip': {'ip': {}, 'tcp': {}},
 'tls': {'ciphers': ['TLS_AES_256_GCM_SHA384',
                     'TLS_CHACHA20_POLY1305_SHA256',
                     'TLS_AES_128_GCM_SHA256',
                     'TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256',
                     'TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256',
                     'TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384',
                     'TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384',
                     'TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256',
                     'TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256',
                     'TLS_DHE_RSA_WITH_AES_128_GCM_SHA256',
                     'TLS_DHE_RSA_WITH_AES_256_GCM_SHA384',
                     'TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256',
                     'TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256',
                     'TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256',
                     'TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA',
                     'TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA',
                     'TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384',
                     'TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384',
                     'TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA',
                     'TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA',
                     'TLS_DHE_RSA_WITH_AES_128_CBC_SHA256',
                     'TLS_DHE_RSA_WITH_AES_256_CBC_SHA256',
                     'TLS_RSA_WITH_AES_128_GCM_SHA256',
                     'TLS_RSA_WITH_AES_256_GCM_SHA384',
                     'TLS_RSA_WITH_AES_128_CBC_SHA256',
                     'TLS_RSA_WITH_AES_256_CBC_SHA256',
                     'TLS_RSA_WITH_AES_128_CBC_SHA',
                     'TLS_RSA_WITH_AES_256_CBC_SHA'],
         'client_random': '2059469c2c091fad5cdc5d3923783e2297f8a96de4c9e06efb22572f8294bc2d',
         'extensions': [{'data': '00',
                         'name': 'extensionRenegotiationInfo (boringssl) '
                                 '(65281)'},
                        {'name': 'server_name (0)',
                         'server_name': 'tls.peet.ws'},
                        {'elliptic_curves_point_formats': ['0x00',
                                                           '0x01',
                                                           '0x02'],
                         'name': 'ec_point_formats (11)'},
                        {'name': 'supported_groups (10)',
                         'supported_groups': ['X25519 (29)',
                                              'P-256 (23)',
                                              'X448 (30)',
                                              'P-521 (25)',
                                              'P-384 (24)',
                                              'ffdhe2048 (256)',
                                              'ffdhe3072 (257)',
                                              'ffdhe4096 (258)',
                                              'ffdhe6144 (259)',
                                              'ffdhe8192 (260)']},
                        {'data': '', 'name': 'session_ticket (35)'},
                        {'name': 'application_layer_protocol_negotiation (16)',
                         'protocols': ['http/1.1']},
                        {'data': '', 'name': 'encrypt_then_mac (22)'},
                        {'extended_master_secret_data': '',
                         'master_secret_data': '',
                         'name': 'extended_master_secret (23)'},
                        {'name': 'signature_algorithms (13)',
                         'signature_algorithms': ['ecdsa_secp256r1_sha256',
                                                  'ecdsa_secp384r1_sha384',
                                                  'ecdsa_secp521r1_sha512',
                                                  'ed25519',
                                                  'ed25519',
                                                  'ecdsa_brainpoolP256r1tls13_sha256',
                                                  'ecdsa_brainpoolP384r1tls13_sha384',
                                                  'ecdsa_brainpoolP512r1tls13_sha512',
                                                  'rsa_pss_pss_sha256',
                                                  'rsa_pss_pss_sha384',
                                                  'rsa_pss_pss_sha512',
                                                  'rsa_pss_rsae_sha256',
                                                  'rsa_pss_rsae_sha384',
                                                  'rsa_pss_rsae_sha512',
                                                  'rsa_pkcs1_sha256',
                                                  'rsa_pkcs1_sha384',
                                                  'rsa_pkcs1_sha512',
                                                  '0x303',
                                                  '0x301',
                                                  '0x302',
                                                  '0x402',
                                                  '0x502',
                                                  '0x602']},
                        {'name': 'supported_versions (43)',
                         'versions': ['TLS 1.3', 'TLS 1.2']},
                        {'PSK_Key_Exchange_Mode': 'PSK with (EC)DHE key '
                                                  'establishment (psk_dhe_ke) '
                                                  '(1)',
                         'name': 'psk_key_exchange_modes (45)'},
                        {'name': 'key_share (51)',
                         'shared_keys': [{'X25519 (29)': 'a3ba691321dfea99979785396e5c370ee6ee6a7403cb736d51388c9d65206800'}]}],
         'ja3': '771,4866-4867-4865-49195-49199-49196-49200-52393-52392-158-159-52394-49187-49191-49161-49171-49188-49192-49162-49172-103-107-156-157-60-61-47-53,65281-0-11-10-35-16-22-23-13-43-45-51,29-23-30-25-24-256-257-258-259-260,0-1-2',
         'ja3_hash': '135b770c875c319c3564deacfe0bcc39',
         'ja4': 't13d2812h1_a01be8c064b6_0b298858d6c1',
         'ja4_r': 't13d2812h1_002f,0035,003c,003d,0067,006b,009c,009d,009e,009f,1301,1302,1303,c009,c00a,c013,c014,c023,c024,c027,c028,c02b,c02c,c02f,c030,cca8,cca9,ccaa_000a,000b,000d,0015,0016,0017,0023,002b,002d,0033,ff01_0403,0503,0603,0807,0808,081a,081b,081c,0809,080a,080b,0804,0805,0806,0401,0501,0601,0303,0301,0302,0402,0502,0602',
         'peetprint': '772-771|1.1|29-23-30-25-24-256-257-258-259-260|1027-1283-1539-2055-2056-2074-2075-2076-2057-2058-2059-2052-2053-2054-1025-1281-1537-771-769-770-1026-1282-1538|1||4866-4867-4865-49195-49199-49196-49200-52393-52392-158-159-52394-49187-49191-49161-49171-49188-49192-49162-49172-103-107-156-157-60-61-47-53|0-10-11-13-16-22-23-35-43-45-51-65281',
         'peetprint_hash': 'a81429f9a27d4b2da1c4126a7921174a',
         'session_id': '4f5ff2f21118a79e9af3be3367428189e2c7050a629e24264149dea476e84e7e',
         'tls_version_negotiated': '772',
         'tls_version_record': '771'}}
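The ja4 value in this response starts with a human-readable section, t13d2812h1. Based on my reading of the public JA4 specification, it encodes six things:

- the protocol (t = TCP)
- the highest offered TLS version (13)
- whether SNI was sent (d)
- the number of cipher suites (28)
- the number of extensions (12)
- the first and last characters of the first ALPN value (h1, from http/1.1)

The sketch below rebuilds that prefix from the values in the ja3 field above. The function name is my own, and it leaves out edge cases such as QUIC and non-alphanumeric ALPN values. Comparing this prefix between the proxied and direct responses is a quick way to check whether the ClientHello really changed.

```python
from typing import List, Optional

# Two-character version codes used in the first section of JA4
JA4_VERSIONS = {0x0304: "13", 0x0303: "12", 0x0302: "11", 0x0301: "10"}


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701): 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja4_prefix(max_version: int, has_sni: bool, cipher_ids: List[int],
               extension_ids: List[int], alpn: Optional[List[str]]) -> str:
    """First section of a JA4 fingerprint for a TCP ClientHello (simplified)."""
    ciphers = [c for c in cipher_ids if not is_grease(c)]
    extensions = [e for e in extension_ids if not is_grease(e)]
    version = JA4_VERSIONS.get(max_version, "00")
    sni = "d" if has_sni else "i"  # d = domain sent via SNI, i = no SNI
    # First and last characters of the first ALPN value; "00" when ALPN is absent
    alpn_code = alpn[0][0] + alpn[0][-1] if alpn and alpn[0] else "00"
    return f"t{version}{sni}{min(len(ciphers), 99):02d}{min(len(extensions), 99):02d}{alpn_code}"


# Cipher and extension IDs from the ja3 field of the response above
CIPHERS = [4866, 4867, 4865, 49195, 49199, 49196, 49200, 52393, 52392, 158, 159, 52394,
           49187, 49191, 49161, 49171, 49188, 49192, 49162, 49172, 103, 107, 156, 157,
           60, 61, 47, 53]
EXTENSIONS = [65281, 0, 11, 10, 35, 16, 22, 23, 13, 43, 45, 51]

if __name__ == "__main__":
    # 0x0304 = TLS 1.3, the highest entry in the supported_versions extension
    print(ja4_prefix(0x0304, True, CIPHERS, EXTENSIONS, ["http/1.1"]))  # t13d2812h1
```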

Takeaways

In this post, I've introduced browser fingerprinting and TLS fingerprinting, with simple examples showing how modern websites and servers use them for bot detection. Here are a few key takeaways:

  • Python Requests Can Be Upgraded to Look Like Chrome: If your Python requests still default to TLS 1.2, upgrade both Python and OpenSSL to enable TLS 1.3. This removes an obvious red flag for TLS-layer bot detection mechanisms such as JA3/JA4 checks.
  • mitmproxy Presets Emulate Real Browsers and Devices: Adding mitmproxy, together with a browser-fingerprint solution, to your crawler engine can help bypass bot detection systems by emulating real browser and device behavior.
  • OpenSSL Is the Default TLS Engine, and It's Easy to Detect: Python, curl, and many CLI tools use OpenSSL for TLS, which makes their ClientHello predictable and easy to fingerprint unless spoofed. mitmproxy mitigates this issue by terminating TLS itself and sending a new ClientHello to the target using your chosen preset.
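You can check the first takeaway on your own machine with this small sketch (the function name is my own). It reports which OpenSSL build Python's ssl module is linked against and whether it can negotiate TLS 1.3. Requests builds its TLS connections on this module through urllib3, but urllib3 may apply its own settings on top. Treat the result as a first check; tls.peet.ws, as used above, shows what is actually sent.

```python
import ssl
from typing import Dict, Union


def tls_capabilities() -> Dict[str, Union[str, bool]]:
    """Report the OpenSSL build and TLS 1.3 support of the local Python ssl module."""
    ctx = ssl.create_default_context()
    return {
        "openssl_version": ssl.OPENSSL_VERSION,  # release string of the linked TLS library
        "has_tls13": ssl.HAS_TLSv1_3,            # True if this build can negotiate TLS 1.3
        "max_version": ctx.maximum_version.name,  # usually MAXIMUM_SUPPORTED (no cap)
    }


if __name__ == "__main__":
    for key, value in tls_capabilities().items():
        print(f"{key}: {value}")
```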

Reference