
Bad Bot Detection

This guide explains how to use the bad bot detection feature to block malicious crawlers and scrapers.

Overview

The badbots.py script generates configuration files to block known malicious bots based on their User-Agent strings. It fetches bot lists from multiple public sources and generates blocking rules for each supported web server.

How It Works

  1. Fetches bot lists from multiple public sources
  2. Generates blocking configurations for each supported platform (a rough sketch of this flow follows)
  3. Updates the configurations daily via GitHub Actions
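
The first two steps can be pictured with a short Python sketch. This is only an illustration, not the actual badbots.py code; the source URL, output path, and helper names are hypothetical.

python
# Hypothetical sketch of the fetch-and-generate flow.
import urllib.request

SOURCE_URL = "https://example.com/bad-bots.txt"  # placeholder source list

def fetch_bot_list(url: str) -> list[str]:
    """Download a newline-separated list of bot User-Agent patterns."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        text = resp.read().decode("utf-8")
    return [line.strip() for line in text.splitlines()
            if line.strip() and not line.startswith("#")]

def write_nginx_map(bots: list[str], path: str = "nginx/bots.conf") -> None:
    """Emit an Nginx map block that flags matching User-Agents via $bad_bot."""
    lines = ["map $http_user_agent $bad_bot {", "    default 0;"]
    lines += [f'    "~*{bot}" 1;' for bot in bots]
    lines.append("}")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_nginx_map(fetch_bot_list(SOURCE_URL))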

Generated Files

Platform   File        Format
Nginx      bots.conf   Map directive
Apache     bots.conf   ModSecurity rules
Traefik    bots.toml   Middleware config
HAProxy    bots.acl    ACL patterns

Nginx Bot Blocker

The Nginx configuration uses a map directive. Patterns prefixed with ~* are case-insensitive regular expressions; any match sets $bad_bot to 1:

nginx
# In http block
map $http_user_agent $bad_bot {
    default 0;
    "~*AhrefsBot" 1;
    "~*SemrushBot" 1;
    "~*MJ12bot" 1;
    "~*DotBot" 1;
    # ... more bots
}

# In server block
if ($bad_bot) {
    return 403;
}

Integration

nginx
http {
    include /path/to/waf_patterns/nginx/bots.conf;
    
    server {
        if ($bad_bot) {
            return 403;
        }
    }
}

Apache Bot Blocker

Uses ModSecurity rules:

apache
SecRule REQUEST_HEADERS:User-Agent "@rx AhrefsBot" \
    "id:200001,phase:1,deny,status:403,msg:'Bad Bot Blocked'"

HAProxy Bot Blocker

Uses ACL rules; the file passed with -f contains one User-Agent regex per line, matched case-insensitively because of -i:

haproxy
acl bad_bot hdr_reg(User-Agent) -i -f /etc/haproxy/bots.acl
http-request deny if bad_bot
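
The bots.acl file referenced above is just a list of patterns, one per line, so generating it is simple. A hypothetical Python sketch (the output path and the trimmed bot list are placeholders):

python
# Write the HAProxy ACL pattern file: one User-Agent regex per line.
# hdr_reg(User-Agent) -i -f <file> matches any of them, case-insensitively.
BOTS = ["AhrefsBot", "SemrushBot", "MJ12bot", "DotBot"]  # trimmed example list

def write_haproxy_acl(bots: list[str], path: str = "haproxy/bots.acl") -> None:
    with open(path, "w") as f:
        f.write("\n".join(bots) + "\n")

write_haproxy_acl(BOTS)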

Blocked Bot Categories

The following categories of bots are blocked by default:

SEO/Marketing Crawlers

  • AhrefsBot
  • SemrushBot
  • MJ12bot
  • DotBot
  • BLEXBot

AI/ML Crawlers

  • GPTBot
  • ChatGPT-User
  • CCBot
  • Google-Extended
  • Anthropic-AI

Scrapers

  • DataForSeoBot
  • PetalBot
  • Bytespider
  • ClaudeBot

Malicious Bots

  • Known vulnerability scanners
  • Spam bots
  • Content scrapers

Customization

Add Custom Bots

Edit the generated file or add your own patterns:

nginx
# Nginx: add to bots.conf (inside the map block)
"~*MyCustomBot" 1;

apache
# Apache: add a ModSecurity rule
SecRule REQUEST_HEADERS:User-Agent "@rx MyCustomBot" \
    "id:200999,phase:1,deny,status:403"

Whitelist Bots

For Nginx, allow specific bots by mapping them to 0 (regex entries are checked in order of appearance, so put allow rules first):

nginx
map $http_user_agent $bad_bot {
    default 0;
    "~*Googlebot" 0;     # Allow Google
    "~*AhrefsBot" 1;     # Block Ahrefs
}

Allow All Bots for Specific Paths

The server-level if applies to every request, so apply the check per location instead and omit it from paths that should stay open to bots:

nginx
location /public-api {
    # No $bad_bot check here, so all bots are allowed
}

location / {
    if ($bad_bot) {
        return 403;
    }
}

Generate Manually

Run the script to regenerate bot lists:

bash
python badbots.py

The script supports fallback lists if primary sources are unavailable.
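
The fallback behavior could look roughly like the following; this is an illustrative sketch rather than the script's actual code, and the built-in list shown is a placeholder.

python
# Hypothetical sketch: try each source, fall back to a bundled list.
import urllib.error
import urllib.request

FALLBACK_BOTS = ["AhrefsBot", "SemrushBot", "MJ12bot"]  # trimmed built-in list

def fetch_with_fallback(urls: list[str]) -> list[str]:
    """Try each source in turn; return the bundled list if all fail."""
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                lines = resp.read().decode("utf-8").splitlines()
            bots = [l.strip() for l in lines
                    if l.strip() and not l.startswith("#")]
            if bots:
                return bots
        except (urllib.error.URLError, TimeoutError):
            continue  # source unavailable, try the next one
    return FALLBACK_BOTS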

Monitoring

Log Blocked Bots

Enable a dedicated log to track blocked requests. The access_log directive is not valid in a server-level if block, so place the check inside a location:

nginx
location / {
    if ($bad_bot) {
        access_log /var/log/nginx/blocked_bots.log;
        return 403;
    }
}

Analyze Bot Traffic

bash
# Count blocked bot requests by User-Agent (default combined log format)
awk '$9 == 403 {print $12}' /var/log/nginx/access.log | \
  sort | uniq -c | sort -rn | head -20
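
The awk field only captures the first token of the User-Agent. For a fuller breakdown, a small Python script can count complete User-Agent strings; this assumes the default combined log format and the same access.log path:

python
# Count 403 responses per full User-Agent (combined log format assumed).
import re
from collections import Counter

PATTERN = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

counts = Counter()
with open("/var/log/nginx/access.log") as log:
    for line in log:
        m = PATTERN.match(line)
        if m and m.group("status") == "403":
            counts[m.group("ua")] += 1

for ua, n in counts.most_common(20):
    print(f"{n:6d}  {ua}")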

Best Practices

  1. Regular Updates: The bot lists are updated daily. Pull the latest changes or download from releases.

  2. Monitor False Positives: Some legitimate services may use blocked User-Agents. Monitor your logs.

  3. Combine with Rate Limiting: Use bot blocking with rate limiting for comprehensive protection.

  4. Test Before Deploying: Verify that legitimate traffic (search engines, monitoring) is not blocked.
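
One way to run such a check is to send requests with a few representative User-Agents and compare status codes. A minimal sketch; the target URL is a placeholder and the User-Agent strings are illustrative:

python
# Quick check: which User-Agents receive a 403? (URL is a placeholder.)
import urllib.error
import urllib.request

TARGET = "https://example.com/"  # replace with your own site
AGENTS = {
    "browser": "Mozilla/5.0 (X11; Linux x86_64)",
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "AhrefsBot": "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
}

for name, ua in AGENTS.items():
    req = urllib.request.Request(TARGET, headers={"User-Agent": ua})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(f"{name}: HTTP {resp.status}")
    except urllib.error.HTTPError as err:
        print(f"{name}: HTTP {err.code}")  # 403 means the User-Agent was blocked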

WARNING

Blocking search engine bots (Googlebot, Bingbot) can negatively impact SEO. The default lists do not block major search engines.
