Bot Traffic

I discovered Peter Rukavina’s post Bots Are Eating My Blog for Lunch thanks to this post on Kev’s blog. Long story short: by analyzing his server’s logs, he discovered that around 85% of his blog’s traffic is generated by bots.

Couldn’t resist following the same procedure to analyze my blog’s traffic from January to May 2025:

  • Semrush bot accounts for 64.2% of my traffic
  • 18.6% are visits from other bots
  • 17.2% is human traffic
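For readers who want to try something similar, here is a minimal sketch of how such a breakdown could be computed. It is not the exact procedure used for the numbers above: it assumes the log is in Apache's Combined Log Format, and the `BOT_HINTS` list, the `classify` rules, and the `SAMPLE` lines are made up for illustration.

```python
import re
from collections import Counter

# Combined Log Format: host, identity, user, [time], "request", status, size,
# "referer", "user agent". Quoted fields may contain backslash-escaped quotes.
QUOTED = r'"((?:[^"\\]|\\.)*)"'
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] ' + QUOTED + r' \d{3} \S+ ' + QUOTED + ' ' + QUOTED
)

# Hypothetical substrings used to flag a user agent as a bot (an assumption,
# not a complete list).
BOT_HINTS = ("bot", "crawl", "spider", "facebookexternalhit", "feedbin")


def classify(agent: str) -> str:
    """Return 'empty', 'bot', or 'human' for a user-agent string."""
    if agent in ("", "-"):
        return "empty"
    lowered = agent.lower()
    if any(hint in lowered for hint in BOT_HINTS):
        return "bot"
    return "human"


def summarize(lines):
    """Count requests per user agent and per category."""
    agents, categories = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in Combined Log Format
        agent = match.group(3)
        agents[agent] += 1
        categories[classify(agent)] += 1
    return agents, categories


def shares(counter):
    """Convert raw counts into percentages rounded to one decimal."""
    total = sum(counter.values())
    if not total:
        return {}
    return {key: round(100 * count / total, 1) for key, count in counter.items()}


# Made-up sample lines, only to show the expected input shape.
SAMPLE = [
    '203.0.113.5 - - [11/Jun/2025:18:10:38 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"',
    '203.0.113.5 - - [11/Jun/2025:18:10:39 +0000] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"',
    '203.0.113.6 - - [11/Jun/2025:18:10:40 +0000] "GET /feed HTTP/1.1" 200 1024 "-" "-"',
    '203.0.113.7 - - [11/Jun/2025:18:10:41 +0000] "GET /about HTTP/1.1" 200 2048 '
    '"https://example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15"',
]
```

Calling `summarize()` on an open log file (instead of `SAMPLE`) and passing the resulting counters to `shares()` gives per-agent and per-category percentages. Substring matching on the user agent is a rough heuristic: bots that send a browser-like user agent will be counted as human.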

Claude.ai generated this nice table:

Apache Log Analysis

User Agent Summary & Traffic Breakdown

  • Total Requests: 1,579,188
  • Unique User Agents: 20
  • Bot Traffic: 64.2%
  • Human Traffic: 17.0%

| Category | User Agent / Bot | Requests | % | Description |
|---|---|---|---|---|
| SEO Bot | SemrushBot | 1,013,643 | 64.2% | SEO analysis and competitive research |
| Unknown | Empty/Null User Agent | 293,203 | 18.6% | Missing or stripped user agent |
| Monitoring | UptimeRobot | 80,669 | 5.1% | Website uptime monitoring service |
| SEO Bot | AhrefsBot | 30,751 | 1.9% | SEO backlink analysis crawler |
| Image Bot | ImagesiftBot | 26,659 | 1.7% | Image analysis and processing |
| E-commerce | Amazonbot | 25,723 | 1.6% | Amazon’s web crawler |
| Search Bot | MJ12bot v1.4.8 | 17,493 | 1.1% | Search engine crawler |
| Desktop | Chrome 91 (Windows 10) | 14,711 | 0.9% | Human user - Windows desktop |
| Mobile | Chrome 90 (Android - Redmi) | 13,408 | 0.8% | Human user - Android mobile |
| Search Bot | PetalBot | 12,997 | 0.8% | Huawei search engine crawler |
| Desktop | Safari 18.4 (Mac OS X) | 12,922 | 0.8% | Human user - Mac desktop |
| Mobile | Chrome 60 (Samsung) | 12,754 | 0.8% | Human user - Samsung mobile |
| Custom Bot | l9explore | 12,715 | 0.8% | Custom exploration/scraping tool |
| Social Media | Facebook External Agent | 12,339 | 0.8% | Facebook link preview crawler |
| Desktop | Chrome 80 (Mac OS X) | 11,986 | 0.8% | Human user - Mac desktop (older) |
| Search Bot | MJ12bot v2.0.0 | 9,769 | 0.6% | Search engine crawler (newer) |
| AI Bot | GPTBot (OpenAI) | 9,613 | 0.6% | OpenAI’s web crawler for AI training |
| RSS Reader | Feedbin | 9,355 | 0.6% | RSS feed reader (3 subscribers) |
| Desktop | Edge 114 (Windows 10) | 8,727 | 0.6% | Human user - Windows Edge |
| Desktop | Chrome 133 (Mac OS X) | 8,427 | 0.5% | Human user - Mac desktop (latest) |

Key Insights: SemrushBot dominates with 64.2% of traffic, while legitimate human users account for only 17%. Consider implementing bot blocking or rate limiting to improve server performance and reduce bandwidth costs.

I don’t mind bot visits in general, but Semrush’s traffic seems excessive. I decided to block the bots that generated over 10% of the traffic in my robots.txt file.

# robots.txt - Block bots with 10%+ traffic
# Generated based on Apache log analysis

# Block SemrushBot (64.2% of traffic)
# SEO analysis and competitive research bot
User-agent: SemrushBot
Disallow: /

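# The SemrushBot group above already covers every version of the bot:
# crawlers match on the product token (the name before the "/"), so
# separate entries such as "SemrushBot/7~bl" or "SemrushBot/*" are not needed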

# Note: Cannot block empty/null user agents via robots.txt
# Consider server-level blocking for requests without User-Agent headers

# Allow other legitimate bots and crawlers
# UptimeRobot (5.1%) - monitoring service, usually beneficial
User-agent: UptimeRobot
Allow: /

# AhrefsBot (1.9%) - SEO bot, but lower traffic
User-agent: AhrefsBot
Allow: /

# Amazonbot (1.6%) - e-commerce crawler
User-agent: Amazonbot
Allow: /

# GPTBot (0.6%) - OpenAI's crawler for AI training
# Uncomment the following lines if you want to block AI training
# User-agent: GPTBot
# Disallow: /

# Facebook crawler (0.8%) - for link previews
User-agent: facebookexternalhit
Allow: /

# Allow all other bots by default
User-agent: *
Allow: /
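
As a quick sanity check, rules like these can be run through Python's standard `urllib.robotparser` before deploying them. This is only a sketch: `ROBOTS_TXT` is a shortened copy of the file above, the `can_fetch` helper is a name chosen for illustration, and Python's parser is a simplified matcher, so real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the robots.txt above (assumption: the omitted Allow
# groups behave the same as the catch-all group).
ROBOTS_TXT = """\
User-agent: SemrushBot
Disallow: /

User-agent: UptimeRobot
Allow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str = "https://zoia.org/") -> bool:
    """Return True if the rules above let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("SemrushBot", "SemrushBot/7~bl", "UptimeRobot", "GPTBot"):
    print(f"{agent}: {'allowed' if can_fetch(agent) else 'blocked'}")
```

Note that the check expects the bot's product token (for example `SemrushBot`), not the full `Mozilla/5.0 (compatible; ...)` header string that appears in the server logs.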

Also, since bots do not always respect the robots.txt file, I blocked Semrush in my web server configuration:


    RewriteEngine On

    # Block SemrushBot and all its variants. The pattern is unanchored and
    # case-insensitive ([NC]), so it also matches versioned strings such as
    # "SemrushBot/7~bl"; no separate per-version rules are needed.
    RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC]
    RewriteRule ^.*$ - [F,L]

    # Optional: Block empty/null user agents (18.6% of my traffic)
    # Uncomment the following lines to also block empty user agents
    # RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
    # RewriteCond %{HTTP_USER_AGENT} ^-$
    # RewriteRule ^.*$ - [F,L]
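
To see which user agents a condition like `RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC]` would catch, the match can be approximated in Python. This is a sketch of the matching logic only, not of Apache itself: it assumes the pattern behaves as a plain unanchored, case-insensitive regular expression, and the function name and sample strings are made up for illustration.

```python
import re

# Approximates: RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC]
# Unanchored search plus IGNORECASE stands in for the [NC] flag.
BLOCKED_AGENT = re.compile(r"SemrushBot", re.IGNORECASE)


def would_be_forbidden(user_agent: str) -> bool:
    """Return True if the user agent would hit the 403 rule."""
    return BLOCKED_AGENT.search(user_agent) is not None


# Illustrative user-agent strings (assumptions, not copied from the logs).
SAMPLE_AGENTS = [
    "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)",
    "semrushbot",
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15",
]

for agent in SAMPLE_AGENTS:
    print(f"{'403' if would_be_forbidden(agent) else 'ok '}  {agent}")
```

Because the search is unanchored, any user agent that contains the string anywhere is blocked, including versioned variants.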

Testing the new configuration:

$ curl -I -A "SemrushBot" https://zoia.org

HTTP/1.1 403 Forbidden
Date: Wed, 11 Jun 2025 18:10:38 GMT
Server: Apache/2.4.58 (Ubuntu)
Content-Type: text/html; charset=iso-8859-1

I’ll wait and check the new traffic stats in a couple of months.
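
For that follow-up, a per-month share is probably the easiest number to compare. The sketch below assumes the same Combined Log Format as before, with the user agent as the last quoted field; the `monthly_share` name and the `SAMPLE` lines are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Grabs the [timestamp] and the last quoted field (the user agent).
TIME_AND_AGENT = re.compile(r'\[(?P<time>[^\]]+)\].*"(?P<agent>[^"]*)"\s*$')


def monthly_share(lines, needle="semrushbot"):
    """Return {YYYY-MM: percent of requests whose user agent contains needle}."""
    totals, hits = Counter(), Counter()
    for line in lines:
        match = TIME_AND_AGENT.search(line)
        if not match:
            continue  # skip lines without a timestamp and a quoted user agent
        # %b parses English month abbreviations under the default C locale.
        stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        month = stamp.strftime("%Y-%m")
        totals[month] += 1
        if needle in match.group("agent").lower():
            hits[month] += 1
    return {m: round(100 * hits[m] / totals[m], 1) for m in sorted(totals)}


# Made-up sample lines, only to show the expected input shape.
SAMPLE = [
    '203.0.113.5 - - [02/May/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "SemrushBot/7~bl"',
    '203.0.113.5 - - [03/May/2025:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "SemrushBot/7~bl"',
    '203.0.113.7 - - [04/May/2025:10:00:00 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 Safari"',
    '203.0.113.7 - - [02/Jul/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 Safari"',
    '203.0.113.8 - - [03/Jul/2025:10:00:00 +0000] "GET /c HTTP/1.1" 200 512 "-" "Feedbin"',
]

print(monthly_share(SAMPLE))
```

Requests rejected with a 403 are still written to the access log, so a blocked bot can keep showing up in the counts; filtering on the status code as well would separate served requests from refused ones.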

