Bot Traffic
I discovered Peter Rukavina’s post Bots Are Eating My Blog for Lunch thanks to this post on Kev’s blog. Long story short: by analyzing his server’s logs, he discovered that around 85% of his blog’s traffic is generated by bots.
I couldn’t resist following the same procedure to analyze my own blog’s traffic from January to May 2025:
- Semrush bot accounts for 64.2% of my traffic
- 18.6% are visits from other bots
- 17.2% is human traffic
Claude.ai generated this nice table:
Apache Log Analysis
User Agent Summary & Traffic Breakdown
Key Insights: SemrushBot dominates with 64.2% of traffic, while legitimate human users account for only 17%. Consider implementing bot blocking or rate limiting to improve server performance and reduce bandwidth costs.
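For anyone who wants to reproduce this kind of breakdown by hand, here is a minimal Python sketch. It assumes the Apache log is in Combined Log Format, where the user agent is the last quoted field. The `BOT_MARKERS` list, the `classify` labels and the sample lines are made up for illustration; they are not the procedure used to produce the numbers above.

```python
import re
from collections import Counter

# Combined Log Format ends with: "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)

# Assumption: a crude list of substrings that mark a user agent as a bot.
BOT_MARKERS = ("bot", "crawler", "spider", "facebookexternalhit")


def classify(agent: str) -> str:
    """Return a coarse label for one User-Agent string."""
    lowered = agent.lower()
    if "semrushbot" in lowered:
        return "SemrushBot"
    if agent in ("", "-") or any(marker in lowered for marker in BOT_MARKERS):
        return "other bots"
    return "human"


def traffic_shares(lines):
    """Return {label: percentage of requests} for an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:  # silently skip lines that are not in the expected format
            counts[classify(match.group("agent"))] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Made-up sample lines, only to show the expected input shape.
SAMPLE_LINES = [
    '203.0.113.5 - - [11/Jun/2025:18:10:38 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"',
    '203.0.113.6 - - [11/Jun/2025:18:10:39 +0000] "GET /feed HTTP/1.1" 200 812 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"',
    '198.51.100.7 - - [11/Jun/2025:18:11:02 +0000] "HEAD / HTTP/1.1" 200 - "-" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)"',
    '192.0.2.44 - - [11/Jun/2025:18:12:15 +0000] "GET /about HTTP/1.1" 200 2048 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0"',
]

if __name__ == "__main__":
    print(traffic_shares(SAMPLE_LINES))
```

To run it on a real log, pass an open file instead of `SAMPLE_LINES`, for example `traffic_shares(open("access.log"))`, where `access.log` stands for whatever path your server writes to. Substring matching on the user agent is only a rough heuristic: bots that pretend to be browsers will be counted as human.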
I don’t mind bot visits in general, but Semrush’s traffic seems excessive. I decided to block the bots that generated over 5% of the traffic in my robots.txt file.
# robots.txt - Block bots with 10%+ traffic
# Generated based on Apache log analysis
# Block SemrushBot (64.2% of traffic)
# SEO analysis and competitive research bot
User-agent: SemrushBot
Disallow: /
# No separate entries are needed for SemrushBot variants:
# crawlers match on the product token (SemrushBot), not on the version string
# Note: Cannot block empty/null user agents via robots.txt
# Consider server-level blocking for requests without User-Agent headers
# Allow other legitimate bots and crawlers
# UptimeRobot (5.1%) - monitoring service, usually beneficial
User-agent: UptimeRobot
Allow: /
# AhrefsBot (1.9%) - SEO bot, but lower traffic
User-agent: AhrefsBot
Allow: /
# Amazonbot (1.6%) - e-commerce crawler
User-agent: Amazonbot
Allow: /
# GPTBot (0.6%) - OpenAI's crawler for AI training
# Uncomment the following lines if you want to block AI training
# User-agent: GPTBot
# Disallow: /
# Facebook crawler (0.8%) - for link previews
User-agent: facebookexternalhit
Allow: /
# Allow all other bots by default
User-agent: *
Allow: /
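A quick way to sanity-check rules like these before deploying them is Python's standard `urllib.robotparser` module. The sketch below uses a condensed copy of the file (only three groups), and the `is_allowed` helper is a name invented for this example. It shows how Python's parser reads the rules; real crawlers may interpret edge cases differently, or ignore the file altogether.

```python
from urllib.robotparser import RobotFileParser

# Condensed copy of the rules, enough to exercise one blocked bot,
# one explicitly allowed bot, and the catch-all group.
ROBOTS_TXT = """\
User-agent: SemrushBot
Disallow: /

User-agent: AhrefsBot
Allow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str = "https://zoia.org/") -> bool:
    """Return True if the rules in ROBOTS_TXT let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("SemrushBot", "SemrushBot/7~bl", "AhrefsBot", "Googlebot"):
        print(agent, is_allowed(agent))
```

Pass the bot's product token (optionally followed by a version), not a full browser-style header: this parser only looks at the part of the string before the first `/`.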
Also, since bots don’t always respect the robots.txt file, I blocked Semrush in my web server configuration:
RewriteEngine On
# Block SemrushBot and all its variants
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC]
RewriteRule ^.*$ - [F,L]
# The unanchored, case-insensitive pattern above already matches every
# version string (for example SemrushBot/7~bl), so no extra rules are needed
# Optional: Block empty/null user agents (18.6% of your traffic)
# Uncomment the following lines to also block empty user agents
# RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
# RewriteCond %{HTTP_USER_AGENT} ^-$
# RewriteRule ^.*$ - [F,L]
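To see which user agents an unanchored, case-insensitive pattern such as `SemrushBot` catches, the match can be approximated in Python. This is only a sketch: Apache uses PCRE rather than Python's `re` module, but for a plain literal like this one the two behave the same. The `is_blocked` helper is a name invented for this example.

```python
import re

# Same literal as the first RewriteCond; re.IGNORECASE plays the role of [NC].
SEMRUSH = re.compile(r"SemrushBot", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the pattern matches anywhere in the User-Agent string."""
    return SEMRUSH.search(user_agent) is not None


if __name__ == "__main__":
    for agent in (
        "SemrushBot",
        "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)",
        "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0",
    ):
        print(is_blocked(agent), agent)
```

Because the pattern is not anchored, it matches the bare name, any versioned form, and the full browser-style header alike.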
Testing the new configuration:
$ curl -I -A "SemrushBot" https://zoia.org
HTTP/1.1 403 Forbidden
Date: Wed, 11 Jun 2025 18:10:38 GMT
Server: Apache/2.4.58 (Ubuntu)
Content-Type: text/html; charset=iso-8859-1
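The same check can be scripted, which is handy for re-testing after a configuration change. Below is a minimal sketch using only Python's standard library. The `head_status` helper is a name invented for this example, and it sends a HEAD request just like `curl -I` does.

```python
import urllib.error
import urllib.request


def head_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Send a HEAD request with the given User-Agent and return the HTTP status."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is what we are after.
        return error.code
```

Calling `head_status("https://zoia.org", "SemrushBot")` should return 403 while the rewrite rule is in place, and a call with an ordinary browser user agent should not.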
I’ll wait and check the new traffic stats in a couple of months.