
AI Agent Blocking Overview

The LLMTAG plugin maintains a comprehensive database of 60+ known AI agents and crawlers, allowing you to selectively block or allow them based on your content protection needs.

60+ AI Agents Blocked

  • Proactive Protection
  • Selective Blocking
  • Real-time Updates
  • Custom Agent Management

How AI Agent Blocking Works

Blocking Mechanism

The plugin uses multiple layers of protection to block AI agents:
1. User-Agent Detection: Analyze incoming requests to identify AI agents by their user-agent strings.
2. Database Lookup: Check the AI agent database to determine if the agent should be blocked.
3. Policy Application: Apply your configured blocking rules and exceptions.
4. Request Blocking: Block the request before it reaches your content, returning a 403 Forbidden response.
5. Logging and Analytics: Log the blocked request for monitoring and analysis.
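
The five steps can be pictured as one small function. The following is a minimal sketch, not the plugin's actual code: `AGENT_DB`, `detect_agent`, and `handle_request` are hypothetical names, and substring matching on the user-agent string is an assumption about how detection might work.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory agent database: lowercase token -> blocked by default?
AGENT_DB = {"gptbot": True, "perplexitybot": True, "claude-web": True}


@dataclass
class Decision:
    status: int                   # HTTP status to return
    agent: Optional[str] = None   # matched agent token, if any


def detect_agent(user_agent: str) -> Optional[str]:
    """Step 1: look for a known agent token inside the user-agent string."""
    ua = user_agent.lower()
    for token in AGENT_DB:
        if token in ua:
            return token
    return None


def handle_request(user_agent: str, allow_list: set, log: list) -> Decision:
    agent = detect_agent(user_agent)                 # 1. user-agent detection
    if agent is None:
        return Decision(200)                         # not a known AI agent
    blocked = AGENT_DB[agent]                        # 2. database lookup
    if agent in allow_list:                          # 3. policy and exceptions
        blocked = False
    if blocked:
        log.append(f"LLMTAG-Blocked agent={agent}")  # 5. logging
        return Decision(403, agent)                  # 4. block with 403
    return Decision(200, agent)
```

Passing an agent token in `allow_list` models a configured exception: the same request that would receive a 403 is then served normally.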

Blocking Methods

  • Server-Level Blocking
  • Application-Level Blocking
  • Hybrid Blocking

Server-Level Blocking
  • Method: .htaccess rules and server configuration
  • Advantages: Fast, efficient, works at the server level
  • Best for: Most websites, high-traffic sites
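
To make the server-level method concrete, here is a hedged sketch that generates an Apache mod_rewrite snippet from a list of user-agent tokens. The rules the plugin actually writes to .htaccess are not documented here, so the helper name `htaccess_rules` and the exact rule layout are assumptions; the `RewriteCond` on `%{HTTP_USER_AGENT}` and the `[F]` (403 Forbidden) flag are standard mod_rewrite features.

```python
import re


def htaccess_rules(user_agents: list) -> str:
    """Build a mod_rewrite snippet that answers 403 for the given agents."""
    # Escape each token so it is matched literally inside the regex.
    pattern = "|".join(re.escape(ua) for ua in user_agents)
    return "\n".join([
        "<IfModule mod_rewrite.c>",
        "RewriteEngine On",
        # [NC] makes the user-agent comparison case-insensitive.
        f"RewriteCond %{{HTTP_USER_AGENT}} ({pattern}) [NC]",
        # "-" means no substitution; [F] returns 403 Forbidden.
        "RewriteRule .* - [F,L]",
        "</IfModule>",
    ])


if __name__ == "__main__":
    print(htaccess_rules(["GPTBot", "PerplexityBot"]))
```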

AI Agent Database

Agent Categories

The plugin organizes AI agents into logical categories for easy management:

OpenAI Agents

GPTBot
  • Purpose: OpenAI’s web crawler for training GPT models
  • Default: Blocked
  • User-Agent: GPTBot

ChatGPT-User
  • Purpose: ChatGPT browsing feature
  • Default: Blocked
  • User-Agent: ChatGPT-User

OpenAI-Web
  • Purpose: General OpenAI web crawling
  • Default: Blocked
  • User-Agent: OpenAI-Web

Google AI Agents

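Google-Extended
  • Purpose: Google’s control token for AI training use of crawled content
  • Default: Blocked
  • User-Agent: Google-Extended (a robots.txt token; Google does not send it as an HTTP user-agent string)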

Bard-Web
  • Purpose: Google Bard web crawling
  • Default: Blocked
  • User-Agent: Bard-Web

Gemini-Crawler
  • Purpose: Google Gemini model training
  • Default: Blocked
  • User-Agent: Gemini-Crawler

Anthropic Agents

Claude-Web
  • Purpose: Anthropic’s Claude web crawler
  • Default: Blocked
  • User-Agent: Claude-Web

Anthropic-Bot
  • Purpose: General Anthropic web crawling
  • Default: Blocked
  • User-Agent: Anthropic-Bot

Other AI Services

PerplexityBot
  • Purpose: Perplexity AI search engine
  • Default: Blocked
  • User-Agent: PerplexityBot

Copilot-Web
  • Purpose: GitHub Copilot web crawling
  • Default: Blocked
  • User-Agent: Copilot-Web

AI-Writing-Tools
  • Purpose: Various AI writing and content generation tools
  • Default: Blocked
  • User-Agent: Various

Agent Management

Category-Based Management

1. Select Categories: Use category checkboxes to block or allow entire groups of AI agents at once.
2. Individual Agent Control: Fine-tune by selecting or deselecting specific agents within categories.
3. Custom Agent Addition: Add custom user-agent strings for agents not in the database.
4. Whitelist Exceptions: Create exceptions for specific agents you want to allow.
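
One way to read these four steps is as successive layers that narrow down a final block list. The sketch below assumes that order of precedence (categories, then per-agent overrides, then custom agents, then whitelist exceptions last); the `CATEGORIES` mapping and `blocked_agents` function are hypothetical and only illustrate the idea.

```python
# Hypothetical excerpt of the category -> agents mapping.
CATEGORIES = {
    "OpenAI": ["GPTBot", "ChatGPT-User"],
    "Anthropic": ["Claude-Web"],
}


def blocked_agents(blocked_categories, agent_overrides, custom_agents, whitelist):
    """Return the set of agent names that end up blocked."""
    blocked = set()
    # 1. Category checkboxes block whole groups at once.
    for category in blocked_categories:
        blocked.update(CATEGORIES.get(category, []))
    # 2. Individual overrides: True blocks an agent, False unblocks it.
    for agent, is_blocked in agent_overrides.items():
        if is_blocked:
            blocked.add(agent)
        else:
            blocked.discard(agent)
    # 3. Custom user-agent strings not present in the database.
    blocked.update(custom_agents)
    # 4. Whitelist exceptions win over everything else.
    blocked -= set(whitelist)
    return blocked
```

Applying the whitelist last is a design choice in this sketch: it guarantees that an explicit exception can never be undone by a category or custom entry.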

Agent Status

Each AI agent has a configurable status:

Blocked
  • Status: Agent is blocked from accessing your content
  • Response: 403 Forbidden
  • Logging: Blocked requests are logged

Allowed
  • Status: Agent can access your content
  • Response: Normal content delivery
  • Logging: Access is logged for monitoring

Monitored
  • Status: Agent is allowed but closely monitored
  • Response: Normal content delivery with enhanced logging
  • Logging: Detailed access logs maintained

Configuration Options

Global Blocking Settings

Default Policy

Set the default behavior for new or unknown AI agents:
  • Block by Default
  • Allow by Default
  • Monitor by Default

Block by Default
  • Policy: Block all AI agents unless specifically allowed
  • Use when: You want maximum protection
  • Security: Highest
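
The default policy only matters for agents that have no explicit status. A small sketch of that fallback, using hypothetical names (`Status`, `resolve_status`, `response_code`) rather than anything from the plugin itself:

```python
from enum import Enum


class Status(Enum):
    BLOCKED = "blocked"
    ALLOWED = "allowed"
    MONITORED = "monitored"


def resolve_status(agent, configured, default_policy=Status.BLOCKED):
    """Use the agent's explicit status if set, else the default policy."""
    return configured.get(agent, default_policy)


def response_code(status):
    """Blocked agents get 403; allowed and monitored agents get content."""
    return 403 if status is Status.BLOCKED else 200
```

With `default_policy=Status.BLOCKED`, an agent that is missing from `configured` is treated as blocked, which corresponds to the Block by Default option.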

Blocking Response

Configure what happens when an AI agent is blocked:

403 Forbidden
  • Response: Standard HTTP 403 error
  • Message: “Access Denied”
  • Use for: Most websites

Custom Response
  • Response: Custom error page or message
  • Message: Your custom content
  • Use for: Branded error pages

Redirect
  • Response: Redirect to another page
  • Message: Redirect to robots.txt or policy page
  • Use for: Educational purposes
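
The three response types differ only in status code, headers, and body. The sketch below makes two assumptions that the text does not state: a custom response still uses status 403, and a redirect uses a temporary 302. The function name and mode strings are hypothetical.

```python
def blocking_response(mode, custom_body="", redirect_url=""):
    """Return (status, headers, body) for a blocked request."""
    if mode == "forbidden":
        return 403, {"Content-Type": "text/plain"}, "Access Denied"
    if mode == "custom":
        # Assumption: the branded page is still served with a 403 status.
        return 403, {"Content-Type": "text/html"}, custom_body
    if mode == "redirect":
        # Assumption: a temporary redirect, for example to a policy page.
        return 302, {"Location": redirect_url}, ""
    raise ValueError(f"unknown blocking response mode: {mode!r}")
```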

Advanced Blocking Rules

Time-Based Blocking

Block AI agents during specific time periods:
# Example: Block AI agents during business hours
Time: 09:00-17:00
Action: Block all AI agents
Reason: Business hours protection
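
A time window like this reduces to a comparison against the current time of day. The following sketch is an illustration under assumptions (the function name is hypothetical, the end time is treated as exclusive, and windows that cross midnight are supported):

```python
from datetime import time


def in_block_window(now, start=time(9, 0), end=time(17, 0)):
    """Return True if `now` falls inside the blocking window."""
    if start <= end:
        # Same-day window, e.g. 09:00-17:00 (end is exclusive).
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00-06:00.
    return now >= start or now < end
```

Time zone handling is left out on purpose; a real setup would need to decide whether the window follows server time or site time.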

IP-Based Blocking

Block AI agents from specific IP addresses or ranges:
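# Example: Block AI agents from a specific IP range
# (203.0.113.0/24 is a reserved documentation range; 192.168.x.x addresses
# are private and never identify a country)
IP Range: 203.0.113.0/24
Action: Block all AI agents
Reason: IP range restrictions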
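
Checking whether a client address falls inside a configured range can be done with Python's standard `ipaddress` module. This is a minimal sketch; the function name `ip_in_ranges` is hypothetical, and mapping countries to ranges would need a separate GeoIP data source that is not shown.

```python
import ipaddress


def ip_in_ranges(ip, cidrs):
    """Return True if `ip` belongs to any of the CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)
```

For example, `ip_in_ranges("203.0.113.42", ["203.0.113.0/24"])` is true, while an address outside every listed range is not matched.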

Rate-Limited Blocking

Block AI agents that exceed request rate limits:
# Example: Block AI agents making too many requests
Rate Limit: 100 requests per hour
Action: Block for 1 hour
Reason: Rate limiting protection
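
A rule of this shape can be implemented with a sliding window of request timestamps per agent. The sketch below is one possible approach, not the plugin's algorithm; the class name and parameters are hypothetical, and timestamps are plain seconds so the behavior is easy to test.

```python
from collections import defaultdict, deque


class RateLimiter:
    def __init__(self, limit=100, window=3600, block_for=3600):
        self.limit = limit            # max requests per window
        self.window = window          # window length in seconds
        self.block_for = block_for    # block duration in seconds
        self.hits = defaultdict(deque)
        self.blocked_until = {}

    def allow(self, agent, now):
        """Record a request at time `now`; return False if it is blocked."""
        if now < self.blocked_until.get(agent, 0):
            return False
        hits = self.hits[agent]
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.limit:
            # Limit exceeded: start a temporary block and reset the count.
            self.blocked_until[agent] = now + self.block_for
            hits.clear()
            return False
        return True
```

With the defaults, the 101st request inside one hour triggers a one-hour block, matching the example values above.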

Custom Agent Management

Adding Custom Agents

1. Identify the Agent: Use browser developer tools or server logs to identify the user-agent string.
2. Add to Database: Add the agent to your custom agent database with appropriate metadata.
3. Set Blocking Policy: Configure whether to block, allow, or monitor the new agent.
4. Test Configuration: Verify that the new agent is properly handled.

Custom Agent Configuration

# Example custom agent configuration
Agent Name: CustomAI-Bot
User-Agent: CustomAI-Bot/1.0
Category: Custom
Default Action: Block
Description: Custom AI agent for specific use case
Last Updated: 2024-01-15
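
For readers who want to handle such entries programmatically, the sketch below parses the `Key: Value` layout shown above into a typed record. The plugin's real storage format is not documented here, so `CustomAgent` and `parse_agent` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CustomAgent:
    name: str
    user_agent: str
    category: str
    default_action: str
    description: str
    last_updated: date


def parse_agent(text):
    """Parse 'Key: Value' lines, skipping blanks and '#' comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return CustomAgent(
        name=fields["Agent Name"],
        user_agent=fields["User-Agent"],
        category=fields["Category"],
        default_action=fields["Default Action"],
        description=fields["Description"],
        last_updated=date.fromisoformat(fields["Last Updated"]),
    )
```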

Agent Metadata

Each agent in the database includes:

Monitoring and Analytics

Real-Time Monitoring

Track AI agent activity in real time:

Live Dashboard
  • Shows: Current AI agent activity
  • Updates: Real-time
  • Purpose: Immediate threat assessment

Blocked Requests
  • Shows: Recently blocked AI agent requests
  • Updates: Real-time
  • Purpose: Security monitoring

Analytics and Reporting

Daily Reports

1. Agent Activity Summary: Overview of all AI agent activity for the day
2. Blocked Requests Count: Number of blocked requests by agent type
3. Top Blocked Agents: Most frequently blocked AI agents
4. Geographic Distribution: Geographic distribution of blocked requests
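
The four report sections are simple aggregations over a day's request events. The sketch below assumes each event is a dictionary with hypothetical `agent`, `blocked`, and `country` keys; how the plugin stores events, and how it derives a country, is not specified in this document.

```python
from collections import Counter


def daily_summary(events, top_n=3):
    """Aggregate one day's events into the four report sections."""
    blocked = [e for e in events if e["blocked"]]
    by_agent = Counter(e["agent"] for e in blocked)
    return {
        "total_requests": len(events),                     # activity summary
        "blocked_by_agent": by_agent,                      # blocked counts
        "top_blocked": [a for a, _ in by_agent.most_common(top_n)],
        "blocked_by_country": Counter(e["country"] for e in blocked),
    }
```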

Weekly Reports

Trend Analysis
  • Content: AI agent activity trends over time
  • Purpose: Identify patterns and changes

Threat Assessment
  • Content: Security threat level assessment
  • Purpose: Evaluate protection effectiveness

Alert System

Configure alerts for important events:

Performance Optimization

Blocking Performance

The plugin is optimized for high-performance blocking:

Fast Lookups
  • Method: In-memory database caching
  • Speed: < 1ms per request
  • Memory: Optimized for efficiency

Minimal Overhead
  • Impact: < 5ms added to page load
  • CPU: < 2% additional usage
  • Memory: < 10MB additional usage

Caching Strategy

1. Agent Database Caching: Cache the AI agent database in memory for fast lookups
2. Blocking Rules Caching: Cache blocking rules to avoid repeated processing
3. Response Caching: Cache blocking responses for common scenarios
4. Analytics Caching: Cache analytics data for faster reporting
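
The general idea behind caching lookups is to remember the decision for a user-agent string that has already been seen. The sketch below shows that idea with Python's `functools.lru_cache`; it is an analogy for the technique, not the plugin's caching layer, and `AGENT_TOKENS` and `is_blocked` are hypothetical.

```python
from functools import lru_cache

# Hypothetical excerpt of blocked agent tokens, stored in lowercase.
AGENT_TOKENS = ("gptbot", "claude-web", "perplexitybot")


@lru_cache(maxsize=4096)
def is_blocked(user_agent: str) -> bool:
    """Decide once per distinct user-agent string, then serve from cache."""
    ua = user_agent.lower()
    return any(token in ua for token in AGENT_TOKENS)
```

When the block list changes, the cached decisions become stale, so a call to `is_blocked.cache_clear()` would be needed after every configuration update.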

Troubleshooting

Common Issues

AI agents are not being blocked

Possible causes:
  • Agent not in database
  • Blocking not enabled
  • Server configuration issues
  • Caching problems
Solutions:
  • Add agent to custom database
  • Verify blocking is enabled
  • Check .htaccess rules
  • Clear all caches
Legitimate visitors are being blocked

Possible causes:
  • Overly broad user-agent matching
  • Incorrect agent identification
  • Browser extensions mimicking AI agents
Solutions:
  • Refine user-agent matching rules
  • Add exceptions for legitimate users
  • Review blocked request logs
  • Adjust blocking sensitivity
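
Overly broad matching usually means a plain substring test that fires inside unrelated product names. One way to refine it, shown here as a sketch with a hypothetical helper name, is to require that the token is not surrounded by other letters or digits:

```python
import re


def compile_agent_pattern(tokens):
    """Match agent tokens only when they stand alone in the user-agent."""
    alternatives = "|".join(re.escape(t) for t in tokens)
    # The lookarounds reject matches embedded in a longer word.
    return re.compile(
        rf"(?<![A-Za-z0-9])(?:{alternatives})(?![A-Za-z0-9])",
        re.IGNORECASE,
    )
```

A pattern built from `["GPTBot"]` matches `GPTBot/1.2` but not a made-up string such as `MyGPTBotanyApp/1.0`, where the token is only part of a longer name.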
Blocking slows down the site

Possible causes:
  • Large agent database
  • Inefficient blocking rules
  • Server resource limitations
Solutions:
  • Optimize agent database
  • Simplify blocking rules
  • Enable caching
  • Upgrade server resources

Debugging Tools

Blocking Test Tool

Use the built-in test tool to verify blocking:
1. Access Test Tool: Go to LLMTAG > Tools > Blocking Test
2. Enter User-Agent: Enter the user-agent string you want to test
3. Run Test: Click Test Blocking to see if the agent would be blocked
4. Review Results: Check the test results and adjust configuration if needed

Log Analysis

Analyze blocking logs to identify issues:
# Example log analysis
grep "LLMTAG-Blocked" /var/log/nginx/access.log | tail -20
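
To go beyond listing recent entries, the same log lines can be counted per user agent. The sketch below assumes nginx's combined log format, where the user agent is the last double-quoted field, and assumes the `LLMTAG-Blocked` marker appears somewhere on the line; where the plugin actually writes that marker is not specified here, so the sample layout is hypothetical.

```python
import re
from collections import Counter

QUOTED_FIELD = re.compile(r'"([^"]*)"')


def count_blocked(lines, marker="LLMTAG-Blocked"):
    """Count blocked requests per user-agent string."""
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        fields = QUOTED_FIELD.findall(line)
        if fields:
            # In the combined format the last quoted field is the user agent.
            counts[fields[-1]] += 1
    return counts


if __name__ == "__main__":
    with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as f:
        for agent, n in count_blocked(f).most_common(10):
            print(n, agent)
```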

Best Practices

Agent Management

Regular Updates

Keep the AI agent database updated with the latest agents and threats

Selective Blocking

Block only the agents that pose a real threat to your content

Monitor Effectiveness

Regularly review analytics to ensure blocking is working effectively

Test Changes

Test blocking changes in a staging environment before applying to production

Performance Optimization

Follow these tips to optimize blocking performance:
  • Enable caching for the AI agent database
  • Use efficient blocking rules to minimize processing overhead
  • Monitor resource usage and adjust as needed
  • Clean up old analytics data regularly
  • Optimize server configuration for blocking operations

Security Considerations

Always follow security best practices when configuring AI agent blocking:
  • Apply security updates for the plugin and WordPress regularly
  • Monitor for new threats and update blocking rules accordingly
  • Use strong authentication for admin access
  • Back up configurations before making changes
  • Test blocking rules to ensure they work as expected