
AI Agent Compliance

This guide is for AI companies, researchers, and developers who want to implement LLMTAG protocol compliance in their AI agents and crawlers.

Become LLMTAG Compliant


Why Implement LLMTAG Compliance?

Benefits for AI Companies

Legal Clarity

Clear, machine-readable policies reduce legal uncertainty and compliance risks.

Ethical Compliance

Respect publisher preferences and build trust with content creators.

Implementation Simplicity

Standardized format makes compliance straightforward to implement.

Industry Leadership

Be part of establishing ethical AI practices from the ground up.

Benefits for the AI Ecosystem

Sustainable AI

Create a sustainable relationship between AI and content creation.

Trust Building

Build trust between AI companies and content creators.

Innovation Protection

Protect content creators while enabling AI innovation.

Global Standard

Establish a universal protocol that works across all platforms.

Implementation Requirements

Core Compliance Requirements

AI agents that claim compliance with the LLMTAG protocol must implement the discovery, parsing, and policy-application steps described in the implementation guide below.

Discovery Mechanism

AI agents should automatically check for llmtag.txt by making a GET request to:
https://[domain]/llmtag.txt
The discovery mechanism follows the same pattern as robots.txt, making it familiar and discoverable for both humans and automated systems.
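
As an illustration, here is a minimal Python sketch of the discovery step; the fetch_llmtag helper is hypothetical and not part of the specification. Non-200 responses and network failures fall back to default policies, as described in Step 1 of the implementation guide below.

import requests
from typing import Optional

def fetch_llmtag(domain: str) -> Optional[str]:
    """Fetch llmtag.txt for a domain; return its text, or None if unavailable."""
    url = f"https://{domain}/llmtag.txt"
    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException:
        return None  # network failure: caller applies default policies
    if response.status_code == 200:
        return response.text
    return None  # 404, 403, 5xx: caller applies default policies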

Implementation Guide

Step 1: Discovery

1. Check for llmtag.txt: Before processing any content from a domain, make a GET request to https://domain.com/llmtag.txt.

2. Handle HTTP Responses: Process different HTTP response codes appropriately:
  • 200 OK: Parse the file content
  • 404 Not Found: Apply default policies
  • 403 Forbidden: Apply default policies
  • 500 Server Error: Apply default policies

3. Cache Results: Cache the parsed policies to avoid repeated requests for the same domain (a minimal caching sketch follows this list).
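
A minimal per-domain caching sketch, reusing the hypothetical fetch_llmtag helper from the discovery sketch above; the full reference implementations later in this guide cache parsed policies per domain, agent, and path instead.

from typing import Dict, Optional

class DomainCache:
    """Illustrative cache so llmtag.txt is fetched at most once per domain."""

    def __init__(self):
        self._store: Dict[str, Optional[str]] = {}

    def get_or_fetch(self, domain: str) -> Optional[str]:
        # Fetch only on a cache miss; None (file unavailable) is cached too,
        # so repeated misses do not trigger repeated requests.
        if domain not in self._store:
            self._store[domain] = fetch_llmtag(domain)
        return self._store[domain]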

Step 2: Parsing

1. Validate File Format: Confirm the file declares a supported specification version (spec_version: 3.0) before parsing further.

2. Parse Directives: Parse all directives according to the specification.

3. Handle Scope Blocks: Process User-agent and Path blocks to determine applicable policies.

4. Apply Inheritance: Apply directive inheritance from global to specific scopes (see the inheritance sketch after this list).
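
The sketch below illustrates one plausible reading of directive inheritance, assuming scoped blocks simply override matching global directives and inherit everything they do not set; consult the specification for the authoritative rules.

def merge_policies(global_policies: dict, scoped_policies: dict) -> dict:
    """Scoped values override global ones; unset directives are inherited."""
    merged = dict(global_policies)
    merged.update(scoped_policies)
    return merged

# Hypothetical example: a global block plus a matching User-agent block
global_block = {"ai_training_data": "allow", "ai_use": ["search_indexing"]}
agent_block = {"ai_training_data": "disallow"}
print(merge_policies(global_block, agent_block))
# {'ai_training_data': 'disallow', 'ai_use': ['search_indexing']}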

Step 3: Policy Application

1. Identify Agent Scope: Determine which User-agent blocks apply to your agent.

2. Identify Path Scope: Determine which Path blocks apply to the content being accessed.

3. Apply Policies: Apply the most specific applicable policies.

4. Log Actions: Log all compliance actions for audit and transparency purposes (see the logging sketch after this list).
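
A minimal sketch of the audit logging mentioned in step 4, using Python's standard logging module; the field names and the record_compliance_action helper are illustrative assumptions, not part of the specification.

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llmtag.compliance")

def record_compliance_action(domain: str, path: str, user_agent: str,
                             policies: dict, action: str) -> None:
    """Write one audit-trail entry per compliance decision."""
    logger.info("domain=%s path=%s agent=%s policies=%s action=%s",
                domain, path, user_agent, policies, action)

# Example: record that training was skipped because the publisher disallowed it
record_compliance_action("example.com", "/blog/post-1", "MyAI-Bot/1.0",
                         {"ai_training_data": "disallow"}, "skipped_training")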

Code Examples

Python Implementation

import requests
import re
from typing import Dict, List, Optional

class LLMTAGParser:
    def __init__(self):
        self.cache = {}
    
    def get_policies(self, domain: str, user_agent: str, path: str) -> Dict:
        """Get LLMTAG policies for a domain, user agent, and path."""
        
        # Check cache first
        cache_key = f"{domain}:{user_agent}:{path}"
        if cache_key in self.cache:
            return self.cache[cache_key]
        
        # Fetch llmtag.txt
        try:
            response = requests.get(f"https://{domain}/llmtag.txt", timeout=10)
            if response.status_code != 200:
                return self._get_default_policies()
            
            content = response.text
            policies = self._parse_llmtag(content, user_agent, path)
            
            # Cache the result
            self.cache[cache_key] = policies
            return policies
            
        except Exception as e:
            print(f"Error fetching llmtag.txt for {domain}: {e}")
            return self._get_default_policies()
    
    def _parse_llmtag(self, content: str, user_agent: str, path: str) -> Dict:
        """Parse llmtag.txt content and return applicable policies."""
        
        lines = content.strip().split('\n')
        policies = self._get_default_policies()
        
        current_scope = None
        current_agent = None
        current_path = None
        
        for line in lines:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            
            if line.startswith('spec_version:'):
                # Validate specification version
                version = line.split(':', 1)[1].strip()
                if version != '3.0':
                    print(f"Unsupported LLMTAG version: {version}")
                    return self._get_default_policies()
            
            elif line.startswith('User-agent:'):
                current_agent = line.split(':', 1)[1].strip()
                current_scope = 'agent'
            
            elif line.startswith('Path:'):
                current_path = line.split(':', 1)[1].strip()
                current_scope = 'path'
            
            elif line.startswith('ai_training_data:'):
                value = line.split(':', 1)[1].strip()
                if self._applies_to_current_scope(user_agent, path, current_agent, current_path):
                    policies['ai_training_data'] = value
            
            elif line.startswith('ai_use:'):
                value = line.split(':', 1)[1].strip()
                if self._applies_to_current_scope(user_agent, path, current_agent, current_path):
                    policies['ai_use'] = [v.strip() for v in value.split(',')]
        
        return policies
    
    def _applies_to_current_scope(self, user_agent: str, path: str, 
                                current_agent: str, current_path: str) -> bool:
        """Check if current scope applies to the given user agent and path."""
        
        if current_agent and current_agent.lower() not in user_agent.lower():
            return False
        
        if current_path and not path.startswith(current_path):
            return False
        
        return True
    
    def _get_default_policies(self) -> Dict:
        """Return default policies when no llmtag.txt is found."""
        return {
            'ai_training_data': 'allow',
            'ai_use': ['search_indexing']
        }

# Usage example
parser = LLMTAGParser()
policies = parser.get_policies('example.com', 'MyAI-Bot/1.0', '/blog/post-1')

if policies['ai_training_data'] == 'disallow':
    print("AI training not allowed for this content")
else:
    print("AI training allowed")

if 'generative_synthesis' not in policies['ai_use']:
    print("Generative synthesis not allowed")
else:
    print("Generative synthesis allowed")

JavaScript Implementation

class LLMTAGParser {
    constructor() {
        this.cache = new Map();
    }
    
    async getPolicies(domain, userAgent, path) {
        // Check cache first
        const cacheKey = `${domain}:${userAgent}:${path}`;
        if (this.cache.has(cacheKey)) {
            return this.cache.get(cacheKey);
        }
        
        try {
            // Fetch llmtag.txt
            const response = await fetch(`https://${domain}/llmtag.txt`, {
                method: 'GET',
                // fetch() has no timeout option; use an AbortSignal to enforce one
                signal: AbortSignal.timeout(10000)
            });
            
            if (!response.ok) {
                return this.getDefaultPolicies();
            }
            
            const content = await response.text();
            const policies = this.parseLLMTAG(content, userAgent, path);
            
            // Cache the result
            this.cache.set(cacheKey, policies);
            return policies;
            
        } catch (error) {
            console.error(`Error fetching llmtag.txt for ${domain}:`, error);
            return this.getDefaultPolicies();
        }
    }
    
    parseLLMTAG(content, userAgent, path) {
        const lines = content.trim().split('\n');
        const policies = this.getDefaultPolicies();
        
        let currentScope = null;
        let currentAgent = null;
        let currentPath = null;
        
        for (const line of lines) {
            const trimmedLine = line.trim();
            if (!trimmedLine || trimmedLine.startsWith('#')) {
                continue;
            }
            
            if (trimmedLine.startsWith('spec_version:')) {
                // Take everything after the first colon (mirrors Python's split(':', 1))
                const version = trimmedLine.slice(trimmedLine.indexOf(':') + 1).trim();
                if (version !== '3.0') {
                    console.warn(`Unsupported LLMTAG version: ${version}`);
                    return this.getDefaultPolicies();
                }
            }
            
            else if (trimmedLine.startsWith('User-agent:')) {
                currentAgent = trimmedLine.slice(trimmedLine.indexOf(':') + 1).trim();
                currentScope = 'agent';
            }
            
            else if (trimmedLine.startsWith('Path:')) {
                currentPath = trimmedLine.slice(trimmedLine.indexOf(':') + 1).trim();
                currentScope = 'path';
            }
            
            else if (trimmedLine.startsWith('ai_training_data:')) {
                const value = trimmedLine.slice(trimmedLine.indexOf(':') + 1).trim();
                if (this.appliesToCurrentScope(userAgent, path, currentAgent, currentPath)) {
                    policies.ai_training_data = value;
                }
            }
            
            else if (trimmedLine.startsWith('ai_use:')) {
                const value = trimmedLine.slice(trimmedLine.indexOf(':') + 1).trim();
                if (this.appliesToCurrentScope(userAgent, path, currentAgent, currentPath)) {
                    policies.ai_use = value.split(',').map(v => v.trim());
                }
            }
        }
        
        return policies;
    }
    
    appliesToCurrentScope(userAgent, path, currentAgent, currentPath) {
        if (currentAgent && !userAgent.toLowerCase().includes(currentAgent.toLowerCase())) {
            return false;
        }
        
        if (currentPath && !path.startsWith(currentPath)) {
            return false;
        }
        
        return true;
    }
    
    getDefaultPolicies() {
        return {
            ai_training_data: 'allow',
            ai_use: ['search_indexing']
        };
    }
}

// Usage example
const parser = new LLMTAGParser();
const policies = await parser.getPolicies('example.com', 'MyAI-Bot/1.0', '/blog/post-1');

if (policies.ai_training_data === 'disallow') {
    console.log('AI training not allowed for this content');
} else {
    console.log('AI training allowed');
}

if (!policies.ai_use.includes('generative_synthesis')) {
    console.log('Generative synthesis not allowed');
} else {
    console.log('Generative synthesis allowed');
}

Compliance Testing

Testing Checklist

1. Test Discovery: Verify that your agent correctly discovers and fetches llmtag.txt files.

2. Test Parsing: Test parsing with various llmtag.txt file formats and edge cases.

3. Test Policy Application: Verify that policies are correctly applied based on user agent and path.

4. Test Error Handling: Ensure graceful handling of inaccessible or malformed files.

5. Test Caching: Verify that caching works correctly and doesn't cause stale policy issues.

Test Cases

Basic Compliance

Test: Simple llmtag.txt with global policies
Expected: Policies applied correctly

Agent-Specific Rules

Test: User-agent blocks with specific policies
Expected: Correct policies for matching agents

Path-Based Rules

Test: Path blocks with different policies
Expected: Correct policies for matching paths

Error Handling

Test: 404, 403, 500 responses
Expected: Default policies applied (a test sketch follows below)
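
As a sketch of the error-handling case, the following pytest-style test exercises the Python LLMTAGParser from the code examples above, mocking requests.get so a 404 response yields the default policies; the test name and mocking approach are illustrative.

from unittest import mock

def test_unreachable_llmtag_falls_back_to_defaults():
    parser = LLMTAGParser()  # the Python implementation shown above
    fake_response = mock.Mock(status_code=404, text="")
    with mock.patch("requests.get", return_value=fake_response):
        policies = parser.get_policies("example.com", "MyAI-Bot/1.0", "/blog/post-1")
    assert policies == {"ai_training_data": "allow", "ai_use": ["search_indexing"]}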

Best Practices

Implementation Best Practices

Respect Policies

Actually follow the policies you claim to support, not just check the files.

Be Transparent

Provide clear information about how you handle LLMTAG policies and compliance.

Implement Early

Start implementing LLMTAG compliance now to build trust with content creators.

Provide Audit Trails

Maintain logs of your compliance actions for transparency and accountability.

Performance Considerations

Follow these tips to optimize your LLMTAG implementation:
  • Cache policies to avoid repeated requests (a TTL caching sketch follows this list)
  • Use appropriate timeouts for HTTP requests
  • Handle errors gracefully without breaking functionality
  • Monitor performance and optimize as needed
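
One way to combine caching with freshness is a time-to-live cache, sketched below; the one-hour TTL is an arbitrary assumption, and you should pick a value that balances request volume against policy staleness.

import time
from typing import Any, Dict, Optional, Tuple

class TTLPolicyCache:
    """Illustrative cache whose entries expire so stale policies are refetched."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (stored_at, policies)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, policies = entry
        if time.time() - stored_at > self.ttl:
            del self._store[key]  # expired: force a fresh fetch
            return None
        return policies

    def set(self, key: str, policies: Any) -> None:
        self._store[key] = (time.time(), policies)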

Community and Support

Getting Help

Certification Program

LLMTAG Compliance Certification

LLMTAG is a technical standard that communicates preferences, not legal requirements. However, respecting these preferences can help with legal compliance and ethical AI practices.

Ethical AI Practices

I