AI Agent Compliance
This guide is for AI companies, researchers, and developers who want to implement LLMTAG protocol compliance in their AI agents and crawlers.
Become LLMTAG compliant: respect publisher policies, build trust, demonstrate industry leadership, and support ethical AI practices.
Why Implement LLMTAG Compliance?
Benefits for AI Companies
Legal Clarity Clear, machine-readable policies reduce legal uncertainty and compliance risks.
Ethical Compliance Respect publisher preferences and build trust with content creators.
Implementation Simplicity Standardized format makes compliance straightforward to implement.
Industry Leadership Be part of establishing ethical AI practices from the ground up.
Benefits for the AI Ecosystem
Sustainable AI Create a sustainable relationship between AI and content creation.
Trust Building Build trust between AI companies and content creators.
Innovation Protection Protect content creators while enabling AI innovation.
Global Standard Establish a universal protocol that works across all platforms.
Implementation Requirements
Core Compliance Requirements
AI agents that claim compliance with the LLMTAG protocol must implement the discovery, parsing, and policy-application steps described below.
Discovery Mechanism
AI agents should automatically check for llmtag.txt by making a GET request to:
https://[domain]/llmtag.txt
The discovery mechanism follows the same pattern as robots.txt, making it familiar and discoverable for both humans and automated systems.
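As a minimal sketch of the discovery request (the helper name, the requests library usage, and the identifying User-Agent string are illustrative assumptions, not protocol requirements):

import requests
from typing import Optional

def fetch_llmtag(domain: str) -> Optional[str]:
    """Fetch llmtag.txt for a domain; return its text, or None if unavailable."""
    url = f"https://{domain}/llmtag.txt"
    # Clearly identifying your agent in the User-Agent header is assumed good
    # practice here, not something the protocol mandates.
    headers = {"User-Agent": "MyAI-Bot/1.0 (+https://example.com/bot-info)"}
    try:
        response = requests.get(url, headers=headers, timeout=10)
    except requests.RequestException:
        return None
    return response.text if response.status_code == 200 else None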
Implementation Guide
Step 1: Discovery
Check for llmtag.txt
Before processing any content from a domain, make a GET request to https://[domain]/llmtag.txt.
Handle HTTP Responses
Process different HTTP response codes appropriately:
200 OK: Parse the file content
404 Not Found: Apply default policies
403 Forbidden: Apply default policies
500 Internal Server Error: Apply default policies
Cache Results
Cache the parsed policies to avoid repeated requests for the same domain.
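A minimal sketch of Step 1 in Python, assuming the requests library; the function and cache names are illustrative, and a non-200 response or network error is recorded as None so that default policies apply:

import requests

_discovery_cache = {}  # domain -> llmtag.txt text, or None when defaults apply

def discover(domain: str):
    """Fetch and cache llmtag.txt; None means the default policies apply."""
    if domain in _discovery_cache:
        return _discovery_cache[domain]
    try:
        response = requests.get(f"https://{domain}/llmtag.txt", timeout=10)
    except requests.RequestException:
        _discovery_cache[domain] = None  # network error: fall back to defaults
        return None
    if response.status_code == 200:
        _discovery_cache[domain] = response.text  # parsed in Step 2
    else:
        # 404, 403, 500 and other non-200 responses: defaults apply
        _discovery_cache[domain] = None
    return _discovery_cache[domain]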
Step 2: Parsing
Validate File Format
Verify that the file declares a supported spec_version (currently 3.0) and is otherwise well formed; if not, fall back to the default policies.
Parse Directives
Parse all directives according to the specification.
Handle Scope Blocks
Process User-agent and Path blocks to determine applicable policies.
Apply Inheritance
Apply directive inheritance from global to specific scopes.
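For illustration only, a hypothetical llmtag.txt using the directives covered in this guide might look like the following; the bot name and paths are examples, and the full specification remains the authority on precedence between scopes:

# Illustrative llmtag.txt (example values only)
spec_version: 3.0

# Global directives, inherited by more specific scopes unless overridden
ai_training_data: allow
ai_use: search_indexing

# Agent-specific scope: overrides the global training directive for this bot
User-agent: MyAI-Bot
ai_training_data: disallow

# Path-specific scope: applies to content under /blog/
Path: /blog/
ai_use: search_indexing, generative_synthesis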
Step 3: Policy Application
Identify Agent Scope
Determine which User-agent blocks apply to your agent.
Identify Path Scope
Determine which Path blocks apply to the content being accessed.
Apply Policies
Apply the most specific applicable policies.
Log Actions
Log all compliance actions for audit and transparency purposes.
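A minimal sketch of Step 3 in Python, assuming a policies dict like the one returned by the parser shown later in this guide; the mapping from an intended use to the ai_training_data and ai_use directives, and the log format, are illustrative assumptions:

import logging

logger = logging.getLogger("llmtag.compliance")
logging.basicConfig(level=logging.INFO)

def apply_policies(policies: dict, domain: str, path: str, intended_use: str) -> bool:
    """Decide whether the intended use is allowed and log the decision."""
    if intended_use == "ai_training":
        allowed = policies.get("ai_training_data") != "disallow"
    else:
        allowed = intended_use in policies.get("ai_use", [])
    # Log every compliance decision for audit and transparency purposes
    logger.info("domain=%s path=%s use=%s allowed=%s", domain, path, intended_use, allowed)
    return allowed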
Code Examples
Python Implementation
import requests
from typing import Dict


class LLMTAGParser:
    def __init__(self):
        self.cache = {}

    def get_policies(self, domain: str, user_agent: str, path: str) -> Dict:
        """Get LLMTAG policies for a domain, user agent, and path."""
        # Check cache first
        cache_key = f"{domain}:{user_agent}:{path}"
        if cache_key in self.cache:
            return self.cache[cache_key]

        # Fetch llmtag.txt
        try:
            response = requests.get(f"https://{domain}/llmtag.txt", timeout=10)
            if response.status_code != 200:
                return self._get_default_policies()

            policies = self._parse_llmtag(response.text, user_agent, path)

            # Cache the result
            self.cache[cache_key] = policies
            return policies
        except Exception as e:
            print(f"Error fetching llmtag.txt for {domain}: {e}")
            return self._get_default_policies()

    def _parse_llmtag(self, content: str, user_agent: str, path: str) -> Dict:
        """Parse llmtag.txt content and return applicable policies."""
        policies = self._get_default_policies()
        current_agent = None
        current_path = None

        for line in content.strip().split('\n'):
            line = line.strip()
            if not line or line.startswith('#'):
                continue

            if line.startswith('spec_version:'):
                # Validate specification version
                version = line.split(':', 1)[1].strip()
                if version != '3.0':
                    print(f"Unsupported LLMTAG version: {version}")
                    return self._get_default_policies()
            elif line.startswith('User-agent:'):
                current_agent = line.split(':', 1)[1].strip()
            elif line.startswith('Path:'):
                current_path = line.split(':', 1)[1].strip()
            elif line.startswith('ai_training_data:'):
                value = line.split(':', 1)[1].strip()
                if self._applies_to_current_scope(user_agent, path, current_agent, current_path):
                    policies['ai_training_data'] = value
            elif line.startswith('ai_use:'):
                value = line.split(':', 1)[1].strip()
                if self._applies_to_current_scope(user_agent, path, current_agent, current_path):
                    policies['ai_use'] = [v.strip() for v in value.split(',')]

        return policies

    def _applies_to_current_scope(self, user_agent: str, path: str,
                                  current_agent: str, current_path: str) -> bool:
        """Check if the current scope applies to the given user agent and path."""
        if current_agent and current_agent.lower() not in user_agent.lower():
            return False
        if current_path and not path.startswith(current_path):
            return False
        return True

    def _get_default_policies(self) -> Dict:
        """Return default policies when no llmtag.txt is found."""
        return {
            'ai_training_data': 'allow',
            'ai_use': ['search_indexing']
        }


# Usage example
parser = LLMTAGParser()
policies = parser.get_policies('example.com', 'MyAI-Bot/1.0', '/blog/post-1')

if policies['ai_training_data'] == 'disallow':
    print("AI training not allowed for this content")
else:
    print("AI training allowed")

if 'generative_synthesis' not in policies['ai_use']:
    print("Generative synthesis not allowed")
else:
    print("Generative synthesis allowed")
JavaScript Implementation
class LLMTAGParser {
  constructor() {
    this.cache = new Map();
  }

  async getPolicies(domain, userAgent, path) {
    // Check cache first
    const cacheKey = `${domain}:${userAgent}:${path}`;
    if (this.cache.has(cacheKey)) {
      return this.cache.get(cacheKey);
    }

    try {
      // Fetch llmtag.txt (AbortSignal.timeout requires a modern browser or Node 17.3+)
      const response = await fetch(`https://${domain}/llmtag.txt`, {
        method: 'GET',
        signal: AbortSignal.timeout(10000)
      });

      if (!response.ok) {
        return this.getDefaultPolicies();
      }

      const content = await response.text();
      const policies = this.parseLLMTAG(content, userAgent, path);

      // Cache the result
      this.cache.set(cacheKey, policies);
      return policies;
    } catch (error) {
      console.error(`Error fetching llmtag.txt for ${domain}:`, error);
      return this.getDefaultPolicies();
    }
  }

  parseLLMTAG(content, userAgent, path) {
    const lines = content.trim().split('\n');
    const policies = this.getDefaultPolicies();

    let currentAgent = null;
    let currentPath = null;

    for (const line of lines) {
      const trimmedLine = line.trim();
      if (!trimmedLine || trimmedLine.startsWith('#')) {
        continue;
      }

      if (trimmedLine.startsWith('spec_version:')) {
        const version = trimmedLine.split(':')[1].trim();
        if (version !== '3.0') {
          console.warn(`Unsupported LLMTAG version: ${version}`);
          return this.getDefaultPolicies();
        }
      } else if (trimmedLine.startsWith('User-agent:')) {
        currentAgent = trimmedLine.split(':')[1].trim();
      } else if (trimmedLine.startsWith('Path:')) {
        currentPath = trimmedLine.split(':')[1].trim();
      } else if (trimmedLine.startsWith('ai_training_data:')) {
        const value = trimmedLine.split(':')[1].trim();
        if (this.appliesToCurrentScope(userAgent, path, currentAgent, currentPath)) {
          policies.ai_training_data = value;
        }
      } else if (trimmedLine.startsWith('ai_use:')) {
        const value = trimmedLine.split(':')[1].trim();
        if (this.appliesToCurrentScope(userAgent, path, currentAgent, currentPath)) {
          policies.ai_use = value.split(',').map(v => v.trim());
        }
      }
    }

    return policies;
  }

  appliesToCurrentScope(userAgent, path, currentAgent, currentPath) {
    if (currentAgent && !userAgent.toLowerCase().includes(currentAgent.toLowerCase())) {
      return false;
    }
    if (currentPath && !path.startsWith(currentPath)) {
      return false;
    }
    return true;
  }

  getDefaultPolicies() {
    return {
      ai_training_data: 'allow',
      ai_use: ['search_indexing']
    };
  }
}

// Usage example
const parser = new LLMTAGParser();
const policies = await parser.getPolicies('example.com', 'MyAI-Bot/1.0', '/blog/post-1');

if (policies.ai_training_data === 'disallow') {
  console.log('AI training not allowed for this content');
} else {
  console.log('AI training allowed');
}

if (!policies.ai_use.includes('generative_synthesis')) {
  console.log('Generative synthesis not allowed');
} else {
  console.log('Generative synthesis allowed');
}
Compliance Testing
Testing Checklist
Test Discovery
Verify that your agent correctly discovers and fetches llmtag.txt files.
Test Parsing
Test parsing with various llmtag.txt file formats and edge cases.
Test Policy Application
Verify that policies are correctly applied based on user agent and path.
Test Error Handling
Ensure graceful handling of inaccessible or malformed files.
Test Caching
Verify that caching works correctly and doesn’t cause stale policy issues.
Test Cases
Basic Compliance Test: Simple llmtag.txt with global policies
Expected: Policies applied correctly
Agent-Specific Rules Test: User-agent blocks with specific policies
Expected: Correct policies for matching agents
Path-Based Rules Test: Path blocks with different policies
Expected: Correct policies for matching paths
Error Handling Test: 404, 403, 500 responses
Expected: Default policies applied
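As a hedged starting point, the pytest-style sketch below covers the basic compliance and error handling cases; it assumes the Python LLMTAGParser class from this guide is importable as llmtag_parser.LLMTAGParser, and the sample file and test names are illustrative:

# test_llmtag_compliance.py - illustrative tests only
from unittest.mock import MagicMock, patch

from llmtag_parser import LLMTAGParser  # assumed module name for the parser above

SAMPLE_FILE = """spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing
"""

def _mock_response(status_code, text=""):
    response = MagicMock()
    response.status_code = status_code
    response.text = text
    return response

def test_basic_compliance():
    # A simple llmtag.txt with global policies should be applied as written
    with patch("requests.get", return_value=_mock_response(200, SAMPLE_FILE)):
        policies = LLMTAGParser().get_policies("example.com", "MyAI-Bot/1.0", "/")
    assert policies["ai_training_data"] == "disallow"
    assert policies["ai_use"] == ["search_indexing"]

def test_error_handling_falls_back_to_defaults():
    # 404, 403, and 500 responses should all yield the default policies
    for status in (404, 403, 500):
        with patch("requests.get", return_value=_mock_response(status)):
            policies = LLMTAGParser().get_policies("example.com", "MyAI-Bot/1.0", "/")
        assert policies == {"ai_training_data": "allow", "ai_use": ["search_indexing"]}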
Best Practices
Implementation Best Practices
Respect Policies Actually follow the policies you claim to support, not just check the files.
Be Transparent Provide clear information about how you handle LLMTAG policies and compliance.
Implement Early Start implementing LLMTAG compliance now to build trust with content creators.
Provide Audit Trails Maintain logs of your compliance actions for transparency and accountability.
Follow these tips to optimize your LLMTAG implementation:
Cache policies to avoid repeated requests (see the caching sketch after this list)
Use appropriate timeouts for HTTP requests
Handle errors gracefully without breaking functionality
Monitor performance and optimize as needed
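One way to keep cached policies from going stale is to attach a time-to-live to each cache entry. The Python sketch below is a minimal illustration; the 24-hour TTL is an assumption, not a value mandated by the protocol:

import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # illustrative TTL, tune to your crawl cadence

_policy_cache = {}  # cache key -> (policies, fetch timestamp)

def get_cached_policies(cache_key):
    """Return cached policies if still fresh, otherwise None so the caller refetches."""
    entry = _policy_cache.get(cache_key)
    if entry is None:
        return None
    policies, fetched_at = entry
    if time.time() - fetched_at > CACHE_TTL_SECONDS:
        del _policy_cache[cache_key]  # stale entry: drop it and refetch llmtag.txt
        return None
    return policies

def store_policies(cache_key, policies):
    """Store freshly fetched policies together with the current timestamp."""
    _policy_cache[cache_key] = (policies, time.time())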
Community and Support
Getting Help
Certification Program
LLMTAG Compliance Certification: get certified to demonstrate compliance, build trust, and gain industry recognition.
Legal and Ethical Considerations
Legal Framework
LLMTAG is a technical standard that communicates preferences, not legal requirements. However, respecting these preferences can help with legal compliance and ethical AI practices.
Ethical AI Practices