Basic Examples
Simple Blog - No AI Training
A personal blog that wants to allow search indexing but block AI training:
# LLMTAG Protocol v3.0
# Content Usage Policy for myblog.com
# For more information, visit: https://docs.llmtag.org
# REQUIRED: Protocol version declaration
spec_version: 3.0
# AI Training Policy: Block AI model training to protect personal content
# Values: allow (permit training) | disallow (block training)
ai_training_data: disallow
# AI Use Policy: Allow only search indexing, block other AI usage
# Values: search_indexing (for search engines) | generative_synthesis (for AI responses) | research (for academic research)
ai_use: search_indexing
# Attribution Requirements: Require credit when content is used
# Values: required (must credit) | optional (preferred) | none (no credit needed)
attribution: required
attribution_format: "Source: MyBlog.com (https://myblog.com)"
# Contact Information: Where AI agents can reach you for questions
contact: [email protected]
# Protocol Information: Metadata about this policy file
protocol_name: LLMTAG
protocol_version: 3.0
last_updated: 2024-10-11
policy_effective_date: 2024-10-11
Open Source Documentation
A documentation site that welcomes AI training for educational purposes:
# LLMTAG Protocol v3.0
# Content Usage Policy for docs.example.com
# For more information, visit: https://docs.llmtag.org
# REQUIRED: Protocol version declaration
spec_version: 3.0
# AI Training Policy: Allow AI model training for educational purposes
# Values: allow (permit training) | disallow (block training)
ai_training_data: allow
# AI Use Policy: Allow all AI usage types for maximum accessibility
# Values: search_indexing (for search engines) | generative_synthesis (for AI responses) | research (for academic research)
ai_use: search_indexing, generative_synthesis, research
# Attribution Requirements: Require proper credit when content is used
# Values: required (must credit) | optional (preferred) | none (no credit needed)
attribution: required
attribution_format: "Source: Example Docs (https://docs.example.com)"
# Contact Information: Where AI agents can reach you for questions
contact: [email protected]
documentation: https://docs.example.com
# Protocol Information: Metadata about this policy file
protocol_name: LLMTAG
protocol_version: 3.0
last_updated: 2024-10-11
policy_effective_date: 2024-10-11
# Research and Academic Use: Explicitly allow research and academic usage
# Values: allow (permit) | disallow (block)
research_use: allow
academic_use: allow
News Website - Selective AI Use
A news site that allows search indexing and generative synthesis but blocks training:
spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing, generative_synthesis
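To see how an agent might read a flat policy like the ones above, here is a minimal Python sketch. It assumes one `key: value` directive per line with `#` comments, as in the examples on this page. The function names (`parse_flat_policy`, `allowed_uses`, `training_allowed`) are hypothetical and not part of any official LLMTAG library, and treating a missing `ai_training_data` as "not allowed" is an assumption of this sketch.

```python
def parse_flat_policy(text: str) -> dict[str, str]:
    """Parse 'key: value' lines, skipping blank lines and '#' comments."""
    policy: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain URLs.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a directive; ignored in this sketch
        policy[key.strip().lower()] = value.strip()
    return policy


def allowed_uses(policy: dict[str, str]) -> set[str]:
    """Split the comma-separated ai_use value into a set of use names."""
    return {u.strip() for u in policy.get("ai_use", "").split(",") if u.strip()}


def training_allowed(policy: dict[str, str]) -> bool:
    """True only when ai_training_data is explicitly 'allow' (assumption)."""
    return policy.get("ai_training_data", "").lower() == "allow"


# The news website example from above.
NEWS_SITE = """\
spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing, generative_synthesis
"""

policy = parse_flat_policy(NEWS_SITE)
print(sorted(allowed_uses(policy)), training_allowed(policy))
```

This sketch only handles files without path or agent blocks; those need block-aware parsing.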
Advanced Examples
E-commerce Site with Premium Content
An online store with different policies for public and premium content:
# LLMTAG Protocol v3.0
# Content Usage Policy for store.example.com
# For more information, visit: https://docs.llmtag.org
# REQUIRED: Protocol version declaration
spec_version: 3.0
# Global AI Training Policy: Block training by default, allow per path/agent
# Values: allow (permit training) | disallow (block training)
ai_training_data: disallow
# Global AI Use Policy: Allow search indexing by default
# Values: search_indexing (for search engines) | generative_synthesis (for AI responses) | research (for academic research)
ai_use: search_indexing
# Contact Information: Where AI agents can reach you for questions
contact: [email protected]
documentation: https://docs.store.example.com
# Protocol Information: Metadata about this policy file
protocol_name: LLMTAG
protocol_version: 3.0
last_updated: 2024-10-11
policy_effective_date: 2024-10-11
# Path-based Policies: Different rules for different content sections
# Path: /products/ - Allow training for product descriptions (helps discovery)
path: /products/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
# Path: /premium/ - Block training for premium content
path: /premium/
ai_training_data: disallow
ai_use: search_indexing
# AI Agent Specific Policies: Special rules for specific AI agents
# User-agent: AcademicBot - Allow research access to premium content
user_agent: AcademicBot
ai_training_data: allow
ai_use: research
path: /premium/
ai_training_data: allow
ai_use: research
# Commercial Use Policy: Allow commercial use with proper attribution
# Values: allow (permit) | disallow (block)
commercial_use: allow
commercial_attribution: required
# Compliance and Monitoring: Track usage and violations
# Values: enabled (track) | disabled (no tracking)
compliance_monitoring: enabled
usage_tracking: enabled
violation_reporting: [email protected]
Educational Institution
A university website with different policies for different content types:
spec_version: 3.0
# Global policy: Allow AI training for educational content
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research
# Protect student data and private information
Path: /student-portal/
ai_training_data: disallow
ai_use: search_indexing
# Allow research papers to be used for AI training
Path: /research/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research
# Course materials: allow training but not commercial use
Path: /courses/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research
# Block commercial AI products from using course materials
User-agent: CommercialAI
Path: /courses/
ai_training_data: disallow
ai_use: search_indexing
Creative Portfolio
An artist’s portfolio with selective AI permissions:
spec_version: 3.0
# Global policy: Block AI training to protect artistic work
ai_training_data: disallow
ai_use: search_indexing
# Allow portfolio pieces for search indexing only
Path: /portfolio/
ai_training_data: disallow
ai_use: search_indexing
# Allow research bots to study artistic techniques
User-agent: ArtResearchBot
ai_training_data: allow
ai_use: research
# Block commercial AI from using artwork
User-agent: CommercialAI
ai_training_data: disallow
ai_use: search_indexing
Corporate Website
A company website with different policies for public and internal content:
spec_version: 3.0
# Global policy: Allow search indexing, block training
ai_training_data: disallow
ai_use: search_indexing
# Public blog: allow AI training for thought leadership
Path: /blog/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
# Product pages: allow training to help with discovery
Path: /products/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
# Internal documentation: block training, allow search indexing only
Path: /internal/
ai_training_data: disallow
ai_use: search_indexing
# Financial reports: search indexing only
Path: /investor-relations/
ai_training_data: disallow
ai_use: search_indexing
Agent-Specific Examples
OpenAI-Friendly Configuration
A site that specifically allows OpenAI’s crawlers:
spec_version: 3.0
# Global policy: Block AI training by default
ai_training_data: disallow
ai_use: search_indexing
# Allow OpenAI's GPTBot for training
User-agent: GPTBot
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
# Allow ChatGPT-User for browsing
User-agent: ChatGPT-User
ai_training_data: disallow
ai_use: search_indexing, generative_synthesis
Research-Focused Configuration
A site that prioritizes research and academic use:
spec_version: 3.0
# Global policy: Allow research use
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research
# Block commercial AI products
User-agent: CommercialAI
ai_training_data: disallow
ai_use: search_indexing
User-agent: PaidAI
ai_training_data: disallow
ai_use: search_indexing
# Allow academic and research bots
User-agent: AcademicBot
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research
User-agent: ResearchBot
ai_training_data: allow
ai_use: research
Selective Agent Blocking
A site that blocks specific AI agents while allowing others:
spec_version: 3.0
# Global policy: Allow AI training and use
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research
# Block specific commercial AI agents
User-agent: CommercialAI
ai_training_data: disallow
ai_use: search_indexing
User-agent: PaidAI
ai_training_data: disallow
ai_use: search_indexing
# Block AI agents that don't respect policies
User-agent: BadBot
ai_training_data: disallow
ai_use: search_indexing
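The agent-specific examples above can be read as a global policy plus per-agent overrides. The sketch below shows one possible way to resolve them in Python. It rests on several assumptions that this page does not confirm: a `User-agent:` line opens a block that runs until the next `User-agent:` line, agent names are compared as exact, case-sensitive strings, and unspecified directives fall back to the global values. Both spellings used on this page (`User-agent` and `user_agent`) are accepted. The function names are hypothetical.

```python
def parse_agent_blocks(text: str) -> tuple[dict[str, str], dict[str, dict[str, str]]]:
    """Split a file into a global policy and per-agent override blocks."""
    global_policy: dict[str, str] = {}
    agents: dict[str, dict[str, str]] = {}
    current = global_policy  # directives before any User-agent line are global
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key in ("user-agent", "user_agent"):
            # Assumption: each User-agent line starts a new block.
            current = agents.setdefault(value, {})
        else:
            current[key] = value
    return global_policy, agents


def policy_for_agent(global_policy: dict[str, str],
                     agents: dict[str, dict[str, str]],
                     agent_name: str) -> dict[str, str]:
    """Global values, overridden by the block whose name matches exactly."""
    merged = dict(global_policy)
    merged.update(agents.get(agent_name, {}))
    return merged


SELECTIVE_BLOCKING = """\
spec_version: 3.0
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research
User-agent: CommercialAI
ai_training_data: disallow
ai_use: search_indexing
User-agent: BadBot
ai_training_data: disallow
ai_use: search_indexing
"""

global_policy, agents = parse_agent_blocks(SELECTIVE_BLOCKING)
print(policy_for_agent(global_policy, agents, "BadBot"))
print(policy_for_agent(global_policy, agents, "FriendlyBot"))
```

Under these assumptions, an agent that is not listed simply receives the global policy.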
Path-Based Examples
Content Type Segmentation
Different policies for different content types:
spec_version: 3.0
# Global policy
ai_training_data: disallow
ai_use: search_indexing
# Public blog posts: allow AI training
Path: /blog/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
# Product documentation: allow training for better AI assistance
Path: /docs/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
# User-generated content: block training
Path: /community/
ai_training_data: disallow
ai_use: search_indexing
# Premium content: block all AI use except search
Path: /premium/
ai_training_data: disallow
ai_use: search_indexing
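Path rules like these combine prefix matching with inheritance from the global policy. The following Python sketch illustrates one way that could work. It assumes that the longest matching prefix wins when several rules match, a convention borrowed from robots.txt-style matching rather than something stated on this page. The name `effective_policy` and the data layout are hypothetical.

```python
def effective_policy(global_policy: dict[str, str],
                     path_rules: list[tuple[str, dict[str, str]]],
                     url_path: str) -> dict[str, str]:
    """Return the policy for url_path.

    Global values are overridden by the rule with the longest matching
    prefix (assumption); directives the rule omits are inherited.
    """
    best_prefix = ""
    best_overrides: dict[str, str] = {}
    for prefix, overrides in path_rules:
        if url_path.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, best_overrides = prefix, overrides
    merged = dict(global_policy)
    merged.update(best_overrides)
    return merged


# The content type segmentation example from above, already parsed.
GLOBAL = {"ai_training_data": "disallow", "ai_use": "search_indexing"}
PATH_RULES = [
    ("/blog/", {"ai_training_data": "allow",
                "ai_use": "search_indexing, generative_synthesis"}),
    ("/docs/", {"ai_training_data": "allow",
                "ai_use": "search_indexing, generative_synthesis"}),
    ("/community/", {"ai_training_data": "disallow", "ai_use": "search_indexing"}),
    ("/premium/", {"ai_training_data": "disallow", "ai_use": "search_indexing"}),
]

for path in ("/blog/2024/post", "/premium/report", "/about"):
    print(path, effective_policy(GLOBAL, PATH_RULES, path))
```

A URL that matches no rule, such as `/about` here, falls back to the global policy unchanged.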
Geographic Content Policies
Different policies for different regions:
spec_version: 3.0
# Global policy
ai_training_data: disallow
ai_use: search_indexing
# US content: allow AI training
Path: /us/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
# EU content: block training due to GDPR concerns
Path: /eu/
ai_training_data: disallow
ai_use: search_indexing
# Global content: allow training
Path: /global/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
Complex Scenarios
A platform serving multiple clients with different policies:
spec_version: 3.0
# Global policy: Block AI training by default
ai_training_data: disallow
ai_use: search_indexing
# Client A: Allow AI training
Path: /client-a/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
# Client B: Block AI training
Path: /client-b/
ai_training_data: disallow
ai_use: search_indexing
# Client C: Allow training, with use limited to search indexing and research
Path: /client-c/
ai_training_data: allow
ai_use: search_indexing, research
# Platform documentation: Allow training
Path: /platform-docs/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
Time-Based Policies
Policies that change based on content age:
spec_version: 3.0
# Global policy
ai_training_data: disallow
ai_use: search_indexing
# Recent content: block AI training
Path: /2024/
ai_training_data: disallow
ai_use: search_indexing
# Older content: allow AI training
Path: /2023/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
Path: /2022/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
Verification Examples
With Cryptographic Verification
Advanced implementation with verification challenges:
spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing
# Require verification for premium content access
Path: /premium/
verification_challenge: sha256:a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
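This page does not describe how a `verification_challenge` value is produced or checked, so the sketch below covers only its shape. It assumes the value is the literal prefix `sha256:` followed by a 64-character lowercase hexadecimal digest, as in the example above. The helper names are hypothetical, and `example_digest` is included only to show that Python's `hashlib` produces a digest of that length; it does not represent the protocol's actual challenge procedure.

```python
import hashlib
import re

# Assumed shape: "sha256:" followed by 64 lowercase hex characters.
CHALLENGE_PATTERN = re.compile(r"sha256:[0-9a-f]{64}")


def is_well_formed_challenge(value: str) -> bool:
    """Check the format only; this says nothing about validity."""
    return CHALLENGE_PATTERN.fullmatch(value) is not None


def example_digest(data: bytes) -> str:
    """Illustrative only: format a SHA-256 digest in the same shape."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


sample = "sha256:a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456"
print(is_well_formed_challenge(sample))
print(is_well_formed_challenge("sha256:abc"))
```

A format check like this can catch truncated or mistyped values before deployment, but it cannot confirm that the digest corresponds to anything meaningful.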
Testing Your Implementation
Validation Checklist
Before deploying your llmtag.txt file:
Check file accessibility: Verify your file is accessible at https://yourdomain.com/llmtag.txt
Validate syntax: Ensure all directives follow the correct format and syntax
Test with different agents: Use different user-agent strings to test agent-specific rules
Verify path matching: Test that path-based rules work correctly for different URLs
Check inheritance: Confirm that inheritance works as expected for unspecified directives
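The accessibility check in the list above can be scripted. Here is a minimal Python sketch using only the standard library. The helper names (`llmtag_url`, `fetch_llmtag`) and the `llmtag-check/0.1` user-agent string are hypothetical. The sketch assumes the file lives at the domain root over HTTPS, as the checklist URL suggests.

```python
import urllib.error
import urllib.request
from typing import Optional


def llmtag_url(domain: str) -> str:
    """Build the root-level URL where the file is expected."""
    host = domain.strip()
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    return f"https://{host.strip('/')}/llmtag.txt"


def fetch_llmtag(domain: str, timeout: float = 10.0) -> Optional[str]:
    """Return the file's text, or None if it cannot be fetched."""
    request = urllib.request.Request(
        llmtag_url(domain),
        headers={"User-Agent": "llmtag-check/0.1"},  # hypothetical agent name
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            if response.status != 200:
                return None
            return response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, TimeoutError):
        # Covers DNS failures, refused connections, HTTP errors, and timeouts.
        return None


print(llmtag_url("yourdomain.com"))
```

Calling `fetch_llmtag("yourdomain.com")` with different `User-Agent` headers is one way to confirm that the server returns the same file to every agent.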
Common Mistakes to Avoid
Missing spec_version: Always include spec_version: 3.0 at the top of your file.
Incorrect user-agent strings: Use the exact user-agent string reported by the AI agent, not partial matches.
Path syntax errors: Use forward slashes in paths and check that each prefix matches the URLs you intend it to cover.
Conflicting directives: Be careful with inheritance, because more specific rules override global ones.
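Several of these mistakes can be caught mechanically. The Python sketch below is a hypothetical linter, not an official validator. It assumes that `spec_version` should be the first directive, that path values start with `/` and contain no backslashes, and that repeating a directive inside the same block signals a conflict. A `Path:` or `User-agent:` line (in either spelling used on this page) is treated as the start of a new block.

```python
BLOCK_HEADERS = {"path", "user-agent", "user_agent"}


def lint_llmtag(text: str) -> list[str]:
    """Return a list of human-readable problems found in the file text."""
    problems: list[str] = []
    seen_in_block: set[str] = set()
    first_directive = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            problems.append(f"line {number}: expected 'key: value'")
            continue
        key, value = key.strip().lower(), value.strip()
        if first_directive is None:
            first_directive = key
        if key in BLOCK_HEADERS:
            seen_in_block = set()  # a header starts a fresh block
            if key == "path" and ("\\" in value or not value.startswith("/")):
                problems.append(
                    f"line {number}: path should start with '/' and use forward slashes"
                )
            continue
        if key in seen_in_block:
            problems.append(f"line {number}: '{key}' is repeated in the same block")
        seen_in_block.add(key)
    if first_directive != "spec_version":
        problems.append("spec_version should be the first directive")
    return problems


BAD_FILE = "ai_training_data: allow\nai_training_data: disallow\nPath: blog\\posts\n"
for problem in lint_llmtag(BAD_FILE):
    print(problem)
```

User-agent strings are not checked here, since confirming an exact match requires the string the agent actually reports.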