Basic Examples

Simple Blog - No AI Training

A personal blog that wants to allow search indexing but block AI training:
# LLMTAG Protocol v3.0
# Content Usage Policy for myblog.com
# For more information, visit: https://docs.llmtag.org

# REQUIRED: Protocol version declaration
spec_version: 3.0

# AI Training Policy: Block AI model training to protect personal content
# Values: allow (permit training) | disallow (block training)
ai_training_data: disallow

# AI Use Policy: Allow only search indexing, block other AI usage
# Values: search_indexing (for search engines) | generative_synthesis (for AI responses) | research (for academic research)
ai_use: search_indexing

# Attribution Requirements: Require credit when content is used
# Values: required (must credit) | optional (preferred) | none (no credit needed)
attribution: required
attribution_format: "Source: MyBlog.com (https://myblog.com)"

# Contact Information: Where AI agents can reach you for questions
contact: [email protected]

# Protocol Information: Metadata about this policy file
protocol_name: LLMTAG
protocol_version: 3.0
last_updated: 2024-10-11
policy_effective_date: 2024-10-11
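
As a rough illustration of how a crawler might read a flat policy like the one above, the following Python sketch parses `directive: value` lines into a dictionary. The function name `parse_flat_policy` is hypothetical and not part of LLMTAG. The sketch assumes that `#` starts a comment line, that directive names are case-insensitive, and that `ai_use` holds comma-separated values, as the examples on this page suggest.

```python
def parse_flat_policy(text):
    """Parse 'directive: value' lines into a dict, skipping comments and blanks."""
    policy = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons (URLs).
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a directive line; a stricter reader might raise here
        value = value.strip()
        # Drop optional surrounding double quotes, as used by attribution_format.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        policy[key.strip().lower()] = value
    # Assumption: ai_use is a comma-separated list of usage types.
    if "ai_use" in policy:
        policy["ai_use"] = [v.strip() for v in policy["ai_use"].split(",") if v.strip()]
    return policy


example = """\
# LLMTAG Protocol v3.0
spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing
attribution: required
attribution_format: "Source: MyBlog.com (https://myblog.com)"
"""

policy = parse_flat_policy(example)
may_train = policy.get("ai_training_data") == "allow"
print(may_train, policy["ai_use"])
```

This sketch only handles files without Path or User-agent sections; scoped rules need a parser that tracks which section each directive belongs to.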

Open Source Documentation

A documentation site that welcomes AI training for educational purposes:
# LLMTAG Protocol v3.0
# Content Usage Policy for docs.example.com
# For more information, visit: https://docs.llmtag.org

# REQUIRED: Protocol version declaration
spec_version: 3.0

# AI Training Policy: Allow AI model training for educational purposes
# Values: allow (permit training) | disallow (block training)
ai_training_data: allow

# AI Use Policy: Allow all AI usage types for maximum accessibility
# Values: search_indexing (for search engines) | generative_synthesis (for AI responses) | research (for academic research)
ai_use: search_indexing, generative_synthesis, research

# Attribution Requirements: Require proper credit when content is used
# Values: required (must credit) | optional (preferred) | none (no credit needed)
attribution: required
attribution_format: "Source: Example Docs (https://docs.example.com)"

# Contact Information: Where AI agents can reach you for questions
contact: [email protected]
documentation: https://docs.example.com

# Protocol Information: Metadata about this policy file
protocol_name: LLMTAG
protocol_version: 3.0
last_updated: 2024-10-11
policy_effective_date: 2024-10-11

# Research and Academic Use: Explicitly allow research and academic usage
# Values: allow (permit) | disallow (block)
research_use: allow
academic_use: allow

News Website - Selective AI Use

A news site that allows search indexing and generative synthesis but blocks training:
spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing, generative_synthesis

Advanced Examples

E-commerce Site with Premium Content

An online store with different policies for public and premium content:
# LLMTAG Protocol v3.0
# Content Usage Policy for store.example.com
# For more information, visit: https://docs.llmtag.org

# REQUIRED: Protocol version declaration
spec_version: 3.0

# Global AI Training Policy: Block training by default, allow per path/agent
# Values: allow (permit training) | disallow (block training)
ai_training_data: disallow

# Global AI Use Policy: Allow search indexing by default
# Values: search_indexing (for search engines) | generative_synthesis (for AI responses) | research (for academic research)
ai_use: search_indexing

# Contact Information: Where AI agents can reach you for questions
contact: [email protected]
documentation: https://docs.store.example.com

# Protocol Information: Metadata about this policy file
protocol_name: LLMTAG
protocol_version: 3.0
last_updated: 2024-10-11
policy_effective_date: 2024-10-11

# Path-based Policies: Different rules for different content sections
# Path: /products/ - Allow training for product descriptions (helps discovery)
Path: /products/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis

# Path: /premium/ - Block training for premium content
Path: /premium/
ai_training_data: disallow
ai_use: search_indexing

# AI Agent Specific Policies: Special rules for specific AI agents
# User-agent: AcademicBot - Allow research access to premium content
User-agent: AcademicBot
ai_training_data: allow
ai_use: research

Path: /premium/
ai_training_data: allow
ai_use: research

# Commercial Use Policy: Allow commercial use with proper attribution
# Values: allow (permit) | disallow (block)
commercial_use: allow
commercial_attribution: required

# Compliance and Monitoring: Track usage and violations
# Values: enabled (track) | disabled (no tracking)
compliance_monitoring: enabled
usage_tracking: enabled
violation_reporting: [email protected]

Educational Institution

A university website with different policies for different content types:
spec_version: 3.0

# Global policy: Allow AI training for educational content
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research

# Protect student data and private information
Path: /student-portal/
ai_training_data: disallow
ai_use: search_indexing

# Allow research papers to be used for AI training
Path: /research/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research

# Course materials: allow training but not commercial use
Path: /courses/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research

# Block commercial AI products from using course materials
User-agent: CommercialAI
Path: /courses/
ai_training_data: disallow
ai_use: search_indexing

Creative Portfolio

An artist’s portfolio with selective AI permissions:
spec_version: 3.0

# Global policy: Block AI training to protect artistic work
ai_training_data: disallow
ai_use: search_indexing

# Allow portfolio pieces for search indexing only
Path: /portfolio/
ai_training_data: disallow
ai_use: search_indexing

# Allow research bots to study artistic techniques
User-agent: ArtResearchBot
ai_training_data: allow
ai_use: research

# Block commercial AI from using artwork
User-agent: CommercialAI
ai_training_data: disallow
ai_use: search_indexing

Corporate Website

A company website with different policies for public and internal content:
spec_version: 3.0

# Global policy: Allow search indexing, block training
ai_training_data: disallow
ai_use: search_indexing

# Public blog: allow AI training for thought leadership
Path: /blog/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis

# Product pages: allow training to help with discovery
Path: /products/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis

# Internal documentation: block training; only search indexing is permitted
Path: /internal/
ai_training_data: disallow
ai_use: search_indexing

# Financial reports: search indexing only
Path: /investor-relations/
ai_training_data: disallow
ai_use: search_indexing

Agent-Specific Examples

OpenAI-Friendly Configuration

A site that specifically allows OpenAI’s crawlers:
spec_version: 3.0

# Global policy: Block AI training by default
ai_training_data: disallow
ai_use: search_indexing

# Allow OpenAI's GPTBot for training
User-agent: GPTBot
ai_training_data: allow
ai_use: search_indexing, generative_synthesis

# Allow ChatGPT-User for browsing
User-agent: ChatGPT-User
ai_training_data: disallow
ai_use: search_indexing, generative_synthesis

Research-Focused Configuration

A site that prioritizes research and academic use:
spec_version: 3.0

# Global policy: Allow research use
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research

# Block commercial AI products
User-agent: CommercialAI
ai_training_data: disallow
ai_use: search_indexing

User-agent: PaidAI
ai_training_data: disallow
ai_use: search_indexing

# Allow academic and research bots
User-agent: AcademicBot
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research

User-agent: ResearchBot
ai_training_data: allow
ai_use: research

Selective Agent Blocking

A site that blocks specific AI agents while allowing others:
spec_version: 3.0

# Global policy: Allow AI training and use
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research

# Block specific commercial AI agents
User-agent: CommercialAI
ai_training_data: disallow
ai_use: search_indexing

User-agent: PaidAI
ai_training_data: disallow
ai_use: search_indexing

# Block AI agents that don't respect policies
User-agent: BadBot
ai_training_data: disallow
ai_use: search_indexing

Path-Based Examples

Content Type Segmentation

Different policies for different content types:
spec_version: 3.0

# Global policy
ai_training_data: disallow
ai_use: search_indexing

# Public blog posts: allow AI training
Path: /blog/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis

# Product documentation: allow training for better AI assistance
Path: /docs/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis

# User-generated content: block training
Path: /community/
ai_training_data: disallow
ai_use: search_indexing

# Premium content: block all AI use except search
Path: /premium/
ai_training_data: disallow
ai_use: search_indexing

Geographic Content Policies

Different policies for different regions:
spec_version: 3.0

# Global policy
ai_training_data: disallow
ai_use: search_indexing

# US content: allow AI training
Path: /us/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis

# EU content: block training due to GDPR concerns
Path: /eu/
ai_training_data: disallow
ai_use: search_indexing

# Global content: allow training
Path: /global/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis

Complex Scenarios

Multi-Tenant Platform

A platform serving multiple clients with different policies:
spec_version: 3.0

# Global policy: Block AI training by default
ai_training_data: disallow
ai_use: search_indexing

# Client A: Allow AI training
Path: /client-a/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis

# Client B: Block AI training
Path: /client-b/
ai_training_data: disallow
ai_use: search_indexing

# Client C: Allow training, but limit use to search indexing and research
Path: /client-c/
ai_training_data: allow
ai_use: search_indexing, research

# Platform documentation: Allow training
Path: /platform-docs/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis

Time-Based Policies

Policies that change based on content age:
spec_version: 3.0

# Global policy
ai_training_data: disallow
ai_use: search_indexing

# Recent content: block AI training
Path: /2024/
ai_training_data: disallow
ai_use: search_indexing

# Older content: allow AI training
Path: /2023/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis

Path: /2022/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
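
Because the example above hard-codes year directories, the file would need updating whenever a new year of content appears. One way to handle that is to generate the Path blocks with a small script. The Python sketch below is one possible approach, not part of LLMTAG: `year_blocks` and its `embargo_years` parameter are hypothetical names, and it assumes content lives under `/<year>/` paths as in the example.

```python
def year_blocks(first_year, current_year, embargo_years=1):
    """Build Path blocks: recent years block training, older years allow it."""
    # Newest year whose content may be used for training.
    cutoff = current_year - embargo_years
    blocks = []
    for year in range(current_year, first_year - 1, -1):
        if year > cutoff:
            lines = [
                f"Path: /{year}/",
                "ai_training_data: disallow",
                "ai_use: search_indexing",
            ]
        else:
            lines = [
                f"Path: /{year}/",
                "ai_training_data: allow",
                "ai_use: search_indexing, generative_synthesis",
            ]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


header = "spec_version: 3.0\n\nai_training_data: disallow\nai_use: search_indexing"
print(header + "\n\n" + year_blocks(2022, 2024))
```

With `first_year=2022` and `current_year=2024`, the output reproduces the three Path blocks shown above. Passing `datetime.date.today().year` as `current_year` in a scheduled job would keep the file current.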

Verification Examples

With Cryptographic Verification

Advanced implementation with verification challenges:
spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing

# Require verification for premium content access
Path: /premium/
verification_challenge: sha256:a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456
ai_training_data: allow
ai_use: search_indexing, generative_synthesis

Testing Your Implementation

Validation Checklist

Before deploying your llmtag.txt file:
1. Check file accessibility: Verify your file is accessible at https://yourdomain.com/llmtag.txt.
2. Validate syntax: Ensure all directives follow the correct format and syntax.
3. Test with different agents: Use different user-agent strings to test agent-specific rules.
4. Verify path matching: Test that path-based rules work correctly for different URLs.
5. Check inheritance: Confirm that inheritance works as expected for unspecified directives.
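
The first two checklist items can be partly automated. The Python sketch below is a minimal, unofficial linter: `fetch_llmtag` and `lint_llmtag` are hypothetical helper names, the allowed values are taken from the comments in the examples on this page, and the rule that paths start with `/` is an assumption rather than a documented requirement.

```python
from urllib.request import urlopen

# Value sets copied from the "Values:" comments in the examples on this page.
ALLOWED_VALUES = {
    "ai_training_data": {"allow", "disallow"},
    "ai_use": {"search_indexing", "generative_synthesis", "research"},
    "attribution": {"required", "optional", "none"},
}


def fetch_llmtag(domain):
    """Step 1: download the policy; urlopen raises on 4xx/5xx responses."""
    with urlopen(f"https://{domain}/llmtag.txt", timeout=10) as response:
        return response.read().decode("utf-8")  # assumes UTF-8 content


def lint_llmtag(text):
    """Step 2: return a list of human-readable problems found in the file body."""
    problems = []
    seen_spec_version = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if not sep or not key or not value:
            problems.append(f"line {number}: expected 'directive: value'")
            continue
        if key == "spec_version":
            seen_spec_version = True
        if key == "path" and not value.startswith("/"):
            problems.append(f"line {number}: path should start with '/'")
        allowed = ALLOWED_VALUES.get(key)
        if allowed:
            for item in (v.strip() for v in value.split(",")):
                if item not in allowed:
                    problems.append(
                        f"line {number}: unexpected value {item!r} for {key}"
                    )
    if not seen_spec_version:
        problems.append("missing spec_version directive")
    return problems


print(lint_llmtag("ai_training_data: maybe\n"))
```

A typical run would be `lint_llmtag(fetch_llmtag("yourdomain.com"))`; an empty list means none of these particular checks failed, not that the file is fully valid.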

Common Mistakes to Avoid

Missing spec_version: Always include spec_version: 3.0 at the top of your file.
Incorrect user-agent strings: Use the exact user-agent string reported by the AI agent, not partial matches.
Path syntax errors: Use forward slashes in paths, and remember that rules are matched by prefix, so a path rule applies to every URL that begins with it.
Conflicting directives: More specific rules override global ones, so check which rule wins wherever scopes overlap.
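
To make the matching and inheritance points concrete, here is a hedged Python sketch of how an agent might compute the policy that applies to one request. It is not the official resolution algorithm. The function `effective_policy`, the rule dictionaries, and the host `university.example` are hypothetical. The sketch assumes that a User-agent line followed by a Path line scopes a rule to both, that agent-scoped rules outrank path-only rules, and that longer path prefixes outrank shorter ones.

```python
from urllib.parse import urlsplit


def effective_policy(global_policy, scoped_rules, user_agent, url):
    """Return the directives that apply to user_agent requesting url."""
    path = urlsplit(url).path or "/"
    matches = []
    for rule in scoped_rules:
        agent = rule.get("user_agent")
        prefix = rule.get("path")
        if agent is not None and agent != user_agent:
            continue  # exact user-agent match only, no partial matches
        if prefix is not None and not path.startswith(prefix):
            continue  # path rules match by prefix
        # Assumed ordering: agent-scoped beats path-only, longer prefix wins.
        specificity = (agent is not None, len(prefix or ""))
        matches.append((specificity, rule["directives"]))
    policy = dict(global_policy)  # unspecified directives are inherited
    for _, directives in sorted(matches, key=lambda m: m[0]):
        policy.update(directives)  # more specific rules are applied last
    return policy


# Structured form of the Educational Institution example above.
GLOBAL = {
    "ai_training_data": "allow",
    "ai_use": ["search_indexing", "generative_synthesis", "research"],
}
RULES = [
    {"path": "/student-portal/",
     "directives": {"ai_training_data": "disallow", "ai_use": ["search_indexing"]}},
    {"user_agent": "CommercialAI", "path": "/courses/",
     "directives": {"ai_training_data": "disallow", "ai_use": ["search_indexing"]}},
    {"path": "/courses/",
     "directives": {"ai_training_data": "allow",
                    "ai_use": ["search_indexing", "generative_synthesis", "research"]}},
]

url = "https://university.example/courses/cs101"
print(effective_policy(GLOBAL, RULES, "CommercialAI", url)["ai_training_data"])
print(effective_policy(GLOBAL, RULES, "GPTBot", url)["ai_training_data"])
```

Under these assumptions, CommercialAI is blocked from training on course pages while other agents inherit the permissive course rule, and an agent named `CommercialAI-Lite` would not match the CommercialAI rule because matching is exact.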