Core Directives
spec_version
Required. Declares the specification version being used.
# REQUIRED: Protocol version declaration
spec_version: 3.0
All llmtag.txt files must include this directive. Files without it are considered invalid and will be ignored by compliant AI agents.
Content Usage Directives
ai_training_data
Controls whether content can be used as training data for machine learning models.
# AI Training Policy: Controls whether content can be used for AI model training
# Values: allow (permit training) | disallow (block training)
ai_training_data: disallow
This is the most critical directive for most publishers, as it directly controls whether their content can be used to train AI models.
Examples:
# Block all AI training
ai_training_data: disallow
# Allow AI training
ai_training_data: allow
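An agent-side check for this directive can be as small as one function. The sketch below assumes the default of "allow" described later under Default Values; the name may_train is hypothetical.

```python
def may_train(directives: dict) -> bool:
    """Return True if content may be used for AI model training.

    Per the spec's defaults, an absent ai_training_data directive
    means "allow"; only an explicit "disallow" blocks training.
    """
    return directives.get("ai_training_data", "allow") != "disallow"
```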
ai_use
Controls specific AI applications and use cases beyond training.
# AI Use Policy: Defines how AI agents can use your content
# Values: search_indexing (for search engines) | generative_synthesis (for AI responses) | commercial_products (for commercial AI products) | research (for academic research) | personal_assistance (for personal AI assistants)
ai_use: search_indexing, generative_synthesis
Examples:
# Allow only search indexing
ai_use: search_indexing
# Allow multiple use cases
ai_use: search_indexing, generative_synthesis, research
# Allow all use cases
ai_use: search_indexing, generative_synthesis, commercial_products, research, personal_assistance
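Since ai_use takes a comma-separated list, an agent needs to split it before checking permissions. A minimal sketch, assuming the search_indexing-only default described later under Default Values (the helper names are hypothetical):

```python
def parse_ai_use(value: str) -> set[str]:
    """Split a comma-separated ai_use value into a set of use cases."""
    return {item.strip() for item in value.split(",") if item.strip()}

def use_permitted(directives: dict, use_case: str) -> bool:
    """Check whether a use case is allowed.

    When ai_use is absent, the spec's default permits search_indexing only.
    """
    allowed = parse_ai_use(directives.get("ai_use", "search_indexing"))
    return use_case in allowed
```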
Advanced Directives
verification_challenge
Establishes a cryptographic handshake to verify that an AI agent has actually read and understood the rules.
verification_challenge: sha256:abc123def456...
This is an advanced feature for publishers who want to implement verification mechanisms. Most implementations can ignore this directive.
Example:
# Require SHA-256 verification
verification_challenge: sha256:a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456
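The exact handshake protocol is not defined in this section, so the following is only one plausible reading: the directive publishes a SHA-256 digest, and an agent demonstrates compliance by producing a token whose hash matches it. Both the token format and the function check_challenge are assumptions for illustration.

```python
import hashlib

def check_challenge(directive_value: str, response_token: bytes) -> bool:
    """Compare an agent's response token against the published digest.

    Assumes directive_value has the form "sha256:<hex digest>"; any other
    algorithm prefix is rejected. The actual handshake protocol — what the
    token contains and how it is exchanged — is left to the specification.
    """
    algo, _, expected = directive_value.partition(":")
    if algo != "sha256":
        return False
    return hashlib.sha256(response_token).hexdigest() == expected
```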
Scope Directives
User-agent
Defines a scope block for specific AI agents or crawlers.
User-agent: GPTBot
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
Common AI Agent User-Agents:
# OpenAI's GPTBot
User-agent: GPTBot
# ChatGPT-User (when users browse with ChatGPT)
User-agent: ChatGPT-User
# Google's AI crawlers
User-agent: Google-Extended
# Anthropic's Claude
User-agent: Claude-Web
# Perplexity AI
User-agent: PerplexityBot
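Selecting the right scope block for a crawler can be sketched as a case-insensitive substring match of the User-agent token against the full UA string, modeled on common robots.txt practice — the exact matching semantics here are an assumption, not something this section specifies.

```python
def match_agent_block(blocks: dict[str, dict], user_agent: str) -> dict:
    """Pick the scope block whose User-agent token appears in the UA string.

    Case-insensitive substring match, so the token "GPTBot" matches a
    full string like "Mozilla/5.0 ... GPTBot/1.0". Returns an empty
    override dict when no block matches (global policy then applies).
    """
    for token, directives in blocks.items():
        if token.lower() in user_agent.lower():
            return directives
    return {}
```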
Path
Defines a scope block for specific URL paths or patterns.
Path: /premium/
ai_training_data: disallow
ai_use: search_indexing
Examples:
# Protect premium content
Path: /premium/
ai_training_data: disallow
# Allow AI training for blog content
Path: /blog/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
# Protect user-generated content
Path: /user-content/
ai_training_data: disallow
ai_use: search_indexing
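Path blocks are naturally resolved by prefix matching, with the longest matching prefix winning so that a rule for /premium/reports/ can override a broader rule for /premium/. That longest-prefix tie-breaking is an assumption modeled on robots.txt practice, not something this section states explicitly.

```python
def match_path_block(blocks: dict[str, dict], url_path: str) -> dict:
    """Return the directives of the longest Path prefix matching url_path.

    Longest prefix wins, so a more specific block overrides a broader
    one. Returns an empty dict when no Path block matches.
    """
    best: dict = {}
    best_len = -1
    for prefix, directives in blocks.items():
        if url_path.startswith(prefix) and len(prefix) > best_len:
            best, best_len = directives, len(prefix)
    return best
```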
Directive Processing
Precedence Order
Directives are processed in the following order of precedence (highest to lowest):
- Path-specific directives - Most specific
- User-agent specific directives - Agent-specific
- Global directives - Least specific
Inheritance
When a directive is not specified at a more specific level, it inherits from the global level:
# Global policy
spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing
# This agent inherits global policy but overrides ai_use
User-agent: ResearchBot
ai_use: research
# ai_training_data remains "disallow" from global
# This path inherits from global but overrides ai_training_data
Path: /public-research/
ai_training_data: allow
# ai_use remains "search_indexing" from global
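The precedence, inheritance, and default rules above combine into a single resolution step: start from the defaults, then layer global, agent-specific, and path-specific directives in increasing precedence, each level overriding only the keys it actually sets. A minimal sketch (the function name resolve is hypothetical):

```python
DEFAULTS = {"ai_training_data": "allow", "ai_use": "search_indexing"}

def resolve(global_d: dict, agent_d: dict, path_d: dict) -> dict:
    """Compute effective directives: Path > User-agent > Global > defaults.

    Each level overrides only the keys it specifies; unspecified keys
    inherit from the less specific level below it.
    """
    effective = dict(DEFAULTS)
    for level in (global_d, agent_d, path_d):
        effective.update(level)
    return effective
```

Applied to the inheritance example above, a ResearchBot request outside any Path block would resolve to ai_training_data: disallow (inherited from global) with ai_use: research (overridden by the agent block).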
Default Values
If no directive is specified at any level, these defaults apply:
ai_training_data: allow
ai_use: search_indexing
Comments
Use # to add comments to your llmtag.txt file:
# Global policy: No AI training by default
spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing
# Allow research use for specific agents
User-agent: AcademicBot
ai_training_data: allow
ai_use: research
# Protect premium content from all AI training
Path: /premium/
ai_training_data: disallow
ai_use: search_indexing
Comments are ignored by AI agents and are purely for human readability and documentation purposes.
Best Practices
1. Start Simple
Begin with a basic global policy and add complexity as needed:
spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing
2. Document Your Policies
Document your policies for future reference:
# Block AI training but allow search indexing
spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing
# Exception: Allow research bots to use content for training
User-agent: ResearchBot
ai_training_data: allow
ai_use: research
3. Test Your Implementation
Always verify your llmtag.txt file is accessible and properly formatted:
curl https://yourdomain.com/llmtag.txt
4. Keep It Maintainable
Use consistent formatting and logical grouping:
spec_version: 3.0
# Global policy
ai_training_data: disallow
ai_use: search_indexing
# Agent-specific policies
User-agent: GPTBot
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
User-agent: ResearchBot
ai_training_data: allow
ai_use: research
# Path-specific policies
Path: /premium/
ai_training_data: disallow
ai_use: search_indexing
Path: /blog/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis