Introduction

The LLMTAG protocol is a declarative standard that allows website publishers to communicate their content usage policies to AI agents in a machine-readable format. This document provides the complete technical specification for version 3.0.

File Format

File Name and Location

The policy file must be named llmtag.txt and placed in the root directory of the website, accessible at:
https://example.com/llmtag.txt
This follows the same convention as robots.txt, making it familiar and discoverable for both humans and automated systems.

File Structure

The llmtag.txt file is a plain text file with the following structure:
# LLMTAG Protocol v3.0
# Content Usage Policy for example.com
# For more information, visit: https://docs.llmtag.org

# REQUIRED: Protocol version declaration
spec_version: 3.0

# Global AI Training Policy: Controls whether content can be used for AI model training
# Values: allow (permit training) | disallow (block training)
ai_training_data: allow

# Global AI Use Policy: Defines how AI agents can use your content
# Values: search_indexing (for search engines) | generative_synthesis (for AI responses) | research (for academic research)
ai_use: search_indexing, generative_synthesis, research

# Content Attribution Requirements: Ensures proper credit when content is used
# Values: required (must credit) | optional (preferred) | none (no credit needed)
attribution: required
attribution_format: "Source: Example.com (https://example.com)"

# Contact Information: Where AI agents can reach you for questions
contact: ai-policy@example.com
documentation: https://docs.llmtag.org

# Protocol Information: Metadata about this policy file
protocol_name: LLMTAG
protocol_version: 3.0
last_updated: 2024-10-11
policy_effective_date: 2024-10-11

# Scope blocks for specific agents or paths
User-agent: [agent_name]
[agent-specific directives]

Path: [path_pattern]
[path-specific directives]
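
As a rough illustration, the sketch below shows one way an AI agent might parse this structure into a simple policy object. The function name, the returned dictionary layout, and the decision to treat only full lines beginning with # as comments are illustrative assumptions, not part of the specification.

def parse_llmtag(text: str) -> dict:
    """Parse llmtag.txt text into global, per-agent, and per-path directive maps."""
    policy = {"global": {}, "user_agents": {}, "paths": {}}
    current = policy["global"]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # skip blank lines and comments
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            current = policy["user_agents"].setdefault(value.lower(), {})
        elif key == "path":
            current = policy["paths"].setdefault(value, {})
        else:
            current[key] = value
    return policy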

Core Directives

Required Directives

spec_version

Required. Declares the specification version being used.
# REQUIRED: Protocol version declaration
spec_version: 3.0
Purpose: This directive tells AI agents which version of the LLMTAG protocol to use when interpreting your policy file. It ensures compatibility and proper parsing of directives. Values:
  • 3.0 - Current LLMTAG Protocol version (required)
All llmtag.txt files must include this directive. Files without it are considered invalid.

Content Usage Directives

ai_training_data

Controls whether content can be used as training data for machine learning models. Values:
  • allow - Content may be used for AI training
  • disallow - Content may not be used for AI training
Default: allow (if not specified)
ai_training_data: disallow

ai_use

Controls which specific AI applications and use cases are permitted. The value is a comma-separated list of allowed use cases:
  • search_indexing - Traditional search engine indexing
  • generative_synthesis - Generating answers, summaries, or new content
  • commercial_products - Use within paid AI features or products
  • research - Academic or non-commercial research purposes
  • personal_assistance - Personal AI assistants and chatbots
Default: search_indexing (if not specified)
ai_use: search_indexing, generative_synthesis

Scope Blocks

User-agent Block

Allows setting different policies for specific AI agents or crawlers.
User-agent: GPTBot
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research

User-agent: ChatGPT-User
ai_training_data: disallow
ai_use: search_indexing
User-agent names are matched case-insensitively. Use the user-agent string as reported by the AI agent.

Path Block

Allows setting different policies for specific URL paths or patterns.
Path: /premium/
ai_training_data: disallow
ai_use: search_indexing

Path: /blog/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
Path Pattern Rules:
  • Use forward slashes (/) as path separators
  • Trailing slashes are optional
  • Wildcards are not supported in v3.0
  • Path matching is prefix-based
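
The sketch below shows one way the prefix-based matching described above could be implemented; the helper name and the trailing-slash normalization are illustrative choices, not mandated by the specification.

def path_matches(policy_path: str, request_path: str) -> bool:
    """Return True if request_path falls under policy_path (prefix match)."""
    # Trailing slashes are optional, so normalize both sides before comparing.
    policy = policy_path if policy_path.endswith("/") else policy_path + "/"
    request = request_path if request_path.endswith("/") else request_path + "/"
    return request.startswith(policy)

# /premium/ covers /premium/articles/42 but not /blog/post
assert path_matches("/premium/", "/premium/articles/42")
assert not path_matches("/premium/", "/blog/post")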

Advanced Features

Verification Challenge

The verification_challenge directive establishes a cryptographic handshake to verify that an AI agent has actually read and understood the rules.
verification_challenge: sha256:abc123def456...
This is an advanced feature for publishers who want to implement verification mechanisms. Most implementations can ignore this directive.
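
The handshake mechanism itself is not defined in this section, so the sketch below only shows how the directive value might be split into an algorithm label and a digest; what an agent then does with them is implementation-specific, and the function name is hypothetical.

def parse_verification_challenge(value: str) -> tuple[str, str]:
    """Split a value such as 'sha256:abc123...' into (algorithm, digest)."""
    algorithm, _, digest = value.partition(":")
    if not algorithm or not digest:
        raise ValueError(f"malformed verification_challenge: {value!r}")
    return algorithm, digest

# parse_verification_challenge("sha256:abc123def456") == ("sha256", "abc123def456")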

Comments

Use # to add comments to your llmtag.txt file:
# Global policy: No AI training by default
spec_version: 3.0
ai_training_data: disallow

# Allow research use for specific agents
User-agent: AcademicBot
ai_training_data: allow
ai_use: research

Processing Rules

Precedence

Directives are processed in the following order of precedence:
  1. Path-specific directives (highest priority)
  2. User-agent specific directives
  3. Global directives (lowest priority)

Inheritance

When a directive is not specified at a more specific level, it inherits from the global level:
# Global policy
spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing

# This agent inherits global policy but overrides ai_use
User-agent: ResearchBot
ai_use: research
# ai_training_data remains "disallow" from global

Default Values

If no directive is specified at any level, these defaults apply:
  • ai_training_data: allow
  • ai_use: search_indexing
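
Putting the precedence, inheritance, and default rules together, a compliant agent might resolve the effective directives for a single request roughly as follows. This sketch reuses the dictionary layout from the earlier parsing sketch and the path_matches helper, and the DEFAULTS table simply encodes the defaults listed above.

DEFAULTS = {"ai_training_data": "allow", "ai_use": "search_indexing"}

def resolve(policy: dict, user_agent: str, request_path: str) -> dict:
    """Apply defaults, then global, user-agent, and path directives (highest priority last)."""
    effective = dict(DEFAULTS)
    effective.update(policy.get("global", {}))
    effective.update(policy.get("user_agents", {}).get(user_agent.lower(), {}))
    for path, rules in policy.get("paths", {}).items():
        if path_matches(path, request_path):   # prefix match, see earlier sketch
            effective.update(rules)
    return effective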

Discovery Mechanism

Automatic Discovery

AI agents should automatically check for llmtag.txt by making a GET request to:
https://[domain]/llmtag.txt

HTTP Headers

The server should respond with appropriate headers:
Content-Type: text/plain; charset=utf-8
Cache-Control: public, max-age=3600

Error Handling

  • 404 Not Found: No llmtag.txt file exists - apply default policies
  • 403 Forbidden: File exists but access is denied - apply default policies
  • 500 Internal Server Error: Server-side failure - apply default policies
  • Invalid Format: Malformed file - apply default policies
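
As a sketch of the discovery flow combined with the error handling above, the function below fetches llmtag.txt with Python's standard library and falls back to the default policy when the file is missing, blocked, unreachable, or lacks the required spec_version. The helper names and the fallback shape are assumptions built on the earlier sketches.

import urllib.error
import urllib.request

def fetch_policy(domain: str) -> dict:
    """Fetch and parse https://<domain>/llmtag.txt, falling back to defaults on any error."""
    fallback = {"global": dict(DEFAULTS), "user_agents": {}, "paths": {}}
    url = f"https://{domain}/llmtag.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            text = response.read().decode("utf-8")
    except (urllib.error.URLError, UnicodeDecodeError):
        # Covers 404, 403, 5xx (HTTPError is a URLError), and network failures.
        return fallback
    policy = parse_llmtag(text)                 # see parsing sketch above
    if "spec_version" not in policy["global"]:  # invalid file: apply defaults
        return fallback
    return policy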

Compliance Requirements

For AI Agents

AI agents that claim compliance with the LLMTAG protocol must:
  1. Check for llmtag.txt before processing any content
  2. Parse the file correctly according to this specification
  3. Respect all applicable directives based on the agent’s identity and content path
  4. Handle errors gracefully by applying default policies when files are inaccessible

For Publishers

Publishers implementing llmtag.txt should:
  1. Include required directives (spec_version)
  2. Use valid syntax as defined in this specification
  3. Test their implementation to ensure the file is accessible
  4. Keep policies up to date as their preferences change

Version History

Version 3.0 (Current)

  • Added verification_challenge directive
  • Improved path matching rules
  • Enhanced error handling specifications
  • Added comprehensive compliance requirements

Version 2.0

  • Added ai_use directive with granular control
  • Introduced path-based policies
  • Added user-agent specific rules

Version 1.0

  • Initial specification
  • Basic ai_training_data directive
  • Global policy support only

Examples

See our examples page for comprehensive real-world implementations of the LLMTAG protocol.