Introduction

The LLMTAG protocol is a declarative standard that allows website publishers to communicate their content usage policies to AI agents in a machine-readable format. This document provides the complete technical specification for version 3.0.

File Format

File Name and Location

The policy file must be named llmtag.txt and placed in the root directory of the website, accessible at:
https://example.com/llmtag.txt
This follows the same convention as robots.txt, making it familiar and discoverable for both humans and automated systems.

File Structure

The llmtag.txt file is a plain text file with the following structure:
# LLMTAG Protocol v3.0
# Content Usage Policy for example.com
# For more information, visit: https://docs.llmtag.org

# REQUIRED: Protocol version declaration
spec_version: 3.0

# Global AI Training Policy: Controls whether content can be used for AI model training
# Values: allow (permit training) | disallow (block training)
ai_training_data: allow

# Global AI Use Policy: Defines how AI agents can use your content
# Values: search_indexing (for search engines) | generative_synthesis (for AI responses) | research (for academic research)
ai_use: search_indexing, generative_synthesis, research

# Content Attribution Requirements: Ensures proper credit when content is used
# Values: required (must credit) | optional (preferred) | none (no credit needed)
attribution: required
attribution_format: "Source: Example.com (https://example.com)"

# Contact Information: Where AI agents can reach you for questions
contact: ai-policy@example.com
documentation: https://docs.llmtag.org

# Protocol Information: Metadata about this policy file
protocol_name: LLMTAG
protocol_version: 3.0
last_updated: 2024-10-11
policy_effective_date: 2024-10-11

# Scope blocks for specific agents or paths
User-agent: [agent_name]
[agent-specific directives]

Path: [path_pattern]
[path-specific directives]
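
As a rough illustration, the sketch below shows one way an AI agent might parse this structure into a simple policy object. The function name, the returned dictionary layout, and the decision to treat only full lines beginning with # as comments are illustrative assumptions, not part of the specification.

def parse_llmtag(text: str) -> dict:
    """Parse llmtag.txt text into global, per-agent, and per-path directive maps."""
    policy = {"global": {}, "user_agents": {}, "paths": {}}
    current = policy["global"]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # skip blank lines and comments
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            current = policy["user_agents"].setdefault(value.lower(), {})
        elif key == "path":
            current = policy["paths"].setdefault(value, {})
        else:
            current[key] = value
    return policy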

Core Directives

Required Directives

spec_version

Required. Declares the specification version being used.
# REQUIRED: Protocol version declaration
spec_version: 3.0
Purpose: This directive tells AI agents which version of the LLMTAG protocol to use when interpreting your policy file. It ensures compatibility and proper parsing of directives. Values:
  • 3.0 - Current LLMTAG Protocol version (required)
All llmtag.txt files must include this directive. Files without it are considered invalid.

Content Usage Directives

ai_training_data

Controls whether content can be used as training data for machine learning models. Values:
  • allow - Content may be used for AI training
  • disallow - Content may not be used for AI training
Default: allow (if not specified)
ai_training_data: disallow

ai_use

Controls which specific AI applications and use cases are permitted. The value is a comma-separated list of allowed use cases:
  • search_indexing - Traditional search engine indexing
  • generative_synthesis - Generating answers, summaries, or new content
  • commercial_products - Use within paid AI features or products
  • research - Academic or non-commercial research purposes
  • personal_assistance - Personal AI assistants and chatbots
Default: search_indexing (if not specified)
ai_use: search_indexing, generative_synthesis

Scope Blocks

User-agent Block

Allows setting different policies for specific AI agents or crawlers.
User-agent: GPTBot
ai_training_data: allow
ai_use: search_indexing, generative_synthesis, research

User-agent: ChatGPT-User
ai_training_data: disallow
ai_use: search_indexing
User-agent names are matched case-insensitively. Use the user-agent string as reported by the AI agent.

Path Block

Allows setting different policies for specific URL paths or patterns.
Path: /premium/
ai_training_data: disallow
ai_use: search_indexing

Path: /blog/
ai_training_data: allow
ai_use: search_indexing, generative_synthesis
Path Pattern Rules:
  • Use forward slashes (/) as path separators
  • Trailing slashes are optional
  • Wildcards are not supported in v3.0
  • Path matching is prefix-based
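
The sketch below shows one way the prefix-based matching described above could be implemented; the helper name and the trailing-slash normalization are illustrative choices, not mandated by the specification.

def path_matches(policy_path: str, request_path: str) -> bool:
    """Return True if request_path falls under policy_path (prefix match)."""
    # Trailing slashes are optional, so normalize both sides before comparing.
    policy = policy_path if policy_path.endswith("/") else policy_path + "/"
    request = request_path if request_path.endswith("/") else request_path + "/"
    return request.startswith(policy)

# /premium/ covers /premium/articles/42 but not /blog/post
assert path_matches("/premium/", "/premium/articles/42")
assert not path_matches("/premium/", "/blog/post")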

Advanced Features

Verification Challenge

The verification_challenge directive establishes a cryptographic handshake to verify that an AI agent has actually read and understood the rules.
verification_challenge: sha256:abc123def456...
This is an advanced feature for publishers who want to implement verification mechanisms. Most implementations can ignore this directive.
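
The handshake mechanism itself is not defined in this section, so the sketch below only shows how the directive value might be split into an algorithm label and a digest; what an agent then does with them is implementation-specific, and the function name is hypothetical.

def parse_verification_challenge(value: str) -> tuple[str, str]:
    """Split a value such as 'sha256:abc123...' into (algorithm, digest)."""
    algorithm, _, digest = value.partition(":")
    if not algorithm or not digest:
        raise ValueError(f"malformed verification_challenge: {value!r}")
    return algorithm, digest

# parse_verification_challenge("sha256:abc123def456") == ("sha256", "abc123def456")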

Comments

Use # to add comments to your llmtag.txt file:
# Global policy: No AI training by default
spec_version: 3.0
ai_training_data: disallow

# Allow research use for specific agents
User-agent: AcademicBot
ai_training_data: allow
ai_use: research

Processing Rules

Precedence

Directives are processed in the following order of precedence:
  1. Path-specific directives (highest priority)
  2. User-agent specific directives
  3. Global directives (lowest priority)

Inheritance

When a directive is not specified at a more specific level, it inherits from the global level:
# Global policy
spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing

# This agent inherits global policy but overrides ai_use
User-agent: ResearchBot
ai_use: research
# ai_training_data remains "disallow" from global

Default Values

If no directive is specified at any level, these defaults apply:
  • ai_training_data: allow
  • ai_use: search_indexing
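
Putting the precedence, inheritance, and default rules together, a compliant agent might resolve the effective directives for a single request roughly as follows. This sketch reuses the dictionary layout from the earlier parsing sketch and the path_matches helper, and the DEFAULTS table simply encodes the defaults listed above.

DEFAULTS = {"ai_training_data": "allow", "ai_use": "search_indexing"}

def resolve(policy: dict, user_agent: str, request_path: str) -> dict:
    """Apply defaults, then global, user-agent, and path directives (highest priority last)."""
    effective = dict(DEFAULTS)
    effective.update(policy.get("global", {}))
    effective.update(policy.get("user_agents", {}).get(user_agent.lower(), {}))
    for path, rules in policy.get("paths", {}).items():
        if path_matches(path, request_path):   # prefix match, see earlier sketch
            effective.update(rules)
    return effective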

Discovery Mechanism

Automatic Discovery

AI agents should automatically check for llmtag.txt by making a GET request to:
https://[domain]/llmtag.txt

HTTP Headers

The server should respond with appropriate headers:
Content-Type: text/plain; charset=utf-8
Cache-Control: public, max-age=3600

Error Handling

  • 404 Not Found: No llmtag.txt file exists - apply default policies
  • 403 Forbidden: File exists but access is denied - apply default policies
  • 500 Internal Server Error: Server-side failure - apply default policies
  • Invalid Format: Malformed file - apply default policies
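
As a sketch of the discovery flow combined with the error handling above, the function below fetches llmtag.txt with Python's standard library and falls back to the default policy when the file is missing, blocked, unreachable, or lacks the required spec_version. The helper names and the fallback shape are assumptions built on the earlier sketches.

import urllib.error
import urllib.request

def fetch_policy(domain: str) -> dict:
    """Fetch and parse https://<domain>/llmtag.txt, falling back to defaults on any error."""
    fallback = {"global": dict(DEFAULTS), "user_agents": {}, "paths": {}}
    url = f"https://{domain}/llmtag.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            text = response.read().decode("utf-8")
    except (urllib.error.URLError, UnicodeDecodeError):
        # Covers 404, 403, 5xx (HTTPError is a URLError), and network failures.
        return fallback
    policy = parse_llmtag(text)                 # see parsing sketch above
    if "spec_version" not in policy["global"]:  # invalid file: apply defaults
        return fallback
    return policy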

Compliance Requirements

For AI Agents

AI agents that claim compliance with the LLMTAG protocol must:
  1. Check for llmtag.txt before processing any content
  2. Parse the file correctly according to this specification
  3. Respect all applicable directives based on the agent’s identity and content path
  4. Handle errors gracefully by applying default policies when files are inaccessible

For Publishers

Publishers implementing llmtag.txt should:
  1. Include required directives (spec_version)
  2. Use valid syntax as defined in this specification
  3. Test their implementation to ensure the file is accessible
  4. Keep policies up to date as their preferences change

Version History

Version 3.0 (Current)

  • Added verification_challenge directive
  • Improved path matching rules
  • Enhanced error handling specifications
  • Added comprehensive compliance requirements

Version 2.0

  • Added ai_use directive with granular control
  • Introduced path-based policies
  • Added user-agent specific rules

Version 1.0

  • Initial specification
  • Basic ai_training_data directive
  • Global policy support only

Examples

See our examples page for comprehensive real-world implementations of the LLMTAG protocol.