
The Problem We’re Solving

The internet is undergoing its most profound transformation since the dawn of the search engine. The open web has become the primary dataset for a new generation of artificial intelligence, yet this new relationship operates without a clear framework for consent or control.

The Current State

Today’s AI landscape operates on an implicit contract:
  • AI companies scrape content without explicit permission
  • Publishers have no standardized way to communicate their preferences
  • The robots.txt protocol only controls access, not usage
  • Content creators lack control over how their work is used for AI training
The robots.txt protocol answers: “Can a bot access this URL?” But it was never designed to answer the critical question: “Once accessed, what are you permitted to do with the content?”

Our Solution: Explicit Over Implicit

llmtag moves the web from an ambiguous, implicit contract to an explicit, transparent one.

Core Philosophy

Separation of Concerns

robots.txt handles access control.
llmtag.txt handles usage control.
AI agents must first be allowed by robots.txt to access a URL before they can read the usage policies in llmtag.txt.
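A minimal Python sketch of this two-step check follows. It uses the standard library's urllib.robotparser for the access step. The helper name may_use, the assumption that llmtag.txt has already been parsed into a dictionary, and the allow/disallow values it checks are illustrative only. They are not part of any published llmtag API.

```python
from urllib.robotparser import RobotFileParser


def may_use(robots_lines, llmtag_policy, agent, url, use_key):
    """Return True only if robots.txt allows access AND llmtag.txt allows the use.

    Hypothetical helper. `robots_lines` is the robots.txt body split into
    lines; `llmtag_policy` is assumed to be an already-parsed dict such as
    {"ai_training_data": "disallow"}.
    """
    robots = RobotFileParser()
    robots.parse(robots_lines)

    # Step 1: access control. If robots.txt blocks the URL, usage never arises.
    if not robots.can_fetch(agent, url):
        return False

    # Step 2: usage control. An unspecified use falls back to the default (allow).
    return llmtag_policy.get(use_key, "allow") != "disallow"


robots_txt = ["User-agent: *", "Disallow: /private/"]
policy = {"ai_training_data": "disallow"}
# Access is allowed for /blog/post, but training use is disallowed.
print(may_use(robots_txt, policy, "ExampleBot", "https://example.com/blog/post", "ai_training_data"))
# prints False
```

The order matters. A URL blocked by robots.txt is rejected before llmtag.txt is consulted, which mirrors the separation of access and usage described here.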

Explicit over Implicit

If no rule is defined for a given use, the default policy (allow) applies. This encourages participation without breaking existing functionality. Publishers who want a stricter baseline can opt for a disallow default instead.
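The following sketch shows one way an agent might resolve a single directive under these rules: an explicit rule always wins, and anything unspecified falls back to the default. The function is_permitted and its site_default parameter are hypothetical. This page does not say how a publisher declares the stricter default, so the parameter simply stands in for that choice.

```python
def is_permitted(policy, key, site_default="allow"):
    """Resolve one usage directive from a parsed llmtag.txt policy.

    Hypothetical helper: an explicit entry in `policy` wins; otherwise
    `site_default` applies ("allow" unless the publisher chose "disallow").
    """
    value = str(policy.get(key, site_default)).strip().lower()
    if value not in ("allow", "disallow"):
        # Reject values this sketch does not understand instead of guessing.
        raise ValueError(f"unexpected value for {key!r}: {value!r}")
    return value == "allow"


# No rule for training and no stricter default: the use is allowed.
print(is_permitted({}, "ai_training_data"))
# Same missing rule, but the publisher opted for a disallow default.
print(is_permitted({}, "ai_training_data", site_default="disallow"))
```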

Granularity and Extensibility

The standard is designed to be granular: rules can be set per agent, per path, and even per content type.

Machine-Readable

Policies are expressed in a structured, parseable format that AI agents can automatically understand and implement.

Design Principles

1. Simplicity First

The llmtag.txt format is intentionally simple and human-readable:
```
spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing, generative_synthesis
```
As with robots.txt, anyone can create and understand an llmtag.txt file without technical expertise.
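The format consists of line-oriented key: value pairs, so a parser can be very small. The Python sketch below makes three assumptions: each line holds one directive, # starts a comment (by analogy with robots.txt), and the comma-separated ai_use value becomes a list. The name parse_llmtag and these parsing rules are illustrative assumptions, not the normative grammar.

```python
LIST_KEYS = {"ai_use"}  # assumption: directives whose values are comma-separated lists


def parse_llmtag(text: str) -> dict:
    """Parse llmtag.txt-style 'key: value' lines into a dict (a sketch)."""
    policy = {}
    for raw in text.splitlines():
        # Assumption: '#' starts a comment, as in robots.txt.
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue  # skip blank, comment-only, and malformed lines
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key in LIST_KEYS:
            policy[key] = [item.strip() for item in value.split(",") if item.strip()]
        else:
            policy[key] = value  # spec_version stays a string, e.g. "3.0"
    return policy


example = """spec_version: 3.0
ai_training_data: disallow
ai_use: search_indexing, generative_synthesis
"""
print(parse_llmtag(example))
```

Keeping spec_version as a string avoids float artifacts and lets an agent compare versions however the specification eventually requires.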

2. Backward Compatibility

The protocol is designed to be:
  • Non-breaking: Websites without llmtag.txt continue to function normally
  • Progressive: Publishers can adopt the standard incrementally
  • Future-proof: New directives can be added without breaking existing implementations

3. Publisher-Centric

The standard prioritizes publisher control and choice:
  • Opt-in granularity: Publishers choose exactly what to allow or disallow
  • Flexible defaults: Support both permissive and restrictive default policies
  • No enforcement burden: Publishers aren’t responsible for enforcing compliance

4. AI Agent Friendly

The protocol is designed to be easily implementable by AI companies:
  • Clear syntax: Unambiguous parsing rules
  • Standardized format: Consistent structure across all implementations
  • Discovery mechanism: Automatic detection via standard HTTP requests
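The following is a sketch of discovery. It assumes llmtag.txt is served from the site root in the same way as robots.txt; this page does not state the exact path. llmtag_url and fetch_llmtag are hypothetical helpers. A 404 response is treated as "no llmtag.txt", in which case the default policy applies and the site keeps working normally.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def llmtag_url(page_url: str) -> str:
    """Map any page URL to the (assumed) root-level llmtag.txt for its site."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llmtag.txt", "", ""))


def fetch_llmtag(page_url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch the site's llmtag.txt, or return None if the site has none."""
    try:
        with urlopen(llmtag_url(page_url), timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        if err.code == 404:
            return None  # no llmtag.txt: the default policy applies
        raise  # other HTTP errors are left to the caller


print(llmtag_url("https://example.com/blog/post?id=7#comments"))
# prints https://example.com/llmtag.txt
```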

The Vision: A New Social Contract

Today’s Web

Publisher → [Implicit Permission] → AI Agent

Tomorrow’s Web with LLMTAG

Publisher → [Explicit llmtag.txt] → AI Agent → [Respects Policies]

Why This Matters

For Publishers

Control

Take back control over how your content is used by AI systems

Transparency

Make your AI usage policies explicit and discoverable

Flexibility

Set different policies for different content types and AI agents

Future-Proofing

Establish clear boundaries before AI usage becomes even more widespread

For AI Companies

Legal Clarity

Clear, machine-readable policies reduce legal uncertainty

Ethical Compliance

Respect publisher preferences and build trust with content creators

Implementation Simplicity

Standardized format makes compliance straightforward to implement

Industry Leadership

Be part of establishing ethical AI practices from the ground up

For the Web Ecosystem

Sustainable AI

Create a sustainable relationship between AI and content creation

Innovation Protection

Protect content creators while enabling AI innovation

Global Standard

Establish a universal protocol that works across all platforms and languages

Trust Building

Build trust between AI companies and content creators

The Path Forward

Phase 1: Early Adoption

  • Publishers implement llmtag.txt on their websites
  • AI companies begin reading and respecting the standard
  • Community builds tools and integrations

Phase 2: Industry Standard

  • Major platforms adopt the standard (WordPress, Drupal, etc.)
  • AI companies make compliance a standard practice
  • Legal frameworks begin recognizing the protocol

Phase 3: Universal Protocol

  • llmtag.txt becomes as ubiquitous as robots.txt
  • AI agents universally respect publisher preferences
  • New web standards emerge based on explicit consent

Join the Movement

Be part of the solution

Help us establish llmtag as the universal standard for AI content policies. Your participation shapes the future of the web.
The LLMTAG protocol is open source and community-driven. We believe that the future of AI and content should be shaped by the people who create and consume it, not just the companies that build the technology.