Back to KB
Difficulty
Intermediate
Read Time
9 min

Prompt injection through website content: how AI agents can be manipulated by the pages they visit

By Codcompass Team··9 min read

DOM-Based Prompt Injection: Hardening AI Agents Against Hidden Web Content

Current Situation Analysis

The emergence of autonomous AI agents capable of browsing the web has fundamentally altered the threat landscape for web applications. Tools like ChatGPT, Claude, Perplexity, Microsoft Copilot, and Google Gemini routinely fetch arbitrary URLs to retrieve context, summarize content, or answer user queries. Unlike human users who interact with rendered visual interfaces, these agents ingest the raw Document Object Model (DOM). They parse HTML, CSS, metadata, comments, and attributes that are often invisible to human visitors.

This shift creates a critical vulnerability surface: Indirect Prompt Injection via DOM Content. Classified as LLM01:2025 in the OWASP LLM Top 10, this attack vector allows adversaries to embed malicious instructions within web pages that are invisible to humans but fully processed by AI agents. When an agent visits a compromised page, it may execute these hidden instructions, leading to data exfiltration, unauthorized actions, or manipulated outputs presented to the end-user.

The problem is frequently overlooked because traditional web security scanners are designed around a human-centric threat model. Tools like Burp Suite, OWASP ZAP, and Snyk prioritize vulnerabilities that affect browser rendering or user interaction, such as Cross-Site Scripting (XSS) or SQL injection. They generally ignore content that is hidden via CSS, stored in HTML comments, or embedded in metadata, operating under the assumption that invisible content cannot harm a human user. This assumption collapses when the consumer is an AI agent that reads the entire source code.

Furthermore, the attack surface extends beyond static HTML. User-generated content (UGC) platforms, e-commerce sites with dynamic product descriptions, and content management systems (CMS) often allow users to control attributes like image alt text or SVG uploads. Attackers exploit these fields to inject adversarial prompts. Additionally, sophisticated actors employ user-agent cloaking, serving benign content to human browsers and scanners while delivering malicious payloads specifically to known AI agent identifiers.

WOW Moment: Key Findings

The disparity between human visibility and AI ingestion creates a blind spot that traditional security controls cannot address. The following comparison highlights how different content types pose varying levels of risk based on their visibility and detectability by standard tools.

Content VectorHuman VisibilityAI Agent IngestionTraditional Scanner DetectionInjection Risk Level
Rendered Body TextHighHighHighLow
CSS Hidden (display:none)NoneHighNoneCritical
HTML CommentsNoneHighNoneHigh
Image Alt-TextLow (Accessibility)HighLowHigh
SVG Embedded TextNone (if styled)HighNoneCritical
UA-Specific PayloadsVariesHighNoneCritical

Why this matters: The data reveals that the highest-risk vectors are those with zero human visibility. Security teams relying on conventional scanners are effectively blind to the most dangerous attack surfaces. Mitigation requires a paradigm shift from "protecting the rendered view" to "securing the entire DOM for machine consumption."

Core Solution

Defending against DOM-based prompt injection requires a multi-layered approach combining content sanitization, multi-agent testing, and architectural hardening. The solution involves detecting hidden content, validating metadata, and ensuring consistency across different user agents.

Architecture Decisions

  1. DOM-Aware Parsing: Regex-based scanning is insufficient due to the nested structure of HTML and the variability of CSS properties. A robust solution must parse the DOM to evaluate computed styles, attribute values, and node types.
  2. Multi-Agent Crawling: To detect user-agent cloaking, the system must fetch the same URL using multiple user-agent strings representing major AI agents and huma

🎉 Mid-Year Sale — Unlock Full Article

Base plan from just $4.99/mo or $49/yr

Sign in to read the full article and unlock all 635+ tutorials.

Sign In / Register — Start Free Trial

7-day free trial · Cancel anytime · 30-day money-back