7. Security Considerations

This specification defines a data structure only. It does not mandate sanitization rules, output delimiters, or content filtering.

Implementers and consumers should note:

All email content — including subject, sender names, and body fields at every level — originates from an untrusted external source.
The forAI field reduces noise but does not sanitize for prompt injection. An adversary can craft email content designed to manipulate an AI system that processes it as instructions.
Safe usage of any content.* field with an LLM is the responsibility of the consuming application.
Implementations are encouraged to offer an optional scanning layer and attach findings as metadata outside this core schema. This specification does not define that layer.
content.rawFull in particular MUST be treated as fully untrusted input if re-parsed downstream.
content.html is live, attacker-influenced markup, not just an LLM-injection vector. It commonly contains remote-resource references (<img src>, tracking pixels, remote CSS url()). A consuming application that renders content.html directly, or that eagerly fetches URLs found in it (e.g. for a link-preview feature), is exposed to two risks. The first is SSRF: the fetch can target internal or private network addresses reachable from the fetching service. The second is the classic email tracking-pixel privacy leak: fetching the URL confirms to the sender that the message was opened, and roughly where. Consumers rendering content.html SHOULD do so in a sandboxed context (e.g. a sandboxed iframe with remote image loading disabled by default) and SHOULD NOT fetch URLs discovered in email content server-side without the same allow-listing and network-egress controls applied to any other feature that fetches untrusted URLs.
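
The network-egress concern in the last item can be illustrated with a pre-fetch check. The following is a minimal Python sketch and is not part of this specification. It assumes a consumer wants to vet a URL found in email content before fetching it server-side. The names is_public_address, is_safe_to_fetch, and resolve_host are hypothetical, and the policy shown (http/https only, every resolved address must be globally routable) is one assumed choice, not a requirement.

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit

# Assumption: only plain web schemes are ever fetched.
ALLOWED_SCHEMES = {"http", "https"}


def resolve_host(host: str) -> Iterable[str]:
    """Default resolver (hypothetical helper): all addresses for a host name."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_public_address(addr: str) -> bool:
    """True only for globally routable, non-multicast IP addresses."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # not a parseable IP address
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) before classifying.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip.is_global and not ip.is_multicast


def is_safe_to_fetch(
    url: str, resolver: Callable[[str], Iterable[str]] = resolve_host
) -> bool:
    """Reject URLs whose scheme is unexpected or whose host resolves to
    any loopback, private, link-local, or otherwise non-public address."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    try:
        addresses = list(resolver(host))
    except (OSError, ValueError):
        return False  # resolution failed, so treat the URL as unsafe
    # Every resolved address must be public; one internal address is enough
    # to reject the URL.
    return bool(addresses) and all(is_public_address(a) for a in addresses)
```

A check like this is necessary but not sufficient. The fetch itself should connect to the address that was vetted, because resolving the name a second time reopens the door to DNS rebinding, and every redirect target needs the same check. It also does nothing about the tracking-pixel leak: a fetch to a public address still tells the sender the message was processed.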