ADR-0005: Validate inbound email attachments by content sniffing
ACCEPTED
Context
With inbound attachments persisted (ADR-0004), the bot accepts arbitrary files from anyone who can email a public address. Both the email Content-Type header and the filename extension are sender-controlled and spoofable. We need to keep executables, installers, and disk images out of team storage and the LLM's view, without an exhaustive allowlist that rejects legitimate but unusual types.
ADR-0002's no-auto-reply stance for the email channel (to avoid bounce loops) also applies when an attachment is rejected.
Decision
We will sniff each attachment's canonical type from its bytes with python-magic and reject dangerous files, with no bounce reply:
- Check three signals — filename extension, claimed header type, and magic-detected type — against a shared denylist of executables, installers, and disk images; any single hit rejects the file.
- Reject cross-category lies where the claimed and detected top-level types disagree, exempting application/octet-stream on either side (genuine "unknown") and a curated allowlist of textual application/* types (JSON, XML, YAML, …) that magic reports as text/plain. Script types are excluded from that allowlist, so "I'm a CSV but actually a shell script" fails.
- Store the magic-detected type as the canonical content_type on the File.
- On rejection, append a bracketed note per attachment to message_text so the LLM can tell the user; do not bounce.
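The checks above could be sketched roughly as follows. This is a minimal illustration, not the bot's actual implementation: the denylist and allowlist entries, the function names (sniff_type, rejection_reason, rejection_note), and the note format are all assumptions made for the example. The only python-magic call used is magic.from_buffer(data, mime=True).

```python
import os
from typing import Optional

# Illustrative entries only; the ADR does not enumerate the real lists.
DENIED_EXTENSIONS = {".exe", ".msi", ".dmg", ".iso"}
DENIED_TYPES = {
    "application/x-dosexec",
    "application/x-msdownload",
    "application/x-apple-diskimage",
    "application/x-iso9660-image",
}
# Textual application/* types that libmagic may report as text/plain.
TEXTUAL_APPLICATION_TYPES = {"application/json", "application/xml", "application/yaml"}
UNKNOWN = "application/octet-stream"


def sniff_type(data: bytes) -> str:
    """Detect the MIME type from the attachment bytes via python-magic."""
    import magic  # lazy import: the decision logic below does not need libmagic

    return magic.from_buffer(data, mime=True)


def rejection_reason(filename: str, claimed: str, detected: str) -> Optional[str]:
    """Return a human-readable reason if the file is rejected, else None."""
    ext = os.path.splitext(filename)[1].lower()
    claimed = claimed.split(";")[0].strip().lower()  # drop header parameters
    detected = detected.strip().lower()

    # Signal 1: filename extension against the denylist.
    if ext in DENIED_EXTENSIONS:
        return f"extension {ext} is on the denylist"
    # Signals 2 and 3: claimed and detected types against the same denylist.
    for label, value in (("claimed", claimed), ("detected", detected)):
        if value in DENIED_TYPES:
            return f"{label} type {value} is on the denylist"

    # Cross-category check, with the two exemptions described above.
    if UNKNOWN in (claimed, detected):
        return None
    if claimed in TEXTUAL_APPLICATION_TYPES and detected == "text/plain":
        return None
    if claimed.split("/")[0] != detected.split("/")[0]:
        return f"claimed {claimed} but content looks like {detected}"
    return None


def rejection_note(filename: str, reason: str) -> str:
    """Format the bracketed note appended to message_text (hypothetical format)."""
    return f"[attachment {filename!r} skipped: {reason}]"
```

A caller would pass sniff_type(data) as the detected argument, store that value as content_type when rejection_reason returns None, and otherwise append rejection_note(...) to message_text. Keeping the decision logic separate from the libmagic call makes it testable without the native library installed.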
Consequences
- Spoofed extensions and headers cannot smuggle executables past the filter; the stored content_type reflects the real type.
- A denylist keeps unusual-but-benign types working; only a curated dangerous set is blocked.
- No bounce mail, consistent with ADR-0002; the user learns the reason via the LLM.
- Negative: python-magic (libmagic) becomes a runtime dependency, and its heuristic detection can misclassify, occasionally rejecting a legitimate file on a category mismatch.
- Negative: the text-like allowlist needs maintenance as new textual application/* types appear.
- Negative: skip reasons reach the user only if the LLM relays them; there is no guaranteed out-of-band notification.
Alternatives considered
- Trust the email Content-Type header or filename extension → rejected; both are sender-controlled and spoofable.
- Allowlist of permitted types → rejected; too restrictive and needs constant expansion.
- Bounce or reply on rejection → rejected; risks bounce loops (per ADR-0002) and a message_text note is simpler.