2026-05-27

Hermes Agent Ships Security-Guidance Plugin With 25 Pattern-Matched Vulnerability Rules

hermes, security, plugins, agent-safety, code-review

On May 26, ClaudeDevs shipped a security-guidance plugin for Claude Code. Less than 24 hours later, Hermes Agent merged a port of the same plugin, verbatim pattern data and all, into its own plugin system. The plugin catches 25 classes of dangerous code patterns at write time -- no LLM calls, no API roundtrips, just regex and substring matching that runs locally in milliseconds.

Anthropic reported a 30-40% drop in security-related PR comments during internal rollout of the original Claude Code version. Hermes Agent's port brings that same Layer 1 defense to agent-authored code.

How it works

The plugin hooks into three Hermes Agent tool calls: write_file, patch, and skill_manage (in write/patch modes). When the agent writes code to disk, the plugin scans the content against a pre-compiled set of regex and substring patterns before the write completes.
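A minimal sketch of that scan step, assuming a rule table where each rule carries either a plain substring or a pre-compiled regex. The `Rule` class, the `scan` function, and the two sample rules are hypothetical illustrations, not the upstream pattern data or the plugin's actual API.

```python
import re
from dataclasses import dataclass
from typing import Optional, Pattern


@dataclass(frozen=True)
class Rule:
    name: str
    substring: Optional[str] = None       # cheap plain-text check
    regex: Optional[Pattern[str]] = None  # compiled once, at import time


# Hypothetical rules for illustration only; the real data comes from upstream.
RULES = [
    Rule("os-system", substring="os.system("),
    Rule("pickle-load", regex=re.compile(r"\bpickle\.loads?\(")),
]


def scan(content: str) -> list[str]:
    """Return the names of every rule that matches the content."""
    hits = []
    for rule in RULES:
        if rule.substring is not None and rule.substring in content:
            hits.append(rule.name)
        elif rule.regex is not None and rule.regex.search(content):
            hits.append(rule.name)
    return hits
```

Because the rules are compiled once and matching is purely local, a scan like this costs no model tokens and adds little latency to each write.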

There are two operating modes:

Mode	Trigger	Behavior
Warn (default)	Pattern match on write	File is written. Warning appended to tool result for agent self-correction on next turn.
Block	`SECURITY_GUIDANCE_BLOCK=1`	Write is refused entirely. Warning returned as block reason.

The warn-by-default design is intentional. Pattern matching has a non-trivial false-positive rate -- a naive eval( match trips on redis.eval() (the shipped rule uses a lookbehind to avoid this), yaml.load() inside a SafeLoader wrapper will still match, and ECB mode inside a test fixture looks identical to ECB mode in production. Warning lets the agent self-correct on the next turn without blocking legitimate work.
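The difference between the two modes can be sketched as a single decision on the scan results. `WriteOutcome` and `apply_mode` are hypothetical names, and the message format is an assumption; only the warn/block behavior itself comes from the table above.

```python
from dataclasses import dataclass


@dataclass
class WriteOutcome:
    written: bool  # did the file reach disk?
    message: str   # text returned to the agent as the tool result


def apply_mode(findings: list[str], block: bool) -> WriteOutcome:
    """Turn scan findings into a tool result for warn or block mode."""
    if not findings:
        return WriteOutcome(written=True, message="ok")
    warning = "security-guidance: " + ", ".join(findings)
    if block:
        # Block mode: refuse the write and return the warning as the reason.
        return WriteOutcome(written=False, message=warning)
    # Warn mode: the write goes through; the warning rides along in the result.
    return WriteOutcome(written=True, message="ok\n" + warning)
```

In warn mode a false positive costs the agent one extra line of tool output; in block mode it costs a failed write, which is why blocking is opt-in.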

What it catches: 25 rules across 8 categories

The pattern data is forked verbatim from Anthropic's claude-plugins-official repository (commit 0bde168, Apache 2.0). NousResearch wrote the Hermes-side wiring in __init__.py; the pattern rules themselves are byte-for-byte identical to upstream.

Category	Patterns detected
Unsafe deserialization	`pickle.load`, `cloudpickle`, `dill`, `marshal.loads`, `shelve.open`, `yaml.load` (no SafeLoader), `torch.load` (no `weights_only=True`), `joblib.load`, `pandas.read_pickle`, `numpy.load(allow_pickle=True)`
Command injection	`os.system`, `subprocess` calls with `shell=True`, JS `child_process.exec`, Go `exec.Command("sh"...)`
Code injection	Bare `eval(`, JS `new Function(...)`
XSS sinks	`.innerHTML =`, `.outerHTML =`, `.insertAdjacentHTML(`, `document.write`, React `dangerouslySetInnerHTML`
Crypto footguns	AES ECB mode, Node `crypto.createCipher` (no IV), TLS verification disabled (`verify=False`, `rejectUnauthorized: false`, `InsecureSkipVerify: true`)
XXE	`xml.etree`, `minidom`, `xml.sax` without `defusedxml`
Supply chain	`<script src="https://...">` without `integrity=` SRI hash
CI/CD injection	GitHub Actions workflow files using `${{ github.event.* }}` interpolated into `run:`
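For two of the rows above, the standard library already offers a safer counterpart that would not trip the rule. The sketch below is illustrative only: the helper names are hypothetical, and the suggestions are general Python practice rather than text taken from the plugin's warnings.

```python
import ast
import subprocess
import sys


def parse_literal(text: str):
    # ast.literal_eval accepts only Python literals (numbers, strings, lists,
    # dicts, ...), so it cannot execute arbitrary code the way bare eval( can.
    return ast.literal_eval(text)


def run_echo(arg: str) -> str:
    # Passing an argument list without shell=True means arg arrives as a
    # single argv entry and is never interpreted by a shell.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

A string such as `a; echo b` passed to `run_echo` comes back unchanged instead of being split into two shell commands.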

Minimizing false positives

The plugin uses several strategies to keep the false-positive rate manageable:

Path-gating. Python-only rules (like pickle.load) skip .js, .ts, and .vue files. JavaScript rules (like .innerHTML) skip .py files. Documentation files (.md, .txt, .rst, .json, .yaml) are skipped by all rules -- pattern matching in docs produces noise without security value.
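Path-gating of this kind might look like the following sketch. The suffix sets mirror the extensions named above; the `rule_applies` function, the `rule_lang` labels, and the choice to scan files with unlisted suffixes are assumptions made for illustration.

```python
from pathlib import PurePosixPath

DOC_SUFFIXES = {".md", ".txt", ".rst", ".json", ".yaml"}
JS_SUFFIXES = {".js", ".ts", ".vue"}
PY_SUFFIXES = {".py"}


def rule_applies(rule_lang: str, path: str) -> bool:
    """Decide whether a rule tagged 'python' or 'javascript' should scan path."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in DOC_SUFFIXES:
        return False  # documentation is skipped by every rule
    if rule_lang == "python" and suffix in JS_SUFFIXES:
        return False
    if rule_lang == "javascript" and suffix in PY_SUFFIXES:
        return False
    # Assumption: anything not explicitly skipped is still scanned.
    return True
```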

Lookbehind assertions. The eval( rule uses a regex lookbehind to exclude method calls. model.eval() and redis.eval() are not eval() calls -- they are method invocations on objects. The pattern (?<![a-zA-Z_.])eval\s*\( catches bare eval( while skipping thing.eval(.
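The pattern quoted above can be exercised directly with Python's `re` module. The `has_bare_eval` wrapper is a hypothetical helper for demonstration; only the regex itself comes from the plugin description.

```python
import re

# Negative lookbehind: the character before "eval" must not be a letter,
# an underscore, or a dot, which rules out method calls and longer names.
BARE_EVAL = re.compile(r"(?<![a-zA-Z_.])eval\s*\(")


def has_bare_eval(source: str) -> bool:
    return BARE_EVAL.search(source) is not None
```

With this pattern, `x = eval(user_input)` matches, while `model.eval()`, `redis.eval(script, 0)`, and `ast.literal_eval(s)` do not, since each has a dot or underscore immediately before `eval`.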

Content size cap. Files larger than 256 KB are skipped entirely. Pattern matching a 10 MB blob has poor signal-to-noise and would slow the agent loop.

Error result exclusion. If a tool call already returned an error, the plugin does not append a security warning on top of it. The agent has bigger problems to deal with.
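The size cap and the error exclusion amount to two cheap guards that run before any pattern is tried. The sketch below folds them into one predicate for brevity; the real hook may apply them at different points. `should_scan` is a hypothetical name, and treating 256 KB as 256 * 1024 bytes of UTF-8 is an assumption.

```python
MAX_SCAN_BYTES = 256 * 1024  # assumption: KB means 1024 bytes


def should_scan(content: str, tool_result_is_error: bool) -> bool:
    """Return False when scanning would add noise rather than signal."""
    if tool_result_is_error:
        # The write already failed; do not stack a warning on top of the error.
        return False
    # Files larger than the cap are skipped entirely.
    return len(content.encode("utf-8")) <= MAX_SCAN_BYTES
```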

Architecture: Layer 1 of 3

Anthropic's security-guidance design has three layers:

Pattern match (this port). Runs locally at write time. Zero LLM tokens. Catches the most common, mechanically detectable patterns.
LLM diff review. Spawns a cheap auxiliary model on every turn that touched files to review the diff for subtler issues.
Agentic commit review. On git commit, spawns an SDK subagent with read/grep/glob tools to trace data flow through the changed files.

This port implements only Layer 1. The Hermes Agent README notes that Layers 2 and 3 are follow-up work -- Hermes Agent can already run those kinds of reviews on demand via delegate_task, but automated per-turn review with a cheap auxiliary model would require explicit wiring.

Enabling the plugin

Plugins in Hermes Agent are opt-in. To enable security-guidance:

hermes plugins enable security-guidance

Or edit ~/.hermes/config.yaml:

plugins:
  enabled:
    - security-guidance

The kill switch is SECURITY_GUIDANCE_DISABLE=1. For stricter environments where unsafe patterns are policy violations, SECURITY_GUIDANCE_BLOCK=1 refuses writes entirely instead of warning.
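How the two environment variables might resolve into a single operating mode is sketched below. The variable names come from the text above; the `resolve_mode` function, the rule that only the value `1` counts, and the choice to let the kill switch win over block mode are all assumptions.

```python
import os
from typing import Mapping


def resolve_mode(env: Mapping[str, str] = os.environ) -> str:
    """Map the environment to one of 'off', 'block', or 'warn'."""
    if env.get("SECURITY_GUIDANCE_DISABLE") == "1":
        return "off"    # assumption: the kill switch takes precedence
    if env.get("SECURITY_GUIDANCE_BLOCK") == "1":
        return "block"
    return "warn"       # default when neither variable is set
```

Taking the environment as a parameter keeps the function easy to test with a plain dictionary instead of mutating the real process environment.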

What this means

Agent-authored code is growing as a proportion of total code written. Every major coding agent -- Claude Code, Codex, Hermes Agent, Cursor -- generates code that lands in production repositories. Pattern-matched security warnings at write time are a low-cost first line of defense. They catch what static analysis would eventually catch, but they catch it while the agent still has context and can fix the issue in the same conversation.

The port from Claude Code to Hermes Agent took under 24 hours because the pattern data is portable and the plugin hooks are standard. As more agents adopt the same pattern library, cross-platform security coverage improves without any model changes.

[^1]: ClaudeDevs. "security-guidance plugin for Claude Code." X. May 26, 2026.
[^2]: NousResearch. "plugins: add security-guidance -- pattern-matched warnings on dangerous code writes." Hermes Agent PR #33131. May 27, 2026.
[^3]: Anthropic. "claude-plugins-official: security-guidance." GitHub. Apache 2.0.