2026-05-25

GBrain vs Mnemosyne: Architecture, Not Benchmarks

Tags: hermes, memory, benchmarks, gbrain, mnemosyne

GBrain and Mnemosyne are both memory systems for Hermes Agent. They share zero architectural DNA. One is a markdown-backed knowledge graph with a TypeScript CLI. The other is an in-process Python library against a SQLite file. We ran both through the same 20-fact, 6-query workload to understand the architectural trade-offs, not to declare a winner.

Architectures

GBrain stores knowledge as markdown pages in a git repository. Pages have typed edges from wikilinks, tags, timelines, facts fences, and chunk-level code metadata. A PGLite database indexes everything for hybrid search: HNSW cosine similarity, tsvector keyword search, reciprocal rank fusion, and graph signal boosting. Entity extraction is regex-based (zero LLM tokens per write). The CLI spawns a bun process per operation. Published numbers on BrainBench: P@5 49.1%, R@5 97.9%, +31.4pp from graph signals.
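
Reciprocal rank fusion is the step that merges the vector and keyword result lists into one ranking. The sketch below shows the general technique only; it is not GBrain's implementation. The function name, the page slugs, and the constant k=60 (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of page slugs into one ranking.

    Each page scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, slug in enumerate(ranking, start=1):
            scores[slug] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by slug so the order is stable.
    return sorted(scores, key=lambda slug: (-scores[slug], slug))


# Hypothetical result lists from a vector search and a keyword search.
vector_hits = ["auth-migration", "middleware-fix", "onboarding-wizard"]
keyword_hits = ["middleware-fix", "rate-limiting", "auth-migration"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

A page that ranks well in both lists beats a page that ranks first in only one, which is why fusion tends to help when the two retrievers disagree.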

Mnemosyne stores facts directly in SQLite using sqlite-vec for vector search and FTS5 for full-text. Facts are structured triples with temporal versioning - old facts get superseded_at timestamps instead of being overwritten. The sleep consolidation cycle runs between sessions, extracting fact triples from conversation transcripts. The Python API (remember / recall) is the primary interface. Published numbers on BEAM at 100K scale: 65.2% overall, +70.8pp on multi-hop reasoning vs v2.5, 0.076ms reads.
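
Temporal versioning of this kind can be sketched with the standard-library sqlite3 module. The schema, column names other than superseded_at, and function signatures below are assumptions for illustration; they borrow the remember / recall verbs but are not Mnemosyne's actual API or storage layout.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Create a hypothetical facts table; a NULL superseded_at marks the current fact."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS facts (
               id INTEGER PRIMARY KEY,
               subject TEXT NOT NULL,
               predicate TEXT NOT NULL,
               object TEXT NOT NULL,
               created_at TEXT NOT NULL,
               superseded_at TEXT
           )"""
    )
    return conn


def remember(conn, subject, predicate, obj, now=None):
    """Store a triple, marking any current fact for the same key as superseded."""
    now = now or datetime.now(timezone.utc).isoformat()
    with conn:  # supersede and insert commit together as one transaction
        conn.execute(
            "UPDATE facts SET superseded_at = ? "
            "WHERE subject = ? AND predicate = ? AND superseded_at IS NULL",
            (now, subject, predicate),
        )
        conn.execute(
            "INSERT INTO facts (subject, predicate, object, created_at) "
            "VALUES (?, ?, ?, ?)",
            (subject, predicate, obj, now),
        )


def recall(conn, subject, predicate):
    """Return the current object for a subject/predicate pair, or None."""
    row = conn.execute(
        "SELECT object FROM facts "
        "WHERE subject = ? AND predicate = ? AND superseded_at IS NULL",
        (subject, predicate),
    ).fetchone()
    return row[0] if row else None


def history(conn, subject, predicate):
    """Return every version of a fact, oldest first."""
    return conn.execute(
        "SELECT object, created_at, superseded_at FROM facts "
        "WHERE subject = ? AND predicate = ? ORDER BY id",
        (subject, predicate),
    ).fetchall()


# Illustrative facts: a cache is introduced, then removed two days later.
conn = open_store()
remember(conn, "project", "cache", "redis", now="2026-05-22T00:00:00+00:00")
remember(conn, "project", "cache", "none", now="2026-05-24T00:00:00+00:00")
print(recall(conn, "project", "cache"))
print(history(conn, "project", "cache"))
```

Because the old row is kept with a timestamp rather than overwritten, temporal and contradiction queries can be answered from the history instead of from the latest value alone.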

The Workload

Five simulated agent sessions across four days, 20 facts total:

Session 1: project setup (codebase location, stack, deployment)
Session 2: bug fix (middleware redirect loop, token check, error boundary)
Session 3: feature work (3-step onboarding wizard, Zod validation, Supabase auth)
Session 4: infrastructure (database migration, CI/CD, Sentry, rate limiting)
Session 5: refactor (SSR package migration, cache removal, middleware extraction)

Six queries tested different recall patterns: single-fact, multi-hop, temporal, contradiction, keyword, and recency.

Operation Latency

Operation	Mnemosyne	GBrain (CLI)	Ratio
Write (mean, 20 ops)	2.4ms	252ms	104x
Read (mean, 6 queries)	2.0ms	242ms	119x
Read P50 (100 iterations)	0.54ms	235ms	435x
Source of overhead	Python function call	bun CLI spawn + PGLite connect	

The latency gap is primarily CLI process spawn overhead. GBrain's MCP server path (persistent process) would eliminate the spawn cost, but the benchmark measured the CLI path used in development and one-shot queries. The ~250ms per operation includes bun binary startup, TypeScript runtime init, PGLite connection, query execution, and output formatting.
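
The effect of process spawn on per-operation latency can be reproduced without either system. The sketch below is not the benchmark script: it times a trivial in-process call against spawning a fresh interpreter that does nothing, both of which are stand-ins chosen for illustration. The helper name and iteration counts are assumptions.

```python
import statistics
import subprocess
import sys
import time


def time_operation(operation, iterations=20):
    """Run `operation` repeatedly; return mean and median (P50) latency in ms."""
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples_ms),
        "p50_ms": statistics.median(samples_ms),
    }


def in_process_call():
    # Stand-in for a library call: a dictionary lookup in the current process.
    return {"codebase": "/root/projects/mc"}["codebase"]


def spawned_call():
    # Stand-in for a one-shot CLI: start a fresh interpreter that does nothing.
    subprocess.run([sys.executable, "-c", "pass"], check=True)


in_process = time_operation(in_process_call)
spawned = time_operation(spawned_call, iterations=5)

# Guard against a zero denominator on clocks with coarse resolution.
ratio = spawned["mean_ms"] / max(in_process["mean_ms"], 1e-9)
print(in_process, spawned, f"{ratio:.0f}x")
```

Even an empty child process costs far more than an in-process call, so a CLI-per-operation design pays a fixed startup cost before any query work begins.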

Recall Quality

Query	Mnemosyne Hits	GBrain Hits
"project codebase location"	5	0
"supabase auth ssr package migration"	5	0
"redis cache may 22 may 24"	5	0
"supabase client REST ssr access"	5	0
"middleware redirect loop"	1	0
Overall	26/30	0/30

GBrain's keyword search returned zero results on the benchmark pages because the tsvector index had not caught up between gbrain put and gbrain search within the benchmark's execution window. This is a cold-index issue, not a retrieval failure: in steady-state operation, GBrain's keyword search returns matching pages (confirmed on a manually-written test page). The 0/30 figure therefore reflects the benchmark setup rather than GBrain's recall quality. Accurate recall numbers would require a warm index, which did not materialize during the benchmark run.
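
One way a benchmark harness might avoid measuring a cold index is to poll the search path until a freshly written page becomes visible, then start the timed queries. The sketch below is a hypothetical helper, not part of GBrain or the benchmark script. The search callable is injected, and the fake index only simulates delayed visibility. Whether waiting alone would have warmed the index in this run is not established.

```python
import time


def wait_until_indexed(search, query, timeout_s=5.0, interval_s=0.1, sleep=time.sleep):
    """Poll `search(query)` until it returns results or the timeout elapses.

    Returns the last result list, which is empty if the index never warmed up.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        results = search(query)
        if results or time.monotonic() >= deadline:
            return results
        sleep(interval_s)


class FakeIndex:
    """Illustrative backend that returns nothing for the first few searches."""

    def __init__(self, warm_after):
        self.calls = 0
        self.warm_after = warm_after

    def search(self, query):
        self.calls += 1
        return ["middleware-fix"] if self.calls > self.warm_after else []


index = FakeIndex(warm_after=2)
hits = wait_until_indexed(index.search, "middleware redirect loop", interval_s=0.01)
print(hits, index.calls)
```

Separating the warm-up wait from the timed section keeps index propagation delay out of both the latency and the recall numbers.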

Fixes Required for the GBrain Setup

Three fixes were necessary to get GBrain working with a LiteLLM-proxied Nomic embed model: two source patches and one update to the commands the benchmark calls.

isAvailable() logic bug (src/core/ai/gateway.ts:642): The condition (recipe.id === 'litellm' || isUserProvided) causes the function to return false for litellm recipes when it should return true. Fix: (!isUserProvided && recipe.id !== 'litellm').
Missing dims_options in litellm recipe (src/core/ai/recipes/litellm-proxy.ts:27): Without dims_options: [768], the init dimension validator rejects models that don't support Matryoshka-style dimension selection - including Nomic embed. The litellm recipe needs explicit dimension allow-listing.
Command API changes: gbrain ingest was renamed to gbrain put <slug> (with stdin for content). gbrain query (hybrid) requires embedding generation per query. gbrain search (keyword-only) does not - and returns results without an embedding provider.
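
The relationship between the original isAvailable() condition and its fix can be checked with a truth table. The sketch below transcribes both expressions into Python; the recipe id "other" is a placeholder for any non-litellm recipe. How the surrounding function uses the condition is not shown, so this only demonstrates that the fix is the exact logical negation of the original (De Morgan's law).

```python
from itertools import product


def original_condition(recipe_id, is_user_provided):
    # Python transcription of: recipe.id === 'litellm' || isUserProvided
    return recipe_id == "litellm" or is_user_provided


def patched_condition(recipe_id, is_user_provided):
    # Python transcription of: !isUserProvided && recipe.id !== 'litellm'
    return (not is_user_provided) and recipe_id != "litellm"


rows = []
for recipe_id, is_user_provided in product(["litellm", "other"], [False, True]):
    rows.append(
        (
            recipe_id,
            is_user_provided,
            original_condition(recipe_id, is_user_provided),
            patched_condition(recipe_id, is_user_provided),
        )
    )

for row in rows:
    print(row)
```

Every row flips between the two columns, so the patch inverts the branch for litellm recipes and for user-provided configurations alike.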

Cost Model

Dimension	Mnemosyne	GBrain
Per-write token cost	$0	$0
Per-read token cost	$0	$0 (keyword), varies by embedding provider (hybrid)
Embedding	On-the-fly via sqlite-vec	Separate step via embedding provider
Storage	Single SQLite file	Markdown files + PGLite + vectors
Setup	`pip install`	bun install + init + embedding config + patches

When to Use Which

Mnemosyne fits when latency is the primary constraint, the memory model is fact-based, and operational simplicity matters. Zero dependencies beyond sqlite-vec. Sub-millisecond recall of structured triples with temporal versioning. If the agent needs to remember that "the codebase lives at /root/projects/mc" and recall it in under a millisecond, Mnemosyne is the direct path.

GBrain fits when the memory model is page-based and the value is in structured knowledge with typed edges, graph signals, and markdown ownership. If the agent accumulates hundreds of pages across weeks and needs graph-boosted retrieval with published benchmarks (BrainBench: P@5 49.1%, R@5 97.9%), GBrain's architecture supports that. The operational overhead is real - embedding provider, cron maintenance, source patches, more storage - but the published +31.4pp gain from graph signals suggests that retrieval quality on mature brains can exceed what a pure vector or keyword store delivers. This benchmark did not measure that.

These systems are not competitors. They target different points on the complexity-latency spectrum. The right choice follows from the shape of the agent's memory workload: fact-structured or page-structured.

[^1]: GBrain repository. Garry Tan. github.com/garrytan/gbrain. v0.41.2.0.

[^2]: Mnemosyne repository. AxDSan. github.com/AxDSan/Mnemosyne. v3.0.0.

[^3]: Benchmark script. github.com/underdown/catlabs. /tmp/memory_benchmark.py.