Lichen Research — Ottawa, Canada
Before your AI system can improve, you need to know exactly where it fails — and why. We build the empirical benchmarks and ablation studies that answer those questions.
Findings
These numbers came out of building and red-teaming a deployed AI agent over 120 days. They aren't claims — they're benchmark results, reproducible and documented.
Finding 01
In our deployed agent, retrieval quality accounted for more variance in output quality than the underlying model's capability. The same model at the same task scored 45% with standard vector retrieval and 78% with neuroplastic recall — without changing a single model weight.
+33 points: accuracy gain from retrieval alone, same model
Finding 02
Standard memory systems retrieve by similarity. Under adversarial conditions they retrieve confidently wrong answers. Memories linked by Hebbian co-activation pathways develop lateral inhibition — semantically similar but contextually wrong memories suppress each other at recall time.
100% adversarial recall accuracy on LoCoMo (47/47 questions)
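The inhibition mechanism can be sketched in a few lines. This is an illustrative toy, not our implementation: the `Memory` class, the `coactivated` set, and the `inhibition` constant are all hypothetical stand-ins for the Hebbian pathway machinery.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    similarity: float                               # similarity to the query
    coactivated: set = field(default_factory=set)   # contexts linked by co-activation

def recall(candidates, query_context, inhibition=0.5):
    """Rank candidates with lateral inhibition: a memory that is NOT
    co-activated with the query context is suppressed in proportion to
    how strongly a contextually grounded rival scores."""
    scores = {}
    for i, m in enumerate(candidates):
        score = m.similarity
        for j, rival in enumerate(candidates):
            if (j != i
                    and query_context in rival.coactivated
                    and query_context not in m.coactivated):
                # the similar-but-contextually-wrong memory loses score
                score -= inhibition * rival.similarity
        scores[i] = score
    return sorted(range(len(candidates)), key=scores.get, reverse=True)
```

In this sketch a high-similarity but contextually wrong memory is outranked by a slightly lower-similarity memory that shares a co-activation pathway with the query context, which is the failure mode the finding describes.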
Finding 03
Most AI memory benchmarks test single-session recall. The LoCoMo benchmark tests across 10 conversations, 1,986 questions, multiple categories. The field's best public systems plateau around 86–92%. The hardest category — multi-hop reasoning across memory — still sits below 80% for every public system.
our baseline on the full LoCoMo run (1,986 questions, local 27B model)
Finding 04
When we hold the retrieval pipeline fixed and swap only the language model, per-category accuracy shifts in a predictable pattern. Frontier-tier models close most of the remaining gap on categories a local model misses. Retrieval decides what the model gets to see; the model decides what it does with it. The decomposition is ongoing; full results will appear with our NeurIPS 2026 submission.
Research
Our first paper documents the neuroplastic memory system behind the findings above. A second paper extending the decomposition in Finding 04 is in preparation for NeurIPS 2026.
CCN 2026 — Extended Abstracts — New York City, August 2026
We describe an AI agent memory system grounded in Hebbian learning and spreading activation. Memories that co-occur in useful recalls strengthen their connections. Memories unused over time decay. The result is a retrieval system that learns from its own history — without retraining or fine-tuning.
Preprint available on request to kai@lichenresearch.ai.
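A toy version of the update rule the abstract describes: edges between memories that co-occur in a useful recall are strengthened, and every edge decays each step. The function name, learning rate, and decay constant are illustrative, not from the paper.

```python
from itertools import combinations

def update_edges(edges, recalled_together, lr=0.1, decay=0.99):
    """edges: dict mapping (memory_a, memory_b) -> connection weight.
    Hebbian strengthening on co-recall, passive decay everywhere else."""
    for key in edges:
        edges[key] *= decay    # unused pathways fade over time
    for a, b in combinations(sorted(recalled_together), 2):
        key = (a, b)
        w = edges.get(key, 0.0)
        # co-recall pushes the weight toward 1.0, bounded and retraining-free
        edges[key] = w + lr * (1.0 - w)
```

Nothing here touches model weights: the graph of connection strengths is the only thing that learns, which is what lets the system improve from its own history without fine-tuning.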
Method
Most AI memory work is engineering work: how to store, index, and retrieve faster. We approach it as a measurement problem first.
Products & Services
Open-source library: sanitized Hebbian memory, spreading activation, RRF retrieval, TReMu temporal disambiguation. Apache 2.0.
github.com/Lichen-Research-Inc/moss →
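RRF itself is a standard fusion formula: a document's fused score is the sum of 1/(k + rank) over every input ranking it appears in. A minimal sketch of that formula (not moss's actual API):

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion. rankings: list of ranked doc-id lists.
    Returns a single fused ranking, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            # each list contributes 1/(k + rank); k damps the top-rank bonus
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The appeal of RRF is that it fuses rankers with incomparable score scales (e.g. vector similarity and BM25) using ranks alone.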
The collaborative memory platform. An AI partner that grows hyphal pathways through your conversations — structurally coupled, long-horizon, symbiotic.
Private Preview
Benchmark your AI agent's memory across long-context, adversarial, and multi-hop categories. You'll know exactly where it fails and why — not just a score.
Controlled experiments to isolate which component of your system is responsible for a given behaviour. Built for teams preparing NeurIPS or ICML submissions.
Contact
Thirty minutes. No pitch — an honest assessment of fit.
Engagement scope is determined in the consultation. We price by complexity and outcome, not templates.
Lichen Research is led by Kai Avery — two decades of pattern-recognition work in high-consequence operational environments, now turned toward AI memory research. Details on request.
Receive findings and paper updates. No pitch, no cadence — only when there's something real to share.