Language models that know when they're wrong.
Biological systems sense the position of their own limbs without looking, through specialized neurons called proprioceptors. Our Proprioceptors are patented probe systems built on the same idea: lightweight neural networks that attach to a frozen language model's hidden states and read its behavioral state directly (hedging, sycophancy, hallucination risk, persona drift) before any of it reaches the output. Language models can't sense themselves natively; they're behaviorally deafferented. We build the missing sense.
Behavioral consistency, not raw capability, is the unsolved problem in modern AI. It requires self-awareness.
Touch your nose with your eyes closed. You just performed an act no language model can do: you sensed the position of your own body without observing it. This sense — proprioception — is what coordinates intentional behavior in biological systems.
Today's language models are behaviorally deafferented. They generate hedges, sycophancy, hallucinations, and personality drift without any internal signal that this is happening. Their outputs are observable, but their behavioral state is not.
We're building that missing channel: lightweight probes that read hidden activations and surface what the model is about to do — before the token is sampled, before the user sees the response, before the failure becomes a fact in the world.
Novel methods to detect, steer, and route your model's behavior.
Three layers of one signal: the network, the activations, and the routing fibers that bind them.
Behavioral state is not a property of any single component. It emerges from the relationships — between fibers, between neurons, between the patterns that fire together when the model is about to hedge, hallucinate, or break character.
Our work isolates these patterns and makes them addressable: read them, intervene on them, route around them. The same signal, three faces.
Detect
Read behavioral state directly from hidden activations using lightweight linear probes — orders of magnitude cheaper than LLM-as-judge, fast enough for production inference loops.
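A minimal sketch of what a linear probe on hidden activations can look like, assuming per-token hidden-state vectors have already been captured from a frozen model and labeled 1 (behavior present) or 0 (absent). Plain logistic regression is a stand-in here, and the names `train_linear_probe` and `probe_score` are hypothetical; the page does not specify how its probes are trained.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_score(w: Sequence[float], b: float, h: Sequence[float]) -> float:
    """Probability that hidden state h carries the target behavior."""
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def train_linear_probe(
    states: List[Sequence[float]],
    labels: List[int],
    lr: float = 0.5,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Fit a logistic-regression probe with full-batch gradient descent.

    states: hidden-state vectors captured from a frozen model, one per token.
    labels: 1 if that token is annotated with the behavior, else 0.
    The language model itself is never updated; only w and b are learned.
    """
    dim, n = len(states[0]), len(states)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for h, y in zip(states, labels):
            err = probe_score(w, b, h) - y  # gradient of log-loss w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * h[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy usage: four 3-d "hidden states"; the first two are labeled as hedging.
states = [[1.0, 0.2, 0.1], [0.9, -0.1, 0.0], [-1.0, 0.1, 0.2], [-0.8, 0.0, -0.1]]
labels = [1, 1, 0, 0]
w, b = train_linear_probe(states, labels)
print(probe_score(w, b, [1.0, 0.0, 0.0]), probe_score(w, b, [-1.0, 0.0, 0.0]))
```

Once trained, scoring a token is a single dot product plus a sigmoid, which is why probes of this kind are cheap next to running a second model as a judge.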
Architecture-independent behavioral encoding
Behavioral patterns are encoded in hidden states across transformer and state-space architectures alike: peak separation ratios of 1,376× on Qwen-3B and 999× on Falcon-Mamba, versus 2–5× in prior probing work.
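The page does not say how its separation ratios are computed. As one common way to quantify how cleanly a probe separates two behavior classes, here is a Fisher-style ratio (squared gap between class means over the summed within-class variances) applied to probe scores. Treat it as an illustrative assumption, not necessarily the metric behind the figures above; `fisher_separation` is a hypothetical name.

```python
from statistics import fmean, pvariance
from typing import Sequence


def fisher_separation(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Fisher-style separation between two sets of 1-D probe scores.

    Squared gap between the class means divided by the summed within-class
    (population) variances. Larger values mean less overlap between classes.
    """
    gap = fmean(pos) - fmean(neg)
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # Both classes are constant: infinitely separated unless they coincide.
        return float("inf") if gap else 0.0
    return gap * gap / spread


print(fisher_separation([1.0, 1.2, 0.8], [-1.0, -1.2, -0.8]))  # about 75
```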
Real-time hedging & sycophancy detection
Per-token probes flag the onset of hedging language and agreement bias before the next token is sampled. They run alongside the model's forward pass at negligible cost.
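A sketch of per-token streaming, assuming some hook (hypothetical, not shown) yields each token's hidden state before sampling. `LinearProbe` and `stream_flags` are illustrative names, and "onset" is taken here to mean the first token of a run of above-threshold scores.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, Sequence, Tuple


@dataclass
class LinearProbe:
    weights: Sequence[float]
    bias: float
    threshold: float = 0.5

    def score(self, h: Sequence[float]) -> float:
        z = sum(w * x for w, x in zip(self.weights, h)) + self.bias
        # Stable logistic: never calls math.exp on a large positive number.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)


def stream_flags(
    probe: LinearProbe, hidden_states: Iterable[Sequence[float]]
) -> Iterator[Tuple[int, float, bool]]:
    """Yield (position, score, onset) for each token's hidden state as it arrives.

    onset is True only on the first position of a run of above-threshold
    scores, so a caller can react before the next token is sampled.
    """
    was_active = False
    for pos, h in enumerate(hidden_states):
        s = probe.score(h)
        active = s >= probe.threshold
        yield pos, s, active and not was_active
        was_active = active


# Toy usage with 2-d states; only the first coordinate matters to this probe.
probe = LinearProbe(weights=[1.0, 0.0], bias=0.0)
states = [[-2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [-1.0, 0.0], [2.0, 0.0]]
onsets = [pos for pos, _, onset in stream_flags(probe, states) if onset]
print(onsets)  # [1, 4]
```

Because `stream_flags` is a generator, it consumes one hidden state at a time and never needs the full sequence, which matches the per-token, pre-sampling setting described above.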
Fiber projection: a training objective for self-knowledge
A contrastive per-token training objective that organizes hidden states into a behavioral fiber bundle, making behavioral classes substantially more separable for downstream probes.
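The fiber-projection objective itself is not published on this page. For orientation only, here is a generic supervised contrastive loss (SupCon-style) over per-token embeddings labeled by behavior. It is a stand-in showing what a contrastive per-token objective usually looks like, not the authors' method; `supervised_contrastive_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """SupCon-style loss over per-token embeddings.

    Tokens sharing a behavior label are pulled together on the unit sphere;
    tokens with different labels are pushed apart.
    """
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    # Temperature-scaled cosine similarity between every pair of tokens.
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
           for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        others = [a for a in range(n) if a != i]
        m = max(sim[i][a] for a in others)  # log-sum-exp shift for stability
        log_denom = m + math.log(sum(math.exp(sim[i][a] - m) for a in others))
        total += -sum(sim[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


emb = [[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0], [-1.0, -0.1]]
tight = supervised_contrastive_loss(emb, [0, 0, 1, 1])  # labels match the geometry
mixed = supervised_contrastive_loss(emb, [0, 1, 0, 1])  # labels cut across it
print(tight < mixed)  # True
```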
The same behavioral signal lives in every architecture we've tested.
If behavioral encoding were a quirk of attention, it should disappear in state-space models. It doesn't. If it were a quirk of small models, it should disappear at scale. It doesn't do that either. We've validated probes across eleven architectures spanning Phi-3 to Command R+ 104B, transformer and Mamba, full-precision and 4-bit quantized. This suggests that behavioral self-representation is fundamental to sequence modeling rather than architecture-specific.
Steer
Once a behavioral fiber is identified, the same direction can be used as an intervention vector. No retraining. No fine-tuning. Targeted, reversible, evaluable.
Per-token intervention without retraining
Behavioral probe directions, applied as low-magnitude activation edits, suppress hedging and personality drift while leaving capability benchmarks intact.
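A minimal sketch of a low-magnitude activation edit along a probe direction, assuming the direction is a probe's weight vector and the edit dampens the hidden state's component along it. Additive steering (subtracting a scaled direction) is an equally common variant, and the page does not say which form is used; `steer` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def steer(h: Sequence[float], direction: Sequence[float], alpha: float) -> List[float]:
    """Dampen the component of hidden state h along a probe direction.

    alpha = 0 leaves h unchanged; alpha = 1 removes the component entirely;
    a small alpha gives a low-magnitude edit. Nothing is written to the model
    weights, so switching the edit off restores the original behavior.
    """
    d = unit(direction)
    proj = sum(x * y for x, y in zip(h, d))  # scalar component of h along d
    return [x - alpha * proj * y for x, y in zip(h, d)]


print(steer([3.0, 4.0], [1.0, 0.0], 0.5))  # [1.5, 4.0]
```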
Roleplay persistence under topic shift
Detecting and correcting the failure mode in which a model drops its persona mid-conversation, measured against held-out personality benchmarks.
Consistency Is All You Need
Our positioning essay: capability has been solved; consistency has not. Behavioral self-representation is the next architectural primitive sequence models need.
Route
Every probe has a domain where it helps and a domain where it hurts. For each input, a meta-classifier (the proprioceptor itself) closes the switches that help and leaves open the ones that don't.
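One way to picture per-input routing, sketched under assumptions: each switch has its own classifier head that estimates the probability its intervention helps on the current input, and the switch closes only above a threshold. The switch names `cot_math` and `ensemble_7b_vote` come from the board below; the `Switch` type, the feature keys, and the toy scoring lambdas are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class Switch:
    name: str
    helps_prob: Callable[[Features], float]  # classifier head for this switch
    threshold: float = 0.5


def route(features: Features, switches: List[Switch]) -> List[str]:
    """Return the names of switches to close (engage) for this input.

    A switch closes only when its classifier predicts the intervention will
    help on this input; every other switch stays open (disengaged).
    """
    return [s.name for s in switches if s.helps_prob(features) > s.threshold]


# Hypothetical board with toy scoring rules standing in for trained classifiers.
board = [
    Switch("cot_math", lambda f: f.get("arithmetic_steps", 0.0) / 5.0),
    Switch("ensemble_7b_vote", lambda f: f.get("predicted_disagreement", 0.0)),
]
print(route({"arithmetic_steps": 4.0}, board))  # ['cot_math']
```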
The Proprioceptive Circuit Board
Every reasoning category mapped to a validated switch with measured on-domain and off-domain effects. A board that grows row-by-row as new behaviors are characterized.
cot_math: when chain-of-thought helps
Multi-step arithmetic: 15% → 85% with gated CoT (+70 points, zero regressions against baseline). Chain-of-thought is applied only where the on-domain classifier fires.
ensemble_7b_vote: knowing when to ask a second model
Oracle accuracy rises from 85% to 95% on concept-inversion items when a second model is polled only on inputs where the routing classifier predicts disagreement.
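A board row such as "+70pt, 0 regressions" combines two per-item measurements: the accuracy change and the number of items that got worse. A sketch of how such a row could be scored, assuming per-item correctness is recorded with and without the gated intervention; `switch_effect` is hypothetical, and the 20-item toy lists below are chosen only to reproduce the quoted 15% → 85%, not taken from any benchmark.

```python
from typing import Sequence, Tuple


def switch_effect(baseline: Sequence[bool], gated: Sequence[bool]) -> Tuple[float, int]:
    """Accuracy change in percentage points, plus the regression count.

    baseline[i] / gated[i]: whether item i was answered correctly without /
    with the gated intervention. A regression is an item the baseline got
    right that the gated run gets wrong.
    """
    if len(baseline) != len(gated) or not baseline:
        raise ValueError("need two equal-length, non-empty result lists")
    n = len(baseline)
    delta_pt = 100.0 * (sum(gated) - sum(baseline)) / n
    regressions = sum(1 for b, g in zip(baseline, gated) if b and not g)
    return delta_pt, regressions


baseline = [True] * 3 + [False] * 17  # 15% correct
gated = [True] * 17 + [False] * 3     # 85% correct; the original 3 stay right
print(switch_effect(baseline, gated))  # (70.0, 0)
```

Counting regressions separately matters because a large net gain can still hide items the intervention broke; gating is meant to keep that count at zero.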
Selected papers.
Open-access preprints on Zenodo. We publish methodology, raw separation ratios, and probe-by-probe domain effects rather than aggregate accuracy claims.
CYGNUS: A Self-Sensing Adapter That Reads the Dark Cognitive Geometry of Frozen Language Models
Mathematics Is All You Need: Self-Knowledge in Dark Casimir Modes of gl(4,ℝ) Lie Algebra in Large Language Models
Proprioceptive AI: Self-Compounding Behavioral Probes for Autonomous Model Improvement and Probe-Guided Compression
Unified Behavioral Modulation in Large Language Models: Cross-Architecture Validation of Geometric Behavioral Subspaces
Consistency Is All You Need: Anticipatory Control Fields for Transformer Architectures
A Symbolic Control Runtime for Consistency-Aware Reasoning with Transformer Backends
Honest about the science.
The cross-architecture result is striking and concrete. The broader theoretical framing — fiber bundles, behavioral manifolds — is a working hypothesis we're actively pressure-testing. We publish methodology, raw numbers, failure modes, and the routing decisions where our probes don't generalize.
Probes are open. Routing is the product.
- Probe training methodology — open-access preprints
- Raw separation ratios per probe / per model — published
- The routing meta-classifier — proprietary
- Production integration · monitoring · steering API — commercial
Behavioral monitoring at the activation layer.
Run alongside your model. Stream behavioral signals (hedging, sycophancy, hallucination risk, persona drift) at every token. A meta-classifier, trained on the domains where each probe helps and where it hurts, decides which interventions to engage.
Same technical layer as mechanistic interpretability research. Different function: detection and steering during inference, not redesign during training.
Request access →
Work with us.
Proprioceptive AI partners with foundation model teams, applied AI products, and safety-focused research labs. If you're shipping a model into production and you need to know what it's about to do, we'd like to hear from you.
Email Logan & his team →