Research

Agentic behavior in financial domains.

We study how autonomous AI agents behave when deployed in financial markets. What works, what doesn't, and what to do about it.

Agent reliability

How agents degrade over sustained operation. What looks like it's working but isn't.

Decision quality

The gap between what an agent writes and what it actually does. Format compliance is not performance.

Adversarial environments

Financial markets test agent behavior in real time. We use them as a proving ground.

Open Research

CoherenceBench

Open-source benchmark for measuring decision quality in long-running autonomous agents. Agents look like they are working. Decision quality says otherwise. We measure the gap.

View on GitHub

200-tick sessionsSequential decision tasks across adversarial domains

5 metricsSeparating what agents write from what they do

4 scenariosPower grid, air traffic, hospital triage, network security

DeterministicFully reproducible with seed-based evaluation

Interested in this work?

If you are working on autonomous agents, studying agent reliability, or deploying AI in adversarial environments, we should talk.

Get in touch