A closed lab can't audit itself.
We build open-source benchmarks, datasets, and infrastructure for evaluating what autonomous AI systems can actually do.
Why Open Source
Evaluating whether AI agents can deceive, collude, self-improve, or game their own benchmarks is too important to happen behind closed doors. This work needs contributors, not gatekeepers.
We build in the open so that anyone can use what we ship, find what we missed, and push the work further than we could alone.
What We Build
Benchmarks
Can AI agents do real ML research? Can they deceive, collude, or game evaluations? We build the benchmarks that answer these questions with evidence, not speculation.
Datasets
1.1M enriched papers. 129K research repositories. 778K code functions. The raw material for studying how AI systems interact with real scientific work.
Infrastructure
Runtimes for structured agent workloads. Orchestration topologies for recursive improvement loops. The scaffolding to run experiments at scale.
About
Algorithmic Research Group builds open-source tools and infrastructure for AI security research. Benchmarks for evaluating autonomous agents. Datasets for studying how models fail. Runtimes for agent workloads at scale.
We publish everything we build. The field moves faster when researchers can build on each other's work instead of rebuilding the same tooling in isolation.
