Introducing Algorithmic Research Group
Study Failure: AI-driven GPU Kernel Optimization
Learning to Rank Architectures: A Small Model That Guides Neural Architecture Search
ARIA Benchmark: How Much Machine Learning Do AI Models Actually Know?
ArXiv Research Code Dataset: 129K Research Repositories
ArXivDLInstruct: 778K Research Code Functions for Instruction Tuning
DeltaMLBench: Can AI Agents Improve on Published ML Research?
Teaching Models to Bluff: Measuring Deception, Belief, and Coordination in LLM Secret Hitler
Understanding Recursive Self-Improvement in AI Systems
ML Research Benchmark: Can AI Agents Do Real ML Research?