projects
- Study Failure: AI-driven GPU Kernel Optimization
- Learning to Rank Architectures: A Small Model That Guides Neural Architecture Search
- ARIA Benchmark: How Much Machine Learning Do AI Models Actually Know?
- ArXiv Research Code Dataset: 129K Research Repositories
- ArXivDLInstruct: 778K Research Code Functions for Instruction Tuning
- DeltaMLBench: Can AI Agents Improve on Published ML Research?
- Teaching Models to Bluff: Measuring Deception, Belief, and Coordination in LLM Secret Hitler
- ML Research Benchmark: Can AI Agents Do Real ML Research?