Today, OpenBlock entered our frontier coding agent OB-1 into Terminal Bench, the open benchmark for agents solving software engineering tasks in a terminal environment. OB-1 finished second overall, scoring 49% on the benchmark.
With a small team and focused iteration, we built an agent that stands shoulder to shoulder with the best in the field. We see this result as a step toward a decentralized agent economy, where specialized agents built by individuals can outperform the monolithic systems of big labs.
Terminal Bench is an open evaluation where agents must complete real coding tasks end-to-end: configuring environments, editing code, running tests, and validating results. Success requires more than prompt engineering; it demands persistence, planning, and adaptability inside a real terminal.
We built OB-1 to mimic how a capable engineer works:
This architecture allowed OB-1 to consistently make progress on long-horizon, error-prone tasks where most agents stall.
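To make that loop concrete, here is a minimal, illustrative sketch of a plan-act-verify agent loop in the spirit of this kind of architecture. It is not OB-1's implementation: the `propose_next_command` stub stands in for whatever model call a real agent makes, and the task/check structure is a hypothetical stand-in for a benchmark harness.

```python
import subprocess

# Hypothetical stand-in for a model call; a real agent would prompt an LLM
# with the task description, recent terminal output, and its working notes.
def propose_next_command(task: str, history: list[tuple[str, str]]) -> str:
    raise NotImplementedError("plug in your model call here")

def run(cmd: str, timeout: int = 120) -> tuple[int, str]:
    """Execute a shell command and capture its combined output."""
    proc = subprocess.run(
        cmd, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return proc.returncode, proc.stdout + proc.stderr

def solve(task: str, check_cmd: str, max_steps: int = 30) -> bool:
    """Plan-act-verify loop: propose a command, run it, observe the result,
    and repeat until the task's own check (e.g. its test suite) passes."""
    history: list[tuple[str, str]] = []
    for _ in range(max_steps):
        cmd = propose_next_command(task, history)
        _, output = run(cmd)
        history.append((cmd, output))   # feed results back as context
        code, _ = run(check_cmd)        # verify against the task's tests
        if code == 0:
            return True
    return False
```

A production agent layers planning, memory, and error recovery on top of a loop like this; the sketch only shows the control flow that lets an agent keep iterating until the task's own checks pass.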
Placing second on Terminal Bench validates our design choices, but it’s also a signal of where OpenBlock is headed. We believe the path to AGI runs not through single, monolithic models, but through specialized agents coordinating via shared memory, feedback, and incentives.
Terminal Bench is one arena for this vision. OpenBlock’s mission is to build the commons that powers them all: a decentralized gym where every agent can learn from every other.
We’ll keep pushing OB-1 forward and continue expanding Agent Arena, a place where anyone can run, evaluate, and compare their agents in real environments. We look forward to working with open-source developer ecosystems to apply our agents to real-world tasks.
If you’re building agents or want to see the future of agent coordination, try it here: obl.dev.
Follow @openblocklabs for updates on our research and progress toward the global agent commons.