I'm open-sourcing Bottega, our internal coding agent orchestration tool

We shipped the 1000th user story with our internal agent orchestration tool last week. To celebrate, I'm open-sourcing it: the Bottega repo (github.com/vdaubry/bottega).

We've been using Claude Code for more than a year now. Velocity has greatly increased, and so has code quality.

For the past 8 months, 100% of our production code has been written by agents. The workflow proposed in the specs is the distillation of that. Humans own the plan and the review, agents do the rest.

I'm not trying to argue against hand-written code. We just found a workflow that allowed us to reduce lead time and increase code quality. It works for us, in our context.

I'm sharing my own conclusion on what works and what doesn't.

We built this tool to formalize our current workflow. We wanted a tool that is minimalist, user-friendly, and easy to adapt.

What didn't work, and why we built this tool

We created this tool at the end of last year to address the issues we faced during our first 6 months of using coding agents.

Failure mode: PRs accumulate

The root cause was usually a combination of:

We decided to solve all of these problems at the planning stage. Before development starts.

This is not a novel idea. We have twenty years of Agile, XP, and BDD literature on the subject. We just realised that when applying this to agents, magic happens.

Concrete examples, co-authored between the human and the agent, are the cheapest place to surface disagreement before code is written.

The plan is the centerpiece of the agentic development cycle. The quality of the final output is directly correlated with the quality of this plan.

Our failure mode was treating the plan artifact as disposable:

A task is not a prompt. A task is a requirement with acceptance criteria.

The task itself, the requirement, and the technical specification must all coexist as enduring artifacts that live alongside the implementation, not transient inputs to a single session.
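To make the "task is not a prompt" distinction concrete, here is a minimal sketch of a task stored as a lasting record with explicit acceptance criteria. The class and field names are hypothetical and chosen for illustration; this is not Bottega's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class AcceptanceCriterion:
    description: str
    done: bool = False


@dataclass
class Task:
    # Hypothetical fields for illustration; the real tool may store this differently.
    title: str
    requirement: str
    technical_spec: str = ""
    criteria: list[AcceptanceCriterion] = field(default_factory=list)

    def unchecked(self) -> list[str]:
        """Return the descriptions of criteria that are not yet satisfied."""
        return [c.description for c in self.criteria if not c.done]

    def is_complete(self) -> bool:
        """A task without criteria is never complete: it is only a prompt."""
        return bool(self.criteria) and not self.unchecked()


task = Task(
    title="Password reset",
    requirement="Users can reset their password by email.",
    criteria=[AcceptanceCriterion("Reset link can only be used once")],
)
```

Because the record outlives the agent session, the same criteria can be checked again at review time instead of being reconstructed from a chat transcript.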

Once the plan was detailed enough, and once the workflow ensured that the agent executed it rigorously, we were finally able to produce PRs we could merge with zero or minimal back-and-forth. PRs stopped accumulating.

I think the term "autonomous coding agent" is extremely misleading. It focuses on time reduction rather than time investment. We got much better results once we shifted our focus to the human-in-the-loop (HITL) part of the process: where do we need to spend our time? At what stage of the workflow? To produce what?

It became quite obvious that we were wasting time at the end of the process: the PR stage naturally produces a bottleneck. We created this tool to help us invest our time at the beginning of the process instead, and to make all downstream steps of the workflow as autonomous as possible.

What finally worked

We started to get great results once we finally managed to get Claude to behave just like a human developer.

Typical web-dev workflow:

Opinionated decisions we made along the way

Bottega is an attempt at automating this workflow in a simple and user-friendly manner.

As we ran more and more tasks in parallel, we decided that running this flow on a developer laptop made no sense. Why should the work stop if I close my laptop?

We quickly decided to set up this tool on a remote VPS.

Plan as the first-class citizen

We use a plan template. A planning agent has a single goal: start from a task requirement and fill in the plan template with a detailed technical implementation plan, interviewing the user and asking questions along the way.

The developer reviews the plan.

My personal conclusion is that the quality of the final PR is almost entirely dependent on the quality of this plan.

Crafting this plan is time-consuming: I spend more than 50% of my time at this stage. The goal is to minimize surprises at the PR stage. If the PR is a straightforward reflection of the plan, it can usually be accepted right away.

This is a highly interactive stage.

Getting an implementation that matches the plan 100%

Once the plan is approved, an implementation agent executes it. As soon as it's done, an adversarial code review agent kicks in. Its job is to make sure the implementation strictly matches the plan and that no checkbox was silently skipped.

This is now a well-known concept: the Ralph Wiggum loop.

The two agents iterate until the reviewer is satisfied.
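The implement-then-review cycle described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Bottega's code: both agents are modeled as plain functions, the plan is a list of checkbox labels, and `run_until_approved`, `stub_implement`, and `stub_review` are hypothetical names. The `max_rounds` cap is also an assumption, added so that a stuck pair of agents cannot loop forever.

```python
from typing import Callable

# Hypothetical interfaces: each agent is modeled as a plain function.
Implement = Callable[[list[str], list[str]], set[str]]  # (plan, feedback) -> items done
Review = Callable[[list[str], set[str]], list[str]]     # (plan, items done) -> items missing


def run_until_approved(
    plan: list[str],
    implement: Implement,
    review: Review,
    max_rounds: int = 5,
) -> tuple[set[str], int]:
    """Alternate implementation and review until the reviewer has no findings."""
    feedback: list[str] = []
    for round_number in range(1, max_rounds + 1):
        done = implement(plan, feedback)
        feedback = review(plan, done)
        if not feedback:
            return done, round_number
    raise RuntimeError(f"plan still unmet after {max_rounds} rounds: {feedback}")


def stub_implement(plan: list[str], feedback: list[str]) -> set[str]:
    # Stand-in agent: skips the last checkbox until the reviewer complains.
    return set(plan) if feedback else set(plan[:-1])


def stub_review(plan: list[str], done: set[str]) -> list[str]:
    # Adversarial check: every plan item must appear in the implementation.
    return [item for item in plan if item not in done]


plan = ["add endpoint", "add migration", "add tests"]
done, rounds = run_until_approved(plan, stub_implement, stub_review)
```

With these stubs, the first pass silently drops the last checkbox, the reviewer flags it, and the second pass is approved.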

Manual testing

The plan must include manual testing scenarios, e.g.:

The agent runs these scenarios itself, just like a developer would before opening a PR.

This step was the major unlock. Over the past few months, almost all issues discovered after a PR was written were due to acceptance criteria that were overlooked or missed during the planning phase (again, the quality of the final PR tracks the quality of the specification). Once we got this step right, we reached a level of quality similar to that of manual development. Software is rarely perfect: bugs happen, and most of the time they result from scenarios that were genuinely missed during the feature design phase.

PR management

The agent creates the PR, then iterates until there are no conflicts and CI is green.

If I leave a comment on GitHub, it triggers a new agent run via a GitHub callback. Again, the agent ensures CI stays green and resolves any new conflicts that appear in the meantime.
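One way to picture the "iterate until there are no conflicts and CI is green" rule is as a small decision loop. The sketch below is illustrative only: `PullRequestState`, `next_action`, `settle`, and `stub_apply` are hypothetical names, and the real tool talks to GitHub and CI rather than to an in-memory object.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class PullRequestState:
    has_conflicts: bool
    ci_green: bool


def next_action(pr: PullRequestState) -> str:
    """Pick the agent's next step; conflicts are handled before CI failures."""
    if pr.has_conflicts:
        return "resolve_conflicts"
    if not pr.ci_green:
        return "fix_ci"
    return "wait_for_review"


def settle(
    pr: PullRequestState,
    apply_action: Callable[[PullRequestState, str], PullRequestState],
    max_steps: int = 10,
) -> tuple[PullRequestState, list[str]]:
    """Apply actions until the PR is conflict-free with green CI."""
    history: list[str] = []
    for _ in range(max_steps):
        action = next_action(pr)
        if action == "wait_for_review":
            return pr, history
        history.append(action)
        pr = apply_action(pr, action)
    raise RuntimeError(f"PR did not settle after {max_steps} steps")


def stub_apply(pr: PullRequestState, action: str) -> PullRequestState:
    if action == "resolve_conflicts":
        # Assumption: resolving conflicts changes the branch, so CI must pass again.
        return replace(pr, has_conflicts=False, ci_green=False)
    return replace(pr, ci_green=True)


final, history = settle(PullRequestState(has_conflicts=True, ci_green=True), stub_apply)
```

A review comment would simply re-enter the same loop: the callback handler starts a new agent run, then calls something like `settle` again on the updated PR.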

Yet another orchestration tool

As we were working on this, a bunch of orchestration tools emerged, all variants of the same workflow we were converging on.

There is a lot of overlap with what we built. For us, this is a huge confirmation that we were on the right path.

Where Bottega differs:

Multi-harness. Bottega drives Claude Code, Codex, and OpenCode behind one interface, so you can assign a different model to each role on the same task.

Remote-first and multi-player. While you can run it on your laptop, Bottega is remote-first by design: we run it on a shared dev box. It supports multiple concurrent users out of the box. Side benefit 1: sandboxing autonomous agents on a remote server was easier for us than sandboxing them on each laptop. Side benefit 2: a lot of non-technical people use it internally.

Minimalist UX. The core ideas are super simple: we are just recreating the typical web developer workflow, and we wanted the tool to reflect that simplicity. Side benefit: easy to onboard the whole product team.
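To illustrate the multi-harness point above, assigning a different harness and model to each role on one task could be expressed as a small mapping. This is a sketch under assumptions, not Bottega's configuration format: the `Role`, `Harness`, and `Assignment` names are hypothetical, and the model strings are placeholders rather than real model identifiers.

```python
from dataclasses import dataclass
from enum import Enum


class Harness(Enum):
    CLAUDE_CODE = "claude-code"
    CODEX = "codex"
    OPENCODE = "opencode"


class Role(Enum):
    PLANNER = "planner"
    IMPLEMENTER = "implementer"
    REVIEWER = "reviewer"


@dataclass(frozen=True)
class Assignment:
    harness: Harness
    model: str  # placeholder string, not a real model identifier


def validate(assignments: dict[Role, Assignment]) -> None:
    """Raise if any role on the task has no harness assigned."""
    missing = [role.value for role in Role if role not in assignments]
    if missing:
        raise ValueError(f"no harness assigned for: {', '.join(missing)}")


task_assignments = {
    Role.PLANNER: Assignment(Harness.CLAUDE_CODE, "placeholder-model-a"),
    Role.IMPLEMENTER: Assignment(Harness.CODEX, "placeholder-model-b"),
    Role.REVIEWER: Assignment(Harness.OPENCODE, "placeholder-model-c"),
}
validate(task_assignments)
```

Keeping the mapping per task, rather than global, is what lets the reviewer run on a different model from the implementer it is checking.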


That's the core of it. The repo is here: github.com/vdaubry/bottega.

If you're building something similar, or you disagree with any of this, I'd love to hear from you.