What does it take to build towards 100 PRs/day per engineer?


A goal I’ve set for myself is to clear 100 “meaningful” Pull Requests in a single day, on a single project.

This is a tricky post to write. The first reason is that my workflow is evolving faster than I can update the post. The second is that I started with what I thought was an ambitious goal, but now I’m not so sure.

AI has enabled rapid improvements in some areas, only to expose other bottlenecks that were previously hidden or irrelevant. And while I’m getting a lot more done, I’m also managing a lot more Work-In-Process (WIP), with the increasingly overwhelming cognitive load that goes with it…

After some analysis of my typical day, I’ve narrowed the road to 100 down to the areas below.

The TLDR is “do everything faster”, and “do the things you should do anyway” … but I like to think there is a decent amount of rethinking in there too.

I’ve also posted a snapshot of my current AI workflow.

My Day

My “traditional” day was built around deep “focus time” to crank through work. Long, uninterrupted stretches of time. While focus time is still important, it’s significantly more compartmentalized. When I do focus, it’s more likely to be planning and prioritizing than writing code. Often I’m doing this in my notebook rather than at my computer.

That means rethinking what my day looks like:

  • Sprints, not focus blocks. Instead of 2+ hour deep work sessions, I’m looking at 25-minute sprints with short breaks. Each sprint has a clear objective: get X PRs reviewed and merged, kick off Y new tasks. I’m focused on keeping the AI fed and watered now.

  • Sharpening the Axe. Instead of coding “the thing”, I spend a lot of time improving my AI workflow. My ideal is zero or one “handles” of any piece of code — that is, if it can reasonably be done end-to-end by the AI, it should be; and if it does need my input, I want to jump in only once. An example is “Shipping News”, a post into Slack whenever a deploy goes out (there’s a sketch of the mechanics just after this list). I’ve handed that over to AI, which frankly does a better job of it. This was already the case for my commit messages.

  • Stacking Skills. Each time I use a skill I try to reflect on the bigger picture. Can I combine two skills into one? Could I have a skill that covers the end-to-end? In the Shipping News example above, that’s just one step of the deploy (as is, say, smoke testing). So over time it’s been consumed by a broader skill that manages the deployment end to end.
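
To make “Shipping News” concrete: in practice an agent writes the post, but the mechanics boil down to something like this minimal sketch, which assumes deploys are tagged and a Slack incoming webhook is configured (the names are placeholders, not our actual tooling).

```python
"""Minimal "Shipping News" sketch: summarise what just shipped and post it to
Slack via an incoming webhook. Placeholder names; the real version is agent-driven."""
import json
import os
import subprocess
import urllib.request

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]  # hypothetical env var


def commits_since_last_deploy(previous_tag: str = "last-deploy") -> list[str]:
    # Assumes deploys are tagged; swap in however you mark the previous release.
    log = subprocess.run(
        ["git", "log", f"{previous_tag}..HEAD", "--pretty=format:%s"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in log.stdout.splitlines() if line.strip()]


def post_shipping_news(commits: list[str]) -> None:
    text = ":ship: *Shipping News*: just deployed\n" + "\n".join(
        f"- {subject}" for subject in commits
    )
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)


if __name__ == "__main__":
    post_shipping_news(commits_since_last_deploy())
```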

There is genuinely a sense of loss with this. I actually like coding. But on the other hand, it’s not a total loss. I feel like I can finally do “all the things!” that I’ve wanted to do.

And 25-minute sprints usually mean I take more breaks (short or long). I might clean the kitchen, exercise, call a friend. My day is overall longer, but strangely it’s significantly more balanced.

That may change as things evolve, but I feel like we’re in a mini Golden Age just now.

Inputs

If you’re working on any project of meaningful size, I suspect you also have a major backlog of tasks/tickets, plus plenty of features on the roadmap. That said, having a large backlog of tickets doesn’t necessarily mean you can just throw them to AI (although it’s not a terrible idea either). Backlog Grooming is an art in itself, and while necessary, it can be a major time suck.

There are two challenges here:

  • Are these tickets actually important? AI makes you faster, but being faster on the wrong things doesn’t help. I suspect there are a lot of “way faster horses” out there at the moment.
  • Are they in good shape? Are they well-specced? Are they even relevant any more?

Often this is a lot of back-and-forth between the origin of the request and the developer. Someone files a bug, a developer asks for more context, they get a reply, and the loop repeats. By the time it’s picked up, it might not even be relevant. In some ways I think we’ve evolved such that this friction is deliberate — it’s a sort of Darwinian prioritization process: if something survives the backlog long enough, it’s probably worth doing. The risk with AI velocity is that you bypass that filter entirely and start building things that may have naturally died in the queue.

So the challenge isn’t just going faster. It’s going faster on the right things. Some things I’m working towards — starting with the biggest:

Have people kick off the process themselves.

Instead of tickets sitting in a backlog until a developer picks them up (or, let’s face it, never)… what if the person who filed the request could kick off the AI process themselves? The requester just jumps in and prompts the agent directly.

To get this kicked off we started with a “#for-claude” channel on Slack where the team posts prompts. This lets an engineer give feedback and helps the team upskill on what makes a good candidate and a good prompt. Since then we’ve invested in the pipeline to allow detailed branch previews, which means the person requesting can see the results and even iterate with the AI.

By the time I see the PR it could be done-done. We have a separate agent do a risk assessment that’s attached to the PR for the engineer to see — everything from “it’s a typo” through “3 humans should look at this”.

This works best with UI-focused changes and quality-of-life enhancements for the user. But that will change. We started the team on things like typos, adding helpful links, fixing interaction bugs, and we’re growing from there.

Get AI to triage and cluster tickets. I’ve started using AI to group tickets by theme — say, all the accessibility issues, or all the tickets touching a particular service. That makes it easier to tackle a cluster in one sitting rather than context-switching between unrelated work.
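
A minimal sketch of that triage step, assuming the Anthropic Python SDK (the model name is a placeholder, and a real pipeline would validate the JSON it gets back):

```python
"""Rough sketch of AI ticket clustering: hand a pile of ticket titles to an
LLM and ask for groups by theme. Illustrative only."""
import json

import anthropic  # assumes the Anthropic Python SDK is installed

MODEL = "claude-sonnet-4-5"  # placeholder; use whichever model you have access to


def cluster_tickets(tickets: list[dict]) -> dict[str, list[int]]:
    """tickets: [{"id": 123, "title": "Settings page ignores Esc key"}, ...]
    Returns something like {"accessibility": [123, 456], "billing": [789]}."""
    listing = "\n".join(f'{t["id"]}: {t["title"]}' for t in tickets)
    prompt = (
        "Group these tickets into a handful of themes (e.g. accessibility, "
        "billing, a specific service). Reply with JSON only, mapping a short "
        "theme name to a list of ticket ids.\n\n" + listing
    )
    client = anthropic.Anthropic()
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.content[0].text)
```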

Clearer, coherent tasks, not just smaller tickets. The instinct is to break everything down into atomic tasks for AI, but sometimes the opposite works better. I find a single coherent brief can outperform something small and narrowly scoped. A broad task like “improve accessibility across the settings pages” can lead to a better outcome than a raft of individual UI tweaks (especially if those tweaks are merge-conflicting all over the shop). Accessibility, translations, and UI snafus are all good examples here.

Less Double Handling

The biggest time sink in my current workflow isn’t writing code. It’s re-doing code.

AI is great at a first pass. It’s significantly less great at getting things right the first time on complex tasks. I’ll kick off a task, review the output, leave comments, wait for a revision, review again, leave more comments… Sometimes a task that should take one cycle takes three or four.

Each round-trip costs time and it’s a context switch for me (which is absolutely the killer). Every time I re-enter a review, I need to reload the (human brain) context of what I was trying to achieve.

The goal is to get AI to one-shot as often as possible. That means:

  • Better prompts, idiot. But beyond that, better context. The more the AI knows about the codebase’s conventions, patterns, and constraints, the closer it gets on the first try. This is where things like CLAUDE.md files, clear test suites, and well-documented APIs pay off disproportionately.

  • Axe Branch. I use worktrees. The main worktree is my “axe”: if I ever need to tweak a prompt or our development tools, I do it in this branch, then push the changes to the other worktrees or wrap up the lessons at the end of the day.

  • Comment-Only Prompts. I’ve written about this separately, but the idea is to have AI annotate where it intends to make changes before making them (there’s an illustration just after this list). I can course-correct at the intent stage rather than after it’s rewritten three files. It’s a specific trick, not a universal approach, but when it works it eliminates what could be multiple cycles.

  • Tighter specs on the ticket. This circles back to Inputs. The better the spec, the fewer round-trips. If a ticket says “fix the button”, that’s going to take more iterations than “the submit button on /settings doesn’t trigger form validation when the email field is empty.” One of those gives the AI a fighting chance at a one-shot. For speed, I’ve found annotated screenshots are amazing: Claude is better and quicker at finding the spot in the codebase than I am, by a mile.

  • Smaller PRs. This is the oldest trick in the book, but it should be used with caution. Smaller PRs are great for a lot of obvious reasons, but you also don’t want death by a thousand cuts. Not sure I’ve got the balance perfect here yet.
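
To illustrate the Comment-Only idea, here is roughly what a first pass looks like on an imaginary form handler: the AI only adds intent comments, and nothing gets implemented until that intent is approved.

```python
# Illustrative only: a "comment-only" first pass on a hypothetical handler.
# The AI marks *where* it intends to change things; implementation waits
# until the intent is approved.


def save_settings(form: dict) -> dict:
    # AI-INTENT: add an early return with a validation error when
    # form["email"] is empty, so the submit can't silently no-op.
    settings = {"user_id": form.get("user_id")}

    # AI-INTENT: only merge the remaining form fields after validation
    # passes; today this persists partial data even when validation fails.
    settings.update({k: v for k, v in form.items() if k != "user_id"})

    # AI-INTENT: reuse the shared email validator here rather than
    # duplicating the rules inline.
    return settings
```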

Restructuring the Codebase

This was actually one of the first big changes I made. We’ve got a large-ish codebase with a lot of history in it. For a long time it was unwieldy for AI tools, particularly earlier generations.

The restructuring helped AI navigate the codebase and made better use of context. It also helped us prompt more precisely and effectively:

Structure around domains rather than functionality. If you’ve used Ruby on Rails you’ll know the classic layout — all models in one directory, all controllers in another, all views in a third. Logical, but it means any feature-level change spans multiple directories. We’ve flipped this: the top level is the feature or domain, with its functional pieces as directories within it. I can then prompt the AI to “work on this directory” and get better, faster, more predictable results. For larger features I can point at directories to use as a template and build “like that one” with a decent amount of fidelity. A related investment is making sure we have “gold standards” in the codebase to use as these templates.

Minimize blast radius and repeat yourself (a bit). Related to the above, and probably the one that hurts most, because “DRY” is a reflex that’s hard to give up. Shared utilities, global styles, and cross-cutting concerns can really hurt parallelism. AI can generate, maintain, AND refactor code at a stunning rate now, so I tolerate a lot more repeated code if it brings faster cycles, then occasionally step back and get the AI to refactor it into better shape. It’s more of a gardening philosophy than a grand-architecture one.

Consistent patterns and aggressive refactoring. If your codebase has three different ways of doing the same thing — say, three different HTTP client wrappers — the AI has to (and will) guess which one to use. Often it guesses wrong. Standardizing on one pattern, documented clearly, means fewer mistakes and fewer review cycles. This is one of those “things you should do anyway” items, but the ROI goes up significantly when AI is your primary builder. The fortunate thing here is I’ve found AI is the best partner in refactoring.
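
Taking the HTTP client example: the fix is to have exactly one blessed wrapper, documented and pointed to from CLAUDE.md, so the AI never has to guess. A sketch (illustrative only, using Python’s requests; the names are placeholders):

```python
"""The one way we call external services; illustrative sketch, not real code.
Having a single documented wrapper means the AI (and humans) never guess."""
import requests

DEFAULT_TIMEOUT = 10  # seconds


class HttpClient:
    """Use this for all outbound HTTP; don't hand-roll requests elsewhere."""

    def __init__(self, base_url: str, timeout: float = DEFAULT_TIMEOUT):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
        self.session = requests.Session()

    def get_json(self, path: str, **params) -> dict:
        response = self.session.get(
            f"{self.base_url}/{path.lstrip('/')}",
            params=params,
            timeout=self.timeout,
        )
        response.raise_for_status()
        return response.json()
```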

Faster Dev Loops

The development loop — write code, run tests, check CI, fix issues, repeat — is where I lose a lot of time. And frankly it’s also boring.

CI/CD speed is critical. Our CI currently averages about 8 minutes. That’s not bad by industry standards, but at 100 PRs/day that’s over 13 hours of total CI time (100 × 8 minutes). A PR can’t be merged until CI passes, so if I’m waiting on CI, I’m either context-switching (which has its own cost) or idle.

Offload from the CI/CD. Right now Claude Code Web can run linting and similar checks quite easily. And somehow it’s free? I can’t stress enough how far the Web/Cloud version has progressed. I basically want to guarantee each CI/CD run is green (as any “red” is naturally another cycle). The tighter the feedback loop, the better. If AI can run the relevant tests locally before even creating a PR, we catch issues earlier and avoid wasted CI cycles. This is where tools like pre-commit hooks and local linting shine.
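
The local gate can be as simple as the sketch below: one script that a human, a git hook, or the agent itself can run before opening a PR. The specific commands (ruff, pytest) are placeholders for whatever your stack uses.

```python
#!/usr/bin/env python3
"""Local "make sure CI will be green" gate. Runnable by a human, a pre-push
hook, or the AI before it opens a PR. Commands are placeholders."""
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],           # lint (placeholder)
    ["pytest", "-q", "--maxfail=1"],  # tests (placeholder)
]


def main() -> int:
    for cmd in CHECKS:
        print(f"-> {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            print("Red locally means red in CI; fix before pushing.")
            return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```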

Better tooling integration. Comment-Only Prompts are one example, but more broadly, the handoff between AI generating code and me reviewing it needs to be as frictionless as possible. IDE integrations, inline diff views, one-click approvals. Every click I can eliminate is time saved, and at scale those seconds matter. Claude Code Web/Desktop can now comment directly on the diff, which is great as I can prompt without a round-trip through GitHub.

QA and preview environments. For frontend work especially, I need to see the change, not just read the diff. Automated preview deploys — where each PR gets its own URL — mean I can visually verify changes without pulling the branch locally. This is another one of those “you should do this anyway” items that becomes non-negotiable at higher velocity. This is also critical for widening access to the whole team.

Sprinting the Last Mile

The last mile — from “PR is ready for review” to “code is in production”.

Faster, targeted reviews. At 100+ PRs/day, I don’t follow a vanilla review model. We have another agent assess each PR for risk, ranging from “robot” (AI could probably review this solo) through to “three brains” (this needs three humans). Based on that assessment I do a quick pass to see whether I agree with it. That’s usually fast, since it’s something between “typo” and “this touches security”. From there I approach the review with that risk profile in mind.
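
The risk agent in my setup is AI-driven, but to make the tiers concrete, here is a deliberately crude heuristic version (the path list and the intermediate tier names are illustrative, not our actual rules):

```python
"""Crude sketch of PR risk tiering. Tiers run from "robot" (AI review is
probably enough) to "three brains" (several humans should look). Illustrative
thresholds and paths only; the real assessment is done by an LLM agent."""

RISKY_PATHS = ("auth/", "billing/", "migrations/")  # illustrative list


def risk_tier(changed_files: list[str], lines_changed: int) -> str:
    if any(path.startswith(RISKY_PATHS) for path in changed_files):
        return "three brains"   # e.g. anything touching security or money
    if lines_changed > 300:
        return "two brains"     # big diffs deserve a second human
    if lines_changed > 20:
        return "one brain"      # quick, targeted human review
    return "robot"              # typo-sized: AI review plus a skim
```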

Faster deploys, automated rollbacks, observability. At this velocity, something will break. The question is how fast you recover. Automated canary deploys, health checks, and rollback-on-error-spike turn a 20+ minute incident into a 20-second one. Again, nothing new there, but as velocity goes up it’s harder to avoid this sort of investment.
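
A toy version of rollback-on-error-spike, with the metrics and deploy calls left as stand-ins for whatever your observability and deploy tooling actually expose:

```python
"""Toy sketch of "watch the canary, roll back on an error spike". The
get_error_rate / rollback functions are stand-ins for your real tooling."""
import time

ERROR_RATE_THRESHOLD = 0.02   # roll back if more than 2% of requests error
WATCH_SECONDS = 120           # how long to watch the canary
POLL_SECONDS = 10


def get_error_rate(release: str) -> float:
    raise NotImplementedError("query your metrics backend here")


def rollback(release: str) -> None:
    raise NotImplementedError("call your deploy tooling here")


def watch_canary(release: str) -> bool:
    """Return True if the release survived the watch window, else roll back."""
    deadline = time.time() + WATCH_SECONDS
    while time.time() < deadline:
        if get_error_rate(release) > ERROR_RATE_THRESHOLD:
            rollback(release)
            return False
        time.sleep(POLL_SECONDS)
    return True
```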


The caveat with all of this: you need to be careful about the investments you’re making. The primary reason is the available tools may overtake what you’re doing. The investment is essential, but gold-plating is definitely a mistake. A process that you can change and improve quickly is a more important goal than the perfect process today.

The Mythical Man-Month told us that adding people doesn’t scale. The interesting thing about AI is that it can scale — but only if you remove the bottlenecks that were designed for a human-paced workflow. Most of what I’ve described above isn’t about AI at all. It’s about process, tooling, and structure. AI just makes the payoff bigger.


