Every team has an optimal pull request size, it's likely much smaller than you think, and making your pull requests your optimal size will improve the performance of your team. I'm going to convince you that these three statements are true by the end of this post.

What does the research say?

A large body of research has shown that code reviews are the most cost-effective way of finding defects in code¹. But most of the value of a code review comes from the first reviewer in the first hour².

How much code can my team review in an hour?

My team tracks how much time we spend on each code review. I compared those times to the number of lines changed in our pull requests to produce this scatter plot³.

The data is pretty messy but if we want a 90% chance of completing a code review in an hour, we probably need to cap our pull requests at around 200 lines of code.

Finding your optimal pull request size

This is really just an optimization problem. You have a cost associated with creating, reviewing, managing, and merging a pull request, which we can think of like a transaction cost. But you also have a cost associated with letting the pull request remain as work in progress, which we can think of like a holding cost. I'll define these costs in more detail in a minute. I just want to establish that you are trying to find the pull request size that minimizes the sum of your costs (the lowest point on the blue line below).

Pull request transaction costs

It takes time and effort to:

break stories into bite sized chunks
make the them independent or schedule them so they don't block each other
then actually make the changes required
create a testing plan
test the changes
create a pull request
review it
merge it
deploy it to production (we do continuous delivery)

If you did this for every line you changed, you would be at the far left on graph above. Not good. But making larger pull requests creates costs of its own.

Pull request holding costs

The larger the pull request, the:

longer it takes for the reviewer to find a big enough slot of time to address it (and the less motivated the reviewer will be to review it)
less chance it will pass code review on the first attempt and therefore require multiple reviews
longer it will take to review
more chance the review will exceed an hour (or whatever your personal limit is for effective code reviews) and become be ineffective
more chance it will grow larger still and need even more review
bigger the risk that the author made a bad assumption and will have to redo much more work
bigger the risk that the requirements will change before the code is released
more demoralizing it becomes to work on this code (multiple reviews of the same code sucks for everyone)
more it sucks to have to re-review and re-test the whole pull request if some little part of it fails
less certainty we have over when the code will be merged
more chance of a merge conflict
more chance it will block other stories in the sprint and derail the schedule
longer it will take to get the code into production and start delivering business value

If someone submitted a 10,000 line pull request in our project I can guarantee that all of these points would come into play. But our data and experience show that much smaller pull requests are still incurring huge holding costs.

Why would smaller pull requests by better?

Smaller batch sizes increase flow, according to Donald Reinertsen in his book: "The Principles of Product Development."

Specifically:

Reducing batch size reduces cycle time
Reducing cycle time reduces variability in flow
Reducing batch sizes accelerates feedback
Reducing batch sizes reduces risk
Reducing batch sizes reduces overhead
Large batch sizes reduce efficiency
Large batch sizes inherently lower motivation and urgency
Large batch sizes cause exponential cost and schedule growth
Large batches lead to even larger batches
The entire batch is limited by its worst element

Applying lean manufacturing concepts to pull requests

Some very bright people at Toyota invented the Toyota Production System, which includes single-piece flow. And they used this system to take over the auto industry. This is a similar idea but Reinertsen tweaked it for product development. The key to making this really work is to lower your transaction costs. If you become really efficient at processing pull requests, your overall cost curve will shift down. And your optimal batch size will shift to the left (get smaller). It's a virtuous cycle.

You want to get stories moving across your sprint board very rapidly. And you want to get new code into production as quickly as possible. You don't want pull requests stalled in review for days at a time. And you really don't want failed pull requests and all the rework that goes with them.

Still not convinced that the optimal pull request size is small?

Let's do some math. The more code in your pull request, the more chance that you've introduced a defect. I know software defect rates vary widely but let's use the often quoted 10-50 defects/kloc for released commercial software. But, we are talking about a code review--not released software--so let's assume that your defect rate is 50 defects/kloc. That's one defect every 20 lines of code on average⁴.

Good code reviews catch 60-90%¹ of the defects in the code so let's pick a middle value and say that you can catch 75% of them. That means you should find one defect for every 27 lines of code on average.

Smaller pull requests contain fewer defects

A 200 line pull request should contain 10 defects and you should expect to find 7.5 of them. That means it's extremely unlikely that it will pass code review on your first attempt. Now let's redo the math with a 50 line pull request. There should be 2.5 defects in this pull request and you should expect to find 1.9 of them. You've got a much better chance of passing code review on the first try with a smaller pull request.

When your big pull request fails code review, you have to fix the whole pull request, re-review the whole pull request, re-test the whole pull request, and hope that the nobody has introduced a new error along the way. When your small pull request fails code review, you still have to do all the same steps. But it will go much more quickly because you are dealing with much less code.

What's wrong with the way my team handles pull requests?

My team is struggling with our pull requests and our problems are all related to what I like to call "mega pull requests". Mega pull requests are pull requests that have gotten out of control and have become difficult to review and deploy. We recognize them by their large diffs, multitude of comments, multiple reviews, many days spent in review, and many hours spent trying to get them into the 'done' column.

How a mega pull request is born

Suppose my colleague creates a 200 line pull request. I see it in Jira and just from the title I know it's going to need my full attention. So I schedule it in a big block of time tomorrow and plan to tackle it all at once. I review it and find a couple of problems. I write my comments in BitBucket and bounce it back to the author. He's moved onto something else so it takes him until the next morning to review my comments, change the code, update the testing plan, retest the branch, and put it back in review.

We repeat this cycle once or twice while we deal with other stories, meetings, unplanned work, and other interruptions. By now, this pull request has been in review for the better part of a week and now it might have merge conflicts. If so, the author fixes them, retests, and sends it back to review. At some point I eventually get all the tests in the testing plan to pass, I approve the pull request, merge into master, and move on to the next story.

We probably have one pull request like this per sprint⁵. But on rare occasions, we've had pull requests bounce several times over several weeks. I'm not joking. Mega pull requests kill our productivity.

What's helped

We've tried all kinds of things to prevent the formation of mega pull requests but our ideas have mostly been ineffective. The only things that helped a little were:

automating our code styling and formatting with a commit hook so it's never an issue
doing a refactoring story in front of the main story
avoiding stray changes in the main story no matter how ugly the code - we just put it in the backlog and make a comment in the code with the Jira story ID

We think we've still got room to improve so we're interested in finding our optimal pull request size.

How do we find our optimal pull request size?

We conduct experiments. Our current policy is to only create a pull request that will take less than an hour to review. But we suspect our velocity will improve if we create smaller pull requests. So we'll try a 45 minute limit for two months, examine the data, set a new target, and repeat the experiment until we find our optimal pull request size.

Wrapping up

Every team has an optimal pull request size, it's likely much smaller than you think, and making your pull requests your optimal size will improve the performance of your team. You can find your optimal pull request size by doing experiments and measuring the change in your velocity.

I think I've made a pretty strong case for smaller pull requests. But I'd love to hear what works for your team. Have you experimented with the size of your pull requests? Have you found your optimal pull request size? How big are they and how has your velocity changed since you found it?

Fagan ME (1976) Design and code inspections to reduce errors in program development. IBM Syst J 15: 182-211. ↩
Cohen J (2010) Modern code review. Oram A, Wilson G, editors. Making software: what really works, and why we believe it. Sepastopol (California): O'Reilly. pp. 329-336. ↩
I removed spurious data like file renames, automatically generated files, and changes caused by running a formatting tool over the code. I also removed pull requests that didn't pass review. ↩
This is just a back-of-the-napkin math. You can use a wide range of defect and detection rates and still reach the conclusion that large code changes should contain detectable errors that cause your code review to fail. Please don't clobber me in the comments about the exact numbers I've used here. ↩
We have a 'zero known defects' policy so code goes back in forth in code review until we think it has zero defects and meets our other quality objectives. We're trying to improve the quality of our code base and repay our technical debt. This will change our optimal pull request size compared to a team that has a 'ship it and we'll fix it in maintenance' policy. ↩

Optimal pull request size