I don’t want to call out anyone specifically, because I think this is a problem for everyone, but I’ve been noticing a pattern lately that is starting to concern me:
- New user submits patch.
- Reviewer provides a lot of feedback.
- Patch gets merged after multiple updates and rounds of review.
- Patch is reverted.
One example I can point to is the contributions from this user (again, sorry to this user, I don’t mean to call you out, but you can help us solve this problem; it’s important that we can discuss problems like this in the open).
I suspect that a lot of this pattern is caused by AI-generated changes, but it’s hard to know for sure. To me, it looks like reviewers are getting completely exhausted trying to review patches with many, many mistakes and end up missing things in the review. I know we as reviewers have trained ourselves over the years to be patient and kind with new contributors, but I think we need to change our approach given how easy it is now for someone to submit a valid-looking patch with AI.
I don’t want this to come off as a “people using AI == bad” discussion, because we want new contributors, and if people can solve problems using AI, that’s great, but there has to be some kind of bar for what makes a PR worthy of a reviewer’s time.
I think we need to be proactive and do something about this before it gets worse. I don’t have any great ideas, but I’m hoping someone else does. And it would probably be good to get feedback from the people submitting AI generated patches and not just the reviewers.
The one recommendation I do have, though, would be for reviewers to outright reject patches that have markdown in the commit message. This seems to be a hallmark of AI-generated content, and the commit messages are usually full of repetitive or irrelevant text that makes it hard to understand what is actually going on. Remember that when we review patches, the commit message is one of the things we are supposed to review. That’s why we use the “Squash and Merge” strategy: the commit message is right there in the PR body.
Please do not use markdown as a litmus test. I never use AI to write commit messages or code, but I typically use markdown in commit messages. I thought that was normal.
As the reviewer of the patch in question, yes, I was exhausted by multiple rounds of review restating the same feedback. However, in this situation, the actual change is minimal and trivially correct. Most of the review concerned the test changes; in particular, most of the contention was about retaining test coverage of pre-existing breakage. I think the ultimate change is correct, but I’m not sure what the right solution is for tests that are known to fail under a sanitizer but exist precisely to track that failure.
I agree markdown is not the thing to focus on. The PR summary in question does, however, have far too much redundant information (especially for how trivial the actual change is), so maybe some guidelines in that area would be better.
It’s not really this one specific PR that’s the problem. It’s just one of many examples I’ve seen of this pattern, and what concerns me the most is that I’m not really doing that much pre- or post-commit review. I’m coming across these mainly because I’m searching through the git logs for something else, and I keep finding commits like this that just seem a little off to me.
It is still very easy to get commit access and we don’t have very strict rules about who needs to review patches before they get committed. These AI tools are making it much easier for users to meet the requirements for getting commit access without actually gaining much experience or knowledge about the project. If we are burning out our experienced reviewers and getting an influx of new inexperienced contributors who then start reviewing patches themselves, then this problem is going to snowball and quickly get much worse.
Indeed, “markdown” is too vague a litmus test.
What might need clamping down on is the use of overwrought templates for PR descriptions.
At least part of the information (where it was not obvious from the content of the PR) would have been more appropriate in an issue that the PR then fixes.
Markdown is OK in comments, issues, and even PR descriptions normally, but I really do not think it has a place in the commit messages themselves, which for our workflow means not in the PR description either. Occasionally it may genuinely be useful if you really do need to explain something very complex and detailed. But most of the time, if you find yourself reaching for section headings in a commit message, that’s a huge sign that you’re trying to do too much in one commit (or don’t have enough comments / tests to document the issue being addressed). Even then, the use of Markdown should only be in the limited sense of “doing something to structure the ASCII” rather than actually expecting it to be rendered, since Git clients do not normally do so (even GitHub presents commit messages as raw fixed-width text).
You’re not imagining it: reviewing LLM-generated PRs can indeed be exhausting, because the tooling often cannot get fixes correct, so you end up in a cycle where some comments are addressed, others go unaddressed, and things are often made worse in other areas (like unrelated formatting changes). Wash, rinse, repeat, and no one is happy.
My personal experience so far has been that these PRs end up abandoned after a few iterations of this, but that is clearly not happening in other areas.
What we used to see was that a new contributor would put a non-trivial amount of work into understanding our code base and how to do things correctly. Invariably they would make some mistakes on their first PR or so. Once that first effort has been put in, the extra effort is not that bad, and usually we can shepherd these into a good place. With LLMs, folks are not doing this initial up-front work, and when confronted with fixes that the LLM cannot do right, they end up completely flummoxed and give up (my impression at least, but it could be off).
This feels to me like an “LLM tax”, but we are not getting a share of the tax revenue; it is being gobbled up by someone else while we absorb the cost. We are far from the only ones dealing with this.
This excellent write-up from the Wiki Education folks demonstrates the cost they are bearing, and it is high: Generative AI and Wikipedia editing: What we learned in 2025 – Wiki Education
OTOH, they have the advantage that they are able to convert a lot of the users getting it wrong into productive users. This, in my opinion, is because the complexity is much lower, so their users, on getting the feedback that what they are doing is not working, are able to muster the effort to overcome it and use the tools well.
I think LLVM has such a high complexity bar that trying to achieve the same turnaround here would not be feasible: the contributors don’t seem to have the will, and we don’t have the extra time. If we had a more reliable way to detect LLM-generated changes, we might be able to get ahead of this, but I don’t know if that is possible.
I think this problem will only get worse. My feeling is that unless open source projects start working together to figure out how to push back more effectively, we are going to be spending more and more of our time cleaning up after it.
I don’t recall using section headings in a commit message. But I do use asterisks, backticks, code sections, short lists, etc.
I find markdown can improve readability regardless of whether it is rendered.
Even I use simple markdown, ` and ``` , in my commit messages, even for GCC, and have been doing so for maybe 3 years now.
I do that to show what is code: variable names or operations, especially ones like `and` or `or`, which can get confusing otherwise. I used to use quotation marks in the past but changed to ` as it was more standard.
Nothing else though.
Some Markdown uses are benign and very common (like code). Some are less common, but often still benign in isolation (bullet point lists). Some are rare and a strong indicator for AI generated PR descriptions (headers). There probably are cases where the use of headers for a very complex PR may be justified, but in 95% of the cases it indicates an AI generated PR description. I think that’s what Tom is referring to with “Markdown”.
If we wanted to pursue that general direction, I’d suggest to:
- Change the AI policy to make AI generated PR descriptions hard forbidden, rather than merely discouraged. (This makes things much simpler for reviewers, because we have a clear policy violation rather than something much more nebulous.)
- If the PR description contains markdown headers (or other AI indicators), post a comment that points to our AI policy and summarizes the salient points (like the disclosure requirement and no AI-generated PR descriptions). This should be phrased as an FYI rather than an accusation (there will be false positives); a rough sketch of what such a check could look like is below. A variant of this would be to only make a single post for first-time contributors, but I think this may be less effective.
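For illustration only, here is a minimal sketch of such a heuristic check, assuming the CI job exposes the PR description in a `PR_BODY` environment variable; the file name, indicator patterns, and wording are hypothetical, not an actual LLVM workflow:

```python
# heuristic_pr_check.py -- hypothetical sketch, not an actual LLVM workflow.
# Assumes the CI job exports the PR description in the PR_BODY environment
# variable; the indicator patterns below are illustrative heuristics only.
import os
import re
import sys

AI_INDICATORS = [
    (re.compile(r"^#{1,6}\s+\S", re.MULTILINE), "markdown section headers"),
    (re.compile(r"^\s*(Problem|Solution|Test Plan)\s*:?\s*$",
                re.MULTILINE | re.IGNORECASE), "boilerplate template sections"),
    (re.compile(r"[\u2705\u274C\U0001F300-\U0001FAFF]"), "emoji"),
]


def flag_indicators(body: str) -> list[str]:
    """Return human-readable reasons the description looks templated/generated."""
    return [reason for pattern, reason in AI_INDICATORS if pattern.search(body)]


if __name__ == "__main__":
    hits = flag_indicators(os.environ.get("PR_BODY", ""))
    if hits:
        # A real bot would post a friendly FYI comment on the PR instead of
        # just printing; the message should link to the AI tool policy.
        print("FYI: this PR description contains " + ", ".join(hits) + ".")
        print("Please see the LLVM AI tool policy (disclosure requirement, "
              "human-authored PR descriptions).")
        sys.exit(1)
```

The non-zero exit code could drive a bot comment rather than failing the check, to keep this an FYI rather than a hard block.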
GitHub interprets PR summaries as Markdown, so any attempt at formatting will be rendered poorly unless proper Markdown is used. I wouldn’t frame this as a Markdown problem.
From our Developer Policy on commit messages:
we don’t enforce the format of commit messages
The real issue seems to be the content of summaries rather than their format.
Unfortunately, we don’t currently provide much guidance on what constitutes a good GitHub PR summary. I like this guidance from the MLIR commit message docs:
Prefer describing why the change is implemented rather than what it does. The latter can be inferred from the code.
There’s also this recent blog post by @shafik that I found very insightful:
Why Good Summaries are Important to a Pull Request
Blog about C++, C, Undefined Behavior and Compilers
Fundamentally:
- If we want contributors to follow a specific format in PR summaries (or avoid certain things), then we should provide clear, concrete guidelines.
- If we expect new contributors to follow specific policies (e.g. the AI Tool Policy), then we should surface the key ones prominently, for example in the welcome message, and possibly even ask contributors to confirm they’ve read them before granting commit access.
I don’t believe that simply requiring people to write summaries without using AI will, by itself, address the underlying problem.
More generally, this observation resonates with a pattern that @tstellar described earlier:
tstellar:
I’ve been noticing a pattern lately that is starting to concern me:
- New user submits patch.
- Reviewer provides a lot of feedback.
- Patch gets merged after multiple updates and rounds of review.
- Patch is reverted.
If this pattern disproportionately affects new contributors, then perhaps we should revisit how we onboard them. For example, assigning a community mentor to guide contributors through their first few PRs could help catch issues earlier and set clearer expectations. That’s effectively how we support junior colleagues joining our team at Arm.
Thanks for bringing this up!
banach-space:
For example, assigning a community mentor to guide contributors through their first few PRs could help catch issues earlier and set clearer expectations.
I think many new contributors are hobbyists that end up only merging a few PRs, so it may not be worth it to assign them a mentor.
However, it might be profitable to promote the Discord as somewhere they could get help and advice, and, instead of assigning mentors, ask these same people to be more active on the Discord answering questions.
I use Markdown wherever possible, especially for code, because I think formatted text looks much better. However, I rarely use section headings, except in long RFC-style PRs.
I maintain a few pieces of MLIR and ADT/Support and have also experienced an increase in review load, especially from new or relatively new contributors who are clearly being aided by AI tools. I’ll also preface this by saying that I do use agentic tools in my own workflow, and I’m familiar with both the typical pitfalls and the benefits. The current iteration of tools seems to amplify one’s skills and weaknesses alike.
IME, more often than not, AI-generated PRs from new contributors correctly identify a problem / new corner case, but fail to produce a meaningfully general solution. I won’t be linking to specific PRs, but my experience has been that it’s not immediately obvious whether a PR is extractive or not – this only becomes apparent after a few iterations, for example when the author refuses to debug a related part of existing code or open a Discourse thread to discuss a more general solution. These folks may be well-intentioned but lack the ability or commitment to contribute meaningfully yet. The recommendation I’ve heard given is to treat these more like bug reports in disguise – from this POV, AI tools may be complementing fuzzing quite well.
nikic:
Some Markdown uses are benign and very common (like code). Some are less common, but often still benign in isolation (bullet point lists). Some are rare and a strong indicator for AI generated PR descriptions (headers). There probably are cases where the use of headers for a very complex PR may be justified, but in 95% of the cases it indicates an AI generated PR description. I think that’s what Tom is referring to with “Markdown”.
banach-space:
The real issue seems to be the content of summaries rather than their format.
Markdown has ubiquitous support across editors / GitHub / Discourse / Discord, and I also use it in PR and commit descriptions. I think it’s a perfectly fine format, especially since it is readable in plaintext mode too.
The issue with AI-generated PR/commit messages is that they are needlessly verbose without getting to the point. I think that’s expected, since the tool (and possibly the author) may not know who the audience is and how best to get their ideas across. A tell-tale sign of an auto-generated PR description is multiple sections like Problem, Solution, and Test Plan, and the use of Unicode / emoji.
- Change the AI policy to make AI generated PR descriptions hard forbidden, rather than merely discouraged. (This makes things much simpler for reviewers, because we have a clear policy violation rather than something much more nebulous.)
+1, I’d strongly prefer for the prose to be human-authored, with the caveat that, IMO, it makes sense to allow some small portions to be tool-generated (e.g., a table with benchmark results, asking the tool to fix up your grammar, etc.).
banach-space:
If we want contributors to follow a specific format in PR summaries (or avoid certain things), then we should provide clear, concrete guidelines.
I think a specific recipe / format for a PR description is an anti-pattern and would introduce unnecessary review churn. Rather, I’d be in favor of expanding the general guidance with best practices if this is missing from our docs.
Thibaultm:
I think many new contributors are hobbyists that end up only merging a few PRs, so it may not be worth it to assign them a mentor.
Very true.
Many contributors are “casual” contributors, and IMO we should try to minimise the effort required from both sides in such cases – especially since the long-term ROI for LLVM is often low. With that in mind, one possible approach could be that, unless a new contributor has a mentor/sponsor, their contributions are initially limited to GitHub issues labelled as good-first-issue (where I assume each issue already has a sponsor who can help guide the work).
Just thinking out loud and trying to identify some actionable steps for us.
kuhar:
I think a specific recipe / format for a PR description is an anti-pattern and would introduce unnecessary review churn. Rather, I’d be in favor of expanding the general guidance with best practices if this is missing from our docs.
I tend to agree. That said, IMO it’s even more important to:
- Identify our key policies.
- Make sure those policies are clearly communicated to new contributors.
Stepping back, it feels like our expectations aren’t being communicated clearly enough today.
I opened up LLVM Discourse this fine Sunday afternoon to share a link to the vouch tool by Mitchell Hashimoto, which I found on this Mastodon post, quoted for you:
AI eliminated the natural barrier to entry that let OSS projects trust by default. People told me to do something rather than just complain. So I did. Introducing Vouch: explicit trust management for open source. Trusted people vouch for others. GitHub – mitchellh/vouch: A community trust management system based on explicit vouches to participate.
The idea is simple: Unvouched users can’t contribute to your projects. Very bad users can be explicitly “denounced”, effectively blocked. Users are vouched or denounced by contributors via GitHub issue or discussion comments or via the CLI.
Integration into GitHub is as simple as adopting the published GitHub actions. Done. Additionally, the system itself is generic to forges and not tied to GitHub in any way.
I haven’t thought deeply enough about it to know what the bar for “vouching” for someone should be, but the ability to denounce a user would help us deal with the problem of tracking user reputation. If someone sends a PR that ends up wasting reviewer time, future reviewers won’t get stuck in the same way.
On the topic of markdown in commit messages, perhaps we’re just using that as a heuristic for detecting LLM output. Maybe the right move is to just run a general-purpose LLM output detector on the PR description, and have that trigger the recommended policy nag message. I don’t know how expensive LLM-detection models are, but ideally it’s cheap enough to run as a GitHub action.
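For a sense of what that could look like, here is a rough sketch that assumes a publicly available text-classification model is used as the detector; the model name, label names, and threshold below are all assumptions, and such detectors are known to produce false positives, so the output should only ever feed an FYI comment:

```python
# llm_detector_check.py -- hypothetical sketch; the model name, label names,
# and threshold are assumptions, not a vetted recommendation.
import os

from transformers import pipeline

# One publicly available GPT-2 output detector; any similar classifier could
# be substituted. Label names depend on the chosen model's configuration.
detector = pipeline("text-classification",
                    model="openai-community/roberta-base-openai-detector")


def looks_generated(pr_body: str, threshold: float = 0.9) -> bool:
    """Return True if the classifier is fairly confident the text is machine-generated."""
    result = detector(pr_body[:2000])[0]  # truncate very long descriptions
    return result["label"] == "Fake" and result["score"] >= threshold


if __name__ == "__main__":
    body = os.environ.get("PR_BODY", "")
    if body and looks_generated(body):
        # A real action would post the policy nag comment here.
        print("This PR description may be machine-generated; see the AI tool policy.")
```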
Just out of curiosity, how can we be sure a PR is generated by AI? Are LLM-detection techniques reliable now?
I think the problem here is not whether or not the PR is generated by AI. The problem here is that the author refused to think and communicate (if what is described in the thread is true). This is bad whether or not the author uses AI.