"Why would I commit something written by AI with myself as author?"
Because you're the one who decided to take responsibility for it, and actually chose to PR it in its final form.
What utility do the reviewers/maintainers get from you marking what's written by you vs. ChatGPT? Other than your ability to scapegoat the LLM?
The only thing that actually affects me (the hypothetical reviewer) and the project is the quality of the actual code, and, ideally, the presence of a contributor (you) who can actually answer for that code. The presence or absence of LLM-generated code by your hand makes no difference to me or the project. Why would it? Why would it affect my decision making whatsoever?
It's your code, end of story. Either that or the PR should just be rejected, because nobody is taking responsibility for it.
As someone mostly outside of the vibe coding stuff, I can see the benefit in having both the model and the author information.
Model information for traceability and possibly future analysis/statistics, and author to know who is taking responsibility for the changes (and, thus, has deeply reviewed and understood them).
As long as those two pieces of information are present in the commit, I guess which commit field should hold which is for the project to standardise. (But it should be normalised within a project, otherwise the "traceability/statistics" part cannot be applied reliably.)
Yeah, nothing wrong with keeping the metadata - but "Authored-by" is both credit and an attestation of responsibility. I think people just haven't thought about it too much and see it mostly as credit and less as responsibility.
I disagree. “Authored by” - and authorship in general - says who did the work. Not who signed off on the work. Reviewed-by me, authored by Claude feels most correct.
> Before AI, did you credit your code completion engine for the portions of code it completed?
Code completion before LLMs was helping me type faster by completing variable names, variable types, function arguments, and that’s about it. It was faster than typing it all out character by character, but the auto completion wasn’t doing anything outside of what I was already intending to write.
With an LLM, I give brief explanations in English to it and it returns tens to hundreds of lines of code at a time. For some people perhaps even more than that. Or you could be having a “conversation” with the LLM about the feature to be added first and then when you’ve explored what it will be like conceptually, you tell it to implement that.
In either case, I would then commit all of that resulting code with the name of the LLM I used as author, and my name as the committer. The tool wrote the code. I committed it.
As the committer of the code, I am responsible for what I commit to the code base, and everyone is able to see who the committer was. I don’t need to claim authorship over the code that the tool wrote in order for people to be able to see who committed it. And it is in my opinion incorrect to claim authorship over any commit that consists almost entirely of AI generated code.
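For anyone who hasn't used the author/committer split described above, here is a rough sketch of how it could be set up. It only builds the command and environment, it doesn't run git. The `--author` flag and the `GIT_COMMITTER_NAME` / `GIT_COMMITTER_EMAIL` variables are real git features; the function names, the model name and the email addresses are placeholders I made up for illustration.

```python
def commit_command(message, author_name, author_email):
    """Build the argv for `git commit` with an explicit author.

    The committer is not set on the command line: git takes it from
    user.name / user.email, or from the GIT_COMMITTER_* variables.
    """
    author = f"{author_name} <{author_email}>"
    return ["git", "commit", f"--author={author}", "-m", message]


def committer_env(name, email, base_env=None):
    """Return a copy of the environment with the committer identity set."""
    env = dict(base_env or {})
    env["GIT_COMMITTER_NAME"] = name
    env["GIT_COMMITTER_EMAIL"] = email
    return env


# Hypothetical identities, purely for illustration.
argv = commit_command("Add config parser", "Some LLM", "llm@example.com")
env = committer_env("Jane Doe", "jane@example.com")
```

Passing `argv` and `env` to `subprocess.run(argv, env=env)` inside a repository would create the commit, and something like `git log --format='%an / %cn'` would then show the author and committer names side by side.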
True. Might also vary depending on how one uses the LLM.
For example, in a given interaction the user of the LLM might be acting more like someone requesting a feature, and the LLM is left to implement it. Or the user might be acting akin to a bug reporter providing details on something that’s not working the way it should and again leaving the LLM to implement it.
While on the other hand, someone might instruct the LLM to do something very specific with detailed constraints, and in that way the LLM would perhaps be more along the line of a fancy auto-complete to write the lines of code for something that the user of the LLM would otherwise have written more or less exactly the same by hand.
Claude adds "Co-authored by" attribution for itself when committing, so you can see the human author and also the bot.
I think this is a good balance, because if you don't care about the bot you still see the human author. And if you do care (for example, I'd like to be able to review commits and see which were substantially bot-written and which were mostly human) then it's also easy.
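To give an idea of how "easy" that filtering could be, here is a minimal sketch that pulls `Co-authored-by:` lines out of a commit message. The trailer format (`Name <email>`) is the common convention; the helper names, the `bot_names` list and the example address are my own assumptions, and the exact name and email a given tool writes may differ.

```python
import re

# Matches lines like "Co-authored-by: Some Name <some@address>".
TRAILER = re.compile(
    r"^Co-authored-by:\s*(?P<name>.+?)\s*<(?P<email>[^>]+)>\s*$",
    re.IGNORECASE,
)


def co_authors(message):
    """Return (name, email) pairs for every Co-authored-by line."""
    found = []
    for line in message.splitlines():
        match = TRAILER.match(line.strip())
        if match:
            found.append((match.group("name"), match.group("email")))
    return found


def is_bot_assisted(message, bot_names=("claude",)):
    """True if any co-author name starts with a known bot name.

    bot_names is a hypothetical list; a project would pick its own.
    """
    return any(
        name.lower().startswith(bot)
        for name, _ in co_authors(message)
        for bot in bot_names
    )


msg = "Fix parser\n\nCo-authored-by: Claude <noreply@example.com>\n"
print(co_authors(msg))
print(is_bot_assisted(msg))
```

This only works if the trailer is actually present, so it says nothing about commits where the attribution was left out or removed.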
> I'd like to be able to review commits and see which were substantially bot-written and which were mostly human
Why is this, though? I'm genuinely curious. My code-quality bar doesn't change either way, so why would this be anything but distracting to my decision making?
Personally it would make the choice to say no to the entire thing a whole lot easier if they self-reported on themselves automatically and with no recourse to hide the fact that they've used LLMs. I want to see it for dependencies (I already avoid them, and would especially do so with ones heavily developed via LLMs), products I'd like to use, PRs submitted to my projects, and so on, so I can choose to avoid them.
Mostly this is because, all things considered, I really do not need to interact with any of that, so I'm doing it by choice. Since it's entirely voluntary I have absolutely no incentive to interact with things no one bothered to spend real time and effort on.
If you choose not to use software written with LLM assistance, you'll use to a first approximation 0% of software in the coming years.
Even excluding open source, there are no serious tech companies not using AI right now. I don't see how your position is tenable, unless you plan to completely disconnect.
This is shouting at the clouds I'm afraid (I don't mean this in a dismissive way). I understand the reasoning, but it's frankly none of your business how I write my code or my commits, unless I choose to share that with you. You also have a right to deny my PRs in your own project of course, and you don't even have to tell me why! I think on GitHub at least you can even ban me from submitting PRs.
While I agree that it would be nice to filter out low effort PRs, I just don't see how you could possibly police it without infringing on freedoms. If you made it mandatory for frontier models, people would find a way around it, or simply write commits themselves, or use open weight models from China, etc.
Accountability. Same reason I want to read human written content rather than obvious AI: both can be equally shit, but at least with humans there's a high probability of the aspirational quality of wanting to be considered "good"
With AI I have no way of telling if it was from a one line prompt or hundreds. I have to assume it was one line by default if there's no human sticking their neck out for it.
LLMs can make mistakes in different ways than humans tend to. Think "confidently wrong human throwing flags up with their entire approach" vs. "confidently wrong LLM writing convincing-looking code that misunderstands or ignores things under the surface."
Outside of your one personal project, it can also benefit you to understand the current tendencies and limitations of AI agents, either to consider whether they're in a state that'd be useful to use for yourself, or to know if there are any patterns in how they operate (or not, if you're claiming that).
Burying your head in the sand and choosing to be a guinea pig for AI companies by reviewing all of their slop with the same care you'd review human contributions with (instead of cutting them off early when identified as problematic) is your prerogative, but it assumes you're fine being isolated from the industry.
Sure, the point about LLM "mistakes" etc. being harder to detect is valid, although I'm not entirely sure how to compare this with hard-to-detect human mistakes. If anything I find LLM code shortcomings often a bit easier to spot because a lot of the time they're just unneeded dependencies, useless comments, useless replication of logic, etc. This is where testing comes into play too, and I'm definitely reviewing your tests (obviously).
> Burying your head in the sand and choosing to be a guinea pig for AI companies by reviewing all of their slop with the same care you'd review human contributions with (instead of cutting them off early when identified as problematic) is your prerogative, but it assumes you're fine being isolated from the industry.
I mean listen: I wish with every fiber of my being that LLMs would disappear off the face of the earth for eternity, but I really don't think I'm "isolating myself from the industry" by not simply dismissing LLM code. If I find a PR to be problematic I would just cut it off; that's how I review in the first place. I'm telling some random human who submitted the code to me that I am rejecting their PR because it's low quality. I'm not sending Anthropic some long detailed list of my feedback.
This is also kind of a moot point either way, because everyone can just trivially hide the fact that they used LLMs if they want to.
> If anything I find LLM code shortcomings often a bit easier to spot because a lot of the time they're just unneeded dependencies, useless comments, useless replication of logic, etc.
By this logic, it's useful to know whether something was LLM-generated or not because if it was, you can more quickly come to the conclusion that it's LLM weirdness and short-circuit your review there. If it's human code (or if you don't know), then you have to assume there might be a reason for whatever you're looking at, and may spend more time looking into it before coming to the conclusion that it's simple nonsense.
> This is also kind of a moot point either way, because everyone can just trivially hide the fact that they used LLMs if they want to.
Maybe, but this thread's about someone who said "I'd like to be able to review commits and see which were substantially bot-written and which were mostly human," and you asking why. It seems we've uncovered several feasible answers to your question of "why would you want that?"