
I tried exactly that, several times, over and over.

Except in "hello world" situations (which I guess form a solid part of the corpus LLMs are trained on), these tools were consistently slower.

The last time involved several files that were subtly different in a section that essentially does the same thing, and needed to be aligned and made consistent†.

Time to - begrudgingly - do it manually: 5min

Time to come up with a one-shot shell incantation: 10min

Time to very dumbly manually mark the areas with ===BEGIN=== and ===END=== and come up with a one-shot shell incantation: 3min

Time to do it with the LLM: 45min††; it also required regular petting every 20-ish commands, so there was zero chance of letting it run while doing something else†††.

Time to review + manually fix the LLM output, which missed two sections, left obsolete comments, and modified four files that were entirely unrelated yet clearly declared out of scope in the prompt: 5min
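For the curious, the "dumb markers" approach can be sketched roughly like this. Everything here is illustrative: the file names, the canonical.txt replacement file, and the exact marker strings are made up for the demo; only the ===BEGIN===/===END=== idea comes from the post.

```shell
#!/bin/sh
# Demo setup with made-up files: each file has its own text around the
# markers, plus a divergent variant between them that we want to unify.
mkdir -p /tmp/markerdemo && cd /tmp/markerdemo || exit 1

printf 'shared line A\nshared line B\n' > canonical.txt
printf 'header 1\n===BEGIN===\nold variant 1\n===END===\nfooter 1\n' > one.txt
printf 'header 2\n===BEGIN===\nold variant 2\n===END===\nfooter 2\n' > two.txt

# The one-shot incantation: for each file, keep everything outside the
# markers and splice canonical.txt in between them.
for f in one.txt two.txt; do
  awk '
    /===BEGIN===/ { print                       # keep the opening marker
                    while ((getline l < "canonical.txt") > 0) print l
                    close("canonical.txt")
                    skip = 1; next }
    /===END===/   { skip = 0 }                  # resume copying at the closing marker
    !skip         { print }
  ' "$f" > "$f.new" && mv "$f.new" "$f"
done

grep -c 'shared line A' one.txt two.txt
# prints:
# one.txt:1
# two.txt:1
```

The point of the trick is that marking the regions by hand takes seconds, while the splice itself is trivially mechanical, which is why it beats both the fully manual edit and the fully automatic one.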

Consistently, proponents have been telling me "yeah, you need to practice more; I'm getting fine results, so you're holding it wrong. We can do a session together and I'll show you how to do it." They do, it doesn't work, they say "well, I'll look into it and circle back," and I never hear from them again.

As for suggestions, for every good completion I accept with an "oh well, why not", 99 get rejected: the majority are complete hallucinations with no relation to the surrounding logic, a third are broken or introduce non-working code, and 1-5 are _actively dangerous_ in some way.

The only places where I found LLMs vaguely useful are:

- Asking questions about an unknown codebase. It still hallucinates, misdirects, or is excessively repetitive about some things (even with rules), but it can crudely draw a rough "map" and make non-obvious connections between two distant areas, which can be welcome.

- Asking for a quick code review in addition to the ones I ask of humans; 70% of such output is laughably useless (although harmless beyond the noise + energy cost), 30% is a duplicate of the human reviews but arrives earlier, and sometimes it unearths a good point that had been overlooked.

† No, the specific section cannot+should not be factored out

†† And that's only because I interrupted it when it started modifying files it should not have.

††† A bit of a lie, because I tried the other three approaches during that time. Which is also telling: the times for those approaches would actually be _lower_, since I was interrupted by / had to keep tabs on what the AI agent was doing.



Please specify the model. If you're using ChatGPT's default model, as is common, it is completely useless slop.



