I'm 100% an AI skeptic. I don't think we're near AGI, at least not by any reasonable definition of it. I don't think we're remotely near a superintelligence or the singularity or whatever. But that doesn't matter. These tools will change how work is done. What matters is what he says here:
> Peeps, let’s do some really simple back-of-envelope math. Trust me, it won’t be difficult math.
> You get the LLM to draft some code for you that’s 80% complete/correct.
> You tweak the last 20% by hand.
> How much of a productivity increase is that? Well jeepers, if you’re only doing 1/5th the work, then you are… punches buttons on calculator watch… five times as productive.
The hard part is the engineering, yes, and now we can actually focus on it. 90% of the work that software engineers do isn't novel. People aren't sitting around coding complicated algorithms that require PhDs. Most of the work is doing pretty boring stuff. And even if LLMs can only write 50% of that you're still going to get a massive boost in productivity.
Of course this isn't always going to be great. Debugging this code is probably going to suck, and even with the best engineering it'll likely accelerate big balls of mud. But it's going to enable a lot of code to get written by a lot more people and it's super exciting.
> But it's going to enable a lot of code to get written by a lot more people and it's super exciting.
Is that exciting though? A lot of the code we've currently got is garbage. We are already slogging through a morass of code. I could be wrong, but I don't think LLMs are going to change this trend.
it is a GREAT time to slowly get into the consulting side of our industry. there will be so much mess that I think there will be troves of opportunity to come in on a contract and fix it. there always was, but there will be a ton more!
> A lot of the code we've currently got is garbage.
That is why I have never understood the harsh criticism of the quality of LLM-generated code. What do people think the models were trained on? Garbage in -> garbage out.
I think it's exciting that more people will be able to build more things. I agree though that most code is garbage, and that it's likely going to get worse. The future can be both concerning and exciting.
Personal anecdote: I pushed some LLM generated code into Prod during a call with the client today. It was simple stuff, just a string manipulation function, about 10 LOC. Absolutely could have done it by hand but it was really cool to not have to think about the details - I just gave some inputs and desired outputs and had the function in hand and tested it. Instead of saying "I'll reply when the code is done" I was able to say "it's done, go ahead and continue" and that was a cool moment for me.
That's the perfect application for llm code! It is a well defined problem, with known inputs and outputs, and probably many examples the llm slurped up from open source code.
And that isn't a bad thing! Now us engineers need to focus on the high level stuff, like what code to write in the first place.
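As an illustration of the shape of such a task (this function is hypothetical, not the one from the anecdote): a tiny, well-specified string transform whose correctness is fully pinned down by a handful of input/output pairs, which is exactly what you hand to the LLM.

```python
def slugify(title: str) -> str:
    """Lowercase a title and join its words with hyphens, dropping punctuation.

    A hypothetical ~10-LOC string helper of the kind described above: trivially
    specified by examples, easy to verify, no architectural decisions involved.
    """
    # Replace every non-alphanumeric, non-space character with a space,
    # then normalize whitespace and join with hyphens.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in title)
    return "-".join(cleaned.lower().split())

# The "spec" is just the desired input -> output pairs:
assert slugify("Hello, World!") == "hello-world"
assert slugify("  LLMs: hype or help?  ") == "llms-hype-or-help"
```

The point isn't this particular function; it's that when the whole contract fits in a few assertions, verifying generated code is nearly free.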
> How much of a productivity increase is that? Well jeepers, if you’re only doing 1/5th the work, then you are… punches buttons on calculator watch… five times as productive.
To state the obvious -- only if all units of work are of equal difficulty.
The original calculation is not even right, because writing code probably isn't even 100% of the work software engineers do. A lot of the time they're debugging code or attending some kind of meeting. So if only 20% of their time is coding and the LLM handles 4/5ths of that, they save what, 16% of their time, for maybe a 19% boost in overall productivity?
Now factor in the expense of using LLMs and it's not worth it; the math isn't mathing. You can't even squeeze out a 2x improvement, let alone the 5-10x that has become expected in the tech industry.
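The back-of-envelope above is essentially Amdahl's law: speeding up only a fraction of the job caps the overall gain. A quick sketch, taking the comment's assumption that coding is 20% of an engineer's time:

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the work is accelerated.

    coding_fraction: share of total time spent writing code (0..1)
    coding_speedup:  how much faster the LLM makes that share
    """
    return 1.0 / ((1.0 - coding_fraction) + coding_fraction / coding_speedup)

# If 100% of the job were coding and the LLM made it 5x faster:
print(overall_speedup(1.0, 5.0))                 # 5.0 -- the quoted claim

# If only 20% of the job is coding, as the comment suggests:
print(round(overall_speedup(0.2, 5.0), 2))       # 1.19 -- ~19% overall
```

Saving 16% of total time works out to roughly a 1.19x productivity gain, nowhere near the headline 5x.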
Also absent in these discussions is who maintains the 80% that wasn’t written by them when it needs to be fixed or adapted to a new use case. This is where I typically see “fast” AI prototypes fall over completely. Will that get better? I’m ambivalent, though I could concede ‘probably’, but at the end of the day I don’t think you can get away from the fact that for every non-trivial programming task there needs to be a human somewhere to verify/validate the work, or at the very least understand it enough to debug it. That seems like an inescapable fact.
And one thing people don’t realize is that if you’re working in a codebase heavily coded by AI, you’re going to want to see the original prompts used if you don’t know the original engineers intent. And no one seems to track this. It becomes a game of Chinese whispers.
Honestly this quickly becomes a shit show and I don’t know how anyone can seriously consider having a large amount of a codebase coded by AI.
> And one thing people don’t realize is that if you’re working in a codebase heavily coded by AI, you’re going to want to see the original prompts used if you don’t know the original engineers intent.
If I'm working in a codebase that is heavily coded by anyone who isn't "me, within the last week", I'm going to want to see the implementation-technology-neutral requirements (and, ideally, human-written tests/test scenarios, unit and acceptance and everywhere in between, operationalizing the requirements) because I can't know the original engineer's intent.
This isn't any less true when the engine was wet biological goop instead of silicon.
How is this different from any legacy code base? Everywhere I've worked there's been swaths of code where the initial intent is lost to time and debugging them is basically archeology.
Random AI code isn't going to be categorically different from random code copied from StackOverflow, or code simply written a few years ago. Reading code will always be harder than writing it.
Legacy code is written by humans, and when you read stuff written by humans you can anticipate what their intent was because you are a human as well.
With AI, there is no intent, the AI is a probabilistic model and you have to guess if the model got it right.
It’s like the difference between driving next to a human driver vs a self driving car. You have some idea of what the human will do but the self driving car could flip out at any moment and do something irrational.
> Legacy code is written by humans, and when you read stuff written by humans you can anticipate what their intent was because you are a human as well.
Well, no. This is both technically false as stated (you can only "anticipate" something in advance, and the intent is in the past by the time you are reading the code, though you might in principle infer it), and often practically false even with "infer" in place of "anticipate": it is quite common for the intent of human-written code to be non-obvious, except in the sense of "the purpose of a system is what it does", in which sense AI-written code is not particularly more opaque.
No, in time it’s going to look just like that, but with hallucinations and large areas of the codebase no human has ever read. No doubt digging through the code and trying to understand it will involve a lot more prompting back and forth.
> No, in time it’s going to look just like that but with hallucinations and large areas of the codebase no human has even read.
An area of code that no human has ever read is no different to me than an area of code lots of humans who aren't me have read, and that's not that different from an area of code that I have read, but not particularly recently.
I mean, except for who it is that I am going to be swearing about when it breaks and I finally do read it.
An area of code that’s been read a lot is like a paved road: sure, it could have cracks or mud, but it’s generally understood, and the really bad warts have been fixed or had warning signs left behind. An area of code that runs rarely, and is inspected even more rarely, is likely to bring more surprises and unexpected behaviors.
> How is this different from any legacy code base? Everywhere I've worked there's been swaths of code where the initial intent is lost to time and debugging them is basically archeology.
For the sake of argument I'll concede the point that it's no different from legacy code bases, even though I disagree (legacy code was written by humans with much more business and code context than any LLM on the market can currently consume). But then why use AI at all, if the result isn't very much different?
I can speak from my actual experience and specialty, which is dissecting spaghettied codebases (more from a cloud infrastructure perspective, where much of my career has been focused): lost knowledge in legacy codebases usually surfaces through clues, or through asking some basic questions of the business owners who do remain. Someone knows something about the legacy code that's running the business, whether that's what it's for or what its original purpose was, and I don't realistically expect an LLM/chatbot/AI will ever be able to suss that out like a human can, because it involves a lot of meetings and talking to people (IME). This is just based on my experience untangling large codebases where the original authors had been gone for 5+ years.

From my perspective, expecting an AI to maintain huge balls of mud is much more likely to result in bigger piles of mud than it originally generated. I don't see how it logically follows that it will somehow improve the ball of mud until it is no longer a ball of mud; it is especially prone to this because of its 100% agreeability. And given the current strategy of throwing ever-larger context windows and more compute at these problems, I don't see how expecting an AI to maintain a huge ball of mud for a long time is realistically feasible: every line of code and every piece of related business context adds to the cost, in a way that doesn't happen when you just hire some fresh junior with a lot of salt and vinegar and get-up, who would also solve it with enough determination.
A common thing that comes up in SWE is that the business asks for something that's either stupid, unreasonable, or a huge waste of time and money. Senior engineers know when to say no, or where to push back. LLMs and the cleverest "prompt engineers" simply don't, and I don't see any world where this gets better, again, due to the agreeability issues. I also don't see a world where these same AI engineers can communicate constraints or timelines to the business in a way that makes sense. I don't expect this to improve, because every business tends to have its own unique problems that can't simply be covered by some training set.
My concern is that it's always harder to comprehend and debug code you didn't write, so there are hidden costs to using a copy-pasted solution even if you "get to the testing phase faster." Shaving percentages in one place could add different ones elsewhere.
Imagine if your manager announced a wonderful new productivity process: From now on a herd of excited interns will get the first stab at feature requests, and allll that the senior developers need to do is take that output and juuuust spruce it up a bit.
Then you have plenty of time for those TPS reports. (Now, later in my career, I have a lot more sympathy for Tom Smykowski, speaking with customers so the engineers don't have to.)
> Of course this isn't always going to be great. Debugging this code is probably going to suck, and even with the best engineering it'll likely accelerate big balls of mud. But it's going to enable a lot of code to get written by a lot more people and it's super exciting.
This is quite literally what junior engineers are.
I've worked at many places filled to the brim with junior engineers. They weren't enjoyable places to work. Those businesses did not succeed. They sure did produce a lot of code though.
I'd also be very suspicious of this Pareto distribution, to me it implies that the 20% of work we're talking about is the hard part. Spitting out code without understanding it definitely sounds like the easy part, the part that doesn't require much if any thinking. I'd be much more interested to see a breakdown of TIME, not volume of code; how much time does using an LLM save (or not) in the context of a given task?
Whatever time is saved (if any) will be taken up by higher expectations of the worker. Imagine a manager’s monologue: “now that you have AI, wouldn’t it be easier to implement those features we had to scrap because we were understaffed? Soon we’ll be even more understaffed and you’ll have to work harder.”
The thing with junior engineers is that they may eventually become proper seniors and you'll play your role in it by working with them. Nobody learns anything from working with LLMs.
You can't become a senior and skip being a junior.
For me this isn't about the promise of AI; it's about this particular type of AI technology, which should more accurately be called LLMs, since there are others.
Skepticism aside, and looking at what's already proven possible: much like Microsoft Word introduces more and more grammar-related features, LLMs are reasonably decent for the use case of supplying your input to them and speeding up iterations with you in the driver's seat. Whether it's text in a word processor or code in an editor, there are likely some meaningful improvements here, enough not to dismiss the technology.
The inverse of that, where someone wants a magic seed prompt to do everything, could prove a bit more challenging: existing models are limited to what they were trained on, and models that genuinely learn beyond that appear to be a few years away.
> But it's going to enable a lot of code to get written by a lot more people and it's super exciting.
I think it will cut both ways. I guess it depends on the type of work. I feel like expectations will fill to meet the space of the increased productivity. At least, I can see this being true in a professional sense.
If I could be five times more productive, i.e., work 80% less, then I would be quite excited. However, I imagine most of us will still be working just as much -- if not more.
Maybe I am just jaded, but I can't help but be reminded of the old saying, "The reward for digging the biggest hole is a bigger shovel."
For personal endeavors, I am nothing but excited for this technology.