
An LLM company using regexes for sentiment analysis? That's like a truck company using horses to transport parts. Weird choice.



The difference in response time - especially versus a regex running locally - is really difficult to express to someone who hasn't made much use of LLM calls in their natural language projects.

Someone said 10,000x slower, but that's off - in my experience - by about four orders of magnitude. And that's average, it gets much worse.

Now personally I would have maybe made a call through a "traditional" ML widget (scikit, numpy, spaCy, fastText, sentence-transformers, etc.) but - for me anyway - that whole stack is Python. Transpiling all that to TS might be a maintenance burden I don't particularly feel like taking on. And in client-facing code I'm not really sure it's even possible.


So, think of it like a businessman: you don't really care if your customers swear or whatever, but you know that it'll generate bad headlines. So you gotta do something. Just like a door lock isn't designed for a master criminal, you don't need to design your filter for some master swearer; no, you design it well enough that it gives the impression that further tries are futile.

So yeah, you do what's less CPU-intensive, but also, you do what's enough to prevent the majority of the concerns where a screenshot or log ends up showing blatantly "immoral" behavior.


This door lock doesn’t even work against people speaking French, so I think they could have tried a mite harder.

The up-side of the US market is (almost) everyone there speaks English. The downside is that it includes all the well-networked pearl-clutchers. Europe (including France) will have the same people, but it's harder to coordinate a network of pearl-clutching between some saying "Il faut protéger nos enfants de cette vulgarité!" ("We must protect our children from this vulgarity!") and others saying "Η τηλεόραση και τα μέσα ενημέρωσης διαστρεβλώνουν τις αξίες μας!" ("Television and the media are distorting our values!") even when they care about the exact same media.

For headlines, that's enough.

For what's behind the pearl-clutching, for what leads to the headlines pandering to them being worth writing, I agree with everyone else on this thread saying a simple word list is weird and probably pointless. Not just for false-negatives, but also false-positives: the Latin influence on many European languages leads to one very big politically-incorrect-in-the-USA problem for all the EU products talking about anything "black" (which includes what's printed on some brands of dark chocolate, one of which I saw in Hungary even though Hungarian isn't a Latin language but an Ugric language and only takes influences from Latin).


I just went through quite an adventure trying to translate back and forth from/to Hungarian to/from different languages to figure out which Hungarian word you meant, and arrived at the conclusion that this language is encrypted against human comprehension.

Dark chocolate is "étcsokoládé", literally "edible chocolate", in Hungarian.

I heard the throat-soothing "Negró" candy (marketed with a chimney-sweep mascot with a soot-covered face) was what usually offended English speakers' sensibilities.


In all honesty, I think I've said "damn it" to ChatGPT more than once before closing the window in a fit of rage.

Nom de dieu de putain de bordel de merde de saloperie de connard d'enculé de ta mère. (The Merovingian's cascade of chained French expletives.)

It's like wiping your arse with silk.

There are only Americans on the internet.

Yea.. but.. in English only.

Fortunately I can swear pretty well in Spanish.


Only a native speaker can tell if you swear well in a foreign language.

And Claude can't tell at all.

That's like saying you can use a chisel for woodworking.

If it's good enough, it's good enough. But just as there are many more options than going full-blown LLM or just using a regex, there are more options than transpiling a massive Python stack to TS or giving up.

> Someone said 10,000x slower, but that's off - in my experience - by about four orders of magnitude.

You do know that 10,000x _is_ four orders of magnitude, right? :-D


OP is saying that in their experience it is more like eight orders of magnitude

I guess I need reading glasses ... :-D

They're sending it to an LLM anyway though? Not sure why they wouldn't just add a sentiment field to the requested response shape.

Because a regex on the client is free, while GPU compute absolutely is not.

BUT THEY'RE ALREADY RUNNING IT THROUGH THE LLM.

Because they want it to be executed quickly and cheaply without blocking the workflow? Doesn’t seem very weird to me at all.

They probably have statistics on it and saw that certain phrases happen over and over so why waste compute on inference.

More likely their LLM Agent just produced that regex and they didn't even notice.

The problem with a regex is multi-language support, and how big the regex will bloat if you want to support even 10 languages.

Supporting 10 different languages in regex is a drop in the ocean. The regex can be generated programmatically and you can compress regexes easily. We used to have a compressed regex that could match any placename or street name in the UK in a few MB of RAM. It was silly quick.

I think it will depend on the language. There are a few non-Latin-script languages where a simple word search likely won't be enough for a regex to apply properly.

Exactly this. Unicode is a big beast to consider in regex concats.

Whoa, this is a regex use I've never heard of. I'd absolutely love to see a writeup on this approach - how it's done and when it's useful.

You can literally | together every street address or other string you want to match in a giant disjunction, and then run a DFA/NFA minimization over that to get it down to a reasonable size. Maybe there are some fast regex simplification algorithms as well, but working directly with the finite automata has decades of research and probably can be more fully optimized.

This was many moons ago, written in perl. From memory we used Regexp::Trie - https://metacpan.org/release/DANKOGAI/Regexp-Trie-0.02/view/...

We used it to tokenize search input and combined it with a solr backend. Worked really remarkably well.
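To make the idea concrete, here's a toy sketch of the trie-to-regex trick in JavaScript (my own illustration, not the actual Regexp::Trie internals): fold the word list into a trie, then emit one alternation whose branches share common prefixes, so the engine matches every word in a single pass.

```javascript
// Toy illustration only: build a trie from a word list, then emit a
// single regex whose alternation branches share common prefixes.
function trieToRegex(words) {
  const trie = {};
  for (const word of words) {
    let node = trie;
    for (const ch of word) node = node[ch] ??= {};
    node.$ = true; // end-of-word marker
  }
  const esc = (ch) => ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const emit = (node) => {
    const keys = Object.keys(node).filter((k) => k !== "$").sort();
    if (keys.length === 0) return "";
    const alts = keys.map((k) => esc(k) + emit(node[k]));
    const body = alts.length === 1 ? alts[0] : `(?:${alts.join("|")})`;
    return node.$ ? `(?:${body})?` : body; // a word may also end at this node
  };
  return new RegExp(`\\b${emit(trie)}\\b`);
}

const re = trieToRegex(["wtf", "wth", "ffs"]);
console.log(re.source);          // \b(?:ffs|wt(?:f|h))\b
console.log(re.test("oh ffs!")); // true
console.log(re.test("offset"));  // false - "ffs" inside "offset" fails the \b check
```

Real implementations like Regexp::Trie go further, e.g. merging sibling single characters into character classes, which shrinks the output even more.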


We're talking about Claude Code. If you're coding and not writing or thinking in English, the agents and people reading that code will have bigger problems than a regexp missing a swear word :).

I talk to it in non-English, but have rules to keep everything in code and documentation in English. Only conversation with me uses my native language. Why would that be a problem?

Because 90% of the training data was in English, and therefore the models perform best in that language.

In my experience these models work fine using another language, if it’s a widely spoken one. For example, sometimes I prompt in Spanish, just to practice. It doesn’t seem to affect the quality of code generation.

They literally just have to subtract the vector for the source language and add the vector for the target.

It’s the original use case for LLMs.
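That's the classic word2vec-style arithmetic. Whether modern LLMs literally work this way is debatable, but the idea can be sketched with toy vectors (every number here is invented for illustration):

```javascript
// Toy sketch of "subtract the source-language vector, add the target".
// All vectors are made up; real embeddings have thousands of dimensions
// and the story is far messier.
const add = (a, b) => a.map((x, i) => x + b[i]);
const sub = (a, b) => a.map((x, i) => x - b[i]);

const EN = [1, 0];          // pretend "English direction"
const ES = [0, 1];          // pretend "Spanish direction"
const dogConcept = [5, 3];  // pretend language-neutral concept

const dog_en = add(dogConcept, EN); // "dog"   -> [6, 3]
const dog_es = add(dogConcept, ES); // "perro" -> [5, 4]

// word - sourceLanguage + targetLanguage ~= translation
const guess = add(sub(dog_en, EN), ES);
console.log(guess); // [5, 4], same as dog_es
```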


Thank you. +1. There are obviously differences and things getting lost or slightly misaligned in the latent space, and these do cause degradation in reasoning quality, but the decline is very small in high resource languages.

It’s just a subjective observation.

It just can't be the case, simply because of how ML works. In short, the more diverse, high-quality texts with reasoning-rich examples a language had in the training set, the better the model performs in that language.

So unless the Spanish subset had much more quality-dense examples to make up for the smaller volume, there is no way the quality of reasoning in Spanish is on par with English.

I apologise for the rambling explanation; I'm sure someone with ML expertise here can explain it better.


I saw a curious post recently that explored this idea, and showed that it isn’t really the case. The internal layers of the model aren’t really reasoning in English, or in any human language.

Translation in/out of human languages only happens at the edges of the model.

Internal layer activations for the same concept are similar regardless of language, while activations at the top/bottom layers diverge. Meanwhile the pattern is reversed for same language different content.


So we do at least agree on the fact that quality of human language <-> embeddings transition depends on how good target language is represented in the training dataset?

Even if it happens at the edges, on every conversation turn, I'd assume the small subtleties of meaning that aren't captured can accumulate over time into significant error.


The agent may be translating to English behind the scenes. There are all kinds of hacks layered on top of the underlying models. But however exactly it works, it’s clearly not the case that these things only make use of Spanish training data when you prompt them in Spanish.

Let's take a language with orders of magnitude less training data, like Nahuatl. The following is what happens if I ask Claude Code in Nahuatl to give me an example of a simple program that I can execute inside a web browser (avoiding any direct keyword prompts like 'Javascript' or 'Hello World'). Interestingly, it responds in Spanish. I doubt that it has any relevant training data in Nahuatl. (Apologies if the Nahuatl is wrong – I just used Google Translate.)

    ¿Uelis techmakas se neskayotl tlen se programa tlen amo ouij tlen ueliskia nijchiuas ipan se navegador web?

    -----

    Entiendo tu pregunta en náhuatl. Aquí un ejemplo sencillo — una calculadora en HTML puro que puedes abrir directo en el navegador, sin servidor:

    [I understand your question in Nahuatl. Here's a simple example: a calculator in pure HTML that you can open directly in the browser, no server needed:]

    <!DOCTYPE html>
    <html>
    <head>
        <meta charset="UTF-8">
        <title>Calculadora</title>
        <style>
        body { font-family: sans-serif; display: flex; justify-content: center; padding: 40px; }
        .calc { border: 1px solid #ccc; padding: 20px; border-radius: 8px; width: 200px; }
        input { width: 100%; margin-bottom: 10px; font-size: 1.2em; text-align: right; }
        button { width: 22%; margin: 2px; padding: 8px; font-size: 1em; cursor: pointer; }
        .wide { width: 47%; }
        </style>
    </head>
    <body>
    <div class="calc">
        <input type="text" id="pantalla" readonly>
        <div>
        <button onclick="tecla('7')">7</button>
        <button onclick="tecla('8')">8</button>
        <button onclick="tecla('9')">9</button>
        <button onclick="tecla('/')">÷</button>
    
        <button onclick="tecla('4')">4</button>
        <button onclick="tecla('5')">5</button>
        <button onclick="tecla('6')">6</button>
        <button onclick="tecla('*')">×</button>
    
        <button onclick="tecla('1')">1</button>
        <button onclick="tecla('2')">2</button>
        <button onclick="tecla('3')">3</button>
        <button onclick="tecla('-')">−</button>
    
        <button onclick="tecla('0')">0</button>
        <button onclick="tecla('.')">.</button>
        <button onclick="tecla('+')">+</button>
        <button onclick="limpiar()">C</button>
    
        <button class="wide" onclick="calcular()">=</button>
        <button class="wide" onclick="borrar()">⌫</button>
        </div>
    </div>
    
    <script>
        const pantalla = document.getElementById('pantalla');
    
        function tecla(val) {
        pantalla.value += val;
        }
    
        function calcular() {
        try {
            pantalla.value = eval(pantalla.value);
        } catch {
            pantalla.value = 'Error';
        }
        }
    
        function limpiar() { pantalla.value = ''; }
        function borrar() { pantalla.value = pantalla.value.slice(0, -1); }
    </script>
    </body>
    </html>
    
    Guarda esto como calculadora.html y ábrelo en cualquier navegador — no necesita servidor ni dependencias. Es un buen punto de partida para aprender HTML,
    CSS y JavaScript.

    [Save this as calculadora.html and open it in any browser; it needs no server or dependencies. It's a good starting point for learning HTML, CSS and JavaScript.]

> it’s clearly not the case that these things only make use of Spanish training data when you prompt them in Spanish.

It’s not! And I’ve never said that.

Anyways, I'm not even sure what we are arguing about, as it's 100% fact that SOTA models perform better in English; the only interesting question is how much better: is it negligible, or does it actually make a difference in real-world use cases?


It’s negligible as far as I can tell. If the LLM can “speak” the language well then you can prompt it in that language and get more or less the same results as in English.

In my experience agents tend to (counterintuitively) perform better when the business language is not English / does not match the code's language. I'm assuming the increased attention mitigates the higher "cognitive" load.

Claude handles human languages other than English just fine.

They only need to look at one language to get a statistically meaningful picture of common flaws in their model(s) or application.

If they want to drill down to flaws that only affect a particular language, then they could add a regex for that as well/instead.


Did you just complain about bloat, in anything using npm?

Why do you need to do it on the client side? You are leaking so much information on the client side. And considering the speed of Claude Code, if you really want to do it on the client side, a few seconds won't be a big deal.

Depends what it's used by. If I recall, there's an `/insights` command/skill (or whatever you want to call it) built in that generates an HTML file. I believe it gives you stats on when you're frustrated with it, and (useless) suggestions on how to "use Claude better".

Additionally, after looking at the source, it looks like a lot of Anthropic's own internal test tooling/debug code (i.e. stuff stripped out at build time) is in this source mapping. There's one part that prompts their own users (or whoever) to use a report-issue command whenever frustration is detected. It's possible it's using it for this.


> a few seconds won't be a big deal

it is not that slow


It looks like it's just for logging, why does it need to block?

Better question: why would you call an LLM (expensive in compute terms) for something that a regex can do (cheap in compute terms)?

Regex is going to be something like 10,000 times quicker than the quickest LLM call, multiply that by billions of prompts


This is assuming the regex is doing a good job. It is not. Also you can embed a very tiny model if you really want to flag as many negatives as possible (I don't know anthropic's goal with this) - it would be quick and free.

I think it's a very reasonable tradeoff, getting 99% of true positives at the fraction of cost (both runtime and engineering).

Besides, they probably do a separate analysis on server side either way, so they can check a true positive to false positive ratio.


Oh it’s worse than that. This one ended up getting my account banned: https://github.com/anthropics/claude-code/issues/22284

This is a tricky problem, I mean, Pinyin also uses the Latin alphabet.

It is not a tricky problem because it has a simple and obvious solution: do not filter or block usage just because the input includes a word like "gun".

Wow, that's horrible.

... and closed for inactivity like basically every issue in the repo, of course.

Because they actually want it to work 100% of the time and cost nothing.

Maybe hard to believe but not everyone is speaking English to Claude

Then they made it wrong. For example, "What the actual fuck?" is not getting flagged, neither is "What the *fuck*".

It is exceedingly obvious that the goal here is to catch at least 75-80% of negative sentiment and not to be exhaustive and pedantic and think of every possible way someone could express themselves.

Classic over-engineering. Their approach is just fine 90% of the time for the use case it’s intended for.

75-80% [1], 90%, 99% [2]. In other words, no one has any idea.

I doubt it's anywhere that high because even if you don't write anything fancy and simply capitalize the first word like you'd normally do at the beginning of a sentence, the regex won't flag it.

Anyway, I don't really care, might just as well be 99.99%. This is not a hill I'm going to die on :P

[1]: https://news.ycombinator.com/item?id=47587286

[2]: https://news.ycombinator.com/item?id=47586932


It compares against lowercased input, so that doesn't matter. The rest is still valid.

Except that it's a list of English keywords. Swearing at the computer is the one thing I'll constantly hear devs switch back to their native language for.

They evidently ran a statistical analysis and determined that virtually no one uses those phrases as a quick retort to a model's unsatisfying answer... so they don't need to optimize for them.

what you are suggesting would be like a truck company using trucks to move things within the truck

That’s what they do. Ever heard of a hand truck?

I never knew the name of that device.

Thanks


Depending on the region you live in, it's also frequently called a "dolly"

Isn’t a dolly a flat 4 wheeled platform thingy? A hand truck is the two wheeled thing that tilts back.

Ha! Where I'm from a "dolly" was the two-wheeled thing. The four-wheeler thing wasn't common before big-boxes took over the hardware business, but I think my dad would have called it a "cart", maybe a "hand-cart".

Grew up with two wheeled: dolly and four wheeled: piano dolly. Was an adult when I heard hand-truck. I prefer dolly. Nicer mouth feel.

Do we have a hand llm perchance?

Yeah it’s called a regex. With a lot of human assistance it can do less but fits in smaller spaces and doesn’t break down.

It’s also deterministic, unlike llms…

Well, regex doesn't hallucinate....right?

I just went to expertSexChange.com…

buttbuttination


Good to have more than a hammer in your toolbox!

Because they are engineers? The difference between an engineer and a hobbyist is an engineer has to optimize the cost.

As they say: any idiot can build a bridge that stands, only an engineer can build a bridge that barely stands.


Cloud hosted call centers using LLMs is one of my specialties. While I use an LLM for more nuanced sentiment analysis, I definitely use a list of keywords as a first level filter.

A lot of things don't make sense until you involve scale. A regex could be good enough to give a general gist.

Asking non-deterministic software to act like deterministic software (a regex) can be a significantly higher use of tokens/compute for no benefit.

Some things will be much better with inference, others won’t be.


Using some ML to derive a sentiment regex seems like a good idea, actually?

It's more like a truck company using people to carry some parts. I could be wrong here, but I bet this happens a lot in Volvo's factories.

Don’t worry, they used an llm to generate the regex.

This just proves it's vibe-coded, because LLMs love writing solutions like that. I probably have a hundred examples just like it in my history.

Actually, this could be a case where it's useful. Even if it only catches half the complaints, that's still a lot of data, far more than ordinary telemetry used to collect.

Because the impact of "WTF" might be lost in the result of the analysis if you rely solely on an LLM.

Parsing "WTF" with a regex also signifies the impact and reduces the noise in the metrics.

"Determinism > non-determinism": when you're analysing sentiment, why not make some things more deterministic?

A cool thing about this solution is that you can evaluate LLM sentiment accuracy against the regex-based approach and analyse discrepancies.


It's more like workers on a large oil tanker using bicycles to move around it, rather than trying to use another oil tanker.

Not everything done by claude-code is decided by the LLM. They need the wrapper to be deterministic (or one-time-generated) code.

I used regexes in a similar way, but my implementation was vibe-coded. Hmmm, by your analysis, Claude Code writes code by hand.

The amount of trust-and-safety work that depends on Google Translate and the humble regex beggars belief.

> That's like a truck company using horses to transport parts. Weird choice.

Easy way to claim more “horse power.”


More like a car company transporting their shipments by truck. It's more efficient

LLMs cost money, regular expressions are free. It really isn't so strange.

As far as I can tell they do nothing with it. They just log it.

They had the problem of sentiment analysis. They use regexes.

You know the drill.


LLMs are good at writing complex regex, from my experience

Using regex with LLMs isn't uncommon at all.

Maybe. Could just be a pre filter.

They're searching for multiple substrings in a single pass, regexes are the optimal solution for that.

The issue isn't that regex are a solution to find a substring. The issue is that you shouldn't be looking for substrings in the first place.

This has buttbuttin energy. Welcome to the 80s I guess.


> The issue is that you shouldn't be looking for substrings in the first place.

Why? They clearly just want to log conversations that are likely to display extreme user frustration with minimal overhead. They could do a full-blown NLP-driven sentiment analysis on every prompt but I reckon it would not be as cost-effective as this.


>Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

The only time to use a regex is when searching with a human in the loop. All other uses are better handled some other way.

>They could do a full-blown NLP-driven sentiment analysis on every prompt but I reckon it would not be as cost-effective as this.

Every conversation is sent to an LLM at least a thousand times the size of GPT-2, which could one-shot this nearly a decade ago.


> Every conversation is sent to an LLM at least a thousand times the size of GPT-2, which could one-shot this nearly a decade ago.

Yes, but that is _what the product does_. What we are talking about is _telemetry_.


Very likely vibe coded.

I've seen Claude Code go with a regex approach for a similar sentiment-related task.


My understanding of vibe coding is when someone doesn’t look at the code and just uses prompts until the app “looks and acts” correct.

I doubt you are writing a regex and not looking at it, even if it was AI-generated.


Clbuttic!

It's fast, but it'll miss a ton of cases. This feels like it would be better served by a prompt instruction, or an additional tiny neural network.

And some of the entries are too short and will create false positives. It'll match the word "offset" ("ffs"), for example. EDIT: no it won't, I missed the \b. Still sounds weird to me.


It’s fast and it matches 80% of the cases. There’s no point in overengineering it.

> There’s no point in overengineering it.

I swear this whole thread about regexes is just fake rage at something, and I bet it'd be reversed had they used something heavier (omg, look they're using an LLM call where a simple regex would have worked, lul)...


The pattern only matches if both ends are word boundaries. So "diffs" won't match, but "Oh, ffs!" will. It's also why they had to use the pattern "shit(ty|tiest)" instead of just "shit".
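A minimal sketch of how such a boundary-anchored check behaves (hypothetical word list for illustration, not the actual one shipped in Claude Code):

```javascript
// Hypothetical frustration list, anchored with \b on both sides and
// run against lowercased input.
const frustration = /\b(?:wtf|ffs|shit(?:ty|tiest)?|damn it)\b/;
const flagged = (text) => frustration.test(text.toLowerCase());

console.log(flagged("Oh, FFS!"));        // true  - \b matches at punctuation
console.log(flagged("check the diffs")); // false - "ffs" inside "diffs" has no boundary
console.log(flagged("this is shitty"));  // true  - the suffix alternation catches it
```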

You're right, I missed the \b's. Thanks for the correction.

It's all regex anyways

It's like a faster-than-light spaceship company using horses. There have been endless solutions to do this better, even CPU-only, for years lol.

Hmm, not a terrible idea (I think).

You have a semi-expensive process, but you want to keep particular known context out. So you put a quick-and-dirty search just in front of the expensive process. Instead of 'figure sentiment (20 seconds)', you have 'quick check sentiment (<1 sec)' and then do 'figure sentiment v2 (5 seconds)'. Now, if it were just pure regex, your analogy would hold up just fine.

I could see me totally making a design choice like that.


It's almost as if LLMs are unreliable



