Hacker News | pants2's comments

Maybe the only solution to GPTisms is infinite context. If I'm talking to my coworker every day I would consciously recognize when I already used a metaphor recently and switch it up. However if my memory got reset every hour, I certainly might tell the same story or use the same metaphor over and over.

> However if my memory got reset every hour, I certainly might tell the same story or use the same metaphor over and over.

All people repeat the same stories and phraseology to some extent, and some people are as bad or worse than LLM chat bots in their predictability. I wonder if the latter have weak long-term memory on the scale of months to years, even if they remember things well from decades ago.


Honestly I think there is more to it - even with infinite context, the LLM needs some kind of intelligence to know what is noise and what is signal. Otherwise you resort to "thinking" - making it create garbage that it then feeds back to itself.

Learning a language is a big complex task, but it is far from real intelligence.


Nice, OpenAI mentioned my Hacker News post in their article :) I appreciate that they wrote a whole blog post to explain!

https://news.ycombinator.com/item?id=47319285


Lock in is pretty easy these days. Just a dummy example, Claude models are trained on their `str_replace_based_edit_tool` edit tool[1] which is very different from OpenAI's `apply_patch` tool[2].

1. https://platform.claude.com/docs/en/agents-and-tools/tool-us...

2. https://developers.openai.com/api/docs/guides/tools-apply-pa...
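A minimal sketch of that incompatibility, assuming representative (not verified) schemas for both tools: Claude's editor tool takes structured old/new string fields, while OpenAI's `apply_patch` takes a single diff-like patch string. The file path and edit shown are made up for illustration.

```python
# The same one-line edit, expressed as two incompatible tool calls.
# Field names are approximations from the vendor docs, not exact schemas.

claude_edit = {
    "name": "str_replace_based_edit_tool",
    "input": {
        "command": "str_replace",       # structured edit command
        "path": "src/app.py",
        "old_str": "timeout = 30",
        "new_str": "timeout = 60",
    },
}

openai_edit = {
    "name": "apply_patch",
    "input": {
        # apply_patch takes one diff-like string rather than
        # separate old/new fields.
        "patch": (
            "*** Begin Patch\n"
            "*** Update File: src/app.py\n"
            "-timeout = 30\n"
            "+timeout = 60\n"
            "*** End Patch\n"
        ),
    },
}

# The input schemas share no fields at all:
shared = set(claude_edit["input"]) & set(openai_edit["input"])
print(shared)  # set()
```

A model fine-tuned to emit one of these shapes tends to produce malformed calls in the other, which is exactly the lock-in: switching providers means rewriting (or shimming) your whole tool layer.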


Also Discord - tons of people use Discord as a social network and keep up with friends. I must have 5 friend groups that have their own Discords with some overlap.

So did you disclose this responsibly? Posting about it publicly first is asking for that sensitive data to be leaked. Might as well hack and repost that PII yourself.

This is not a data leak. They deliberately included 999 of their customers' email addresses in publicly accessible JavaScript code in order to test certain features on them.

Surely they didn't intend to broadcast that to the public? Sounds like a textbook data leak.

> A data leak is the unauthorized, often unintentional exposure of sensitive, confidential, or personal information to an external party, usually resulting from weak infrastructure, human error, or system errors.


Consider medical device software. Often embedded C code, needs to be rigorously documented and tested, has longer development cycles, and certainly no attitudes of "bugs are fine, ship it and we'll patch later."


It doesn't give much information about how they were generated.


The example they gave of 4D splats used a room with dozens of cameras, and it only talked about the software used for 3D, not 4D.

For the outdoor examples on that site I can only assume they used dozens of drones?


Is anyone here actually using pro models through the API? I'd be very curious what the use-case is.

Yes. High-value work where cost (mostly) doesn't matter. For example, if I need to look over a legal doc for possible mistakes (part of a workflow I have), it doesn't matter (in my case) whether it costs $0.01 or $10.00, since it's a somewhat infrequent event. So I'll pay $9.99 more, even if the model is only slightly better.

I'm surprised I've never heard people talking about using the -Pro variants, even though their rates ($125-175/M?) aren't drastically higher than old Opus ($75/M), which people seemed to use.

Indeed, even just Terms of Service and Privacy Policy work. Infrequent enough that cost isn't an issue, but model quality absolutely is.

Yes? The same reason you would use it via the tooling.

And "valet" is supposed to rhyme with "ballot" not "ballet" but you'll still sound like an idiot if you say "take your car to the val-it"


Your Merriam Webster source has "val-it" as the first pronunciation (but I think in this case both are correct and valit is less common)

It does, and I've never heard anyone say it that way (and I appreciate that you chose the only dictionary that gave anything close to your argument), but that's still nothing like "ballot".

Drink some clarit with the valit over a good filit.

Jeeves (the gentleman's personal gentleman) is a valet that would be pronounced VAL-et.

Labs still aren't publishing ARC-AGI-3 scores, even though it's been out for some time. Is it because the numbers are too embarrassing?

Honest answer is that it isn't done running yet. It takes some human bandwidth and time to run, so results weren't ready by this morning. We don't know what the score will be, but will probably go up on the leaderboard sometime soon. I personally don't put a lot of stock in the ARC-AGI evals, as it's not relevant to most work that people do, but should still be interesting to see as a measure of reasoning ability.

(I work at OpenAI.)


Because they want to keep the narrative that they'll achieve AGI with LLMs alive.

GPT-5.5 was just released and OpenAI didn't mention ARC-AGI-3 at all; their score probably sucks.

To be fair, there's not much to report. Isn't it pretty much at 0?

Opus-4.6 with 0.5% currently leads GPT-5.4 with 0.2%[1].

Seems meaningful even if the absolute numbers are very low. That's sort of the excitement of it.

1. https://arcprize.org/leaderboard

