Great to know, but what was the cost both in terms of $$ and tokens used?
Not to invalidate these benchmark results, because they are useful, but the real measure is what these models are capable of doing when real people interact with them at scale.
Regardless, this is good news, because now that Microsoft is basically giving up on its all-in strategy with GitHub's Copilot and Anthropic is playing the "I'm too good for you" game, it's about time they got pressed into not turning this AI world into a divide between the haves and the have-nots.
You’d be surprised with some long running complex tasks. I’ve seen Kimi spend 8 minutes (total) thinking on a task that Claude got done in 30 seconds. They both ultimately got it right, but Kimi spent ~$2.25 to Claude’s ~$0.20
I'm a paying customer and I did not receive ANY communication about this. Was using Opus this afternoon and then it disappeared.
Microsoft really can't stop being Microsoft. I don't dispute the need to charge more for those models, but there is a basically decent way to do things, and as usual the Big Tech fuckery and complete lack of morals makes them do it in a way that generates total mistrust where there could have been mere annoyance.
I'll see how Sonnet handles the most difficult problems, but I foresee a subscription cancellation soon.
It appears what really ended their little scam was the $421 million of reported revenue based on complete lies.
Because lying to investors about product hasn't really been an issue lately. Even Intel, ~5 years ago, gave presentations that were complete fantasy, back when they were desperate to keep their stock value up but could not produce a chip smaller than 14nm.
If they prosecuted CEOs for lies to investors beyond accounting fraud, almost every AI startup would go down.
CEOs can say basically anything when talking about the future. They just have to include a safe harbor disclaimer about forward-looking statements.
The more I live, the more I believe people at the top operate in some sort of cult mentality. The level of gullibility and temporary lack of critical thinking is only matched by their sociopathy and Machiavellianism.
I'm sure it's a great big model, but the level of hype and dishonesty is something straight out of Sam Altman's playbook.
Of course it's because of the upcoming IPO, but that's the endgame. For now, it's critical to get those private equity guys and banking institutions to believe the gospel and hold the bag; only then will the suckers in the secondary markets be allowed to be suckers too.
> A good percentage of cybersecurity has always been theater
It is great to be in a "best-effort" business where there are no consequences for bad things happening. Cybersecurity is one of those businesses. Web search, feeds and ads are another.
Imagine you are selling locks to secure homes. A thief breaks the lock. The lock-maker is not held liable. In fact, they now start selling stronger locks, and lock sales actually improve with more thefts.
I'm definitely optimistic that the long-term trajectory is positive. All important software can undergo extensive penetration testing with cutting-edge vulnerability research techniques before launch? Sounds great. The problem is what goes wrong on the pathway to there.
There's a serious problem with being very popular/prominent/powerful: you become surrounded by sycophants through a sort of survival of the fittest, and then develop a progressively more distorted view of reality as a result. When everything can be made to appear to work for the person at the center, they start making progressively worse decisions, which are consequence-free because of the sway they already have. (This is a big reason why "disruptor" startups work.)
Or, you're wrong. And the smartest AI Research Scientists and the top banking officials are both correctly worried about the ramifications. That's what you'd expect if there really was an issue here. Are you aware of the deep seated bugs in critical software that were already uncovered with Mythos? Are you able to steelman the issue here at all?
> Are you aware of the deep seated bugs in critical software that were already uncovered with Mythos
This. 100% this.
A large portion of the industry is under NDA right now, but most of the F500 have already deployed, or started deploying, foundation models for AppSec use cases all the way back in 2023.
Sev1 vulns have already been detected using "older" foundation models like Opus 4.x
Of course the noise is significant, but that's something you already faced with DAST, SAST, and other products, and is why most security teams are also pairing models with experienced security professionals to adjudicate and treat foundation model results as another threat intel feed.
Historically bad security that people just got by with, now matched with powerful tools that aren't any better than the best people but can be deployed by mediocre people.
Which is exactly what Anthropic understands the situation to be. They state at the beginning of the Glasswing blogpost that Mythos is not better than the best vulnerability researchers. But it doesn't have to be to become a tremendously big deal.
There is not just a lower barrier to entry. The best use of a tool will still be made by the most knowledgeable users. So we’re looking at lowering the bar some, but another big deal is the scale at which the top experts can work. That might actually be the longer lever. Imagine a top expert burning tokens across whole repo histories of a few dozen projects looking for likely but unconfirmed flaws, then having the model flag and rank those suspects for their own review in triaged order.
People, and by people I mean architects and lead devs at big-account orgs ($$$), have been using S3 as a filesystem as one of the backbones of their usually wacky, mega-complex projects.
So there has always been pressure on AWS to make it work like that. I suspect the volume of support tickets AWS receives along the lines of "My S3-backed project is slow/fails sometimes/runs into AWS limits (like the max number of buckets per account)", plus the "Why don't you.." questions in the design phase where AWS people are often in the room, served as enough long-applied pressure to overcome S3's technical limitations.
I'm not a fan of this type of "let's put a fresh coat of paint on it and pretend it's something it fundamentally is not" abstraction. But I suspect this is a case of social pressure turbocharged by $$$.
I'm of the opinion that while e.g. xAI is in a pump game, OpenAI is at least trying to make money. But even if they're not, even if the DCs are as you say "a financial/political vehicle to pump the markets", they can still be physically real things.
That said, I have no idea how close to complete the Stargate UAE site is.
Totally an organic and transparent marketplace that joins together publishers and consumers huh?
It has been going down since the COVID boom for obvious reasons, and since then it has dropped even more.. Google needing billions to pour into the AI burner is just an unfortunate coincidence..
Please remind me: is there any legitimate business venture that can operate outside the laws of the country where it is registered?
If there is, why don't these people who write blog posts and comments about how "this is all a scam!!", "It's a psyop! 'They' control it all!" build it? If it's all black and white, if there's no real difference between a company like Proton and Google or Microsoft, then why don't they create a business that provides a service where there's no way for any government to know anything at all, ever? They'd be printing money..
But perhaps the conspiracy realm and public broadcast of ideals is more attractive than a real business.
Yes, you shouldn't put 100% trust in a person, let alone a group of people that form a company. Grow up.
Relax. While mentioning the real world without any criticism of the soundness of the solution is absolute nonsense, some would say idiotic, thinking only of the absolute best solution from your narrow world view is not any better.
While I agree that my view is narrow, the "best solution" in question is what we used to do, and it was fine. There are still many places that manually manage dependencies. Fundamentally, automatic software versioning is an under-developed area in need of attention, and technologies like semantic versioning, which are ubiquitous, are closer to suggestions than true indicators of breaking changes. My personal view is that fully automatic dependency version management is an ongoing experiment and should be treated as such.
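To make the "suggestions, not indicators" point concrete, here is a minimal sketch (hypothetical helper names, not from any real tool) of what a semver compatibility check can actually tell you: it only verifies what the version number *claims*, since nothing enforces that a maintainer actually bumped the major version for a breaking change.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into integer components."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def upgrade_claims_compatibility(current: str, candidate: str) -> bool:
    """Return True if semver *declares* the upgrade non-breaking.

    Under the semver convention, any upgrade within the same major
    version is supposed to be backward compatible. Whether the
    maintainer actually honored that is unverifiable from the version
    string alone -- which is why this check is closer to a suggestion
    than a guarantee.
    """
    cur = parse_semver(current)
    cand = parse_semver(candidate)
    return cand[0] == cur[0] and cand >= cur


# An upgrade from 1.4.2 to 1.9.0 "claims" compatibility;
# 1.4.2 to 2.0.0 explicitly does not.
print(upgrade_claims_compatibility("1.4.2", "1.9.0"))  # True
print(upgrade_claims_compatibility("1.4.2", "2.0.0"))  # False
```

Automatic dependency updaters effectively run this check and trust the answer; manual management replaces that trust with a human reading the changelog.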