Hacker News

Do we have any estimate on the size of OpenAI top of the line models? Would they also fit in ~512GB of (V)RAM?

Also, this whole self-hosting of LLMs is a bit like cloud. Yes, you can do it, but it's a lot easier to pay for API access. And not just for small users. Personally, I don't even bother self-hosting transcription models, which are so small that they can run on nearly any hardware.



It's nice because a company can optionally provide a SOTA reasoning model for their clients without having to go through a middleman. E.g., an HR company can provide an LLM for their HRMS system for a small $2000 investment. Not $2000/month, just a one-time $2000 investment.


No one will be doing anything practical with a local version of Deepseek on a $2000 server. The token throughput of this thing is like, 1 token every 4 seconds. It would take over a minute just to produce a standard “Roses are red, violets are blue” poem. There’s absolutely no practical usage for that. It’s cool that you can do it, and it’s a step in the right direction, but self-hosting these won’t be a viable alternative to using providers like OpenAI for business applications for a while.
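For scale, a rough back-of-envelope calculation (the ~20-token poem length is an illustrative assumption, and 3.5 tok/s stands in for the 3-4 tok/s figure cited elsewhere in the thread):

```python
# Back-of-envelope generation time for a short poem at two throughputs.
# POEM_TOKENS is an assumed token count for a four-line poem.
POEM_TOKENS = 20

times = {tps: POEM_TOKENS / tps for tps in (0.25, 3.5)}
for tps, seconds in times.items():
    print(f"{tps} tok/s -> {seconds:.1f} s")
# 0.25 tok/s -> 80.0 s, 3.5 tok/s -> 5.7 s
```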


> but self-hosting these won't be a viable alternative to using providers like OpenAI for business applications for a while.

Why not? While 3-4 tok/s is on the lower end, it's still usable for any task that doesn't require real-time interaction with the model.

In other words, I don't mind waiting a minute for a good-enough response on a topic that would take me multiples of that to research and compile on my own. It's a clear net win.


If you have so little throughput that you don’t need more than 3 tokens a second, you are processing so little data that your costs from the LLM providers won’t even sniff the $2000 you will spend on the hardware to self host.
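A rough break-even sketch of that claim (the API price per million tokens here is an illustrative assumption, not any real vendor's pricing):

```python
# Break-even sketch: one-time self-hosted rig vs. pay-per-token API.
# All numbers are illustrative assumptions.

HARDWARE_COST = 2000.0       # one-time cost of the rig ($)
TOKENS_PER_SEC = 3.5         # throughput reported for the rig
API_PRICE_PER_MTOK = 2.50    # assumed API price per million tokens ($)

# Tokens the rig could produce running flat out for a month
seconds_per_month = 30 * 24 * 3600
max_tokens_per_month = TOKENS_PER_SEC * seconds_per_month

# What those same tokens would cost through an API
api_cost_per_month = max_tokens_per_month / 1e6 * API_PRICE_PER_MTOK

months_to_break_even = HARDWARE_COST / api_cost_per_month
print(f"Max tokens/month:            {max_tokens_per_month:,.0f}")
print(f"Equivalent API cost/month:   ${api_cost_per_month:.2f}")
print(f"Months to break even (100% utilization): {months_to_break_even:.0f}")
```

Even at 100% utilization the rig only displaces about $23/month of API spend under these assumptions, so the payback period runs to several years, and real utilization would be far lower.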


The OP said they were getting 3-4 TPS on their $2000 rig.


We crawl so we can learn to walk.


Is the size of OpenAI's top of the line models even relevant? Last I checked they weren't open source in the slightest.


It would make sense if you don't want somebody else to have access to all your code and customer data.




