

simonw 44 days ago | on: GPT‑5.3‑Codex‑Spark

My stupid pelican benchmark proves to be genuinely quite useful here: you get a visual representation of the quality difference between GPT-5.3-Codex-Spark and full GPT-5.3-Codex: https://simonwillison.net/2026/Feb/12/codex-spark/

mzl 44 days ago

I find it interesting that the spark version seems worse than the gpt-oss version (https://simonwillison.net/2025/Aug/5/gpt-oss/)

lacoolj 44 days ago

These are the ones I look for every time a new model is released. They incorporate so many things into one single benchmark.

Also your blog is tops. Keep it up, love the work.
