Do you think it's just part of their training set now?

alexeiz · 2026-04-02T18:35:17 1775154917

It's time to do "frog on a skateboard" now.

Wyverald · 2026-04-04T01:16:17 1775265377

In case you haven't seen this: https://x.com/JeffDean/status/2024525132266688757

lysace · 2026-04-02T19:24:43 1775157883

Seems very likely, even if Google has behaved ethically.

Simon and YC/HN have published/boosted these gradual improvements and evaluations for quite some time now.

There is a https://simonwillison.net/robots.txt but it allows pretty much everything, AI-wise.

simonw · 2026-04-02T17:53:36 1775152416

If it's part of their training set why do the 2B and 4B models produce such terrible SVGs?

vessenes · 2026-04-02T18:23:41 1775154221

We were promised full SVG zoos, Simon. I want to see SVG pangolins, please.

nickpsecurity · 2026-04-02T22:51:05 1775170265

Larger models better understand and reproduce what's in their training set.

For example, I used to get verbatim quotes and answers from copyrighted works when I used GPT-3.5. That's what clued me in to the copyright problem. The smallest models, by contrast, often produced nonsense about the same topics, because small models often produce nonsense.

You might need to do a new test each time to avoid your old ones being scraped into the training sets. Maybe a new one for each model produced after your last one. Totally unrelated to the last one, too.

wolttam · 2026-04-02T20:39:03 1775162343

Because it is in their training set, but it's unrealistic to expect a 2B or 4B model to be able to perfectly reproduce everything it's seen before.

The training no doubt contributed to their ability to (very) loosely approximate an SVG of a pelican on a bicycle, though.

Frankly, I'm impressed.

retinaros · 2026-04-02T20:05:27 1775160327

Because generating a nice-looking SVG requires handling code, shapes, long context, and reasoning, and at 2B you will most likely break the syntax of the file 9 times out of 10 if you train for that. Or you will need to go for simpler pelicans. It might not be worth fine-tuning on a 2B. But on their top-tier open model it is definitely worth it. Even if not done directly, just crawling GitHub would make it train on your pelicans.