Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

You can tell how much they cared about data quality because they never took the time to look at context-dependent glyph equivalencies. And some context-sensitive algorithms might not make the same mistakes as a naive “guess what characters are here” algorithm that just uses glyph shapes. You run into this a LOT with ALPR systems because some of the presses excluded some characters. O and 0 are the most common character equivalency. But only in certain places.

OCR is actually complicated if you’re trying to rely on the data for something.



Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: