
Thanks for the report; you found a real bug. That document is a scan processed with Adobe Paper Capture, which adds an invisible OCR text layer on top of the scanned image. Veil sees that text and treats the PDF as native, so it protects the image from inversion instead of inverting it. The dark border you see is the PDF background margin between the page edge and the start of the raster image; that margin does get inverted by the CSS. I'll probably need to combine text detection with an image-coverage check: if a single image covers almost the entire page, it's a scan even if it has native text. Thanks for sharing the specific document; it'll be very useful for reproducing the issue.


Fixed. Full details in the commit here: https://github.com/simoneamico-ux-dev/veil/commit/9d09d9c

In short, by checking three simple signals, veil can now distinguish a scan with an overlaid OCR text layer from a native PDF that contains images. The first is whether the image covers more than 40% of the page, meaning it dominates the surface. The second is whether the text layer has more than 200 characters, enough to be a document and not just a cover. The third is whether the image is predominantly blank paper rather than a photograph, verified by sampling the luminance of its pixels. When all three conditions are true, the image is no longer protected and the inversion applies normally. The same detection also runs in the export path. Thanks again for the file, importjelly.
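For anyone curious how the three-signal check fits together, here's a rough sketch in Python. It is not veil's actual code (see the linked commit for that). The 40% coverage and 200-character thresholds come from the comment above. The "blank paper" test is an assumption: a sampled pixel counts as paper if its BT.601 luma is at least 200 on a 0-255 scale, and the image counts as mostly paper if at least 60% of samples pass. The names `looks_like_scan`, `is_mostly_paper`, `PAPER_LUMINANCE`, and `MIN_PAPER_FRACTION` are hypothetical.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]

# Thresholds stated in the comment.
MIN_IMAGE_COVERAGE = 0.40  # image must cover more than 40% of the page
MIN_TEXT_CHARS = 200       # text layer must have more than 200 characters

# Hypothetical values: the comment does not say how "mostly blank paper"
# is decided, so these two numbers are assumptions for illustration.
PAPER_LUMINANCE = 200      # 0-255; at or above this a pixel looks like paper
MIN_PAPER_FRACTION = 0.60  # share of samples that must look like paper


def luminance(rgb: RGB) -> float:
    """Approximate luma with ITU-R BT.601 weights, on a 0-255 scale."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def is_mostly_paper(samples: Sequence[RGB]) -> bool:
    """True if enough sampled pixels are bright enough to be blank paper."""
    if not samples:
        return False
    paper = sum(1 for px in samples if luminance(px) >= PAPER_LUMINANCE)
    return paper / len(samples) >= MIN_PAPER_FRACTION


def looks_like_scan(image_area: float, page_area: float,
                    text_chars: int, samples: Sequence[RGB]) -> bool:
    """All three signals must hold for the image to lose its protection."""
    coverage = image_area / page_area if page_area > 0 else 0.0
    return (coverage > MIN_IMAGE_COVERAGE
            and text_chars > MIN_TEXT_CHARS
            and is_mostly_paper(samples))
```

A scanned page is mostly near-white pixels with some dark ink, so it passes the paper check. A full-page photo with mid-tone pixels fails it and stays protected even when it carries a lot of text:

```python
white, ink = (250, 250, 250), (10, 10, 10)
looks_like_scan(950, 1000, 1500, [white] * 8 + [ink] * 2)  # True: scan
looks_like_scan(950, 1000, 1500, [(120, 90, 60)] * 10)     # False: photo
```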



