
Even outside the Web, you still have mostly-ASCII:

* filenames

* identifiers

* config files

* text protocols

* host names, email addresses

* embedded scripts (including SQL and OpenGL shaders)

* command line interfaces

* translations for languages using Latin alphabets

I don't think a 2/3 size reduction for some languages will offset the cost in all the other places.
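
To put rough numbers on that trade-off, here is a minimal sketch. It assumes the encoding being compared against UTF-8 is UTF-16, and the sample strings are made up for illustration; `encoded_sizes` is a hypothetical helper, not anything from the thread.

```python
# Compare how many bytes the same text needs in UTF-8 and UTF-16.
# Assumption: UTF-16 (without a BOM) is the alternative encoding under discussion.

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for the given string."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Illustrative samples: two mostly-ASCII strings and one Japanese string.
samples = {
    "filename": "report_2024.txt",
    "sql": "SELECT id FROM users WHERE name = ?",
    "japanese": "こんにちは世界",
}

for label, text in samples.items():
    utf8_size, utf16_size = encoded_sizes(text)
    print(f"{label}: utf-8={utf8_size} bytes, utf-16={utf16_size} bytes")
```

For the Japanese sample UTF-16 comes out at two thirds of the UTF-8 size (14 bytes versus 21), while the ASCII samples double in size.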



For some of these things we don’t have much choice, because the encoding is part of some lower-level API (file system, OpenGL, CLI), which usually doesn’t accept arbitrary encodings. It accepts only one, and unless you want to waste time converting, you’d better use that exact encoding.

Other stuff like IDs, shaders before GL 4.2, and many text protocols aren’t Unicode at all.

For configs I usually use UTF-8 myself. I don’t like writing parsers for custom formats, so I just use XML, and any standard-compliant parser has to support both UTF-8 and UTF-16 anyway.
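
A small sketch of the XML point, using Python's standard `xml.etree.ElementTree` as a stand-in for whatever parser is actually in use. The `<config>` document and the `read_name` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical config document; the encoding declaration is filled in per variant.
DOC = '<?xml version="1.0" encoding="{enc}"?><config><name>Zoë</name></config>'

def read_name(raw: bytes) -> str:
    """Parse raw XML bytes and return the text of the <name> element."""
    return ET.fromstring(raw).findtext("name")

# Same document serialized two ways. Python's "utf-16" codec prepends a BOM,
# which the parser uses to detect the byte order.
utf8_bytes = DOC.format(enc="UTF-8").encode("utf-8")
utf16_bytes = DOC.format(enc="UTF-16").encode("utf-16")

print(read_name(utf8_bytes))
print(read_name(utf16_bytes))
```

The calling code never mentions the encoding; the parser works it out from the BOM and the XML declaration.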


If English is your world, yeah.

Some of us use other languages and like to use them everywhere we can.



