
Lua has an interesting approach here. In Lua, strings are interned (in Lua 5.3 and 5.4 this applies only to short strings, 40 bytes or fewer by default; longer strings are separate objects). If you have "two" interned strings that consist of the same bytes, you are guaranteed that they have the same address and are the same object. Basically, every time such a string is created from some operation, it's looked up in a hash table of the existing strings, and if an identical one is found, that one gets reused.

However, that hash table stores weak references to those strings. If nothing else refers to a string, the GC can and will remove it from the string table.

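This gives you great memory use for strings, since each distinct value is stored only once, and optimally fast equality comparisons, since comparing two interned strings is just a pointer comparison. The cost is that creating a string is probably a bit slower because you have to hash it and check the string table for an existing copy first.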
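
To make the scheme above concrete, here is a minimal sketch in Python. It is not how Lua's C implementation works internally; LuaString and StringTable are hypothetical names, and Python's weakref.WeakValueDictionary stands in for the weakly-referenced string table. It only illustrates the idea: look up on creation, reuse on a hit, and let the entry vanish once nothing else refers to the string.

```python
import gc
import weakref


class LuaString:
    """Hypothetical stand-in for an interned string object."""

    # __weakref__ is needed so instances can be weakly referenced.
    __slots__ = ("data", "__weakref__")

    def __init__(self, data: bytes) -> None:
        self.data = data


class StringTable:
    """Hypothetical intern table that holds only weak references."""

    def __init__(self) -> None:
        # Entries disappear automatically once the value is collected.
        self._table = weakref.WeakValueDictionary()

    def intern(self, data: bytes) -> LuaString:
        # Creation cost: one hash lookup before any allocation.
        existing = self._table.get(data)
        if existing is not None:
            return existing
        fresh = LuaString(data)
        self._table[data] = fresh
        return fresh

    def __len__(self) -> int:
        return len(self._table)


table = StringTable()
a = table.intern(b"two")
b = table.intern(b"t" + b"wo")  # same bytes, built separately
print(a is b)  # True: equality is an identity (pointer) check
```

Dropping every outside reference (del a, b followed by gc.collect()) leaves the table empty again, which is the weak-reference behavior described above.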

It's an interesting set of trade-offs. I think it makes a lot of sense for Lua, which uses hash tables for everything, including method dispatch, and where string comparison must therefore be fast. I'm not sure how much sense it would make for other languages.



A problem with that approach:

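You can discover what internal strings are held in a web application via a timing attack: creating a string that already exists in the table may take a measurably different amount of time than creating one that doesn't.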

Better hope you never hold onto a reference to internal credentials inside the application! (Say... DB username / password? Passwords before they're hashed? Etc.)



