We use many lock-free data structures that rely heavily upon the x86_64 architecture. Further, the expanded virtual address space enables us to mmap everything.
All network traffic is packed, and in network byte order.
> the expanded virtual address space enables us to mmap everything
This sounds like a disaster waiting to happen when a node's working set is larger than RAM. Have you considered the performance impact of the excessive I/O that this kind of overcommit can cause?
Redis performance, for example, suffers tremendously when its database size exceeds the available RAM, which is why the authors advise implementors to cap its database size. The only real difference I can see here is that Redis allocates its storage anonymously, while Hyperdex uses file-backed pages; in either case, performance under memory pressure will largely be governed by how the VM chooses to move pages in and out of its backing store -- behavior the application has no control over.
Optimizing the performance of working sets larger than RAM is hard. Redis had a (thankfully) aborted attempt to do so (the VM is gone in the most recent stable release); and the InnoDB buffer pool in MySQL has been refined for many years and is subject to quite a bit of tuning for specific workloads. (See also http://blog.kennejima.com/post/1226487020/thoughts-on-redis for some thoughtful discussion on the subject.)
You claim on the one hand that Hyperdex was designed to work on data sets larger than RAM, but on the other hand you admit that all the benchmarks were performed with a working set smaller than RAM. I'd need to see comparative benchmarks with a working set larger than RAM to be convinced.
Well, define "disaster". A memory-mapped dataset causes pathological performance only when there is thrashing: if you are accessing a small percentage of the entire dataset, then the unused data will be paged out and remain paged out. If the bulk of the data being accessed exceeds the amount of available physical RAM, then you will get I/O thrashing.
Memory-mapping is a good alternative to static allocation or a home-grown paging system because it lets the kernel handle the dynamics of allocation, letting your application transparently and gracefully handle memory-pressure situations by relinquishing memory to other apps. Kernels (including Linux and Windows) and CPUs are extremely efficient at paging I/O, much more efficient than a hand-written paging system, because there's no need for the application's code to check whether a page is in physical memory; that's handled by the CPU itself.
Of course, any I/O incurred by a too-large dataset will drastically reduce performance compared to in-memory speed. But paging in itself does not necessarily lead to "disaster".
A working set larger than RAM doesn't imply thrashing. It all depends on the page replacement algorithm employed by the OS, and on the other applications that are running.