Oops. Sorry, I will be careful next time. Thanks for pointing that out.
> does not include a discussion on multithreaded allocator design
This is very true. tcmalloc seems to have been the earliest design with thread-local pools. jemalloc didn't originally have this design[1], and over time many allocators just adopted it, including SuperMalloc and others.
Actually, thread-local pools predate tcmalloc by quite a few years. Cribbing from the related work section of a 2006 paper I co-authored (http://www.scott-a-s.com/files/ismm06.pdf):
"Streamflow uses segregated object allocation in thread-private
heaps, as in several other thread-safe allocators including Hoard
[3], Maged Michael’s lock-free memory allocator [18], Tcmalloc
from Google’s performance tools [10], LKmalloc [15], ptmalloc
[9], and Vee and Hsu’s allocator [25]. In particular, Streamflow
uses strictly thread-local object allocation, both thread-local and
remote deallocation and mechanisms for recycling free page blocks
to avoid false sharing and memory blowup [3, 18]."
[3] E. Berger, K. McKinley, R. Blumofe, and P. Wilson. Hoard: A Scalable
Memory Allocator for Multithreaded Applications. In Proc. of the 9th
International Conference on Architectural Support for Programming
Languages and Operating Systems, pages 117–128, Cambridge, MA,
November 2000.
[15] P. Larson and M. Krishnan. Memory Allocation for Long-Running
Server Applications. In Proceedings of the First International
Symposium on Memory Management, pages 176–185, Vancouver,
BC, October 1998.
[18] M. Michael. Scalable Lock-free Dynamic Memory Allocation. In
Proceedings of the ACM SIGPLAN 2004 Conference on Programming
Language Design and Implementation, pages 35–46, Washington,
DC, June 2004.
[25] V. Vee and W. Hsu. A Scalable and Efficient Storage Allocator
on Shared Memory Multiprocessors. In Proceedings of the 1999
International Symposium on Parallel Architectures, Algorithms and
Networks, pages 230–235, Perth, Australia, June 1999.
The earliest appears to be Larson and Krishnan from 1998. In the late '90s and early 2000s, this work seems to have been focused on SMP servers. Then in the early to mid 2000s, people (including my advisor) started realizing this whole "multicore" thing was for real, and system software would have to change.
I wasn't sure where it appeared first, either! I had to dig out that old related work section. There may be work that predates the '98 reference, but it may not have gotten much attention. (I had assumed Hoard would be the first in the literature, but that's from 2000.) I think the timing has more to do with the hardware available at the time and what people were doing with it. It's not a huge stretch to imagine thread-local pools, but I don't think enough people were paying attention to the problem before then.
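For anyone who hasn't seen the idea, here is a rough C++ sketch of a thread-local pool with local and remote deallocation, in the spirit of the quoted passage. It is not code from Streamflow, Hoard, tcmalloc, or any other real allocator: every name (Pool, Block, pool_alloc, pool_free, kBlockSize) is made up for illustration, it handles a single size class, it never returns memory to the system, and it assumes a thread outlives every block it handed out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical sketch: one size class, no page recycling, no thread-exit handling.
constexpr std::size_t kBlockSize = 64;

struct Pool;

struct Block {
    Pool* owner;   // pool of the thread that first allocated this block
    Block* next;   // link used only while the block sits on a free list
    alignas(std::max_align_t) unsigned char payload[kBlockSize];
};

struct Pool {
    Block* local_free = nullptr;               // touched only by the owning thread
    std::atomic<Block*> remote_free{nullptr};  // other threads push freed blocks here

    void* allocate() {
        if (!local_free) {
            // Take everything other threads have freed in a single atomic swap.
            local_free = remote_free.exchange(nullptr, std::memory_order_acquire);
        }
        Block* b = local_free;
        if (b) {
            local_free = b->next;
        } else {
            // Pool is empty: fall back to the system allocator.
            b = static_cast<Block*>(std::malloc(sizeof(Block)));
            if (!b) return nullptr;
            b->owner = this;
        }
        return b->payload;
    }

    // Fast path: the owner frees its own block, no synchronization needed.
    void free_local(Block* b) {
        b->next = local_free;
        local_free = b;
    }

    // Slow path: another thread returns a block with a lock-free push.
    void free_remote(Block* b) {
        Block* head = remote_free.load(std::memory_order_relaxed);
        do {
            b->next = head;
        } while (!remote_free.compare_exchange_weak(
            head, b, std::memory_order_release, std::memory_order_relaxed));
    }
};

// Each thread lazily gets its own pool.
inline Pool& my_pool() {
    thread_local Pool pool;
    return pool;
}

inline void* pool_alloc() { return my_pool().allocate(); }

inline void pool_free(void* p) {
    if (!p) return;
    // Step back from the payload to the block header to find the owner.
    Block* b = reinterpret_cast<Block*>(
        static_cast<unsigned char*>(p) - offsetof(Block, payload));
    assert(b->owner != nullptr);
    if (b->owner == &my_pool()) {
        b->owner->free_local(b);
    } else {
        b->owner->free_remote(b);
    }
}
```

The point of the split is that the common case (a thread freeing what it allocated) takes no locks and no atomic operations, while a block freed by a different thread still goes back to its original owner rather than migrating between pools.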
[1] https://www.facebook.com/notes/facebook-engineering/scalable... (search for tcmalloc)