jemalloc是什么？它是如何工作的？有哪些好处？

Question

jemalloc是什么？它是如何工作的？有哪些好处？

firefoxmalloc

97

Firefox 3带有一个新的分配器：jemalloc。

我在几个地方听说过这个新的分配器更好。但是，最高排名的谷歌搜索结果没有提供更多信息，我对它的工作原理感兴趣。

- Albert

4个回答

20

有一份有趣的资源：C源代码本身: https://dxr.mozilla.org/mozilla-central/source/memory/build/mozjemalloc.cpp (旧版)

一开始，简短的摘要概述了它大致的工作方式。

// This allocator implementation is designed to provide scalable performance
// for multi-threaded programs on multi-processor systems.  The following
// features are included for this purpose:
//
//   + Multiple arenas are used if there are multiple CPUs, which reduces lock
//     contention and cache sloshing.
//
//   + Cache line sharing between arenas is avoided for internal data
//     structures.
//
//   + Memory is managed in chunks and runs (chunks can be split into runs),
//     rather than as individual pages.  This provides a constant-time
//     mechanism for associating allocations with particular arenas.
//
// Allocation requests are rounded up to the nearest size class, and no record
// of the original request size is maintained.  Allocations are broken into
// categories according to size class.  Assuming runtime defaults, 4 kB pages
// and a 16 byte quantum on a 32-bit system, the size classes in each category
// are as follows:
//
//   |=====================================|
//   | Category | Subcategory    |    Size |
//   |=====================================|
//   | Small    | Tiny           |       4 |
//   |          |                |       8 |
//   |          |----------------+---------|
//   |          | Quantum-spaced |      16 |
//   |          |                |      32 |
//   |          |                |      48 |
//   |          |                |     ... |
//   |          |                |     480 |
//   |          |                |     496 |
//   |          |                |     512 |
//   |          |----------------+---------|
//   |          | Sub-page       |    1 kB |
//   |          |                |    2 kB |
//   |=====================================|
//   | Large                     |    4 kB |
//   |                           |    8 kB |
//   |                           |   12 kB |
//   |                           |     ... |
//   |                           | 1012 kB |
//   |                           | 1016 kB |
//   |                           | 1020 kB |
//   |=====================================|
//   | Huge                      |    1 MB |
//   |                           |    2 MB |
//   |                           |    3 MB |
//   |                           |     ... |
//   |=====================================|
//
// NOTE: Due to Mozilla bug 691003, we cannot reserve less than one word for an
// allocation on Linux or Mac.  So on 32-bit *nix, the smallest bucket size is
// 4 bytes, and on 64-bit, the smallest bucket size is 8 bytes.
//
// A different mechanism is used for each category:
//
//   Small : Each size class is segregated into its own set of runs.  Each run
//           maintains a bitmap of which regions are free/allocated.
//
//   Large : Each allocation is backed by a dedicated run.  Metadata are stored
//           in the associated arena chunk header maps.
//
//   Huge : Each allocation is backed by a dedicated contiguous set of chunks.
//          Metadata are stored in a separate red-black tree.
//
// *****************************************************************************

虽然，缺少更深入的算法分析。

- Albert

7

关于jemalloc为mozilla带来的好处，根据http://blog.pavlov.net/2008/03/11/firefox-3-memory-usage/（同时也是mozilla+jemalloc的第一个谷歌搜索结果）：我们得出结论，在长时间运行后，jemalloc给我们带来了最小的碎片化。我们在Windows Vista上的自动化测试中发现，当我们打开jemalloc时，内存使用量下降了22%。

- Nickolay

3

Aerospike在2013年的一个私有分支中实现了jemalloc。2014年，它被纳入Aerospike 3.3。Psi Mankoski刚刚在High Scalability上写了关于Aerospike的实现以及何时如何有效地使用jemalloc的文章。

jemalloc确实帮助Aerospike充分利用现代多线程、多CPU、多核计算机架构。jemalloc还内置了一些非常重要的调试功能来管理区域。调试功能使得Psi能够准确判断是真正的内存泄漏还是由于内存碎片造成的结果。Psi还讨论了线程缓存和每个线程分配提供的整体性能（速度）提升。

- Peter Corless

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- paxdiablo · Accepted Answer

jemalloc最初是为FreeBSD开发的，由一位名叫"Jason Evans"的人提出的想法，因此被称为“je”。如果我没有曾经编写过一个名为paxos的操作系统，我可能会嘲笑他自负。

请参见这份PDF文件，其中详细描述了算法的工作原理。

其主要好处是在多处理器和多线程系统中实现可扩展性，部分地通过使用多个竞技场（从中进行分配的原始内存块）来实现。

在单线程的情况下，使用多个竞技场没有真正的好处，因此使用单个竞技场。

但是，在多线程的情况下，会创建许多竞技场（与处理器数量相比，竞技场数量增加四倍），并以轮换方式将线程分配到这些竞技场中。

这意味着可以减少锁争用，因为虽然多个线程可能同时调用malloc或free，但只有共享同一竞技场的线程才会竞争。具有不同竞技场的两个线程不会相互影响。

此外，jemalloc试图优化高速缓存局部性，因为从RAM获取数据的行为比使用CPU高速缓存中已有的数据要慢得多（与从磁盘中快速获取和从磁盘中慢速获取之间的区别概念类似）。为此，它首先尝试尽可能减少整体内存使用量，因为这更有可能确保应用程序的全部工作集在高速缓存中。

如果无法实现该目标，则尝试确保分配是连续的，因为分配在一起的内存倾向于同时使用。

从白皮书中可以看出，这些策略对于单线程使用提供了类似于当前最佳算法的性能，同时为多线程使用提供了改进。