Open Bug 1805644 Opened 1 year ago Updated 10 months ago

Speedometer 2 is ~5% faster with --disable-jemalloc

Categories

(Core :: Performance Engineering, task, P1)


ASSIGNED

People

(Reporter: jandem, Assigned: pbone)

References

(Depends on 4 open bugs, Blocks 1 open bug)

Details

(Whiteboard: [sp3-p1] [sp3-preact-todomvc] [sp3-react-todomvc] [sp3-vanillajs-todomvc] [sp3-vuejs-todomvc] [sp3-jquery-todomvc])

See the perf comparison here for a --disable-jemalloc build:

https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=0c1382d65cf4765c31ee57a1d7fb34582f20922a&newProject=try&newRevision=1df24dd07d99a784ca22a13463a8ffe136cce728&framework=13&page=1

Some of the subtests are > 15% faster without jemalloc.

It's hard to say how much of this is from the system allocator being faster vs extra overhead we have for security features such as poisoning, but it suggests potential wins in this area.

Blocks: speedometer3

Marking this as P1 because I think it's critical that we get high confidence on why we see this effect, and whether we should invest in doing something about it, earlier rather than later. I think it's entirely plausible that our answer will be "this makes sense because X and so there's nothing we can do here," but this is a lot of opportunity to leave lying around.

Being concrete, what I would like to see answers to:

  • What distinguishes the tests that see very large wins here from the tests that don't? (NOTE: I would start profiling tests roughly in order from highest confidence to lowest in the perfherder view, in order to hopefully see stable profiles which can easily be compared)
  • Can the difference in performance be explained by time directly attributed within malloc/free/etc, or could effects like bug 1805255 be playing a part?
  • For posterity, what's the whole list of extra variables that are bundled with us using jemalloc? (Poisoning, PHC, and profiler memory instrumentation all come to mind, but what's the whole list?)
    • Can we isolate the impact of those things?
  • What's the list of jemalloc-specific tuning that we've done outside of jemalloc, and what is the impact of all of those things? (DOMArena and DOM's work to try to size things in-line with jemalloc implementation details, uses of jemalloc_thread_local_arena, etc. - what's the whole list?)
    • Can we isolate the impact of those things?

If nothing else, answering these questions explicitly and in writing will give us something specific to point to detailing why we have not invested in migrating off of it.

Priority: -- → P1

GCP and I have been discussing the idea of testing various allocators to determine if jemalloc is still the best allocator for us. We are making these assumptions:

  • mozjemalloc has been tuned to work best for Firefox since it was forked from jemalloc3.
  • Firefox has been tuned to work best for mozjemalloc. Possibly in subtle unconscious ways just because it's what we test with.
  • Since we forked mozjemalloc, research on other allocators has continued, so the state of the art has changed since we last evaluated this.

The last point is why we think it's worth evaluating other allocators now, but we were unsure how strong the benefit might be. IMO this bug says that it is worth prioritising this.

One of our outreachy applicants expressed a strong interest in evaluating various allocators. I would like to work with her to investigate this. I will talk with her before I assign the bug.

Depends on: 1806054

Hi Doug, excellent list of things to look at.

(In reply to Doug Thayer [:dthayer] (he/him) from comment #1)

Being concrete, what I would like to see answers to:

  • What distinguishes the tests that see very large wins here from the tests that don't? (NOTE: I would start profiling tests roughly in order from highest confidence to lowest in the perfherder view, in order to hopefully see stable profiles which can easily be compared)

The top few subtests are about deleting or completing TODO items, which makes me feel like free() could be a problem. This becomes more pronounced when sorting by effect size rather than confidence. Selecting two profiles and comparing the "busy time" of the whole test (more samples matter more than looking at any one subtest), there's a 10x difference in the percentage of the time spent in free().

I landed a patch to improve free() this week. I'll run another test with that in place too. But I don't feel like it'll make much of a difference for this benchmark.

  • Can the difference in performance be explained by time directly attributed within malloc/free/etc, or could effects like bug 1805255 be playing a part?

So far it looks like it can; I intend to confirm that with the logalloc-replay tool.

  • For posterity, what's the whole list of extra variables that are bundled with us using jemalloc? (Poisoning, PHC, and profiler memory instrumentation all come to mind, but what's the whole list?)

I don't know the complete list, but I can add that the replace_* code also adds a little function-call overhead.
Regarding the profiler overhead, I think it could be reduced: Bug 1806054.

  • Can we isolate the impact of those things?

I'm sure we could. If there's no mechanism already to remove them, then one should be added. We already disable poisoning in release builds, and I assumed PHC would be agnostic, but it seems it isn't: it looks like --disable-jemalloc also removes the malloc replacement code needed to provide those hooks.

  • What's the list of jemalloc-specific tuning that we've done outside of jemalloc, and what is the impact of all of those things? (DOMArena and DOM's work to try to size things in-line with jemalloc implementation details, uses of jemalloc_thread_local_arena, etc. - what's the whole list?)
    • Can we isolate the impact of those things?

People probably also size structures to fit nicely into jemalloc's cells, though I modified jemalloc recently to make this more flexible, and I doubt it would have a consistent impact. We may be able to isolate arenas with a patch.

The volunteer contributor hasn't responded to messages for a couple of weeks, so I've made a start on this.

Assignee: nobody → pbone
Status: NEW → ASSIGNED

I set up a test with logalloc-replay so we can compare different allocators with exactly the same allocation pattern and without other effects from Firefox. But it is limited to a single-threaded workload and has no cache effects from the rest of the browser - in other words, it's pure (de)allocation throughput.

jemalloc: 8.574 s ± 0.122 s
glibc's builtin malloc (ptmalloc): 6.991 s ± 0.032 s

That's how far I got and I'll be on leave for the next week. So mainly as a reminder to myself: When I return I want to re-test this checking the status of poisoning and then start profiling logalloc-replay to learn more about the difference.

Flags: needinfo?(pbone)

Speedometer results on my PC:

                             Speedometer    logalloc-replay
jemalloc with poisoning      81.2 ± 1.6     8.607 s ± 0.057
jemalloc without poisoning   83.8 ± 1.1     6.249 s ± 0.038
ptmalloc                     88.9 ± 1.8     7.026 s ± 0.036

So that's weird. I tested in logalloc-replay first, and when I got that huge improvement from disabling poisoning I figured that was the entire story, since jemalloc without poisoning is faster than ptmalloc there. But checking the results with Speedometer didn't match: ptmalloc is still much faster. My logalloc-replay test data is a recording of a previous Speedometer run, so I would expect to see a similar trend - just exaggerated in logalloc-replay because it doesn't run the rest of the browser. The next direction to look at is probably cache and paging effects, since without the browser's memory accesses logalloc-replay doesn't really test those either, nor behaviour across cores.

Flags: needinfo?(pbone)

A very common way that jemalloc issues present in performance profiles is as locks stalling the main thread while some off-thread work is happening. I think the cross-thread performance behaviour is probably the most interesting thing for allocation performance here. The lineage of jemalloc that we are using is really a single-threaded allocator with locks thrown in to make it technically correct; it doesn't seem to have been designed for multi-threaded applications.

Yeah when we were doing stylo we ended up using thread-local arenas to avoid lock contention. Bug 1361258 and bug 1291355 comment 35 have various discussions about this. It might be that more subsystems need this treatment, or we might want to enable that by default on other threads as well?

Depends on: 1808429

(In reply to Ted Campbell [:tcampbell] from comment #7)

A very common way that jemalloc issues present in performance profiles is as locks stalling the main thread while some off-thread work is happening. I think the cross-thread performance behaviour is probably the most interesting thing for allocation performance here. The lineage of jemalloc that we are using is really a single-threaded allocator with locks thrown in to make it technically correct; it doesn't seem to have been designed for multi-threaded applications.

Lock contention showed up in a big way in the firefox profiler profiles. I agree, that's most likely the big difference. I'll pull at that thread (pun :P)

Fabrice asked me if I could also test memory usage:

Memory usage in the above test, captured from the memory reporter on the content process after running the speedometer benchmark:
Also, I rebased to central since comment 6, which I don't expect to have an effect.

jemalloc with poison:
201.64 MB ── resident
478.88 MB ── resident-peak
79.54 MB ── resident-unique

jemalloc no poison:
182.11 MB ── resident
376.01 MB ── resident-peak
59.07 MB ── resident-unique

Hrm, a bit of a difference. There must be some memory we never touch, like buffers never filled or hash tables never populated - at least in those pages.

ptmalloc:
504.94 MB ── resident
551.50 MB ── resident-peak
394.34 MB ── resident-unique

For completeness, here's ptmalloc. That could be (one reason) why we prefer jemalloc ;-)

Whiteboard: sp3:p1

What I've learnt in the last few days.

The most contended lock in all of jemalloc is the arena lock.

The most contended arena is the JS Malloc arena. On average a thread waits 250-350 ns to take the lock; for other arenas it's 30-60 ns. Summing up all the times the lock is taken over a Speedometer 2 run comes to about 2-3 seconds. I'm assuming that data moving between CPU caches causes other cache misses and slowness that I'm not measuring; in other words, avoiding contention perfectly (like a spherical cow in a vacuum) would win more than 2-3 seconds of performance.

Other than the main thread, the JS engine helper thread that uses this arena the most is the Ion compilation thread.

What I've tried (speedometer benchmark results):

  • Baseline performance: 80.4 ± 1.6
  • Give every JS helper + Firefox thread its own arena: 81.4 ± 1.6 (could be using more memory, but I didn't measure that)
  • Every Ion task opts in to a shared Ion arena: 78.3 ± 1.4
  • Every Ion task opts in to a thread-local arena; once opted in, threads can't opt out, so other tasks using that helper thread end up using the new arena: 78.9 ± 1.9

These benchmark results are very close together, and the one thing I feel confident saying is that there's no magic lever to pull to bring us up to ~89 (ptmalloc). But I do have thoughts about things we could try to change in jemalloc, and I'd still like to test other allocators.

See Also: → 1809058

Don't mean to derail this, but just throwing this in the mix: have we taken a hard look at Chrome's PartitionAlloc yet?

(In reply to Doug Thayer [:dthayer] (he/him) from comment #12)

Don't mean to derail this, but just throwing this in the mix: have we taken a hard look at Chrome's PartitionAlloc yet?

No, that's not derailing it at all as I intend to survey other contemporary allocators, especially those used in browsers.

I've got:

jemalloc3
mozjemalloc
jemalloc4
jemalloc5
ptmalloc2
tcmalloc
bmalloc
partitionalloc
mimalloc
mozmalloc
nedmalloc
ltalloc
TLSF
libpas
smalloc

I haven't done anything to strike any off the list yet. If there's anything else not on my list that should be I'd love to know about it.

Depends on: 1809610
Performance Impact: --- → none
Performance Impact: none → ---
Component: Performance → Performance Engineering

We weren't sure about the component here, feel free to move it elsewhere.

Depends on: 1810953
Depends on: 1810954
See Also: → 1811985
See Also: → 1811987
Whiteboard: sp3:p1 → [sp3:p1], [sp3:preact-todomvc], [sp3:react-todomvc], [sp3:vanillajs-todomvc], [sp3:vuejs-todomvc], [sp3:jquery-todomvc]

In Bug 1809610 I have a stack of patches that improve jemalloc's performance for free(). However some page loads are slower. gcp suggested testing jemalloc vs --disable-jemalloc with this wider set of tests to see what happens to page load performance with --disable-jemalloc. Mostly it's faster, which is consistent with the other tests above.

https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=3240143fd8c0db90e502ba9ede37b54a21ec544d&newProject=try&newRevision=a9386af992509a321d863ba5c05ba7b9f1a631de&framework=13&page=1

Just FYI, I was curious whether we're using mozjemalloc in a reasonable way. Bug 1258257 decreased the page cache size quite a bit, and later, in bug 1397101, arenas got an even smaller default, just 32 pages. So I tried out massive caches: https://hg.mozilla.org/try/rev/6b07332f2f314d58d2f65650a9df02bd3dbfe656. That patch also includes changes that try to ensure more DOM nodes can be deleted sooner, without waiting for the CC to run.
https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=aa2de136615b48e4b0e995a5cd36ac7a25f1e28c&originalSignature=4586009&newSignature=4586009&framework=13&originalRevision=a8f477ea67a3da045869bb63d5b89630c8c8e93e&page=1

Given those results, it might be possible to also tweak our mozjemalloc usage. It is rather silly that we use the same value for mMaxDirty no matter what kind of system FF is running on, and that we aren't updating mMaxDirty dynamically depending on what kind of load the process has.

Whiteboard: [sp3:p1], [sp3:preact-todomvc], [sp3:react-todomvc], [sp3:vanillajs-todomvc], [sp3:vuejs-todomvc], [sp3:jquery-todomvc] → [sp3-p1] [sp3-preact-todomvc] [sp3-react-todomvc] [sp3-vanillajs-todomvc] [sp3-vuejs-todomvc] [sp3-jquery-todomvc]

I have started some benchmarks to retest if --disable-jemalloc is still faster after our recent changes.

https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=86a72388debf78af0d7cf1cd33a148e85cd953cf&newProject=try&newRevision=9f6a7a2d2d665d4333ec783042e30ede090c42df&framework=1&page=1

They're running now, but here's the URL.

Flags: needinfo?(smaug)

Seems like it possibly is:
https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=9f6a7a2d2d665d4333ec783042e30ede090c42df&originalSignature=4586009&newSignature=4586009&framework=13&application=firefox&originalRevision=86a72388debf78af0d7cf1cd33a148e85cd953cf&page=1

--disable-jemalloc does use quite a bit more memory, AFAIK, so using that option is not really realistic.

But very useful test run. Looks like DOM-heavy subtests especially benefit from more memory. Perhaps we should try increasing the size of DOMArena even more, and/or the size of the generic arena. But we need to figure out how to avoid AWSY regressions.

Flags: needinfo?(smaug)