Bug 1757426 (Open) — opened 2 years ago, updated 2 months ago

wordle-analyzer.com takes a massive amount of time to make progress compared to Chromium

Categories: Core :: JavaScript Engine, defect, P2

People: Reporter: emilio, Unassigned

References: Blocks 2 open bugs

e.g., open: https://wordle-analyzer.com/?seed=2&guesses=yxymrbdnscwjloobiaamweouc&hm=0 (it's not today's wordle fwiw :P)

Chrome seems much more responsive than us on that page. A profile shows most of the time in the worker threads being spent doing GC: https://share.firefox.dev/3IvaVMh

Type: task → defect

A profile from a bit later, once the page starts making some progress: https://share.firefox.dev/36Eatx2

Inverting the call stack, it looks like (in both profiles) the vast majority of non-idle samples in the DOM worker threads are spent in _lll_lock_wait, during jemalloc free and arena_malloc. It looks like there's lock contention in jemalloc. The frees occur during nursery collection. The allocations occur under SetObject::create, while calling init on the underlying hashtable.
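To illustrate the pattern being described, here is a stand-alone stress test (not from the bug, purely hypothetical): many threads repeatedly malloc and free small, hash-table-sized blocks, roughly the shape of the per-Set allocations done under SetObject::create and released during nursery collection. With an allocator whose arenas are shared across threads, most of the wall time in a profile of this ends up in lock waits.

```cpp
#include <cstdlib>
#include <thread>
#include <vector>

int main() {
  constexpr int kThreads = 8;
  std::vector<std::thread> threads;
  for (int t = 0; t < kThreads; ++t) {
    threads.emplace_back([] {
      for (int i = 0; i < 1000000; ++i) {
        // Roughly the lifetime of a small, short-lived Set's backing table:
        // allocate it, use it briefly, free it at the next minor GC.
        void* table = std::malloc(256);
        std::free(table);
      }
    });
  }
  for (auto& th : threads) {
    th.join();
  }
  return 0;
}
```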

sfink: Is there anything interesting here?

Flags: needinfo?(sphink)

Currently the jemalloc arenas we use are per-process; maybe they should be per-runtime? That would increase per-worker overhead but should reduce contention.
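A rough sketch of what the per-runtime-arena idea could look like, assuming mozjemalloc's moz_create_arena / moz_arena_malloc / moz_arena_free arena API; the RuntimeAllocState wrapper and where it would hook into the runtime are hypothetical.

```cpp
#include "mozmemory.h"  // mozjemalloc arena API (assumed header)

struct RuntimeAllocState {
  arena_id_t arena;

  RuntimeAllocState() {
    // One arena per JSRuntime instead of the shared per-process arenas,
    // so worker runtimes stop contending on the same arena locks.
    arena = moz_create_arena();
  }

  void* alloc(size_t bytes) { return moz_arena_malloc(arena, bytes); }
  void release(void* p) { moz_arena_free(arena, p); }
};
```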

I was confused at first because worker minor GCs don't emit markers, so I wasn't seeing any minor GCs in the workers' marker tables in the profiler.

At least in this particular case, it would be great if we could allocate the entire Set data structures in nursery space rather than on the heap. It looks like the SetObjects themselves are in the nursery, and even the js::ValueSets they contain (aka OrderedHashSet<HashableValue, HashableValue::Hasher, ZoneAllocPolicy>) are stored in the nursery. But the data within the ValueSets is still heap-allocated, and so must be freed, which causes lock contention when massive numbers of Sets are used.

But this isn't straightforward. Those tables need to be resizeable, so many of them would probably end up heap-allocating anyway.
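A hypothetical sketch of that idea together with the resize caveat: start the backing buffer in nursery memory (freed wholesale at the end of a minor GC, no per-table free), and fall back to malloc only once the table grows. None of these names are SpiderMonkey's real API.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstdint>
#include <cstring>

struct Nursery {
  uint8_t* cursor;
  uint8_t* end;

  // Bump allocation; returns nullptr when the chunk is full.
  void* tryAlloc(std::size_t bytes) {
    if (std::size_t(end - cursor) < bytes) return nullptr;
    void* p = cursor;
    cursor += bytes;
    return p;
  }
};

struct SetTable {
  void* data = nullptr;
  std::size_t capacity = 0;
  bool inNursery = false;

  void init(Nursery& nursery, std::size_t bytes) {
    data = nursery.tryAlloc(bytes);
    inNursery = data != nullptr;
    if (!inNursery) data = std::malloc(bytes);  // nursery full: fall back
    capacity = bytes;
  }

  void grow(std::size_t newBytes) {
    // Growth always goes to the heap; small, short-lived Sets never get
    // here, so they never touch malloc/free at all.
    void* bigger = std::malloc(newBytes);
    std::memcpy(bigger, data, capacity);
    if (!inNursery) std::free(data);
    data = bigger;
    capacity = newBytes;
    inNursery = false;
  }
};
```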

Irresponsible speculation: what if nursery Sets stored their data in a common _nurserySetTable (an OrderedHashMap keyed off a tuple<Set*, key>)? The table itself would be allocated in the malloc heap and cleared at the end of each minor GC; tenuring a Set would copy its elements out into a regular heap allocation. But that would require a bunch of code to handle two very different layouts.
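A very rough sketch of that speculation, using std containers as stand-ins for OrderedHashMap/HashableValue; every name here is hypothetical. Nursery Sets write into one shared, malloc'd table keyed on (Set*, element); tenuring copies a Set's entries out, and the whole table is dropped at once at the end of a minor GC.

```cpp
#include <set>
#include <utility>

struct SetObject;     // stand-in for the GC-allocated Set object
using Element = int;  // stand-in for HashableValue

// One shared side table for every nursery-allocated Set.
std::set<std::pair<SetObject*, Element>> nurserySetTable;

void nurserySetAdd(SetObject* set, Element value) {
  nurserySetTable.insert({set, value});
}

// On tenuring, copy the Set's entries into its own heap allocation.
// (A real version would range-scan by Set* rather than filter everything.)
std::set<Element> tenureSet(SetObject* set) {
  std::set<Element> heapCopy;
  for (const auto& entry : nurserySetTable) {
    if (entry.first == set) heapCopy.insert(entry.second);
  }
  return heapCopy;
}

// At the end of a minor GC the whole side table is cleared in one go.
void endMinorGC() { nurserySetTable.clear(); }
```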

Never mind, I think jonco is right, per-runtime jemalloc arenas sound like a good fix here.

Flags: needinfo?(sphink)

I'm poking around at jemalloc_thread_local_arena(true) to see what happens.
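For reference, a minimal sketch of that experiment: opt each worker thread into its own jemalloc arena at startup. jemalloc_thread_local_arena() is mozjemalloc's real entry point, but the header and the call site shown here are assumptions.

```cpp
#include "mozmemory.h"  // assumed header exposing jemalloc_thread_local_arena

void WorkerThreadStart() {
  // All subsequent malloc/free on this thread use a thread-local arena,
  // sidestepping the cross-thread arena lock contention seen in the
  // profiles, at the cost of some extra per-thread memory overhead.
  jemalloc_thread_local_arena(true);

  // ... rest of worker initialization / event loop ...
}
```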

Blocks: GC, sm-runtime
Severity: -- → S4
Priority: -- → P2