wordle-analyzer.com takes a massive amount of time to make progress compared to Chromium
Categories
(Core :: JavaScript Engine, defect, P2)
People
(Reporter: emilio, Unassigned)
References
(Blocks 2 open bugs)
Details
e.g., open: https://wordle-analyzer.com/?seed=2&guesses=yxymrbdnscwjloobiaamweouc&hm=0 (it's not today's wordle fwiw :P)
Chrome seems much more responsive than us on that page. A profile shows most of the time being spent doing GC in the worker threads: https://share.firefox.dev/3IvaVMh
Comment 1•2 years ago
A profile from a bit later, from when it starts making a bit of progress: https://share.firefox.dev/36Eatx2
Comment 2•2 years ago
Inverting the call stack, it looks like (in both profiles) the vast majority of non-idle samples in the DOM worker threads are spent in _lll_lock_wait, during jemalloc free and arena_malloc. It looks like there's lock contention in jemalloc. The frees occur during nursery collection. The allocations occur under SetObject::create, while calling init on the underlying hashtable.
sfink: Is there anything interesting here?
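For illustration, here is a minimal sketch of the kind of script that could produce this allocation pattern. It assumes, without having checked the site's source, that the worker code builds and discards many small Sets in a tight loop. The function names are hypothetical and are not taken from wordle-analyzer.com.

```typescript
// Hypothetical workload: none of these names come from the site itself.
// Each call allocates a fresh Set, fills it, reads its size, and drops it,
// so every call creates a short-lived Set together with its backing table.
function countDistinctLetters(word: string): number {
  const seen = new Set<string>();
  for (let i = 0; i < word.length; i++) {
    seen.add(word[i]);
  }
  return seen.size;
}

// Repeats the Set-heavy step many times, as a solver loop in a worker might.
function churnSets(words: string[], rounds: number): number {
  let total = 0;
  for (let r = 0; r < rounds; r++) {
    for (let i = 0; i < words.length; i++) {
      total += countDistinctLetters(words[i]);
    }
  }
  return total;
}
```

Running something like churnSets in several workers at once would make every worker create and free Set tables at the same time, which is the situation where a shared allocator lock could become the bottleneck.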
Comment 3•2 years ago
Currently the jemalloc arenas we use are per process - maybe they should be per runtime? This would increase overhead per worker but should reduce contention.
Comment 4•2 years ago
I was confused at first because worker minor GCs don't emit markers, and I wasn't seeing any minor GCs in the workers' marker tables using the profiler.
At least in this particular case, it would be great if we could allocate the entire Set data structures in nursery space rather than on the heap. It looks like the SetObjects themselves are in the nursery, and even the js::ValueSets they contain (aka OrderedHashSet<HashableValue, HashableValue::Hasher, ZoneAllocPolicy>) are stored in the nursery. But the data within the ValueSets is still heap-allocated, and so must be freed, which causes lock contention when massive numbers of Sets are used.
But this isn't straightforward. Those tables need to be resizeable, so many of them would probably end up heap-allocating anyway.
Irresponsible speculation: what if nursery Sets stored their data in a common _nurserySetTable (an OrderedHashMap keyed off of a tuple<Set*,key>)? It would be allocated in the malloc heap. When tenuring a Set, you'd copy its elements to a regular heap allocation. It would be cleared at the end of a minor GC. But that would require a bunch of code to handle two very different layouts.
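To make the speculated layout concrete, here is a toy model of the idea in TypeScript. It is only a sketch: the real engine code is C++, the class and method names are hypothetical, Sets are identified by a plain number standing in for Set*, and keys are restricted to strings and numbers so that the (Set, key) pair can be encoded as a single map key.

```typescript
// Toy model only. Everything here is hypothetical except the idea of one
// shared table keyed by (Set, key), copied out on tenuring and cleared
// at the end of a minor GC.
type Key = string | number;

class NurserySetTable {
  // One shared table for all nursery Sets.
  private entries = new Map<string, { setId: number; key: Key }>();

  // Encode the (Set, key) pair as one string; typeof keeps 1 and "1" apart.
  private static slot(setId: number, key: Key): string {
    return `${setId}:${typeof key}:${key}`;
  }

  add(setId: number, key: Key): void {
    this.entries.set(NurserySetTable.slot(setId, key), { setId, key });
  }

  has(setId: number, key: Key): boolean {
    return this.entries.has(NurserySetTable.slot(setId, key));
  }

  // Tenuring: copy one Set's elements into its own standalone storage.
  tenure(setId: number): Set<Key> {
    const out = new Set<Key>();
    this.entries.forEach((entry) => {
      if (entry.setId === setId) {
        out.add(entry.key);
      }
    });
    return out;
  }

  // End of minor GC: drop the data of every nursery Set in one step.
  clear(): void {
    this.entries.clear();
  }

  get size(): number {
    return this.entries.size;
  }
}
```

In this model, tenure has to scan the whole shared table to find one Set's elements, and Sets living in the table are accessed through a different path from tenured ones. That is a small-scale version of the two-layouts cost mentioned above.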
Never mind, I think jonco is right, per-runtime jemalloc arenas sound like a good fix here.
Comment 5•2 years ago
I'm poking around at jemalloc_thread_local_arena(true) to see what happens.
Comment 6•2 months ago
Latest profile: https://share.firefox.dev/3T5udzt