UnmarkGrayGCThingRecursively calling TimeStamp::Now causes hangs on Windows (via UnmarkGrayJSListenersInCCGenerationDocuments)
Categories
(Core :: JavaScript: GC, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox94 | --- | fixed |
People
(Reporter: florian, Assigned: jonco)
References
Details
(Keywords: hang, perf, Whiteboard: [bhr:UnmarkGrayGCThingRecursively])
Attachments
(1 file)
See https://fqueze.github.io/hang-stats/#date=20210815&row=0&filter=UnmarkGrayGCThingRecursively
The typical stack is:
NtQueryPerformanceCounter ntdll
RtlQueryPerformanceCounter ntdll
static mozilla::TimeStamp::Now(bool) mozglue
JS::UnmarkGrayGCThingRecursively(JS::GCCellPtr) xul
...
The frequency of this hang varies a lot depending on the day. There's probably a difference between week days and week-end days.
Comment 1•3 years ago
|
||
It looks like this is happening during nsContentUtils::UnmarkGrayJSListenersInCCGenerationDocuments(). I'm not sure why we're seeing hangs from TimeStamp::Now(). Maybe we could batch up these calls into the JS engine so there wouldn't be as many TimeStamp calls?
Assignee | ||
Comment 2•3 years ago
|
||
It looks like the call to TimeStamp::Now is from Statistics::recordPhaseBegin. I guess this can be an overhead if we unmark many gray things in a loop.
Comment 3•3 years ago
|
||
Besides batching these unmark grey calls, another better way to reduce the performance overhead would be to move more things into mozilla::dom::TraceBlackJS(), so we mark these things black initially, instead of marking them grey and then later unmarking them to black.
Updated•3 years ago
|
Comment 4•3 years ago
|
||
While fixing this might need work on the JS side, I'd argue that the primary cause of this is the way we dump tons of stuff into UnmarkGrey from the DOM side, so I'm going to move it to DOM.
Comment 5•3 years ago
|
||
I believe performance related things go to Performance, so...
Comment 7•3 years ago
|
||
Hmm, having TimeStamp::Now() in a hot code path is surprising.
DOM could optimize how often calls are being made, but UnmarkGrayGCThingRecursively is a hot method, so TimeStamp::Now() call there should be probably optimized out anyhow.
Assignee | ||
Comment 8•3 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #7)
Steve, I'd be OK with removing the GC stats phase recording from gray unmarking. Gray unmarking is a hot code path that can be called many times repeatedly, unlike the other GC phases which occur as part of a GC and do a bunch of work in each phase. What do you think?
Updated•3 years ago
|
Comment 9•3 years ago
|
||
I'm a little torn. It seems like timing gray unmarking is exactly what we need in order to motivate things like bug 1727587 and measure resulting improvements. On the other hand, grabbing a timestamp in hot code feels very wrong. I guess one alternative would be to count calls without timing them, or only time calls that recurse past the initial object. But that's getting more complex.
I will also note that we also have a profiling category for gray unmarking, which covers some of the same ground.
An in-between solution would be to make GC stats recording for gray unmarking be conditional on the profiler being enabled. You would have to rely on the profiler to tell you that there is a problem, but then when you're already in the profiler the phase data for the GC would be correct.
All in all, I'm in favor of just removing GC stats recording from gray unmarking entirely.
Comment 10•3 years ago
|
||
Is TimeStamp::Now() hanging? Or is it just that we are spending a lot of time unmarking things, and that we are reporting the first thing which happens to be observable by the BHR?
Another option would be that we have a loop of GCs which are not doing anything beside reporting statistics, but this would be unlikely to hang on its own, and the issue would be in the thing which causes this GC loop.
Reporter | ||
Comment 11•3 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #10)
Is TimeStamp::Now() hanging?
It's not really hanging, but calling RtlQueryPerformanceCounter is known to be slow on Windows, and it's likely called many times.
Comment 12•3 years ago
|
||
Could this issue be caused by Bug 1724980?
Comment 13•3 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #12)
Could this issue be caused by Bug 1724980?
I don't believe so. That runnable isn't running any GCs, so it wouldn't be gray-unmarking anything.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 14•3 years ago
|
||
Updated•3 years ago
|
Comment 15•3 years ago
|
||
Comment 16•3 years ago
|
||
bugherder |
Description
•