Parallel unmarking is slower on 64 bit Linux
Categories
(Core :: JavaScript: GC, defect, P1)
People
(Reporter: jonco, Assigned: jonco)
Attachments
(4 files)
Looking at telemetry since bug 1618638 landed, I can see a reduction in the proportion of budget overruns due to waiting for unmarking to finish on most platforms. The only exception is 64 bit Linux where there is an increase.
Comment 1•5 years ago
I don't know what's causing this, but I'm going to make a bunch of small improvements and see if that improves matters.
Comment 2•5 years ago
The change to use multiple parallel threads for unmarking made it so that all unmarking had to complete before any other GC work could start. This patch queues all parallel tasks at the same time and waits for everything to complete.
Updated•5 years ago
Comment 3•5 years ago
Previously we could start more parallel tasks than there were work items. This changes things around so that we take a work item when we create a ParallelWorker, allowing us to stop creating workers if we run out of work.
I also changed the foreground cell update code to not use ParallelWorker at all for work that happens entirely on the main thread and doesn't need synchronisation.
Depends on D66961
Comment 4•5 years ago
Yoshi pointed out that we don't always unlock the helper thread mutex when using AutoRunParallelWork. The code will work anyway, but it looks strange and perhaps we should not rely on this. This patch adds an unlock in the one place it's currently missing.
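The pattern being tidied up can be sketched as follows (this is a generic illustration with invented names, not the real AutoRunParallelWork code): release the helper-thread mutex explicitly before blocking on completion, rather than relying on scope exit to do it implicitly.

```cpp
#include <mutex>

// Stands in for the global helper-thread lock in this sketch.
std::mutex helperThreadMutex;

void dispatchParallelWork() {
  std::unique_lock<std::mutex> lock(helperThreadMutex);
  // ... queue tasks on helper threads while holding the lock ...
  lock.unlock();  // explicit unlock: the destructor would release it
                  // anyway, but making it explicit avoids relying on
                  // that and is clearer to readers
  // ... wait for the queued tasks without holding the mutex ...
}
```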
Depends on D66962
Comment 5•5 years ago
Not a performance improvement, but a tidy-up I was planning. This change makes us use the same number of threads for all parallel work rather than passing the count as a parameter. The number of threads used is half the CPU count, clamped between 1 and 8.
The main change is that we will now use fewer threads for sweeping weak caches, but I doubt that will make much difference since all this work is likely to be memory bound.
This also cleaned up GCRuntime::updateCellPointers; the thread count passed in could never be zero there anyway.
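The thread-count rule described above amounts to the following one-liner (the function name is illustrative, not taken from the source):

```cpp
#include <algorithm>
#include <cstddef>

// Number of threads for all parallel GC work, per the rule above:
// half the CPU count, clamped to the range [1, 8].
size_t parallelThreadCount(size_t cpuCount) {
  return std::clamp<size_t>(cpuCount / 2, 1, 8);
}
```

So a 4-core machine gets 2 threads, an 8-core machine gets 4, and anything with 16 or more cores tops out at 8.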
Depends on D66964
Updated•5 years ago
Comment 7•5 years ago
bugherder
Comment 8•5 years ago
This hasn't worked: over the last four days I can see a big increase in the proportion of GC_SLOW_TASK telemetry caused by unmarking. Let's start the investigation by backing out the second patch ("Start fewer parallel tasks if not enough work to satisfy maximum number of threads"), as that's the largest change.
Comment 9•5 years ago
Backed out changeset ff16837de269 (bug 1622757) by dev's request
Backout:
https://hg.mozilla.org/integration/autoland/rev/c9de99b454667ec0a7a04058313943a058f22995
Comment 10•5 years ago
The priority flag is not set for this bug.
:jonco, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•5 years ago
Comment 11•5 years ago
Backing out that change hasn't made any discernible impact on telemetry, so I'm going to reland it.
Comment 12•5 years ago
Comment 13•5 years ago
bugherder
Comment 14•5 years ago
Looking at the GC_SLOW_TASK telemetry again, I can see that the largest task changes a lot when we change the code, but the GC_BUDGET_OVERRUN telemetry doesn't necessarily link that with longer overruns. Looking at that telemetry, we see a regression when bug 1618638 landed and a subsequent improvement back to original levels when the patches in this bug landed, on all platforms including Linux. Therefore I'm going to mark this as fixed now.
Updated•5 years ago