Closed Bug 1935829 Opened 1 year ago Closed 7 months ago

Nightly doesnt use parallel threads on a 1.6s long GC Marking on a modified testcase (and spends 26s in Sweep phase)

Categories

(Core :: JavaScript Engine, task, P3)

task

Tracking

()

RESOLVED FIXED
146 Branch
Tracking Status
firefox146 --- fixed

People

(Reporter: mayankleoboy1, Assigned: sfink)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 2 obsolete files)

Attached file test.html

Profile: Nightly: https://share.firefox.dev/4f8682E

Testcase is modified from the testcase in bug 1821107 (so I have no idea what I'm doing)

And here is a profile where it appears to be using parallel thread: https://share.firefox.dev/3OO5qwI

This is doing weak map marking, which is not parallelised. However it is taking an extremely long time.

Most of the time is in Zone::enterWeakMarkingMode and looking at that it seems that we are not making progress but yielding to the mutator and restarting marking the ephemeron edges every time we exhaust the slice budget.

Steve, any ideas here? I think we need to restart in case the mutator added more entries to this table. Is that right? Maybe we could set a flag and run to completion if we previously yielded here. I don't see any obvious way to restart from where we left off.

Flags: needinfo?(sphink)

I have a set of changes that improve this test case.

  • initially, it takes 3.6 us per entry (for 4M entries)
  • turning the GC nonincremental if enterWMM blows the budget drops it to 1.5us
  • reducing the memory usage of the gcEphemeronEdges table by using tagged pointers for the values has no effect on the time
  • "optimizing" failed lookups by using a bit on the arena has no effect on the time, so I dropped the patch
  • delaying tracing weakmaps' entries until everything reachable has been marked drops it to 0.9us

This last is a heuristic. It is good in this case because the order of marking happens to be WeakMap-first (because it's earlier in the code, roughly), which is the worst possible. This is not guaranteed to always work; you could have a whole bunch of objects that are only reachable through a WeakMap, and then you'd build up a giant ephemeron edge table again. But this at least makes it much less likely to fall into the problematic case, at the cost of an unfortunate amount of added complexity. (Because of lock ordering, the entry marking needs to be done on the main marking thread. Though it's possible that we could be dropping the helper thread state lock, and then that would no longer be the case?)

Another approach might be to add fine-grained locking around the ephemeron edge table. This would add some lock contention, but might simplify the code by allowing the (deferred) traceWeakEntries to happen on the parallel marking threads.

Flags: needinfo?(sphink)
Severity: -- → S3
Priority: -- → P3
See Also: → 1953446

For some reason, this testcase has become 2x slower now : https://share.firefox.dev/4krCBVJ (60s) . Profile in comment 0 took 30s

Edit: filed as bug 1954261

See Also: → 1954261

maybe this is now higher priority because of https://bugzilla.mozilla.org/show_bug.cgi?id=1821107#c14 ?

Assignee: nobody → sphink
Status: NEW → ASSIGNED

Comment on attachment 9516310 [details]
Bug 1935829 - have GCMarker::{stop,reset} call a common GCMarker::postMarking() r=jonco

Revision D266264 was moved to bug 1991980. Setting attachment 9516310 [details] to obsolete.

Attachment #9516310 - Attachment is obsolete: true

Comment on attachment 9516312 [details]
Bug 1935829 - Reorder marking to delay tracing a WeakMap's entries until everything else has been marked r=jonco

Revision D266265 was moved to bug 1991980. Setting attachment 9516312 [details] to obsolete.

Attachment #9516312 - Attachment is obsolete: true
Duplicate of this bug: 1954261
Pushed by ctuns@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/448d989e1e3b https://hg.mozilla.org/integration/autoland/rev/5c84d5b9db60 Revert "Bug 1935829 - Use tagged ptrs for EphemeronEdge r=jonco" for causing SM bustages in Sweeping.cpp

Backed out for causing SM bustages

Flags: needinfo?(sphink)
Flags: needinfo?(sphink)
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 146 Branch

Edit:
there is some odd behaviour with the profiler and this testcase. I have filed bug 1994907 to track it separately.

See Also: → 1994907
QA Whiteboard: [qa-triage-done-c147/b146]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: