Buffering gray roots takes a long time

RESOLVED FIXED in Firefox 55

Status

()

Core
JavaScript: GC
RESOLVED FIXED
3 months ago
2 months ago

People

(Reporter: jonco, Assigned: jonco)

Tracking

55 Branch
mozilla55
Points:
---

Firefox Tracking Flags

(firefox55 fixed)

Details

(Whiteboard: [qf:p1])

Attachments

(2 attachments, 1 obsolete attachment)

(Assignee)

Description

3 months ago
Following updates to the statistics code in bug 1361458 we have seen an increase in the number of collections where buffering gray roots was the longest phase.

I investigated this it see whether this was real or a problem with the stats changes.  Running membench locally I see times > 10ms for buffering gray roots even when there are only a few tens of tabs open.  So it seems like this is a real problem.
(Assignee)

Comment 1

3 months ago
For example, here is some trace output:

    *** bufferGrayRoots 16.824372 mS, 80560 collecting, 0 non, state 1
      cycle collector holders 15.790875 42541
      TraceAdditionalNativeGrayRoots 1.019483 0
        XPConnect wrapped JS roots 0.700865 5286
        XPConnect variant roots 0.000100 0
        XPConnect wrapped native wrapper maps 0.296006 0

The numbers are time in milliseconds and loop iterations.  The above is for a full GC, but in some cases we were doing a zone GC and there were many roots reported to us for zones that were not being traced.  It seems that most of the time comes from tracing the CC's held JS objects of which there can be many.
Yeah, the holders are just a single gigantic hash table. With a bit of work, they could be made per-zone, which would help the zone GC case. I expect that holders, unlike XPCWNs, are only going to be involved with a single compartment.

Maybe we could cache the objects with a write barrier to avoid the callback overhead.

Comment 3

3 months ago
In theory a holder may move across zones. Say, some chrome JS moves a chrome DOM node to a content DOM node.
Whiteboard: [qf]
(Assignee)

Comment 4

3 months ago
Created attachment 8875347 [details] [diff] [review]
bug1369748-refactor-begin-mark-phase

Patch to split GCRuntime::beginMarkPhase into a bunch of smaller methods/functions.
Attachment #8875347 - Flags: review?(sphink)
(Assignee)

Comment 5

3 months ago
Created attachment 8875365 [details] [diff] [review]
bug1369748-parallelise-begin-mark-phase

Patch to run unmarking and buffering of gray roots on helper threads in GCRuntime::beginMarkPhase.

I added a new node in the stats heirarchy for preparation done before marking starts.  I had to remove an assertion from DOM workers that their trace method is called on the main thread.

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b61878c0ebb1520d3a767260a24fee9ca0abd796
Attachment #8875365 - Flags: review?(sphink)
Running possibly weird browser code off the main thread is a little scary.
Attachment #8875347 - Flags: review?(sphink) → review+
Could you change the worker assert to check that it is either on the main thread or a JS worker thread? Or maybe at least that it isn't on a worker thread?
Comment on attachment 8875365 [details] [diff] [review]
bug1369748-parallelise-begin-mark-phase

Review of attachment 8875365 [details] [diff] [review]:
-----------------------------------------------------------------

I'm sure there *must* be something horribly wrong with this patch, but I'm not seeing it. Looks good to me.

::: dom/workers/WorkerPrivate.cpp
@@ -4033,5 @@
>  
>  template <class Derived>
>  NS_IMPL_CYCLE_COLLECTION_TRACE_BEGIN_INHERITED(WorkerPrivateParent<Derived>,
>                                                 DOMEventTargetHelper)
> -  tmp->AssertIsOnParentThread();

How hard would it be to tmp->AssertIsOnParentThreadOrPreparingForGC() or something?

::: js/src/gc/GenerateStatsPhases.py
@@ +79,5 @@
> +        PhaseKind("UNMARK", "Unmark", 7),
> +        PhaseKind("BUFFER_GRAY_ROOTS", "Buffer Gray Roots", 49),
> +        PhaseKind("MARK_DISCARD_CODE", "Mark Discard Code", 3),
> +        PhaseKind("RELAZIFY_FUNCTIONS", "Relazify Functions", 4),
> +        PhaseKind("PURGE", "Purge", 5),

Heh. Now the telemetry numbering serves as a history of how this code used to work.
Attachment #8875365 - Flags: review?(sphink) → review+
Whiteboard: [qf] → [qf:p1]
(Assignee)

Comment 9

2 months ago
Created attachment 8876050 [details] [diff] [review]
bug1369748-parallelise-begin-mark-phase v2

Updated to remove another DOM assertion that tracing happens on the main thread.
Attachment #8875365 - Attachment is obsolete: true
Attachment #8876050 - Flags: review+
(Assignee)

Comment 10

2 months ago
Comment on attachment 8876050 [details] [diff] [review]
bug1369748-parallelise-begin-mark-phase v2

Requesting additional review for DOM changes to remove assertions that tracing happens on a specific thread, and to check you're happy with this change.
Attachment #8876050 - Flags: review?(bugs)

Updated

2 months ago
Attachment #8876050 - Flags: review?(bugs) → review+

Comment 11

2 months ago
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/aafdd9bcceaf
Refactor GCRuntime::beginMarkPhase r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/825e71dae9bf
Parallelise the start of GC marking r=sfink r=smaug

Comment 12

2 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/aafdd9bcceaf
https://hg.mozilla.org/mozilla-central/rev/825e71dae9bf
Status: NEW → RESOLVED
Last Resolved: 2 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.