Large regression in page close cycle collection pauses

VERIFIED FIXED in Firefox 51

Status

()

Core
XPCOM
--
major
VERIFIED FIXED
2 years ago
11 months ago

People

(Reporter: Virtual, Assigned: mccr8)

Tracking

(5 keywords)

51 Branch
mozilla52
All
Windows 7
hang, nightly-community, perf, regression, topperf
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox48 unaffected, firefox49 unaffected, firefox-esr45 unaffected, firefox50 unaffected, firefox51+ verified, firefox52+ verified)

Details

(Whiteboard: gfx-noted)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(4 attachments, 8 obsolete attachments)

1.36 KB, text/plain
Details
3.28 KB, text/plain
Details
58 bytes, text/x-review-board-request
smaug
: review+
Details | Review
58 bytes, text/x-review-board-request
smaug
: review+
Details | Review
Created attachment 8789225 [details]
53EhBLj6 (LZMA2).7z

After some time of browsing Firefox Nightly will hangs for some time making itself unresponsive for this time.

Peja Stija from MozillaZine got reliable STR (that it happens more frequently with "Self-Destructing Cookies" extension installed) and got in on Gecko Profiler here - https://cleopatra.io/#report=7ade24c52a5aa5dd1e15e3e668bdff5e599d62b5&jankOnly=true&selection=0,1,2,3,4,5,6,7,8,9,265,266,267,411,412,413,414,432,23,24,25,3127,26,36,23,24,427,428,23,24,49,50,24,25,3128,26,36,23,24,427,428,23,24,49,50,24,25,3129,26,36,23,24,427,428,23,24,49,50,24,25,1130,26,36,1131,50,24,25,1132,26,36,23,24,427,428,23,24,49,50,24,25,3133,26,36,23,24,427,428,23,24,49,50,24,25,1130,26,36,1131,50,24,25,1132,26,36,23,24,427,428,23,24,49,50,24

In my case I get only microshutter scrolling on Gecko Profiler, because when I want to catch a real hang the browser hangs completely. Profile is in attachments, care, after extracting it's nearly 80MB file.
Created attachment 8789257 [details]
Graphics section from about:config.txt

e10s is disabled
I'm pretty sure that Firefox 48 and 49 versions are unaffected, because it happens for not more than about a month.
status-firefox48: ? → unaffected
status-firefox49: ? → unaffected
[Tracking Requested - why for this release]: Regression
tracking-firefox51: --- → ?
The profiler says that you're printing the display list out for some reason. Is your preference "layout.display-list.dump" or "layout.display-list.dump-content" enabled? Can you try this with a clean profile please?
Flags: needinfo?(bernesb)
Whiteboard: gfx-noted
Yes. I had these 2 preferences enabled, as I was thinking that they will help in gathering more useful data. I try to catch it one more time, but it's hard as Gecko Profiler hangs the whole browser.
In meantime please look on Peja Stija profile, which shows these hangs, as mine only shows microshutter scrolling.

Comment 6

2 years ago
(In reply to Mason Chang [:mchang] from comment #4)
> The profiler says that you're printing the display list out for some reason.
> Is your preference "layout.display-list.dump" or
> "layout.display-list.dump-content" enabled? Can you try this with a clean
> profile please?

In my case, I have neither of those settings enabled.
From Peja's profile, it looks like the JS engine is doing lots of work and the GC is spending lots of time doing things. Nothing from graphics stands out here. Can someone from the JS team look at this please?
Component: Graphics → JavaScript Engine
Flags: needinfo?(naveed)

Comment 8

2 years ago
Please disregard my problem (and profile), i didn't encounter it since using a blank profile, so it must be a setting/add-on related issue and most possible not related to Virtual_ManPL problem.
Please reopen if this comes up again. Thanks
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Flags: needinfo?(naveed)
Resolution: --- → WORKSFORME
Flags: needinfo?(bernesb)

Comment 10

2 years ago
Just would like to add that the OP's problem probably is a complete different issue, and my be worth to focus on the attachment posted, since WORKSFORME probably doesn't fit for  Virtual_ManPL initial problem.
The attachment posted, at least from the profile, is because he had a very expensive debug preference on.
I would like for someone to diagnose at least Peja Stija profile, because like I said in my case Gecko Profiler hangs the whole browser.
Moreover it's clearly the regression, because going back to stable version not produce any hangs with same profile and extensions.
Status: RESOLVED → REOPENED
Flags: needinfo?(naveed)
Resolution: WORKSFORME → ---
(In reply to Virtual_ManPL [:Virtual] - (ni? me) from comment #12)
> I would like for someone to diagnose at least Peja Stija profile, because
> like I said in my case Gecko Profiler hangs the whole browser.
> Moreover it's clearly the regression, because going back to stable version
> not produce any hangs with same profile and extensions.

Can you bisect to find a bug which is causing this please?
Flags: needinfo?(bernesb)
I'm not sure what to do about Peja's profile since a clean profile fixes it for him, which means a bad add on is probably doing something. 

Virtual, is this still happening for you with a clean profile?
(In reply to Mason Chang [:mchang] from comment #14)
> I'm not sure what to do about Peja's profile since a clean profile fixes it
> for him, which means a bad add on is probably doing something.

It's hard for me to believe that my profile is broken or some extensions are causing this, as when I go back to stable Firefox version I didn't encounter any more hangs with same configuration.
Maybe diagnosing Peja Stija profile will give some hints what's broken.



(In reply to Mason Chang [:mchang] from comment #13)
> Can you bisect to find a bug which is causing this please?
(In reply to Mason Chang [:mchang] from comment #14)
> Virtual, is this still happening for you with a clean profile?

I will try to diagnose and find it, but not faster than 1-2 weeks,
as this issue isn't reproducible straight a way (at least for me) and I don't have that much time right now for this time consuming thing.



@ Peja Stija - Maybe you could help with finding regression range, as you could easily reproduce it (comparing to me)? Even getting daily instead of inbound builds will be a big help.
Flags: needinfo?(zombulla14)

Comment 16

2 years ago
(In reply to Virtual_ManPL [:Virtual] - (ni? me) from comment #15)

> @ Peja Stija - Maybe you could help with finding regression range, as you
> could easily reproduce it (comparing to me)? Even getting daily instead of
> inbound builds will be a big help.

I'll do it, can't promise direct results, cause the hanging I experienced goes back weeks (if not even months) and I was of the impression that that was just part of using a nightly build.
I'll try to recreate and chime in when certain.

Comment 17

2 years ago
Well, using the non-blank profile, I even get the hangs ever so many minutes with no single add-on enabled, so this must be either due to a corruption, a wrong setting or a bug. Personally I'm out of options.

Comment 18

2 years ago
I cannot pinpoint this to some add-on, it seems to be exclusive, no matter what add-ons installed or not.
I can reproduce it 60% of the time using these steps :

- Have 5 pinned tabs (of which 2 are youtube video pages)
- Reload all 5 pretty fast one after another
- A couple of seconds to have a minute after all 5 have finished (re-)loading, the hang circle appears, sometimes up to 10 seconds long and the browser hangs.

Comment 19

2 years ago
Reproduced with a clean profile, and having a couple of Youtube video pages open, 1 playing, the rest paused. Getting the browser in the background and do something else for a couple of minutes, then bringing the browser back in the front, result = seconds long hang(s).

Profiler : https://cleopatra.io/#report=52b38903b411d4e1f4250c2f1675576cfaddd61b&jankOnly=true&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A27589865,%22end%22%3A27590994%7D,%7B%22type%22%3A%22FocusedCallstackPrefixSampleFilter%22,%22name%22%3A%22(root)%22,%22focusedCallstack%22%3A%5B0%5D,%22appliesToJS%22%3Afalse%7D%5D&selection=0,1,2,3,4,5,6,7,8
I can confirm this, something fishy is going on.
Status: REOPENED → NEW
Flags: needinfo?(bernesb)
(In reply to Peja Stija from comment #19)
> Reproduced with a clean profile, and having a couple of Youtube video pages
> open, 1 playing, the rest paused. Getting the browser in the background and
> do something else for a couple of minutes, then bringing the browser back in
> the front, result = seconds long hang(s).
> 
> Profiler :
> https://cleopatra.io/
> #report=52b38903b411d4e1f4250c2f1675576cfaddd61b&jankOnly=true&filter=%5B%7B%
> 22type%22%3A%22RangeSampleFilter%22,%22start%22%3A27589865,
> %22end%22%3A27590994%7D,
> %7B%22type%22%3A%22FocusedCallstackPrefixSampleFilter%22,
> %22name%22%3A%22(root)%22,%22focusedCallstack%22%3A%5B0%5D,
> %22appliesToJS%22%3Afalse%7D%5D&selection=0,1,2,3,4,5,6,7,8

Is this happening on developer edition or release?

From this profile, it looks like we have large restyling problems and JS JIT problems. Changing components.

Comment 22

2 years ago
(In reply to Mason Chang [:mchang] from comment #21)
> (In reply to Peja Stija from comment #19)
> 
> Is this happening on developer edition or release?
> 
> From this profile, it looks like we have large restyling problems and JS JIT
> problems. Changing components.

This is happening on latest nightly (2016-09-13 32bit) with a clean profile.
Reproducable 50-60% of the time with the procedure explained in comment 19.
(In reply to Peja Stija from comment #22)
> (In reply to Mason Chang [:mchang] from comment #21)
> > (In reply to Peja Stija from comment #19)
> > 
> > Is this happening on developer edition or release?
> > 
> > From this profile, it looks like we have large restyling problems and JS JIT
> > problems. Changing components.
> 
> This is happening on latest nightly (2016-09-13 32bit) with a clean profile.
> Reproducable 50-60% of the time with the procedure explained in comment 19.

Yes, but is it reproducible on developer edition or release?

Comment 24

2 years ago
(In reply to Mason Chang [:mchang] from comment #23)
...
> Yes, but is it reproducible on developer edition or release?

I don't have developer edition, and I cannot reproduce this behavior on the release version.

Something I somehow overlooked, (might be a clue?) I see there are now a bunch of failure logs added to the Graphics part in about:support, last time I checked this page some months ago,  I remember not seeing these there at that time.

Screenshot : https://postimg.org/image/tgi9nqzxh/
(In reply to Peja Stija from comment #24)
> (In reply to Mason Chang [:mchang] from comment #23)
> ...
> > Yes, but is it reproducible on developer edition or release?
> 
> I don't have developer edition, and I cannot reproduce this behavior on the
> release version.
> 
> Something I somehow overlooked, (might be a clue?) I see there are now a
> bunch of failure logs added to the Graphics part in about:support, last time
> I checked this page some months ago,  I remember not seeing these there at
> that time.
> 
> Screenshot : https://postimg.org/image/tgi9nqzxh/

Those alerts are related to bug 1302713. Can you please download the developer edition and see if it still happens? Thanks!
Summary: Constant hangs and poor scrolling performance in Firefox Nightly → Constant hangs in Firefox Nightly

Comment 26

2 years ago
(In reply to Mason Chang [:mchang] from comment #25)

> Those alerts are related to bug 1302713. Can you please download the
> developer edition and see if it still happens? Thanks!

I installed the developer edition with it's own profile, and there everything stays perfectly fluid, no hangs detected after trying to recreate it for a couple of hours.
Have to say, Dev Edition definitely feels a lot more fluent than current nightlies.

Comment 27

2 years ago
Just to add, I stupidly forgot to add this every time, the 3 or 4 Youtube video tabs to reproduce the hangs are actually pinned tabs, don't know if this is of any more value to get to the core ?
In my case I don't even need to use pinned tabs or have opened YouTube to get Nightly hang after long (>30min) browsing session.
Firefox 50 set to unaffected, per Comment #26
status-firefox50: ? → unaffected
status-firefox-esr45: --- → unaffected

Comment 30

2 years ago
Another profile make it clear, horrible hangs, when getting FF nightly in background, doing something else, bring FF nightly in the front again, and reload a page (or open something in a new tab)

https://cleopatra.io/#report=b8abe7c86711961c06b933e75f0e34337ce25b2e&selection=0,1,2,3,4,5,6,7,8,9,10
Looking on naveed profile he isn't active (at least here, on Bugzilla) for 4 years. Do you have in mind any other person to ping for, so he or she could analyzed these profiles with recorded hangs?
Flags: needinfo?(zombulla14)
Flags: needinfo?(naveed)
Flags: needinfo?(mchang)
The profile in comment 30 shows a long cycle collection. I have also been experiencing very long cycle collection in Nightly, since about one or two weeks ago. Here's a profile from my machine: https://cleopatra.io/#report=44cf509ee14050a9675de21ddabf714f50d42626

The CYCLE_COLLECTOR_MAX_PAUSE telemetry measurement shows a big upwards jump on August 26: https://mzl.la/2cV5tIl
Flags: needinfo?(terrence)
Pushlog for the range 2016-08-25 to 2016-08-26: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=01748a2b1a463f24efd9cd8abad9ccfd76b037b8&tochange=a551f534773cf2d6933f78ce7d82a7a33a99643e

Maybe bug 1296484? It looks like that one was backed out within the range, though, not sure if it ended up making it into the 20160826 nightly build.
Comment hidden (offtopic)
Clearing the regressionwindow-wanted keyword as a pushlog was provided in comment 33.
Keywords: regressionwindow-wanted
Forwarding ni? to CC owners.
Flags: needinfo?(terrence)
Flags: needinfo?(continuation)
Flags: needinfo?(bugs)
(Assignee)

Comment 39

2 years ago
I think we should back out bug 1296484 for now. It is very plausible that in a page closing situation we somehow end up destroying things more aggressively now. I think I'm seeing this locally, with an easier STR. I'll try bisecting.
Blocks: 1296484
Flags: needinfo?(continuation)
Flags: needinfo?(bugs)
(Assignee)

Updated

2 years ago
Component: JavaScript Engine → XPCOM
(Assignee)

Comment 40

2 years ago
Well, I'll bisect to confirm, as it may require also backing out bug 1237058.
(Assignee)

Comment 41

2 years ago
str
Testing locally, it doesn't seem like bug 1296484 is the cause.

Here are my steps to reproduce:

1. Set javascript.options.mem.log to true in about config. Open the browser console. Type "cc(" into the "filter output" box so you only see CC activity.
2. Open https://news.ycombinator.com/ then wait a few seconds (I just use this as an example of a very simple site).
3. Open http://www.wowhead.com/ in two tabs, leaving the browser open to one of them, then wait ten seconds. (This page is doing something awful like continuously reloading an iframe so it generates a ton of potential garbage.)
4. Right click on the news.ycombinator tab, and do close all tabs to the right.
5. Look for a large CC pause a few seconds later.

In Nightly, using either of RyanVM's builds from comment 37, the CC pause looks like this:

CC(T+21.8)[content] max pause: 216ms, total time: 530ms, slices: 49, suspected: 251, visited: 13126 RCed and 276827 GCed, collected: 13063 RCed and 276768 GCed (276768|0|0 waiting for GC)
ForgetSkippable 8 times before CC, min: 0 ms, max: 1 ms, avg: 0 ms, total: 6 ms, max sync: 0 ms, removed: 221

In Aurora it looks like this:

CC(T+38.1)[content] max pause: 35ms, total time: 306ms, slices: 51, suspected: 1096, visited: 15689 RCed and 282392 GCed, collected: 15608 RCed and 282332 GCed (282332|0|0 waiting for GC)
ForgetSkippable 9 times before CC, min: 0 ms, max: 3 ms, avg: 1 ms, total: 11 ms, max sync: 0 ms, removed: 6223

The max pause is much larger, even though we're looking at about the same amount of stuff in both cases. My new theory is that we messed up the CC budget somehow, so we're not doing very much work, so our final "catch up" slice of work has to do more. I'll try a local build to investigate that. Maybe bug 1297367 did something.
Flags: needinfo?(mchang)
(Assignee)

Updated

2 years ago
Summary: Constant hangs in Firefox Nightly → Large regression in page close cycle collection pauses
(Assignee)

Comment 42

2 years ago
Local testing indicates this regression is from bug 1263355. I'll do some local testing to figure out where exactly the time is going.
Blocks: 1263355
No longer blocks: 1296484
(Assignee)

Comment 43

2 years ago
Created attachment 8792634 [details]
examples of good and bad CC times
Has Regression Range: --- → yes
Has STR: --- → yes
(Assignee)

Comment 44

2 years ago
The profile in comment 32 is spending a ton of time tracing the children of JS objects. The frontend rewrite patch added a new GC tracekind for scopes. This is not getting added to the CC graph. It sounds like a bunch of objects can hold onto a scope, so we'll end up tracing the same scope over and over again. I think the fix is to then start adding Scopes to the CC graph. This needs to be done without exposing the actual definition of a scope to Gecko. Presumably a callback will be involved, like what is done for shapes.
Assignee: nobody → continuation
status-firefox52: --- → affected

Comment 45

2 years ago
(In reply to Andrew McCreight [:mccr8] from comment #44)
> The profile in comment 32 is spending a ton of time tracing the children of
> JS objects. The frontend rewrite patch added a new GC tracekind for scopes.
> This is not getting added to the CC graph. It sounds like a bunch of objects
> can hold onto a scope, so we'll end up tracing the same scope over and over
> again. I think the fix is to then start adding Scopes to the CC graph. This
> needs to be done without exposing the actual definition of a scope to Gecko.
> Presumably a callback will be involved, like what is done for shapes.

For some extra clarity, the things that now have their own GC TraceKinds in the rewrite *used* to be JSObjects, so they "just worked" when traced by the CC.
(Assignee)

Comment 46

2 years ago
I should add DEBUG counting to the graph building phase that compares the number of CCKind to non-CCKind children visited, and asserts if the ratio is too low. That should prevent quadratic blowups in the future and hopefully won't be too expensive.
[Tracking Requested - why for this release]: Regression
tracking-firefox52: --- → ?
Tracking 51/52 for this regression. Thanks to both Virtual_ManPL and Peja for doing all the testing and attaching their logs.
tracking-firefox51: ? → +
tracking-firefox52: ? → +
(Assignee)

Updated

2 years ago
See Also: bug 1295272
(Assignee)

Comment 49

2 years ago
I have work in progress patches that, basically, adds Scope to AddToCCKind().

In a debug build, on the STR I mentioned before, I was seeing about 450,000 objects traversed, with about 770,000 calls to JS::TraceChildren. The CC time stats were like this:
CC(T+52.5)[content] max pause: 1624ms, total time: 1991ms, slices: 48, suspected: 4457, visited: 21673 RCed and 450139 GCed, collected: 16231 RCed and 349956 GCed (349956|0|0 waiting for GC)

With my patches, about 500,000 objects are being traversed (more!) but there are only about 580,000 calls to JS::TraceChildren. The CC max pause is about 40% of the previous one:
CC(T+52.6)[content] max pause: 648ms, total time: 944ms, slices: 49, suspected: 433, visited: 19126 RCed and 496476 GCed, collected: 18970 RCed and 493321 GCed (493321|0|0 waiting for GC)

I'll try to finish up the patches and put them up for review soon.
(Assignee)

Comment 50

2 years ago
I also filed bug 1304205 about making slice budgets larger for longer running slice times, to reduce max pauses for long CCs.
(Assignee)

Comment 51

2 years ago
In Linux64 runs, I saw trace to traverse ratios of up to 1.58 in a patched version. In an unpatched version, I see many places with a ratio of 2.3. I'll try adding an assert for a ratio of 2.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 57

2 years ago
I still need to wait for various try runs to finish, but I'm reasonably confident this is okay now, so I'm putting my patches up for review.

Part 1 and 2 are to fix the regression, while parts 3 through 5 add infrastructure to catch regressions like this in the future. Parts 1, 3 and 4 are refactoring patches that should not impact behavior. Part 5 is adds debug-only stats that are used in an assertion. I'm inclined to also backport 3-5 because we're early in Aurora, and the refactoring in there will make it harder to backport future patches otherwise. They should be low risk for opt builds.
(Assignee)

Comment 58

2 years ago
try run showing how orange it is with a ratio limit of 2.0, and without part 2:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6cdadc86ac7

opt looks okay:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=adda84b9be2b

all platform debug try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c6e526473c2
Android build failure is due to another patch in my queue. Hopefully Android will be okay...
(Assignee)

Comment 59

2 years ago
I see an assertion failure with 2.0, so I'll have to bump it up a bit. The ratio was 2.017049. Maybe 2.2 will be high enough to provide some safety. The bad runs all have a ratio of 2.3 or higher.
Comment hidden (mozreview-request)
(Assignee)

Updated

2 years ago
Attachment #8793433 - Attachment is obsolete: true
Attachment #8793433 - Flags: review?(bugs)
(Assignee)

Updated

2 years ago
Attachment #8793434 - Attachment is obsolete: true
Attachment #8793434 - Flags: review?(bugs)
(Assignee)

Updated

2 years ago
Attachment #8793435 - Attachment is obsolete: true
Attachment #8793435 - Flags: review?(bugs)
(Assignee)

Updated

2 years ago
Attachment #8793436 - Attachment is obsolete: true
Attachment #8793436 - Flags: review?(bugs)
(Assignee)

Comment 61

2 years ago
Well, that did not do what I wanted.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 66

2 years ago
mozreview-review
Comment on attachment 8793923 [details]
Bug 1301301, part 1 - Unify NoteJSObject and NoteJSScript into NoteJSChild.

https://reviewboard.mozilla.org/r/80512/#review79262

I assume latter patches will explain why these changes are needed, but I guess that is quite obvious.
Attachment #8793923 - Flags: review?(bugs) → review+

Comment 67

2 years ago
mozreview-review
Comment on attachment 8793924 [details]
Bug 1301301, part 2 - Add Scope as an AddToCCKind.

https://reviewboard.mozilla.org/r/80514/#review79264

(not about this bug but I'm trying to remember what ObjectGroup type is and why it isn't listed here.)
Attachment #8793924 - Flags: review?(bugs) → review+

Comment 68

2 years ago
mozreview-review
Comment on attachment 8793925 [details]
Bug 1301301, part 3 - Implement Begin and EndCycleCollectionCallback in CCJSContext.

https://reviewboard.mozilla.org/r/80516/#review79266
Attachment #8793925 - Flags: review?(bugs) → review+

Comment 69

2 years ago
mozreview-review
Comment on attachment 8793926 [details]
Bug 1301301, part 4 - Make calls to JS::TraceChildren go through a single method on CCJSContext.

https://reviewboard.mozilla.org/r/80518/#review79272

::: xpcom/base/CycleCollectedJSContext.cpp:133
(Diff revision 1)
>  } // namespace mozilla
>  
> -struct NoteWeakMapChildrenTracer : public JS::CallbackTracer
> +struct CCJSTracer : public JS::CallbackTracer
>  {
> -  NoteWeakMapChildrenTracer(JSContext* aCx,
> +  CCJSTracer(CycleCollectedJSContext* aCx,
> +             WeakMapTraceKind aWeakTraceKind = TraceWeakMapValues)

I don't yet understand this weakmap stuff here.

Comment 70

2 years ago
mozreview-review
Comment on attachment 8793437 [details]
Bug 1301301, part 5 - In debug builds, check if there are too many calls to JS::TraceChildren.

https://reviewboard.mozilla.org/r/80174/#review79274

::: xpcom/base/CycleCollectedJSContext.h:482
(Diff revision 2)
>      void invoke(JS::HandleObject scope, Closure& closure) override;
>    };
>    EnvironmentPreparer mEnvironmentPreparer;
> +
> +#ifdef DEBUG
> +  uint32_t mNumTraversed;

mNumTraversed is only about JS things, so its name should say so. I first through this was about all things traversed.
mNumOfTraversedJSThings?

Comment 71

2 years ago
mozreview-review
Comment on attachment 8793926 [details]
Bug 1301301, part 4 - Make calls to JS::TraceChildren go through a single method on CCJSContext.

https://reviewboard.mozilla.org/r/80518/#review79280

It might make sense to make CCJSTracer's ctor protected, so that only those subclasses could use it.

::: xpcom/base/CycleCollectedJSContext.cpp:150
(Diff revision 1)
> +
> +struct NoteWeakMapChildrenTracer : public CCJSTracer
> +{
> +  NoteWeakMapChildrenTracer(CycleCollectedJSContext* aCx,
>                              nsCycleCollectionNoteRootCallback& aCb)
> -    : JS::CallbackTracer(aCx), mCb(aCb), mTracedAny(false), mMap(nullptr),
> +    : CCJSTracer(aCx), mCb(aCb), mTracedAny(false), mMap(nullptr),

Note to myself...This relies on TraceWeakMapValues.
And looks like WeakMapTraceKind weakTraceKind = TraceWeakMapValues is also in JS::CallbackTracer, so behavior shouldn't change.

::: xpcom/base/CycleCollectedJSContext.cpp:347
(Diff revision 1)
> -{
> -  TraversalTracer(JSContext* aCx, nsCycleCollectionTraversalCallback& aCb)
> -    : JS::CallbackTracer(aCx, DoNotTraceWeakMaps), mCb(aCb)
> -  {
> +{
> -  }
> +  TraversalTracer(CycleCollectedJSContext* aCx,
> +                  nsCycleCollectionTraversalCallback& aCb)

note to myself...ok, here we keep using DoNotTraceWeakMaps
Attachment #8793926 - Flags: review?(bugs) → review+
(Assignee)

Comment 72

2 years ago
(In reply to Olli Pettay [:smaug] (TPAC) from comment #67)
> (not about this bug but I'm trying to remember what ObjectGroup type is and
> why it isn't listed here.)

I'm not entirely sure. I think it is some kind of shape-like thing. It would probably be useful to further profile page close CCs with a bunch of JS.

Yeah, the weak map stuff is a little messy. I don't remember off the top of my head exactly how it all works.

> mNumTraversed is only about JS things, so its name should say so.

That's a good point. I'll rename it to mNumTraversedGCThings, because it is incremented in TraverseGCThing.

> It might make sense to make CCJSTracer's ctor protected, so that only those subclasses could use it.

Good point. I have done that.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Updated

2 years ago
Keywords: leave-open

Comment 75

2 years ago
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/af8d3e696e93
part 1 - Unify NoteJSObject and NoteJSScript into NoteJSChild. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/8dd081bff020
part 2 - Add Scope as an AddToCCKind. r=smaug

Comment 76

2 years ago
mozreview-review
Comment on attachment 8793437 [details]
Bug 1301301, part 5 - In debug builds, check if there are too many calls to JS::TraceChildren.

https://reviewboard.mozilla.org/r/80174/#review79576

::: xpcom/base/CycleCollectedJSContext.cpp:1384
(Diff revision 3)
> +  // can spend a lot of time in JS::TraceChildren(). If you add a new
> +  // TraceKind and are hitting this assertion, adding it to
> +  // AddToCCKind might help. (The check is not done for small CC
> +  // graphs to avoid noise.)
> +  if (mNumTraceChildren > 1000) {
> +    double traceRatio = ((double)mNumTraceChildren) / ((double)mNumTraversedGCThings);

You want to ensure mNumTraversedGCThings != 0
Attachment #8793437 - Flags: review?(bugs) → review+
(Assignee)

Comment 78

2 years ago
(In reply to Olli Pettay [:smaug] (TPAC) from comment #76)
> You want to ensure mNumTraversedGCThings != 0

Oh, good catch. I even saw that in a local run, but I forgot to fix it.
(Assignee)

Updated

2 years ago
Keywords: leave-open

Comment 79

2 years ago
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/45fb9c1ce63a
part 3 - Implement Begin and EndCycleCollectionCallback in CCJSContext. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/6186eae0c2b7
part 4 - Make calls to JS::TraceChildren go through a single method on CCJSContext. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/da70c5abd73a
part 5 - In debug builds, check if there are too many calls to JS::TraceChildren. r=smaug
(Assignee)

Comment 81

2 years ago
Hmm, well, this is likely not the only backout those patches will have, so I'll spin off the assertions into a separate bug to facilitate backporting as fast as possible.
(Assignee)

Updated

2 years ago
Blocks: 1306086
(Assignee)

Updated

2 years ago
Attachment #8793924 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Attachment #8793437 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Attachment #8793924 - Attachment is obsolete: false
(Assignee)

Updated

2 years ago
Attachment #8793925 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Attachment #8793926 - Attachment is obsolete: true
(Assignee)

Comment 82

2 years ago
This only hit m-c on the 9-27 build, so I'm going to wait a few more days to confirm that this fixes the regression on telemetry, then request uplift to Aurora.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago2 years ago
Flags: needinfo?(continuation)
Resolution: --- → FIXED
(Assignee)

Updated

2 years ago
status-firefox52: affected → fixed
Flags: needinfo?(continuation)
(Assignee)

Updated

2 years ago
Flags: needinfo?(continuation)

Updated

2 years ago
Target Milestone: --- → mozilla52
I want to add that, I'm no longer encountering these annoying hangs for more than 2 days,
so I making this bug as VERIFIED.

Thank you very much! \o/
Status: RESOLVED → VERIFIED
status-firefox52: fixed → verified
(Assignee)

Comment 84

2 years ago
Looking at telemetry, I think this did fix the regression in mean and 95th percentile cycle collector max pauses.

https://mzl.la/2cGVplp
Flags: needinfo?(continuation)
(Assignee)

Comment 85

2 years ago
Comment on attachment 8793923 [details]
Bug 1301301, part 1 - Unify NoteJSObject and NoteJSScript into NoteJSChild.

Approval Request Comment
[Feature/regressing bug #]: bug 1263355
[User impact if declined]: very long hangs
[Describe test coverage new/current, TreeHerder]: This code runs all the time. I have patches in progress to check this particular regression, but we only regress this every few years so I think it isn't too critical that this gets landed immediately.
[Risks and why]: 
[String/UUID change made/needed]: Fairly low. Part 1 has a lot of changes, but is mostly a code refactoring, so it is low risk. Part 2 is small, and potentially risky, but it should really just change our behavior back to before bug 1263355 landed.
Attachment #8793923 - Flags: approval-mozilla-aurora?
(Assignee)

Updated

2 years ago
Attachment #8793924 - Flags: approval-mozilla-aurora?
Hi :mccr8,
Are patch 1 & 2 the only patches we want to uplift to 51 aurora?
Flags: needinfo?(continuation)
(Assignee)

Comment 87

2 years ago
(In reply to Gerry Chang [:gchang] from comment #86)
> Hi :mccr8,
> Are patch 1 & 2 the only patches we want to uplift to 51 aurora?

Yes. The rest have not landed yet, and are only needed for some regression testing, which I think is not super important for this code. (I spun them off into a separate bug to simplify tracking.)
Flags: needinfo?(continuation)
Comment on attachment 8793923 [details]
Bug 1301301, part 1 - Unify NoteJSObject and NoteJSScript into NoteJSChild.

Fix a regression. Take it in 51 aurora.
Attachment #8793923 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Updated

2 years ago
Attachment #8793924 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1302055
You need to log in before you can comment on or make changes to this bug.