Closed Bug 1301301 Opened 8 years ago Closed 8 years ago

Large regression in page close cycle collection pauses

Categories

(Core :: XPCOM, defect)

51 Branch
All
Windows 7
defect
Not set
major

Tracking


VERIFIED FIXED
mozilla52
Tracking Status
firefox48 --- unaffected
firefox49 --- unaffected
firefox-esr45 --- unaffected
firefox50 --- unaffected
firefox51 + verified
firefox52 + verified

People

(Reporter: Virtual, Assigned: mccr8)

References

Details

(5 keywords, Whiteboard: gfx-noted)

Attachments

(4 files, 8 obsolete files)

1.36 KB, text/plain
Details
3.28 KB, text/plain
Details
58 bytes, text/x-review-board-request
smaug: review+
Details
58 bytes, text/x-review-board-request
smaug: review+
Details
Attached file 53EhBLj6 (LZMA2).7z (obsolete)
After some time of browsing, Firefox Nightly hangs for a while, making itself unresponsive.

Peja Stija from MozillaZine has reliable STR (it happens more frequently with the "Self-Destructing Cookies" extension installed) and captured it in the Gecko Profiler here - https://cleopatra.io/#report=7ade24c52a5aa5dd1e15e3e668bdff5e599d62b5&jankOnly=true&selection=0,1,2,3,4,5,6,7,8,9,265,266,267,411,412,413,414,432,23,24,25,3127,26,36,23,24,427,428,23,24,49,50,24,25,3128,26,36,23,24,427,428,23,24,49,50,24,25,3129,26,36,23,24,427,428,23,24,49,50,24,25,1130,26,36,1131,50,24,25,1132,26,36,23,24,427,428,23,24,49,50,24,25,3133,26,36,23,24,427,428,23,24,49,50,24,25,1130,26,36,1131,50,24,25,1132,26,36,23,24,427,428,23,24,49,50,24

In my case I only get micro-stutter scrolling in the Gecko Profiler, because when I try to catch a real hang, the browser hangs completely. The profile is in the attachments; be careful, after extracting it's nearly an 80 MB file.
I'm pretty sure that Firefox 48 and 49 are unaffected, because this has only been happening for about a month.
The profiler says that you're printing the display list out for some reason. Is your preference "layout.display-list.dump" or "layout.display-list.dump-content" enabled? Can you try this with a clean profile please?
Flags: needinfo?(bernesb)
Whiteboard: gfx-noted
Yes, I had these 2 preferences enabled, as I thought they would help in gathering more useful data. I'll try to catch it one more time, but it's hard, as the Gecko Profiler hangs the whole browser.
In the meantime, please look at Peja Stija's profile, which shows these hangs, as mine only shows micro-stutter scrolling.
(In reply to Mason Chang [:mchang] from comment #4)
> The profiler says that you're printing the display list out for some reason.
> Is your preference "layout.display-list.dump" or
> "layout.display-list.dump-content" enabled? Can you try this with a clean
> profile please?

In my case, I have neither of those settings enabled.
From Peja's profile, it looks like the JS engine is doing lots of work and the GC is spending lots of time doing things. Nothing from graphics stands out here. Can someone from the JS team look at this please?
Component: Graphics → JavaScript Engine
Flags: needinfo?(naveed)
Please disregard my problem (and profile); I haven't encountered it since using a blank profile, so it must be a setting/add-on related issue and most probably not related to Virtual_ManPL's problem.
Please reopen if this comes up again. Thanks
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(naveed)
Resolution: --- → WORKSFORME
Flags: needinfo?(bernesb)
I'd just like to add that the OP's problem is probably a completely different issue, and it may be worth focusing on the attachment posted, since WORKSFORME probably doesn't fit Virtual_ManPL's initial problem.
The attachment posted, at least judging from the profile, is because he had a very expensive debug preference enabled.
I would like someone to diagnose at least Peja Stija's profile, because like I said, in my case the Gecko Profiler hangs the whole browser.
Moreover, it's clearly a regression, because going back to the stable version does not produce any hangs with the same profile and extensions.
Status: RESOLVED → REOPENED
Flags: needinfo?(naveed)
Resolution: WORKSFORME → ---
(In reply to Virtual_ManPL [:Virtual] - (ni? me) from comment #12)
> I would like for someone to diagnose at least Peja Stija profile, because
> like I said in my case Gecko Profiler hangs the whole browser.
> Moreover it's clearly the regression, because going back to stable version
> not produce any hangs with same profile and extensions.

Can you bisect to find a bug which is causing this please?
Flags: needinfo?(bernesb)
I'm not sure what to do about Peja's profile since a clean profile fixes it for him, which means a bad add-on is probably doing something.

Virtual, is this still happening for you with a clean profile?
(In reply to Mason Chang [:mchang] from comment #14)
> I'm not sure what to do about Peja's profile since a clean profile fixes it
> for him, which means a bad add on is probably doing something.

It's hard for me to believe that my profile is broken or some extensions are causing this, because when I go back to the stable Firefox version I don't encounter any more hangs with the same configuration.
Maybe diagnosing Peja Stija's profile will give some hints as to what's broken.



(In reply to Mason Chang [:mchang] from comment #13)
> Can you bisect to find a bug which is causing this please?
(In reply to Mason Chang [:mchang] from comment #14)
> Virtual, is this still happening for you with a clean profile?

I will try to diagnose and find it, but not sooner than in 1-2 weeks,
as this issue isn't reproducible straight away (at least for me) and I don't have much time right now for this time-consuming task.



@ Peja Stija - Maybe you could help with finding the regression range, since you can reproduce it easily (compared to me)? Even bisecting daily builds instead of inbound builds would be a big help.
Flags: needinfo?(zombulla14)
(In reply to Virtual_ManPL [:Virtual] - (ni? me) from comment #15)

> @ Peja Stija - Maybe you could help with finding regression range, as you
> could easily reproduce it (comparing to me)? Even getting daily instead of
> inbound builds will be a big help.

I'll do it; I can't promise immediate results, because the hanging I experienced goes back weeks (if not months) and I was under the impression that it was just part of using a nightly build.
I'll try to recreate and chime in when certain.
Well, using the non-blank profile, I even get the hangs every so many minutes with not a single add-on enabled, so this must be due either to corruption, a wrong setting, or a bug. Personally, I'm out of options.
I cannot pin this on a specific add-on; it seems to happen regardless of which add-ons are installed, if any.
I can reproduce it 60% of the time using these steps:

- Have 5 pinned tabs (of which 2 are YouTube video pages)
- Reload all 5 fairly fast, one after another
- A couple of seconds to a minute after all 5 have finished (re-)loading, the hang circle appears, sometimes for up to 10 seconds, and the browser hangs.
Reproduced with a clean profile, having a couple of YouTube video pages open, 1 playing, the rest paused. Putting the browser in the background, doing something else for a couple of minutes, then bringing the browser back to the front, result = seconds-long hang(s).

Profiler : https://cleopatra.io/#report=52b38903b411d4e1f4250c2f1675576cfaddd61b&jankOnly=true&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A27589865,%22end%22%3A27590994%7D,%7B%22type%22%3A%22FocusedCallstackPrefixSampleFilter%22,%22name%22%3A%22(root)%22,%22focusedCallstack%22%3A%5B0%5D,%22appliesToJS%22%3Afalse%7D%5D&selection=0,1,2,3,4,5,6,7,8
I can confirm this, something fishy is going on.
Status: REOPENED → NEW
Flags: needinfo?(bernesb)
(In reply to Peja Stija from comment #19)
> Reproduced with a clean profile, and having a couple of Youtube video pages
> open, 1 playing, the rest paused. Getting the browser in the background and
> do something else for a couple of minutes, then bringing the browser back in
> the front, result = seconds long hang(s).
> 
> Profiler :
> https://cleopatra.io/
> #report=52b38903b411d4e1f4250c2f1675576cfaddd61b&jankOnly=true&filter=%5B%7B%
> 22type%22%3A%22RangeSampleFilter%22,%22start%22%3A27589865,
> %22end%22%3A27590994%7D,
> %7B%22type%22%3A%22FocusedCallstackPrefixSampleFilter%22,
> %22name%22%3A%22(root)%22,%22focusedCallstack%22%3A%5B0%5D,
> %22appliesToJS%22%3Afalse%7D%5D&selection=0,1,2,3,4,5,6,7,8

Is this happening on developer edition or release?

From this profile, it looks like we have large restyling problems and JS JIT problems. Changing components.
(In reply to Mason Chang [:mchang] from comment #21)
> (In reply to Peja Stija from comment #19)
> 
> Is this happening on developer edition or release?
> 
> From this profile, it looks like we have large restyling problems and JS JIT
> problems. Changing components.

This is happening on the latest Nightly (2016-09-13, 32-bit) with a clean profile.
Reproducible 50-60% of the time with the procedure explained in comment 19.
(In reply to Peja Stija from comment #22)
> (In reply to Mason Chang [:mchang] from comment #21)
> > (In reply to Peja Stija from comment #19)
> > 
> > Is this happening on developer edition or release?
> > 
> > From this profile, it looks like we have large restyling problems and JS JIT
> > problems. Changing components.
> 
> This is happening on latest nightly (2016-09-13 32bit) with a clean profile.
> Reproducable 50-60% of the time with the procedure explained in comment 19.

Yes, but is it reproducible on developer edition or release?
(In reply to Mason Chang [:mchang] from comment #23)
...
> Yes, but is it reproducible on developer edition or release?

I don't have developer edition, and I cannot reproduce this behavior on the release version.

Something I somehow overlooked (might be a clue?): I see there are now a bunch of failure logs added to the Graphics section in about:support. The last time I checked this page, some months ago, I remember not seeing them there.

Screenshot: https://postimg.org/image/tgi9nqzxh/
(In reply to Peja Stija from comment #24)
> (In reply to Mason Chang [:mchang] from comment #23)
> ...
> > Yes, but is it reproducible on developer edition or release?
> 
> I don't have developer edition, and I cannot reproduce this behavior on the
> release version.
> 
> Something I somehow overlooked, (might be a clue?) I see there are now a
> bunch of failure logs added to the Graphics part in about:support, last time
> I checked this page some months ago,  I remember not seeing these there at
> that time.
> 
> Screenshot : https://postimg.org/image/tgi9nqzxh/

Those alerts are related to bug 1302713. Can you please download the developer edition and see if it still happens? Thanks!
Summary: Constant hangs and poor scrolling performance in Firefox Nightly → Constant hangs in Firefox Nightly
(In reply to Mason Chang [:mchang] from comment #25)

> Those alerts are related to bug 1302713. Can you please download the
> developer edition and see if it still happens? Thanks!

I installed the developer edition with its own profile, and there everything stays perfectly fluid; no hangs detected after trying to recreate it for a couple of hours.
Have to say, Dev Edition definitely feels a lot more fluid than current nightlies.
Just to add (I stupidly forgot to mention this every time): the 3 or 4 YouTube video tabs used to reproduce the hangs are actually pinned tabs. I don't know if this is of any more value in getting to the core of it?
In my case I don't even need to use pinned tabs or have YouTube open to get Nightly to hang after a long (>30 min) browsing session.
Another profile makes it clear: horrible hangs when putting FF Nightly in the background, doing something else, bringing FF Nightly to the front again, and reloading a page (or opening something in a new tab)

https://cleopatra.io/#report=b8abe7c86711961c06b933e75f0e34337ce25b2e&selection=0,1,2,3,4,5,6,7,8,9,10
Looking at naveed's profile, he hasn't been active (at least here, on Bugzilla) for 4 years. Do you have in mind any other person to ping who could analyze these profiles with the recorded hangs?
Flags: needinfo?(zombulla14)
Flags: needinfo?(naveed)
Flags: needinfo?(mchang)
The profile in comment 30 shows a long cycle collection. I have also been experiencing very long cycle collections in Nightly, starting about one or two weeks ago. Here's a profile from my machine: https://cleopatra.io/#report=44cf509ee14050a9675de21ddabf714f50d42626

The CYCLE_COLLECTOR_MAX_PAUSE telemetry measurement shows a big upwards jump on August 26: https://mzl.la/2cV5tIl
Flags: needinfo?(terrence)
Pushlog for the range 2016-08-25 to 2016-08-26: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=01748a2b1a463f24efd9cd8abad9ccfd76b037b8&tochange=a551f534773cf2d6933f78ce7d82a7a33a99643e

Maybe bug 1296484? It looks like that one was backed out within the range, though, not sure if it ended up making it into the 20160826 nightly build.
Clearing the regressionwindow-wanted keyword as a pushlog was provided in comment 33.
Forwarding ni? to CC owners.
Flags: needinfo?(terrence)
Flags: needinfo?(continuation)
Flags: needinfo?(bugs)
I think we should back out bug 1296484 for now. It is very plausible that in a page closing situation we somehow end up destroying things more aggressively now. I think I'm seeing this locally, with an easier STR. I'll try bisecting.
Blocks: 1296484
Flags: needinfo?(continuation)
Flags: needinfo?(bugs)
Component: JavaScript Engine → XPCOM
Well, I'll bisect to confirm, as it may require also backing out bug 1237058.
Testing locally, it doesn't seem like bug 1296484 is the cause.

Here are my steps to reproduce:

1. Set javascript.options.mem.log to true in about:config. Open the browser console. Type "cc(" into the "filter output" box so you only see CC activity.
2. Open https://news.ycombinator.com/ then wait a few seconds (I just use this as an example of a very simple site).
3. Open http://www.wowhead.com/ in two tabs, leaving the browser open to one of them, then wait ten seconds. (This page is doing something awful like continuously reloading an iframe so it generates a ton of potential garbage.)
4. Right-click on the news.ycombinator tab and close all tabs to the right.
5. Look for a large CC pause a few seconds later.

In Nightly, using either of RyanVM's builds from comment 37, the CC pause looks like this:

CC(T+21.8)[content] max pause: 216ms, total time: 530ms, slices: 49, suspected: 251, visited: 13126 RCed and 276827 GCed, collected: 13063 RCed and 276768 GCed (276768|0|0 waiting for GC)
ForgetSkippable 8 times before CC, min: 0 ms, max: 1 ms, avg: 0 ms, total: 6 ms, max sync: 0 ms, removed: 221

In Aurora it looks like this:

CC(T+38.1)[content] max pause: 35ms, total time: 306ms, slices: 51, suspected: 1096, visited: 15689 RCed and 282392 GCed, collected: 15608 RCed and 282332 GCed (282332|0|0 waiting for GC)
ForgetSkippable 9 times before CC, min: 0 ms, max: 3 ms, avg: 1 ms, total: 11 ms, max sync: 0 ms, removed: 6223

The max pause is much larger, even though we're looking at about the same amount of stuff in both cases. My new theory is that we messed up the CC budget somehow, so we're not doing very much work, so our final "catch up" slice of work has to do more. I'll try a local build to investigate that. Maybe bug 1297367 did something.
Flags: needinfo?(mchang)
Summary: Constant hangs in Firefox Nightly → Large regression in page close cycle collection pauses
Local testing indicates this regression is from bug 1263355. I'll do some more local testing to figure out where exactly the time is going.
Blocks: 1263355
No longer blocks: 1296484
The profile in comment 32 is spending a ton of time tracing the children of JS objects. The frontend rewrite patch added a new GC tracekind for scopes. This is not getting added to the CC graph. It sounds like a bunch of objects can hold onto a scope, so we'll end up tracing the same scope over and over again. I think the fix is to then start adding Scopes to the CC graph. This needs to be done without exposing the actual definition of a scope to Gecko. Presumably a callback will be involved, like what is done for shapes.
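For illustration, the shape of that fix is roughly the following. This is a minimal sketch only: it assumes the helper is a simple switch over JS::TraceKind, and the exact set of kinds and the file it lives in may differ from the actual patch; only the AddToCCKind() name and the new Scope kind come from this bug.

#include "js/TraceKind.h"

// Sketch: GC things whose TraceKind is listed here get their own node in the
// CC graph; anything else is traced "through" again every time one of its
// owners is traversed, which is what makes the new Scope kind so expensive.
static bool AddToCCKind(JS::TraceKind aKind)
{
  switch (aKind) {
    case JS::TraceKind::Object:
    case JS::TraceKind::Script:
    case JS::TraceKind::Scope:   // the new kind from the frontend rewrite
      return true;
    default:
      return false;
  }
}

The design point is simply that once a kind is in this list, each Scope becomes a single graph node that is traversed once, instead of being re-traced from every object that holds it.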
Assignee: nobody → continuation
(In reply to Andrew McCreight [:mccr8] from comment #44)
> The profile in comment 32 is spending a ton of time tracing the children of
> JS objects. The frontend rewrite patch added a new GC tracekind for scopes.
> This is not getting added to the CC graph. It sounds like a bunch of objects
> can hold onto a scope, so we'll end up tracing the same scope over and over
> again. I think the fix is to then start adding Scopes to the CC graph. This
> needs to be done without exposing the actual definition of a scope to Gecko.
> Presumably a callback will be involved, like what is done for shapes.

For some extra clarity, the things that now have their own GC TraceKinds in the rewrite *used* to be JSObjects, so they "just worked" when traced by the CC.
I should add DEBUG counting to the graph building phase that compares the number of CCKind to non-CCKind children visited, and asserts if the ratio is too low. That should prevent quadratic blowups in the future and hopefully won't be too expensive.
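Sketched out, the bookkeeping could look something like this. The wrapper struct is hypothetical; the counter names mirror the ones that show up in the review comments further down, and the real code would live in the CC graph builder rather than a standalone type.

#include <cstdint>

// Sketch of the debug-only bookkeeping: bump one counter for every GC thing
// that gets its own CC graph node, and one for every JS::TraceChildren call
// made while walking non-CCKind children. At the end of a collection the two
// are compared; a high trace/traverse ratio suggests a TraceKind is missing
// from AddToCCKind and is being re-traced over and over.
struct CCGraphBuildStats
{
  uint64_t mNumTraversedGCThings = 0;  // things added to the CC graph
  uint64_t mNumTraceChildren = 0;      // JS::TraceChildren calls made

  void NoteTraversedGCThing() { ++mNumTraversedGCThings; }
  void NoteTraceChildrenCall() { ++mNumTraceChildren; }
};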
Tracking 51/52 for this regression. Thanks to both Virtual_ManPL and Peja for doing all the testing and attaching their logs.
See Also: 1295272
I have work-in-progress patches that, basically, add Scope to AddToCCKind().

In a debug build, on the STR I mentioned before, I was seeing about 450,000 objects traversed, with about 770,000 calls to JS::TraceChildren. The CC time stats were like this:
CC(T+52.5)[content] max pause: 1624ms, total time: 1991ms, slices: 48, suspected: 4457, visited: 21673 RCed and 450139 GCed, collected: 16231 RCed and 349956 GCed (349956|0|0 waiting for GC)

With my patches, about 500,000 objects are being traversed (more!) but there are only about 580,000 calls to JS::TraceChildren. The CC max pause is about 40% of the previous one:
CC(T+52.6)[content] max pause: 648ms, total time: 944ms, slices: 49, suspected: 433, visited: 19126 RCed and 496476 GCed, collected: 18970 RCed and 493321 GCed (493321|0|0 waiting for GC)

I'll try to finish up the patches and put them up for review soon.
I also filed bug 1304205 about making slice budgets larger for longer running slice times, to reduce max pauses for long CCs.
In Linux64 runs, I saw trace-to-traverse ratios of up to 1.58 in a patched version. In an unpatched version, I see many places with a ratio of 2.3. I'll try adding an assert for a ratio of 2.
I still need to wait for various try runs to finish, but I'm reasonably confident this is okay now, so I'm putting my patches up for review.

Parts 1 and 2 fix the regression, while parts 3 through 5 add infrastructure to catch regressions like this in the future. Parts 1, 3, and 4 are refactoring patches that should not impact behavior. Part 5 adds debug-only stats that are used in an assertion. I'm inclined to also backport parts 3-5 because we're early in Aurora, and otherwise the refactoring in them will make it harder to backport future patches. They should be low risk for opt builds.
try run showing how orange it is with a ratio limit of 2.0, and without part 2:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6cdadc86ac7

opt looks okay:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=adda84b9be2b

all platform debug try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c6e526473c2
Android build failure is due to another patch in my queue. Hopefully Android will be okay...
I see an assertion failure with 2.0, so I'll have to bump it up a bit. The ratio was 2.017049. Maybe 2.2 will be high enough to provide some safety. The bad runs all have a ratio of 2.3 or higher.
Attachment #8793433 - Attachment is obsolete: true
Attachment #8793433 - Flags: review?(bugs)
Attachment #8793434 - Attachment is obsolete: true
Attachment #8793434 - Flags: review?(bugs)
Attachment #8793435 - Attachment is obsolete: true
Attachment #8793435 - Flags: review?(bugs)
Attachment #8793436 - Attachment is obsolete: true
Attachment #8793436 - Flags: review?(bugs)
Well, that did not do what I wanted.
Comment on attachment 8793923 [details]
Bug 1301301, part 1 - Unify NoteJSObject and NoteJSScript into NoteJSChild.

https://reviewboard.mozilla.org/r/80512/#review79262

I assume later patches will explain why these changes are needed, but I guess that is quite obvious.
Attachment #8793923 - Flags: review?(bugs) → review+
Comment on attachment 8793924 [details]
Bug 1301301, part 2 - Add Scope as an AddToCCKind.

https://reviewboard.mozilla.org/r/80514/#review79264

(not about this bug but I'm trying to remember what ObjectGroup type is and why it isn't listed here.)
Attachment #8793924 - Flags: review?(bugs) → review+
Comment on attachment 8793925 [details]
Bug 1301301, part 3 - Implement Begin and EndCycleCollectionCallback in CCJSContext.

https://reviewboard.mozilla.org/r/80516/#review79266
Attachment #8793925 - Flags: review?(bugs) → review+
Comment on attachment 8793926 [details]
Bug 1301301, part 4 - Make calls to JS::TraceChildren go through a single method on CCJSContext.

https://reviewboard.mozilla.org/r/80518/#review79272

::: xpcom/base/CycleCollectedJSContext.cpp:133
(Diff revision 1)
>  } // namespace mozilla
>  
> -struct NoteWeakMapChildrenTracer : public JS::CallbackTracer
> +struct CCJSTracer : public JS::CallbackTracer
>  {
> -  NoteWeakMapChildrenTracer(JSContext* aCx,
> +  CCJSTracer(CycleCollectedJSContext* aCx,
> +             WeakMapTraceKind aWeakTraceKind = TraceWeakMapValues)

I don't yet understand this weakmap stuff here.
Comment on attachment 8793437 [details]
Bug 1301301, part 5 - In debug builds, check if there are too many calls to JS::TraceChildren.

https://reviewboard.mozilla.org/r/80174/#review79274

::: xpcom/base/CycleCollectedJSContext.h:482
(Diff revision 2)
>      void invoke(JS::HandleObject scope, Closure& closure) override;
>    };
>    EnvironmentPreparer mEnvironmentPreparer;
> +
> +#ifdef DEBUG
> +  uint32_t mNumTraversed;

mNumTraversed is only about JS things, so its name should say so. I first thought this was about all things traversed.
mNumOfTraversedJSThings?
Comment on attachment 8793926 [details]
Bug 1301301, part 4 - Make calls to JS::TraceChildren go through a single method on CCJSContext.

https://reviewboard.mozilla.org/r/80518/#review79280

It might make sense to make CCJSTracer's ctor protected, so that only those subclasses could use it.

::: xpcom/base/CycleCollectedJSContext.cpp:150
(Diff revision 1)
> +
> +struct NoteWeakMapChildrenTracer : public CCJSTracer
> +{
> +  NoteWeakMapChildrenTracer(CycleCollectedJSContext* aCx,
>                              nsCycleCollectionNoteRootCallback& aCb)
> -    : JS::CallbackTracer(aCx), mCb(aCb), mTracedAny(false), mMap(nullptr),
> +    : CCJSTracer(aCx), mCb(aCb), mTracedAny(false), mMap(nullptr),

Note to myself...This relies on TraceWeakMapValues.
And looks like WeakMapTraceKind weakTraceKind = TraceWeakMapValues is also in JS::CallbackTracer, so behavior shouldn't change.

::: xpcom/base/CycleCollectedJSContext.cpp:347
(Diff revision 1)
> -{
> -  TraversalTracer(JSContext* aCx, nsCycleCollectionTraversalCallback& aCb)
> -    : JS::CallbackTracer(aCx, DoNotTraceWeakMaps), mCb(aCb)
> -  {
> +{
> -  }
> +  TraversalTracer(CycleCollectedJSContext* aCx,
> +                  nsCycleCollectionTraversalCallback& aCb)

note to myself...ok, here we keep using DoNotTraceWeakMaps
Attachment #8793926 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] (TPAC) from comment #67)
> (not about this bug but I'm trying to remember what ObjectGroup type is and
> why it isn't listed here.)

I'm not entirely sure. I think it is some kind of shape-like thing. It would probably be useful to further profile page close CCs with a bunch of JS.

Yeah, the weak map stuff is a little messy. I don't remember off the top of my head exactly how it all works.

> mNumTraversed is only about JS things, so its name should say so.

That's a good point. I'll rename it to mNumTraversedGCThings, because it is incremented in TraverseGCThing.

> It might make sense to make CCJSTracer's ctor protected, so that only those subclasses could use it.

Good point. I have done that.
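In sketch form, that constructor change amounts to something like the following. The names follow the diff above; the Context() accessor used to reach the underlying JSContext is an assumption here, not confirmed by the patch.

// Sketch: with a protected constructor, CCJSTracer can only be instantiated
// through its subclasses (NoteWeakMapChildrenTracer, TraversalTracer, ...).
// In context this lives in xpcom/base/CycleCollectedJSContext.cpp.
struct CCJSTracer : public JS::CallbackTracer
{
protected:
  explicit CCJSTracer(CycleCollectedJSContext* aCx,
                      WeakMapTraceKind aWeakTraceKind = TraceWeakMapValues)
    : JS::CallbackTracer(aCx->Context(), aWeakTraceKind)  // Context() assumed
  {
  }
};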
Keywords: leave-open
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/af8d3e696e93
part 1 - Unify NoteJSObject and NoteJSScript into NoteJSChild. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/8dd081bff020
part 2 - Add Scope as an AddToCCKind. r=smaug
Comment on attachment 8793437 [details]
Bug 1301301, part 5 - In debug builds, check if there are too many calls to JS::TraceChildren.

https://reviewboard.mozilla.org/r/80174/#review79576

::: xpcom/base/CycleCollectedJSContext.cpp:1384
(Diff revision 3)
> +  // can spend a lot of time in JS::TraceChildren(). If you add a new
> +  // TraceKind and are hitting this assertion, adding it to
> +  // AddToCCKind might help. (The check is not done for small CC
> +  // graphs to avoid noise.)
> +  if (mNumTraceChildren > 1000) {
> +    double traceRatio = ((double)mNumTraceChildren) / ((double)mNumTraversedGCThings);

You want to ensure mNumTraversedGCThings != 0
Attachment #8793437 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] (TPAC) from comment #76)
> You want to ensure mNumTraversedGCThings != 0

Oh, good catch. I even saw that in a local run, but I forgot to fix it.
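With that folded in, the debug check from the diff above would read roughly like this (a sketch only; the 2.2 limit is the value floated in an earlier comment, and the landed patch may differ):

// Only evaluate the ratio for reasonably large CC graphs, and skip it
// entirely when nothing was traversed so we never divide by zero.
if (mNumTraceChildren > 1000 && mNumTraversedGCThings != 0) {
  double traceRatio =
    double(mNumTraceChildren) / double(mNumTraversedGCThings);
  // If you add a new TraceKind and hit this, consider adding it to AddToCCKind.
  MOZ_ASSERT(traceRatio < 2.2,
             "Too many calls to JS::TraceChildren relative to the number "
             "of GC things traversed");
}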
Keywords: leave-open
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/45fb9c1ce63a
part 3 - Implement Begin and EndCycleCollectionCallback in CCJSContext. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/6186eae0c2b7
part 4 - Make calls to JS::TraceChildren go through a single method on CCJSContext. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/da70c5abd73a
part 5 - In debug builds, check if there are too many calls to JS::TraceChildren. r=smaug
Hmm, well, this is likely not the only backout those patches will have, so I'll spin off the assertions into a separate bug to facilitate backporting as fast as possible.
Blocks: 1306086
Attachment #8793924 - Attachment is obsolete: true
Attachment #8793437 - Attachment is obsolete: true
Attachment #8793924 - Attachment is obsolete: false
Attachment #8793925 - Attachment is obsolete: true
Attachment #8793926 - Attachment is obsolete: true
This only hit m-c on the 9-27 build, so I'm going to wait a few more days to confirm that this fixes the regression on telemetry, then request uplift to Aurora.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago8 years ago
Flags: needinfo?(continuation)
Resolution: --- → FIXED
Flags: needinfo?(continuation)
Flags: needinfo?(continuation)
Target Milestone: --- → mozilla52
I want to add that I haven't encountered these annoying hangs for more than 2 days now,
so I'm marking this bug as VERIFIED.

Thank you very much! \o/
Status: RESOLVED → VERIFIED
Looking at telemetry, I think this did fix the regression in mean and 95th percentile cycle collector max pauses.

https://mzl.la/2cGVplp
Flags: needinfo?(continuation)
Comment on attachment 8793923 [details]
Bug 1301301, part 1 - Unify NoteJSObject and NoteJSScript into NoteJSChild.

Approval Request Comment
[Feature/regressing bug #]: bug 1263355
[User impact if declined]: very long hangs
[Describe test coverage new/current, TreeHerder]: This code runs all the time. I have patches in progress to check this particular regression, but we only regress this every few years so I think it isn't too critical that this gets landed immediately.
[Risks and why]: Fairly low. Part 1 has a lot of changes, but is mostly a code refactoring, so it is low risk. Part 2 is small, and potentially risky, but it should really just change our behavior back to before bug 1263355 landed.
[String/UUID change made/needed]:
Attachment #8793923 - Flags: approval-mozilla-aurora?
Attachment #8793924 - Flags: approval-mozilla-aurora?
Hi :mccr8,
Are patch 1 & 2 the only patches we want to uplift to 51 aurora?
Flags: needinfo?(continuation)
(In reply to Gerry Chang [:gchang] from comment #86)
> Hi :mccr8,
> Are patch 1 & 2 the only patches we want to uplift to 51 aurora?

Yes. The rest have not landed yet, and are only needed for some regression testing, which I think is not super important for this code. (I spun them off into a separate bug to simplify tracking.)
Flags: needinfo?(continuation)
Comment on attachment 8793923 [details]
Bug 1301301, part 1 - Unify NoteJSObject and NoteJSScript into NoteJSChild.

Fix a regression. Take it in 51 aurora.
Attachment #8793923 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8793924 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+