Trying to navigate History.com causes high CPU usage and Firefox stops working

VERIFIED FIXED in Firefox 46

Status

()

defect
VERIFIED FIXED
4 years ago
3 years ago

People

(Reporter: rowbot, Assigned: birtles)

Tracking

({regression})

Trunk
mozilla48
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox45 unaffected, firefox46 fixed, firefox47 fixed, firefox48 fixed)

Details

()

Attachments

(3 attachments, 5 obsolete attachments)

I am using the latest nightly x64 on Windows 10.  I'm working on getting a regression range, but have gotten 3 different ranges each of the 3 times I've run it, so I'm just going to get this bug on file for now.

STR:
1) Visit http://www.history.com/shows
2) Click on any of the shows

Actual Results:
CPU usage of plugin-container jumps to 14% and just sits there. Trying to close Firefox or the tab takes a minimum of 5 seconds and no other tabs will load during this time.  I left Firefox alone for 10 minutes on 1 attempt and memory grew to over 2 GB.

Expected Results:
The page for the show clicked on should be loaded.
I forgot to mention that I already tried testing with a clean profile with no luck.  I also tried disabling e10s and the result is worse since the entire browser just locks up.
Here are the mozregression results, looks like Bug 1238660 is the culprit:

Last good revision: 6020a4cb41a77a09484c24a5875bb221714c0e6a (2016-01-12)
First bad revision: ad1f85f172b7302bef0fa9780df8e2b962780ac6 (2016-01-13)
Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6020a4cb41a77a09484c24a5875bb221714c0e6a&tochange=ad1f85f172b7302bef0fa9780df8e2b962780ac6

Last good revision: da1d2c63b0686a11be1b14c6e1f76a4875bb0e64
First bad revision: 531d1f6d1cde1182e9f7f9dff81a4fc5abc0a601
Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=da1d2c63b0686a11be1b14c6e1f76a4875bb0e64&tochange=531d1f6d1cde1182e9f7f9dff81a4fc5abc0a601

Last good revision: 4680d0b685242a811eef759584ce6a6d27e15732
First bad revision: f58e7ca07205514611d65a65aba46cbf3db5cddf
Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=4680d0b685242a811eef759584ce6a6d27e15732&tochange=f58e7ca07205514611d65a65aba46cbf3db5cddf
Blocks: 1238660
Component: General → CSS Parsing and Computation
Hiro, could you look at this?
Flags: needinfo?(hiikezoe)
My gut feeling tells me that we should set mWinsInCascade true in case of CSS transitions.
Posted file Reduced test case not finished yet (obsolete) —
This issue highly related to web fonts which don't exist.

This html is still in the middle of reducing but I am attaching it for now because I did accidentally remove a html file which I was actually working last night.
Posted file More reduced. (obsolete) —
Attachment #8715614 - Attachment is obsolete: true
I am now digging into a bit deeper.  We can not bail out a loop[1] in RestyleTracker::DoProcessRestyles.  That's because all of AnimationCollection::mCheckGeneration for all elements do not match to the generation owned by RestyleManager at all?

[1] https://dxr.mozilla.org/mozilla-central/rev/584870f1cbc5d060a57e147ce249f736956e2b62/layout/base/RestyleTracker.cpp#222
Flags: needinfo?(hiikezoe)
Disable gfx.downloadable_fonts.enabled solves this issue. Interesting.
The styling process we can not bail out comes from nsPresContext::UserFontSetUpdated() [1].

[1] https://dxr.mozilla.org/mozilla-central/rev/584870f1cbc5d060a57e147ce249f736956e2b62/layout/base/nsPresContext.cpp#2244

It's actually called from layout process [2].

[2] https://dxr.mozilla.org/mozilla-central/rev/584870f1cbc5d060a57e147ce249f736956e2b62/layout/base/nsRefreshDriver.cpp#1715

I don't know why we can not bail out the loop if it's called from layout process.
Duplicate of this bug: 1248943
Posted patch A possible fix (obsolete) — Splinter Review
We could take the same approach as we did in bug 1242872, not to create temporary CSSTransition.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d2f289d39302

This is dirty but for confirmation that this patch does not break anything.
Lots of oranges..  It seems that 'removedAnimations' is not notified to MutationObserver.
Posted file 1245260.html
A test case has no web fonts.
Web fonts is actually not related to this issue.
An empty script tag seems to be a trigger of this issue.
Attachment #8715619 - Attachment is obsolete: true
I'm surprised that bug 1238660 regresses transitions since it mostly deals with animations. I guess it's related to how we handle mWinsInCascade. I'll take a further look.
Confirmed that making mWinsInCascade to default to true fixes this.

I suspect making it false will cause |changed| in EffectCompositor::UpdateCascadeResults to become true causing us to request another restyle. Digging further...
(In reply to Brian Birtles (:birtles) from comment #15)
> Confirmed that making mWinsInCascade to default to true fixes this.
> 
> I suspect making it false will cause |changed| in
> EffectCompositor::UpdateCascadeResults to become true causing us to request
> another restyle.

Yes. What I observed is that once the another restyle is processed, AND the animation generation check [1] is failed,  we try to resolve style without the transition property[2].  As a result of it, we repeatedly request restyle because the transition is considered that the transition should be animated there[3].

[1] https://dxr.mozilla.org/mozilla-central/rev/2b5237c178ea02133a777396c24dd2b713f2b8ee/layout/style/nsTransitionManager.cpp#296
[2] https://dxr.mozilla.org/mozilla-central/rev/2b5237c178ea02133a777396c24dd2b713f2b8ee/layout/style/nsTransitionManager.cpp#330
[3] https://dxr.mozilla.org/mozilla-central/rev/2b5237c178ea02133a777396c24dd2b713f2b8ee/layout/style/nsTransitionManager.cpp#532
(In reply to Hiroyuki Ikezoe (:hiro) from comment #12)
> Lots of oranges..  It seems that 'removedAnimations' is not notified to
> MutationObserver.

I think the trouble is that we can't just turn old transitions into new transitions. It should be a new object identity when we start a new transition.

I'm going to see if we can get that animation generation check to work for us.

It might also be reasonable, when generating a new transition where there is an existing transition, to transfer the value of the mWinsInCascade member over, but I suspect we shouldn't be generating a new transition in the first place.
Regarding the animation check, the root of the problem is that we're overloading the animation generation to do two things: detect redundant style changes and detect when layers on the compositor need to be updated.

In the test case we have a bunch of nested elements and we get into a situation where we generate a transition, its mWinsInCascade member is initially false, so when we update the cascade we end up marking it true, detecting that as a change that we need to synchronize with the compositor and then requesting a layer restyle. That "layer restyle" updates the animation generation on the restyle manager. And we do it twice because we request a layer restyle for both animations and transitions levels of the cascade.

Then, when we get back to where we triggered the transition, we update the check generation on the transition collection using the updated animation generation on the restyle manager.

Furthermore, we post a number of restyles on the same element. One from the EffectCompositor when we detect that the cascade has changed, and another in nsTransitionManager since we always post a restyle after triggering a transition.

However, before we get to doing that restyle, we'll trigger transitions on other elements which will, in turn, spin the animation generation on the restyle manager. By the time we get back to doing the restyle we posted on the original element, the animation generation on the restyle manager will no longer match and the check we do at the start of nsTransitionManager::StyleContextChanged to avoid generating redundant transitions will fail.

So the problem is we're updating the animation generation in a case where we're not expecting it to. The RequestRestyle code was originally intended to be used in response to API calls where it's appropriate to update the animation generation. However, when we're in the middle of a restyle, we really only want to update it once (in RestyleManager::ProcessPendingRestyles()).

So we probably want to either:

* Split out the mechanism used to synchronize animations on layers from the mechanism used to detect redundant transitions, OR
* Somehow block RequestRestyle(Layer) from incrementing the generation while inside ProcessPendingRestyles
While processing restyles and starting transitions, we may trigger
a call to EffectCompositor::UpdateCascadeResults which may, in turn, call
EffectCompositor::RequestRestyle with RestyleType::Layer, which ultimately
results in a call to RestyleManager::IncrementAnimationGeneration().

That can cause the check for animation generations at the start of
nsTransitionManager::StyleContextChanged to fail causing us to *not* ignore
a subsequent restyle. With certain combinations of content this can mean that
restyles are posted in such a manner than an infinite cycle of restyles is
never broken.

This patch causes RestyleManager to ignore calls to IncrementAnimationGeneration
whilst it is already processing restyles so that the animation generation is
only ever updated once per restyle. This makes the check for a matching
animation generation in nsTransitionManager::StyleContextChanged work as
expected, preventing us from generating needless transitions which can produce
this endless loop.
Assignee: nobody → bbirtles
Status: NEW → ASSIGNED
Attachment #8726571 - Attachment is obsolete: true
Attachment #8724657 - Attachment is obsolete: true
The test case, attachment 8725125 [details], does not cause any hangs on opt builds.  So it's not suitable for automation test, we should use web fonts for automation just in case.
Keywords: regression
David, review ping? This bug has now entered beta so we need to try to uplift a fix soon.
Flags: needinfo?(dbaron)
Comment on attachment 8726604 [details] [diff] [review]
Ignore redundant calls to RestyleManager::IncrementAnimationGeneration

Sorry about that; r=dbaron.  It took me a bit of time to remember how this animation generation works.
Flags: needinfo?(dbaron)
Attachment #8726604 - Flags: review?(dbaron) → review+
(In reply to Brian Birtles (:birtles) from comment #18)
> Regarding the animation check, the root of the problem is that we're
> overloading the animation generation to do two things: detect redundant
> style changes and detect when layers on the compositor need to be updated.

Where are we using it to detect redundant style changes?
(In reply to David Baron [:dbaron] ⌚️UTC-7 from comment #25)
> (In reply to Brian Birtles (:birtles) from comment #18)
> > Regarding the animation check, the root of the problem is that we're
> > overloading the animation generation to do two things: detect redundant
> > style changes and detect when layers on the compositor need to be updated.
> 
> Where are we using it to detect redundant style changes?

Perhaps my terminology is wrong, but comment 18 describes a situation where the animation check should prevent us from triggering redundant transitions.
https://hg.mozilla.org/mozilla-central/rev/4da14b8142a5
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Posted patch Add crashtest (obsolete) — Splinter Review
I've verified that this crashtest fails without attachment 8726604 [details] [diff] [review] but passes with it
Attachment #8733656 - Flags: review?(hiikezoe)
(We need to uplift this to aurora and beta but I want to make sure we have test coverage first.)
Comment on attachment 8733656 [details] [diff] [review]
Add crashtest

Review of attachment 8733656 [details] [diff] [review]:
-----------------------------------------------------------------

Does this test fail on opt builds too?
If not, we might need below fonce-face entry in style section.

@font-face {
  font-family: "icon-fonts";
  src: url(x);
}

::: layout/style/crashtests/crashtests.list
@@ +136,5 @@
>  load 1233135-1.html
>  load 1233135-2.html
>  load 1238660-1.html
>  load 1247865-1.html
> +load 1245260-1.html

Nit: This should be one line before.
Attachment #8733656 - Flags: review?(hiikezoe) → review+
After adding the @font-face rule it consistently fails in both debug and release once I revert the fix from this bug:

  https://treeherder.mozilla.org/#/jobs?repo=try&revision=8dbb7ce7c773

But passes with the fix in place:

  https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa79a66fc5c7

Will land once m-i opens and request aurora/beta uplift after letting the test run for a day or so.
Posted patch Add crashtestSplinter Review
MozReview-Commit-ID: IBe0zbsCJJy
Attachment #8733656 - Attachment is obsolete: true
Comment on attachment 8726604 [details] [diff] [review]
Ignore redundant calls to RestyleManager::IncrementAnimationGeneration

Approval Request Comment
[Feature/regressing bug #]: 1238660
[User impact if declined]: Browser hangs with certain uses of transitions (e.g. history.com)
[Describe test coverage new/current, TreeHerder]: Has baked on m-c for ~1 week. Includes crashtest (separate patch--which we should probably uplift too I guess)
[Risks and why]: Nothing obvious. It's possible it could break some existing content by causing us to not update transitions when we ought to, but that seems unlikely. 
[String/UUID change made/needed]: None
Attachment #8726604 - Flags: approval-mozilla-beta?
Attachment #8726604 - Flags: approval-mozilla-aurora?
Comment on attachment 8734167 [details] [diff] [review]
Add crashtest

Approval Request Comment
See comment 36. This is the crashtest that goes with that patch.
Attachment #8734167 - Flags: approval-mozilla-beta?
Attachment #8734167 - Flags: approval-mozilla-aurora?
Trevor, could you please verify this issue is fixed as expected on a latest Nightly build? Thanks!
Flags: needinfo?(smokey101stair)
Comment on attachment 8726604 [details] [diff] [review]
Ignore redundant calls to RestyleManager::IncrementAnimationGeneration

Fixes a browser hang, Aurora47+
Attachment #8726604 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8734167 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Confirming that this is now fixed using today's nightly 2016-03-28.
Status: RESOLVED → VERIFIED
Flags: needinfo?(smokey101stair)
Comment on attachment 8726604 [details] [diff] [review]
Ignore redundant calls to RestyleManager::IncrementAnimationGeneration

Fix for crash, verified on m-c, includes a new test. Let's try this on beta 6, I would like to avoid shipping the regression.
Attachment #8726604 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8734167 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.