Crash [@ nsRefreshDriver::RemoveRefreshObserver] with CSS transition, navigation

RESOLVED FIXED in Firefox 44

Status

()

defect
--
critical
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: jruderman, Assigned: birtles)

Tracking

(Blocks 1 bug, 4 keywords)

Trunk
mozilla46
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox44 fixed, firefox45 fixed, firefox46 fixed, b2g-v2.5 fixed)

Details

(crash signature)

Attachments

(4 attachments, 1 obsolete attachment)

@ mozilla::dom::DocumentTimeline::WillRefresh

> -> 148      MOZ_ASSERT(GetRefreshDriver(),
>    149                 "Refresh driver should still be valid inside WillRefresh");
>    150      GetRefreshDriver()->RemoveRefreshObserver(this, Flush_Style);

Assertion failure: GetRefreshDriver() (Refresh driver should still be valid inside WillRefresh), at dom/animation/DocumentTimeline.cpp:149

In non-debug builds, it reaches the next line, calls nsRefreshDriver::RemoveRefreshObserver with null |this|, and quickly crashes within that function.
Posted file stack
Tagging birtles, as he added this assertion (and the subsequent line that derefs the null refresh-driver pointer) in bug 1195180. Seems likely that this is a regression from that bug, though I suppose it's also possible that this assertion was valid at that point in time & has become invalid since.
Component: Layout → DOM: Animation
Flags: needinfo?(bbirtles)
Depends on: 1195180
Actually there was a null-check and no assertion when I wrote the patch but during review it got removed :/ (bug 1195180 comment 23).

I'll have to work out if something changed since then.
It looks like at the *start* of the call to WillRefresh() mDocument->GetShell() is returning null.

nsRefreshDriver checks for a null pres shell before calling WillRefresh[1] so I guess we're looking up a different pres context when we go via mDocument. I guess the combination of document.write() followed by history.back() and the presence of a pop-up window triggers a situation where mDocument on DocumentTimeline no longer has a pres shell.

[1] https://dxr.mozilla.org/mozilla-central/rev/cb66ffeb6725e8344818e8e2f707ae2eaeb953b4/layout/base/nsRefreshDriver.cpp#1653
My current theory is that using document.write() + history.back() means we end up getting evicted from the bfcache and hence mDocument->GetShell() returns null.
Flags: needinfo?(bbirtles)
(In reply to Jesse Ruderman from comment #0)
> Created attachment 8698691 [details]
> testcase (crashes Firefox in 2-3 seconds if popups are enabled)
> 
> @ mozilla::dom::DocumentTimeline::WillRefresh
> 
> > -> 148      MOZ_ASSERT(GetRefreshDriver(),
> >    149                 "Refresh driver should still be valid inside WillRefresh");
> >    150      GetRefreshDriver()->RemoveRefreshObserver(this, Flush_Style);
> 
> Assertion failure: GetRefreshDriver() (Refresh driver should still be valid
> inside WillRefresh), at dom/animation/DocumentTimeline.cpp:149

I could reproduce this crash on non E10S. The DocumentTimeline was nullified (caused by document.write) at [1], but someone held its reference so the DocumentTimeline was alive at that time. 
https://dxr.mozilla.org/mozilla-central/rev/e6463ae7eda2775bc84593bb4a0742940bb87379/dom/base/nsDocument.cpp#2120
(In reply to Hiroyuki Ikezoe (:hiro) from comment #7)
> (In reply to Jesse Ruderman from comment #0)
> > Created attachment 8698691 [details]
> > testcase (crashes Firefox in 2-3 seconds if popups are enabled)
> > 
> > @ mozilla::dom::DocumentTimeline::WillRefresh
> > 
> > > -> 148      MOZ_ASSERT(GetRefreshDriver(),
> > >    149                 "Refresh driver should still be valid inside WillRefresh");
> > >    150      GetRefreshDriver()->RemoveRefreshObserver(this, Flush_Style);
> > 
> > Assertion failure: GetRefreshDriver() (Refresh driver should still be valid
> > inside WillRefresh), at dom/animation/DocumentTimeline.cpp:149
> 
> I could reproduce this crash on non E10S. The DocumentTimeline was nullified
> (caused by document.write) at [1], but someone held its reference so the
> DocumentTimeline was alive at that time. 

I was wrong. Nobody held the reference. The DocumentTimeline was just alive because cycle collector did not run at the time.
So, we should call RemoveRefreshObserver for DocumentTimeline just before nullifying it.
Assignee: nobody → bbirtles
Status: NEW → ASSIGNED
Attachment #8699365 - Flags: review?(bugs)
(In reply to Brian Birtles (:birtles) from comment #6)
> My current theory is that using document.write() + history.back() means we
> end up getting evicted from the bfcache and hence mDocument->GetShell()
> returns null.
Yes, any mutation to the DOM tree evicts bfcache for that entry.
And document.write() removes all the child nodes from document before adding new content.
Comment on attachment 8699365 [details] [diff] [review]
Detach obsolete DocumentTimeline from refresh driver when the document is reset

>+  if (mDocumentTimeline) {
>+    nsRefreshDriver* rd = mPresShell->GetPresContext()->RefreshDriver();
You need to null check mPresShell and GetPresContext()
Attachment #8699365 - Flags: review?(bugs) → review+
Do we need the same stuff for animation controller?
(In reply to Olli Pettay [:smaug] from comment #12)
> Do we need the same stuff for animation controller?

I haven't been able to reproduce a similar crash using the same test with SVG animation elements. I'm not sure exactly why except that:

* We don't release the animation controller in nsDocument::Reset
* We pause the controller on page hide which disconnects it from the refresh driver
* We don't try to access the refresh driver in WillRefresh

I suspect it's that second point that is particularly pertinent here.

We've been fuzzing the SMIL code for quite a few years and haven't come across a bug in that area yet so I suspect it's ok. I'm not sure it's worth digging into too much further since long-term I expect that class to go away (we'll either drop SMIL altogether or rewrite it JS using the Web Animations API).
https://hg.mozilla.org/mozilla-central/rev/84169b40100b
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
And we need this on branches too
This is the version of the patch that landed on m-c (which includes the review feedback from comment 11).

Approval Request Comment
[Feature/regressing bug #]: bug 1195180
[User impact if declined]: Crash when using document.write() in combination with CSS animations/transitions in some situations (e.g. when pop-ups are present)
[Describe test coverage new/current, TreeHerder]: Patch includes regression test, has baked on mozilla-central for 2 weeks.
[Risks and why]: None
[String/UUID change made/needed]: None
Attachment #8699365 - Attachment is obsolete: true
Attachment #8703912 - Flags: approval-mozilla-beta?
Attachment #8703912 - Flags: approval-mozilla-aurora?
Comment on attachment 8703912 [details] [diff] [review]
Detach obsolete DocumentTimeline from refresh driver when the document is reset

Taking it in aurora.
Could you detail why you think that there is absolutely no risk? To me, every single change carries risk.
Attachment #8703912 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8703912 [details] [diff] [review]
Detach obsolete DocumentTimeline from refresh driver when the document is reset

Crash fix that includes an automated test, beta44+
Attachment #8703912 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Sylvestre Ledru [:sylvestre] from comment #18)
> Comment on attachment 8703912 [details] [diff] [review]
> Detach obsolete DocumentTimeline from refresh driver when the document is
> reset
> 
> Taking it in aurora.
> Could you detail why you think that there is absolutely no risk? To me,
> every single change carries risk.

None here is just shorthand for "none other than the usual possibility that this introduces some otherwise unforeseen adverse side effects" since I noticed others were no longer writing that. That is, I don't know of any specific risks here.
Duplicate of this bug: 1232273
Blocks: 1195180, 1232273
No longer depends on: 1195180
Keywords: regression
Hi guys, this crash signature is spiking up on 47.0b. This has jumped 136 spots up and is ranked at #15 at the moment. Could you please help find an owner who can investigate the crash reports? Thanks!
Flags: needinfo?(overholt)
Flags: needinfo?(bugs)
Flags: needinfo?(bugs)
This or similar stacks? Do you have links to some crash reports?
Flags: needinfo?(bugs) → needinfo?(rkothari)
(In reply to Olli Pettay [:smaug] (pto-ish May 4 - May 8) from comment #26)
> This or similar stacks? Do you have links to some crash reports?

They re-mapped crash-stats version info and since that change was made an hour ago, I don't see this in the top ranked crashes on 47.0b. Sorry for the confusion!
Flags: needinfo?(rkothari)
Flags: needinfo?(overholt)
Flags: needinfo?(bugs)
You need to log in before you can comment on or make changes to this bug.