promiseDocumentFlushed's refresh observer can stick around and cause crashes in nsGlobalWindowInner::DidRefresh
Categories
(Core :: Layout, defect)
Tracking
()
People
(Reporter: Gijs, Assigned: emilio)
References
Details
Crash Data
Attachments
(2 files)
I ran into this in bug 1693008.
The code under test I'm adding is using promiseDocumentFlushed
to react to layout changes when the title changes, so that it can add/remove an "overflown" attribute which adds a mask to the title of a dialog.
The test exercises this code, and uses a mutation observer to await the new attribute and text content, and then immediately closes the prompt.
This reliably crashed my opt builds with this stack:
0 xul.dll!nsGlobalWindowInner::DidRefresh() [nsGlobalWindowInner.cpp:092ee6b0c9f2b093ae808df6ee8c9d49ef0d3b5d : 7060 + 0x0]
rax = 0x0000000000000000 rdx = 0x0000024fdece2ea0
rcx = 0x000000000000068f rbx = 0x0000024fe060d568
rsi = 0x0000024fe060d580 rdi = 0x0000024fe060d400
rbp = 0x00007ff8c456a0a0 rsp = 0x000000ea2dbfcc00
r8 = 0x7efefefefefefe00 r9 = 0x000000007ffee001
r10 = 0x00000fff10b0ceee r11 = 0x0000400000000000
r12 = 0x0000024fd9261000 r13 = 0x00007ff888f69198
r14 = 0x000000000242e350 r15 = 0x00007ff888f69101
rip = 0x00007ff8858677ba
Found by: given as instruction pointer in context
1 xul.dll!nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) [nsRefreshDriver.cpp:092ee6b0c9f2b093ae808df6ee8c9d49ef0d3b5d : 2388 + 0xc]
rbx = 0x0000024fe060d568 rbp = 0x00007ff8c456a0a0
rsp = 0x000000ea2dbfcc50 r12 = 0x0000024fd9261000
r13 = 0x00007ff888f69198 r14 = 0x000000000242e350
r15 = 0x00007ff888f69101 rip = 0x00007ff886d5fedb
Found by: call frame info
2 xul.dll!mozilla::RefreshDriverTimer::TickRefreshDrivers(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&) [nsRefreshDriver.cpp:092ee6b0c9f2b093ae808df6ee8c9d49ef0d3b5d : 324 + 0x1f]
rbx = 0x0000024fe060d568 rbp = 0x00007ff8c456a0a0
rsp = 0x000000ea2dbfe170 r12 = 0x0000024fd9261000
r13 = 0x00007ff888f69198 r14 = 0x000000000242e350
r15 = 0x00007ff888f69101 rip = 0x00007ff886d64980
Found by: call frame info
3 xul.dll!mozilla::RefreshDriverTimer::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) [nsRefreshDriver.cpp:092ee6b0c9f2b093ae808df6ee8c9d49ef0d3b5d : 339 + 0x1f]
rbx = 0x0000024fe060d568 rbp = 0x00007ff8c456a0a0
rsp = 0x000000ea2dbfe200 r12 = 0x0000024fd9261000
r13 = 0x00007ff888f69198 r14 = 0x000000000242e350
r15 = 0x00007ff888f69101 rip = 0x00007ff886d647e1
Found by: call frame info
4 xul.dll!mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) [nsRefreshDriver.cpp:092ee6b0c9f2b093ae808df6ee8c9d49ef0d3b5d : 699 + 0x27]
rbx = 0x0000024fe060d568 rbp = 0x00007ff8c456a0a0
rsp = 0x000000ea2dbfe2a0 r12 = 0x0000024fd9261000
r13 = 0x00007ff888f69198 r14 = 0x000000000242e350
r15 = 0x00007ff888f69101 rip = 0x00007ff886d64357
Found by: call frame info
5 xul.dll!mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyParentProcessVsync() [nsRefreshDriver.cpp:092ee6b0c9f2b093ae808df6ee8c9d49ef0d3b5d : 612 + 0x23]
rbx = 0x0000024fe060d568 rbp = 0x00007ff8c456a0a0
rsp = 0x000000ea2dbfe3a0 r12 = 0x0000024fd9261000
r13 = 0x00007ff888f69198 r14 = 0x000000000242e350
r15 = 0x00007ff888f69101 rip = 0x00007ff886d63dca
Found by: call frame info
6 xul.dll!mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::ParentProcessVsyncNotifier::Run() [nsRefreshDriver.cpp:092ee6b0c9f2b093ae808df6ee8c9d49ef0d3b5d : 502 + 0x5]
rbx = 0x0000024fe060d568 rbp = 0x00007ff8c456a0a0
rsp = 0x000000ea2dbfe460 r12 = 0x0000024fd9261000
r13 = 0x00007ff888f69198 r14 = 0x000000000242e350
r15 = 0x00007ff888f69101 rip = 0x00007ff886d64aff
Found by: call frame info
When debugging, it becomes clear that presShell
is null (so a debug build would hit the relevant debug assert) - the dialog (which is really just a same-process <browser>
inside the browser.xhtml window) closing destroys its pres shell, and the document flushed code uses PresShell::AddPostRefreshObserver
. Those don't get removed properly when the document goes away and things go ka-boom when they try to fire after the presshell is gone. Emilio suggested the PromiseDocumentFlushed code should add itself as a one-shot observer instead.
For the moment I can work around in the test by using await new Promise(requestAnimationFrame)
before closing the dialog, but we should fix the implementation here.
Comment 1•3 years ago
|
||
So, just to confirm -- should these work as STR here? Or am I missing anything?
(1) Change browser/base/content/test/tabPrompts/browser_contentOrigins.js
to look the way it did in the first uploaded version of your bug 1693008 patch -- i.e. this version: https://phabricator.services.mozilla.com/D109152?id=415810
(2) Run ./mach test browser/base/content/test/tabPrompts/browser_contentOrigins.js
I tried this on Mac in a debug build and didn't hit any crashes or assertions, FWIW (but I do see you were using an opt build, and in phabricator you mentioned Windows, so either of those could be a relevant difference between the env I've tested so far & your affected environment).
Comment 2•3 years ago
|
||
Yeah, those STR (with a debug+opt build on linux) were basically sufficient to trigger the bug.
I didn't hit the crash the first time, but I did when I ran it with --run-until-failure --repeat=1000
(and it only took ~2 minutes to crash).
I captured it in rr; I'll poke around a little and post what I find.
Comment 3•3 years ago
|
||
So it looks like this is what happens, roughly:
- During a refresh driver tick,
ProcessRestyledFrames
callsRecreateFramesForContent
- This causes a
nsSubDocumentFrame
(for an iframe or similar) to be destroyed. - its DestroyFrom method queues up an
nsHideViewer
runnable to be invoked as a script runner.
Then:
- Near the end of that refresh driver tick, when the script blocker in
DoFlushPendingNotifications()
goes out of scope, ournsHideViewer
runnable gets run, which (via a few intermediate layers) callsnsDocumentViewer::DestroyPresShell
, which triggers Document::DeletePresShell, which nulls out the Document's mPresShell pointer. Here's that backtrace:
#0 mozilla::dom::Document::DeletePresShell() (this=0x7fb51d61e000) at $SRC/dom/base/Document.cpp:6776
#1 0x00007fb54adc65f0 in mozilla::PresShell::Destroy() (this=0x7fb50cc97000) at $SRC/layout/base/PresShell.cpp:1423
#2 0x00007fb54ae8118e in nsDocumentViewer::DestroyPresShell() (this=0x7fb52e38a580) at $SRC/layout/base/nsDocumentViewer.cpp:3647
#3 0x00007fb54ae7d699 in nsDocumentViewer::Hide() (this=0x7fb52e38a580) at $SRC/layout/base/nsDocumentViewer.cpp:2198
#4 0x00007fb54d2ad79f in nsDocShell::SetVisibility(bool) (this=0x7fb511a32c00, aVisibility=false) at $SRC/docshell/base/nsDocShell.cpp:5077
#5 0x00007fb546de3365 in nsFrameLoader::Hide() (this=0x7fb50e59aa00) at $SRC/dom/base/nsFrameLoader.cpp:1147
#6 0x00007fb54b153405 in nsHideViewer::Run() (this=0x7fb50cd0fe20) at $SRC/layout/generic/nsSubDocumentFrame.cpp:882
#7 0x00007fb5469556e9 in nsContentUtils::RemoveScriptBlocker() () at $SRC/dom/base/nsContentUtils.cpp:5569
#8 0x00007fb545e882d1 in nsAutoScriptBlocker::~nsAutoScriptBlocker() (this=0x7ffe34bfecf8) at $SRC/dom/base/nsContentUtils.h:3476
#9 0x00007fb54add6f9f in mozilla::PresShell::DoFlushPendingNotifications(mozilla::ChangesToFlush) (this=0x7fb522705000, aFlush=...) at $SRC/layout/base/PresShell.cpp:4203
#10 0x00007fb5469b1753 in mozilla::PresShell::FlushPendingNotifications(mozilla::ChangesToFlush) (this=0x7fb522705000, aType=...) at $OBJ/dist/include/mozilla/PresShell.h:1408
#11 0x00007fb54ad85d5b in nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) (this=0x7fb522751000, aId=..., aNowTime=...) at $SRC/layout/base/nsRefreshDriver.cpp:2190
#12 0x00007fb54ad93a92 in mozilla::RefreshDriverTimer::TickDriver(nsRefreshDriver*, mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) (driver=0x7fb522751000, aId=..., now=...) at $SRC/layout/base/nsRefreshDriver.cpp:347
#13 0x00007fb54ad938b4 in mozilla::RefreshDriverTimer::TickRefreshDrivers(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&) (this=0x7fb52f3bbd60, aId=..., aNow=..., aDrivers=nsTArray<RefPtr<nsRefreshDriver> > & = {...}) at $SRC/layout/base/nsRefreshDriver.cpp:325
#14 0x00007fb54ad936fe in mozilla::RefreshDriverTimer::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) (this=0x7fb52f3bbd60, aId=..., now=...) at $SRC/layout/base/nsRefreshDriver.cpp:341
Then, later on in the same call to nsRefreshDriver::Tick
(stack level 11), we hit the offending call to nsGlobalWindowInner::DidRefresh
and assert/crash, because the nsGlobalWindowInner's mDoc is pointing to the inner document that has just been hidden and has had its mRefreshDriver pointer nulled out.
More investigation to be done, but that's what I've got so far
Reporter | ||
Comment 4•3 years ago
|
||
I'm seeing these crashes turn up some more on various jobs on https://treeherder.mozilla.org/jobs?repo=try&revision=cabf5e40b9404ba3c80570c6195ef0f11f9cba3b&selectedTaskRun=VAuc4_ZCS2WQLYdzV5Msiw.0 , e.g.:
https://treeherder.mozilla.org/jobs?repo=try&revision=cabf5e40b9404ba3c80570c6195ef0f11f9cba3b&selectedTaskRun=XRt5nJ34Tmqbp8jQTW9Qug.0
https://treeherder.mozilla.org/jobs?repo=try&revision=cabf5e40b9404ba3c80570c6195ef0f11f9cba3b&selectedTaskRun=bfJAcD9SRmSDSus6TZ9uJg.0
https://treeherder.mozilla.org/jobs?repo=try&revision=cabf5e40b9404ba3c80570c6195ef0f11f9cba3b&selectedTaskRun=FdJqX4pkQu28_unI5fF7Kw.0
https://treeherder.mozilla.org/jobs?repo=try&revision=cabf5e40b9404ba3c80570c6195ef0f11f9cba3b&selectedTaskRun=EmyJ1KxrQZeDIod5AqZANA.0
This includes Linux debug (last one). Fixing this (or finding a workaround) is likely going to block bug 1704616, which we'd like to land this week prior to 89 branching. To reproduce locally, it should be sufficient to run the test locally with the browser.proton.modals.enabled
pref flipped to true (which you can also do with the mochitest framework).
Dan, did you get any closer to a solution here? At the time, Emilio suggested the PromiseDocumentFlushed code should add itself as a one-shot observer instead - would that be a possible solution?
Comment 5•3 years ago
•
|
||
I didn't get any closer beyond what I've posted above - I was mostly trying to dump some info about what's going on when this crash happens, since I did happen to catch it in rr (and I was intending to circle back to do some more poking around, but haven't been able to yet).
I'm not familiar with PromiseDocumentFlushed or one-shot observers, so I can't comment on emilio's suggestion at this point. emilio, do you have any cycles this week to investigate the approach that you suggested?
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 7•3 years ago
|
||
Allow using the MOZ_KnownLive function to get around it.
Use case is the following: I have an std::function member variable, and I want
that member to be able to capture this
.
Using a strong reference creates a cycle and thus would leak. I know this
to
always outlive the member, so it is fine to use a weak capture there.
Assignee | ||
Comment 8•3 years ago
|
||
By rejecting the relevant promises, instead of crashing (and if we didn't crash
we'd leave the window registered as a refresh driver observer, which would be
bad).
For this, we modify the OneShotPostRefreshObserver API to deal with cancelation
due to shell destruction.
We fix APZ's usage of this API, which was doing something extremely weird
(returning a refcounted object in a UniquePtr).
Depends on D111850
Updated•3 years ago
|
Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ee4b9934e074 Add an escape hatch for the refcounted inside lambda checker. r=andi
Comment 10•3 years ago
|
||
Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/472077cb1e38 Make promiseDocumentFlushed handle presshell destruction correctly. r=smaug,botond
Comment 11•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/ee4b9934e074
https://hg.mozilla.org/mozilla-central/rev/472077cb1e38
Updated•3 years ago
|
Description
•