Closed
Bug 1330484
Opened 8 years ago
Closed 8 years ago
crash near NULL [mozilla::a11y::NotificationController::WillRefresh]
Categories
(Core :: Disability Access APIs, defect)
Core
Disability Access APIs
Tracking
()
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox53 | --- | disabled |
firefox54 | --- | disabled |
firefox55 | + | fixed |
People
(Reporter: tsmith, Assigned: handyman)
References
(Blocks 1 open bug)
Details
(Keywords: crash, csectype-nullptr, testcase, Whiteboard: aes+)
Crash Data
Attachments
(4 files, 3 obsolete files)
225 bytes,
text/html
|
Details | |
5.14 KB,
text/plain
|
Details | |
2.58 KB,
patch
|
jimm
:
review+
|
Details | Diff | Splinter Review |
1.59 KB,
patch
|
jimm
:
review+
|
Details | Diff | Splinter Review |
No description provided.
Reporter | ||
Comment 1•8 years ago
|
||
Bah, darn trackpad submitted before I was done.
==8687==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000102 (pc 0x7fc5ba03d0fa bp 0x7ffdc31a0210 sp 0x7ffdc319fca0 T0)
#0 0x7fc5ba03d0f9 in mozilla::a11y::NotificationController::WillRefresh(mozilla::TimeStamp) /home/worker/workspace/build/src/accessible/base/NotificationController.cpp:585
#1 0x7fc5b8e6d8e6 in nsRefreshDriver::Tick(long, mozilla::TimeStamp) /home/worker/workspace/build/src/layout/base/nsRefreshDriver.cpp:1796:7
#2 0x7fc5b8e7b575 in mozilla::RefreshDriverTimer::TickRefreshDrivers(long, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&) /home/worker/workspace/build/src/layout/base/nsRefreshDriver.cpp:295:7
#3 0x7fc5b8e7b1da in mozilla::RefreshDriverTimer::Tick(long, mozilla::TimeStamp) /home/worker/workspace/build/src/layout/base/nsRefreshDriver.cpp:316:5
#4 0x7fc5b8e7d8de in applyImpl<mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver, void (mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::*)(mozilla::TimeStamp), StoreCopyPassByConstLRef<mozilla::TimeStamp> , 0> /home/worker/workspace/build/src/obj-firefox/dist/include/nsThreadUtils.h:768:12
#5 0x7fc5b8e7d8de in apply<mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver, void (mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::*)(mozilla::TimeStamp)> /home/worker/workspace/build/src/obj-firefox/dist/include/nsThreadUtils.h:774
#6 0x7fc5b8e7d8de in mozilla::detail::RunnableMethodImpl<void (mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::*)(mozilla::TimeStamp), true, false, mozilla::TimeStamp>::Run() /home/worker/workspace/build/src/obj-firefox/dist/include/nsThreadUtils.h:803
#7 0x7fc5b255142b in nsThread::ProcessNextEvent(bool, bool*) /home/worker/workspace/build/src/xpcom/threads/nsThread.cpp:1240:7
#8 0x7fc5b25d52fc in NS_ProcessNextEvent(nsIThread*, bool) /home/worker/workspace/build/src/xpcom/glue/nsThreadUtils.cpp:390:10
#9 0x7fc5b338d2df in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /home/worker/workspace/build/src/ipc/glue/MessagePump.cpp:96:21
#10 0x7fc5b32fb2a8 in RunInternal /home/worker/workspace/build/src/ipc/chromium/src/base/message_loop.cc:238:3
...
see log.txt
Reporter | ||
Comment 2•8 years ago
|
||
Comment 3•8 years ago
|
||
(In reply to Tyson Smith [:tsmith] from comment #2)
> Created attachment 8826025 [details]
> test_case.html
That doesn't seem to crash for me even if I let it run for a while. Any particular str?
Reporter | ||
Comment 4•8 years ago
|
||
Requires fuzzPriv extension to reproduce:
https://github.com/MozillaSecurity/funfuzz/tree/master/dom/extension
This seems to be timing related. e10s was disabled.
Reporter | ||
Comment 5•8 years ago
|
||
Try to make the test case more reliable.
I can still repro very quickly on ASan opt builds with e10s disabled.
https://tools.taskcluster.net/index/artifacts/#gecko.v2.mozilla-central.latest.firefox/gecko.v2.mozilla-central.latest.firefox.linux64-asan-opt
Reporter | ||
Comment 6•8 years ago
|
||
I missed marking the old one as obsolete.
Attachment #8826025 -
Attachment is obsolete: true
Attachment #8837737 -
Attachment is obsolete: true
Comment 7•8 years ago
|
||
No luck on debug asan build with e10s disabled for me as well.
Could you probably confirm where the problem exactly, since line numbers from the original report seems out of dated. Is it crashed because nsRefreshDriver keeps a dead pointer to NotificationController, or is it crashed somewhere inside NotificationController::WillRefresh?
Reporter | ||
Comment 8•8 years ago
|
||
(In reply to alexander :surkov from comment #7)
> No luck on debug asan build with e10s disabled for me as well.
Please use an ASan opt build not debug.
> Could you probably confirm where the problem exactly, since line numbers
> from the original report seems out of dated. Is it crashed because
> nsRefreshDriver keeps a dead pointer to NotificationController, or is it
> crashed somewhere inside NotificationController::WillRefresh?
I'll attached an updated log.
Reporter | ||
Comment 9•8 years ago
|
||
Repro'd using https://tools.taskcluster.net/index/artifacts/#gecko.v2.mozilla-central.latest.firefox/gecko.v2.mozilla-central.latest.firefox.linux64-asan-opt
moz_source_stamp = 7abeac2f2d668554f0093fc0bdb1488f9a77d16e
Updated•8 years ago
|
Crash Signature: [@ mozilla::a11y::NotificationController::WillRefresh]
Comment 10•8 years ago
|
||
#9 Nightly topcrash in Nightly 20170224030232.
tbsaunde, this signature was put into bug 1341731 but perhaps it shouldn't have been.l
Flags: needinfo?(tbsaunde+mozbugs)
Comment 11•8 years ago
|
||
#6 Windows topcrash in Nightly 20170303030202.
Comment 12•8 years ago
|
||
Alex, ping. Comment 9 provided a log.
Flags: needinfo?(tbsaunde+mozbugs) → needinfo?(surkov.alexander)
Comment 13•8 years ago
|
||
more debugging is needed, here's a crash stopper
Assignee: nobody → surkov.alexander
Flags: needinfo?(surkov.alexander)
Attachment #8850610 -
Flags: review?(yzenevich)
Comment 14•8 years ago
|
||
Comment on attachment 8850610 [details] [diff] [review]
stop bleeding
Review of attachment 8850610 [details] [diff] [review]:
-----------------------------------------------------------------
looks reasonable. thanks
Attachment #8850610 -
Flags: review?(yzenevich) → review+
Comment 15•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/df69ffc4f6da6e939012afb0df30dd93e49240f8
Bug 1330484 - crash in NotificationController::WillRefresh, r=yzen
Comment 16•8 years ago
|
||
leave open until we figure out what's going on there; the patch has a debug assert for easier crash catching
Keywords: leave-open
Comment 17•8 years ago
|
||
bugherder |
Comment 18•8 years ago
|
||
We're hitting this assert a lot:
https://hg.mozilla.org/mozilla-central/annotate/45692c884fdd/accessible/base/NotificationController.cpp#l861
Is that relevant to this bug or should we file a new report?
Flags: needinfo?(tbsaunde+mozbugs)
Comment 19•8 years ago
|
||
I'm not sure I've seen a good stack for this, so I'm not sure. Can't hurt to fix it though.
Flags: needinfo?(tbsaunde+mozbugs)
Comment 20•8 years ago
|
||
- High crash rate in Nightly builds (both 54 and 55) (diagnostic assert)
- Startup crash
- Linux and Windows
https://hg.mozilla.org/mozilla-central/annotate/201231223cd4/accessible/base/NotificationController.cpp#l858
https://crash-stats.mozilla.com/signature/?product=Firefox&version=55.0a1&version=54.0a1&signature=mozilla%3A%3Aa11y%3A%3ANotificationController%3A%3AWillRefresh&date=%3E%3D2017-02-01T15%3A07%3A00.000Z&date=%3C2017-04-11T15%3A07%3A32.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#graph
We should fix this, but it's not going to affect release builds. Question: without the diagnostic assert, will the browser blow up someplace else in release due to this?
https://hg.mozilla.org/mozilla-central/annotate/201231223cd4/accessible/base/NotificationController.cpp#l858
Whiteboard: aes+
Updated•8 years ago
|
Assignee: surkov.alexander → tbsaunde+mozbugs
Flags: needinfo?(tbsaunde+mozbugs)
Comment 21•8 years ago
|
||
For recent builds this is #3 on nightly and should be given commensurate attention.
Comment 22•8 years ago
|
||
Still #3 Windows topcrash on Nightly 20170421030241.
tbsaunde, any updates?
Comment 23•8 years ago
|
||
[Tracking Requested - why for this release]:
#3 topcrash on Nightly.
tracking-firefox55:
--- → ?
Comment 24•8 years ago
|
||
I tried reproducing some on monday and tuesday, but didn't have any luck. I've been pto since then.
Flags: needinfo?(tbsaunde+mozbugs)
Comment 26•8 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #20)
> - High crash rate in Nightly builds (both 54 and 55) (diagnostic assert)
> - Startup crash
> - Linux and Windows
> https://hg.mozilla.org/mozilla-central/annotate/201231223cd4/accessible/base/
> NotificationController.cpp#l858
it feels different from the original bug, it might be worth to have another bug open for that
(In reply to Trevor Saunders (:tbsaunde) from comment #24)
> I tried reproducing some on monday and tuesday, but didn't have any luck.
(In reply to Marcia Knous [:marcia - use ni] from comment #25)
> Tracking 55+ as this is a top crash.
would it be reasonable to add a null check for parentIPCDoc then and replace diagnostic_assert or ordinal assert?
Comment 27•8 years ago
|
||
#5 Windows topcrash in Nightly 20170428030259.
Comment 28•8 years ago
|
||
Reminder for Trevor to see if we can catch this in try.
Flags: needinfo?(tbsaunde+mozbugs)
Comment 29•8 years ago
|
||
Trevor I think this was hit here: https://treeherder.mozilla.org/logviewer.html#?job_id=98491161&repo=try&lineNumber=16399
In reftest/tests/layout/reftests/forms/legend/shadow-dom.html
Comment 30•8 years ago
|
||
I didn't have much luck triggering this in windows mochitest runs, ended up triggering a number of other asserts though:
test run:
https://treeherder.mozilla.org/logviewer.html#?job_id=98394381&repo=try&lineNumber=8494
errors:
* [Child 5656] ###!!! ASSERTION: We must reach document accessible implementing text interface!: 'Not Reached'
* [Parent 5464] ###!!! ASSERTION: no outer doc for accessible remote tab!: 'outerDoc'
* PROCESS-CRASH | dom/html/test/test_fullscreen-api.html | application crashed [@ mozilla::a11y::DocAccessibleParent::RemoveChildDoc(mozilla::a11y::DocAccessibleParent *)]
test run: https://treeherder.mozilla.org/logviewer.html#?job_id=98394416&repo=try&lineNumber=35666
errors:
* Child 3452] ###!!! ASSERTION: We must reach document accessible implementing text interface!: 'Not Reached'
Comment hidden (obsolete) |
Comment 32•8 years ago
|
||
This seems like it should be fixable. It's a startup crash for one thing, which should provide some clues.
typical crash:
https://crash-stats.mozilla.com/report/index/7b318eb1-2e19-40ae-8b00-e03d10170517
code:
https://hg.mozilla.org/mozilla-central/annotate/b8e9b674033b/accessible/base/NotificationController.cpp#l857
I looked at DocAccessible's IPCDoc() to see how it could be null -
http://searchfox.org/mozilla-central/search?q=SetIPCDoc&path=
We set this in three places, each of which is during some form of tear down -
// found in mozilla::a11y::DocAccessibleChildBase::~DocAccessibleChildBase
// found in mozilla::a11y::DocAccessibleChildBase::ActorDestroy
// found in mozilla::a11y::DocAccessibleChildBase::DetachDocument
Seems to me that when we run across this in NotificationController, this object is in the process of getting torn down. When we do have this pointer, looks like we are setting up a new child doc so that we can call SendBindChildDoc on it. Can we bail early here like we do for the check of Defunct just above this code block?
https://hg.mozilla.org/mozilla-central/annotate/b8e9b674033b/accessible/base/NotificationController.cpp#l851
Updated•8 years ago
|
Flags: needinfo?(tbsaunde+mozbugs)
Updated•8 years ago
|
Assignee: tbsaunde+mozbugs → davidp99
Comment 33•8 years ago
|
||
This is the #7 Windows topcrash in Nightly 20170519030205.
Assignee | ||
Comment 34•8 years ago
|
||
Still no STR or anything like that. We were going to add crash-stats to learn more about the conditions of the bug but I ended up resolving most of my questions manually. For example, I am convinced that the missing IPC actor at the root of this bug must have existed at _some_ point -- to be in this part of a DocAccessible's NotificationController's timer handler, the DocAccessible would need to have had a parent [1][2] (enforced by state machine), and would have generated its actor in the same call where it got that parent [3].
Instead, I think this patch works to avoid an edge condition. I'm not 100% that this is happening but I do believe its safe and is a good candidate.
A loose example of what could be the cause:
1. A DocAccessible DA1 binds as parent of another DA2. In the process, DA1 and DA2 obtain DocAccessibleChild IPC objects DAC1 and DAC2.
2. DA2 queues a request to bind as parent of a third document, DA3.
3. DAC2 (somehow) gets an ActorDestroy message, causing it to remove itself. Now DAC2 has no actor.
4. DA2's NotificationController processes the queued bind request from step 2, hitting the assert.
---
I shutdown the child document that failed to bind because it is now improperly represented remotely. This is what we currently do when the binding fails for other reasons.
[1] https://dxr.mozilla.org/mozilla-central/rev/5b74bbf20e803e299790d266fc6ebf5d53b7a1b7/accessible/base/NotificationController.cpp#745
[2] https://dxr.mozilla.org/mozilla-central/rev/5b74bbf20e803e299790d266fc6ebf5d53b7a1b7/accessible/base/NotificationController.cpp#615
[3] https://dxr.mozilla.org/mozilla-central/rev/5b74bbf20e803e299790d266fc6ebf5d53b7a1b7/accessible/base/NotificationController.cpp#867
Attachment #8870093 -
Flags: review?(aklotz)
Assignee | ||
Comment 35•8 years ago
|
||
Apropos of nothing except that I ran into this a few times while working on this bug. The patch removes an assert that is invalid. Here's an example where it trips:
RecvHideEvent is received on a DocAccessibleParent DAP1 that has a child document DAP2... that in turn has a child document DAP3. DAP1::RecvHideEvent calls Shutdown on its root ProxyAccessible. This recursively calls Shutdown on child ProxyAccessibles until it reaches the one that is the parent of the child document DAP2. ProxyAccessibleBase::Shutdown then attempts to Unbind DAP2, which calls DAP1::RemoveChildDoc, which trips the assert because DAP2 has child document DAP3.
---
I traced the assert back to the earliest a11y e10s work (bug 982842). The assert really says that RemoveChildDoc can only be used as part of DocAccessibleParent Shutdown, which it initially was but is not now -- here it is being used by DocAccessibleParent::Unbind as part of a normal accessible element's Shutdown. I dont think the assert condition is required _but_ the code involved [1] suggests that the issue may have been introduced from a hack. It also suggests that no one may really know the right answer. Quoting ProxyAccessibleBase::Shutdown:
> // XXX Ideally this wouldn't be necessary, but it seems OuterDoc accessibles
> // can be destroyed before the doc they own.
We could change the code to e.g. recursively unbind child documents in post-order instead of just unbinding the immediate child so that the condition is met but I don't know that unbinding those documents is the right result.
[1] https://dxr.mozilla.org/mozilla-central/rev/9851fcb0bf4d855c36729d7de19f0fa5c9f69776/accessible/ipc/ProxyAccessibleBase.cpp#34
Attachment #8870102 -
Flags: review?(aklotz)
Assignee | ||
Comment 36•8 years ago
|
||
Comment 37•8 years ago
|
||
Comment on attachment 8850610 [details] [diff] [review]
stop bleeding
>diff --git a/accessible/base/NotificationController.cpp b/accessible/base/NotificationController.cpp
>--- a/accessible/base/NotificationController.cpp
>+++ b/accessible/base/NotificationController.cpp
>@@ -750,18 +750,21 @@ NotificationController::WillRefresh(mozi
>
> outerDocAcc->RemoveChild(childDoc);
> }
>
> // Failed to bind the child document, destroy it.
> childDoc->Shutdown();
> }
> }
>-
> mHangingChildDocuments.Clear();
>+ MOZ_ASSERT(mDocument, "Illicit document shutdown");
>+ if (!mDocument) {
>+ return;
>+ }
>
> // If the document is ready and all its subdocuments are completely loaded
> // then process the document load.
> if (mDocument->HasLoadState(DocAccessible::eReady) &&
> !mDocument->HasLoadState(DocAccessible::eCompletelyLoaded) &&
> hangingDocCnt == 0) {
> uint32_t childDocCnt = mDocument->ChildDocumentCount(), childDocIdx = 0;
> for (; childDocIdx < childDocCnt; childDocIdx++) {
Attachment #8850610 -
Attachment is obsolete: true
Updated•8 years ago
|
Attachment #8870093 -
Flags: review?(aklotz) → review+
Comment 38•8 years ago
|
||
Comment on attachment 8870102 [details] [diff] [review]
2. Part 2: Remove invalid assert that checked that removed a11y child documents have no children
r+ing these for aklotz, Aaron if you have any additional feedback dparks can address in follow ups.
Attachment #8870102 -
Flags: review?(aklotz) → review+
Assignee | ||
Updated•8 years ago
|
Keywords: checkin-needed
Comment 39•8 years ago
|
||
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1257d5a66ed0
Part 1: Detect and Shutdown binding accessibles that have lost their actors. r=jimm
https://hg.mozilla.org/integration/mozilla-inbound/rev/d2c50b0bab73
Part 2: Remove invalid assert that checked that removed a11y child documents have no children. r=jimm
Keywords: checkin-needed
Comment 40•8 years ago
|
||
bugherder |
Comment 41•8 years ago
|
||
Reporter | ||
Comment 42•8 years ago
|
||
With mozilla-central 20170524-37d777d87200 and the attached test case I can trigger the following assertion, Should this be a separate bug?
Assertion failure: mDocument (Illicit document shutdown), at src/accessible/base/NotificationController.cpp:768
==58708==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f8eb94de661 bp 0x7ffffdb4ceb0 sp 0x7ffffdb4c8e0 T0)
==58708==The signal is caused by a WRITE memory access.
==58708==Hint: address points to the zero page.
#0 0x7f8eb94de660 in mozilla::a11y::NotificationController::WillRefresh(mozilla::TimeStamp) src/accessible/base/NotificationController.cpp:705:75
#1 0x7f8eb736b35f in nsRefreshDriver::Tick(long, mozilla::TimeStamp) src/layout/base/nsRefreshDriver.cpp:1790:12
#2 0x7f8eb737414e in mozilla::RefreshDriverTimer::TickRefreshDrivers(long, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&) src/layout/base/nsRefreshDriver.cpp:300:7
#3 0x7f8eb7373f1d in mozilla::RefreshDriverTimer::Tick(long, mozilla::TimeStamp) src/layout/base/nsRefreshDriver.cpp:321:5
#4 0x7f8eb7377d65 in mozilla::VsyncRefreshDriverTimer::RunRefreshDrivers(mozilla::TimeStamp) src/layout/base/nsRefreshDriver.cpp:753:5
#5 0x7f8eb7376886 in mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::TimeStamp) src/layout/base/nsRefreshDriver.cpp:666:35
#6 0x7f8eb7372537 in mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::ParentProcessVsyncNotifier::Run() src/layout/base/nsRefreshDriver.cpp:512:20
#7 0x7f8eb1db4ff1 in nsThread::ProcessNextEvent(bool, bool*) src/xpcom/threads/nsThread.cpp:1302:14
#8 0x7f8eb1dbf1a0 in NS_ProcessNextEvent(nsIThread*, bool) src/xpcom/threads/nsThreadUtils.cpp:393:10
#9 0x7f8eb28e9235 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:96:21
#10 0x7f8eb28361d7 in MessageLoop::RunInternal() src/ipc/chromium/src/base/message_loop.cc:238:10
#11 0x7f8eb2836069 in MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:211:3
#12 0x7f8eb6e9a07a in nsBaseAppShell::Run() src/widget/nsBaseAppShell.cpp:156:27
#13 0x7f8eb9a7a471 in nsAppStartup::Run() src/toolkit/components/startup/nsAppStartup.cpp:283:30
#14 0x7f8eb9bd3a52 in XREMain::XRE_mainRun() src/toolkit/xre/nsAppRunner.cpp:4568:22
#15 0x7f8eb9bd568b in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) src/toolkit/xre/nsAppRunner.cpp:4748:8
#16 0x7f8eb9bd6578 in XRE_main(int, char**, mozilla::BootstrapConfig const&) src/toolkit/xre/nsAppRunner.cpp:4843:21
#17 0x4eca88 in do_main(int, char**, char**) src/browser/app/nsBrowserApp.cpp:236:22
#18 0x4ec3a0 in main src/browser/app/nsBrowserApp.cpp:309:16
#19 0x7f8ece87482f in __libc_start_main /build/glibc-9tT8Do/glibc-2.23/csu/../csu/libc-start.c:291
#20 0x41e0d4 in _start (m-c-1495643560-asan-debug/firefox+0x41e0d4)
Reporter | ||
Updated•8 years ago
|
Flags: in-testsuite?
Assignee | ||
Comment 43•8 years ago
|
||
This looks related to me. We can leave it in this bug for now. I want to see what happens with these patches for another day or so before deciding what to do about this.
The assert would seem to suggest that the DocAccessible tree is developing a loop...!
Comment 44•8 years ago
|
||
fixed in the 5-23 build.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Comment 45•8 years ago
|
||
\o/
Comment 46•8 years ago
|
||
Is it possible to land the attached test in a useful fashion?
status-firefox53:
--- → disabled
status-firefox54:
--- → disabled
status-firefox-esr52:
--- → unaffected
Flags: needinfo?(davidp99)
Keywords: leave-open
Target Milestone: --- → mozilla55
Assignee | ||
Comment 47•8 years ago
|
||
Would've been great but unfortunately no. I actually haven't seen the test case cause this crash or the issue Tyson brought up in comment 42. Its very timing dependent, which is why the patches were kind of a stab in the dark to begin with.
Flags: needinfo?(davidp99)
You need to log in
before you can comment on or make changes to this bug.
Description
•