Closed
Bug 753665
Opened 9 years ago
Closed 9 years ago
shutdown crash in nsWindow::FindTopLevel
Categories
(Core :: Widget: Android, defect)
Tracking
()
VERIFIED
FIXED
mozilla16
People
(Reporter: scoobidiver, Assigned: kats)
Details
(Keywords: crash, reproducible, topcrash, Whiteboard: [native-crash])
Crash Data
Attachments
(5 files, 1 obsolete file)
1.47 KB,
patch
|
blassey
:
review+
|
Details | Diff | Splinter Review |
5.45 KB,
patch
|
kats
:
review-
|
Details | Diff | Splinter Review |
1.50 KB,
patch
|
blassey
:
review+
|
Details | Diff | Splinter Review |
1.06 KB,
patch
|
blassey
:
review+
|
Details | Diff | Splinter Review |
904 bytes,
patch
|
blassey
:
review+
akeybl
:
approval-mozilla-aurora+
johnath
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
It happens from time to time. Signature nsWindow::FindTopLevel More Reports Search UUID 8d979a38-e2ff-4325-bd07-23bbf2120510 Date Processed 2012-05-10 05:50:49 Uptime 143 Install Age 3.0 minutes since version was first installed. Install Time 2012-05-10 05:47:36 Product FennecAndroid Version 15.0a1 Build ID 20120509030514 Release Channel nightly OS Linux OS Version 0.0.0 Linux 2.6.35.10-g3f43272 #1 PREEMPT Tue Jun 14 21:11:23 CST 2011 armv7l Build Architecture arm Build Architecture Info Crash Reason SIGSEGV Crash Address 0x544c App Notes AdapterVendorID: bravo, AdapterDeviceID: HTC Desire. AdapterDescription: 'Model: 'HTC Desire', Product: 'htc_bravo', Manufacturer: 'HTC', Hardware: 'bravo''. HTC HTC Desire htc_wwe/htc_bravo/bravo:2.3.3/GRI40/96875.1:user/release-keys EMCheckCompatibility True Frame Module Signature Source 0 libxul.so nsWindow::FindTopLevel widget/android/nsWindow.cpp:217 1 libxul.so nsWindow::~nsWindow widget/android/nsWindow.cpp:203 2 libxul.so nsWindow::~nsWindow widget/android/nsWindow.cpp:210 3 libxul.so nsBaseWidget::Release widget/xpwidgets/nsBaseWidget.cpp:82 4 libxul.so nsWindow::Release widget/android/nsWindow.cpp:89 5 libxul.so nsCOMPtr_base::~nsCOMPtr_base obj-firefox/xpcom/build/nsCOMPtr.cpp:81 6 libxul.so nsTouchEvent::~nsTouchEvent nsCOMPtr.h:480 7 libxul.so nsDOMTouchEvent::~nsDOMTouchEvent content/events/src/nsDOMTouchEvent.cpp:243 8 libxul.so nsDOMTouchEvent::~nsDOMTouchEvent content/events/src/nsDOMTouchEvent.cpp:246 9 libxul.so nsDOMEvent::Release content/events/src/nsDOMEvent.cpp:217 10 libxul.so nsDOMUIEvent::Release content/events/src/nsDOMUIEvent.cpp:123 11 libxul.so nsDOMTouchEvent::Release content/events/src/nsDOMTouchEvent.cpp:270 12 libxul.so XPCJSRuntime::GCCallback js/xpconnect/src/XPCJSRuntime.cpp:623 13 libxul.so js::GC js/src/jsgc.cpp:3697 14 libxul.so JS_GC js/src/jsapi.cpp:2865 15 libxul.so nsXREDirProvider::DoShutdown toolkit/xre/nsXREDirProvider.cpp:850 16 libxul.so ScopedXPCOMStartup::~ScopedXPCOMStartup toolkit/xre/nsAppRunner.cpp:1125 17 libxul.so XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:3879 18 libxul.so XRE_main toolkit/xre/nsAppRunner.cpp:3933 ... More reports at: https://crash-stats.mozilla.com/report/list?signature=nsWindow%3A%3AFindTopLevel
Assignee | ||
Comment 1•9 years ago
|
||
Oh, excellent. sewardj found this using valgrind yesterday too, he said he got the following from valgrind. ==1798== Invalid read of size 4 ==1798== at 0x36618BD6: nsWindow::~nsWindow() (nsWindow.cpp:204) ==1798== by 0x36618C83: nsWindow::~nsWindow() (nsWindow.cpp:210) ==1798== by 0x3661CDAF: nsBaseWidget::Release() (nsBaseWidget.cpp:82) ==1798== by 0x36616F59: nsWindow::Release() (nsWindow.cpp:89) ==1798== by 0x3673DB4D: nsCOMPtr_base::~nsCOMPtr_base() (nsCOMPtr.cpp:81) ==1798== by 0x3615196D: nsDOMTouchEvent::~nsDOMTouchEvent() (nsCOMPtr.h:480) ==1798== by 0x361519FF: nsDOMTouchEvent::~nsDOMTouchEvent() (nsDOMTouchEvent.cpp:246) ==1798== by 0x36137D5D: nsDOMEvent::Release() (nsDOMEvent.cpp:217) ==1798== by 0x3613A501: nsDOMUIEvent::Release() (nsDOMUIEvent.cpp:123) ==1798== by 0x361513E5: nsDOMTouchEvent::Release() (nsDOMTouchEvent.cpp:270) ==1798== by 0x364547EF: XPCJSRuntime::GCCallback(JSRuntime*, JSGCStatus) (XPCJSRuntime.cpp:623) ==1798== by 0x36A28BC7: Collect(JSRuntime*, bool, long long, js::JSGCInvocationKind, js::gcreason::Reason) (jsgc.cpp:3697) ==1798== Address 0x21249a34 is 140 bytes inside a block of size 384 free'd ==1798== at 0x4804F44: free (vg_replace_malloc.c:427) ==1798== by 0x2275F699: moz_free (mozalloc.cpp:81) ==1798== by 0x36618C89: nsWindow::~nsWindow() (mozalloc.h:253) ==1798== by 0x3661CDAF: nsBaseWidget::Release() (nsBaseWidget.cpp:82) ==1798== by 0x36616F59: nsWindow::Release() (nsWindow.cpp:89) ==1798== by 0x3673DB4D: nsCOMPtr_base::~nsCOMPtr_base() (nsCOMPtr.cpp:81) ==1798== by 0x35F13EFD: nsDeviceContext::~nsDeviceContext() (nsCOMPtr.h:480) ==1798== by 0x35F58565: nsPresContext::~nsPresContext() (nsDeviceContext.h:60) ==1798== by 0x35F586B7: nsPresContext::~nsPresContext() (nsPresContext.cpp:335) ==1798== by 0x35F554B9: nsPresContext::Release() (nsPresContext.cpp:345) ==1798== by 0x36137E5D: nsDOMEvent::~nsDOMEvent() (nsAutoPtr.h:908) ==1798== by 0x36140E01: nsDOMPageTransitionEvent::~nsDOMPageTransitionEvent() (nsDOMPageTransitionEvent.h:47) ==1798== Invalid write of size 4 ==1798== at 0x36618BE0: nsWindow::~nsWindow() (nsWindow.cpp:205) ==1798== by 0x36618C83: nsWindow::~nsWindow() (nsWindow.cpp:210) ==1798== by 0x3661CDAF: nsBaseWidget::Release() (nsBaseWidget.cpp:82) ==1798== by 0x36616F59: nsWindow::Release() (nsWindow.cpp:89) ==1798== by 0x3673DB4D: nsCOMPtr_base::~nsCOMPtr_base() (nsCOMPtr.cpp:81) ==1798== by 0x3615196D: nsDOMTouchEvent::~nsDOMTouchEvent() (nsCOMPtr.h:480) ==1798== by 0x361519FF: nsDOMTouchEvent::~nsDOMTouchEvent() (nsDOMTouchEvent.cpp:246) ==1798== by 0x36137D5D: nsDOMEvent::Release() (nsDOMEvent.cpp:217) ==1798== by 0x3613A501: nsDOMUIEvent::Release() (nsDOMUIEvent.cpp:123) ==1798== by 0x361513E5: nsDOMTouchEvent::Release() (nsDOMTouchEvent.cpp:270) ==1798== by 0x364547EF: XPCJSRuntime::GCCallback(JSRuntime*, JSGCStatus) (XPCJSRuntime.cpp:623) ==1798== by 0x36A28BC7: Collect(JSRuntime*, bool, long long, js::JSGCInvocationKind, js::gcreason::Reason) (jsgc.cpp:3697) ==1798== Address 0x21249a34 is 140 bytes inside a block of size 384 free'd ==1798== at 0x4804F44: free (vg_replace_malloc.c:427) ==1798== by 0x2275F699: moz_free (mozalloc.cpp:81) ==1798== by 0x36618C89: nsWindow::~nsWindow() (mozalloc.h:253) ==1798== by 0x3661CDAF: nsBaseWidget::Release() (nsBaseWidget.cpp:82) ==1798== by 0x36616F59: nsWindow::Release() (nsWindow.cpp:89) ==1798== by 0x3673DB4D: nsCOMPtr_base::~nsCOMPtr_base() (nsCOMPtr.cpp:81) ==1798== by 0x35F13EFD: nsDeviceContext::~nsDeviceContext() (nsCOMPtr.h:480) ==1798== by 0x35F58565: nsPresContext::~nsPresContext() (nsDeviceContext.h:60) ==1798== by 0x35F586B7: nsPresContext::~nsPresContext() (nsPresContext.cpp:335) ==1798== by 0x35F554B9: nsPresContext::Release() (nsPresContext.cpp:345) ==1798== by 0x36137E5D: nsDOMEvent::~nsDOMEvent() (nsAutoPtr.h:908) ==1798== by 0x36140E01: nsDOMPageTransitionEvent::~nsDOMPageTransitionEvent() (nsDOMPageTransitionEvent.h:47) It seems to me that the mParent pointer on nsWindow could become invalid because it's just a nsWindow* and not some fancy refptr. If that becomes invalid then it would cause this behaviour (both the crash and the invalid memory read/write).
Assignee | ||
Comment 2•9 years ago
|
||
Just ran into this on my GN device with an m-c build. I spent some time panning around http://i.imgur.com/61GmV.jpg, and then hit the home button to go back to the homescreen, and then soon (within a second or two) re-entered fennec by hitting the homescreen icon. It crashed within a second or so of coming up. I had gdb attached, so I could verify it was this same crash.
blocking-fennec1.0: --- → ?
Updated•9 years ago
|
Assignee: nobody → bugmail.mozilla
blocking-fennec1.0: ? → +
Assignee | ||
Comment 3•9 years ago
|
||
Unfortunately I'm unable to repro using the steps in comment 2. Seems like it happens randomly.
Assignee | ||
Comment 4•9 years ago
|
||
Looking at the stack dumps this is actually a shutdown crash and therefore probably less important. Still should be fixed though. To provide a little more context for what I said in comment 1: the actual error pointed to be the valgrind output in comment 1 refers to these lines of code in the destructor: if (top->mFocus == this) top->mFocus = nsnull; It looks like "top" has already been freed, so reading/writing top->mFocus is the problem. "top" comes from FindTopLevel(), which as far as I can see, will only return a garbage value if the chain of mParent references has gone bad. I looked through the nsWindow code to see how mParent might be corrupted, and the only way I can see that might happen is if (1) a window is destructed without having Destroy() called on it first, or (2) a window has a child added to it after Destroy() was called on it but before it is destructed. Need to trace the code some more...
Summary: crash in nsWindow::FindTopLevel → shutdown crash in nsWindow::FindTopLevel
Assignee | ||
Comment 5•9 years ago
|
||
After staring at this code all day I found one bug, but I don't think it's the cause of this problem. Putting it here anyway, but will request review if/when I do find the actual problem and have a patch for it.
Comment 6•9 years ago
|
||
This crash is currently Fennec 15.0a1's #3 topcrash for the past 3 days.
Assignee | ||
Comment 7•9 years ago
|
||
The more I stare at this code the less things make sense. But I found some more bugs!
Assignee | ||
Comment 8•9 years ago
|
||
Comment on attachment 626178 [details] [diff] [review] (1/4) Fix bug in error-handling code This is likely not the cause of this crash, but it's a bug nonetheless.
Attachment #626178 -
Attachment description: Fix bug in error-handling code → (1/4) Fix bug in error-handling code
Attachment #626178 -
Flags: review?(blassey.bugs)
Assignee | ||
Comment 9•9 years ago
|
||
Some of the DispatchEvent calls weren't gripping the window hard enough. This may or may not be the cause of this crash.
Attachment #627333 -
Flags: review?(blassey.bugs)
Assignee | ||
Comment 10•9 years ago
|
||
This one may actually be the cause of the crash, but I'm very unsure about this. What I'm seeing is that the nsTouchEvent object is stack-allocated, and gets passed to DispatchEvent. This eventually ends up creating a nsDOMTouchEvent object with that stack-allocated nsTouchEvent object. When the nsDOMTouchEvent object destructor runs, it calls delete on the stored pointer (see constructor/destructor at http://mxr.mozilla.org/mozilla-central/source/content/events/src/nsDOMTouchEvent.cpp#189). Calling delete on stack-allocated objects can results in all sorts of weird behaviour including this crash. Making these objects heap allocated should in theory fix it. However the nsTouchEvent pointers get passed around in a lot of places and I don't what the ownership model for these event objects are so this change could end up leaking these objects too. Any feedback here would be helpful.
Attachment #627335 -
Flags: review?(blassey.bugs)
Assignee | ||
Comment 11•9 years ago
|
||
Minor things I noticed after staring at this code for so long
Attachment #627336 -
Flags: review?(blassey.bugs)
Assignee | ||
Comment 12•9 years ago
|
||
Comment on attachment 627335 [details] [diff] [review] (3/4) Heap-allocate the nsTouchEvent objects wesj pointed out that the nsDOMTouchEvent has an mEventIsInternal flag that guards against this. And yet I'm seeing that delete get hit in gdb. I will need to investigate this further.
Attachment #627335 -
Flags: review?(blassey.bugs) → review-
Comment 13•9 years ago
|
||
Comment on attachment 626178 [details] [diff] [review] (1/4) Fix bug in error-handling code Review of attachment 626178 [details] [diff] [review]: ----------------------------------------------------------------- ::: widget/android/nsWindow.cpp @@ +244,5 @@ > for (PRUint32 i = 0; i < mChildren.Length(); ++i) { > // why do we still have children? > ALOG("### Warning: Destroying window %p and reparenting child %p to null!", (void*)this, (void*)mChildren[i]); > mChildren[i]->SetParent(nsnull); > + i--; // we just removed a child, so decrement counter kats, you are decrementing the loop counter before it is incremented, so i is always 0. Could this for loop be replaced with something like while (mChildren.Length() > 0)?
Comment 14•9 years ago
|
||
Comment on attachment 626178 [details] [diff] [review] (1/4) Fix bug in error-handling code Review of attachment 626178 [details] [diff] [review]: ----------------------------------------------------------------- ::: widget/android/nsWindow.cpp @@ +244,5 @@ > for (PRUint32 i = 0; i < mChildren.Length(); ++i) { > // why do we still have children? > ALOG("### Warning: Destroying window %p and reparenting child %p to null!", (void*)this, (void*)mChildren[i]); > mChildren[i]->SetParent(nsnull); > + i--; // we just removed a child, so decrement counter As Chris said I think his would be better as: while (mChildren.Length()) mChildren[0]->SetParent(nsnull);
Attachment #626178 -
Flags: review?(blassey.bugs) → review-
Updated•9 years ago
|
Attachment #627333 -
Flags: review?(blassey.bugs) → review+
Assignee | ||
Comment 15•9 years ago
|
||
Fixed up to use a while loop
Attachment #626178 -
Attachment is obsolete: true
Attachment #627351 -
Flags: review?(blassey.bugs)
Updated•9 years ago
|
Attachment #627351 -
Flags: review?(blassey.bugs) → review+
Updated•9 years ago
|
Attachment #627336 -
Flags: review?(blassey.bugs) → review+
Assignee | ||
Comment 16•9 years ago
|
||
Landed the three patches, but this bug should be left open since I'm not sure if those patches will actually fix the crash. https://hg.mozilla.org/integration/mozilla-inbound/rev/88787676b14a https://hg.mozilla.org/integration/mozilla-inbound/rev/20a421502a41 https://hg.mozilla.org/integration/mozilla-inbound/rev/a8cc00d4ca48
Whiteboard: [native-crash] → [native-crash][leave open]
Target Milestone: --- → mozilla15
Comment 17•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/88787676b14a https://hg.mozilla.org/mozilla-central/rev/20a421502a41 https://hg.mozilla.org/mozilla-central/rev/a8cc00d4ca48
Flags: in-testsuite-
Comment 18•9 years ago
|
||
Looks like there was a couple crashes in the 27th nightly.
Comment 19•9 years ago
|
||
Yeah, here's a crash from the 29th: https://crash-stats.mozilla.com/report/index/b718547d-7e69-4970-afb8-78f3a2120529
Updated•9 years ago
|
blocking-fennec1.0: + → .N+
Comment 20•9 years ago
|
||
Re-triaging to strip out non-OMG-ON-FIRE release blockers - pushing this to .N+ but we'd still love to look at a safe patch.
7th on aurora https://crash-stats.mozilla.com/topcrasher/byversion/FennecAndroid/15.0a2/7 agree w/ Comment 20.
Keywords: topcrash
Comment 22•9 years ago
|
||
Kat's the is the #12 top crasher on trunk and #11 on beta so I'd say it is not fixed. However, being that it is out of the top 10, I'm renominating it to get it off the .N+ list
blocking-fennec1.0: .N+ → ?
Updated•9 years ago
|
blocking-fennec1.0: ? → .N+
Assignee | ||
Comment 23•9 years ago
|
||
Ok, I finally got STR for this that seem to work reliably. 1. Start fennec 2. Load http://people.mozilla.org/~kgupta/bug/753665.html 3. Pinch zoom in 4. Quit fennec from the menu The shutdown procedure crashes Fennec; if you have gdb attached you can see it is this crash. What's happening is there are two windows, one a child of the other. The child nsWindow has ::Destroy called on it, which removes it from the parent's child list. Then the parent has ::Destroy called on it, which does the normal cleanup. The parent is then destructed, leaving a dangling mParent reference in the child. Then the child is garbage-collected, and it tries accessing the bad mParent reference. This STR depends on touch events because the touch events keep a nsCOMPtr reference to the child window, and that's what makes the child window get destructed after the parent window; if the windows are destructed in the reverse order then there is no crash. The fix is to ensure that in nsWindow::Destroy(), the mParent reference is nulled out. Patch coming up.
Assignee | ||
Comment 24•9 years ago
|
||
Call SetParent(nsnull) which more cleanly disconnects a child from it's parent, instead of manually removing the child from the parent's child list. In particular, this ensures that mParent gets set to null for the child.
Attachment #634161 -
Flags: review?(blassey.bugs)
Updated•9 years ago
|
Attachment #634161 -
Flags: review?(blassey.bugs) → review+
Assignee | ||
Comment 25•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/4180a4a2ebf2
status-firefox14:
--- → affected
status-firefox15:
--- → affected
status-firefox16:
--- → fixed
Whiteboard: [native-crash][leave open] → [native-crash]
Assignee | ||
Comment 26•9 years ago
|
||
Comment on attachment 634161 [details] [diff] [review] Actual fix [Approval Request Comment] Bug caused by (feature/regressing bug #): User impact if declined: fennec sometimes crash during shutdown Testing completed (on m-c, etc.): just landed on inbound, pre-nomming to get it on the radar. we should probably wait to get crash stats data to confirm this fixes the issue before uplifting. Risk to taking this patch (and alternatives if risky): low risk, mobile only String or UUID changes made by this patch: none
Attachment #634161 -
Flags: approval-mozilla-beta?
Attachment #634161 -
Flags: approval-mozilla-aurora?
Reporter | ||
Updated•9 years ago
|
Keywords: reproducible
Comment 27•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/4180a4a2ebf2
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: mozilla15 → mozilla16
Comment 28•9 years ago
|
||
Comment on attachment 634161 [details] [diff] [review] Actual fix In preparation for possible 14.0.1 inclusion, let's land on Aurora 15.
Attachment #634161 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee | ||
Comment 29•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/a25e23ff8455
Updated•9 years ago
|
Attachment #634161 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Assignee | ||
Comment 30•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/766e5964701c
Comment 31•9 years ago
|
||
Unable to get Firefox to crash following the steps in comment 23. Verified on: Nightly 16.0a1 2012-07-12/Aurora 15.0a2 2012-07-11/Firefox Mobile 14 beta 12 build 1 HTC Desire Android 2.2.2
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•