Closed Bug 1062758 Opened 7 years ago Closed 6 years ago

crash in nsObserverService::RemoveObserver(nsIObserver*, char const*)

Categories

(Firefox for Android Graveyard :: General, defect)

34 Branch
ARM
Android
defect
Not set
critical

Tracking

(firefox34 wontfix, firefox35+ wontfix, firefox36+ verified, firefox37+ fixed, fennec37+)

RESOLVED FIXED
Firefox 37
Tracking Status
firefox34 --- wontfix
firefox35 + wontfix
firefox36 + verified
firefox37 + fixed
fennec 37+ ---

People

(Reporter: CristinaM, Assigned: snorp)

References

Details

(Keywords: crash, reproducible)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-12f54c65-c691-4967-9ca7-7cbdc2140904.
=============================================================
Device: Samsung Galaxy R (Android 2.3.4)
Build: Firefox for Android 34.0a2 (2014-09-03)

Steps to reproduce:
1. Navigate to a page;
2. Enter in Guest mode via custom menu-> Tools -> New guest session -> Continue.

Expected results:
Guest session is opened and about:home is displayed.

Actual results:
Firefox crashes.

Notes: Possible an Android 2.3 specific issue.
 0 	libxul.so 	nsObserverService::RemoveObserver(nsIObserver*, char const*) 	xpcom/ds/nsObserverService.cpp
1 	libxul.so 	nsExpirationTracker<mozilla::layers::TileClient, 3u>::~nsExpirationTracker() 	xpcom/ds/nsExpirationTracker.h
2 	libxul.so 	mozilla::DefaultDelete<mozilla::layers::TileExpiry>::operator()(mozilla::layers::TileExpiry*) const 	gfx/layers/client/TiledContentClient.cpp
3 	libxul.so 	mozilla::UniquePtr<mozilla::layers::TileExpiry, mozilla::DefaultDelete<mozilla::layers::TileExpiry> >::~UniquePtr() 	mfbt/UniquePtr.h
4 	libmozglue.so 	ElfLoader::DestructorCaller::Call() 	mozglue/linker/ElfLoader.cpp
5 	libmozglue.so 	ElfLoader::__wrap_cxa_finalize(void*) 	mozglue/linker/ElfLoader.cpp
6 	libmozglue.so 	CustomElf::CallFunction(void*) const 	mozglue/linker/CustomElf.h
7 	libmozglue.so 	CustomElf::CallFini() 	mozglue/linker/CustomElf.cpp
8 	libmozglue.so 	CustomElf::~CustomElf() 	mozglue/linker/CustomElf.cpp
9 	libmozglue.so 	CustomElf::~CustomElf() 	mozglue/linker/CustomElf.cpp
10 	libmozglue.so 	mozilla::detail::RefCounted<LibHandle, (mozilla::detail::RefCountAtomicity)0>::Release() const 	mozglue/linker/ElfLoader.h
11 	libmozglue.so 	LibHandle::ReleaseDirectRef() 	mozglue/linker/ElfLoader.h
12 	libmozglue.so 	ElfLoader::~ElfLoader() 	mozglue/linker/ElfLoader.cpp
Ø 13 	libc.so 	libc.so@0x1cffb 	
14 	libmozglue.so 	__wrap_dladdr 	mozglue/linker/ElfLoader.cpp
I had the same crash during regular browsing on a Nexus 7 / Android  4.3.
Reproducible with Nightly (2014-10-01) using Asus Transformer TF101 (Android 4.0.3).
Fennec always crashes with the above signature on Gingerbread devices when entering in guest mode
tracking-fennec: --- → ?
Keywords: reproducible
Looks like #4 crash on 34.0b1
Assignee: nobody → bnicholson
tracking-fennec: ? → 34+
https://crash-stats.mozilla.com/report/index/1a3805a3-ae0f-4bbb-b12e-bceae2141028
100% reporducible on ZTE Grand X In x86 (4.0.4) when entering in Guest mode
Does MV have that device?
I've tried multiple devices and emulators, but haven't been able to reproduce. Kevin, do you have any of these, or know if any of them might be sitting in MV somewhere?
Flags: needinfo?(kbrosnan)
Yes this device is in the QA lab in MTV. Marc Schifer or one of the other QA team in MV should be able help you find it. What you are looking for, http://www.gsmarena.com/samsung_i9103_galaxy_r-pictures-3967.php
Flags: needinfo?(kbrosnan)
Thanks. I'll try to find this tomorrow.
Here's a dump of what I've found so far:
* Using a TF101 tablet, I crash every time I click "New Guest Session". The crash reporter comes up, but then we reopen Fennec in guest mode, so things still seem to work. The same thing happens when leaving guest mode. I don't see anything in logcat.
* I'm not able to reproduce this with a debug build (though I haven't yet tried a debug build with the crash reporter). Nothing suspicious shows up in logcat. Also, since we use System.exit() to restart into guest mode, gdb gets killed in the process. I'm not sure how useful gdb would be even if this was reproducible with a debug build.
* Mark pointed out a similar bug, bug 1005336, which got fixed in Fx32 and was a regression from cache v2. Not sure if it's related.

I probably won't have the cycles to get this fixed in 34, but I guess my next step would be to throw a bunch of logs in nsObserverService/nsExpirationTracker to see if we can find anything. Eugen, as someone with more platform experience, do you know of a better way to debug this?
Flags: needinfo?(esawin)
Not really, adding logs is what I would do if attaching gdb is no option. Jim could know some tricks around the crash reporter though.
Flags: needinfo?(esawin)
This crash appears to happen on the NS_ENSURE_VALIDCALL marco here:
http://mxr.mozilla.org/mozilla-central/source/xpcom/ds/nsObserverService.cpp#282

This will crash (MOZ_CRASH) if the call is not on the main thread:
http://mxr.mozilla.org/mozilla-central/source/xpcom/ds/nsObserverService.cpp#245

We have seen other cases where the nsObserverService is crashing for similar reasons (bug 1071217). NI'ing Seth and Benjamin in case they have some ideas.
Flags: needinfo?(seth)
Flags: needinfo?(benjamin)
This is a top 5 (or higher) crash on Fx36, Fx35 and Fx34
This particular signature can happen in several different scenarios. The stack from comment 1 indicates that we're unloading libc, running static destructors which includes a mozilla::UniquePtr<mozilla::layers::TileExpiry...>::~UniquePtr() which then tries to use XPCOM after XPCOM has been shut down and we abort.

There are a few possible problems here:

* Are we unloading libc when we shouldn't be?
* Why do we have a static UniquePtr that's not cleared/leaked on shutdown?
* Should we neuter all static objects like this rather than running their destructors at shutdown? any static object like nsCOMPtr/nsRefPtr/nsAutoPtr could have similar problems if we leaked an object in it across XPCOM shutdown.
Flags: needinfo?(benjamin)
Benjamin's the expert here, but FWIW I concur with his assessment.
Flags: needinfo?(seth)
Assignee: bnicholson → nobody
FWIW, I can't reproduce on Nexus 10 with Android L, Guest Session works correctly there.
So, looking at this bug since it seems to have kinda stalled. We're crashing while trying to destroy mozilla::layers::TileExpiry as a result of the static UniquePtr being destroyed due to ELF destructors being run. We have code to destroy the TileExpiry in gfxPlatform::Shutdown() which should avoid problems like this. This indicates that we aren't cleanly shutting down gecko. Indeed, after looking at how we enter guest mode, it looks like we just do an exit(). We need to modify that to shut down XRE first.
Duplicate of this bug: 1110954
Duplicate of this bug: 1112253
gl;hf
Assignee: nobody → snorp
Status: NEW → ASSIGNED
This avoids system.exit() directly just about everywhere, allowing gecko to go through its normal shutdown process.
Attachment #8538136 - Flags: review?(nchen)
Attachment #8538136 - Flags: review?(mark.finkle)
Comment on attachment 8538136 [details] [diff] [review]
Try to shutdown gracefully on Android

Questions:
1. I notice a few of the systemExit -> gracefulExit are after doRestart(). I wonder if the gracefulExit will create some timeout issues when attemnpting to restart. I suppose testing guestMode and addon restarts could be enough to see if issues exits.

2. I see we send a "Browser:Quit" event already, when doing a Menu > Quit [1]. In there we try to support the "Clear Data on Quit" feature. Should gracefulExit does this? If so, the Menu > Quit code could just call gracefulExit().

3. Looks like sending "" to "Browser:Quit" should be OK, if we decide to continue to just do that [2].

[1] http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/GeckoApp.java#489
[2] http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/browser.js#1696
(In reply to Mark Finkle (:mfinkle) from comment #23)
> Comment on attachment 8538136 [details] [diff] [review]
> Try to shutdown gracefully on Android
> 
> Questions:
> 1. I notice a few of the systemExit -> gracefulExit are after doRestart(). I
> wonder if the gracefulExit will create some timeout issues when attemnpting
> to restart. I suppose testing guestMode and addon restarts could be enough
> to see if issues exits.

It could, yeah. This is just continuing the series of grotesque hacks around this stuff. In practice it seems to work for me at least. Getting all of this crap fixed up is something I want to do with the GeckoView and JNI refactoring work.

> 
> 2. I see we send a "Browser:Quit" event already, when doing a Menu > Quit
> [1]. In there we try to support the "Clear Data on Quit" feature. Should
> gracefulExit does this? If so, the Menu > Quit code could just call
> gracefulExit().

We don't try to do that in the places where I replaced the systemExit(), so I didn't bother, but we could just add that command argument to gracefulExit().
Comment on attachment 8538136 [details] [diff] [review]
Try to shutdown gracefully on Android

Maybe file a followup bug for someone to unify gracefulExit and "close on exit"
Attachment #8538136 - Flags: review?(mark.finkle) → review+
Attachment #8538136 - Flags: review?(nchen) → review+
This is a high volume crash on 35, which has only two betas left before ship, is there a safe uplift nomination coming here?
Flags: needinfo?(snorp)
This hasn't landed yet, and I didn't see an existing Try push, so might as well make sure it's not going to burn anything when it lands.

FWIW, from my perspective this looks like the lowest possible risk for a patch that futzes with shutdown, so I'd support an uplift nom.
https://hg.mozilla.org/integration/mozilla-inbound/rev/11e3df6283a0

Lets make sure this fixes things on Nightly and then uplift
Flags: needinfo?(snorp)
https://hg.mozilla.org/mozilla-central/rev/11e3df6283a0
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 37
please nominate for beta/aurora uplift
Flags: needinfo?(snorp)
I'm still able to reproduce this crash on Firefox for Android 37.0a1 (2015-01-04) using Asus Transformer TF101 (Android 4.0.3).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Lukas Blakk [:lsblakk] use ?needinfo from comment #31)
> please nominate for beta/aurora uplift

Let's see what's going on first.
Flags: needinfo?(snorp)
Too late for final mobile 35 beta, wontfixing, moving tracking to 36/37 so we can follow up with more time to assess results of attempts to fix.
tracking-fennec: 34+ → 37+
Recent incarnations of this crash appear to be compositor related, strangely.

https://crash-stats.mozilla.com/report/index/3bee9ab7-9790-4b87-b55d-bd95d2150124

I suspect we hit a divide-by-zero and then crap out when handling it.
I cloned this to bug 1127464 for the new stack, and I think we can close this one.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Marking 37 and 38 as fixed and will set appropriate flags in bug 1127464. mfinkle marked 36 as wontfix but is there any value in uplifting this patch to Beta?
Flags: needinfo?(snorp)
Flags: needinfo?(mark.finkle)
I suppose we could, but I don't know how much of an impact it might be on overall crash levels. The patch in this bug is not too risky.
Flags: needinfo?(mark.finkle)
I'm setting 36 back to affected to put this on Sylvestre's list. I'll let him make the call on whether he wants to take this given what he knows about the 36 mobile release. 

Sylvestre - Note that the problem as originally reported will not be fixed completely until bug 1127464 is fixed.
Flags: needinfo?(sledru)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #39)
> I'm setting 36 back to affected to put this on Sylvestre's list. I'll let
> him make the call on whether he wants to take this given what he knows about
> the 36 mobile release. 
> 
> Sylvestre - Note that the problem as originally reported will not be fixed
> completely until bug 1127464 is fixed.

Although they have similar stacks (towards the end), they are really very different bugs. This one was a crash involving guest mode, the other one is basically OOM.
Flags: needinfo?(snorp)
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #40)
> Although they have similar stacks (towards the end), they are really very
> different bugs. This one was a crash involving guest mode, the other one is
> basically OOM.

Thank you for the clarification. Clearly I misunderstood.

In that case, can you nom this bug for Beta?
Comment on attachment 8538136 [details] [diff] [review]
Try to shutdown gracefully on Android

Approval Request Comment
[Feature/regressing bug #]: ?
[User impact if declined]: Crashes when entering/exiting guest mode
[Describe test coverage new/current, TreeHerder]: Nightly, Aurora
[Risks and why]: low, only slightly changes some exit behavior
[String/UUID change made/needed]: none
Attachment #8538136 - Flags: approval-mozilla-beta?
Flags: needinfo?(sledru)
Attachment #8538136 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I was able to reproduce the crash on Firefox for Android 36 Beta 6 using ZTE x86 (Android 4.0.4).
I am not able to reproduce the crash on Firefox for Android 36 Beta 8.
Attached file logs-guestmode.txt
The crash is still reproducible on Firefox 36 Beta 9.
On ZTE x86, I've installed Beta 8 and sometimes I can reproduce the crash and sometimes I don't. Also, sometimes, the browser closes when choosing to enter guest mode/leave guest mode. I've attached the logs for the last scenario, where the browser closes without crash report.
Sounds like we need to explore more, a part 2 in a new bug.
(In reply to Aaron Train [:aaronmt] from comment #46)
> Sounds like we need to explore more, a part 2 in a new bug.

nm bug 1127464
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.