Closed Bug 614547 Opened 9 years ago Closed 9 years ago

Fennec crash [@ GeckoStart ]

Categories

(Core :: Widget: Android, defect, P1, critical)

ARM
Android
defect

Tracking

()

VERIFIED FIXED
Tracking Status
fennec 2.0b4+ ---

People

(Reporter: ahoza, Assigned: blassey)

Details

(Keywords: crash, regression, topcrash)

Attachments

(3 files, 3 obsolete files)

Device: Motorola Droid 2
BuildID: Mozilla /5.0 (Android;Linux armv7l;rv:2.0b8pre) Gecko/20101124 Firefox/4.0b8pre Fennec /4.0b3pre

Prerequisites: 
Put device in landscape mode (open hardware keyboard)

Steps to reproduce:

Scenario 1:

1. Start Fennec.
2. Go to Options->Addons and choose to install a recommended add-on.
3. When prompted tap "Restart".


Scenario 2:
1. Start Fennec.
2. Go to Options->Preferences and choose another language from the list.
3. When prompted tap "Restart".

Expected results:
The browser restarts and the add-on is installed or the localization is changed.

Actual results:
After tapping "Restart", the screen turns black.

Sometimes I get the "Fennec crashed" screen, with the following crash:
http://crash-stats.mozilla.com/report/index/fef085f3-b11f-4397-bf85-705052101124
Marking as blocking Fennec 2.0
blocking2.0: --- → ?
Summary: Fennec hangs while restarting Motorola Droid2 in landscape mode, after installing add-on/ changing localization causes device to freeze → Fennec hangs while restarting Motorola Droid2 in landscape mode, after installing add-on/ changing localization causes device to freeze; Crash Report [@ GeckoStart ]
Summary: Fennec hangs while restarting Motorola Droid2 in landscape mode, after installing add-on/ changing localization causes device to freeze; Crash Report [@ GeckoStart ] → Fennec hangs while restarting Motorola Droid2 in landscape mode, after installing add-on/ changing localization causes device to freeze; Crash [@ GeckoStart ]
blocking2.0: ? → ---
tracking-fennec: --- → ?
It is #4 top crasher in Fennec 4.0b3pre for the last 3 days.
Keywords: crash, topcrash
Summary: Fennec hangs while restarting Motorola Droid2 in landscape mode, after installing add-on/ changing localization causes device to freeze; Crash [@ GeckoStart ] → Fennec hangs while restarting Motorola Droid2 in landscape mode, after installing add-on/ changing localization; Crash [@ GeckoStart ]
Component: General → Widget: Android
Product: Fennec → Core
QA Contact: general → android
Because of a bug in Socorro (the current date is stuck at 26/11/10), comment 2 is wrong.
It is #1 top crasher in Fennec 4.0b3pre for the last 3 days.

It is not a startup crash, even though the STR in comment 0 reproduce a startup crash.

The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=4b9ba5049e66&tochange=7f5cd850578e

Signature	GeckoStart
UUID	6ae7f590-f593-4c7d-b5a0-93ca62101130
Time 	2010-11-30 05:54:12.214770
Uptime	1690
Last Crash	907266 seconds (1.5 weeks) before submission
Install Age	72601 seconds (20.2 hours) since version was first installed.
Product	Fennec
Version	4.0b3pre
Build ID	20101129041950
Branch	2.0
OS	Linux
OS Version	0.0.0 Linux 2.6.32.15-ge2fb08e #11 PREEMPT Wed Sep 1 15:08:40 CST 2010 armv7l
CPU	arm
CPU Info	
Crash Reason	SIGSEGV
Crash Address	0x1c
App Notes 	HTC PC36100
sprint/htc_supersonic/supersonic/supersonic:2.2/FRF91/252548:user/release-keys

Frame	Module	Signature	Source
0 		@0x1c 	
1 	libxul.so 	GeckoStart 	toolkit/xre/nsAndroidStartup.cpp:76
2 	libxul.so 	nsWindow::OnAndroidEvent 	widget/src/android/nsWindow.cpp:843
3 	libxul.so 	nsAppShell::ProcessNextNativeEvent 	widget/src/android/nsAppShell.cpp:286
4 	libxul.so 	nsBaseAppShell::DoProcessNextNativeEvent 	widget/src/xpwidgets/nsBaseAppShell.cpp:163
5 	libxul.so 	nsBaseAppShell::OnProcessNextEvent 	widget/src/xpwidgets/nsBaseAppShell.cpp:309
6 	libxul.so 	mozilla::dom::ContentParent::OnProcessNextEvent 	dom/ipc/ContentParent.cpp:655
7 	libxul.so 	nsThread::ProcessNextEvent 	nsTArray.h:135
8 	libxul.so 	NS_ProcessNextEvent_P 	nsThreadUtils.cpp:250
9 	libxul.so 	nsThread::Shutdown 	xpcom/threads/nsThread.cpp:491
10 	libxul.so 	nsSocketTransportService::Shutdown 	netwerk/base/src/nsSocketTransportService2.cpp:467
11 	libxul.so 	nsIOService::SetOffline 	nsCOMPtr.h:800
12 	libxul.so 	nsIOService::Observe 	netwerk/base/src/nsIOService.cpp:921
13 	libxul.so 	nsObserverList::NotifyObservers 	nsVoidArray.h:63
14 	libxul.so 	nsObserverService::NotifyObservers 	nsTHashtable.h:170
15 	libxul.so 	nsXREDirProvider::DoShutdown 	nsCOMPtr.h:800
16 	libxul.so 	ScopedXPCOMStartup::~ScopedXPCOMStartup 	toolkit/xre/nsAppRunner.cpp:1115
17 	libxul.so 	XRE_main 	nsCOMPtr.h:800
18 	libxul.so 	GeckoStart 	toolkit/xre/nsAndroidStartup.cpp:131
19 	libc.so 	libc.so@0x10f47 	
20 	libc.so 	libc.so@0x10a33 	

More reports at:
http://crash-stats.mozilla.com/report/list?product=Fennec&version=Fennec%3A4.0b3pre&query_search=signature&query_type=exact&query=GeckoStart&range_value=1&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=GeckoStart
Keywords: regression
Summary: Fennec hangs while restarting Motorola Droid2 in landscape mode, after installing add-on/ changing localization; Crash [@ GeckoStart ] → Fennec crash [@ GeckoStart ]
I hit this crash this morning on the HTC Evo 4 while trying to perform an update from 20101129 to 20101130. I was not trying to install anything at the time of the crash.
tracking-fennec: ? → 2.0+
Priority: -- → P1
Assignee: nobody → blassey.bugs
Got this while clicking the "Apply Update" notification with the main Fennec window closed.
Duplicate of this bug: 621049
Since I switched back to nightlies, I am getting this crash consistently when I update my HTC Evo with the STR in Comment 6.
This is just a theory since I haven't yet reproduced this in a local build.  It might be possible for getLibraryMapping to return null if (onNewIntent > launch > runGecko > nativeRun > GeckoStart) happens before (onCreate > loadGeckoLibs > loadLibs).  But if I understand correctly, onCreate should always be called first.
Assignee: blassey.bugs → alexp
http://crash-stats.mozilla.com/report/index/bp-4f238426-34a8-4650-9d60-657152110112 is my latest crash. For me it still happens when I update. Added at the request of ashah who was trying to reproduce it.
I have found why this happens, though I haven't come up with a final solution yet.

The actual crash is caused by an attempt to use a window, which was already destroyed during the shutdown procedure.

In my tests it crashed inside nsWindow::OnDraw(), called from nsWindow::OnAndroidEvent(). As far as I understand, nsAppShell::ProcessNextNativeEvent() takes an event from the queue, and that event stores a "NativeWindow", which is a pointer to an nsWindow.
When the window is destroyed, the queued event still contains the pointer, and when it is processed, nsWindow::OnAndroidEvent() is called on an invalid window, so it crashes.

One solution could be to check the validity of the window when the event is processed, but checking each and every event against some window list would be a performance hit.
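A minimal C++ sketch of that first option, for illustration only: every event is checked against a table of live windows before it is delivered. Window, Event and DrainQueue are hypothetical stand-ins, not the real nsWindow/nsAppShell code. The sketch also assumes events name their target by an integer id rather than by the raw pointer the real events store, because a freed address can be reused by a new window and would then pass a pointer-based check. The hash lookup is cheap, but it runs for every single event.

```cpp
#include <deque>
#include <unordered_map>

// Hypothetical stand-in for nsWindow. Each window registers itself in a
// table of live windows on construction and removes itself on destruction.
class Window {
public:
    explicit Window(int id) : id_(id) { Live()[id_] = this; }
    ~Window() { Live().erase(id_); }
    Window(const Window&) = delete;
    Window& operator=(const Window&) = delete;

    static std::unordered_map<int, Window*>& Live() {
        static std::unordered_map<int, Window*> live;
        return live;
    }
    void OnEvent(int /*type*/) { ++handled; }
    int handled = 0;

private:
    int id_;
};

// Assumption for this sketch: the event names its target by id,
// not by the raw nsWindow pointer the real events carry.
struct Event {
    int targetId;
    int type;
};

// Dispatch loop: every event pays one table lookup before delivery.
// Returns how many events were dropped because their window was gone.
int DrainQueue(std::deque<Event>& queue) {
    int dropped = 0;
    while (!queue.empty()) {
        Event ev = queue.front();
        queue.pop_front();
        auto it = Window::Live().find(ev.targetId);
        if (it == Window::Live().end()) {
            ++dropped;  // window already destroyed: skip instead of crashing
            continue;
        }
        it->second->OnEvent(ev.type);
    }
    return dropped;
}
```

With two queued events whose first target is deleted before the queue is drained, DrainQueue() drops one event and delivers the other.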

A better approach might be to call a method from the nsWindow destructor that goes through the events in the queue and removes those related to this window. I tried a draft of this mechanism and it helped, but I ran into another issue where Fennec froze instead of crashing, so it needs more investigation.
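A minimal sketch of the destructor-driven approach, assuming a single global queue whose events hold raw window pointers. RemoveEventsForWindow() is the name used later in this bug; its body here, along with the Window and Event types and EventQueue(), is hypothetical. The sketch is single-threaded; a queue shared with another thread would need its lock held while purging.

```cpp
#include <algorithm>
#include <deque>

class Window;

// Hypothetical event type: like the events described above,
// it stores a raw pointer to the window it targets.
struct Event {
    Window* target;
    int type;
};

// Hypothetical global native event queue (single-threaded in this sketch).
std::deque<Event>& EventQueue() {
    static std::deque<Event> queue;
    return queue;
}

// Drop every pending event that still points at the given window.
void RemoveEventsForWindow(const Window* w) {
    auto& q = EventQueue();
    q.erase(std::remove_if(q.begin(), q.end(),
                           [w](const Event& ev) { return ev.target == w; }),
            q.end());
}

class Window {
public:
    ~Window() {
        // Purge before the object dies, so no queued event can outlive it.
        RemoveEventsForWindow(this);
    }
    void OnEvent(int /*type*/) { ++handled; }
    int handled = 0;
};

// Deliver all pending events; every remaining target is still alive.
void DrainQueue() {
    auto& q = EventQueue();
    while (!q.empty()) {
        Event ev = q.front();
        q.pop_front();
        ev.target->OnEvent(ev.type);
    }
}
```

The erase/remove_if idiom removes all matching events in one pass. Note that it discards every event for the window, including any that the shutdown path might still depend on.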

But I'm thinking about skipping all that altogether and doing the restart a completely different way: not by going through the regular shutdown/restart procedure implemented in native C++ code, but by calling the Java method GeckoApp.doRestart(), which stops and restarts the app at a higher level and shuts down the native libraries as usual. I tried this approach and it worked without issues. I just need to properly propagate the call from the front-end (FE) JS code.

Does anyone have any comments or thoughts on this?
Attached patch [WIP] Fix the crash (obsolete)
This change fixes the crash itself, though there is still a problem with the Restarter freezing: it just doesn't run, and its onCreate() is never called.
This patch makes it easy to reproduce the issue.
Steps:
- Apply the patch, build, and install
- Start Fennec
- Check for updates from the "about:firefox" page (or just wait a minute; it should check automatically)
- When the update notification appears in the status bar, press Home or Back until the Fennec window is hidden and the browser goes to the background
- Open the notification area and tap on the update notification
- Fennec will restart right away, and most likely will crash
(In reply to comment #13)
> Created attachment 503541
> [WIP] Fix the crash
> 
> This change fixes the crash itself.

Some more observations with this patch:
- The freezing happens when GeckoApp.doRestart() tries to start the Restarter activity - the activity opens, but does not run, its onCreate() method does not get called.
- An attempt to start another application activity instead of the Restarter succeeds - that activity starts and works without any problem.
- The main Fennec activity does not finish properly when finish() is called from doRestart(): onPause() and onDestroy() are not called.

I have a suspicion the problem might be related to the main thread stopping right after it calls GeckoAppShell.onXreExit() and GeckoApp.doRestart().
I'll be away next week, and won't be able to continue working on this issue.
Hope I've provided enough information for someone to nail it and get all the glory. :)
Assignee: alexp → nobody
Attached patch patch (obsolete)
Calling finish() and exit(0) in onXreExit() seems to fix this hang. I also changed the loop in RemoveEventsForWindow() to count backwards.
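The comment does not say what was wrong with the original loop. A common reason for this kind of change, shown in the hypothetical sketch below (not the patch's actual code), is that erasing by index while counting forward skips the element that slides into the erased slot. Counting backwards avoids that, because erasing slot i only shifts elements that have already been checked.

```cpp
#include <cstddef>
#include <vector>

struct Event {
    const void* target;  // window the event was posted for
    int type;
};

// Forward version: after erase(i) the next element slides into slot i and
// the ++i skips it, so two adjacent matching events leave one behind.
void RemoveForward(std::vector<Event>& q, const void* w) {
    for (std::size_t i = 0; i < q.size(); ++i) {
        if (q[i].target == w) {
            q.erase(q.begin() + static_cast<std::ptrdiff_t>(i));
        }
    }
}

// Backwards version: erasing slot i only shifts elements that have
// already been examined, so nothing is skipped.
void RemoveBackward(std::vector<Event>& q, const void* w) {
    for (std::size_t i = q.size(); i-- > 0;) {
        if (q[i].target == w) {
            q.erase(q.begin() + static_cast<std::ptrdiff_t>(i));
        }
    }
}
```

On a queue holding two adjacent events for window w followed by one for another window, RemoveForward() leaves one stale event for w, while RemoveBackward() removes both.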
Assignee: nobody → blassey.bugs
Attachment #503541 - Attachment is obsolete: true
Attachment #504617 - Flags: review?(mwu)
Attached patch patch (obsolete)
Removes an extra finish() call.
Attachment #504617 - Attachment is obsolete: true
Attachment #504623 - Flags: review?(mwu)
Attachment #504617 - Flags: review?(mwu)
Duplicate of this bug: 609293
tracking-fennec: 2.0+ → 2.0b4+
Comment on attachment 504623
patch

No need to store the native window. It's always TopWindow(), which the global handler will direct the event to if there's no explicit native window. Find all the instances of "new AndroidGeckoEvent(TopWindow()..." in nsWindow.cpp and replace TopWindow() with nsnull. This eliminates the need to deal with events containing pointers to non-existent windows.
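A minimal sketch of the reviewer's suggestion. It assumes the global handler simply falls back to the current top window when an event has no explicit target. Window, Event and DrainQueue are hypothetical stand-ins; only the TopWindow() name comes from the comment. Because the target is looked up when the event is processed rather than when it is posted, a window destroyed in between is never dereferenced.

```cpp
#include <algorithm>
#include <deque>
#include <vector>

// Hypothetical window type with a stack of open windows;
// the last element plays the role of TopWindow().
class Window {
public:
    Window() { Stack().push_back(this); }
    ~Window() {
        auto& s = Stack();
        s.erase(std::remove(s.begin(), s.end(), this), s.end());
    }
    Window(const Window&) = delete;
    Window& operator=(const Window&) = delete;

    static std::vector<Window*>& Stack() {
        static std::vector<Window*> stack;
        return stack;
    }
    static Window* TopWindow() {
        return Stack().empty() ? nullptr : Stack().back();
    }
    void OnEvent(int /*type*/) { ++handled; }
    int handled = 0;
};

struct Event {
    Window* target;  // nullptr means "whatever window is on top right now"
    int type;
};

// Resolve the target at processing time, not at posting time.
// Returns the number of events that found no window at all.
int DrainQueue(std::deque<Event>& queue) {
    int unhandled = 0;
    while (!queue.empty()) {
        Event ev = queue.front();
        queue.pop_front();
        Window* w = ev.target ? ev.target : Window::TopWindow();
        if (!w) {
            ++unhandled;
            continue;
        }
        w->OnEvent(ev.type);
    }
    return unhandled;
}
```

If the top window is destroyed while two null-target events are queued, both events are delivered to the window underneath it instead of to freed memory.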
Attachment #504623 - Flags: review?(mwu) → review-
Attached patch patch
Attachment #504623 - Attachment is obsolete: true
Attachment #504871 - Flags: review?(mwu)
Comment on attachment 504871
patch

So this fixes the issue completely? No need for the java changes?
Attachment #504871 - Flags: review?(mwu) → review+
(In reply to comment #22)
> Comment on attachment 504871
> patch
> 
> So this fixes the issue completely? No need for the java changes?

Yeah... no need for the pixie dust.

My best guess is that the other patch threw out an event that was needed for a clean shutdown, which prevented us from exiting until we called exit(0) explicitly. We might want to take that Java change as well, though, since it seems to protect against that particular problem.
pushed http://hg.mozilla.org/mozilla-central/rev/d72f0df4b1c8
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Verified as fixed on buildID: Mozilla /5.0 (Android;Linux armv7l;rv:2.0b10pre) Gecko/20110120 Firefox/4.0b10pre Fennec /4.0b4pre; device: Motorola Droid 2.
Status: RESOLVED → VERIFIED
Crash Signature: [@ GeckoStart ]