Last Comment Bug 757025 - crash in mozilla::hal::NotifyScreenConfigurationChange
: crash in mozilla::hal::NotifyScreenConfigurationChange
Status: VERIFIED FIXED
[native-crash]
: crash
Product: Core
Classification: Components
Component: Widget: Android (show other bugs)
: Trunk
: ARM Android
: -- critical (vote)
: mozilla15
Assigned To: Kartikaya Gupta (email:kats@mozilla.com)
:
: Jim Chen [:jchen] [:darchons]
Mentors:
: 740329 (view as bug list)
Depends on: 757821
Blocks:
  Show dependency treegraph
 
Reported: 2012-05-21 06:15 PDT by Scoobidiver (away)
Modified: 2012-05-30 02:10 PDT (History)
8 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
verified
verified
+


Attachments
Patch to force the error (811 bytes, patch)
2012-05-24 06:22 PDT, Kartikaya Gupta (email:kats@mozilla.com)
no flags Details | Diff | Splinter Review
Patch (693 bytes, patch)
2012-05-24 06:26 PDT, Kartikaya Gupta (email:kats@mozilla.com)
cjones.bugs: review+
bugzilla: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Scoobidiver (away) 2012-05-21 06:15:40 PDT
It's likely related to or a new form of bug 740329.

Signature 	mozilla::hal::NotifyScreenConfigurationChange More Reports Search
UUID	0f35cd89-5871-408f-a6ab-c02e02120521
Date Processed	2012-05-21 11:43:05
Uptime	2552
Last Crash	3.1 hours before submission
Install Age	18.2 hours since version was first installed.
Install Time	2012-05-20 17:33:26
Product	FennecAndroid
Version	15.0a1
Build ID	20120520030530
Release Channel	nightly
OS	Linux
OS Version	0.0.0 Linux 3.0.8-gda6252b #1 SMP PREEMPT Fri Apr 13 11:35:09 PDT 2012 armv7l
Build Architecture	arm
Build Architecture Info	
Crash Reason	SIGSEGV
Crash Address	0x0
App Notes 	
AdapterVendorID: tuna, AdapterDeviceID: Galaxy Nexus.
AdapterDescription: 'Model: 'Galaxy Nexus', Product: 'yakju', Manufacturer: 'samsung', Hardware: 'tuna''.
samsung Galaxy Nexus
google/yakju/maguro:4.0.4/IMM76I/330937:user/release-keys
EMCheckCompatibility	True

Frame 	Module 	Signature 	Source
0 	libxul.so 	mozilla::hal::NotifyScreenConfigurationChange 	
1 	libxul.so 	nsAppShell::ProcessNextNativeEvent 	widget/android/nsAppShell.cpp:586
2 	libxul.so 	nsBaseAppShell::DoProcessNextNativeEvent 	widget/xpwidgets/nsBaseAppShell.cpp:172
3 	libxul.so 	nsBaseAppShell::OnProcessNextEvent 	widget/xpwidgets/nsBaseAppShell.cpp:313
4 	libxul.so 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:618
5 	libxul.so 	NS_ProcessNextEvent_P 	obj-firefox/xpcom/build/nsThreadUtils.cpp:245
6 	libxul.so 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:114
7 	libxul.so 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:208
8 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:201
9 	libxul.so 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:196
10 	libxul.so 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:296
11 	libxul.so 	XREMain::XRE_mainRun 	toolkit/xre/nsAppRunner.cpp:3792
12 	libxul.so 	XREMain::XRE_main 	toolkit/xre/nsAppRunner.cpp:3869
13 	libxul.so 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3945
...

More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Ahal%3A%3ANotifyScreenConfigurationChange
Comment 1 Martijn Wargers [:mwargers] (not working for Mozilla) 2012-05-21 08:11:55 PDT
Yes, I was crashing here.
Comment 2 Martijn Wargers [:mwargers] (not working for Mozilla) 2012-05-21 09:02:59 PDT
I was hoping I could get a similar testcase as bug 735114, but am unable to, thus far.
Comment 3 Kartikaya Gupta (email:kats@mozilla.com) 2012-05-22 16:19:04 PDT
*** Bug 740329 has been marked as a duplicate of this bug. ***
Comment 4 Kartikaya Gupta (email:kats@mozilla.com) 2012-05-22 16:20:51 PDT
(In reply to Scoobidiver from comment #0)
> It's likely related to or a new form of bug 740329.
> 

Yeah, bug 745145 changed NotifyScreenOrientationChange to NotifyScreenConfigurationChange so they're the same bug. I duped that one to here since this reflects the latest code in m-c. Also cc'ing Chris Jones who seems to have touched this code a fair bit.
Comment 5 Martijn Wargers [:mwargers] (not working for Mozilla) 2012-05-22 16:33:54 PDT
I could sort of reproduce it with this:
http://people.mozilla.org/~mwargers/tests/dom/overflow_orientationchange_parent.html
You need to allow popups.
And in order to have a chance of reproducing, you need to switch orientation portrait/landscape while the testcase is running.

I managed to reproduce it a couple of times on my Galaxy Nexus, but it is very difficult.
Comment 6 Scoobidiver (away) 2012-05-22 21:49:17 PDT
I am adding the signature from 14.0 in order to track it for the first native version.
It's #28 top crasher in 14.0b2, and #16 if you remove fixed crashes.
Comment 8 Chris Peterson [:cpeterson] 2012-05-23 10:09:38 PDT
This crash has jumped to #1 for 15.0a1.
Comment 9 Scoobidiver (away) 2012-05-23 10:15:14 PDT
(In reply to Chris Peterson (:cpeterson) from comment #8)
> This crash has jumped to #1 for 15.0a1.
Because one user found STR.
Comment 10 Martijn Wargers [:mwargers] (not working for Mozilla) 2012-05-23 10:16:24 PDT
Yeah, that user would be me. I guess topcrash data is easy to influence.
Comment 11 Kartikaya Gupta (email:kats@mozilla.com) 2012-05-23 19:42:00 PDT
(In reply to Martijn Wargers [:mw22] (QA - IRC nick: mw22) from comment #10)
> I guess topcrash data is easy to influence.

I suppose that's a really good problem to have :)
Comment 12 Robert Kaiser 2012-05-24 04:42:35 PDT
(In reply to Kartikaya Gupta (:kats) from comment #11)
> I suppose that's a really good problem to have :)

Well, in this case it's because we have so few ADUs - ~800 for native Nightly.
Comment 13 Kartikaya Gupta (email:kats@mozilla.com) 2012-05-24 06:03:37 PDT
(In reply to Martijn Wargers [:mw22] (QA - IRC nick: mw22) from comment #5)
> I managed to reproduce it a couple of times on my Galaxy Nexus, but it is
> very difficult.

Any chance you still have the logcat from when you reproduced it?
Comment 14 Kartikaya Gupta (email:kats@mozilla.com) 2012-05-24 06:22:54 PDT
Created attachment 626778 [details] [diff] [review]
Patch to force the error

I think I figured out why this is happening. The code in RemoveObserver [1] deletes the mObservers array when all observers have been removed, and also disables notifications. In a single-threaded world this is fine, as no further notifications will come through. On Android (maybe B2G as well?), however, the screen orientation notification is asynchronous (sent as a message from java->gecko) and so there can be a notification in-flight when this happens. The notification gets processed when mObservers is null at [2] and causes the sadness.

I was able to artificially induce the scenario by applying the attached patch, which forces a screen orientation notification at exactly the wrong time, and loading :mw22's test page. The MOZ_ASSERT(mObservers) trips and SIGSEGV's in my debug build, but on a release build I assume it would continue after spitting out a warning or something. The fix should be to return there in the non-debug case.

[1] http://hg.mozilla.org/mozilla-central/annotate/1158503601be/hal/Hal.cpp#l181
[2] http://hg.mozilla.org/mozilla-central/annotate/1158503601be/hal/Hal.cpp#l209
Comment 15 Kartikaya Gupta (email:kats@mozilla.com) 2012-05-24 06:26:42 PDT
Created attachment 626781 [details] [diff] [review]
Patch
Comment 16 Chris Peterson [:cpeterson] 2012-05-24 10:24:33 PDT
Comment on attachment 626781 [details] [diff] [review]
Patch

Review of attachment 626781 [details] [diff] [review]:
-----------------------------------------------------------------

::: hal/Hal.cpp
@@ +206,5 @@
>      }
>    }
>  
>    void BroadcastInformation(const InfoType& aInfo) {
>      MOZ_ASSERT(mObservers);

kats, if NULL mObservers is a valid case, should we remove the assert?
Comment 17 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-05-24 10:40:34 PDT
Comment on attachment 626781 [details] [diff] [review]
Patch

Thanks for debugging this.

>diff --git a/hal/Hal.cpp b/hal/Hal.cpp

>   void BroadcastInformation(const InfoType& aInfo) {
>     MOZ_ASSERT(mObservers);

Please remove this assert and add a comment about why we're null-checking.

r=me with that.
Comment 18 Kartikaya Gupta (email:kats@mozilla.com) 2012-05-24 10:57:20 PDT
Landed with the assert removed and a comment added.

https://hg.mozilla.org/integration/mozilla-inbound/rev/56ad2fee6962
Comment 19 Ed Morley [:emorley] 2012-05-25 08:35:18 PDT
https://hg.mozilla.org/mozilla-central/rev/56ad2fee6962
Comment 20 Kartikaya Gupta (email:kats@mozilla.com) 2012-05-25 08:41:52 PDT
Comment on attachment 626781 [details] [diff] [review]
Patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: crashes
Testing completed (on m-c, etc.): on m-c
Risk to taking this patch (and alternatives if risky): mobile blocker
String or UUID changes made by this patch: none
Comment 21 Kartikaya Gupta (email:kats@mozilla.com) 2012-05-28 11:47:12 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/8b330f11e780
Comment 22 Catalin Suciu [:csuciu] 2012-05-30 02:10:21 PDT
I was not able to reproduce this crash using Martijn's testcase (see comment #5)

Verified fixed on:
Nightly 15.0a1 (2012-05-29) 
Aurora 14.0a2 (2012-05-29)

Device: Galaxy Nexus (Android 4.0.2)

Note You need to log in before you can comment on or make changes to this bug.