Closed Bug 999071 Opened 6 years ago Closed 6 years ago

crash in nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*)

Categories

(Firefox for Android :: General, defect, critical)

31 Branch
All
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Firefox 32
Tracking Status
firefox31 + fixed
firefox32 --- fixed

People

(Reporter: aaronmt, Assigned: jchen)

References

Details

(Keywords: crash, regression, topcrash-android-armv7)

Crash Data

Attachments

(4 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-18dcaf46-2538-472b-86a5-1837c2140416.
=============================================================
~77 crashes in the past 7 days affecting wide range of devices
Assignee: nobody → nchen
Status: NEW → ASSIGNED
Blocks: 993734
Keywords: regression
The immediate cause of this is bug 993734 (Firefox 31). The code which uses the observer service on the wrong thread was introduced in bug 863777 (Firefox 24).
Duplicate of this bug: 999543
Jumped to #1.
Duplicate of this bug: 1000045
The Android ANR reporter uses the profiler from off the main thread, which is necessary. Unfortunately that means we need special cases in the profiler to guard against using the observer service from off the main thread.
Attachment #8411205 - Flags: review?(bgirard)
This patch adds a test for the ANR reporter that also acts as a test for this bug.
Attachment #8411208 - Flags: review?(blassey.bugs)
For the ANR reporter test to work, the test needs write access to the ANR traces file. When we delete that file in the test harness, tests no longer have the right permissions to create it or write to it. Therefore, instead of deleting the file, we should empty the file and set the right permissions so that the ANR reporter test works.
Attachment #8411211 - Flags: review?(gbrown)
Attachment #8411211 - Flags: review?(gbrown) → review+
Attachment #8411208 - Flags: review?(blassey.bugs) → review+
Blocks: 1000889
Comment on attachment 8411205 [details] [diff] [review]
Don't notify observers in profiler when used by Android ANR reporter (v1)

Review of attachment 8411205 [details] [diff] [review]:
-----------------------------------------------------------------

Shouldn't this just wrap the code it wants disabled (observer stuff)? When we add more code to these methods then it will get accidentally disabled for ANR.
(In reply to Benoit Girard (:BenWa) from comment #9)
> Comment on attachment 8411205 [details] [diff] [review]
> Don't notify observers in profiler when used by Android ANR reporter (v1)
> 
> Review of attachment 8411205 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Shouldn't this just wrap the code it wants disabled (observer stuff)? When
> we add more code to these methods then it will get accidentally disabled for
> ANR.

Yeah we can do that.
Attachment #8411205 - Attachment is obsolete: true
Attachment #8411205 - Flags: review?(bgirard)
Attachment #8411929 - Flags: review?(bgirard)
Attachment #8411929 - Flags: review?(bgirard) → review+
I had to back this out in https://hg.mozilla.org/integration/mozilla-inbound/rev/d83cad7c5853 for android rc1 failures: https://tbpl.mozilla.org/php/getParsedLog.php?id=38446086&tree=Mozilla-Inbound


1 INFO TEST-START | testANRReporter
2 INFO TEST-PASS | testANRReporter | Given message occurred for registered event: {"type":"Gecko:Ready"} - Gecko:Ready should equal Gecko:Ready
EventExpecter: no longer listening for Gecko:Ready
3 INFO TEST-PASS | testANRReporter | Ping directory is empty
4 INFO TEST-PASS | testANRReporter | Successfully set package name - org.mozilla.fennec should equal org.mozilla.fennec
5 INFO TEST-PASS | testANRReporter | testContext should not be null - android.app.ContextImpl@48394508 should not equal null
6 INFO TEST-INFO | testANRReporter | Triggering ANR
7 INFO TEST-INFO | testANRReporter | Triggering second ANR
8 INFO TEST-INFO | testANRReporter | Waiting for ping
9 INFO TEST-PASS | testANRReporter | Ping directory exists
10 INFO TEST-PASS | testANRReporter | Ping directory is a directory
11 INFO TEST-PASS | testANRReporter | Ping directory is not empty - [Ljava.io.File;@484b70c0 should not equal null
12 INFO TEST-PASS | testANRReporter | ANR reporter wrote one ping - 1 should equal 1
13 INFO TEST-PASS | testANRReporter | Ping exists
14 INFO TEST-PASS | testANRReporter | Ping is a file
15 INFO TEST-PASS | testANRReporter | Ping is readable
16 INFO TEST-INFO | testANRReporter | Found ping file - /mnt/sdcard/tests/profile/saved-telemetry-pings/1b0d7e88-540c-4c49-a9ab-0759043f325e
17 INFO TEST-PASS | testANRReporter | Read from ping file
Exception caught during test!
org.json.JSONException: Unterminated string at character 2647 of {"reason":"android-anr-report","slug":"1b0d7e88-540c-4c49-a9ab-0759043f325e","payload":{"ver":1,"simpleMeasurements":{"uptime":0},"info":{"reason":"android-anr-report","OS":"Android","version":"8","appID":"{aa3c5121-dab2-40e2-81ca-7ea25febc110}","appVersion":"31.0a1","appName":"Fennec","appBuildID":"20140424145909","appUpdateChannel":"default","platformBuildID":"20140424145909","locale":"en-US","cpucount":2,"memsize":881,"arch":"armeabi-v7a","kernel_version":"2.6.32.9-00002-gd8084dc-dirty","device":"harmony","manufacturer":"nvidia","hardware":"nvidia"},"androidANR":"----- pid 1 at 2014-01-15 18:55:51 -----\nCmd line: org.mozilla.fennec\n\nJNI: CheckJNI is off; workarounds are off; pins=0; globals=397\n\nDALVIK THREADS:\n(mutexes: tll=0 tsl=0 tscl=0 ghl=0)\n\n\"main\" prio=5 tid=1 WAIT\n  | group=\"main\" sCount=1 dsCount=0 obj=0x41d6bc90 self=0x41d5a3c8\n  | sysTid=3485 nice=0 sched=0\/0 cgrp=apps handle=1074852180\n  | state=S schedstat=( 0 0 0 ) utm=1065 stm=152 core=0\n  at java.lang.Object.wait(Native Method)\n  - waiting on <0x427ab340> (a org.mozilla.gecko.GeckoEditable$5)\n  at java.lang.Object.wait(Object.java:364)\n  at org.mozilla.gecko.GeckoEditable$5.run(GeckoEditable.java:746)\n  at android.os.Handler.handleCallback(Handler.java:733)\n  at android.os.Handler.dispatchMessage(Handler.java:95)\n  at android.os.Looper.loop(Looper.java:137)\n  at android.app.ActivityThread.main(ActivityThread.java:4998)\n  at java.lang.reflect.Method.invokeNative(Native Method)\n  at java.lang.reflect.Method.invoke(Method.java:515)\n  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:777)\n  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:593)\n  at dalvik.system.NativeStart.main(Native Method)\n\n\"Gecko\" prio=5 tid=16 SUSPENDED\n  | group=\"main\" sCount=1 dsCount=0 obj=0x426e2b28 self=0x76ae92e8\n  | sysTid=3541 nice=0 sched=0\/0 cgrp=apps handle=1991153472\n  | state=S schedstat=( 0 0 0 ) utm=1118 stm=145 core=0\n  #00  pc 00000904  \/system\/lib\/libc.so (__futex_syscall3+4294832136)\n  #01  pc 0000eec4  \/system\/lib\/libc.so (__pthread_cond_timedwait_relative+48)\n  #02  pc 0000ef24  \/system\/lib\/libc.so (__pthread_cond_timedwait+64)\n  #03  pc 000536b7  \/system\/lib\/libdvm.so\n  #04  pc 00053c79  \/system\/lib\/libdvm.so (dvmChangeStatus(Thread*, ThreadStatus)+34)\n  #05  pc 00049507  \/system\/lib\/libdvm.so\n  #06  pc 0004d84b  \/system\/lib\/libdvm.so\n  #07  pc 0003f1df  \/dev\/ashmem\/libxul.so (deleted)\n  at org.mozilla.gecko.mozglue.GeckoLoader.nativeRun(Native Method)\n  at org.mozilla.gecko.GeckoAppShell.runGecko(GeckoAppShell.java:384)\n  at org.m
	at org.json.JSONTokener.syntaxError(JSONTokener.java:446)
	at org.json.JSONTokener.nextString(JSONTokener.java:226)
	at org.json.JSONTokener.nextValue(JSONTokener.java:103)
	at org.json.JSONTokener.readObject(JSONTokener.java:381)
	at org.json.JSONTokener.nextValue(JSONTokener.java:96)
	at org.json.JSONTokener.readObject(JSONTokener.java:381)
	at org.json.JSONTokener.nextValue(JSONTokener.java:96)
	at org.json.JSONObject.<init>(JSONObject.java:154)
	at org.json.JSONObject.<init>(JSONObject.java:171)
	at org.mozilla.gecko.tests.testANRReporter.testANRReporter(testANRReporter.java:173)
	at java.lang.reflect.Method.invokeNative(Native Method)
	at java.lang.reflect.Method.invoke(Method.java:521)
	at android.test.InstrumentationTestCase.runMethod(InstrumentationTestCase.java:204)
	at android.test.InstrumentationTestCase.runTest(InstrumentationTestCase.java:194)
	at android.test.ActivityInstrumentationTestCase2.runTest(ActivityInstrumentationTestCase2.java:186)
	at org.mozilla.gecko.tests.BaseTest.runTest(BaseTest.java:125)
	at junit.framework.TestCase.runBare(TestCase.java:127)
	at junit.framework.TestResult$1.protect(TestResult.java:106)
	at junit.framework.TestResult.runProtected(TestResult.java:124)
	at junit.framework.TestResult.run(TestResult.java:109)
	at junit.framework.TestCase.run(TestCase.java:118)
	at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:169)
	at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:154)
	at android.test.InstrumentationTestRunner.onStart(InstrumentationTestRunner.java:520)
	at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:1447)
18 INFO TEST-UNEXPECTED-FAIL | testANRReporter | Exception caught - org.json.JSONException: Unterminated string at character 2647 of {"reason":"android-anr-report","slug":"1b0d7e88-540c-4c49-a9ab-0759043f325e","payload":{"ver":1,"simpleMeasurements":{"uptime":0},"info":{"reason":"android-anr-report","OS":"Android","version":"8","appID":"{aa3c5121-dab2-40e2-81ca-7ea25febc110}","appVersion":"31.0a1","appName":"Fennec","appBuildID":"20140424145909","appUpdateChannel":"default","platformBuildID":"20140424145909","locale":"en-US","cpucount":2,"memsize":881,"arch":"armeabi-v7a","kernel_version":"2.6.32.9-00002-gd8084dc-dirty","device":"harmony","manufacturer":"nvidia","hardware":"nvidia"},"androidANR":"----- pid 1 at 2014-01-15 18:55:51 -----\nCmd line: org.mozilla.fennec\n\nJNI: CheckJNI is off; workarounds are off; pins=0; globals=397\n\nDALVIK THREADS:\n(mutexes: tll=0 tsl=0 tscl=0 ghl=0)\n\n\"main\" prio=5 tid=1 WAIT\n  | group=\"main\" sCount=1 dsCount=0 obj=0x41d6bc90 self=0x41d5a3c8\n  | sysTid=3485 nice=0 sched=0\/0 cgrp=apps handle=1074852180\n  | state=S schedstat=( 0 0 0 ) utm=1065 stm=152 core=0\n  at java.lang.Object.wait(Native Method)\n  - waiting on <0x427ab340> (a org.mozilla.gecko.GeckoEditable$5)\n  at java.lang.Object.wait(Object.java:364)\n  at org.mozilla.gecko.GeckoEditable$5.run(GeckoEditable.java:746)\n  at android.os.Handler.handleCallback(Handler.java:733)\n  at android.os.Handler.dispatchMessage(Handler.java:95)\n  at android.os.Looper.loop(Looper.java:137)\n  at android.app.ActivityThread.main(ActivityThread.java:4998)\n  at java.lang.reflect.Method.invokeNative(Native Method)\n  at java.lang.reflect.Method.invoke(Method.java:515)\n  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:777)\n  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:593)\n  at dalvik.system.NativeStart.main(Native Method)\n\n\"Gecko\" prio=5 tid=16 SUSPENDED\n  | group=\"main\" sCount=1 dsCount=0 obj=0x426e2b28 self=0x76ae92e8\n  | sysTid=3541 nice=0 sched=0\/0 cgrp=apps handle=1991153472\n  | state=S schedstat=( 0 0 0 ) utm=1118 stm=145 core=0\n  #00  pc 00000904  \/system\/lib\/libc.so (__futex_syscall3+4294832136)\n  #01  pc 0000eec4  \/system\/lib\/libc.so (__pthread_cond_timedwait_relative+48)\n  #02  pc 0000ef24  \/system\/lib\/libc.so (__pthread_cond_timedwait+64)\n  #03  pc 000536b7  \/system\/lib\/libdvm.so\n  #04  pc 00053c79  \/system\/lib\/libdvm.so (dvmChangeStatus(Thread*, ThreadStatus)+34)\n  #05  pc 00049507  \/system\/lib\/libdvm.so\n  #06  pc 0004d84b  \/system\/lib\/libdvm.so\n  #07  pc 0003f1df  \/dev\/ashmem\/libxul.so (deleted)\n  at org.mozilla.gecko.mozglue.GeckoLoader.nativeRun(Native Method)\n  at org.mozilla.gecko.GeckoAppShell.runGecko(GeckoAppShell.java:384)\n  at org.m
19 INFO TEST-END | testANRReporter | finished in 11792ms
20 INFO TEST-START | Shutdown
The test was failing because it didn't wait long enough for the ANR reporter to write the ping.

Since the fix is not affected, I pushed it again to catch the train,
https://hg.mozilla.org/integration/mozilla-inbound/rev/60b6bdaafba7
Keywords: leave-open
(In reply to Wes Kocher (:KWierso) from comment #14)
> I had to back this out again in
> https://hg.mozilla.org/integration/mozilla-inbound/rev/97fc047681a2 for
> different android rc1 orange:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=38514189&tree=Mozilla-
> Inbound

That was unrelated. See bug 977167 comment 12.
Flags: needinfo?(nchen)
The test failed because we did not wait long enough for the ping to finish writing. In that case we could be reading a partially-written ping and fail to parse it as a result. This patch makes sure that the ping is finished writing before continuing.
Attachment #8413824 - Flags: review?(blassey.bugs)
Attachment #8413824 - Flags: review?(blassey.bugs) → review+
Let's get an uplift to Fx31 as soon as we feel the code has stablized
(In reply to Mark Finkle (:mfinkle) from comment #19)
> Let's get an uplift to Fx31 as soon as we feel the code has stablized

Yup. The test was giving me trouble, but the fix itself is very low risk and can be uplifted ASAP.
Comment on attachment 8411929 [details] [diff] [review]
Don't notify observers in profiler when used by Android ANR reporter (v2)

[Approval Request Comment]

Bug caused by (feature/regressing bug #): Bug 993734

User impact if declined: Top crash

Testing completed (on m-c, etc.): Locally, m-c

Risk to taking this patch (and alternatives if risky): Very low risk; only affects Android

String or IDL/UUID changes made by this patch: None
Attachment #8411929 - Flags: approval-mozilla-aurora?
Attachment #8411929 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.