Closed
Bug 1501449
Opened 7 years ago
Closed 7 years ago
Context.startForegroundService() did not then call Service.startForeground()
Categories
(Firefox for Android Graveyard :: General, defect, P1)
Tracking
(firefox63+ fixed, firefox64 fixed, firefox65 fixed)
RESOLVED
FIXED
Firefox 65
People
(Reporter: lizzard, Assigned: petru)
References
Details
(Keywords: regression, Whiteboard: --do_not_change--[priority:high])
Attachments
(1 file)
|
46 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-release+
|
Details | Review |
This hang is spiking with 63 release with 2300 reports today affecting 1600 users.
It seems to be spread fairly evenly across different devices.
| Reporter | ||
Updated•7 years ago
|
Comment 1•7 years ago
|
||
Looks as if the top affected devices are all Samsung Galaxy devices:
Galaxy S8 (dreamlte)
Galaxy S9+ (star2qltesq)
Galaxy S7 (herolte)
Galaxy Note8 (greatqlte)
Galaxy S9 (starqltesq)
Comment 2•7 years ago
|
||
I forgot to add all the users were running Android 8.0.
Updated•7 years ago
|
| Assignee | ||
Comment 3•7 years ago
|
||
I cannot find recent reports for this
https://crash-stats.mozilla.com/topcrashers/?product=FennecAndroid&version=63.0&days=3&_facets_size=50
| Assignee | ||
Updated•7 years ago
|
Assignee: nobody → petru.lingurar
Status: NEW → ASSIGNED
Updated•7 years ago
|
Comment 4•7 years ago
|
||
(In reply to Petru-Mugurel Lingurar[:petru] from comment #3)
> I cannot find recent reports for this
> https://crash-stats.mozilla.com/topcrashers/
> ?product=FennecAndroid&version=63.0&days=3&_facets_size=50
It is possible that somehow a crash report isn't getting generated. Sometimes we see things in GPC that don't necessarily show up in crash stats.
Comment 5•7 years ago
|
||
Context.startForegroundService() did not then call Service.startForeground() also shows up as an issue in beta - just checked the Google Play Console. There is also another similar error above that one in beta:
Context.startForegroundService() did not then call Service.startForeground(): ServiceRecord{ef8fc2c u0 org.mozilla.firefox_beta/org.mozilla.gecko.CrashReporterService}
| Assignee | ||
Comment 6•7 years ago
|
||
As I understand this was flagged by Google Play right after the 63 launch.
But it doesn't appear in our crash reports (hint 1).
There are 6 cases in which we call `startForegroundService()` [1]
In 4 of those cases `startForeground()` immediately follows in Service's `onStartCommand()` so this should work nicely.
In two cases `startForegroundService()` is used to start CrashHandlerService which does not call `startForeground()` with a Notification [2].
We knew this and decided that it was better to omit it so to not bug the user and also because CrashHandlerService's work takes very little time.
Because we are not posting a notification for CrashHandlerService's background work, after ~5 seconds ActivityManager posts an ANR in Android's log with the message `Context.startForegroundService() did not then call Service.startForeground()`. Nothing gets through to the user though.
I think this might be the cause for this recent errors. They haven't appeared before the release and they don't appear in our crash stats because they are transparent to the user. But Google Play still logs them because they do appear in Android's log.
I saw that :snorp updated the crash reporting API in bug 1483329 and now `CrashHandlerService` calls `System.exit(0)` to shutdown the JVM before the ANR [3].
To not wait for 64 we could
- bring only this small change down to 63
- actually use a user visible Notification. This would need it's own separate Notification Channel though and I think will need to be supported in the future (although there would be no need for the Notification in 64 with the updated crash reporting API).
Because of this I think we should go with just bringing down that small change: terminate the app immediately after `CrashHandlerService` finishes it's work and so avoid that ANR.
Please keep in mind that this would be a speculative fix as I do not have any reports/stacktrace/STR.
[1] https://dxr.mozilla.org/mozilla-central/search?q=startForegroundService&redirect=true
[2] https://hg.mozilla.org/releases/mozilla-release/file/84219fbf133c/mobile/android/geckoview/src/main/java/org/mozilla/gecko/CrashReporterService.java#l80
[3] https://dxr.mozilla.org/mozilla-central/source/mobile/android/base/java/org/mozilla/gecko/CrashHandlerService.java#21
| Assignee | ||
Comment 7•7 years ago
|
||
(In reply to Marcia Knous [:marcia - needinfo? me] from comment #5)
> Context.startForegroundService() did not then call Service.startForeground()
> also shows up as an issue in beta - just checked the Google Play Console.
> There is also another similar error above that one in beta:
>
> Context.startForegroundService() did not then call
> Service.startForeground(): ServiceRecord{ef8fc2c u0
> org.mozilla.firefox_beta/org.mozilla.gecko.CrashReporterService}
And here is the reference to to CrashReporterService as I suspected.
So I think there are so many of this errors because this would be posted in Android's log for each and every error our crash reporter catches. But they do not get through to the user.
Still, to keep GP's Console clean I think that before FF 64 which should avoid this situation we could go on the same route and terminate the app immediately after `CrashReporterService` finishes its work to avoid that ANR.
Comment 8•7 years ago
|
||
There is a report at the bottom of the page where this is called out, it starts with what is below, but there are also further stacks - if it is useful I can cut and paste and add as an attachment:
"main" tid=1 Native
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x7329c4d8 self=0xeb9f5000
| sysTid=24993 nice=0 cgrp=default sched=0/0 handle=0xefe0c4bc
| state=S schedstat=( 88368963 1142576 103 ) utm=5 stm=3 core=1 HZ=100
| stack=0xff0be000-0xff0c0000 stackSize=8MB
| held mutexes=
#00 pc 000000000004a090 /system/lib/libc.so (__epoll_pwait+20)
#01 pc 000000000001b77d /system/lib/libc.so (epoll_pwait+60)
#02 pc 000000000001b7ad /system/lib/libc.so (epoll_wait+12)
#03 pc 0000000000010343 /system/lib/libutils.so (_ZN7android6Looper9pollInnerEi+118)
#04 pc 0000000000010235 /system/lib/libutils.so (_ZN7android6Looper8pollOnceEiPiS1_PPv+32)
#05 pc 00000000000e4ba1 /system/lib/libandroid_runtime.so (_ZN7android18NativeMessageQueue8pollOnceEP7_JNIEnvP8_jobjecti+24)
#06 pc 00000000001a627d /system/framework/arm/boot-framework.oat (Java_android_os_MessageQueue_nativePollOnce__JI+92)
at android.os.MessageQueue.nativePollOnce (Native method)
at android.os.MessageQueue.next (MessageQueue.java:325)
at android.os.Looper.loop (Looper.java:142)
at android.app.ActivityThread.main (ActivityThread.java:6944)
at java.lang.reflect.Method.invoke (Native method)
at com.android.internal.os.Zygote$MethodAndArgsCaller.run (Zygote.java:327)
at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:1374)
| Assignee | ||
Comment 9•7 years ago
|
||
This service is started as a foreground service.
But because it does little work it doesn't post a Notification to bug the user
unneededly.
Although all it's work is done in < 5 seconds, after them passing the
ActivityManager will post an ANR error to logcat because we aren't calling
Service.startForeground() and actually posting a Notification.
Although nothing gets through to the user this errors are registered in the
Google Play console.
To prevent them we terminate the currently running JVM immedately after
CrashReporterService's work is completed, which is under those ~5 seconds.
The call for this is put into onUnbind() because in my tests if that call is put
at the end of onHandleWork() the JVM would be shutdown before the JobScheduler
would have had the chance to unbind and so after some time the service would be
binded again, and onHandleWork would be executed again.
onUbind() is called immediately after onHandleWork() finishes, after JobScheduler
unbinds, so this seems the earliest possible moment to terminate.
Comment 10•7 years ago
|
||
This moved back into the top ANR spot on the GPC and we received another alert on 10-27.
| Assignee | ||
Comment 11•7 years ago
|
||
This ANR is logged in Google Play console for every crash caught by our crash reporter.
But it should not get through to the user.
(A simple test would be going to "about:crashparent" . This would crash the browser and display the CrashReporterActivity which allows to user to enter details about the crash. After ~5 seconds the ANR would be posted in logs, but this would not cause any other issue to the user)
Comment 12•7 years ago
|
||
(In reply to Petru-Mugurel Lingurar[:petru] from comment #11)
> This ANR is logged in Google Play console for every crash caught by our
> crash reporter.
> But it should not get through to the user.
> (A simple test would be going to "about:crashparent" . This would crash the
> browser and display the CrashReporterActivity which allows to user to enter
> details about the crash. After ~5 seconds the ANR would be posted in logs,
> but this would not cause any other issue to the user)
Is there a reason it primarily is happening on Android 8?
Comment 13•7 years ago
|
||
Comment 14•7 years ago
|
||
Randall, we would like to have this patch in a Fennec dot release in the days to come, can you give us a status update on the review of the patch? Thanks
Flags: needinfo?(rbarker)
Updated•7 years ago
|
Flags: needinfo?(rbarker)
| Assignee | ||
Comment 15•7 years ago
|
||
Comment on attachment 9020360 [details]
Bug 1501449 - Terminate the JVM immediately after CrashReporterService finishes to prevent ANR; r?sdaswani
[Beta/Release Uplift Approval Request]
Feature/Bug causing the regression: Bug 1467461
User impact if declined: Nothing should be visible to the user but the Google Play Console would be clogged with ANR reports.
Is this code covered by automated tests?: No
Has the fix been verified in Nightly?: No
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): Low risk because it is a small change, tested locally.
It mimics what is already done on Beta 64 with no reported issues.
String changes made/needed:
Attachment #9020360 -
Flags: approval-mozilla-release?
| Reporter | ||
Comment 16•7 years ago
|
||
Marking this fixed for 64 and 65 from comment 6 -- it was fixed in bug 1483329.
We still need to land the patch here on m-r.
status-firefox64:
--- → fixed
status-firefox65:
--- → fixed
Comment 17•7 years ago
|
||
Comment on attachment 9020360 [details]
Bug 1501449 - Terminate the JVM immediately after CrashReporterService finishes to prevent ANR; r?sdaswani
Minimal patch already on nightly and beta with no regression, approved for our next Fennec dot release thanks.
[Triage Comment]
Attachment #9020360 -
Flags: approval-mozilla-release? → approval-mozilla-release+
Updated•7 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 18•7 years ago
|
||
| bugherder uplift | ||
| Reporter | ||
Updated•7 years ago
|
Priority: -- → P1
Updated•5 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•