ANR executing service org.mozilla.firefox/org.mozilla.gecko.GeckoService
Categories
(Firefox for Android Graveyard :: General, defect, P2)
Tracking
(firefox63+ wontfix, firefox64 wontfix, firefox65 fixed, firefox66+ fixed)
People
(Reporter: marcia, Assigned: snorp)
References
Details
(Keywords: regression, Whiteboard: [geckoview:p2])
Attachments
(2 files)
47 bytes, text/x-phabricator-request
47 bytes, text/x-phabricator-request (RyanVM: approval-mozilla-beta+)
Seen while looking at Google Play information - executing service org.mozilla.firefox/org.mozilla.gecko.GeckoService comes up as affecting users. It seems in the past we had a similar bug in the 51 cycle, Bug 1347875 which was eventually resolved invalid. Similar to Bug 1501449, the affected devices seem to be all Samsung, but not all are running Android 8.0.
Comment 1•6 years ago
Susheel, can we get this bug prioritized and assigned? Thanks
Sure, using the new prioritization per :lizzard. Marcia let me know if you think this should be more than a P2.
Comment 3•6 years ago
Sylvestre, this is hard for the team to investigate if they can't see the Play Store reports. Can we give access at some level to Petru Lingurar, Vlad Baicu, and Andrei Lazar? Thanks!
Comment 4•6 years ago
Done! Next time, please open a separate bug for this as this isn't related to the ANR issue.
Comment 5•6 years ago
Interestingly, all such ANRs started to appear on October 2nd and only in 62.0.3, so I think it is a regression introduced by one of these patches: https://hg.mozilla.org/releases/mozilla-release/pushloghtml?fromchange=FIREFOX_62_0_2_RELEASE&tochange=FIREFOX_62_0_3_RELEASE
Comment 6•6 years ago
When GeckoService() is destroyed we block its onDestroy() until Gecko handles OnPause() [1]. The issue is that the standard ANR timeout expires before that happens. In GPC I see that main is not blocked, but it waits for that Gecko call to finish. And then I see that the timeout expired on GeckoLoader.nativeRun():
> "Gecko" prio=5 tid=13 Native
> | group="main" sCount=1 dsCount=0 flags=1 obj=0x134008a8 self=0xdbdb1e00
> | sysTid=5993 nice=0 cgrp=default sched=0/0 handle=0xc8b48970
> | state=S schedstat=( 6406333549 1738291105 9010 ) utm=562 stm=78 core=1 HZ=100
> | stack=0xc8a46000-0xc8a48000 stackSize=1038KB
> | held mutexes=
> #00 pc 0000000000018eac /system/lib/libc.so (syscall+28)
> #01 pc 000000000004846d /system/lib/libc.so (_ZL24__pthread_cond_timedwaitP23pthread_cond_internal_tP15pthread_mutex_tbPK8timespec+102)
> #02 pc 0000000000052b53 /data/app/org.mozilla.firefox-8GK6Ju2ILM1wm2ZiMN9TjQ==/lib/arm/libmozglue.so (???)
> at org.mozilla.gecko.mozglue.GeckoLoader.nativeRun (Native method)
> at org.mozilla.gecko.GeckoThread.run (GeckoThread.java:498)
[1] https://dxr.mozilla.org/mozilla-central/rev/b3da3f53f8042d6e2e8f90cd0086e354d96ba2fc/mobile/android/base/java/org/mozilla/gecko/GeckoService.java#106
Comment 7•6 years ago
There's a massive drop in this ANR's occurrences after October 8th, the date 63 was pushed to all users. This ANR does not show up on the latest release; all its occurrences are still tied to the 62.0.3 release. I think this may have been resolved with the latest release, but I will keep the ticket open to continue tracking it.
Comment 8•6 years ago
Petru will follow up and make sure this is no longer showing up in 63.
Comment 9•6 years ago
Wontfix for 63 because of comment #7 + we are unlikely to have another dot release before 64 ships in 3 weeks from now.
Comment 10•6 years ago
That ANR signature doesn't appear for 63 anymore because the code was refactored, so the ANR migrated to:
> executing service org.mozilla.firefox/org.mozilla.gecko.GeckoServicesCreatorService
It has the same underlying source:
> #01 pc 0000000000047b37 /system/lib/libc.so (_ZL24__pthread_cond_timedwaitP23pthread_cond_internal_tP15pthread_mutex_tbPK8timespec+102)
> #02 pc 000000000005138b /data/app/org.mozilla.firefox-Z4u94PeFNgRgfv6QoB49Ew==/lib/arm/libmozglue.so (???)
> at org.mozilla.gecko.GeckoThread.waitOnGecko (Native method)
> at org.mozilla.gecko.GeckoService.onDestroy (GeckoService.java:106)
As I understand it, GeckoThread.waitOnGecko() just adds a new event to be processed after the previous call on GeckoThread finishes, effectively blocking the current thread (main) until the previous event, GeckoThread.onPause() [1], is handled. James, can someone from Gecko take a look at why GeckoThread.onPause() takes so long to complete?
[1] https://dxr.mozilla.org/mozilla-central/source/mobile/android/base/java/org/mozilla/gecko/GeckoService.java#101
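For readers unfamiliar with the pattern described in this comment, here is a minimal sketch of a "barrier event" on a single-threaded event queue. It is not the real GeckoThread code: `EventThreadSketch`, `post`, and `waitForQueue` are hypothetical names, and a plain Java executor stands in for the Gecko thread. It only illustrates why a slow earlier event (such as the pause handler) stalls whoever waits on the barrier.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the Gecko event thread; not the real GeckoThread API.
final class EventThreadSketch {
    // A single worker thread, so queued events run strictly in order.
    private final ExecutorService queue = Executors.newSingleThreadExecutor();

    // Queue an event on the worker thread and return immediately.
    void post(Runnable event) {
        queue.execute(event);
    }

    // Block the caller until every previously queued event has run.
    // Returns false if the caller was interrupted while waiting.
    boolean waitForQueue() {
        CountDownLatch barrier = new CountDownLatch(1);
        // Marker event: it can only run after all earlier events have finished.
        queue.execute(barrier::countDown);
        try {
            // No timeout here, so a slow earlier event stalls the caller indefinitely.
            barrier.await();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Let the worker thread exit once its queue is empty.
    void shutdown() {
        queue.shutdown();
    }
}
```

If the caller is the Android main thread, the time spent inside `waitForQueue()` counts against the ANR limit, which matches the stack shown above.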
Assignee | ||
Comment 11•6 years ago
The Gecko thread is blocking on the UI thread for some reason. We have a few spots where we have to do this, and therefore it's not good to block the UI thread on Gecko. We should try to remove the waitOnGecko() calls where possible. Maybe we could add a timeout to waitOnGecko()?
Assignee | ||
Comment 12•6 years ago
That waitOnGecko() was introduced by bug 1260243
> ... so that Gecko is more likely to be in a consistent state if Android kills our process
so I think removing it is not ideal. I tried to add a timeout (we cannot block for more than 5 seconds [1]) on just the Java side, but a proper solution would involve modifying the native method, which I currently don't know how to do. Unassigning and NIing James to find a more suitable owner.
[1] https://developer.android.com/training/articles/perf-anr
Assignee | ||
Comment 14•5 years ago
All of the current usage can survive a timeout, and we'd rather that than a deadlock. Future code that does want to risk a deadlock can call `GeckoThread.waitOnGeckoForever` instead.
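The design described here (time out by default, with an explicit opt-in for an unbounded wait) could look roughly like the following sketch. This is an illustration only, not the landed patch: the class name, method names, and the 4000 ms default are assumptions, and the real change also touches the native side of `GeckoThread.waitOnGecko()`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and the default timeout are assumptions.
final class TimedWaitSketch {
    // Assumed default, picked to stay under the 5 second limit quoted in comment 12.
    static final long DEFAULT_TIMEOUT_MS = 4000;

    // Single worker thread standing in for the Gecko thread.
    private final ExecutorService queue = Executors.newSingleThreadExecutor();

    void post(Runnable event) {
        queue.execute(event);
    }

    // Wait for earlier events, giving up after timeoutMs.
    // Returns true if the queue drained in time, false on timeout or interrupt.
    boolean waitOnQueue(long timeoutMs) {
        CountDownLatch barrier = new CountDownLatch(1);
        queue.execute(barrier::countDown);
        try {
            return barrier.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Default behaviour: a bounded wait.
    boolean waitOnQueue() {
        return waitOnQueue(DEFAULT_TIMEOUT_MS);
    }

    // Explicit opt-in for callers that accept the risk of never returning.
    boolean waitOnQueueForever() {
        CountDownLatch barrier = new CountDownLatch(1);
        queue.execute(barrier::countDown);
        try {
            barrier.await();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    void shutdown() {
        queue.shutdown();
    }
}
```

Returning a boolean lets each caller decide what to do when the wait times out; whether the real method reports the timeout this way is not stated in the bug.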
Comment 15•5 years ago
Pushed by jwillcox@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9bca860f26dc Make GeckoThread.waitOnGecko() time out by default. r=geckoview-reviewers,esawin
Comment 16•5 years ago
bugherder |
https://hg.mozilla.org/mozilla-central/rev/9bca860f26dc
Comment 17•5 years ago
Maybe worth considering for a Beta uplift after it's had some Nightly bake time.
Assignee | ||
Comment 19•5 years ago
According to the play store this isn't fixed.
Assignee | ||
Comment 20•5 years ago
All of the stacks I've seen on the play store have the Gecko thread waiting in VsyncSource.java:51. We can probably just make that part async.
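One way to "make that part async" is to post the UI-thread setup instead of blocking on it, relying on queue ordering so that later requests run only after setup has finished. The sketch below assumes the wait at VsyncSource.java:51 is for initialization that must run on the UI thread; the bug does not spell this out. All names are hypothetical and a single-threaded executor stands in for the Android UI thread.

```java
import java.util.concurrent.ExecutorService;

// Hypothetical sketch; it does not mirror the real VsyncSource class.
final class AsyncVsyncSketch {
    private final ExecutorService uiThread;  // stand-in for the Android UI thread
    private volatile boolean initialized;    // written only on the UI thread
    private volatile boolean observing;      // written only on the UI thread

    AsyncVsyncSketch(ExecutorService uiThread) {
        this.uiThread = uiThread;
        // Post the setup and return at once instead of waiting for it to finish.
        uiThread.execute(() -> initialized = true);
    }

    // Never blocks the caller. The request is queued behind the setup task,
    // so it always runs after initialization on a FIFO single-threaded queue.
    void setObserving(boolean enable) {
        uiThread.execute(() -> {
            if (!initialized) {
                throw new IllegalStateException("setup task must run first");
            }
            observing = enable;
        });
    }

    boolean isObserving() {
        return observing;
    }
}
```

Because the caller no longer waits on the UI thread, a UI thread that is itself waiting on Gecko can no longer form a cycle through this path.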
Assignee | ||
Comment 21•5 years ago
Comment 22•5 years ago
Pushed by jwillcox@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d3b95f808e28 Avoid synchronous wait when creating VsyncSource r=geckoview-reviewers,droeh#geckoview-reviewers
Comment 23•5 years ago
bugherder |
Hi James, could you please nominate this patch for uplift to Beta65 given https://bugzilla.mozilla.org/show_bug.cgi?id=1518285#c2? Thanks!
Assignee | ||
Comment 25•5 years ago
Comment on attachment 9034470 [details]
Bug 1501748 - Avoid synchronous wait when creating VsyncSource r?#geckoview-reviewers
[Beta/Release Uplift Approval Request]
Feature/Bug causing the regression: Bug 1432019
User impact if declined: Occasional hangs in Fennec/GeckoView
Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: No
Needs manual test from QE?: No
If yes, steps to reproduce: We don't have reliable STR.
List of other uplifts needed: None
Risk to taking this patch: Medium
Why is the change risky/not risky? (and alternatives if risky): Well-understood path, but does involve some thread interaction issues.
String changes made/needed: None
Comment 26•5 years ago
Comment on attachment 9034470 [details]
Bug 1501748 - Avoid synchronous wait when creating VsyncSource r?#geckoview-reviewers
[Triage Comment]
The Fennec 65 ANR rate is super high at the moment. I'm going to uplift this and respin 65.0b9 so we can get some quick feedback on impact. We can also spin a b10 later this week if this patch proves insufficient.
Comment 27•5 years ago
bugherder uplift |
Comment 28•5 years ago
https://hg.mozilla.org/projects/cedar/rev/d3b95f808e2878849127b8f4fc43527202eecf2a Bug 1501748 - Avoid synchronous wait when creating VsyncSource r=geckoview-reviewers,droeh#geckoview-reviewers