Closed Bug 1501748 Opened 6 years ago Closed 5 years ago

ANR executing service org.mozilla.firefox/org.mozilla.gecko.GeckoService

Categories

(Firefox for Android Graveyard :: General, defect, P2)

Firefox 63
Unspecified
Android
defect

Tracking

(firefox63+ wontfix, firefox64 wontfix, firefox65 fixed, firefox66+ fixed)

RESOLVED FIXED
Firefox 66
Tracking Status
firefox63 + wontfix
firefox64 --- wontfix
firefox65 --- fixed
firefox66 + fixed

People

(Reporter: marcia, Assigned: snorp)

References

Details

(Keywords: regression, Whiteboard: [geckoview:p2])

Attachments

(2 files)

Seen while looking at Google Play information: the ANR "executing service org.mozilla.firefox/org.mozilla.gecko.GeckoService" comes up as affecting users. We had a similar bug in the 51 cycle, Bug 1347875, which was eventually resolved as invalid.

Similar to Bug 1501449, the affected devices seem to be all Samsung, but not all are running Android 8.0.
Susheel, can we get this bug prioritized and assigned? Thanks
Flags: needinfo?(sdaswani)
Sure, using the new prioritization per :lizzard. Marcia let me know if you think this should be more than a P2.
Flags: needinfo?(sdaswani)
Priority: -- → P2
Sylvestre, this is hard for the team to investigate if they can't see the Play Store reports. Can we give access at some level to Petru Lingurar, Vlad Baicu, and Andrei Lazar? Thanks!
Flags: needinfo?(sledru)
Done! Next time, please open a separate bug for this as this isn't related to the ANR issue.
Flags: needinfo?(sledru)
Assignee: nobody → petru.lingurar
Status: NEW → ASSIGNED
Interestingly, all such ANRs started to appear on October 2nd, and only in 62.0.3.
So I think it is a regression introduced by one of these patches: https://hg.mozilla.org/releases/mozilla-release/pushloghtml?fromchange=FIREFOX_62_0_2_RELEASE&tochange=FIREFOX_62_0_3_RELEASE
When GeckoService is destroyed we block its onDestroy() until Gecko handles onPause() [1].
The issue is that the standard ANR timeout expires before that happens.
In GPC I see that main is not blocked itself; it is waiting for that Gecko call to finish.
And then I see that the timeout expired on GeckoLoader.nativeRun():
> "Gecko" prio=5 tid=13 Native
>        | group="main" sCount=1 dsCount=0 flags=1 obj=0x134008a8 self=0xdbdb1e00
>        | sysTid=5993 nice=0 cgrp=default sched=0/0 handle=0xc8b48970
>        | state=S schedstat=( 6406333549 1738291105 9010 ) utm=562 stm=78 core=1 HZ=100
>        | stack=0xc8a46000-0xc8a48000 stackSize=1038KB
>        | held mutexes=
>  #00  pc 0000000000018eac  /system/lib/libc.so (syscall+28)
>  #01  pc 000000000004846d  /system/lib/libc.so (_ZL24__pthread_cond_timedwaitP23pthread_cond_internal_tP15pthread_mutex_tbPK8timespec+102)
>  #02  pc 0000000000052b53  /data/app/org.mozilla.firefox-8GK6Ju2ILM1wm2ZiMN9TjQ==/lib/arm/libmozglue.so (???)
>        at org.mozilla.gecko.mozglue.GeckoLoader.nativeRun (Native method)
>        at org.mozilla.gecko.GeckoThread.run (GeckoThread.java:498)

[1] https://dxr.mozilla.org/mozilla-central/rev/b3da3f53f8042d6e2e8f90cd0086e354d96ba2fc/mobile/android/base/java/org/mozilla/gecko/GeckoService.java#106
There's a massive drop in this ANR's occurrences after October 8th, the date 63 was pushed to all users.
This ANR does not show on the latest release; all its occurrences are still tied to the 62.0.3 release.

I think this may have been resolved with the latest release but will keep the ticket open to continue tracking it.
Petru will follow up and make sure this is no longer showing up in 63.
Wontfix for 63 because of comment #7 + we are unlikely to have another dot release before 64 ships in 3 weeks from now.
That ANR signature doesn't appear for 63 anymore as the code was refactored so the ANR migrated to:
> executing service org.mozilla.firefox/org.mozilla.gecko.GeckoServicesCreatorService
with the same underlying source:
>  #01  pc 0000000000047b37  /system/lib/libc.so (_ZL24__pthread_cond_timedwaitP23pthread_cond_internal_tP15pthread_mutex_tbPK8timespec+102)
>  #02  pc 000000000005138b  /data/app/org.mozilla.firefox-Z4u94PeFNgRgfv6QoB49Ew==/lib/arm/libmozglue.so (???)
>        at org.mozilla.gecko.GeckoThread.waitOnGecko (Native method)
>        at org.mozilla.gecko.GeckoService.onDestroy (GeckoService.java:106)

As I understand it, GeckoThread.waitOnGecko() just adds a new event to be processed after the previous call on GeckoThread finishes, effectively blocking the current thread (Main) until the previous event, GeckoThread.onPause() [1], is handled.

James, can someone from Gecko take a look at why GeckoThread.onPause() takes so long to complete?

[1] https://dxr.mozilla.org/mozilla-central/source/mobile/android/base/java/org/mozilla/gecko/GeckoService.java#101
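
For readers following along, here is a minimal sketch of the blocking pattern described above. It is not the real GeckoThread code (waitOnGecko() is a native method); the class `QueueBarrierSketch` and its methods are hypothetical, and a single-threaded executor stands in for the Gecko event queue. The caller enqueues a marker event and blocks until the marker runs, so it cannot return before everything queued earlier has finished.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for the Gecko thread's event queue; not the real GeckoThread. */
public class QueueBarrierSketch {
    // A single worker thread runs posted events strictly in posting order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Queues an event, e.g. the work triggered by onPause(). */
    public void post(Runnable event) {
        worker.execute(event);
    }

    /** Blocks the caller until every event posted before this call has finished. */
    public void waitOnQueue() {
        CountDownLatch done = new CountDownLatch(1);
        // The marker runs only after all earlier events have completed.
        worker.execute(done::countDown);
        try {
            done.await(); // no timeout: a stalled earlier event stalls the caller too
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public void shutdown() {
        worker.shutdown();
    }
}
```

If the caller is the Android main thread and an earlier event takes too long, the unbounded await() is where the ANR timeout would be hit.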
Flags: needinfo?(snorp)
The Gecko thread is blocking on the UI thread for some reason. We have a few spots where we have to do this, and therefore it's not good to block the UI thread on Gecko. We should try to remove the waitOnGecko() calls if possible. Maybe we could add a timeout to waitOnGecko()?
Flags: needinfo?(snorp)
Whiteboard: [geckoview]
That waitOnGecko() was introduced by bug 1260243
> ... so that Gecko is more likely to be in a consistent state if Android kills our process
so I think removing it is not ideal.

I tried to add a timeout (we cannot block for more than 5 seconds [1]) on the Java side only, but a proper solution would involve modifying the native method, which I currently don't know how to do.

Unassigning and NIing James to find a more suitable owner.

[1] https://developer.android.com/training/articles/perf-anr
Flags: needinfo?(snorp)
Assignee: petru.lingurar → snorp
Whiteboard: [geckoview] → [geckoview:p2]
I'm working on a patch here.
Flags: needinfo?(snorp)
All of the current usage can survive a timeout, and we'd rather
that than a deadlock. Future code that does want to risk a
deadlock can call `GeckoThread.waitOnGeckoForever` instead.
Pushed by jwillcox@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9bca860f26dc
Make GeckoThread.waitOnGecko() time out by default. r=geckoview-reviewers,esawin
https://hg.mozilla.org/mozilla-central/rev/9bca860f26dc
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 66
Maybe worth considering for a Beta uplift after it's had some Nightly bake time.
Want to request uplift to beta?
Flags: needinfo?(snorp)
According to the play store this isn't fixed.
Status: RESOLVED → REOPENED
Flags: needinfo?(snorp)
Resolution: FIXED → ---
All of the stacks I've seen on the play store have the Gecko thread waiting in VsyncSource.java:51. We can probably just make that part async.
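
A generic sketch of what "make that part async" could look like, under heavy assumptions: this is not the real VsyncSource code, and the class `AsyncInitSketch`, its fields, and its methods are all hypothetical. The idea is that the constructor only posts the UI-thread setup and returns, and later requests are posted to the same executor so they run after the setup instead of the caller waiting for it.

```java
import java.util.concurrent.Executor;

/** Hypothetical sketch of non-blocking setup on another thread; not the real VsyncSource. */
public class AsyncInitSketch {
    private final Executor uiThread;     // stand-in for "run this on the UI thread"
    private volatile Object uiResource;  // written only by tasks posted to uiThread
    private volatile boolean observing;  // written only by tasks posted to uiThread

    public AsyncInitSketch(Executor uiThread) {
        this.uiThread = uiThread;
        // Post the setup and return immediately; the calling thread never waits.
        uiThread.execute(() -> uiResource = new Object());
    }

    /** Requests a state change; it is applied on the UI thread, after the setup task. */
    public void setObserving(boolean enable) {
        uiThread.execute(() -> {
            // In-order execution means the setup task has already run here.
            observing = enable && uiResource != null;
        });
    }

    public boolean isObserving() {
        return observing;
    }
}
```

This relies on the executor running tasks in posting order on one thread, which is the assumption that replaces the synchronous wait.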
Pushed by jwillcox@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d3b95f808e28
Avoid synchronous wait when creating VsyncSource r=geckoview-reviewers,droeh#geckoview-reviewers
Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

Hi James, could you please nominate this patch for uplift to Beta65 given https://bugzilla.mozilla.org/show_bug.cgi?id=1518285#c2? Thanks!

Flags: needinfo?(snorp)

Comment on attachment 9034470 [details]
Bug 1501748 - Avoid synchronous wait when creating VsyncSource r?#geckoview-reviewers

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: Bug 1432019

User impact if declined: Occasional hangs in Fennec/GeckoView

Is this code covered by automated tests?: Yes

Has the fix been verified in Nightly?: No

Needs manual test from QE?: No

If yes, steps to reproduce: We don't have reliable STR.

List of other uplifts needed: None

Risk to taking this patch: Medium

Why is the change risky/not risky? (and alternatives if risky): Well-understood path, but does involve some thread interaction issues.

String changes made/needed: None

Flags: needinfo?(snorp)
Attachment #9034470 - Flags: approval-mozilla-beta?

Comment on attachment 9034470 [details]
Bug 1501748 - Avoid synchronous wait when creating VsyncSource r?#geckoview-reviewers

[Triage Comment]
The Fennec 65 ANR rate is super high at the moment. I'm going to uplift this and respin 65.0b9 so we can get some quick feedback on impact. We can also spin a b10 later this week if this patch proves insufficient.

Attachment #9034470 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/projects/cedar/rev/d3b95f808e2878849127b8f4fc43527202eecf2a
Bug 1501748 - Avoid synchronous wait when creating VsyncSource r=geckoview-reviewers,droeh#geckoview-reviewers
Product: Firefox for Android → Firefox for Android Graveyard