Closed Bug 1556083 Opened 5 years ago Closed 5 years ago

High ANR rate in 68.0b5 for the arm32 apk

Categories

(Firefox for Android Graveyard :: General, defect, P1)

Firefox 68
Unspecified
Android
defect

Tracking

(firefox67 unaffected, firefox67.0.1 unaffected, firefox68+ verified, firefox69+ verified)

VERIFIED FIXED
Firefox 69
Tracking Status
firefox67 --- unaffected
firefox67.0.1 --- unaffected
firefox68 + verified
firefox69 + verified

People

(Reporter: marcia, Assigned: petru)

References

Details

(Keywords: regression, reproducible)

Attachments

(2 files)

While reviewing the GPC I noticed that there was a notification that the ANR rate was above the accepted threshold. Given the fact that we are on the cusp of going to ESR mode, I thought it was important to file and track this issue.

The top issues in the cluster are:

  • Input dispatching timed out (org.mozilla.firefox_beta/org.mozilla.gecko.BrowserApp, Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 2. Wait queue head age: 12660.8ms.)
  • Broadcast of Intent { act=android.intent.action.SCREEN_OFF flg=0x50200010 (has extras) }

Assigning P1 for investigation.

Priority: -- → P1

It looks as if 2015630993 (68.0) is the version that has the high ANR rate, not 2015630995 (68.0), so we may be OK here, assuming the second one is the latest version.

Both of those are 68.0b5. 2015630993 is the version code for the arm32 apk and 2015630995 is the arm64 one (while 2015630997 is x86_32 and 2015630999 is x86_64).
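For anyone triaging Play console reports by version code, the mapping above can be written down as a small lookup. This is only an illustrative sketch: the class and method names are hypothetical, and the table contains just the four 68.0b5 codes listed in this comment, with no assumption about how codes for other builds are derived.

```java
import java.util.Map;

class VersionCodeSketch {
    // The four 68.0b5 version codes quoted in the comment above.
    static final Map<Integer, String> ABI_BY_VERSION_CODE = Map.of(
            2015630993, "arm32",
            2015630995, "arm64",
            2015630997, "x86_32",
            2015630999, "x86_64");

    // Returns the ABI for a known version code, or "unknown" otherwise.
    static String abiFor(int versionCode) {
        return ABI_BY_VERSION_CODE.getOrDefault(versionCode, "unknown");
    }

    public static void main(String[] args) {
        System.out.println(abiFor(2015630993));
    }
}
```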

Petru or Andrei - If you are able to see info in GPS, any ideas on what might be going on with the arm32 apk?

Flags: needinfo?(petru.lingurar)
Flags: needinfo?(andrei.a.lazar)
Summary: High ANR rate in 68.0b5 → High ANR rate in 68.0b5 for the arm32 apk

Hello,
Tried to investigate this but we don't currently have access to beta's developer console.
Filed bug 1556439 for that.

Flags: needinfo?(petru.lingurar)
Flags: needinfo?(andrei.a.lazar)
Attached file anr.txt

Hi!
I can reproduce this issue on Beta 68.0b7 with Motorola Nexus 6 (Android 7.1.1) by clearing data and restarting Fennec.

Keywords: reproducible

I checked the Play console and this issue is still present. Petru - Have you been able to take a look since you now have access? Thanks.

Flags: needinfo?(petru.lingurar)

On it.

Assignee: nobody → petru.lingurar
Status: NEW → ASSIGNED
Flags: needinfo?(petru.lingurar)

Classic deadlock situation, possible because getDatabaseHelperForProfile(..)
would lock on [PerProfileDatabase] and then try to lock on [GeckoProfile] while at
the same time it would be possible for another thread which already had the
[GeckoProfile] lock to call this method and so try to acquire the
[PerProfileDatabase] lock.

The simplest solution, and the one I went with, is to ensure that
one of those threads does not need both locks. It turns out that the
getDatabaseHelperForProfile method can easily be refactored to use only the
GeckoProfile lock, a change which does not significantly enlarge the block of
code synchronized on the same key.
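The idea of guarding the helper cache with the profile lock alone can be sketched roughly as follows. This is not the landed patch: PROFILE_LOCK, databasePathFor and the String "helpers" are hypothetical stand-ins chosen so the example runs on a plain JVM, and only the method name getDatabaseHelperForProfile comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;

class SingleLockSketch {
    // Hypothetical stand-in for the GeckoProfile lock; the only monitor used here.
    static final Object PROFILE_LOCK = new Object();

    // Helper cache, guarded by PROFILE_LOCK rather than by a lock of its own.
    private final Map<String, String> helpers = new HashMap<>();

    // Hypothetical profile lookup that needs the profile lock.
    static String databasePathFor(String profile) {
        synchronized (PROFILE_LOCK) {
            return "/profiles/" + profile + "/browser.db";
        }
    }

    String getDatabaseHelperForProfile(String profile) {
        synchronized (PROFILE_LOCK) {
            String helper = helpers.get(profile);
            if (helper == null) {
                // Re-entering PROFILE_LOCK is safe because Java monitors are reentrant.
                helper = "helper:" + databasePathFor(profile);
                helpers.put(profile, helper);
            }
            return helper;
        }
    }

    // One thread already holds the profile lock when it calls in, the other does not.
    static boolean bothThreadsFinish() {
        SingleLockSketch dbs = new SingleLockSketch();
        Thread holder = new Thread(() -> {
            synchronized (PROFILE_LOCK) {
                dbs.getDatabaseHelperForProfile("default");
            }
        });
        Thread other = new Thread(() -> dbs.getDatabaseHelperForProfile("default"));
        holder.setDaemon(true);
        other.setDaemon(true);
        holder.start();
        other.start();
        try {
            holder.join(2000);
            other.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !holder.isAlive() && !other.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads finished: " + bothThreadsFinish());
    }
}
```

With a single monitor there is no second lock to wait for, so a caller that already holds the profile lock can no longer form a cycle with one that does not.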

It turns out that in both cases "main" was trying to queuePersistAllTabs() and for this it had to wait for the GeckoProfile lock, which was already part of a deadlock between two other threads:

  • "GeckoBackgroundThread" held the GeckoProfile lock and needed the PerProfileDatabases lock.
  • "A background thread" held the PerProfileDatabases lock and needed the GeckoProfile lock.
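The lock cycle described above can be reproduced in isolation on a desktop JVM. The sketch below is illustrative only: the two lock objects are hypothetical stand-ins for the monitors named in the ANR trace, the thread names are borrowed from it, and java.lang.management is used for detection, which is a desktop JVM API rather than something Fennec itself relies on.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockCycleSketch {
    // Starts a daemon thread that takes `first`, waits until both threads
    // hold their first lock, and then asks for `second`.
    static Thread lockInOrder(String name, Object first, Object second,
                              CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached once the cycle has formed.
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit despite the stuck threads
        t.start();
        return t;
    }

    // Returns how many threads the JVM reports as deadlocked on monitors.
    static int countDeadlockedThreads() {
        // Hypothetical stand-ins for the two monitors in the trace.
        Object geckoProfileLock = new Object();
        Object perProfileDatabasesLock = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        lockInOrder("GeckoBackgroundThread", geckoProfileLock,
                perProfileDatabasesLock, bothHoldFirst);
        lockInOrder("A background thread", perProfileDatabasesLock,
                geckoProfileLock, bothHoldFirst);

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 50; attempt++) {
            long[] ids = threads.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                break;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

Any third thread that later asks for either monitor, as "main" did here, blocks behind the cycle without being part of it, which is why the symptom surfaced as an input dispatching timeout rather than an obvious hang in the background threads.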

Keywords: checkin-needed

We'll want to uplift this deadlock fix to Fennec 68, though we might need to postpone the uplift until after 68 ESR has been branched.

Pushed by jcristau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2cada2586c93
Resolve deadlock by using just one lock, not two; r=VladBaicu

Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 69

[Tracking Requested - why for this release]:

We will want to uplift this Fennec crash fix to the ESR 68 branch for the Fennec 68.1 release.

I don't think this is a recent regression, just a fluke that affected some users. In bug 1554660, I think Eliza and Mira could reproduce this same issue fairly often.
Could you please test to see if that problem is resolved?

Flags: qe-verify+

(In reply to Chris Peterson [:cpeterson] from comment #15)

> [Tracking Requested - why for this release]:
>
> We will want to uplift this Fennec crash fix to the ESR 68 branch for the Fennec 68.1 release.

To be honest, I'd almost consider it for 68.0... Petru, do you think this is safe enough?

Flags: needinfo?(petru.lingurar)

The provided solution was simple enough, without changing any logic, so I don't see a risk of regressions.
The ANRs are pretty bad, so the sooner we can resolve them, the better the experience our users will have.
I was thinking of waiting for validation from QA on bug 1554660, which I think had this same cause; with that we'll have even more reasons and confidence to uplift this.

Flags: needinfo?(petru.lingurar)

Eliza, you said you could reproduce this in comment 6. Can you check again with the latest 69 apk off mozilla-central?

Flags: needinfo?(eliza.balazs)

Hi!
I tested this with the latest 69 apk from mozilla-central, Nightly 69.0a1 (2019-06-26) with Motorola Nexus 6 (Android 7.1.1), Sony Xperia Z5 Premium (Android 7.1.1), Motorola Moto G6 (Android 8) and I could not reproduce the issue.
Due to my findings I will mark this as verified on Firefox 69.
Thanks!

Flags: qe-verify+
Flags: needinfo?(eliza.balazs)

Comment on attachment 9073521 [details]
Bug 1556083 - Resolve deadlock by using just one lock, not two; r?VladBaicu

Beta/Release Uplift Approval Request

  • User impact if declined: Potential "Application Not Responding"
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: Clean start of the app
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Not risky as it is a very small change, verified by QA.
  • String changes made/needed:
Attachment #9073521 - Flags: approval-mozilla-beta?
Flags: qe-verify+

Comment on attachment 9073521 [details]
Bug 1556083 - Resolve deadlock by using just one lock, not two; r?VladBaicu

fix for increased ANR rate, approved for beta68

Attachment #9073521 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Hello, I can confirm that the issue is not reproducible on Beta 68.0b14 using Motorola Nexus 6 (Android 7.1.1) and Motorola Moto G6 (Android 8). Due to my findings, I will mark this as verified.
Thanks!

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Commenting here since this was brought up in the Channel meeting. Since this issue, it doesn't appear that we have had another warning about a high ANR rate. I do still see some instances of "Input dispatching timed out" errors in the current production, but we seem to have those in every production release.

Product: Firefox for Android → Firefox for Android Graveyard