Closed Bug 2033229 Opened 15 days ago Closed 14 days ago

Crash in [@ mozilla.appservices.viaduct.ViaductException$SetBackendException: at mozilla.appservices.viaduct.FfiConverterTypeViaductError.read(viaduct.kt:21)]

Categories

(Focus :: General, defect)

Unspecified
Android
defect

Tracking

(relnote-firefox 150+, firefox149 unaffected, firefox150+ fixed, firefox151+ fixed)

RESOLVED FIXED
151 Branch
Tracking Status
relnote-firefox --- 150+
firefox149 --- unaffected
firefox150 + fixed
firefox151 + fixed

People

(Reporter: dholbert, Assigned: jonalmeida)

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

Crash report: https://crash-stats.mozilla.org/report/index/3dae114e-7ad2-43f0-97c2-3a1e90260418

Top 10 frames:

0  mozilla.appservices.viaduct.FfiConverterTypeViaductError  read  viaduct.kt:21
1  mozilla.appservices.viaduct.FfiConverterTypeViaductError  read  viaduct.kt:1
2  mozilla.appservices.viaduct.FfiConverter  liftFromRustBuffer  viaduct.kt:14
3  mozilla.appservices.viaduct.FfiConverterTypeViaductError  liftFromRustBuffer  viaduct.kt:2
4  mozilla.appservices.viaduct.FfiConverterTypeViaductError  liftFromRustBuffer  viaduct.kt:1
5  mozilla.appservices.viaduct.FfiConverterRustBuffer  lift  viaduct.kt:2
6  mozilla.appservices.viaduct.FfiConverterTypeViaductError  lift  viaduct.kt:3
7  mozilla.appservices.viaduct.FfiConverterTypeViaductError  lift  viaduct.kt:2
8  mozilla.appservices.viaduct.ViaductException$ErrorHandler  lift  viaduct.kt:2
9  mozilla.appservices.viaduct.ViaductException$ErrorHandler  lift  viaduct.kt:1

This crash signature has started spiking in the past 2-3 days; there are essentially zero crashes before 3 days ago, and then 100-300 crashes per day for the past 2-3 days.

I've hit this twice myself. Both times, I wasn't actually using Focus; I just noticed an Android notification telling me that Focus had crashed.

A few debugging notes I wrote for myself that I'll leave here to make future me less sad.

  • Q. Is the cause related to using applicationScope?
    • Bug 2003023 - Refactor manual CoroutineScope usage to lifecycle-aware scopes
    • No, because we've been setting the RustHttpConfig from an IO dispatcher since 2021 (see: link)
  • Sentry has some more symbolicated crash stats, like this one that suggests it's a remote settings worker syncing.
    • On Jan 19th the events appear to have doubled from the previous day, but that is misleading: Sentry error reporting only goes back 90 days.
      • Goes back at least as far as 147.
  • Q. Why does this not occur on Fenix?
    • Fenix guards against this by blocking the onCreate from finishing until megazordDeferred is complete.
    • We should do similar work to bug 2005839 for Focus as well.
  • Q. Why did this occur now?
    • We are calling RustHttpConfig.setClient on a background dispatcher that races with the work manager remote settings task.
    • Bug 2020606 removed eager initialization of WorkManager - this meant that we were getting a full Application#onCreate initialization happening first before the work manager task was scheduled.
  • I was here before: https://github.com/mozilla-mobile/focus-android/issues/5781
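
To make the Fenix-style guard from the notes above concrete, here is a minimal sketch. It assumes a hypothetical `FakeHttpConfig` holder and a hypothetical `onCreateBlocking` function; neither is the real appservices or Fenix API, and `CompletableFuture` stands in for the coroutine `Deferred` mentioned in the notes.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for the real HTTP config holder; not the appservices API.
object FakeHttpConfig {
    private val client = AtomicReference<String?>(null)

    fun setClient(value: String) = client.set(value)

    // Mirrors the failure mode: using the backend before a client is set throws.
    fun fetch(url: String): String {
        val current = client.get() ?: throw IllegalStateException("backend not set")
        return "$current fetched $url"
    }
}

// Hypothetical startup in the Fenix style: the setup runs on another thread,
// but this function does not return until that setup has completed.
fun onCreateBlocking() {
    val setupDeferred = CompletableFuture.runAsync {
        FakeHttpConfig.setClient("client")
    }
    setupDeferred.join() // block until the client is in place
}

fun main() {
    onCreateBlocking()
    // A worker that starts after onCreateBlocking returns can rely on the client.
    println(FakeHttpConfig.fetch("https://example.invalid/settings"))
}
```

Without the `join()` call, a worker scheduled right after startup could call `fetch` before `setClient` has run, which is the shape of the race described above.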
Assignee: nobody → jonalmeida942
Status: NEW → ASSIGNED

This should be safe to do because setClient doesn't perform any tasks
when the reference to the client is set. Removing this from the IO
coroutine ensures we do this work immediately.
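
As a rough illustration of the approach described in this paragraph (not the actual patch), the sketch below sets the client reference inline and leaves only slower work on a background executor. `SketchHttpConfig` and `startUp` are hypothetical names, and the sleep is a placeholder for unrelated asynchronous setup.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical holder standing in for RustHttpConfig; the real API differs.
object SketchHttpConfig {
    @Volatile
    private var client: String? = null

    // Only stores a reference, so it is cheap to call during startup.
    fun setClient(value: String) {
        client = value
    }

    fun isReady(): Boolean = client != null
}

// Hypothetical startup: the reference assignment happens inline, while
// slower setup stays on the background executor.
fun startUp(background: ExecutorService) {
    SketchHttpConfig.setClient("client")     // synchronous, done before returning
    background.execute { Thread.sleep(50) }  // placeholder for slow async work
}

fun main() {
    val background = Executors.newSingleThreadExecutor()
    startUp(background)
    // Any task scheduled after startUp returns observes the client.
    println(SketchHttpConfig.isReady())
    background.shutdown()
    background.awaitTermination(1, TimeUnit.SECONDS)
}
```

Because the assignment no longer depends on when a background dispatcher gets scheduled, a worker started after `startUp` returns cannot observe a missing client.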

I opted for this solution instead of awaiting on the coroutine task
similar to FenixApplication, because there is a Nimbus initialization
task which performs a network fetch and the implications of that are a bit
larger than I would like for a crash fix; this needs to be uplift-able.

Future work should be done similar to FenixApplication where we make
the startup flow a lot more obvious - see Bug 2005839.

Dianna, can we pause the 150 release for Focus until we can get this patch uplifted? The crash rate is already quite high at the current state of the rollout at 5%.

Flags: needinfo?(dsmith)
Status: ASSIGNED → RESOLVED
Closed: 14 days ago
Resolution: --- → FIXED
Target Milestone: --- → 151 Branch

firefox-beta Uplift Approval Request

  • User impact if declined/Reason for urgency: This stops a background crash from occurring almost every day for Focus 150 users. Currently at this moment, the Focus 150 release is at 5% of our user base on the Play Store with 178 crash counts so far, which is quite high.

QE validation is not straightforward, so it's not recommended: running Focus at least once and then waiting 24 hours may reproduce the crash, but since it's a race, that would not be sufficient verification.

We would be better off observing the crash data instead.

  • Code covered by automated testing?: no
  • Fix verified in Nightly?: no
  • Needs manual QE testing?: no
  • Steps to reproduce for manual QE testing:
  • Risk associated with taking this patch: low
  • Explanation of risk level: The patch takes away the asynchronous nature in the startup process and makes it synchronous for simplicity.
  • String changes made/needed?: No.
  • Is Android affected?: yes
Attachment #9571473 - Flags: approval-mozilla-beta?

Original Revision: https://phabricator.services.mozilla.com/D295156

(In reply to Phabricator Automation from comment #6)

Currently at this moment, the Focus 150 release is at 5% of our user base on the Play Store with 178 crash counts so far, which is quite high

Much more than 178 crashes, right now at least - crash-stats shows that there have been 1118 crashes so far in the 150.0 release (tapping the crash signature at the top of this bug, and then "aggregations" and aggregating by version).

Focus/Klar 150.0 have been halted at 5%.

Flags: needinfo?(dsmith)
Attachment #9571473 - Flags: approval-mozilla-beta? → approval-mozilla-release?
Attachment #9571473 - Flags: approval-mozilla-release? → approval-mozilla-release+