Crash in [@ mozilla.appservices.viaduct.ViaductException$SetBackendException: at mozilla.appservices.viaduct.FfiConverterTypeViaductError.read(viaduct.kt:21)]
Categories
(Focus :: General, defect)
Tracking
(relnote-firefox 150+, firefox149 unaffected, firefox150+ fixed, firefox151+ fixed)
People
(Reporter: dholbert, Assigned: jonalmeida)
Details
(Keywords: crash)
Crash Data
Attachments
(2 files)
- 48 bytes, text/x-phabricator-request
- 48 bytes, text/x-phabricator-request (phab-bot: approval-mozilla-release+)
Crash report: https://crash-stats.mozilla.org/report/index/3dae114e-7ad2-43f0-97c2-3a1e90260418
Top 10 frames:
0 mozilla.appservices.viaduct.FfiConverterTypeViaductError read viaduct.kt:21
1 mozilla.appservices.viaduct.FfiConverterTypeViaductError read viaduct.kt:1
2 mozilla.appservices.viaduct.FfiConverter liftFromRustBuffer viaduct.kt:14
3 mozilla.appservices.viaduct.FfiConverterTypeViaductError liftFromRustBuffer viaduct.kt:2
4 mozilla.appservices.viaduct.FfiConverterTypeViaductError liftFromRustBuffer viaduct.kt:1
5 mozilla.appservices.viaduct.FfiConverterRustBuffer lift viaduct.kt:2
6 mozilla.appservices.viaduct.FfiConverterTypeViaductError lift viaduct.kt:3
7 mozilla.appservices.viaduct.FfiConverterTypeViaductError lift viaduct.kt:2
8 mozilla.appservices.viaduct.ViaductException$ErrorHandler lift viaduct.kt:2
9 mozilla.appservices.viaduct.ViaductException$ErrorHandler lift viaduct.kt:1
This crash signature started spiking in the past 2-3 days: there were essentially zero crashes before 3 days ago, and 100-300 crashes per day since then.
I've hit it twice myself. Both times, I wasn't actually using Focus; I just noticed an Android notification telling me that Focus had crashed.
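The frames above are the error-lifting path of UniFFI-style Kotlin bindings: the Rust side returns an error serialized into a buffer, and the Kotlin converter reads a variant index and constructs the matching exception subclass, which is then thrown. The sketch below is a simplified, hypothetical model of that path, not the generated viaduct.kt code. Only the SetBackendException name comes from this bug; OtherException, the buffer layout, the "backend not set" message and all helper names are assumptions for illustration.

```kotlin
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical, simplified error hierarchy. Only SetBackendException comes from
// the crash signature; OtherException is an assumed extra variant.
sealed class ViaductException(message: String) : Exception(message) {
    class SetBackendException(message: String) : ViaductException(message)
    class OtherException(message: String) : ViaductException(message)
}

// Simplified stand-in for FfiConverterTypeViaductError.read(): an assumed layout of
// a 1-based variant index followed by a length-prefixed UTF-8 message.
object ViaductErrorReaderSketch {
    fun read(buf: ByteBuffer): ViaductException {
        val variant = buf.getInt()
        val bytes = ByteArray(buf.getInt())
        buf.get(bytes)
        val message = String(bytes, StandardCharsets.UTF_8)
        // The exception, and therefore its JVM stack trace, is created here.
        // That is why the top frame of the crash points at read() rather than
        // at the call site that triggered the Rust error.
        return when (variant) {
            1 -> ViaductException.SetBackendException(message)
            2 -> ViaductException.OtherException(message)
            else -> throw IllegalArgumentException("unknown variant $variant")
        }
    }
}

// Builds a buffer in the same hypothetical layout, standing in for the Rust side.
fun encodeViaductError(variant: Int, message: String): ByteBuffer {
    val bytes = message.toByteArray(StandardCharsets.UTF_8)
    val buf = ByteBuffer.allocate(8 + bytes.size)
    buf.putInt(variant).putInt(bytes.size).put(bytes)
    buf.flip()
    return buf
}
```

For example, `ViaductErrorReaderSketch.read(encodeViaductError(1, "backend not set"))` yields a SetBackendException whose first stack frame is `read`, matching the shape of frame 0 above.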
Comment 1•15 days ago (Assignee; tagged "important")
A few debugging notes I wrote for myself, which I'll leave here to make future me less sad.
- Q. Is the cause related to using applicationScope?
  - Bug 2003023 - Refactor manual CoroutineScope usage to lifecycle-aware scopes
  - No, because we've been setting the RustHttpConfig from an IO dispatcher since 2021 (see: link)
- Sentry has some more symbolicated crash stats, like this one, which suggests it's the remote settings worker syncing.
  - On Jan 19th it seems like the events doubled since the previous day, but this is untrue - Sentry error reporting only goes back 90 days.
  - Goes back at least as far as 147.
- Q. Why does this not occur on Fenix?
  - Fenix guards against this by blocking onCreate from finishing until megazordDeferred is complete.
  - We should do similar work to bug 2005839 for Focus as well.
- Q. Why did this occur now?
  - We are calling RustHttpConfig.setClient on a background dispatcher, which races with the WorkManager remote settings task.
  - Bug 2020606 removed eager initialization of WorkManager - this meant that we were getting a full Application#onCreate initialization happening first, before the WorkManager task was scheduled.
- I was here before: https://github.com/mozilla-mobile/focus-android/issues/5781
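The race described in these notes can be modeled without Android: startup hands the "set the HTTP client" step to a background thread, while a worker in the same process may issue a request first. The sketch below is a simplified, hypothetical model. HttpConfigSketch, fetchSettings(), the "backend not set" message and both startup functions are illustrative stand-ins, not the RustHttpConfig or viaduct API. A latch forces the bad interleaving so the demo is deterministic. The second function models the Fenix-style guard, in a simplified form where "onCreate" blocks on the initialization task before anything else can run.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for RustHttpConfig: one process-wide client slot.
object HttpConfigSketch {
    @Volatile private var client: String? = null

    fun setClient(c: String) { client = c }
    fun reset() { client = null }

    // Models a request made by the remote settings worker.
    fun fetchSettings(): String {
        val c = client ?: throw IllegalStateException("backend not set")
        return "settings via $c"
    }
}

// Unguarded startup: setClient runs on a background thread, so the worker can
// run first. The latch forces that ordering instead of relying on timing.
fun startupWithoutGuard(): Result<String> {
    HttpConfigSketch.reset()
    val pool = Executors.newFixedThreadPool(2)
    val workerFinished = CountDownLatch(1)
    pool.execute {
        workerFinished.await()            // delay init until the worker has run
        HttpConfigSketch.setClient("client")
    }
    val outcome = pool.submit(Callable {
        runCatching { HttpConfigSketch.fetchSettings() }
            .also { workerFinished.countDown() }
    }).get()
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
    return outcome
}

// Fenix-style guard (simplified): "onCreate" waits for the initialization task,
// so the worker can only start once the client is set.
fun startupWithGuard(): Result<String> {
    HttpConfigSketch.reset()
    val pool = Executors.newFixedThreadPool(2)
    pool.submit(Runnable { HttpConfigSketch.setClient("client") }).get()
    val outcome = pool.submit(Callable {
        runCatching { HttpConfigSketch.fetchSettings() }
    }).get()
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
    return outcome
}
```

The unguarded variant returns a failure carrying IllegalStateException, and the guarded one returns "settings via client". The cost of the guard is that startup waits for everything bundled into the awaited task.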
Comment 2•15 days ago (Assignee)
This should be safe to do because setClient doesn't perform any work when it sets the reference to the client. Removing this from the IO coroutine ensures we do this work immediately.
I opted for this solution instead of awaiting the coroutine task, as FenixApplication does, because there is a Nimbus initialization task which performs a network fetch, and the implications of that are a bit larger than I would like for a crash fix; this needs to be uplift-able.
Future work should follow FenixApplication and make the startup flow a lot more obvious - see Bug 2005839.
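The change described in this commit message can be sketched as follows, under its stated assumption that setting the client only stores a reference. Everything here is hypothetical: ClientSlotSketch, FocusAppSketch and slowNetworkInit() are illustrative names, not Focus code, and the real patch may be structured differently. The point is the ordering: the cheap step completes before onCreate returns, while slower work (standing in for the Nimbus fetch mentioned above) stays asynchronous and is not awaited.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical process-wide client slot, standing in for RustHttpConfig.
object ClientSlotSketch {
    @Volatile var client: String? = null
}

// Hypothetical application class; names do not come from the Focus code base.
class FocusAppSketch(
    private val io: ExecutorService = Executors.newSingleThreadExecutor(),
) {
    fun onCreate() {
        // Cheap step, done synchronously: it has happened before onCreate returns.
        ClientSlotSketch.client = "client"

        // Slower startup work stays on a background thread and is not awaited,
        // so onCreate does not block on the network.
        io.execute { slowNetworkInit() }
    }

    private fun slowNetworkInit() {
        Thread.sleep(50) // placeholder for a network fetch
    }

    fun shutdown() {
        io.shutdown()
        io.awaitTermination(5, TimeUnit.SECONDS)
    }
}
```

Compared with awaiting the whole initialization task as FenixApplication does, this keeps onCreate from waiting on the network, at the cost of leaving the startup ordering implicit.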
Comment 3•15 days ago (Assignee)
Dianna, can we pause the Focus 150 release until we can get this patch uplifted? The crash rate is already quite high at the current 5% rollout.
Comment 5•14 days ago (bugherder)
Comment 6•13 days ago
firefox-beta Uplift Approval Request
- User impact if declined/Reason for urgency: This stops a background crash from occurring almost every day for Focus 150 users. Currently at this moment, the Focus 150 release is at 5% of our user base on the Play Store with 178 crash counts so far, which is quite high.
QE validation is not straightforward, so it's not recommended: running Focus at least once and then waiting 24 hours may reproduce the crash, but since it's a race, that would not be sufficient verification.
We would be better off observing the crash data instead.
- Code covered by automated testing?: no
- Fix verified in Nightly?: no
- Needs manual QE testing?: no
- Steps to reproduce for manual QE testing:
- Risk associated with taking this patch: low
- Explanation of risk level: The patch takes away the asynchronous nature in the startup process and makes it synchronous for simplicity.
- String changes made/needed?: No.
- Is Android affected?: yes
Comment 7•13 days ago (Assignee)
Original Revision: https://phabricator.services.mozilla.com/D295156
Comment 8•13 days ago (Reporter)
(In reply to Phabricator Automation from comment #6)
Currently at this moment, the Focus 150 release is at 5% of our user base on the Play Store with 178 crash counts so far, which is quite high
Much more than 178 crashes, right now at least: crash-stats shows that there have been 1118 crashes so far in the 150.0 release (tap the crash signature at the top of this bug, then "aggregations", and aggregate by version).
Comment 10•10 days ago (uplift)