Closed Bug 1682638 Opened 4 years ago Closed 4 years ago

Frequent Beta Crash in [@ core::option::expect_failed | core::ops::function::FnOnce::call_once<T>]

Categories

(Toolkit :: Telemetry, defect, P1)


VERIFIED FIXED
86 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox83 --- unaffected
firefox84 --- unaffected
firefox85 blocking verified
firefox86 --- verified

People

(Reporter: aryx, Assigned: janerik)

Details

(Keywords: crash, regression, topcrash)

Attachments

(2 files)

This only affects Firefox 85.0b1 so far. It's frequent.

Crash report: https://crash-stats.mozilla.org/report/index/13e1daf2-1669-48ff-9114-90f0f0201215

MOZ_CRASH Reason: Global Glean object not initialized

Top 10 frames of crashing thread:

0 xul.dll RustMozCrash mozglue/static/rust/wrappers.cpp:16
1 xul.dll mozglue_static::panic_hook mozglue/static/rust/lib.rs:89
2 xul.dll core::ops::function::Fn::call<fn ../7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/ops/function.rs:227
3 xul.dll std::panicking::rust_panic_with_hook ../7eac88abb2e57e752f3302f02be5f3ce3d7adfb4//library/std/src/panicking.rs:581
4 xul.dll std::panicking::begin_panic_handler::{{closure}} ../7eac88abb2e57e752f3302f02be5f3ce3d7adfb4//library/std/src/panicking.rs:484
5 xul.dll std::sys_common::backtrace::__rust_end_short_backtrace<closure-0, !> ../7eac88abb2e57e752f3302f02be5f3ce3d7adfb4//library/std/src/sys_common/backtrace.rs:153
6 xul.dll std::panicking::begin_panic_handler ../7eac88abb2e57e752f3302f02be5f3ce3d7adfb4//library/std/src/panicking.rs:483
7 xul.dll core::panicking::panic_fmt ../7eac88abb2e57e752f3302f02be5f3ce3d7adfb4//library/core/src/panicking.rs:85
8 xul.dll core::option::expect_failed ../7eac88abb2e57e752f3302f02be5f3ce3d7adfb4//library/core/src/option.rs:1226
9 xul.dll core::ops::function::FnOnce::call_once<closure-0, tuple<>> ../7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/ops/function.rs:227
Flags: needinfo?(jrediger)
Flags: needinfo?(chutten)
Flags: needinfo?(jrediger) → needinfo?(alessio.placitelli)

#2 top crash so far in early 85.0b1 rollout

Jeepers, we need to get a better stack than this somehow.

Alrighty, so this crash can happen in one of three places: one in FOG and two in the Glean SDK.

FOG

FOG can hit the expect and crash with that message inside with_glean in the fog crate if it is called before init.

According to searchfox, with_glean (regardless of before or after init) is only called from datetimes, string lists, and timing distributions, and no metrics of those types exist to be called (see metrics.yaml).

Verdict: Likely isn't FOG
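
For readers unfamiliar with the pattern, here is a minimal sketch of how an expect on an unset global produces the "Global Glean object not initialized" panic. It is not the actual fog or Glean source: the `Glean` struct, its `upload_enabled` field, `setup_glean`, and `try_with_glean` are hypothetical names chosen for illustration, and the global is modeled with the standard library's `OnceLock`.

```rust
use std::sync::{Mutex, OnceLock};

/// Hypothetical stand-in for the real Glean object.
pub struct Glean {
    pub upload_enabled: bool,
}

/// Global instance; stays empty until initialization has completed.
static GLEAN: OnceLock<Mutex<Glean>> = OnceLock::new();

/// Stores the global instance. Returns false if it was already set.
pub fn setup_glean(glean: Glean) -> bool {
    GLEAN.set(Mutex::new(glean)).is_ok()
}

/// Panics with the message seen in the crash report when called before init.
pub fn with_glean<R>(f: impl FnOnce(&Glean) -> R) -> R {
    let glean = GLEAN.get().expect("Global Glean object not initialized");
    let guard = glean.lock().unwrap();
    f(&guard)
}

/// Non-panicking variant (an assumption, for contrast): returns None before init.
pub fn try_with_glean<R>(f: impl FnOnce(&Glean) -> R) -> Option<R> {
    let guard = GLEAN.get()?.lock().ok()?;
    Some(f(&guard))
}
```

Any caller that reaches `with_glean` before `setup_glean` has run takes the panicking path, which matches the `core::option::expect_failed` frame in the stack above.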

Glean SDK

Same thing, but both with_glean and with_glean_mut. A lot of these are dealt with by being launched by or happening after a block against the Dispatcher, which doesn't come out of its hole until after init.

There are a few exceptions, mostly inside named threads (like glean.uploader and glean.init), but the crashing thread in the report (37) is nameless.

There are a few suspicious cases around things like set_debug_view_tag, set_source_tags, and set_log_pings, where we only check whether initialize was called, not whether initialize has completed and the global Glean is present.

I'm at a loss. Jan-Erik and Alessio are the two I'd ask for help on this, and they'll be back tomorrow morning CET.
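
To illustrate the gap between "initialize was called" and "initialize has completed", here is a rough sketch under assumptions. `InitState`, `SetTag`, and every method name are hypothetical and are not the Glean SDK's API. The sketch only models a flag that is set early and a global that is set late.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Mutex, OnceLock};

/// Hypothetical model of the two pieces of initialization state.
#[derive(Default)]
pub struct InitState {
    /// Set as soon as initialization is entered.
    initialize_called: AtomicBool,
    /// Set only once initialization has finished; holds the debug tag here.
    glean: OnceLock<Mutex<Option<String>>>,
}

#[derive(Debug, PartialEq)]
pub enum SetTag {
    Applied,
    Deferred,
}

impl InitState {
    pub fn begin_initialize(&self) {
        self.initialize_called.store(true, Ordering::SeqCst);
    }

    pub fn finish_initialize(&self) {
        let _ = self.glean.set(Mutex::new(None));
    }

    /// Risky: checks only the flag, then assumes the global exists.
    pub fn set_tag_flag_only(&self, tag: &str) -> SetTag {
        if !self.initialize_called.load(Ordering::SeqCst) {
            return SetTag::Deferred;
        }
        let glean = self.glean.get().expect("Global Glean object not initialized");
        *glean.lock().unwrap() = Some(tag.to_string());
        SetTag::Applied
    }

    /// Safer: checks for the object itself.
    pub fn set_tag_checked(&self, tag: &str) -> SetTag {
        match self.glean.get() {
            Some(glean) => {
                *glean.lock().unwrap() = Some(tag.to_string());
                SetTag::Applied
            }
            None => SetTag::Deferred,
        }
    }
}
```

In this model, a call that lands between `begin_initialize` and `finish_initialize` panics in the flag-only variant and is deferred in the checked one.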

Flags: needinfo?(chutten) → needinfo?(jrediger)

I think I know what's going on, looking at the other threads:

Thread 0 is calling glean::shutdown(), so all this is happening while the browser is being shut down and we, quite correctly, try to shut down Glean.
I assume what is happening is that the browser starts, FOG is not initialized (for whatever reason), then we try to shut down, which unblocks the queue before actually ending it. If the queue contains any recordings, they will try to access the Glean object.

If we never flushed the queue, we should skip it on shutdown instead of flushing it now. This will require some changes to the dispatcher; I'm taking a look now.
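
As a rough illustration of the idea (not the actual dispatcher change), here is a single-threaded sketch of a dispatcher with a pre-init queue that drops queued tasks on shutdown if it was never flushed. `Dispatcher`, `launch`, `flush_init`, and `shutdown` are hypothetical names for this model; the real dispatcher runs tasks on a worker thread.

```rust
/// Hypothetical, single-threaded model of a dispatcher with a pre-init queue.
pub struct Dispatcher {
    preinit_queue: Vec<Box<dyn FnOnce()>>,
    flushed: bool,
}

impl Dispatcher {
    pub fn new() -> Self {
        Dispatcher {
            preinit_queue: Vec::new(),
            flushed: false,
        }
    }

    /// Runs the task right away after init, otherwise queues it.
    pub fn launch(&mut self, task: impl FnOnce() + 'static) {
        if self.flushed {
            task();
        } else {
            self.preinit_queue.push(Box::new(task));
        }
    }

    /// Called once init has completed: runs everything queued so far.
    pub fn flush_init(&mut self) {
        self.flushed = true;
        for task in self.preinit_queue.drain(..) {
            task();
        }
    }

    /// On shutdown, queued tasks are dropped without running if init never
    /// completed, so they cannot touch a missing global object.
    /// Returns the number of tasks that were skipped.
    pub fn shutdown(&mut self) -> usize {
        if self.flushed {
            return 0;
        }
        let skipped = self.preinit_queue.len();
        self.preinit_queue.clear();
        skipped
    }
}
```

The trade-off in this sketch is that recordings queued before init are lost when shutdown comes first, which is preferable to a crash.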

Flags: needinfo?(jrediger)
Assignee: nobody → jrediger
Priority: -- → P1
Crash Signature: [@ core::option::expect_failed | core::ops::function::FnOnce::call_once<T>] → [@ core::option::expect_failed | core::ops::function::FnOnce::call_once<T>] [@ core::option::expect_failed | core::ops::function::FnOnce::call_once{{vtable.shim}}]

Hello! It seems that I can reproduce the crash on Windows 10x64/x86 with Firefox 85.0b2 (20201215185920) by restarting or closing Firefox while opened with this profile: link

This is one of the crash reports: link

(In reply to Alexandru Trif, QA [:atrif] from comment #4)

Hello! It seems that I can reproduce the crash on Windows 10x64/x86 with Firefox 85.0b2 (20201215185920) by restarting or closing Firefox while opened with this profile: link

This is one of the crash reports: link

Thanks, that will be helpful! Was this profile newly created or did that profile exist before upgrading to 85 beta 2?

Flags: needinfo?(alessio.placitelli) → needinfo?(alexandru.trif)

(In reply to Jan-Erik Rediger [:janerik] (Away 2020-12-21 to 2021-01-04) from comment #5)

(In reply to Alexandru Trif, QA [:atrif] from comment #4)

Hello! It seems that I can reproduce the crash on Windows 10x64/x86 with Firefox 85.0b2 (20201215185920) by restarting or closing Firefox while opened with this profile: link

This is one of the crash reports: link

Thanks, that will be helpful! Was this profile newly created or did that profile exist before upgrading to 85 beta 2?

This was newly created today with 85.0b2, from what I can remember, and only some YouTube navigation was performed while testing PiP functionality on Windows 10x86. I first observed this when I closed Firefox, and then saw that it is reproducible when restarting it too. I then transferred the profile, used it on two other machines with Windows 10x64, and could reproduce the crash there as well. If more information is needed, please let me know. Thank you!

Flags: needinfo?(alexandru.trif)

Thanks, that information and the profile were very helpful, and I have at least identified why this leads to crashes such as the one above.
For some reason the database file is empty, and that is not properly handled by FOG. (Why the database file is empty is another mystery I need to solve.)

FWIW I'd like to get a fix or backout in beta today or tomorrow if at all possible, so we don't leave this crash there over the holiday break.

(In reply to Julien Cristau [:jcristau] from comment #9)

FWIW I'd like to get a fix or backout in beta today or tomorrow if at all possible, so we don't leave this crash there over the holiday break.

We're working on it. The above pull request was finished yesterday and has been reviewed; I also kicked off some try runs last night. I will now take this to m-c and prepare it to land and then be uplifted.

Thanks, much appreciated.

Comment on attachment 9193713 [details]
Bug 1682638 - Update to Glean v33.9.1. r?dexter

Beta/Release Uplift Approval Request

  • User impact if declined: Potential crashes on shutdown.

(it's about to land in Nightly, so it's only verified manually by me on a local build there for now)

  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The changes are covered by automated tests in Glean (external repository); I also did manual tests on a local build.
  • String changes made/needed:
Attachment #9193713 - Flags: approval-mozilla-beta?

Comment on attachment 9193713 [details]
Bug 1682638 - Update to Glean v33.9.1. r?dexter

topcrash fix, approved for 85.0b3

Attachment #9193713 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 86 Branch
QA Whiteboard: [qa-triaged]

Hello! Verified the fix using the profile attached in comment 4 with Firefox 85.0b3 (20201217185930) and 86.0a1 (20201217214927) on Windows 10x86/x64 and Windows 7x64. Firefox no longer crashes when closing or restarting while opened with the mentioned profile. On Windows 10x64 I also updated from 85.0b2 to 85.0b3 while using the profile and restarted/closed Firefox after the update was applied, and no crashes were encountered with 85.0b3.

Status: RESOLVED → VERIFIED
Flags: qe-verify+