Closed Bug 1611770 Opened 4 months ago Closed 4 months ago

Crash in [@ core::option::expect_failed | chrono::offset::local::tm_to_datetime]

Categories

(Data Platform and Tools :: Glean: SDK, defect)

Unspecified
Windows 8
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: gsvelto, Unassigned)

Details

(Keywords: crash, Whiteboard: [telemetry:glean-rs:m11])

Crash Data

This bug is for crash report bp-d46acdc6-0696-498d-874f-9fbe70200125.

Top 10 frames of crashing thread:

0 xul.dll RustMozCrash mozglue/static/rust/wrappers.cpp:16
1 xul.dll mozglue_static::panic_hook mozglue/static/rust/lib.rs:89
2 xul.dll core::ops::function::Fn::call<fn src/libcore/ops/function.rs:227
3 xul.dll std::panicking::rust_panic_with_hook src/libstd/panicking.rs:477
4 xul.dll std::panicking::continue_panic_fmt src/libstd/panicking.rs:380
5 xul.dll std::panicking::rust_begin_panic src/libstd/panicking.rs:307
6 xul.dll core::panicking::panic_fmt src/libcore/panicking.rs:85
7 xul.dll core::option::expect_failed src/libcore/option.rs:1190
8 xul.dll chrono::offset::local::tm_to_datetime third_party/rust/chrono/src/offset/local.rs:19
9 xul.dll static chrono::offset::local::Local::now third_party/rust/chrono/src/offset/local.rs:92

This is an odd crash but probably valid. It's happening only on Nightly, with the oldest affected build being 20200117094453. The crashes all seem to be coming from the same machine, which under normal conditions I would ignore as flaky hardware or a corrupted installation. However, we're hitting this assertion in Rust code deep inside a Glean call.

Since this seems dependent on the machine's time configuration it might be a valid bug - I once hit a bug in the preferences which was triggered only with a specific mixed locale so it wouldn't be all that strange.

Jan-Erik, you mentioned you had found the culprit for this: would you mind adding a comment here pointing to where it fails and why? Bonus points for a possible fix :-)

Component: Telemetry → Glean: SDK
Flags: needinfo?(jrediger)
Product: Toolkit → Data Platform and Tools
Whiteboard: [telemetry:glean-rs:m11]

While tracking this down I traced it to the old time crate and its Windows implementation of getting the local time.

Unfortunately that's where my exploration ended. chrono itself (the time/date crate we're using) has a very generous check for the timezone to be within [-24h, +24h].
That leads me to believe something really weird is going on (or some data corruption along the way).
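To illustrate the kind of check described above, here is a minimal std-only sketch. It is not chrono's actual code: the names `checked_utc_offset` and `utc_offset_or_panic` are hypothetical, and the exact bounds (strictly less than 24 hours either way) are an assumption for illustration. It only shows how a range check wrapped in `expect` turns an out-of-range offset into a panic like the one in the stack above.

```rust
// Hypothetical stand-in for a range-checked UTC offset; NOT chrono's code.
// Assumption: a valid offset is strictly less than 24 hours in either direction.
const DAY_SECS: i32 = 24 * 60 * 60;

/// Fallible variant: returns `None` when the offset is out of range.
fn checked_utc_offset(secs: i32) -> Option<i32> {
    if secs > -DAY_SECS && secs < DAY_SECS {
        Some(secs)
    } else {
        None
    }
}

/// Infallible variant: mirrors the shape of the crash, where an
/// out-of-range value ends up in `core::option::expect_failed`.
fn utc_offset_or_panic(secs: i32) -> i32 {
    checked_utc_offset(secs).expect("UTC offset out of bounds")
}
```

With these definitions, `checked_utc_offset(3600)` yields `Some(3600)`, while a corrupted value such as `i32::MAX` yields `None` from the fallible variant and a panic from the infallible one.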

I don't have a good way to fix this right now, as this is a crash when getting the initial local time (Local::now()) early on and chrono doesn't have a fallible function for that.

  • We could try to upstream a fallible version to chrono, not sure how likely that is to be accepted. Seems like a very good case where a panic is appropriate.
  • We could catch_unwind the whole initialization to avoid crashes down the line. Since this is all a prototype, catching and ignoring the panic (and disabling fogotype from then on) is fine; we would rather keep Firefox running for users.
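The second option could be sketched roughly as follows. This is a sketch under stated assumptions, not the actual Glean or Firefox code: `init_telemetry` and `try_init_telemetry` are hypothetical names, and the `offset_secs` parameter only simulates the bad value coming from the OS. Note that `std::panic::catch_unwind` only catches panics that unwind; it does nothing when the build is configured to abort on panic.

```rust
use std::panic;

/// Hypothetical initializer standing in for the prototype's startup path.
/// `offset_secs` simulates the local UTC offset reported by the OS.
fn init_telemetry(offset_secs: i32) -> i32 {
    if offset_secs <= -86_400 || offset_secs >= 86_400 {
        // Stands in for the panic raised deep inside the time handling.
        panic!("local UTC offset out of bounds: {}", offset_secs);
    }
    offset_secs
}

/// Runs the initializer and converts a panic into `None`, so the caller
/// can disable telemetry instead of taking the whole process down.
/// Only effective when panics unwind (not with panic=abort).
fn try_init_telemetry(offset_secs: i32) -> Option<i32> {
    panic::catch_unwind(move || init_telemetry(offset_secs)).ok()
}
```

A caller would treat `None` as "initialization failed, leave the prototype disabled" and carry on with startup. The panic message is still printed by the default panic hook, which is useful for diagnosing the affected machine.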

(Worth calling out: the prototype is only running on Nightly, so this crash will not affect release at all.)

Flags: needinfo?(jrediger)

Closing this down as WONTFIX.
We identified the code that's causing it, but have no way to reproduce or catch it at the moment (as per above).
The crashes haven't recurred for some time, and we're about to disable the fogotype for now anyway.

Once we bring back Glean we might want to take another look to see if we can (or want to) handle it.

Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → WONTFIX