Closed Bug 1611770 Opened 5 years ago Closed 4 years ago

Crash in [@ core::option::expect_failed | chrono::offset::local::tm_to_datetime]

Categories

(Data Platform and Tools :: Glean: SDK, defect, P2)

Unspecified
Windows 8
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gsvelto, Assigned: janerik)

References

Details

(Keywords: crash, Whiteboard: [telemetry:glean-rs:m11])

Crash Data

Attachments

(3 files)

This bug is for crash report bp-d46acdc6-0696-498d-874f-9fbe70200125.

Top 10 frames of crashing thread:

0 xul.dll RustMozCrash mozglue/static/rust/wrappers.cpp:16
1 xul.dll mozglue_static::panic_hook mozglue/static/rust/lib.rs:89
2 xul.dll core::ops::function::Fn::call<fn src/libcore/ops/function.rs:227
3 xul.dll std::panicking::rust_panic_with_hook src/libstd/panicking.rs:477
4 xul.dll std::panicking::continue_panic_fmt src/libstd/panicking.rs:380
5 xul.dll std::panicking::rust_begin_panic src/libstd/panicking.rs:307
6 xul.dll core::panicking::panic_fmt src/libcore/panicking.rs:85
7 xul.dll core::option::expect_failed src/libcore/option.rs:1190
8 xul.dll chrono::offset::local::tm_to_datetime third_party/rust/chrono/src/offset/local.rs:19
9 xul.dll static chrono::offset::local::Local::now third_party/rust/chrono/src/offset/local.rs:92

This is an odd crash but probably valid. It's happening only on nightly with the oldest affected build being 20200117094453. The crashes seem all to be coming from the same machine which under normal condition I would ignore as flaky hardware or a corrupted installation. However we're hitting this assertion in Rust code deep into a Glean call.

Since this seems dependent on the machine's time configuration it might be a valid bug - I once hit a bug in the preferences which was triggered only with a specific mixed locale so it wouldn't be all that strange.

Jan-Erik, you mentioned you had found the culprit for this: would you mind adding a comment here pointing to where it fails and why? Bonus points for a possible fix :-)

Component: Telemetry → Glean: SDK
Flags: needinfo?(jrediger)
Product: Toolkit → Data Platform and Tools
Whiteboard: [telemetry:glean-rs:m11]

In my quest to track this down I traced it down to the old time crate and its Windows implementation of getting the local time.

Unfortunately that's where my exploration ended. chrono itself (the time/date crate we're using) has a very generous check for the timezone to be within [-24h, +24h].
That leds me to believe something really weird going on (or some data corruption along the way).

I don't have a good way to fix this right now, as this is a crash when getting the initial local time (Local::now()) early on and chrono doesn't have a fallible function for that.

  • We could try to upstream a fallible version to chrono, not sure how likely that is to be accepted. Seems like a very good case where a panic is appropriate.
  • We could catch_unwind the whole initialization to avoid crashes down the line. Now this is all a prototype, where catching and ignoring the panic (and disabling fogotype further on) is fine and we might rather keep Firefox running for users.

(Worth to call out: the prototype is only running on Nightly and thus this crash will not affect release at all)

Flags: needinfo?(jrediger)

Closing this down as WONTFIX.
We identified the code that's causing it, but have no way to reproduce nor catch that at the moment (as per above).
Crashes didn't occur again for some time and we're about to disable the fogotype for now anyway.

Once we bring back Glean we might want to take another look to see if we can (or want) to handle it.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX

This crash had been low frequency but the crash rate has increased since bug 1698518 landed (11 crashes with yesterday's builds). Jan-Erik, please check if this issue needs action.

Flags: needinfo?(jrediger)

Well, I can confidently say that bug 1698518 can't be the cause, because the code is not yet used anywhere.
I'm still baffled that this occurs, but I will see if I see anything more than last time. Leaving the ni? on for now.

I spent a bit of time looking into the data.
From all I can see right now one of these clients reports an offset of either 0 or 60 minutes to UTC. Which is completely valid and shouldn't cause a crash.

So something happened when that crash hits.

Reopening for another look.

Assignee: nobody → jrediger
Status: RESOLVED → REOPENED
Flags: needinfo?(jrediger)
Priority: -- → P2
Resolution: WONTFIX → ---
Blocks: 1704748
Attachment #9215377 - Flags: data-review?(chutten)

Comment on attachment 9215377 [details]
data_review_request-1611770.txt

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Glean. Glean-embedding projects supply opt-out mechanisms.

If the request is for permanent data collection, is there someone who will monitor the data over time?

No. This collection will expire by 2021-06-30.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does there need to be a check-in in the future to determine whether to renew the data?

Yes. Jan-Erik is responsible for renewing or removing the collection before it expires by 2021-06-30.


Result: datareview+

Attachment #9215377 - Flags: data-review?(chutten) → data-review+
Status: REOPENED → RESOLVED
Closed: 5 years ago4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: