Crash in _ZN3std9panicking20rust_panic_with_hook17h5dd7da6bb3d06020E

RESOLVED WORKSFORME

Status


defect
P1
critical
RESOLVED WORKSFORME
3 years ago
2 years ago

People

(Reporter: philipp, Unassigned)

Tracking

({crash, regression})

50 Branch
x86
Windows 7
Points:
---

Firefox Tracking Flags

(firefox49 unaffected, firefox50 wontfix, firefox51 unaffected, firefox52 unaffected)

Details

(crash signature)

Attachments

(1 attachment)

Reporter

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-1f1d41e2-e800-4230-884b-8a5842160925.
=============================================================
Crashing Thread (30)
Frame 	Module 	Signature 	Source
0 	xul.dll 	_ZN3std9panicking20rust_panic_with_hook17h5dd7da6bb3d06020E 	
1 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
2 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
3 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
4 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
5 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
6 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
7 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
8 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
9 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
10 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
11 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
12 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
13 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
14 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
15 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
16 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
17 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
18 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
19 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
20 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
21 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
22 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
23 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
24 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
25 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
26 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
27 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
28 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
29 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
30 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
31 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
32 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
33 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
34 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
35 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
36 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
37 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
38 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
39 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
40 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
41 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
42 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
43 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
44 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
45 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
46 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
47 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
48 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
49 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
50 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
51 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
52 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
53 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
54 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
55 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
56 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
57 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
58 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
59 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
60 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
61 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
62 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
63 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
64 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
65 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
66 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
67 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
68 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
69 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
70 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
71 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
72 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
73 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
74 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
75 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
76 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
77 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
78 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
79 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
80 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
81 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
82 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
83 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
84 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
85 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
86 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
87 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
88 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
89 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
1015 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
1016 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
1017 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
1018 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
1019 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
1020 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
1021 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
1022 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
1023 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE 	
1024 	xul.dll 	_ZN3std3ffi5c_str7CString4_new17heb6c4ccd5b4e425dE

Crashes with this signature appear to be a regression starting with Firefox 50 builds. They happen in the content process, and so far crashes have only been recorded in en-US builds on Windows 7.
In beta 50.0b1 this currently makes up 1% of all content crashes.
Reporter

Comment 1

3 years ago
[@ _ZN3std10sys_common12thread_local9StaticKey9lazy_init17had14e48ac3fe2cacE ] seems to have quite a similar pattern
Crash Signature: [@ _ZN3std9panicking20rust_panic_with_hook17h5dd7da6bb3d06020E] → [@ _ZN3std9panicking20rust_panic_with_hook17h5dd7da6bb3d06020E] [@ _ZN3std10sys_common12thread_local9StaticKey9lazy_init17had14e48ac3fe2cacE ]
Reporter

Updated

3 years ago
Crash Signature: [@ _ZN3std9panicking20rust_panic_with_hook17h5dd7da6bb3d06020E] [@ _ZN3std10sys_common12thread_local9StaticKey9lazy_init17had14e48ac3fe2cacE ] → [@ _ZN3std9panicking20rust_panic_with_hook17h5dd7da6bb3d06020E] [@ _ZN3std10sys_common12thread_local9StaticKey9lazy_init17had14e48ac3fe2cacE ] [@ _ZN51_$LT$sys..rand..OsRng$u20$as$u20$core_rand..Rng$GT$10fill_bytes17hf70f94c364406ebdE ]
The failure mode we're manifesting here looks like a bug in the Rust standard library:
https://github.com/rust-lang/rust/issues/36749

We're failing to allocate some TLS and panicking, then trying to allocate again in the Rust panic handler and recursively panicking until we hit a stack overflow.
The root cause here seems to be "something used up all the available thread-local storage keys". MSDN says the limit is 1,088 per process: https://msdn.microsoft.com/en-us/library/ms686749.aspx
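
To make the failure mode described above concrete, here is a minimal sketch in Python. It is a toy model, not the Windows API or the Rust standard library: `SlotPool`, `panic`, and `lazy_init` are hypothetical names, and `max_depth` stands in for the size of the real stack. It only assumes what the comments above say: a fixed table of 1,088 slots, and a panic path that itself needs a slot.

```python
TLS_SLOT_LIMIT = 1088  # per-process limit quoted from MSDN above


class SlotPool:
    """Toy stand-in for the per-process TLS index table (not the real API)."""

    def __init__(self, limit=TLS_SLOT_LIMIT):
        self.limit = limit
        self.in_use = 0

    def alloc(self):
        """Return a slot index, or None when every slot is taken."""
        if self.in_use >= self.limit:
            return None
        self.in_use += 1
        return self.in_use - 1


def panic(pool, depth=0, max_depth=50):
    """Hypothetical panic handler that needs a slot for its own bookkeeping.

    If that allocation fails it panics again. max_depth models the point
    where a real stack would overflow.
    """
    if depth >= max_depth:
        return ("stack overflow", depth)
    if pool.alloc() is None:
        return panic(pool, depth + 1, max_depth)
    return ("panic handled", depth)


def lazy_init(pool):
    """Allocate a slot on first use; panic if the table is full."""
    if pool.alloc() is None:
        return panic(pool)
    return ("ok", 0)


if __name__ == "__main__":
    pool = SlotPool()
    while pool.alloc() is not None:  # something leaks every slot
        pass
    print(lazy_init(pool))
```

The point of the model is that once the table is full, the handler can never make progress, so the frame blamed at the top of the stack is incidental.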

We call TlsAlloc in a bunch of places:
https://dxr.mozilla.org/mozilla-central/search?tree=mozilla-central&q=TlsAlloc

I don't see any obvious third-party software in the module list of that crash report, so I'm not sure we can blame it on someone else's buggy software.
David, can you tell us what happened here?
Flags: needinfo?(dmajor)
It looks like ted has mostly diagnosed this already.
Flags: needinfo?(dmajor)
https://github.com/rust-lang/rust/issues/36749#issuecomment-249673187 says: "EDIT: though throwing away capability to “dlopen” dlls made by Rust on WinXP sounds an okay tradeoff to me."

If that's the chosen fix, does it negatively affect Firefox? Do we have an alternative for fixing this in beta or should we accept these crashes in 50?
Flags: needinfo?(ted)
Fixing Rust would mean we'd have to pick up a new version of rust, and taking a nightly Rust on Firefox Beta doesn't sound like a good idea. I don't really know what's causing the leak here--I'm not sure there's a way to diagnose this from minidumps alone.

I think all the signatures listed here are in fact the same crash. It just happens to be a stack overflow, and we don't have good unwind info for the Rust frames, so whatever's at the top of the stack gets blamed, and the rest of the stack is garbage.

It's not entirely out of the question that Rust code is to blame here. We did switch from Rust 1.9 to 1.10 between Firefox 49 and 50:
https://hg.mozilla.org/releases/mozilla-release/file/FIREFOX_49_0_RELEASE/browser/config/tooltool-manifests/win32/releng.manifest#l9
https://hg.mozilla.org/releases/mozilla-beta/file/FIREFOX_50_0b1_RELEASE/browser/config/tooltool-manifests/win32/releng.manifest#l9
Flags: needinfo?(ted)
Ralph, if this is indeed something in Rust, could we perhaps switch back to the C++ MP4 demuxer? (I really have no idea what I'm talking about here in case it isn't clear)
Flags: needinfo?(giles)
(100.0% in signature vs 00.69% overall) reason = EXCEPTION_STACK_OVERFLOW
(100.0% in signature vs 28.39% overall) dom_ipc_enabled = 1
(40.19% in signature vs 05.02% overall) bios_manufacturer = Intel Corp.
(28.04% in signature vs 02.49% overall) build_id = 20160929120120 ∧ bios_manufacturer = Intel Corp.
(26.17% in signature vs 01.94% overall) platform_version = 6.1.7601 Service Pack 1 ∧ bios_manufacturer = Intel Corp.
(24.30% in signature vs 04.20% overall) cpu_info = GenuineIntel family 6 model 23 stepping 10 | 2 ∧ adapter_device_id = 0x2e32
(17.76% in signature vs 02.43% overall) platform_version = 6.1.7601 Service Pack 1 ∧ adapter_device_id = 0x2e32

Based on the correlations for the first crash signature, this is an e10s-only problem.
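
For readers unfamiliar with the "in signature vs overall" figures above, this is roughly the arithmetic behind each line, sketched in Python. The report dictionaries and field names here are made up for illustration; this is not the actual correlation tool or its data format.

```python
def share(reports, predicate):
    """Percentage of reports for which predicate(report) is true."""
    if not reports:
        return 0.0
    return 100.0 * sum(1 for r in reports if predicate(r)) / len(reports)


def correlate(all_reports, signature, predicate):
    """Return (share within the signature, share across all reports)."""
    in_signature = [r for r in all_reports if r["signature"] == signature]
    return share(in_signature, predicate), share(all_reports, predicate)


if __name__ == "__main__":
    # Hypothetical sample, only to show the shape of the calculation.
    reports = [
        {"signature": "sig_a", "e10s": True},
        {"signature": "sig_a", "e10s": True},
        {"signature": "sig_b", "e10s": True},
        {"signature": "sig_b", "e10s": False},
    ]
    print(correlate(reports, "sig_a", lambda r: r["e10s"]))
```

An attribute is interesting when the first number is much larger than the second, as with dom_ipc_enabled = 1 above.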
Interesting about e10s, does that explain the sudden increase from beta.1 to beta.2 and .3?

If this is causing problems we can turn off the rust code in beta for win32, but that's not viable for later versions. We need the new demuxer for new features and to improve security. And note that the proposed upstream rust fix involves dropping winxp support. We need to get to the bottom of the leak.

Could we try reverting the toolchain update and building beta with the 1.9 (or 1.11 or 1.12) rust releases? We don't actually require 1.10 until Firefox 51 (bug 1268727). That won't help if it's a TLS leak from elsewhere in gecko, but if it removed the crash, it would tell us this was a specific regression from the rust standard library update or from code generation.
Flags: needinfo?(giles)
(In reply to Ralph Giles (:rillian) needinfo me from comment #10)
> If we could try reverting the toolchain update and build beta with the 1.9
> (or 1.11 or 1.12) rust releases. We don't actually require 1.10 until
> Firefox 51 (bug 1268727).

Can we do that, Ritu?
Flags: needinfo?(rkothari)
(In reply to Andrew Overholt [:overholt] from comment #11)
> (In reply to Ralph Giles (:rillian) needinfo me from comment #10)
> > If we could try reverting the toolchain update and build beta with the 1.9
> > (or 1.11 or 1.12) rust releases. We don't actually require 1.10 until
> > Firefox 51 (bug 1268727).
> 
> Can we do that, Ritu?

I would like to say yes, but first I want to understand what positive and negative impact such a change could have on Beta50 release readiness. Ralph?
Flags: needinfo?(rkothari) → needinfo?(giles)
Comment on attachment 8799029 [details]
Bug 1305315 - Revert win32 builders to rust 1.9.0.

Approval Request Comment

[Feature/regressing bug #]: 1305315

[User impact if declined]: Some users experience crashes, we think from thread-local storage leaks.

[Describe test coverage new/current, TreeHerder]: Pushed to try!

[Risks and why]: This reverts the rust toolchain update to the one which worked for firefox 49. That should be safe enough; if the code we have in-tree builds that should be sufficient to verify the change. This is experimenting on beta to try to stop a crash we can't reproduce elsewhere, which isn't ideal, but I think the risk is worthwhile.

[String/UUID change made/needed]: None
Flags: needinfo?(giles)
Attachment #8799029 - Flags: approval-mozilla-beta?

Comment 15

3 years ago
mozreview-review
Comment on attachment 8799029 [details]
Bug 1305315 - Revert win32 builders to rust 1.9.0.

https://reviewboard.mozilla.org/r/84326/#review82960
Attachment #8799029 - Flags: review?(ted) → review+
Comment on attachment 8799029 [details]
Bug 1305315 - Revert win32 builders to rust 1.9.0.

This seems low risk, since it reverts to the rust version that was used in Release49. Beta50+
Attachment #8799029 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Reporter

Comment 19

3 years ago
A new [@ _ZN3std10sys_common6unwind18begin_unwind_inner17h39d40f52add53ef7E ] signature is now showing up in 50.0b6, and it looks quite similar to what the revert tried to address: https://crash-stats.mozilla.com/report/index/bc6bf520-85f8-43eb-a002-137a02161013
Flags: needinfo?(ted)
Flags: needinfo?(giles)
Hmm. This stack shows non-rust code before the loop in the rust standard library. Ted, does that affect your analysis?

The toolchain reversion does seem to have stopped another new crash in bug 1305250.
Flags: needinfo?(giles)
Reporter

Updated

3 years ago
Crash Signature: [@ _ZN3std9panicking20rust_panic_with_hook17h5dd7da6bb3d06020E] [@ _ZN3std10sys_common12thread_local9StaticKey9lazy_init17had14e48ac3fe2cacE ] [@ _ZN51_$LT$sys..rand..OsRng$u20$as$u20$core_rand..Rng$GT$10fill_bytes17hf70f94c364406ebdE ] → [@ _ZN3std9panicking20rust_panic_with_hook17h5dd7da6bb3d06020E] [@ _ZN3std10sys_common12thread_local9StaticKey9lazy_init17had14e48ac3fe2cacE ] [@ _ZN51_$LT$sys..rand..OsRng$u20$as$u20$core_rand..Rng$GT$10fill_bytes17hf70f94c364406ebdE ] [@ _ZN52_$LT$sy…
Component: General → Audio/Video: Playback
I don't have any other real ideas here, sorry.
Flags: needinfo?(ted)
Ralph, 
Do you have other ideas?
Flags: needinfo?(giles)
I don't know what might be causing this either. Sorry.
Flags: needinfo?(giles)
See Also: → 1320134
We may consider adding diagnostic code to gather more information so we can move this bug forward. The idea is that we interpose TlsAlloc() and record the latest 20 or so stacks calling TlsAlloc() in the crash report if the process is running out of TLS slots. Then we can symbolicate the stacks offline. We should see repeated patterns in the recorded stacks.
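
A rough sketch of the bookkeeping that diagnostic would need, in Python for brevity. This does not interpose the real TlsAlloc(); `TlsAllocRecorder` is a hypothetical wrapper, and stacks are passed in as plain tuples of frame names rather than captured from the process. It only illustrates the "keep the latest 20 stacks and look for repeats" idea.

```python
from collections import Counter, deque

MAX_STACKS = 20  # "the latest 20 or so stacks"


class TlsAllocRecorder:
    """Toy allocator wrapper that remembers the most recent caller stacks."""

    def __init__(self, slot_limit, max_stacks=MAX_STACKS):
        self.slot_limit = slot_limit
        self.in_use = 0
        # A bounded deque drops the oldest entry once it is full.
        self.recent = deque(maxlen=max_stacks)

    def alloc(self, stack):
        """Record the caller's stack, then return a slot index or None."""
        self.recent.append(tuple(stack))
        if self.in_use >= self.slot_limit:
            return None
        self.in_use += 1
        return self.in_use - 1

    def report(self):
        """Recent stacks ranked by how often they occur in the window."""
        return Counter(self.recent).most_common()


if __name__ == "__main__":
    rec = TlsAllocRecorder(slot_limit=30)
    for _ in range(5):
        rec.alloc(("main", "init_a"))
    for _ in range(30):
        rec.alloc(("worker", "leaky_init"))  # hypothetical leaking caller
    print(rec.report())
```

If one caller leaks slots, its stack should dominate the window by the time allocation fails, which is the repeated pattern the comment expects to see after symbolication.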
Are we seeing any variants of this signature on 51+? None of the ones in this bug appear to be showing up.
Flags: needinfo?(madperson)
Reporter

Comment 26

2 years ago
No, I don't see any signatures starting with _ZN* after Firefox 50 either. Should we close it as WFM?
Flags: needinfo?(madperson)
SGTM :)
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME