Closed Bug 1385971 Opened 7 years ago Closed 7 years ago

stylo: Crash in core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init | style::properties::cascade

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla58

Tracking Flags:

Tracking

Status

firefox-esr52

---

unaffected

firefox54

---

unaffected

firefox55

---

unaffected

firefox56

---

wontfix

firefox57

---

fixed

firefox58

---

fixed

People

(Reporter: calixte, Assigned: manishearth, NeedInfo)

References

Details

(Keywords: crash)

Crash Data

Attachments

(3 files)

Bug 1385971 - stylo: Use FnvHashMap everywhere, remove default HashMap construction methods; 7 years ago Manish Goregaokar [:manishearth] 59 bytes, text/x-review-board-request	bholley : review+	Details
Emilio's patch (for uplift) 7 years ago Manish Goregaokar [:manishearth] 26.62 KB, patch	ritu : approval-mozilla-beta+	Details \| Diff \| Splinter Review
My patch (for uplift) 7 years ago Manish Goregaokar [:manishearth] 14.01 KB, patch	ritu : approval-mozilla-beta+	Details \| Diff \| Splinter Review

Calixte Denizet (:calixte)

Reporter

Description

•

7 years ago

This bug was filed from the Socorro interface and is report bp-2a72f9aa-bb7c-42f5-ac45-0f7880170729. ============================================================= There is 1 crash in nightly 56 with buildid 20170729100254. The backtrace is showing that 'style::properties::cascade' is calling std::collections::hash::map. By the way this crash signature exists for non-stylo crashes.

Chris Peterson [:cpeterson]

Updated

•

7 years ago

Priority: -- → P2

Bobby Holley (:bholley)

Updated

•

7 years ago

Priority: P2 → --

Bobby Holley (:bholley)

Updated

•

7 years ago

Priority: -- → P3

Ralph Giles (:rillian)

Comment 1

•

7 years ago

FWIW, bug 1385976 has a crash report with the same signature starting in mp4parse_new, where I don't see any way the HashMap init machinery would be called, so I fear this is unrelated corruption.

Bobby Holley (:bholley)

Comment 2

•

7 years ago

(In reply to Ralph Giles (:rillian) | needinfo me from comment #1) > FWIW, bug 1385976 has a crash report with the same signature starting in > mp4parse_new, where I don't see any way the HashMap init machinery would be > called, so I fear this is unrelated corruption. Ok.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → WORKSFORME

Chris Peterson [:cpeterson]

Comment 3

•

7 years ago

Reopening this bug and marking this bug as Stylo P1 because this is top crash #10 in Beta 56. I see 88 instances of this crash signature in Beta 56.0b1, but none from Nightly 57.0a1. https://crash-stats.mozilla.com/search/?signature=%3Dcore%3A%3Aresult%3A%3Aunwrap_failed%3CT%3E%20%7C%20std%3A%3Acollections%3A%3Ahash%3A%3Amap%3A%3A%7B%7Bimpl%7D%7D%3A%3Anew%3A%3AKEYS%3A%3A__init%20%7C%20style%3A%3Aproperties%3A%3Acascade&product=Firefox&date=%3E%3D2017-07-11T08%3A00%3A28.000Z&date=%3C2017-08-11T08%3A00%3A28.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

Status: RESOLVED → REOPENED

status-firefox57: --- → unaffected

Priority: P3 → P1

Resolution: WORKSFORME → ---

Summary: Stylo: Crash in core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init → stylo: Crash in core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init | style::properties::cascade

Bobby Holley (:bholley)

Comment 4

•

7 years ago

(In reply to Chris Peterson [:cpeterson] from comment #3) > Reopening this bug and marking this bug as Stylo P1 because this is top > crash #10 in Beta 56. I see 88 instances of this crash signature in Beta > 56.0b1, but none from Nightly 57.0a1. Is the stylo experiment running in beta 56?

Flags: needinfo?(cpeterson)

Chris Peterson [:cpeterson]

Comment 5

•

7 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #4) > Is the stylo experiment running in beta 56? No. We haven't started the beta experiment yet. I see now that all 88 crash reports are from a single Windows 7 user. They must have manually set the layout.css.servo.enabled perf. Many of their crash URLs look like JIRA bug dashboards. I'll ask around to see if anyone has access to a JIRA system.

Crash Signature: [@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init] → [@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init [@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init | style::properties::cascade ]

Flags: needinfo?(cpeterson)

Priority: P1 → P3

Xidorn Quan [:xidorn] UTC+10

Comment 6

•

7 years ago

Doesn't seem to happen after build 20170813100233. Closing as WFM.

Status: REOPENED → RESOLVED

Closed: 7 years ago → 7 years ago

Resolution: --- → WORKSFORME

Chris Peterson [:cpeterson]

Comment 7

•

7 years ago

I still see a couple crash reports from more recent Nightly 57 builds, such as bp-110ace36-65b4-469b-902b-bffe10170830 from build id 20170829100404. But not enough instances to warrant reopening this bug at this time.

Calixte Denizet (:calixte)

Reporter

Comment 8

•

7 years ago

For information, there are 102 crashes for 5 installations in 57.0b0 (dev edition).

Chris Peterson [:cpeterson]

Comment 9

•

7 years ago

The 57.0b crashes are from build ids 20170917031738 and 20170921191414 on Windows 7. There are no crashes reported in Nightly 58 yet.

Crash Signature: [@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init [@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init | style::properties::cascade ] → [@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init] [@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init | style::properties::cascade ]

status-firefox56: affected → wontfix

status-firefox57: unaffected → ?

status-firefox58: --- → unaffected

Bobby Holley (:bholley)

Comment 10

•

7 years ago

These new reports all seem to have MOZ_CRASH_REASON: to create an OS RNG: Error { repr: Os { code: 127, message: "OS Error 127 (FormatMessageW() returned error 15100)" } }

Status: RESOLVED → REOPENED

Flags: needinfo?(manishearth)

Resolution: WORKSFORME → ---

Bobby Holley (:bholley)

Comment 11

•

7 years ago

If we're failing to open an RNG filehandle or something, we should consider failing gracefully somehow and just not reseeding rather than bringing down the whole process.

Manish Goregaokar [:manishearth]

Assignee

Comment 12

•

7 years ago

Ok, so what's happening here is that the OS RNG returned some error (we don't know what). This triggered a panic, and in attempting to get more error information from the OS we get ERROR_MUI_FILE_NOT_FOUND, which means that the language resources (l10n files likely) for the system language are somehow broken, which also triggered a panic. This is not a double-panic situation since the second panic is triggered before the first panic actually starts, in the Display implementation of the error (which is called in attempting to create a string to pass to the panic). I don't know how to fix this, but we can fix up the Windows error handling stuff in Rust to display the original error code when this double-error situation happens. Sound good?

Manish Goregaokar [:manishearth]

Assignee

Comment 13

•

7 years ago

Yeah, we're failing to somehow init an OS RNG instance. The stdlib doesn't expose an error state for this, and hashglobe doesn't actually fork RandomState (we reuse the stdlib one).

Flags: needinfo?(manishearth)

Manish Goregaokar [:manishearth]

Assignee

Comment 14

•

7 years ago

Hm, wait, error code 127 is ERROR_PROC_NOT_FOUND. That's ... weird.

Bobby Holley (:bholley)

Comment 15

•

7 years ago

unique crash reasons from the reports: failed to create an OS RNG: Error { repr: Os { code: 127, message: "The specified procedure could not be found." } } failed to create an OS RNG: Error { repr: Os { code: 127, message: "OS Error 127 (FormatMessageW() returned error 15100)" } } failed to create an OS RNG: Error { repr: Os { code: -2146893818, message: "OS Error -2146893818 (FormatMessageW() returned error 15100)" } } failed to create an OS RNG: Error { repr: Os { code: -2146893818, message: "Firma no válida." } } failed to create an OS RNG: Error { repr: Os { code: -2146893801, message: "Provider type not defined." } } failed to create an OS RNG: Error { repr: Os { code: -2146893795, message: "Библиотека поставщика проинициализирована неправильно." } }

Bobby Holley (:bholley)

Comment 16

•

7 years ago

NI Manish to write up his analysis here per discussion in the meeting.

Flags: needinfo?(manishearth)

Manish Goregaokar [:manishearth]

Assignee

Comment 17

•

7 years ago

So my analysis of error 127 was basically "it shouldn't happen", that error occurs when you request a nonexistant API. Similarly, the other error has to do with the provider not existing. However in OsRng we *always* pass down PROV_RSA_FULL and that's a preexisting provider (not user-defined) so it should always exist. Basically this feels a lot like the DLL is corrupted somehow. The Go folks had similar troubles and also chalked it up to DLL corruption on the target system (https://github.com/golang/go/issues/15655). What's interesting is that Firefox seems to not have had such crashes before? (or maybe it has, I can't find anything on socorro) Perhaps we don't report the error as clearly (or we ignore it? looking into this) Since it's very likely a corrupted DLL, I'd probably wontfix this unless it gets more frequent. The fix involves forking RandomState as well and making all of its APIs fallible too, which won't be nice.

Flags: needinfo?(manishearth)

Chris Peterson [:cpeterson]

Comment 18

•

7 years ago

We've had periodic problems with Firefox startup crashes in calls to the Windows API rand_s() because bad third-party software injected DLL hooks into advapi32.dll (e.g. bug 694344, bug 1167248, bug 1369361). OsRng calls CryptGenRandom() in advapi32.dll, so maybe we're hitting a similar problem here. I looked for suspicious third-party DLLs in a few of these crash reports' module lists, but I didn't see any DLLs in common that stood out. If we do find a problem DLL, we can add it to our DLL blocklist.

status-firefox57: ? → affected

status-firefox58: unaffected → affected

status-firefox-esr52: --- → unaffected

OS: Windows 10 → Windows

Chris Peterson [:cpeterson]

Comment 19

•

7 years ago

Now that Beta 57.0b3 is live, we're seeing more of these __init crashes. There are 477 crashes from 17 installations of 57.0b3. I'm bumping this bug's priority from P3 to P2 because it is now Beta top crash #17.

Priority: P3 → P2

Bobby Holley (:bholley)

Comment 20

•

7 years ago

Could we add a separate codepath to fallibly try to create an OS RNG, and if that fails, set a flag on hashglobe to skip the random seed?

Flags: needinfo?(manishearth)

Bobby Holley (:bholley)

Comment 21

•

7 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #20) > Could we add a separate codepath to fallibly try to create an OS RNG, and if > that fails, set a flag on hashglobe to skip the random seed? (To be clear, I'm talking about doing this at startup)

Manish Goregaokar [:manishearth]

Assignee

Comment 22

•

7 years ago

We could, yes. A global atomic flag I presume? I'm also going to file a Rust bug about this, this seems pretty bad.

Flags: needinfo?(manishearth)

Bobby Holley (:bholley)

Comment 23

•

7 years ago

(In reply to Manish Goregaokar [:manishearth] from comment #22) > We could, yes. A global atomic flag I presume? Yep. If we load/store it Relaxed it should be free. > I'm also going to file a Rust bug about this, this seems pretty bad. Yeah, definitely worth doing.

Manish Goregaokar [:manishearth]

Assignee

Comment 24

•

7 years ago

I have interviews tomorrow, but will try to get to this as soon as i can.

Flags: needinfo?(manishearth)

Bobby Holley (:bholley)

Updated

•

7 years ago

Assignee: nobody → manishearth

Manish Goregaokar [:manishearth]

Assignee

Comment 25

•

7 years ago

Filed https://github.com/rust-lang/rust/issues/44911

Manish Goregaokar [:manishearth]

Assignee

Comment 26

•

7 years ago

FWIW I tried the approach of forking RandomState first because if it's actually easy I'd prefer to do that over using global statics. It's not easy. The problem is that while it's easy to fork RandomState and have it use hardcoded "random"[1] values when OsRng fails, the BuildHasher implementation on RandomState can't be forked, because it needs the ability to initialize a SipHasher13 with its random state. SipHasher13 is also an unstable/private API. We could fork SipHasher13 and DefaultHasher as well, but that's a lot of forking. Alternatively we could fork DefaultHasher only and switch it over to using SipHasher24, which for some reason does have a stable API. I believe both hashing algorithms are fine for this purpose but one may be slower than the other; and in this case the penalty would be borne by all. So I'll stick to using the OsRng. [1]: https://xkcd.com/221/

Flags: needinfo?(manishearth)

Manish Goregaokar [:manishearth]

Assignee

Comment 27

•

7 years ago

> So I'll stick to using the OsRng. er, stick to using the "trigger OsRng in Servo_Initialize() with a once-init static" solution

Comment hidden (mozreview-request)

Manish Goregaokar [:manishearth]

Assignee

Comment 29

•

7 years ago

Really this patch should remove the new() functions entirely so when we unfork HashMap we'll still use FNV. I'll make this change after emilio's patch for fixing the property hashset lands.

Bobby Holley (:bholley)

Comment 30

•

7 years ago

mozreview-review

Comment on attachment 8913848 [details] Bug 1385971 - stylo: Use FnvHashMap everywhere, remove default HashMap construction methods; https://reviewboard.mozilla.org/r/185240/#review190244 Per IRC discussion, we're just going to remove the default here.

Attachment #8913848 - Flags: review?(bobbyholley) → review-

Chris Peterson [:cpeterson]

Updated

•

7 years ago

Comment 31

•

7 years ago

Note that even though this hasn't landed, https://github.com/servo/servo/pull/18679 should have removed the offending code.

Comment hidden (mozreview-request)

Bobby Holley (:bholley)

Comment 33

•

7 years ago

mozreview-review

Comment on attachment 8913848 [details] Bug 1385971 - stylo: Use FnvHashMap everywhere, remove default HashMap construction methods; https://reviewboard.mozilla.org/r/185240/#review190670

Attachment #8913848 - Flags: review?(bobbyholley) → review+

Manish Goregaokar [:manishearth]

Assignee

Comment 34

•

7 years ago

https://github.com/servo/servo/pull/18712 We'll want to uplift both, all of these seem to be in paths that will eventually get called; the cascade() one just gets called first.

Manish Goregaokar [:manishearth]

Assignee

Comment 35

•

7 years ago

Emilio's patch: https://hg.mozilla.org/mozilla-central/rev/7e790d5276a1 My patch: https://hg.mozilla.org/integration/autoland/rev/be0e197531f2

Manish Goregaokar [:manishearth]

Assignee

Comment 36

•

7 years ago

Attached patch Emilio's patch (for uplift) — Details — Splinter Review

Approval Request Comment [Feature/Bug causing the regression]: None [User impact if declined]: May crash on windows sometimes [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: No (but the crashing code is gone) [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: The other patch attached here [Is the change risky?]: May cause minor changes in performance, probably positive [Why is the change risky/not risky?]: [String changes made/needed]: None

Attachment #8915011 - Flags: approval-mozilla-beta?

Manish Goregaokar [:manishearth]

Assignee

Comment 37

•

7 years ago

Attached patch My patch (for uplift) — Details — Splinter Review

Attachment #8915012 - Flags: approval-mozilla-beta?

Henrik Skupin [:whimboo][⌚️UTC+2]

Updated

•

7 years ago

Blocks: 1404647

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 38

•

7 years ago

Can we mark this bug as fixed now given the comment 35?

Flags: needinfo?(manishearth)

Manish Goregaokar [:manishearth]

Assignee

Comment 39

•

7 years ago

https://hg.mozilla.org/mozilla-central/rev/be0e197531f2 yep

Status: REOPENED → RESOLVED

Closed: 7 years ago → 7 years ago

Flags: needinfo?(manishearth)

Resolution: --- → FIXED

Henrik Skupin [:whimboo][⌚️UTC+2]

Updated

•

7 years ago

status-firefox58: affected → fixed

Target Milestone: --- → mozilla58

Bobby Holley (:bholley)

Comment 40

•

7 years ago

(In reply to Manish Goregaokar [:manishearth] from comment #35) > Emilio's patch: https://hg.mozilla.org/mozilla-central/rev/7e790d5276a1 Note to relman - I really want the above in the next beta, since it may help us diagnose some other unrelated crashes.

Bobby Holley (:bholley)

Updated

•

7 years ago

Flags: needinfo?(rkothari)

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 41

•

7 years ago

Comment on attachment 8915011 [details] [diff] [review] Emilio's patch (for uplift) Crash fix in Stylo code, Beta57+

Flags: needinfo?(rkothari)

Attachment #8915011 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Updated

•

7 years ago

Attachment #8915012 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 42

•

7 years ago

Hi Stuart, Kevin, Joel (from the monitoring/perf team), there is a potential perf impact (mostly positive). Just FYI in case you notice something on the talos tests.

Flags: needinfo?(sphilp)

Flags: needinfo?(kbrosnan)

Flags: needinfo?(jmaher)

Ryan VanderMeulen [:RyanVM]

Comment 43

•

7 years ago

The uplift patches attached don't apply cleanly to Beta. Please rebase.

Flags: needinfo?(manishearth)

Bobby Holley (:bholley)

Comment 44

•

7 years ago

I'll do the uplift.

Flags: needinfo?(manishearth)

Bobby Holley (:bholley)

Comment 45

•

7 years ago

uplift

remote: https://hg.mozilla.org/releases/mozilla-beta/rev/54d1b49d5ddf187cb180d48dd67c6d9c77759477 remote: https://hg.mozilla.org/releases/mozilla-beta/rev/3a3dd3b7cfc609f10c603073cd07949a5b7b9472

status-firefox57: affected → fixed

Joel Maher ( :jmaher ) (UTC -8)

Comment 46

•

7 years ago

thanks- we will keep an eye out for this.

Flags: needinfo?(jmaher)

Kevin Brosnan [Ex-Mozilla]

Updated

•

7 years ago

Flags: needinfo?(kbrosnan)

You need to log in before you can comment on or make changes to this bug.