Closed Bug 1411250 Opened 7 years ago Closed 7 years ago

stylo: Intermittent thread 'StyleThread#3' panicked at 'weak_rng: failed to create seeded RNG: Error { repr: Os { code: 5, message: "Input/output error" } }', /builds/worker/workspace/build/src/third_party/rust/rand/src/lib.rs:898:18

Categories

(Core :: Layout, defect, P2)

Unspecified
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- unaffected
firefox57 --- wontfix
firefox58 --- fixed

People

(Reporter: aryx, Assigned: manishearth)

References

Details

(Keywords: intermittent-failure)

Attachments

(2 files)

https://treeherder.mozilla.org/logviewer.html#?job_id=139160909&repo=mozilla-inbound

[task 2017-10-24T12:12:42.802Z] 12:12:42     INFO - TEST-START | /css/compositing-1/mix-blend-mode/mix-blend-mode-blended-element-interposed.html
[task 2017-10-24T12:12:42.842Z] 12:12:42     INFO - PID 2805 | ++DOCSHELL 0xc4f3d400 == 5 [pid = 2805] [id = {cdb2828c-d6b0-4af2-a46b-c33934e87afb}]
[task 2017-10-24T12:12:42.843Z] 12:12:42     INFO - PID 2805 | ++DOMWINDOW == 10 (0xc4f3d800) [pid = 2805] [serial = 10] [outer = (nil)]
[task 2017-10-24T12:12:42.843Z] 12:12:42     INFO - PID 2805 | ++DOMWINDOW == 11 (0xc4f3dc00) [pid = 2805] [serial = 11] [outer = 0xc4f3d800]
[task 2017-10-24T12:12:42.865Z] 12:12:42     INFO - PID 2805 | [Parent 2805, Main Thread] WARNING: Cannot set transparency mode on non-popup windows.: file /builds/worker/workspace/build/src/widget/gtk/nsWindow.cpp, line 4371
[task 2017-10-24T12:12:42.881Z] 12:12:42     INFO - PID 2805 | ++DOCSHELL 0xe63e6400 == 1 [pid = 2903] [id = {0e8e5e83-3bc3-41e5-977f-1e15be76f507}]
[task 2017-10-24T12:12:42.922Z] 12:12:42     INFO - PID 2805 | [Parent 2805, Main Thread] WARNING: Cannot set transparency mode on non-popup windows.: file /builds/worker/workspace/build/src/widget/gtk/nsWindow.cpp, line 4371
[task 2017-10-24T12:12:42.946Z] 12:12:42     INFO - PID 2805 | ++DOMWINDOW == 1 (0xe63e9000) [pid = 2903] [serial = 1] [outer = (nil)]
[task 2017-10-24T12:12:43.083Z] 12:12:43     INFO - PID 2805 | ++DOMWINDOW == 2 (0xe1d81400) [pid = 2903] [serial = 2] [outer = 0xe63e9000]
[task 2017-10-24T12:12:43.361Z] 12:12:43     INFO - PID 2805 | 1508847163357	Marionette	DEBUG	Register listener.js for window 4294967297
[task 2017-10-24T12:12:43.397Z] 12:12:43     INFO - PID 2805 | ++DOMWINDOW == 3 (0xe0fdc400) [pid = 2903] [serial = 3] [outer = 0xe63e9000]
[task 2017-10-24T12:12:43.438Z] 12:12:43     INFO - PID 2805 | Sandbox: Unexpected EOF, op 0 flags 02100000 path /dev/urandom
[task 2017-10-24T12:12:43.439Z] 12:12:43    ERROR - PID 2805 | thread 'StyleThread#3' panicked at 'weak_rng: failed to create seeded RNG: Error { repr: Os { code: 5, message: "Input/output error" } }', /builds/worker/workspace/build/src/third_party/rust/rand/src/lib.rs:898:18
[task 2017-10-24T12:12:43.439Z] 12:12:43     INFO - PID 2805 | stack backtrace:
[task 2017-10-24T12:12:43.459Z] 12:12:43     INFO - PID 2805 | 1508847163457	Marionette	INFO	Testing http://web-platform.test:8000/css/compositing-1/mix-blend-mode/mix-blend-mode-blended-element-interposed.html == http://web-platform.test:8000/css/compositing-1/mix-blend-mode/reference/green-square.html
[task 2017-10-24T12:12:43.724Z] 12:12:43     INFO - PID 2805 |    0: 0xf32c9f3a - std::sys::imp::backtrace::tracing::imp::unwind_backtrace::hfc7985b08e763a82
[task 2017-10-24T12:12:43.725Z] 12:12:43     INFO - PID 2805 |    1: 0xf32c4d62 - std::sys_common::backtrace::_print::h16a1db02a59ead63
[task 2017-10-24T12:12:43.725Z] 12:12:43     INFO - PID 2805 |    2: 0xf32d5d1c - std::panicking::default_hook::{{closure}}::h48ecee46f2eefc30
[task 2017-10-24T12:12:43.725Z] 12:12:43     INFO - PID 2805 |    3: 0xf32d5ac1 - std::panicking::default_hook::hb4c92ae8d005ca44
[task 2017-10-24T12:12:43.726Z] 12:12:43     INFO - PID 2805 |    4: 0xf276e687 - gkrust_shared::install_rust_panic_hook::{{closure}}::h8b31b5ba7b6976df
[task 2017-10-24T12:12:43.726Z] 12:12:43     INFO - PID 2805 |    5: 0xf32d6270 - std::panicking::rust_panic_with_hook::h25d461655d60b1a5
[task 2017-10-24T12:12:43.726Z] 12:12:43     INFO - PID 2805 |    6: 0xf32d6093 - std::panicking::begin_panic::h0f6fdd9abfd7dfb9
[task 2017-10-24T12:12:43.727Z] 12:12:43     INFO - PID 2805 |    7: 0xf32d6016 - std::panicking::begin_panic_fmt::ha31e26b280c9e878
[task 2017-10-24T12:12:43.728Z] 12:12:43     INFO - PID 2805 |    8: 0xf3270657 - rand::weak_rng::hc1689b96c6c9cd22
[task 2017-10-24T12:12:43.729Z] 12:12:43     INFO - PID 2805 |    9: 0xf32603fc - rayon_core::registry::main_loop::hb070a6087af3fa65
[task 2017-10-24T12:12:43.729Z] 12:12:43     INFO - PID 2805 |   10: 0xf3258ca6 - rayon_core::registry::Registry::new::{{closure}}::h7b2afcb252ce7aa8
[task 2017-10-24T12:12:43.730Z] 12:12:43     INFO - PID 2805 |   11: 0xf32547c0 - std::sys_common::backtrace::__rust_begin_short_backtrace::h7f3ef03104c43c4b
[task 2017-10-24T12:12:43.730Z] 12:12:43     INFO - PID 2805 |   12: 0xf325ccf0 - std::thread::Builder::spawn::{{closure}}::{{closure}}::hb68b67286a993380
[task 2017-10-24T12:12:43.731Z] 12:12:43     INFO - PID 2805 |   13: 0xf325c7f0 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h7d019497d6f9ef8f
[task 2017-10-24T12:12:43.731Z] 12:12:43     INFO - PID 2805 |   14: 0xf325502c - std::panicking::try::do_call::h39907cdd35f9e329
[task 2017-10-24T12:12:43.732Z] 12:12:43     INFO - PID 2805 |   15: 0xf32dad19 - <unknown>
[task 2017-10-24T12:12:43.732Z] 12:12:43     INFO - PID 2805 | Redirecting call to abort() to mozalloc_abort
[task 2017-10-24T12:12:43.732Z] 12:12:43     INFO - PID 2805 | 
[task 2017-10-24T12:12:43.733Z] 12:12:43     INFO - PID 2805 | Hit MOZ_CRASH() at /builds/worker/workspace/build/src/memory/mozalloc/mozalloc_abort.cpp:33
Flags: needinfo?(emilio)
Looks a lot like bug 1409444. Manish, Xidorn, any insight here? Input / Output error doesn't sound particularly enlightening...
Flags: needinfo?(xidorn+moz)
Flags: needinfo?(manishearth)
Flags: needinfo?(emilio)
Hmm. Looks like we need https://github.com/rust-lang-nursery/rand/issues/180 to happen after all.
Flags: needinfo?(manishearth)
I wasn't much involved in the rand issue, so I have no idea.
Flags: needinfo?(xidorn+moz)
(In reply to Emilio Cobos Álvarez [:emilio] from comment #1)
> Looks a lot like bug 1409444. Manish, Xidorn, any insight here? Input /
> Output error doesn't sound particularly enlightening...

We've had similar test failures reading from /dev/urandom in the past, such as Fennec bug 1140806 and sandbox permission bug 995069. We should fix this, but I doubt many real Linux users will hit this problem.

Did these /dev/urandom errors only start after updating the rand crate to version 0.3.17? IIUC, the Linux OsRng should prefer getrandom() over reading from /dev/urandom, so I don't know why we are hitting this error on Linux. Is OsRng's is_getrandom_available() broken on our Linux test machine?

https://github.com/rust-lang-nursery/rand/blob/master/src/os.rs
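
For anyone following along, here is a minimal sketch of the "prefer getrandom(), fall back to /dev/urandom" behavior described above. It is not the rand crate's code: `fill_bytes`, `Source`, and the two closure parameters are hypothetical stand-ins for the real syscall and file read, and the `ENOSYS` value of 38 is an assumption that holds on Linux x86/x86_64. The point is only that the fallback is taken when the kernel reports the syscall as missing, and that any error from the fallback read (such as the EIO, code 5, in the log) is passed straight to the caller.

```rust
use std::io;

/// errno for "function not implemented" (assumed value: Linux x86/x86_64).
const ENOSYS: i32 = 38;

/// Which entropy source this sketch ended up using.
#[derive(Debug, PartialEq)]
enum Source {
    Getrandom,
    DevUrandom,
}

/// Hypothetical helper: try `getrandom` first and fall back to `urandom`
/// only when the syscall is reported as unavailable (ENOSYS).
/// Both closures stand in for the real OS calls so the logic can be tested.
fn fill_bytes<G, U>(buf: &mut [u8], getrandom: G, urandom: U) -> io::Result<Source>
where
    G: FnOnce(&mut [u8]) -> io::Result<()>,
    U: FnOnce(&mut [u8]) -> io::Result<()>,
{
    match getrandom(&mut *buf) {
        Ok(()) => Ok(Source::Getrandom),
        // Syscall missing: read from the fallback source instead.
        Err(e) if e.raw_os_error() == Some(ENOSYS) => {
            // Any error from the fallback (e.g. EIO) propagates to the caller.
            urandom(buf)?;
            Ok(Source::DevUrandom)
        }
        // Any other getrandom failure is reported as-is.
        Err(e) => Err(e),
    }
}
```

In this sketch, a caller that unwraps the result would panic on the fallback error, which is the shape of the failure in the log above.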

(In reply to Manish Goregaokar [:manishearth] from comment #2)
> Hmm. Looks like we need https://github.com/rust-lang-nursery/rand/issues/180
> to happen after all.

I submitted a PR to fix #180 a few days ago (with Windows in mind) that should avoid this error on Linux:

https://github.com/rust-lang-nursery/rand/pull/181
Blocks: stylo
OS: Unspecified → Linux
Priority: -- → P2
Summary: Intermittent thread 'StyleThread#3' panicked at 'weak_rng: failed to create seeded RNG: Error { repr: Os { code: 5, message: "Input/output error" } }', /builds/worker/workspace/build/src/third_party/rust/rand/src/lib.rs:898:18 → stylo: Intermittent thread 'StyleThread#3' panicked at 'weak_rng: failed to create seeded RNG: Error { repr: Os { code: 5, message: "Input/output error" } }', /builds/worker/workspace/build/src/third_party/rust/rand/src/lib.rs:898:18
Alex, we are seeing some intermittent test failures on Linux because weak_rng panics after failing to read from /dev/urandom. Shouldn't the Linux OsRng prefer getrandom() over reading from /dev/urandom? I don't know why OsRng is even trying to read from /dev/urandom. Perhaps is_getrandom_available() is broken on our Linux test machines or blocked by some Firefox sandbox permission?

Also, can you please take a look at my proposed `rand` PR #181? Since we don't fully understand these Windows or Linux RNG errors, my proposed PR to seed weak_rng with the system time (as a fallback instead of panicking) might be a good safety net.

https://github.com/rust-lang-nursery/rand/pull/181
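
To illustrate the safety-net idea, here is a rough sketch of seeding from the system time when the OS entropy source fails, instead of panicking. This is not the code from the PR: `seed_from_os`, `seed_from_clock`, and `weak_seed` are hypothetical names, and the `path` parameter exists only so the failure path can be exercised. A clock-derived seed is predictable, so this is only reasonable for a non-cryptographic generator such as weak_rng.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::time::{SystemTime, UNIX_EPOCH};

/// Try to read a 64-bit seed from an OS entropy file such as /dev/urandom.
fn seed_from_os(path: &str) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    File::open(path)?.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Weak, predictable fallback seed derived from the system clock.
/// Not suitable for anything security-sensitive.
fn seed_from_clock() -> u64 {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default(); // clock before the epoch: use a zero duration
    now.as_secs() ^ (u64::from(now.subsec_nanos()) << 32)
}

/// Returns a seed plus a flag saying whether the clock fallback was used.
fn weak_seed(path: &str) -> (u64, bool) {
    match seed_from_os(path) {
        Ok(seed) => (seed, false),
        // Entropy source unavailable (EIO, sandbox denial, ...): do not panic.
        Err(_) => (seed_from_clock(), true),
    }
}
```

Calling `weak_seed("/dev/urandom")` would normally return an OS-provided seed; if the open or read fails, the caller still gets a usable (but weak) seed and can log that the fallback was taken.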
Flags: needinfo?(acrichton)
Yeah, I'm not sure why something like `is_getrandom_available()` is returning false for the Gecko CI machines. Maybe the kernel is actually too old to support getrandom?

I'll take a look at the PR. Do you need a release after merging?
Flags: needinfo?(acrichton)
(In reply to Alex Crichton [:acrichto] from comment #6)
> I'll take a look at the PR. Do you need a release after merging?

Thanks! A new rand release would be helpful, but it is not urgent.
(In reply to Alex Crichton [:acrichto] from comment #6)
> Yeah, I'm not sure why something like `is_getrandom_available()` is returning
> false for the Gecko CI machines. Maybe the kernel is actually too old to
> support getrandom?

Emilio or Manish, if either of you has a Linux dev machine handy, can you verify that is_getrandom_available() works for you?

This test failure was on Ubuntu 16.04.3 LTS (Xenial Xerus), which does support getrandom() [1]. Maybe there is something about the Linux VMs that affects getrandom()? Regardless, if is_getrandom_available() works correctly on your Linux dev machine, then at least we know the code works somewhere and we probably don't need to worry too much about why the test machine is trying to read /dev/urandom.

[1] http://manpages.ubuntu.com/manpages/xenial/en/man2/getrandom.2.html
[2] https://github.com/rust-lang-nursery/rand/blob/6fd1009174d3f9f544db716a013d57dd70578a12/src/os.rs#L144-L164
Flags: needinfo?(manishearth)
Flags: needinfo?(emilio)
When I say "works correctly on your Linux dev machine", I specifically mean in the Firefox content process sandbox. Maybe there is something about the content process that affects getrandom().
I tried to break on that function in a content process under an rr trace and couldn't, but getrandom was called a bunch of times successfully.
Flags: needinfo?(emilio)
Flags: needinfo?(manishearth)
(In reply to Alex Crichton [:acrichto] from comment #6)
> I'll take a look at the PR. Do you need a release after merging?

Alex, can you please make a new rand release (0.3.18?) that includes the weak_rng fix some time next week? I'd like to get this fix into Firefox Nightly 58 before November 13. Thanks!
Flags: needinfo?(acrichton)
Ok I've now published 0.3.18
Flags: needinfo?(acrichton)
(In reply to Alex Crichton [:acrichto] from comment #13)
> Ok I've now published 0.3.18

Thanks!

@Manish (or anyone who feels like it): do you mind revendoring rand so we get version 0.3.18 in Nightly 58 some time this week?
Flags: needinfo?(manishearth)
Flags: needinfo?(manishearth)
Comment on attachment 8925700 [details]
Bug 1411250 - Bump rand crate to 0.3.18 ;

https://reviewboard.mozilla.org/r/196832/#review202056
Attachment #8925700 - Flags: review?(xidorn+moz) → review+
Comment on attachment 8925701 [details]
Bug 1411250 - Revendor deps;

https://reviewboard.mozilla.org/r/196834/#review202058
Attachment #8925701 - Flags: review?(xidorn+moz) → review+
https://hg.mozilla.org/mozilla-central/rev/f0fbcf42783f
https://hg.mozilla.org/mozilla-central/rev/63ebc045fa98
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Assignee: nobody → manishearth