Intermittent error: failed to run custom build command for `webrender v0.61.0 (/builds/worker/checkouts/gecko/gfx/wr/webrender)`
Categories
(Core :: Graphics: WebRender, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox81 | --- | fixed |
People
(Reporter: CosminS, Assigned: jnicol)
References
Details
(Keywords: intermittent-failure)
Attachments
(1 file)
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=301843360&repo=mozilla-central&lineNumber=8463
[task 2020-05-12T04:14:06.927Z] [webrender 0.61.0] Finished optimizing shader ShaderOptimizationInput { shader_name: "composite", config: "TEXTURE_RECT", gl_version: Gl }
[task 2020-05-12T04:14:06.932Z] [webrender 0.61.0] Optimizing shader ShaderOptimizationInput { shader_name: "composite", config: "YUV", gl_version: Gl }
[task 2020-05-12T04:14:06.934Z] error: failed to run custom build command for `webrender v0.61.0 (/builds/worker/checkouts/gecko/gfx/wr/webrender)`
[task 2020-05-12T04:14:06.934Z]
[task 2020-05-12T04:14:06.934Z] Caused by:
[task 2020-05-12T04:14:06.934Z] process didn't exit successfully: `/builds/worker/checkouts/gecko/gfx/wr/target/release/build/webrender-3238fc880e142c10/build-script-build` (signal: 11, SIGSEGV: invalid memory reference)
[task 2020-05-12T04:14:06.934Z] --- stdout
[task 2020-05-12T04:14:06.934Z] cargo:rerun-if-changed=res
[task 2020-05-12T04:14:06.934Z] cargo:rerun-if-changed=res/cs_border_solid.glsl
[task 2020-05-12T04:14:06.934Z] cargo:rerun-if-changed=res/render_task.glsl
[task 2020-05-12T04:14:06.934Z] cargo:rerun-if-changed=res/base.glsl
Comment 1•4 years ago
|
||
Does this look like an issue with the shader optimization pass Jamie? (the surrounding logs might be a red herring).
Assignee | ||
Comment 2•4 years ago
|
||
It does indeed look like an issue with the shader optimization pass.
You'd think the likely culprit would be a crash in the optimizer library itself. But I'm sure when I've seen that happen before, the panic gets caught by the crossbeam_utils::thread::scope we used to spawn the optimizer threads, and we fail gracefully-ish in the build script. This looks like a segfault in the build script's main thread? Maybe some unsafety in the optimizer rust bindings? Or in crossbeam or build-parallel?
Comment hidden (Intermittent Failures Robot) |
Comment 4•4 years ago
|
||
The severity field is not set for this bug.
:jbonisteel, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•4 years ago
|
Comment 5•4 years ago
|
||
Servo's CI caught the backtrace of the crash (using https://github.com/servo/webrender/commit/34d968adeda2e06b057a13d14a88df5766b38eda):
Stack trace
0: build_script_build::backtrace::print
at /root/.cargo/git/checkouts/webrender-479e138e1c1b9e8b/34d968a/webrender/backtrace.rs:17
1: build_script_build::handler
at /root/.cargo/git/checkouts/webrender-479e138e1c1b9e8b/34d968a/webrender/build.rs:262
2: <unknown>
3: _ZNK9glsl_type12is_interfaceEv
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl_types.h:871
4: _ZN11ir_variableC2EPK9glsl_typePKc16ir_variable_mode
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/ir.cpp:1833
5: _ZN12_GLOBAL__N_126builtin_variable_generator12add_variableEPKcPK9glsl_typei16ir_variable_modei
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/builtin_variables.cpp:558
6: _ZN12_GLOBAL__N_126builtin_variable_generator11add_uniformEPK9glsl_typeiPKc
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/builtin_variables.cpp:613
7: _ZN12_GLOBAL__N_126builtin_variable_generator11add_uniformEPK9glsl_typePKc
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/builtin_variables.cpp:449
8: _ZN12_GLOBAL__N_126builtin_variable_generator17generate_uniformsEv
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/builtin_variables.cpp:983
9: _Z31_mesa_glsl_initialize_variablesP9exec_listP22_mesa_glsl_parse_state
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/builtin_variables.cpp:1553
10: _Z16_mesa_ast_to_hirP9exec_listP22_mesa_glsl_parse_state
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/ast_to_hir.cpp:131
11: glslopt_optimize
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/glsl_optimizer.cpp:673
12: glslopt::Context::optimize
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/src/lib.rs:48
13: build_script_build::write_optimized_shaders::{{closure}}
at /root/.cargo/git/checkouts/webrender-479e138e1c1b9e8b/34d968a/webrender/build.rs:159
14: core::ops::function::impls::<impl core::ops::function::Fn<A> for &F>::call
at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libcore/ops/function.rs:243
15: build_parallel::compile_object
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/build-parallel-0.1.1/src/lib.rs:24
16: build_parallel::compile_objects::{{closure}}::{{closure}}
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/build-parallel-0.1.1/src/lib.rs:90
17: crossbeam_utils::thread::ScopedThreadBuilder::spawn::{{closure}}
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/crossbeam-utils-0.7.2/src/thread.rs:415
18: crossbeam_utils::thread::ScopedThreadBuilder::spawn::{{closure}}
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/crossbeam-utils-0.7.2/src/thread.rs:423
19: <alloc::boxed::Box<F> as core::ops::function::FnMut<A>>::call_mut
at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/liballoc/boxed.rs:1088
20: crossbeam_utils::thread::ScopedThreadBuilder::spawn::{{closure}}
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/crossbeam-utils-0.7.2/src/thread.rs:431
21: std::sys_common::backtrace::__rust_begin_short_backtrace
at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/sys_common/backtrace.rs:130
22: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}
at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/thread/mod.rs:475
23: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/panic.rs:318
24: std::panicking::try::do_call
at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/panicking.rs:348
25: __rust_try
26: std::panicking::try
at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/panicking.rs:325
27: std::panic::catch_unwind
at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/panic.rs:394
28: std::thread::Builder::spawn_unchecked::{{closure}}
at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/thread/mod.rs:474
29: core::ops::function::FnOnce::call_once{{vtable.shim}}
at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libcore/ops/function.rs:233
30: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/liballoc/boxed.rs:1081
<alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/liballoc/boxed.rs:1081
std::sys::unix::thread::Thread::new::thread_start
at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/sys/unix/thread.rs:87
31: start_thread
32: __clone
This looks like a race when using the type hashtables in this code, since we're segfaulting calling methods on a type that was created a few stack frames earlier. This sounds very similar to the problem described in https://github.com/jamienicol/glsl-optimizer/commit/d6ec0aa7edfbe1c86861a4643b6b095a243d24ad, but clearly has either regressed or is a separate instance.
In the short term, it would be nice to make it possible to disable the multithreading in the webrender build script so Servo's CI doesn't keep hitting this.
Assignee | ||
Comment 6•4 years ago
•
|
||
(In reply to Josh Matthews [:jdm] from comment #5)
In the short term, it would be nice to make it possible to disable the multithreading in the webrender build script so Servo's CI doesn't keep hitting this.
What would be the best way for servo to control this? An environment variable MOZ_WR_DISABLE_PARALLEL_GLSLOPT=1
?
This looks like a race when using the type hashtables in this code, since we're segfaulting calling methods on a type that was created a few stack frames earlier. This sounds very similar to the problem described in https://github.com/jamienicol/glsl-optimizer/commit/d6ec0aa7edfbe1c86861a4643b6b095a243d24ad, but clearly has either regressed or is a separate instance.
It's not immediately clear what the problem is: that hash table is only accessed in that function and glsl_type_singleton_decref()
, and the mutex is locked in both cases. The code has changed slightly since the commit you linked to, the mem_mutex
was removed in this commit, and each glsl_type
gets its own ralloc_context
. I will investigate this further
Assignee | ||
Updated•4 years ago
|
Comment 7•4 years ago
|
||
(In reply to Jamie Nicol [:jnicol] from comment #6)
(In reply to Josh Matthews [:jdm] from comment #5)
In the short term, it would be nice to make it possible to disable the multithreading in the webrender build script so Servo's CI doesn't keep hitting this.
What would be the best way for servo to control this? An environment variable
MOZ_WR_DISABLE_PARALLEL_GLSLOPT=1
?
An environment variable or a cargo feature would be fine.
Assignee | ||
Comment 8•4 years ago
|
||
Filed bug 1653940 for the option to disable parallel. Have a patch ready.
I believe I've found the cause of the crash, however.
The original glsl-optimizer was based on an ancient version of mesa, and appeared to have stubbed out the mutex implementation as you linked to on matrix and the servo bug. But even with a working mutex it was full of threading bugs (eg it predated the commit you link to in comment 4 by several years). It crashed immediately and 100% reliably as soon as more than 1 thread was used.
We spent quite a lot of time rebasing it on to recent mesa, which which fixed the majority of the thread issues (but not this one obviously). That was mesa 20.0 we rebased onto. I can see that in 20.1 there is this commit: https://gitlab.freedesktop.org/mesa/mesa/-/commit/d101ca3f5ad85731cedbe7ab399d4323cca1aac6, which seems very likely to be what we need.
I will attempt to merge in mesa 20.1, but if that proves too difficult will just cherry-pick that commit
Assignee | ||
Comment 9•4 years ago
|
||
Update webrender's dependency on glslopt to 0.1.4. This includes an updated version of Mesa, which
has fixed a race condition that was causing intermittent build failures.
Comment 10•4 years ago
|
||
Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/be4c6ef3da4c Update glslopt to fix intermittent build error. r=lsalzman
Comment 11•4 years ago
|
||
bugherder |
Comment 12•2 years ago
|
||
I met the same problem in the latest version, seems the patches have been applied.
I wonder if I need to add any configuration options to enable the fixes?
Assignee | ||
Comment 13•2 years ago
|
||
Can you please file a new bug and attach the terminal output. Thanks.
Comment 14•2 years ago
|
||
(In reply to Jamie Nicol [:jnicol] from comment #13)
Can you please file a new bug and attach the terminal output. Thanks.
Sure.
Description
•