Closed Bug 1637148 Opened 4 years ago Closed 4 years ago

Intermittent error: failed to run custom build command for `webrender v0.61.0 (/builds/worker/checkouts/gecko/gfx/wr/webrender)`

Categories

(Core :: Graphics: WebRender, defect, P3)

defect

Tracking

()

RESOLVED FIXED
81 Branch
Tracking Status
firefox81 --- fixed

People

(Reporter: CosminS, Assigned: jnicol)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Th link: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception&revision=0db4052181f50970fc18383df98eeb40ab7ce684&selectedTaskRun=aELPRtaXR7-Vm_iAgu_Nxg-0

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=301843360&repo=mozilla-central&lineNumber=8463

Raw log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/aELPRtaXR7-Vm_iAgu_Nxg/runs/0/artifacts/public/logs/live_backing.log

[task 2020-05-12T04:14:06.927Z] [webrender 0.61.0] Finished optimizing shader ShaderOptimizationInput { shader_name: "composite", config: "TEXTURE_RECT", gl_version: Gl }
[task 2020-05-12T04:14:06.932Z] [webrender 0.61.0] Optimizing shader ShaderOptimizationInput { shader_name: "composite", config: "YUV", gl_version: Gl }
[task 2020-05-12T04:14:06.934Z] error: failed to run custom build command for `webrender v0.61.0 (/builds/worker/checkouts/gecko/gfx/wr/webrender)`
[task 2020-05-12T04:14:06.934Z] 
[task 2020-05-12T04:14:06.934Z] Caused by:
[task 2020-05-12T04:14:06.934Z]   process didn't exit successfully: `/builds/worker/checkouts/gecko/gfx/wr/target/release/build/webrender-3238fc880e142c10/build-script-build` (signal: 11, SIGSEGV: invalid memory reference)
[task 2020-05-12T04:14:06.934Z] --- stdout
[task 2020-05-12T04:14:06.934Z] cargo:rerun-if-changed=res
[task 2020-05-12T04:14:06.934Z] cargo:rerun-if-changed=res/cs_border_solid.glsl
[task 2020-05-12T04:14:06.934Z] cargo:rerun-if-changed=res/render_task.glsl
[task 2020-05-12T04:14:06.934Z] cargo:rerun-if-changed=res/base.glsl

Does this look like an issue with the shader optimization pass Jamie? (the surrounding logs might be a red herring).

Flags: needinfo?(jnicol)

It does indeed look like an issue with the shader optimization pass.

You'd think the likely culprit would be a crash in the optimizer library itself. But I'm sure when I've seen that happen before, the panic gets caught by the crossbeam_utils::thread::scope we used to spawn the optimizer threads, and we fail gracefully-ish in the build script. This looks like a segfault in the build script's main thread? Maybe some unsafety in the optimizer rust bindings? Or in crossbeam or build-parallel?

Flags: needinfo?(jnicol)

The severity field is not set for this bug.
:jbonisteel, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jbonisteel)
Severity: -- → S3
Flags: needinfo?(jbonisteel)
Priority: -- → P3
See Also: → 1642482

Servo's CI caught the backtrace of the crash (using https://github.com/servo/webrender/commit/34d968adeda2e06b057a13d14a88df5766b38eda):

  Stack trace
     0: build_script_build::backtrace::print
               at /root/.cargo/git/checkouts/webrender-479e138e1c1b9e8b/34d968a/webrender/backtrace.rs:17
     1: build_script_build::handler
               at /root/.cargo/git/checkouts/webrender-479e138e1c1b9e8b/34d968a/webrender/build.rs:262
     2: <unknown>
     3: _ZNK9glsl_type12is_interfaceEv
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl_types.h:871
     4: _ZN11ir_variableC2EPK9glsl_typePKc16ir_variable_mode
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/ir.cpp:1833
     5: _ZN12_GLOBAL__N_126builtin_variable_generator12add_variableEPKcPK9glsl_typei16ir_variable_modei
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/builtin_variables.cpp:558
     6: _ZN12_GLOBAL__N_126builtin_variable_generator11add_uniformEPK9glsl_typeiPKc
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/builtin_variables.cpp:613
     7: _ZN12_GLOBAL__N_126builtin_variable_generator11add_uniformEPK9glsl_typePKc
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/builtin_variables.cpp:449
     8: _ZN12_GLOBAL__N_126builtin_variable_generator17generate_uniformsEv
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/builtin_variables.cpp:983
     9: _Z31_mesa_glsl_initialize_variablesP9exec_listP22_mesa_glsl_parse_state
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/builtin_variables.cpp:1553
    10: _Z16_mesa_ast_to_hirP9exec_listP22_mesa_glsl_parse_state
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/ast_to_hir.cpp:131
    11: glslopt_optimize
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/glsl-optimizer/src/compiler/glsl/glsl_optimizer.cpp:673
    12: glslopt::Context::optimize
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/glslopt-0.1.2/src/lib.rs:48
    13: build_script_build::write_optimized_shaders::{{closure}}
               at /root/.cargo/git/checkouts/webrender-479e138e1c1b9e8b/34d968a/webrender/build.rs:159
    14: core::ops::function::impls::<impl core::ops::function::Fn<A> for &F>::call
               at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libcore/ops/function.rs:243
    15: build_parallel::compile_object
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/build-parallel-0.1.1/src/lib.rs:24
    16: build_parallel::compile_objects::{{closure}}::{{closure}}
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/build-parallel-0.1.1/src/lib.rs:90
    17: crossbeam_utils::thread::ScopedThreadBuilder::spawn::{{closure}}
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/crossbeam-utils-0.7.2/src/thread.rs:415
    18: crossbeam_utils::thread::ScopedThreadBuilder::spawn::{{closure}}
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/crossbeam-utils-0.7.2/src/thread.rs:423
    19: <alloc::boxed::Box<F> as core::ops::function::FnMut<A>>::call_mut
               at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/liballoc/boxed.rs:1088
    20: crossbeam_utils::thread::ScopedThreadBuilder::spawn::{{closure}}
               at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/crossbeam-utils-0.7.2/src/thread.rs:431
    21: std::sys_common::backtrace::__rust_begin_short_backtrace
               at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/sys_common/backtrace.rs:130
    22: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}
               at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/thread/mod.rs:475
    23: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
               at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/panic.rs:318
    24: std::panicking::try::do_call
               at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/panicking.rs:348
    25: __rust_try
    26: std::panicking::try
               at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/panicking.rs:325
    27: std::panic::catch_unwind
               at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/panic.rs:394
    28: std::thread::Builder::spawn_unchecked::{{closure}}
               at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/thread/mod.rs:474
    29: core::ops::function::FnOnce::call_once{{vtable.shim}}
               at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libcore/ops/function.rs:233
    30: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
               at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/liballoc/boxed.rs:1081
        <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
               at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/liballoc/boxed.rs:1081
        std::sys::unix::thread::Thread::new::thread_start
               at /rustc/2753fab7ce3647033146b07c8b6c9f4856a910b0/src/libstd/sys/unix/thread.rs:87
    31: start_thread
    32: __clone

This looks like a race when using the type hashtables in this code, since we're segfaulting calling methods on a type that was created a few stack frames earlier. This sounds very similar to the problem described in https://github.com/jamienicol/glsl-optimizer/commit/d6ec0aa7edfbe1c86861a4643b6b095a243d24ad, but clearly has either regressed or is a separate instance.

In the short term, it would be nice to make it possible to disable the multithreading in the webrender build script so Servo's CI doesn't keep hitting this.

(In reply to Josh Matthews [:jdm] from comment #5)

In the short term, it would be nice to make it possible to disable the multithreading in the webrender build script so Servo's CI doesn't keep hitting this.

What would be the best way for servo to control this? An environment variable MOZ_WR_DISABLE_PARALLEL_GLSLOPT=1?

This looks like a race when using the type hashtables in this code, since we're segfaulting calling methods on a type that was created a few stack frames earlier. This sounds very similar to the problem described in https://github.com/jamienicol/glsl-optimizer/commit/d6ec0aa7edfbe1c86861a4643b6b095a243d24ad, but clearly has either regressed or is a separate instance.

It's not immediately clear what the problem is: that hash table is only accessed in that function and glsl_type_singleton_decref(), and the mutex is locked in both cases. The code has changed slightly since the commit you linked to, the mem_mutex was removed in this commit, and each glsl_type gets its own ralloc_context. I will investigate this further

Assignee: nobody → jnicol

(In reply to Jamie Nicol [:jnicol] from comment #6)

(In reply to Josh Matthews [:jdm] from comment #5)

In the short term, it would be nice to make it possible to disable the multithreading in the webrender build script so Servo's CI doesn't keep hitting this.

What would be the best way for servo to control this? An environment variable MOZ_WR_DISABLE_PARALLEL_GLSLOPT=1?

An environment variable or a cargo feature would be fine.

See Also: → 1653940

Filed bug 1653940 for the option to disable parallel. Have a patch ready.

I believe I've found the cause of the crash, however.

The original glsl-optimizer was based on an ancient version of mesa, and appeared to have stubbed out the mutex implementation as you linked to on matrix and the servo bug. But even with a working mutex it was full of threading bugs (eg it predated the commit you link to in comment 4 by several years). It crashed immediately and 100% reliably as soon as more than 1 thread was used.

We spent quite a lot of time rebasing it on to recent mesa, which which fixed the majority of the thread issues (but not this one obviously). That was mesa 20.0 we rebased onto. I can see that in 20.1 there is this commit: https://gitlab.freedesktop.org/mesa/mesa/-/commit/d101ca3f5ad85731cedbe7ab399d4323cca1aac6, which seems very likely to be what we need.

I will attempt to merge in mesa 20.1, but if that proves too difficult will just cherry-pick that commit

Update webrender's dependency on glslopt to 0.1.4. This includes an updated version of Mesa, which
has fixed a race condition that was causing intermittent build failures.

Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/be4c6ef3da4c
Update glslopt to fix intermittent build error. r=lsalzman
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 81 Branch

I met the same problem in the latest version, seems the patches have been applied.
I wonder if I need to add any configuration options to enable the fixes?

Can you please file a new bug and attach the terminal output. Thanks.

(In reply to Jamie Nicol [:jnicol] from comment #13)

Can you please file a new bug and attach the terminal output. Thanks.
Sure.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: