Open Bug 1601704 Opened 4 years ago Updated 2 years ago

Intermittent make[4]: *** [../../../dist/bin/gdb-tests] Error 1

Categories

(Firefox Build System :: Task Configuration, defect, P5)

Tracking

(Not tracked)

People

(Reporter: sfink, Assigned: glandium)

Details

(Keywords: intermittent-failure, leave-open, regression)

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #1570522 +++

Bug 1570522 was originally caused by an out-of-space issue. The same symptom (failure to link gdb-tests) is now being caused by something else.

Copied from bug 1447695:

This started to fail quite frequently today:
https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&searchStr=c93018f1e173e285a117f34a528343cb8536c34f&fromchange=d58db9c67aae38c26425a0a70de1a5df1a64f721&tochange=d07675191e795e0d3b41535d6e0a5b01fc47c702&selectedJob=279762373

This is a different bug. It could have been bug 1570522, but that one got duped here because the first log posted there happened to get the same symptom (error compiling gdb-tests) from a different cause (out of space). The above build, and the majority of the ones that are now getting classified as this bug, are different and do not appear to be related to "No space left on device." The relevant portion of the log is:

[task 2019-12-05T12:00:13.605Z] 12:00:13 INFO - /builds/worker/workspace/build/src/obj-firefox/x86_64-linux-android/release/libjsrust.a(wrappers.o): In function `AnnotateMozCrashReason':
[task 2019-12-05T12:00:13.606Z] 12:00:13 INFO - /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/Assertions.h:42: undefined reference to `__asan_report_store8'
[task 2019-12-05T12:00:13.606Z] 12:00:13 INFO - /builds/worker/workspace/build/src/obj-firefox/x86_64-linux-android/release/libjsrust.a(wrappers.o): In function `MOZ_Crash':
[task 2019-12-05T12:00:13.607Z] 12:00:13 INFO - /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/Assertions.h:332: undefined reference to `__asan_report_store4'
[task 2019-12-05T12:00:13.607Z] 12:00:13 INFO - /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/Assertions.h:332: undefined reference to `__asan_handle_no_return'
[task 2019-12-05T12:00:13.607Z] 12:00:13 INFO - /builds/worker/workspace/build/src/obj-firefox/x86_64-linux-android/release/libjsrust.a(wrappers.o): In function `asan.module_ctor':
[task 2019-12-05T12:00:13.608Z] 12:00:13 INFO - wrappers.cpp:(.text.asan.module_ctor+0x2): undefined reference to `__asan_init'
[task 2019-12-05T12:00:13.608Z] 12:00:13 INFO - wrappers.cpp:(.text.asan.module_ctor+0x7): undefined reference to `__asan_version_mismatch_check_v8'
[task 2019-12-05T12:00:13.608Z] 12:00:13 INFO - clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
[task 2019-12-05T12:00:13.609Z] 12:00:13 INFO - /builds/worker/workspace/build/src/config/rules.mk:522: recipe for target '../../../dist/bin/gdb-tests' failed
[task 2019-12-05T12:00:13.609Z] 12:00:13 ERROR - make[4]: *** [../../../dist/bin/gdb-tests] Error 1
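
For triaging logs like the one above, a small script can pull out the undefined symbols and flag the ones that belong to the ASan runtime. This is only a sketch: the function names are made up for illustration, and the pattern assumes the GNU ld style message "undefined reference to" followed by the quoted symbol, as seen in this log.

```python
import re

# Matches linker diagnostics of the form: undefined reference to `symbol'
# The quote characters are optional so that logs with stripped quotes still match.
UNDEFINED_REF = re.compile(r"undefined reference to\s*[`']?([A-Za-z_][\w.$]*)")


def undefined_symbols(log_text):
    """Return the undefined symbols named in a linker log, in first-seen order."""
    seen = []
    for match in UNDEFINED_REF.finditer(log_text):
        symbol = match.group(1)
        if symbol not in seen:
            seen.append(symbol)
    return seen


def asan_symbols(log_text):
    """Return only the undefined symbols that carry the __asan_ prefix."""
    return [s for s in undefined_symbols(log_text) if s.startswith("__asan_")]
```

If `asan_symbols` returns anything for a build configured with --disable-address-sanitizer, an object compiled with ASan flags has ended up in the link.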

It is very odd to see the asan error here. Here's the configure line for js/src:

/builds/worker/workspace/build/src/configure.py --enable-project=js --enable-crashreporter --with-android-min-sdk=21 --with-branding=mobile/android/branding/nightly --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-android --enable-tests --disable-debug --disable-rust-debug --enable-release --enable-optimize --with-android-ndk=/builds/worker/fetches/android-ndk --without-android-toolchain --with-android-version=21 --with-ccache=/builds/worker/fetches/sccache/sccache --with-toolchain-prefix=/builds/worker/fetches/android-ndk/toolchains/x86_64-4.9/prebuilt/linux-x86_64/bin/x86_64-linux-android- --enable-debug-symbols --disable-address-sanitizer --disable-memory-sanitizer --disable-thread-sanitizer --disable-undefined-sanitizer --disable-signed-overflow-sanitizer --disable-unsigned-overflow-sanitizer --enable-frame-pointers --disable-coverage --enable-cargo-incremental --enable-linker=bfd --enable-clang-plugin --disable-mozsearch-plugin --disable-stdcxx-compat --disable-fuzzing --enable-jemalloc --enable-replace-malloc --without-linux-headers --enable-warnings-as-errors --disable-valgrind --without-libclang-path --without-clang-path --disable-profile-generate --disable-profile-use --without-pgo-profile-path --disable-cross-pgo --enable-lto=cross --enable-js-shell --enable-ion --disable-simulator --disable-instruments --disable-callgrind --enable-profiling --disable-vtune --disable-gc-trace --disable-gczeal --enable-small-chunk-size --enable-trace-logging --disable-oom-breakpoint --disable-perf --disable-jitspew --disable-masm-verbose --disable-more-deterministic --enable-ctypes --without-system-ffi --disable-pipeline-operator --enable-binast --enable-rust-simd --enable-cranelift --disable-wasm-codegen-debug --enable-typed-objects --enable-wasm-bulk-memory --enable-wasm-reftypes --enable-wasm-bigint --enable-wasm-gc --enable-wasm-private-reftypes --disable-wasm-multi-value --disable-new-regexp 
--with-nspr-cflags=-I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr --with-nspr-libs=-L/builds/worker/workspace/build/src/obj-firefox/dist/lib -lnspr4 -lplc4 -lplds4 --prefix=/builds/worker/workspace/build/src/obj-firefox/dist JS_STANDALONE=

Note in particular the --disable-address-sanitizer flag (all the other sanitizers are disabled as well).

Given that this seems to be some sort of build config problem, I'm needinfo'ing it over there. :build-peers doesn't take needinfo, so I'll try chmanchester.

Flags: needinfo?(cmanchester)

This smells like an sccache cache mess-up.

(In reply to Mike Hommey [:glandium] (high latency) from comment #4)

This smells like an sccache cache mess-up.

This doesn't reproduce on try, so I'm inclined to agree. The timing and some of the other details here make me think bug 1482167/bug 1596950 may be related.

Flags: needinfo?(cmanchester)

There's a problem with this theory, though: there are green jobs and red jobs using the same sccache buckets.

Okay, this is a rehash of a problem we had with lmdb.
We are getting a libjsrust from sccache that contains a different wrappers.o than what it should contain, while we do get the right wrappers.o independently. I need to dig to find that lmdb bug.
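
To make the failure mode concrete, here is a toy model of how a cached archive can end up carrying the wrong wrappers.o. It is not sccache's actual hashing. The sketch assumes, purely for illustration, that the cache key for the final static library covers only the Rust inputs and not the C/C++ object bundled into it; `cache_key` and `ToyCache` are hypothetical names.

```python
import hashlib


def cache_key(rust_sources, rustc_args):
    """Toy cache key. ASSUMPTION: the bundled C/C++ object is not hashed."""
    digest = hashlib.sha256()
    for part in list(rust_sources) + list(rustc_args):
        digest.update(part.encode())
        digest.update(b"\0")
    return digest.hexdigest()


class ToyCache:
    """Minimal in-memory stand-in for a shared compiler cache."""

    def __init__(self):
        self._store = {}

    def build_staticlib(self, rust_sources, rustc_args, native_object):
        """Return (archive, was_cache_hit) for the requested static library."""
        key = cache_key(rust_sources, rustc_args)
        if key in self._store:
            # Cache hit: the stored archive is returned as-is, including
            # whatever native object was bundled when it was first built.
            return self._store[key], True
        archive = {"rust": tuple(rust_sources), "wrappers.o": native_object}
        self._store[key] = archive
        return archive, False


cache = ToyCache()
args = ["--crate-type=staticlib"]
# An ASan build populates the cache first.
cache.build_staticlib(["lib.rs"], args, "wrappers.o (ASan)")
# A non-ASan build with identical Rust inputs then gets the ASan archive back.
archive, hit = cache.build_staticlib(["lib.rs"], args, "wrappers.o (no ASan)")
```

In this model the second build links an archive whose wrappers.o references the ASan runtime, even though the wrappers.o it compiled itself is fine, which matches the symptom described above.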

Okay, found an explanation of how sccache gets tripped up, and a workaround on our end: https://github.com/rust-lang/rust/issues/58393#issuecomment-562410696

Assignee: nobody → mh+mozilla

Let's add some fun facts for posterity:

  • Because a recent cranelift update made sccache have cache misses for cranelift, it also had cache misses for jsrust, so the problem didn't appear... until bug 1601233, where the non-determinism in cranelift's build was fixed, at which point sccache could work again... to hit this bug.
  • And this started being visible on the push that followed, for bug 1592415.

As to why that didn't happen before:

  • Before bug 1596950, C/C++ code built from rust didn't get ASAN flags, so there was no occasion for this to happen at all.
  • Bug 1482167 landed after the cranelift update that "broke" sccache, but before the other cranelift update that fixed it.

Ultimately, the original trigger is bug 1594998, but it didn't become a problem until bug 1601233.

Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/b515b0637e5e
Work around sccache/rust confusion about mozglue-static. r=chmanchester
Keywords: leave-open

While it doesn't seem to be happening (yet), the same problem that
happened with jsrust can just as well happen with other leaf staticlib
crates that indirectly use mozglue-static.

So... I thought a little more about this and... EVERY single crate that builds C/C++ code is one race condition between an ASAN and a non-ASAN build away from blowing up the build in the same way. Working around it in the build system is not sustainable. Can we do something on the sccache side, or do we need to push for a fix in cargo/rust?

Flags: needinfo?(nfroyd)
Flags: needinfo?(cmanchester)

I took a look at this and I don't see an easy workaround in sccache. I'll ping Alex in the upstream issue.

Flags: needinfo?(nfroyd)
Flags: needinfo?(cmanchester)
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/21ea137ee89c
Work around sccache/rust confusion about mozglue-static. libxul part. r=firefox-build-system-reviewers,chmanchester
Severity: normal → S3