Closed Bug 1455201 Opened 6 years ago Closed 6 years ago

SpiderMonkey Intermittent subprocess.CalledProcessError: Command '['sh', '-c', u'/builds/worker/workspace/build/src/js/src/configure --enable-stdcxx-compat --enable-optimize --enable-nspr-build --prefix=/builds/worker/workspace/build/src/obj-spider/dist'

Categories

(Core :: JavaScript Engine, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: sfink)

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Filed by: philringnalda [at] gmail.com

https://treeherder.mozilla.org/logviewer.html#?job_id=174422531&repo=autoland

https://queue.taskcluster.net/v1/task/HyyqNlxXT2eo64A1ooZVjQ/runs/0/artifacts/public/logs/live_backing.log

[task 2018-04-18T22:53:21.832Z] ERROR: Only GCC 6.1 or newer is supported (found version 4.7.2).
[task 2018-04-18T22:53:21.858Z] Traceback (most recent call last):
[task 2018-04-18T22:53:21.858Z]   File "/builds/worker/workspace/build/src/js/src/devtools/automation/autospider.py", line 373, in <module>
[task 2018-04-18T22:53:21.858Z]     run_command(['sh', '-c', posixpath.join(PDIR.js_src, 'configure') + ' ' + CONFIGURE_ARGS], check=True)
[task 2018-04-18T22:53:21.858Z]   File "/builds/worker/workspace/build/src/js/src/devtools/automation/autospider.py", line 313, in run_command
[task 2018-04-18T22:53:21.858Z]     raise subprocess.CalledProcessError(status, command, output=stderr)
[task 2018-04-18T22:53:21.858Z] subprocess.CalledProcessError: Command '['sh', '-c', u'/builds/worker/workspace/build/src/js/src/configure --enable-stdcxx-compat  --enable-optimize --enable-nspr-build --prefix=/builds/worker/workspace/build/src/obj-spider/dist']' returned non-zero exit status 1
This was the first job that this worker took, so my wild guess is that SM shell builds are depending on browser builds to have already told the worker to install GCC 6.1 or newer, rather than telling the worker themselves.
Flags: needinfo?(sphink)
(In reply to Phil Ringnalda (:philor) from comment #1)
> This was the first job that this worker took, so my wild guess is that SM
> shell builds are depending on browser builds to have already told the worker
> to install GCC 6.1 or newer, rather than telling the worker themselves.

Hm. That's an interesting theory, and scary enough to get me to look further into this. But AFAIK, no job changes the installed compiler. They all download the compiler via tooltool (these days, from a toolchain dependency in taskcluster, but that's how they work.) So it seems like autospider.py is not setting CC and CXX correctly. But I can't explain why that would be different for one machine vs another, and I agree that having this fail on the first job is important here.

Looking at it, this whole CC setting thing seems kind of awful. If my job depends on a gcc toolchain job, then it seems like the appropriate settings ought to be propagated at the same level, rather than guessed by the script that is run. Then again, some jobs depend on both gcc and clang toolchain jobs, so... um. Maybe I'm arguing for

    toolchains:
        - linux64-clang
        - linux64-gcc
    toolchain-env:
        - linux64-gcc

or

    toolchains:
        - linux64-clang
        - linux64-gcc
    toolchain-env:
        CC: {tooltool_checkout}/bin/gcc
        CXX: {tooltool_checkout}/bin/g++

or

    toolchains:
        - linux64-clang
        - linux64-gcc
    worker:
        env:
            CC: {tooltool_checkout}/bin/gcc
            CXX: {tooltool_checkout}/bin/g++

but never mind, now I'm totally on a tangent.

CC/CXX handling has also come up when dmajor attempted to run these with clang on Windows, so I have something totally screwed up here.
Ok, the way this is supposed to work is that the taskcluster build script sets the env var TOOLTOOL_CHECKOUT, autospider.py appends "gcc" or "clang" to that, then looks for a binary named $TOOLTOOL_CHECKOUT/gcc/bin/gcc or the clang variant. Then it sets CC to that. And it does it all in python and I have no logging describing what's going on. :( I'll fix that.

TOOLTOOL_CHECKOUT is set in sm-tooltool-config.sh, which is sourced by build-sm.sh. And that *is* logged, so... /builds/worker/workspace. And the compiler was downloaded to... oh, interesting

[task 2018-04-18T22:53:19.906Z]  0:07.64 Failed to download clang.tar.xz

That's not gcc, but it seemed to stop trying to download anything after that.

Aha! So that's the issue. It fails partway through the tooltool download for the toolchain dependencies, but doesn't exit as a result. So we proceed without downloading the compiler at all, and the above logic doesn't get triggered at all, so it just falls back to not setting CC/CXX at all.

The clang download failed like this, btw:

    task 2018-04-18T22:53:19.906Z]  0:07.64 ('Connection aborted.', BadStatusLine("''",))

philor, can your hypothesis hat explain why that might be more likely to happen on an initial job? Note that it successfully downloaded something else via tooltool just before (breakpad-tools, which is not yet a toolchain job.)
Flags: needinfo?(sphink)
I introduced the problem in bug 1442786 when preventing set -e from infecting a script that sourced sm-tooltool-config.sh. But I didn't notice that it also prevented set -e from working at all. Originally, it was

  set -e
  ...do stuff that might fail...

I changed it to

  (
  set -e
  ...do stuff that might fail...
  )

which resulted in the "do stuff" code stopping immediately on an error, as intended, but then the script itself continued happily. This patch switches to

  (
  set -e
  ...do stuff that might fail...
  ) || exit 1

I don't have a lot of confidence that I didn't horribly break something with the autospider.py logging changes, but the try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=0d527a971e5722950c965b282a21b2eb358a0010 should cover it pretty well.
Attachment #8969364 - Flags: review?(jcoppeard)
Assignee: nobody → sphink
Status: NEW → ASSIGNED
Comment on attachment 8969364 [details] [diff] [review]
Propagate failure out of set -e region and add verbose logging to autospider.py

Review of attachment 8969364 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/devtools/automation/autospider.py
@@ +254,5 @@
>      p for p in (bindir, env.get('LD_LIBRARY_PATH')) if p)
>  
> +if args.verbose:
> +    for v in ('CC', 'CXX', 'LD_LIBRARY_PATH'):
> +        print("default {name} = {value}".format(v, env[v]))

The try build is complaining because you haven't named the format parameters here.

::: taskcluster/scripts/builder/sm-tooltool-config.sh
@@ +58,5 @@
>  BROWSER_PLATFORM=$PLATFORM_OS$BITS
>  
>  (cd $TOOLTOOL_CHECKOUT && ${SRCDIR}/mach artifact toolchain${TOOLTOOL_MANIFEST:+ -v $TOOLTOOL_AUTH_FLAGS --tooltool-url $TOOLTOOL_SERVER --tooltool-manifest $SRCDIR/$TOOLTOOL_MANIFEST}${TOOLTOOL_CACHE:+ --cache-dir $TOOLTOOL_CACHE}${MOZ_TOOLCHAINS:+ ${MOZ_TOOLCHAINS}})
>  
> +) || exit 1 # end of set -e scope

I would totally have made this mistake.
Attachment #8969364 - Flags: review?(jcoppeard) → review+
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5a4670484c17
Propagate failure out of set -e region and add verbose logging to autospider.py, r=jonco
https://hg.mozilla.org/mozilla-central/rev/5a4670484c17
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in before you can comment on or make changes to this bug.