Closed Bug 1234052 Opened 9 years ago Closed 8 years ago

B2G JB Emulator and Device Image opt builds fail because they can't download necessary resources

Categories

(Release Engineering :: Release Automation: Other, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aryx, Unassigned)

Details

This affects all B2G JB Emulator and Device Image opt builds

E.g. https://treeherder.mozilla.org/logviewer.html#?job_id=6270312&repo=fx-team
https://treeherder.mozilla.org/logviewer.html#?job_id=18820322&repo=mozilla-inbound

[taskcluster-vcs] 0 run start : (cwd: /home/worker/workspace/gecko/testing/taskcluster/scripts/builder) tar -x -z -C /home/worker/workspace/B2G -f /home/worker/.tc-vcs-repo/sources/git.mozilla.org/external/caf/platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/master.tar.gz

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
[taskcluster-vcs:error] run end (with error) NOT RETRYING!: tar -x -z -C /home/worker/workspace/B2G -f /home/worker/.tc-vcs-repo/sources/git.mozilla.org/external/caf/platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/master.tar.gz
[taskcluster-vcs:error] Error when extracting archive

/usr/local/lib/node_modules/taskcluster-vcs/build/bin/tc-vcs.js:57
        throw err;
              ^
Error: Error running command: tar -x -z -C /home/worker/workspace/B2G -f /home/worker/.tc-vcs-repo/sources/git.mozilla.org/external/caf/platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/master.tar.gz
    at Error (<anonymous>)
    at run$ (/usr/local/lib/node_modules/taskcluster-vcs/build/run.js:128:15)
    at tryCatch (/usr/local/lib/node_modules/taskcluster-vcs/node_modules/6to5/node_modules/regenerator-6to5/runtime.js:53:40)
    at GeneratorFunctionPrototype.invoke (/usr/local/lib/node_modules/taskcluster-vcs/node_modules/6to5/node_modules/regenerator-6to5/runtime.js:209:22)
    at tryCatch (/usr/local/lib/node_modules/taskcluster-vcs/node_modules/6to5/node_modules/regenerator-6to5/runtime.js:53:40)
    at Function.step (/usr/local/lib/node_modules/taskcluster-vcs/node_modules/6to5/node_modules/regenerator-6to5/runtime.js:103:22)
    at /usr/local/lib/node_modules/taskcluster-vcs/node_modules/6to5/node_modules/core-js/shim.js:1283:41
    at /usr/local/lib/node_modules/taskcluster-vcs/node_modules/6to5/node_modules/core-js/shim.js:1293:10
    at process._tickCallback (node.js:442:13)
========= Finished Build ./build-emulator.sh /home/worker/workspace (results: 8, elapsed: 135 secs) (at 2015-11-20 15:10:36.685000) =========
[taskcluster] === Task Finished ===
[taskcluster] Artifact "public/build" not found at "/home/worker/artifacts/"
[taskcluster] Unsuccessful task run with exit code: 8 completed in 693.562 seconds
Judging from the logs, that file was built by https://tools.taskcluster.net/task-inspector/#c2yiIR1-T3-RK_cy7SIBig/ and that task doesn't have any obvious errors in it (in the second run, anyway).

tc-vcs is fetching

  https://queue.taskcluster.net/v1/task/c2yiIR1-T3-RK_cy7SIBig/artifacts/public/git.mozilla.org/external/caf/platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/master.tar.gz

which isn't even a well-formed artifact URL (missing the /runs/1).  On top of that, the task has no artifacts, although the task logs clearly show it uploading them.  I suspect this is because the task is a month old.  I bet those artifacts are set to expire after one month, as the cache is refreshed every 4 hours.  But somehow the B2G jobs aren't using the latest cached values; they're using the old ones.

So I'm really not sure what's going on here or how to fix it.
<garndt> I'm only half here (getting Christmas stuff done). I scheduled a cache job for the failed package; hopefully that resolves things: https://tools.taskcluster.net/task-inspector/#RJ-bgjYiScSVKq6XyOOOSw/0

That job has completed - the question is, will the runs trying to consume those artifacts find it?

I see this in the latest log:

  curl -L -o /home/worker/.tc-vcs-repo/sources/git.mozilla.org/external/caf/platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/master.tar.gz https://queue.taskcluster.net/v1/task/RJ-bgjYiScSVKq6XyOOOSw/artifacts/public/git.mozilla.org/external/caf/platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/master.tar.gz

which is another task in the same graph (and thus up-to-date).  So things seem to be working again.

However, I suspect there's still an underlying issue here -- it appears that the task greg re-triggered is not the same as that triggered from https://tools.taskcluster.net/hooks/#taskcluster/tc-vcs-refresh -- are there more tc-vcs updates we need to be scheduling?
Flags: needinfo?(sdeckelmann)
After updating that one cache, another one was discovered to be missing.  I scheduled that one and retriggered the emulator-jb build (it's still running the build and past the tc-vcs part so at least emulator-jb seems to be fixed).

The hooks job to update the caches is supposed to run every 4 hours and it's supposed to force cloning when the artifact doesn't exist (which is what the task in comment 2 does successfully).  I'm not sure if these are failing currently or where I might find out the status of the jobs that have run.  I'm guessing I can't find that out retroactively, at least not yet.
> https://queue.taskcluster.net/v1/task/c2yiIR1-T3-RK_cy7SIBig/artifacts/
> public/git.mozilla.org/external/caf/platform/prebuilts/gcc/linux-x86/arm/arm-
> linux-androideabi-4.9/master.tar.gz
> 
> which isn't even a well-formed artifact URL (missing the /runs/1).

It is for getting the latest artifact from the latest run.

http://docs.taskcluster.net/queue/api-docs/#getLatestArtifact
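
To make the difference between the two URL shapes concrete, here is a minimal Python sketch. The helper name artifact_url is hypothetical and is not part of tc-vcs; the base URL and the path layouts are taken from the URLs quoted in this bug and from the "/runs/1" form mentioned above.

```python
from typing import Optional
from urllib.parse import quote

# Base URL as it appears in the logs quoted in this bug.
QUEUE_BASE = "https://queue.taskcluster.net/v1"


def artifact_url(task_id: str, name: str, run_id: Optional[int] = None) -> str:
    """Build a queue artifact URL (hypothetical helper, for illustration only).

    With run_id=None the URL has no /runs/<n> segment and refers to the
    artifact from the latest run of the task. With a run_id, it refers to
    that specific run.
    """
    safe_name = quote(name, safe="/")  # keep path separators readable
    if run_id is None:
        return f"{QUEUE_BASE}/task/{task_id}/artifacts/{safe_name}"
    return f"{QUEUE_BASE}/task/{task_id}/runs/{run_id}/artifacts/{safe_name}"
```

Called without a run ID, this produces the same form that tc-vcs was fetching, so a URL without /runs/<n> is valid; it simply resolves against the latest run.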

> "I bet those artifacts are set to expire after one month"

Correct, by default artifacts expire after 30 days for this tool.
https://github.com/taskcluster/taskcluster-vcs/blob/0f3c635b634e56d4501a6e7544d8e2aea314db8d/src/clitools.js#L79
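
As a rough illustration of why a month-old cache task ends up with no artifacts, the expiry rule can be sketched as follows. This is an assumption-laden sketch, not the tc-vcs code: the function name and the dates in the usage note are made up for illustration, and only the 30-day default comes from the comment above.

```python
from datetime import datetime, timedelta, timezone

# Default lifetime stated above; the constant name is hypothetical.
DEFAULT_ARTIFACT_TTL = timedelta(days=30)


def artifact_expired(created: datetime, now: datetime,
                     ttl: timedelta = DEFAULT_ARTIFACT_TTL) -> bool:
    """Return True once `ttl` has elapsed since the artifact was created."""
    return now >= created + ttl


# Illustrative dates only: an artifact created 31 days before the check.
created = datetime(2015, 11, 20, tzinfo=timezone.utc)
checked = datetime(2015, 12, 21, tzinfo=timezone.utc)
stale = artifact_expired(created, checked)
```

If the refresh job really ran every 4 hours, consumers should never see an artifact anywhere near this old, which is why a 30-day-old reference points at a problem with the refresh rather than with the expiry setting.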
Also, more recent versions of tc-vcs check whether the cached artifact exists, instead of giving such an unhelpful error message when attempting to untar the retrieved file, which is just an error response saying the artifact was not found.
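
For anyone hitting the "not in gzip format" error with an older tc-vcs, a local sanity check along these lines would surface the real problem earlier. Note this is a different technique from the one tc-vcs uses (it checks for the artifact's existence via the queue); the sketch below only inspects the downloaded file, and both function names are hypothetical.

```python
import tarfile

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path: str) -> bool:
    """Return True if the file begins with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def extract_cached_archive(path: str, dest: str) -> None:
    """Extract a .tar.gz, failing with a clear message if it is not gzip."""
    if not looks_like_gzip(path):
        raise RuntimeError(
            f"{path} is not a gzip archive; the download may be an error "
            "response for a missing or expired artifact"
        )
    with tarfile.open(path, "r:gz") as archive:
        archive.extractall(dest)
```

With this check, a downloaded "artifact not found" body fails with a message that names the likely cause instead of a bare tar error.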

I started testing this out last week in our builder image (the image used for b2g desktop and emulator builds) to ensure 2.3.17 doesn't have problems.

I can submit it for review on Monday and we can push it out.  Wander can test it out with the phone-builder image.  I don't anticipate an issue upgrading that either.

This will then put error messages in the logs and on treeherder such as:

[taskcluster-vcs:error] Artifact "public/git.mozilla.org/b2g/libnfcemu/master.tar.gz" not found for task ID Mngn_tpcRoqMsYbUjM9Y9w. This could be caused by the artifact not being created or being marked as expired.
[taskcluster:error] Cached copy of 'libnfcemu' could not be found. Use '--force-clone' to perform a full clone
re: tree closure

I believe the trees could be opened now.  The builds I retriggered so far have either A) come back green or B) gotten past the tc-vcs point, so that shouldn't be an issue.
Flags: needinfo?(sdeckelmann)
In the wrong component, but fixed all the same.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED