Closed Bug 1285732 Opened 3 years ago Closed 3 years ago

B2G Aries and Nexus 5 L cache broken

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gerard-majax, Assigned: garndt)

Details

Do you know what is going on?
Flags: needinfo?(wcosta)
Flags: needinfo?(garndt)
So far I'm not 100% sure what changed to cause this to start failing, but it looks like we are receiving a 401 response when trying to create artifacts to upload for these cache jobs using the taskcluster-proxy. All emulator and device builds seem to be affected by this. Our normal cache jobs seem to be OK.
Retriggering also failed the same way. The last successful task I could find was https://tools.taskcluster.net/task-inspector/#ONF3M8IKTBuD7SgkeLPEIw/

I cannot tell whether it matters, but that task was run from nexus-5-l ("create-repo-cache --force-clone --upload --proxy https://github.com/mozilla-b2g/B2G https://hg.mozilla.org/mozilla-central/raw-file/default/b2g/config/nexus-5-l/sources.xml"), while the failing task was run from aries: https://tools.taskcluster.net/task-inspector/#T2H3A6JuQEC0sdenO1H30A/

Another difference I could spot while checking the logs is that curl is not called the same way.

On a successful upload, we get:
> [taskcluster-vcs] 0 run start : (cwd: /) curl --header 'Content-Type: application/x-tar' --header 'Content-Encoding: gzip' -X PUT --data-binary @'/root/.tc-vcs-repo/sources/git.mozilla.org/external/caf/platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/master.tar.gz' 'https://taskcluster-public-artifacts.s3-us-west-2.amazonaws.com/ONF3M8IKTBuD7SgkeLPEIw/0/public/git.mozilla.org/external/caf/platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/master.tar.gz?AWSAccessKeyId=AKIAJQESBGXODWDRTZUA&Content-Type=application%2Fx-tar&Expires=1467881677&Signature=QPEu8ANu6R84v%2FMEmwXj0rJDwdc%3D'
> [taskcluster-vcs] run end : curl --header 'Content-Type: application/x-tar' --header 'Content-Encoding: gzip' -X PUT --data-binary @'/root/.tc-vcs-repo/sources/git.mozilla.org/external/caf/platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/master.tar.gz' 'https://taskcluster-public-artifacts.s3-us-west-2.amazonaws.com/ONF3M8IKTBuD7SgkeLPEIw/0/public/git.mozilla.org/external/caf/platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/master.tar.gz?AWSAccessKeyId=AKIAJQESBGXODWDRTZUA&Content-Type=application%2Fx-tar&Expires=1467881677&Signature=QPEu8ANu6R84v%2FMEmwXj0rJDwdc%3D' (0) in 4610 ms

On a failed upload, we get:
> [taskcluster-vcs] 0 run start : (cwd: /) curl --header 'Content-Type: application/x-tar' --header 'Content-Encoding: gzip' -X PUT --data-binary @'/root/.tc-vcs-repo/sources/git.mozilla.org/external/caf/platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/master.tar.gz' '{{url}}'
> [taskcluster-vcs:warning] run end (with error) try (0/10) retrying in 8763.582732062787 ms : curl --header 'Content-Type: application/x-tar' --header 'Content-Encoding: gzip' -X PUT --data-binary @'/root/.tc-vcs-repo/sources/git.mozilla.org/external/caf/platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/master.tar.gz' '{{url}}'

Note the difference in the target URL: in the failure case it is |{{url}}|. Could this be a failed string escape?
At the very least, I see curl warnings:
> curl: (3) [globbing] nested brace in column 2
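The `{{url}}` in the failing command looks like a template placeholder that was never filled in with a signed upload URL; curl then parses the braces as its own URL globbing syntax, which would explain the "nested brace" error on every retry. A minimal sketch of a pre-flight check that fails fast instead, assuming a mustache-style `{{name}}` placeholder; `build_upload_command` is a hypothetical helper, not taskcluster-vcs code, though the curl arguments mirror the logged invocation:

```python
import re
import shlex

# Matches an unrendered mustache-style placeholder such as "{{url}}".
PLACEHOLDER = re.compile(r"\{\{\s*\w+\s*\}\}")


def build_upload_command(archive_path: str, put_url: str) -> list:
    """Return curl argv for a gzip'd tarball PUT, refusing unrendered URLs.

    Hypothetical helper: mirrors the curl call in the logs above, but it is
    not part of taskcluster-vcs.
    """
    if PLACEHOLDER.search(put_url):
        # The signed URL never arrived (e.g. artifact creation failed);
        # stop here instead of letting curl glob-parse "{{url}}" and retry.
        raise ValueError(f"unrendered template placeholder in URL: {put_url!r}")
    return [
        "curl",
        "--header", "Content-Type: application/x-tar",
        "--header", "Content-Encoding: gzip",
        "-X", "PUT",
        "--data-binary", "@" + archive_path,
        put_url,
    ]


if __name__ == "__main__":
    ok = build_upload_command("/tmp/master.tar.gz", "https://example.com/a?x=1")
    print(shlex.join(ok))
    try:
        build_upload_command("/tmp/master.tar.gz", "{{url}}")
    except ValueError as err:
        print("refused:", err)
```

curl's `-g`/`--globoff` option would silence the brace error, but the PUT would still target a meaningless URL, so rejecting the placeholder up front seems the more useful behavior.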
After some debugging with tc-vcs, it appears this is the reason artifact creation is failing:

  "details": {
    "status": "auth-failed",
    "message": "ext.certificate.expiry < now"
  }

I'm not sure why this is happening yet, since nothing has changed with regard to how credentials are updated within taskcluster-proxy, but something is going wrong somewhere.
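The `"ext.certificate.expiry < now"` message suggests the request carried temporary credentials whose certificate had already expired. A minimal sketch of that comparison, assuming the certificate is a JSON object whose `expiry` field is milliseconds since the Unix epoch (as in Taskcluster's temporary-credentials format); `certificate_expired` and the 20-minute lifetime below are hypothetical illustrations, not the server's code or a real claim duration:

```python
import json
import time
from typing import Optional


def certificate_expired(certificate_json: str, now_ms: Optional[int] = None) -> bool:
    """Mirror the check reported as "ext.certificate.expiry < now".

    Assumes `certificate_json` is a JSON string with an `expiry` field in
    milliseconds since the Unix epoch.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cert = json.loads(certificate_json)
    return cert["expiry"] < now_ms


if __name__ == "__main__":
    # Hypothetical certificate issued at t=0 with a 20-minute lifetime.
    cert = json.dumps({"version": 1, "start": 0, "expiry": 20 * 60 * 1000})
    print(certificate_expired(cert, now_ms=10 * 60 * 1000))  # False
    print(certificate_expired(cert, now_ms=30 * 60 * 1000))  # True
```

Under this assumption, any client that keeps presenting a certificate issued at the start of a long task will begin receiving auth failures once `now` passes `expiry`, even though nothing about the credentials themselves changed.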

I also tried adding "--depth" to the `repo init` call, but that caused things to fail for other reasons.
I think I have pinpointed the issue:
https://github.com/taskcluster/docker-worker/blob/master/lib/features/taskcluster_proxy.js#L86

Tasks that run with the proxy for longer than the task's initial claim will hit this issue. The taskcluster-proxy is not being updated with the newest claim's credentials, but rather with the credentials of the original claim (task.claim.credentials). This should be changed to update the proxy with the credentials passed in the emitted event.
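The real change belongs in docker-worker's JavaScript (`lib/features/taskcluster_proxy.js`); what follows is only a Python sketch of the pattern described above. All names here (`Emitter`, `Proxy`, `wire_buggy`, `wire_fixed`, the `"credentials"` event) are hypothetical stand-ins, not docker-worker's API:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Credentials:
    client_id: str
    access_token: str
    certificate: str


class Emitter:
    """Tiny stand-in for an event emitter (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Credentials], None]]] = {}

    def on(self, event: str, handler: Callable[[Credentials], None]) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event: str, creds: Credentials) -> None:
        for handler in self._handlers.get(event, []):
            handler(creds)


class Proxy:
    """Stand-in for the proxy's credential store (hypothetical)."""

    def __init__(self, creds: Credentials) -> None:
        self.creds = creds

    def update_credentials(self, creds: Credentials) -> None:
        self.creds = creds


def wire_buggy(events: Emitter, proxy: Proxy, original_claim: Credentials) -> None:
    # Bug pattern: each reclaim ignores the event payload and re-sends the
    # first claim's credentials, so the proxy's certificate eventually expires.
    events.on("credentials", lambda _new: proxy.update_credentials(original_claim))


def wire_fixed(events: Emitter, proxy: Proxy) -> None:
    # Fix pattern: forward the credentials carried by the emitted event.
    events.on("credentials", proxy.update_credentials)


if __name__ == "__main__":
    first = Credentials("client", "token-1", "cert-1")
    renewed = Credentials("client", "token-2", "cert-2")

    events, proxy = Emitter(), Proxy(first)
    wire_fixed(events, proxy)
    events.emit("credentials", renewed)
    print(proxy.creds.certificate)  # cert-2

    buggy_events, buggy_proxy = Emitter(), Proxy(first)
    wire_buggy(buggy_events, buggy_proxy, first)
    buggy_events.emit("credentials", renewed)
    print(buggy_proxy.creds.certificate)  # cert-1
```

The only difference between the two wirings is which credentials object the handler passes on: the one captured when the task was first claimed, or the one delivered with each reclaim event.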

I've attempted to test this with a docker-worker deployment, but I've been hitting an issue creating an AMI all day. I hope that issue goes away.
Flags: needinfo?(wcosta)
OK, it looks like the nexus and aries cache tasks have succeeded now with the new docker-worker fixes. Builds should still be verified, though.

https://tools.taskcluster.net/task-graph-inspector/#JdUQK2tRTHqy5ODaeICfUQ/caHK1O5zQtykU6tpaOcIlQ/
https://tools.taskcluster.net/task-graph-inspector/#JdUQK2tRTHqy5ODaeICfUQ/Rza6oXcmTDGfX4bCUZbKsw/1
Flags: needinfo?(garndt)
Assignee: nobody → garndt
Thanks Greg for not giving up!
np! Glad all is well. It appears there was a green build on m-c, so I'll consider this resolved.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Blocks: 1288426