Closed Bug 1558446 Opened 5 years ago Closed 5 years ago

Massive failures in pulling docker images - CLOSED TREES

Categories: Taskcluster :: Operations and Service Requests, defect, P1
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: NarcisB; Assigned: dustin
Whiteboard: [stockwell disable-recommended]
Massive failures appeared on all trees, e.g.:
https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Crunnable&revision=f0901390f6a2b8bdb174e1a965d9977a34b68ccf&selectedJob=251141218
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=251141218&repo=autoland&lineNumber=730

11:20:04 INFO - 4:51.54 sleeping for 200.28s (attempt 4/5)
11:23:24 INFO - 8:11.83 attempt 5/5
11:23:25 INFO - Error running mach:
11:23:25 INFO - ['artifact', 'toolchain', '-v', '--retry', '4', '--artifact-manifest', 'z:\task_1560247089\build\src\toolchains.json', '--tooltool-manifest', 'z:\task_1560247089\build\src\browser/config/tooltool-manifests/win64/releng.manifest', '--tooltool-url', 'https://tooltool.mozilla-releng.net/', '--authentication-file', 'c:\builds\relengapi.tok', '--cache-dir', 'c:/builds/tooltool_cache', 'public/build/clang.tar.bz2@Vml6QAVqSh62Zhu6ZIhrMA', 'public/build/rustc.tar.xz@Okx1h4vOTamZs1YYOKwZJQ', 'public/build/rust-size.tar.bz2@FNaoWc83RB-jFVdfoeyn6A', 'public/build/cbindgen.tar.bz2@PlIOhyDsSPujrUO8aOY9wg', 'public/build/nasm.tar.bz2@V-PcM7EoT0KSUTLVCkIq5Q', 'public/build/node.tar.bz2@ds1k4Rb4QLWYyHR4CZsS3g']
11:23:25 INFO - The error occurred in code that was called by the mach command. This is either
11:23:25 INFO - a bug in the called code itself or in the way that mach is calling it.
11:23:25 INFO - You can invoke |./mach busted| to check if this issue is already on file. If it
11:23:25 INFO - isn't, please use |./mach busted file| to report it. If |./mach busted| is
11:23:25 INFO - misbehaving, you can also inspect the dependencies of bug 1543241.
11:23:25 INFO - If filing a bug, please include the full output of mach, including this error
11:23:25 INFO - message.
11:23:25 INFO - The details of the failure are as follows:
11:23:25 INFO - HTTPError: 503 Server Error: Service Unavailable for url: https://cloud-mirror.taskcluster.net/v1/redirect/s3/us-west-1/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FVml6QAVqSh62Zhu6ZIhrMA%2F0%2Fpublic%2Fchain-of-trust.json
11:23:25 INFO - File "z:\task_1560247089\build\src\python/mozbuild/mozbuild/mach_commands.py", line 1603, in artifact_toolchain
11:23:25 INFO - record = ArtifactRecord(task_id, name)
11:23:25 INFO - File "z:\task_1560247089\build\src\python/mozbuild/mozbuild/mach_commands.py", line 1516, in __init__
11:23:25 INFO - cot.raise_for_status()
11:23:25 INFO - File "z:\task_1560247089\build\src\third_party/python/requests\requests\models.py", line 840, in raise_for_status
11:23:25 INFO - raise HTTPError(http_error_msg, response=self)
11:23:25 ERROR - Return code: 1
11:23:25 ERROR - 1 not in success codes: [0]
11:23:25 WARNING - setting return code to 2
11:23:25 FATAL - Halting on failure while running ['c:\mozilla-build\python\python.exe', '-u', 'z:\task_1560247089\build\src\mach', 'artifact', 'toolchain', '-v', '--retry', '4', '--artifact-manifest', 'z:\task_1560247089\build\src\toolchains.json', '--tooltool-manifest', 'z:\task_1560247089\build\src\browser/config/tooltool-manifests/win64/releng.manifest', '--tooltool-url', 'https://tooltool.mozilla-releng.net/', '--authentication-file', 'c:\builds\relengapi.tok', '--cache-dir', 'c:/builds/tooltool_cache', 'public/build/clang.tar.bz2@Vml6QAVqSh62Zhu6ZIhrMA', 'public/build/rustc.tar.xz@Okx1h4vOTamZs1YYOKwZJQ', 'public/build/rust-size.tar.bz2@FNaoWc83RB-jFVdfoeyn6A', 'public/build/cbindgen.tar.bz2@PlIOhyDsSPujrUO8aOY9wg', 'public/build/nasm.tar.bz2@V-PcM7EoT0KSUTLVCkIq5Q', 'public/build/node.tar.bz2@ds1k4Rb4QLWYyHR4CZsS3g']
11:23:25 FATAL - Running post_fatal callback...
11:23:25 FATAL - Exiting 2
11:23:25 INFO - [mozharness: 2019-06-11 11:23:25.125000Z] Finished build step (failed)
11:23:25 INFO - Running post-run listener: _parse_build_tests_ccov
11:23:25 INFO - Running post-run listener: _shutdown_sccache
11:23:25 INFO - Running post-run listener: _summarize
11:23:25 ERROR - # TBPL FAILURE #
11:23:25 INFO - [mozharness: 2019-06-11 11:23:25.126000Z] FxDesktopBuild summary:
11:23:25 ERROR - # TBPL FAILURE #
[taskcluster 2019-06-11T11:23:25.142Z] Exit Code: 2
[taskcluster 2019-06-11T11:23:25.142Z] User Time: 0s
[taskcluster 2019-06-11T11:23:25.142Z] Kernel Time: 0s
[taskcluster 2019-06-11T11:23:25.142Z] Wall Time: 8m18.3626984s
[taskcluster 2019-06-11T11:23:25.142Z] Result: FAILED
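The failing URL in the traceback shows the shape of cloud-mirror's redirect endpoint: a region-scoped route with the URL-encoded upstream artifact URL as the final path component. A minimal sketch of constructing such a URL, inferred from the log rather than from any documented API contract:

// Build a cloud-mirror redirect URL of the shape seen in the 503 above:
//   https://cloud-mirror.taskcluster.net/v1/redirect/s3/<region>/<encoded upstream URL>
// This mirrors the observed URL; it is an illustration, not a documented API.
function cloudMirrorRedirect(region: string, upstreamUrl: string): string {
  return `https://cloud-mirror.taskcluster.net/v1/redirect/s3/${region}/` +
    encodeURIComponent(upstreamUrl);
}

// Reproduces the failing URL from the log:
console.log(cloudMirrorRedirect(
  'us-west-1',
  'https://s3.us-west-2.amazonaws.com/taskcluster-public-artifacts/Vml6QAVqSh62Zhu6ZIhrMA/0/public/chain-of-trust.json',
));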

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=251141225&repo=autoland&lineNumber=13

[taskcluster 2019-06-11 11:08:43.606Z] Downloading artifact "public/image.tar.zst" from task ID: cfMwjw6kTW-Rn8fiy17oHw.
[taskcluster:error] Error downloading "public/image.tar.zst" from task ID "cfMwjw6kTW-Rn8fiy17oHw". Error: Service Unavailable Next Attempt in: 13904.45 ms.
[taskcluster:error] Error downloading "public/image.tar.zst" from task ID "cfMwjw6kTW-Rn8fiy17oHw". Error: Service Unavailable Next Attempt in: 31632.40 ms.
[taskcluster:error] Error downloading "public/image.tar.zst" from task ID "cfMwjw6kTW-Rn8fiy17oHw". Error: Service Unavailable Next Attempt in: 71224.34 ms.
[taskcluster:error] Error downloading "public/image.tar.zst" from task ID "cfMwjw6kTW-Rn8fiy17oHw". Error: Service Unavailable Next Attempt in: 115180.72 ms.
[taskcluster:error] Pulling docker image has failed.
[taskcluster:error] Error: Error loading docker image. Could not download artifact "public/image.tar.zst from task "cfMwjw6kTW-Rn8fiy17oHw" after 5 attempt(s). Error: Service Unavailable
[taskcluster 2019-06-11 11:12:37.167Z] Unsuccessful task run with exit code: -1 completed in 234.486 seconds
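The worker treats the 503 as transient and retries the download with roughly doubling delays before giving up, as the growing "Next Attempt in" intervals show. A minimal sketch of that backoff pattern, assuming a fetch-capable runtime; the function name, base delay, and doubling factor are illustrative, not docker-worker's actual implementation:

// Hypothetical retry-with-backoff loop matching the pattern in the log above.
async function downloadWithBackoff(url: string, maxAttempts = 5): Promise<ArrayBuffer> {
  let delayMs = 10_000; // illustrative base delay
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetch(url);
    if (res.ok) {
      return res.arrayBuffer();
    }
    if (attempt === maxAttempts) {
      throw new Error(`Could not download after ${maxAttempts} attempt(s): ${res.statusText}`);
    }
    // 503s are treated as transient: wait roughly twice as long each time,
    // matching the growing "Next Attempt in" intervals in the log.
    console.error(`Error downloading "${url}". Next attempt in: ${delayMs} ms.`);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    delayMs *= 2;
  }
  throw new Error('unreachable');
}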

Severity: normal → blocker
Component: Workers → Operations and Service Requests
Flags: needinfo?(pmoore)
Flags: needinfo?(dustin)
Priority: -- → P1
Jun 11 11:02:59 cloud-mirror app/web.2: Uncaught Exception! Attempting to report to Sentry and crash. 
Jun 11 11:02:59 cloud-mirror app/web.2: Tue, 11 Jun 2019 11:02:58 GMT typed-env-config Config file missing: user-config.yml 
Jun 11 11:02:59 cloud-mirror app/web.2: Error: Redis connection to ec2-18-214-219-21.compute-1.amazonaws.com:6449 failed - connect ECONNREFUSED 18.214.219.21:6449 
Jun 11 11:02:59 cloud-mirror app/web.2:     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1174:14) 
Jun 11 11:02:59 cloud-mirror app/web.2: 2019-06-11T11:02:58.777Z taskcluster-client Calling: sentryDSN, retry: 0 
Jun 11 11:02:59 cloud-mirror app/web.2: Uncaught Exception! Attempting to report to Sentry and crash. 
Jun 11 11:02:59 cloud-mirror app/web.2: AbortError: Redis connection lost and command aborted. It might have been processed. 
Jun 11 11:02:59 cloud-mirror app/web.2:     at RedisClient.flush_and_error (/app/node_modules/redis/index.js:357:23) 
Jun 11 11:02:59 cloud-mirror app/web.2:     at RedisClient.connection_gone (/app/node_modules/redis/index.js:659:14) 
Jun 11 11:02:59 cloud-mirror app/web.2:     at Socket.<anonymous> (/app/node_modules/redis/index.js:289:14) 
Jun 11 11:02:59 cloud-mirror app/web.2:     at Object.onceWrapper (events.js:272:13) 
Jun 11 11:02:59 cloud-mirror app/web.2:     at Socket.emit (events.js:180:13) 
Jun 11 11:02:59 cloud-mirror app/web.2:     at Socket.emit (domain.js:422:20) 
Jun 11 11:02:59 cloud-mirror app/web.2:     at TCP._handle.close [as _onclose] (net.js:541:12) 
Jun 11 11:02:59 cloud-mirror app/web.2: Tue, 11 Jun 2019 11:02:58 GMT taskcluster-lib-validate finished walking tree of schemas 
Jun 11 11:02:59 cloud-mirror app/web.2: Tue, 11 Jun 2019 11:02:58 GMT taskcluster-lib-validate Publishing schemas 
Jun 11 11:02:59 cloud-mirror app/web.2: Tue, 11 Jun 2019 11:02:58 GMT taskcluster-lib-validate Using default s3 client 
Jun 11 11:02:59 cloud-mirror app/web.2: Tue, 11 Jun 2019 11:02:58 GMT cloud-proxy:main Redis config: {"host":"<mumble>.compute-1.amazonaws.com","port":"6449","password":"<mumble>"} 
Jun 11 11:02:59 cloud-mirror app/web.2: 2019-06-11T11:02:59.059Z taskcluster-client Success calling: sentryDSN, (0 retries) 
Jun 11 11:02:59 cloud-mirror app/web.2: Tue, 11 Jun 2019 11:02:59 GMT base:app Server listening on port 37827 
Jun 11 11:02:59 cloud-mirror app/web.2: Succesfully reported error to Sentry. 
Jun 11 11:02:59 cloud-mirror heroku/web.2: State changed from starting to crashed 
Jun 11 11:02:59 cloud-mirror heroku/web.2: Process exited with status 1 
Flags: needinfo?(dustin)

For reasons I never understood, cloud-mirror does not know how to read the REDIS_URL variable that Heroku sets, preferring to read its own env variables instead. So when Heroku changes that variable, cloud-mirror is left pointing at stale connection details and fails.
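
A minimal sketch of the lookup described above: prefer the REDIS_URL that Heroku maintains, and only fall back to service-specific variables. The fallback names (REDIS_HOST, REDIS_PORT, REDIS_PASSWORD) are assumptions, not cloud-mirror's actual configuration keys:

import { URL } from 'node:url';

// Hypothetical fix: prefer Heroku's REDIS_URL, which Heroku rewrites when
// the Redis add-on rotates credentials, falling back to per-service vars.
// The fallback variable names are assumptions, not cloud-mirror's real keys.
function redisConfig(): { host: string; port: number; password?: string } {
  const redisUrl = process.env.REDIS_URL;
  if (redisUrl) {
    const parsed = new URL(redisUrl); // e.g. redis://:password@host:6449
    return {
      host: parsed.hostname,
      port: Number(parsed.port),
      password: parsed.password || undefined,
    };
  }
  return {
    host: process.env.REDIS_HOST ?? 'localhost',
    port: Number(process.env.REDIS_PORT ?? 6379),
    password: process.env.REDIS_PASSWORD,
  };
}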

The static services also need to be updated -- I'll do that now.

Whiteboard: [stockwell disable-recommended]
Flags: needinfo?(pmoore)

Can this bug be closed now?

Assignee: nobody → dustin
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED