Open Bug 1304943 Opened 9 years ago Updated 6 years ago

Intermittent-infra abort: unexpected response from remote server: empty string cloning mozilla-unified

Categories

(Developer Services :: Mercurial: robustcheckout, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

REOPENED

People

(Reporter: intermittent-bug-filer, Unassigned)

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

That's a special error. Never seen that one before.
I think robustcheckout should retry when there is an error.
Component: General → Mercurial: robustcheckout
Product: Taskcluster → Developer Services
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
https://www.mercurial-scm.org/repo/hg/rev/9bd003052d55 (requires Mercurial 4.4) makes this error a lot easier to detect in robustcheckout.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 7 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
https://treeherder.mozilla.org/logviewer.html#?job_id=193313781&repo=autoland&lineNumber=434

Traceback (most recent call last):
  File "mercurial\scmutil.pyc", line 154, in callcatch
  File "mercurial\dispatch.pyc", line 314, in _runcatchfunc
  File "mercurial\dispatch.pyc", line 918, in _dispatch
  File "mercurial\dispatch.pyc", line 673, in runcommand
  File "mercurial\dispatch.pyc", line 926, in _runcommand
  File "mercurial\dispatch.pyc", line 915, in <lambda>
  File "mercurial\util.pyc", line 1195, in check
  File "C:/mozilla-build/robustcheckout.py", line 265, in robustcheckout
  File "C:/mozilla-build/robustcheckout.py", line 547, in _docheckout
  File "mercurial\hg.pyc", line 567, in clone
  File "mercurial\hg.pyc", line 427, in clonewithshare
  File "mercurial\hg.pyc", line 661, in clone
  File "mercurial\exchange.pyc", line 1360, in pull
  File "mercurial\exchange.pyc", line 2071, in _maybeapplyclonebundle
  File "mercurial\exchange.pyc", line 2252, in trypullbundlefromurl
  File "mercurial\streamclone.pyc", line 434, in apply
  File "mercurial\streamclone.pyc", line 422, in applybundlev1
  File "mercurial\streamclone.pyc", line 362, in consumev1
ResponseError: ('unexpected response from remote server:', '')
abort: unexpected response from remote server: empty string
[taskcluster 2018-08-10T16:44:37.465Z] Exit Code: 255
[taskcluster 2018-08-10T16:44:37.465Z] User Time: 0s
[taskcluster 2018-08-10T16:44:37.465Z] Kernel Time: 0s
[taskcluster 2018-08-10T16:44:37.465Z] Wall Time: 7m17.8672573s
[taskcluster 2018-08-10T16:44:37.465Z] Result: FAILED
[taskcluster 2018-08-10T16:44:37.465Z] === Task Finished ===
[taskcluster 2018-08-10T16:44:37.465Z] Task Duration: 7m18.0157322s
[taskcluster:error] Uploading error artifact public/build from file public/build with message "Could not read directory 'Z:\\task_1533918496\\public\\build'", reason "file-missing-on-worker" and expiry 2019-08-10T16:36:10.835Z
[taskcluster:error] TASK FAILURE during artifact upload: file-missing-on-worker: Could not read directory 'Z:\task_1533918496\public\build'
[taskcluster 2018-08-10T16:44:38.175Z] Uploading artifact public/logs/certified.log from file generic-worker\certified.log with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2019-08-10T16:36:10.835Z
[taskcluster 2018-08-10T16:44:42.233Z] Uploading artifact public/chainOfTrust.json.asc from file generic-worker\chainOfTrust.json.asc with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2019-08-10T16:36:10.835Z
[taskcluster 2018-08-10T16:44:43.553Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/WBtnUK5-TvuKlqZ7DEquBg/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2019-08-10T16:36:10.835Z
[taskcluster:error] exit status 255
[taskcluster:error] file-missing-on-worker: Could not read directory 'Z:\task_1533918496\public\build'
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

Trees are closed because of this issue hitting builds on autoland and inbound so far:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&resultStatus=busted&revision=e8aebe488b2f2e567940577de25013d00e818f7c

https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=busted&revision=c0101502b8b76ad563a3e84b5df203586394f64d&selectedJob=243346423

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=243347927&repo=autoland&lineNumber=43
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=243346447&repo=autoland&lineNumber=41
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=243346436&repo=autoland&lineNumber=174
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=243346450&repo=autoland&lineNumber=887

Some logs have:
16:37:32 INFO - 9:16.78 Downloading to temporary location c:\builds\tooltool_cache\ca280e0ff910b26e-rust-size.tar.bz2
16:37:36 INFO - 9:20.52 500 Server Error: Internal Server Error for url: https://cloud-mirror-production-us-east-1.s3.amazonaws.com/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FKT-A_y-FRnKgTFeC2Gy4UQ%2F0%2Fpublic%2Fbuild%2Frust-size.tar.bz2
16:37:36 INFO - 9:20.52 Failed to download rust-size.tar.bz2
16:37:36 ERROR - Return code: 1
16:37:36 ERROR - 1 not in success codes: [0]
16:37:36 WARNING - setting return code to 2
16:37:36 FATAL - Halting on failure while running ['c:\mozilla-build\python\python.exe', '-u', 'z:\task_1556548670\build\src\mach', 'artifact', 'toolchain', '-v', '--retry', '4', '--artifact-manifest', 'z:\task_1556548670\build\src\toolchains.json', '--tooltool-manifest', 'z:\task_1556548670\build\src\browser/config/tooltool-manifests/win64/aarch64.manifest', '--tooltool-url', 'https://tooltool.mozilla-releng.net/', '--authentication-file', 'c:\builds\relengapi.tok', '--cache-dir', 'c:/builds/tooltool_cache', 'public/build/clang.tar.bz2@B-YcuzPmTXiZ2KS3jker4A', 'public/build/rustc.tar.xz@aY-Ygf0eRRSnSen8xZbe9Q', 'public/build/rust-size.tar.bz2@KT-A_y-FRnKgTFeC2Gy4UQ', 'public/build/cbindgen.tar.bz2@RuaUtgEOTH2-j1dc1LF1Lg', 'public/build/nasm.tar.bz2@CVU2Y5olS_a-E8avD5aaXA', 'public/build/node.tar.bz2@AfGoeZNKRniA75Tgze6PQw']
16:37:36 FATAL - Running post_fatal callback...

Flags: needinfo?(sheehan)

All the logs from comment 71 are from tasks running in us-east-1. :fubar pointed out that AWS is reporting increased error rates from S3 in that region, which may be the cause of today's spike.

Usually when we see these errors, it's due to a corrupt clone bundle being downloaded/applied. In case this is another instance of a bad bundle, I rolled back to yesterday's bundles for mozilla-inbound.

Flags: needinfo?(sheehan)

If we wanted to add a retry for this error, would that be a new case in this function?

That function has retry logic for mercurial.error.RepoError. This bug appears to come from an uncaught mercurial.error.ResponseError (see the stack trace in comment 42), so we probably need a new function called in a similar fashion to handlerepoerror.

edit: extending handlepullerror would work as well.
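
For illustration only, here is a minimal sketch of the kind of retry being discussed. It is not the robustcheckout code: ResponseError below is a local stand-in for mercurial.error.ResponseError so the snippet runs without Mercurial installed, and checkout_with_retries, do_checkout, max_attempts and base_delay are hypothetical names chosen for this example. The backoff policy is likewise an assumption.

```python
import random
import time


class ResponseError(Exception):
    """Hypothetical stand-in for mercurial.error.ResponseError."""


def checkout_with_retries(do_checkout, max_attempts=3, base_delay=1.0,
                          sleep=time.sleep):
    """Call do_checkout(), retrying when it raises ResponseError.

    All names here are illustrative, not robustcheckout's real API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return do_checkout()
        except ResponseError:
            if attempt == max_attempts:
                # Retry budget exhausted: let the failure propagate as before.
                raise
            # Exponential backoff plus jitter so workers do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            sleep(delay)
```

Any other exception type still propagates immediately, so only the "unexpected response from remote server" class of failure would be retried under this sketch. Note that a retry alone would not help if the clone bundle itself is corrupt, since every attempt would download the same bad bundle.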

Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/67f35f8064e4
Add retries when hitting a ResponseError r=sheehan

Status: REOPENED → RESOLVED
Closed: 7 years ago → 6 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/fd52a18d052f
robustcheckout: mark strings as byte-strings and add comment about untested branch

Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---