Intermittent-infra ChunkedEncodingError: ("Connection broken: error(104, 'Connection reset by peer')", error(104, 'Connection reset by peer'))

RESOLVED FIXED

Status

defect
P2
normal
2 years ago
2 years ago

People

(Reporter: aryx, Assigned: garbas)

Tracking

({intermittent-failure})

unspecified

Firefox Tracking Flags

(firefox55 fixed, firefox56 fixed)

Details

(Whiteboard: [stockwell infra])

Attachments

(1 attachment)

https://treeherder.mozilla.org/logviewer.html#?job_id=98866326&repo=autoland

Started yesterday or so (or at least became more frequent).

[task 2017-05-13T07:47:52.843455Z] 07:47:52     INFO -   0:21.55 Downloading gtk3.tar.xz
[task 2017-05-13T07:47:52.843631Z] 07:47:52     INFO -   0:21.55 attempt 1/5
[task 2017-05-13T07:47:52.843891Z] 07:47:52     INFO -   0:21.55 Downloading to temporary location /home/worker/tooltool-cache/3915f8ec396c56a8a92e6f9695b70f09ce9d1582359d1258e37e3fd43a143bc974410e4cfc27f500e095f34a8956206e0ebf799b7287f0f38def0d5e34ed71c9
[task 2017-05-13T07:47:53.252435Z] 07:47:53     INFO -   0:21.96 Downloading... 0.0 %
[task 2017-05-13T07:47:53.266578Z] 07:47:53     INFO -   0:21.98 Downloading... 5.0 %
[task 2017-05-13T07:47:53.278759Z] 07:47:53     INFO -   0:21.99 Downloading... 10.0 %
[task 2017-05-13T07:47:53.281625Z] 07:47:53     INFO -   0:21.99 Downloading... 15.1 %
[task 2017-05-13T07:47:53.289058Z] 07:47:53     INFO -  Error running mach:
[task 2017-05-13T07:47:53.289410Z] 07:47:53     INFO -      ['artifact', 'toolchain', '-v', '--retry', '4', '--tooltool-manifest', '/home/worker/workspace/build/src/browser/config/tooltool-manifests/linux64/releng.manifest', '--tooltool-url', 'http://relengapi/tooltool/', '--cache-dir', '/home/worker/tooltool-cache']
[task 2017-05-13T07:47:53.289627Z] 07:47:53     INFO -  The error occurred in code that was called by the mach command. This is either
[task 2017-05-13T07:47:53.289867Z] 07:47:53     INFO -  a bug in the called code itself or in the way that mach is calling it.
[task 2017-05-13T07:47:53.290092Z] 07:47:53     INFO -  You should consider filing a bug for this issue.
[task 2017-05-13T07:47:53.290328Z] 07:47:53     INFO -  If filing a bug, please include the full output of mach, including this error
[task 2017-05-13T07:47:53.290528Z] 07:47:53     INFO -  message.
[task 2017-05-13T07:47:53.290790Z] 07:47:53     INFO -  The details of the failure are as follows:
[task 2017-05-13T07:47:53.291051Z] 07:47:53     INFO -  ChunkedEncodingError: ("Connection broken: error(104, 'Connection reset by peer')", error(104, 'Connection reset by peer'))
[task 2017-05-13T07:47:53.291296Z] 07:47:53     INFO -    File "/home/worker/workspace/build/src/python/mozbuild/mozbuild/mach_commands.py", line 1755, in artifact_toolchain
[task 2017-05-13T07:47:53.291496Z] 07:47:53     INFO -      record.fetch_with(cache)
[task 2017-05-13T07:47:53.291750Z] 07:47:53     INFO -    File "/home/worker/workspace/build/src/python/mozbuild/mozbuild/mach_commands.py", line 1659, in fetch_with
[task 2017-05-13T07:47:53.291956Z] 07:47:53     INFO -      self.filename = cache.fetch(self.url)
[task 2017-05-13T07:47:53.292198Z] 07:47:53     INFO -    File "/home/worker/workspace/build/src/python/mozbuild/mozbuild/artifacts.py", line 816, in fetch
[task 2017-05-13T07:47:53.292390Z] 07:47:53     INFO -      dl.wait()
[task 2017-05-13T07:47:53.292634Z] 07:47:53     INFO -    File "/home/worker/workspace/build/src/python/dlmanager/dlmanager/manager.py", line 101, in wait
[task 2017-05-13T07:47:53.292833Z] 07:47:53     INFO -      self.raise_if_error()
[task 2017-05-13T07:47:53.293079Z] 07:47:53     INFO -    File "/home/worker/workspace/build/src/python/dlmanager/dlmanager/manager.py", line 116, in raise_if_error
[task 2017-05-13T07:47:53.293278Z] 07:47:53     INFO -      six.reraise(*self.__error)
[task 2017-05-13T07:47:53.293527Z] 07:47:53     INFO -    File "/home/worker/workspace/build/src/python/dlmanager/dlmanager/manager.py", line 168, in _download
[task 2017-05-13T07:47:53.293737Z] 07:47:53     INFO -      for chunk in response.iter_content(chunk_size):
[task 2017-05-13T07:47:53.293974Z] 07:47:53     INFO -    File "/home/worker/workspace/build/src/python/requests/requests/models.py", line 663, in generate
[task 2017-05-13T07:47:53.294174Z] 07:47:53     INFO -      raise ChunkedEncodingError(e)
[task 2017-05-13T07:47:53.306997Z] 07:47:53    ERROR - Return code: 1
[task 2017-05-13T07:47:53.307268Z] 07:47:53    ERROR - 1 not in success codes: [0]
Component: Mozharness → General
Priority: -- → P2
Whiteboard: [stockwell infra]
We have had an increase in these failures from Jun 19-21; it is hard to tell whether this is sustained or a spike lasting a couple of days. :catlee, as the triage owner for releng/general, could you get someone to look into this failure so we can fix anything that is fixable, or at least understand why it is occurring more frequently?
Flags: needinfo?(catlee)
Rok, can you take a quick look? Recent logs show this error intermittently when downloading from tooltool.
Flags: needinfo?(catlee) → needinfo?(rgarbas)
:catlee: I'll take a look.
Assignee: nobody → rgarbas
Flags: needinfo?(rgarbas)
Comment on attachment 8883870 [details]
Bug 1364650 - retry on ChunkedEncodingError;

https://reviewboard.mozilla.org/r/154858/#review160356

FWIW, all these errors derive from requests.exceptions.RequestException. However, there are definitely some derived error types that we don't want to retry on (like too many redirects). It is too bad there isn't a class hierarchy for all errors that are likely transient.

Also, I'm kinda surprised we are seeing errors reported at the chunked transfer level. That's got to be a bad network connection, misbehaving server, or something odd. I'd expect the connection to die at the TCP layer before seeing a chunked transfer error. Who knows. It is quite possible the version of requests we are using mis-attributes the error.
Attachment #8883870 - Flags: review?(gps) → review+
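
The review point above (every error derives from RequestException, but only some are worth retrying) can be illustrated with an explicit allow-list. This is a minimal sketch, not the landed patch: the classes are local stand-ins that mirror names in requests.exceptions so the example runs without the library, and TRANSIENT_ERRORS and is_transient are hypothetical names chosen for illustration.

```python
# Local stand-ins mirroring the names and inheritance in requests.exceptions,
# defined here only so the sketch runs without the library installed.
class RequestException(IOError):
    pass


class ChunkedEncodingError(RequestException):
    pass


class Timeout(RequestException):
    pass


class TooManyRedirects(RequestException):
    pass


# Hypothetical allow-list: error types assumed to be worth retrying.
# TooManyRedirects is deliberately left out, as the review suggests.
TRANSIENT_ERRORS = (ChunkedEncodingError, Timeout)


def is_transient(exc):
    """Return True if exc is an instance of an allow-listed error type."""
    return isinstance(exc, TRANSIENT_ERRORS)
```

Catching the base class would also retry on redirect loops, which is why the sketch lists the retryable types one by one instead.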
Pushed by mtabara@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d29d22da431d
retry on ChunkedEncodingError; r=gps DONTBUILD CLOSED TREE
https://hg.mozilla.org/mozilla-central/rev/d29d22da431d
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Depends on: 1385621
I see only 1 attempt to download in https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=136894707, which suggests this isn't really an intermittent failure so much as badly configured robustness handling:

[task 2017-10-13T20:41:44.264Z]  0:02.15 Downloading clang.tar.xz
[task 2017-10-13T20:41:44.264Z]  0:02.15 attempt 1/1
[task 2017-10-13T20:41:44.264Z]  0:02.15 Downloading to temporary location /builds/worker/tooltool-cache/ef88091b08550b8f-clang.tar.xz
[task 2017-10-13T20:41:44.592Z]  0:02.48 Downloading... 0.0 %
[task 2017-10-13T20:41:44.665Z]  0:02.55 Downloading... 5.0 %
[task 2017-10-13T20:41:44.747Z]  0:02.63 Downloading... 10.0 %
[task 2017-10-13T20:41:44.831Z]  0:02.72 Downloading... 15.0 %
[task 2017-10-13T20:41:44.914Z]  0:02.80 Downloading... 20.0 %
[task 2017-10-13T20:41:44.997Z]  0:02.88 Downloading... 25.0 %
[task 2017-10-13T20:41:45.080Z]  0:02.97 Downloading... 30.0 %
[task 2017-10-13T20:41:45.163Z]  0:03.05 Downloading... 35.0 %
[task 2017-10-13T20:41:45.246Z]  0:03.13 Downloading... 40.0 %
[task 2017-10-13T20:41:45.329Z]  0:03.21 Downloading... 45.0 %
[task 2017-10-13T20:41:45.450Z]  0:03.34 Downloading... 50.0 %
[task 2017-10-13T20:41:45.518Z]  0:03.40 Downloading... 55.0 %
[task 2017-10-13T20:41:45.578Z]  0:03.46 Downloading... 60.0 %
[task 2017-10-13T20:41:45.662Z]  0:03.55 Downloading... 65.0 %
[task 2017-10-13T20:41:45.744Z]  0:03.63 Downloading... 70.0 %
[task 2017-10-13T20:41:45.828Z]  0:03.71 Downloading... 75.0 %
[task 2017-10-13T20:41:45.910Z]  0:03.80 Downloading... 80.0 %
[task 2017-10-13T20:41:45.992Z]  0:03.88 Downloading... 85.0 %
[task 2017-10-13T20:41:46.077Z]  0:03.96 Downloading... 90.0 %
[task 2017-10-13T20:41:46.115Z]  0:04.00 ("Connection broken: error(104, 'Connection reset by peer')", error(104, 'Connection reset by peer'))
[task 2017-10-13T20:41:46.115Z]  0:04.00 Failed to download clang.tar.xz
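
To illustrate why "attempt 1/1" provides no protection against a connection reset, here is a rough sketch of a retry loop. It assumes the total number of attempts is the retry count plus one, which is consistent with the first log ('--retry', '4' alongside "attempt 1/5") but is not confirmed from the mach source. fetch_with_retries and its parameters are hypothetical names, and ConnectionResetError is the Python 3 built-in standing in for the error(104, ...) seen in the logs.

```python
import time


def fetch_with_retries(download, retry=4, transient=(ConnectionResetError,),
                       sleep=time.sleep, delay=1.0, log=print):
    """Call download() until it succeeds or the attempts are used up.

    Assumption: total attempts = retry + 1, so retry=0 means a single try.
    """
    attempts = retry + 1
    for attempt in range(1, attempts + 1):
        log("attempt %d/%d" % (attempt, attempts))
        try:
            return download()
        except transient as exc:
            # On the last attempt there is nothing left to retry: re-raise.
            if attempt == attempts:
                raise
            log("transient failure (%s), retrying" % exc)
            sleep(delay)
```

With retry=0 the first reset propagates immediately, which matches the single-attempt failure in the log above; with retry=4 the same reset would be absorbed by a later attempt.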