Closed
Bug 1364695
Opened 8 years ago
Closed 8 years ago
Intermittent ConnectionError: ('Connection aborted.', BadStatusLine("''",))
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P1)
Tracking
(firefox55 fixed, firefox56 fixed)
RESOLVED
FIXED
People
(Reporter: aryx, Assigned: garbas)
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell infra])
Attachments
(1 file)
1.98 KB, patch | gps: review+
https://treeherder.mozilla.org/logviewer.html#?job_id=98866763&repo=mozilla-inbound
07:57:24 INFO - 0:19.23 Downloading... 100.0 %
07:57:28 INFO - 0:23.48 Downloaded artifact to c:\builds\tooltool_cache\babc414ffc0457d27f5a1ed24a8e4873afbe2f1c1a4075469a27c005e1babc3b2a788f643f825efedff95b79686664c67ec4340ed535487168a3482e68559bc7
07:57:29 INFO - 0:24.83 hashed u'c:\\builds\\tooltool_cache\\babc414ffc0457d27f5a1ed24a8e4873afbe2f1c1a4075469a27c005e1babc3b2a788f643f825efedff95b79686664c67ec4340ed535487168a3482e68559bc7' with sha512 to be babc414ffc0457d27f5a1ed24a8e4873afbe2f1c1a4075469a27c005e1babc3b2a788f643f825efedff95b79686664c67ec4340ed535487168a3482e68559bc7
07:57:29 INFO - 0:24.83 Downloading clang.tar.bz2
07:57:29 INFO - 0:24.83 attempt 1/5
07:57:29 INFO - 0:24.83 Downloading to temporary location c:\builds\tooltool_cache\44dee70d525ea93952af27f943d1cc773311970c31d971d2bc2e3437cce0c899f3a03ddd8e42e86f1b4fd9ab1c4bc1767cdb0406eb4b3934ae4fc272dab830dc
07:57:30 INFO - Error running mach:
07:57:30 INFO - ['artifact', 'toolchain', '-v', '--retry', '4', '--tooltool-manifest', 'z:\\task_1494661831\\build\\src\\browser\\config\\tooltool-manifests\\win64\\clang.manifest', '--tooltool-url', 'https://api.pub.build.mozilla.org/tooltool/', '--authentication-file', 'c:\\builds\\relengapi.tok', '--cache-dir', 'c:/builds/tooltool_cache']
07:57:30 INFO - The error occurred in code that was called by the mach command. This is either
07:57:30 INFO - a bug in the called code itself or in the way that mach is calling it.
07:57:30 INFO - You should consider filing a bug for this issue.
07:57:30 INFO - If filing a bug, please include the full output of mach, including this error
07:57:30 INFO - message.
07:57:30 INFO - The details of the failure are as follows:
07:57:30 INFO - ConnectionError: ('Connection aborted.', BadStatusLine("''",))
07:57:30 INFO - File "z:\task_1494661831\build\src\python/mozbuild/mozbuild/mach_commands.py", line 1755, in artifact_toolchain
07:57:30 INFO - record.fetch_with(cache)
07:57:30 INFO - File "z:\task_1494661831\build\src\python/mozbuild/mozbuild/mach_commands.py", line 1659, in fetch_with
07:57:30 INFO - self.filename = cache.fetch(self.url)
07:57:30 INFO - File "z:\task_1494661831\build\src\python/mozbuild\mozbuild\artifacts.py", line 816, in fetch
07:57:30 INFO - dl.wait()
07:57:30 INFO - File "z:\task_1494661831\build\src\python/dlmanager\dlmanager\manager.py", line 101, in wait
07:57:30 INFO - self.raise_if_error()
07:57:30 INFO - File "z:\task_1494661831\build\src\python/dlmanager\dlmanager\manager.py", line 116, in raise_if_error
07:57:30 INFO - six.reraise(*self.__error)
07:57:30 INFO - File "z:\task_1494661831\build\src\python/dlmanager\dlmanager\manager.py", line 157, in _download
07:57:30 INFO - with closing(session.get(url, stream=True)) as response:
07:57:30 INFO - File "z:\task_1494661831\build\src\python/requests\requests\sessions.py", line 480, in get
07:57:30 INFO - return self.request('GET', url, **kwargs)
07:57:30 INFO - File "z:\task_1494661831\build\src\python/requests\requests\sessions.py", line 468, in request
07:57:30 INFO - resp = self.send(prep, **send_kwargs)
07:57:30 INFO - File "z:\task_1494661831\build\src\python/requests\requests\sessions.py", line 597, in send
07:57:30 INFO - history = [resp for resp in gen] if allow_redirects else []
07:57:30 INFO - File "z:\task_1494661831\build\src\python/requests\requests\sessions.py", line 195, in resolve_redirects
07:57:30 INFO - **adapter_kwargs
07:57:30 INFO - File "z:\task_1494661831\build\src\python/requests\requests\sessions.py", line 576, in send
07:57:30 INFO - r = adapter.send(request, **kwargs)
07:57:30 INFO - File "z:\task_1494661831\build\src\python/requests\requests\adapters.py", line 426, in send
07:57:30 INFO - raise ConnectionError(err, request=request)
07:57:30 ERROR - Return code: 1
07:57:30 ERROR - 1 not in success codes: [0]
07:57:30 WARNING - setting return code to 2
Reporter
Comment 3•8 years ago
This hit again, this time massively: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=367734cc9370f2528dc564921e3d678cb352f514&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Closed trees for this.
Severity: normal → blocker
Component: Build Config → Buildduty
Product: Core → Release Engineering
QA Contact: catlee
Updated•8 years ago
Whiteboard: [stockwell infra]
Assignee
Comment 4•8 years ago
One of the relengapi webheads did not have port 5432 open to reach the new database. This was fixed in Bug 1344364.
Assignee
Comment 5•8 years ago
One successful build (finally).
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=07facc83000c26e54cd48adabcc6530c96686497&filter-searchStr=Linux+debug+Executed+by+TaskCluster+build-linux%2Fdebug+tc(B)
Also, response time dropped from ~700ms to ~400ms.
Assignee: nobody → rgarbas
Comment 10•8 years ago
:garbas -- This is much better compared to the May 22 spike, but failures continue at a rate of a dozen or so per day. Do you think you'll be able to eliminate this failure? Would it be helpful to retry when this happens? (It looks like there is retry logic involved, but it is not utilized in this case - most failures happen on "attempt 1/5".)
Flags: needinfo?(rgarbas)
Updated•8 years ago
Severity: blocker → critical
Priority: -- → P1
Assignee
Comment 11•8 years ago
:gbrown: It looks like something is going wrong with the retry logic. I will take a look.
Flags: needinfo?(rgarbas)
Assignee
Comment 15•8 years ago
Retry currently only happens when a ``requests.exceptions.HTTPError`` exception is raised. I think retry should also happen on ``requests.exceptions.ConnectionError``.
Should I also add a delay between retries (e.g. one second)?
Attachment #8876734 -
Flags: review?(gps)
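For illustration only, the idea in comment 15 (retry on more than one exception type, with a pause between attempts) can be sketched with the standard library alone. This is not the attached patch: `fetch_with_retry`, `TransientError`, and `flaky_fetch` are hypothetical names, and `TransientError` merely stands in for ``requests.exceptions.ConnectionError``.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for requests.exceptions.ConnectionError."""


def fetch_with_retry(fetch, retry_on, attempts=5, sleeptime=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts are used up.

    Only exception types listed in retry_on trigger another attempt;
    any other exception propagates immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(sleeptime)  # pause before the next attempt


# Usage: a fake download that fails twice, then succeeds.
calls = []


def flaky_fetch():
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("Connection aborted.")
    return "clang.tar.bz2"


# sleep is stubbed out so the example runs instantly.
result = fetch_with_retry(flaky_fetch, retry_on=(TransientError,),
                          sleep=lambda seconds: None)
```

With requests installed, `retry_on` would presumably be a tuple such as `(requests.exceptions.HTTPError, requests.exceptions.ConnectionError)`.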
Comment 16•8 years ago
Comment on attachment 8876734 [details] [diff] [review]
tooltool_retry_on_connection_error.patch
Review of attachment 8876734 [details] [diff] [review]:
-----------------------------------------------------------------
This is a valid solution so it earns r+.
But I think a better solution is to use the built-in retry logic in requests. See https://stackoverflow.com/questions/15431044/can-i-set-max-retries-for-requests-request for code patterns. Note how it is even possible to configure backoff intervals for the retry logic. Also, remember to .mount('https://') as well as 'http://'.
Attachment #8876734 -
Flags: review?(gps) → review+
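A minimal sketch of the pattern comment 16 refers to, assuming the requests and urllib3 packages are available. The helper name `make_retrying_session` and the chosen values (5 retries, backoff factor 1.0) are illustrative assumptions, not what tooltool or mach actually use.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=5, backoff_factor=1.0):
    """Build a Session whose transport adapters retry failed requests."""
    # Retry comes from urllib3; backoff_factor spaces out successive attempts.
    retry = Retry(total=total, backoff_factor=backoff_factor)
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Mount on both schemes, as the review comment reminds.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = make_retrying_session()
```

Requests made through `session` then go through the mounted adapter, so no explicit loop is needed at the call site.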
Comment 17•8 years ago
(In reply to Gregory Szorc [:gps] from comment #16)
> Comment on attachment 8876734 [details] [diff] [review]
> tooltool_retry_on_connection_error.patch
>
> Review of attachment 8876734 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> This is a valid solution so it earns r+.
>
> But I think a better solution is to use the built-in retry logic in
> requests. See
> https://stackoverflow.com/questions/15431044/can-i-set-max-retries-for-requests-request
> for code patterns. Note how it is even possible to
> configure backoff intervals for the retry logic. Also, remember to
> .mount('https://') as well as 'http://'.
The problem with using the built-in retry logic is that it won't retry for HTTP errors, and then you end up with two retry strategies.
Comment 18•8 years ago
Comment on attachment 8876734 [details] [diff] [review]
tooltool_retry_on_connection_error.patch
Review of attachment 8876734 [details] [diff] [review]:
-----------------------------------------------------------------
::: python/mozbuild/mozbuild/mach_commands.py
@@ +1790,5 @@
> sleeptime=60)):
> try:
> record.fetch_with(cache)
> + except (requests.exceptions.HTTPError,
> + requests.exceptions.ConnectionError) as e:
Note it might be worth being broader than ConnectionError and HTTPError here, and use RequestException. Although that might be too broad... maybe add Timeout only?
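As a rough illustration of the trade-off raised here, the relationships between the requests exception classes can be checked directly. `RETRYABLE` and `is_retryable` are hypothetical names for this sketch and do not come from the patch.

```python
import requests.exceptions as rexc

# The narrower option from the review: the two patched classes plus Timeout.
RETRYABLE = (rexc.HTTPError, rexc.ConnectionError, rexc.Timeout)


def is_retryable(exc):
    """Return True if exc belongs to one of the retry-worthy classes."""
    return isinstance(exc, RETRYABLE)


# ConnectTimeout derives from both ConnectionError and Timeout, so the patch
# already covers it; ReadTimeout derives from Timeout only.
connect_timeout_covered = issubclass(rexc.ConnectTimeout, rexc.ConnectionError)
read_timeout_covered = issubclass(rexc.ReadTimeout, rexc.ConnectionError)

# RequestException is the common base and would also catch errors that a
# retry is unlikely to fix, such as InvalidURL.
too_broad = issubclass(rexc.InvalidURL, rexc.RequestException)
```

Adding `Timeout` to the tuple picks up read timeouts without also retrying on malformed URLs, which is what catching `RequestException` would do.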
Comment 21•8 years ago
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/45b27cacb06e
Make `mach artifact toolchain` also retry on ConnectionError. r=gps
Comment 22•8 years ago
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7bc0766b6a76
Syntax fixup for bug 1364695 for bustage. r=me
Comment 23•8 years ago
bugherder
Comment 24•8 years ago
bugherder uplift
status-firefox55:
--- → fixed
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard