Closed
Bug 1177190
Opened 10 years ago
Closed 10 years ago
git+http doesn't appear to honor keep alive settings with Centos 6
Categories
(Release Engineering :: General, defect)
Tracking
(firefox42 fixed)
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
firefox42 | --- | fixed |
People
(Reporter: u429623, Assigned: wcosta)
Attachments (2 files)
- 1.64 KB, patch (u429623: review+, u429623: checked-in+)
- 40 bytes, text/x-review-board-request (dustin: review+)
tl;dr: timeouts fetching from git.mozilla.org may be due to the lack of keepalive support for git+HTTP in the client's libcurl
Over the last few days, there have been numerous reports of timeouts in TC jobs interacting with git.mozilla.org.
Thanks to excellent detective work by a combined TC, MOC, Dev Services & Releng crew, the following was observed:
- TC builder client had a "hung" git fetch for /external/caf/platform/external/libpng.git
- TC builder client had a TCP socket in CLOSE_WAIT state
- git1.dmz.scl3 did not have an associated connection
- git1.dmz.scl3 did have some "client disconnect" messages, but unclear if related
- zlb VIP did not have an associated connection
- zlb does not log connection terminations
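A hung fetch like this one can be spotted from the client side by looking for sockets stuck in CLOSE_WAIT (the peer has closed the connection, the local process has not). Below is a minimal sketch, assuming a Linux client and the usual /proc/net/tcp layout (IPv4 only, addresses decoded as on a little-endian host such as x86). The function names are made up for illustration, and this is not the tooling that was actually used during the investigation.

```python
import os
import socket
import struct

CLOSE_WAIT = 0x08  # state code the Linux kernel prints in the "st" column


def _decode_endpoint(hex_endpoint):
    """Turn '0100007F:1F90' into ('127.0.0.1', 8080)."""
    hex_addr, hex_port = hex_endpoint.split(":")
    # Assumption: little-endian host, so the 32-bit address is byte-swapped.
    packed = struct.pack("<I", int(hex_addr, 16))
    return socket.inet_ntoa(packed), int(hex_port, 16)


def close_wait_sockets(proc_net_tcp_text):
    """Return (local, remote) endpoint pairs for sockets in CLOSE_WAIT."""
    results = []
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Data rows start with a slot number such as "0:"; skip the header.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        if int(fields[3], 16) == CLOSE_WAIT:
            results.append(
                (_decode_endpoint(fields[1]), _decode_endpoint(fields[2]))
            )
    return results


if __name__ == "__main__":
    if os.path.exists("/proc/net/tcp"):
        with open("/proc/net/tcp") as handle:
            for local, remote in close_wait_sockets(handle.read()):
                print("CLOSE_WAIT", local, "->", remote)
```

A socket that stays in this list for a long time while the remote side shows no matching connection matches the pattern described above.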
The socket in CLOSE_WAIT state prompted a check of the keepalive configuration. Neither client nor server overrides the default setting of 5 seconds for git protocol connections.
However, while researching whether there is a configuration setting for keepalive on git+HTTP, the following article http://git.661346.n2.nabble.com/PATCH-http-enable-keepalive-on-TCP-sockets-td7597589.html suggested that git+HTTP keepalives are only supported with libcurl version 7.25 and later.
Investigation of the TC builder client showed it is using CentOS 6, which ships libcurl 7.19.7.
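The version comparison can be automated. Below is a minimal sketch that parses the `libcurl/X.Y.Z` token printed by `curl --version` and compares it with the 7.25 threshold from the article above. The helper names are hypothetical. It also assumes the `curl` binary reports the same libcurl that git links against, which is not guaranteed.

```python
import re
import shutil
import subprocess

# Threshold quoted in this bug: keepalive needs libcurl 7.25 or later.
MIN_KEEPALIVE_VERSION = (7, 25, 0)


def parse_libcurl_version(version_text):
    """Extract (major, minor, patch) from text containing 'libcurl/X.Y.Z'."""
    match = re.search(r"libcurl/(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("no libcurl version found in: %r" % version_text)
    return tuple(int(part) for part in match.groups())


def supports_tcp_keepalive(version_text):
    """True if the reported libcurl is at or above the threshold."""
    return parse_libcurl_version(version_text) >= MIN_KEEPALIVE_VERSION


if __name__ == "__main__":
    # Only run the live check when a curl binary is actually available.
    if shutil.which("curl"):
        output = subprocess.check_output(["curl", "--version"], text=True)
        print(parse_libcurl_version(output), supports_tcp_keepalive(output))
```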
There are several options from here; this bug is for coordinating them.
Comment 2•10 years ago
Awesome detective work guys, and nice summary!
Comment 3•10 years ago
|
||
Do we understand why this only started to affect us so severely within the last week?
Moving bug - this is also occurring in Buildbot jobs, which also use CentOS 6 builders.
Component: TaskCluster → General Automation
Product: Testing → Release Engineering
QA Contact: catlee
WORKAROUND: change the timeout so jobs fail faster while the rest of the problem is investigated.
"10 min" was picked as 33% higher than the average time.
Attachment #8626311 -
Flags: review?(catlee)
Comment on attachment 8626311 [details] [diff] [review]
timeout.patch
r+ from :catlee IRL (yay WW)
Attachment #8626311 -
Flags: review?(catlee) → review+
Comment on attachment 8626311 [details] [diff] [review]
timeout.patch
https://hg.mozilla.org/build/mozharness/rev/f5d11e85e980
Attachment #8626311 -
Flags: checked-in+
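For readers unfamiliar with this kind of workaround, here is a minimal sketch of a fail-fast timeout around an external command. It is not the contents of timeout.patch. The function name is hypothetical and the 600 second default simply mirrors the "10 min" value chosen above (which, at 33% over the average, would put a typical run at roughly 7.5 minutes).

```python
import subprocess
import sys

FETCH_TIMEOUT_SECONDS = 600  # the "10 min" workaround value from this bug


def run_with_timeout(cmd, timeout_seconds=FETCH_TIMEOUT_SECONDS):
    """Run cmd; return its exit code, or None if it was killed on timeout."""
    try:
        # subprocess.run kills the child and raises once the timeout expires.
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # Harmless demonstration command; a real job would pass its fetch command.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"]))
```

A caller can treat a `None` result as a hung fetch and retry or fail the job right away instead of waiting for a much longer default timeout.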
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 10•10 years ago
Failure logs with the newer 600s timeout:
https://treeherder.mozilla.org/logviewer.html#?job_id=11126605&repo=mozilla-inbound
Reporter | ||
Comment 11•10 years ago
Next step is to get someone from b2g build team to debug the "repo tool" output and/or add debugging output to it.
Until we know what specific command is failing, and how, we're stuck.
ni: mwu for help and/or a reference
Flags: needinfo?(mwu)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 26•10 years ago
This is arguably a tree-closing issue (or grounds for hiding most B2G emulator/device image builds). I'm not sure who's around right now, but a failure rate of roughly 50% or more isn't acceptable and needs attention from *someone* ASAP.
Flags: needinfo?(sdeckelmann)
Flags: needinfo?(jonas)
Flags: needinfo?(jocheng)
Flags: needinfo?(faramarz)
Flags: needinfo?(fabrice)
Comment 27•10 years ago
No references here. We don't usually mess with git and repo. It sounds like something changed on the automation side and needs to be backed out. Alternatively, you can experiment with upgrading git and/or libcurl.
Flags: needinfo?(mwu)
Comment 28•10 years ago
:wcosta is working right now on upgrading libcurl to stop burning builds. The longer-term fix here is probably getting off CentOS6.
Flags: needinfo?(sdeckelmann)
Flags: needinfo?(jonas)
Updated•10 years ago
Flags: needinfo?(fabrice)
Assignee | ||
Updated•10 years ago
Assignee: nobody → wcosta
Status: NEW → ASSIGNED
Assignee | ||
Comment 29•10 years ago
Bug 1177190: Update libcurl in docker images. r=selena
libcurl shipped with CentOS 6 doesn't support keepalive. This is causing
builds to burn.
Assignee | ||
Updated•10 years ago
Attachment #8627515 -
Flags: review?(sdeckelmann)
Assignee | ||
Comment 30•10 years ago
Comment on attachment 8627515 [details]
MozReview Request: Bug 1177190: Update libcurl in docker images. r=selenamarie
Bug 1177190: Update libcurl in docker images. r=selena
libcurl shipped with CentOS 6 doesn't support keepalive. This is causing
builds to burn.
Updated•10 years ago
Attachment #8627515 -
Flags: review?(sdeckelmann) → review+
Comment 31•10 years ago
Comment on attachment 8627515 [details]
MozReview Request: Bug 1177190: Update libcurl in docker images. r=selenamarie
https://reviewboard.mozilla.org/r/12255/#review10733
Ship It!
Assignee | ||
Comment 32•10 years ago
https://hg.mozilla.org/integration/b2g-inbound/rev/bba8e8d63c37
Comment 33•10 years ago
(In reply to Wander Lairson Costa [:wcosta] from comment #32)
> https://hg.mozilla.org/integration/b2g-inbound/rev/bba8e8d63c37
Sorry, I had to back this out for perma-failures like https://treeherder.mozilla.org/logviewer.html#?job_id=2225232&repo=b2g-inbound
Comment 34•10 years ago
Updated•10 years ago
Flags: needinfo?(wcosta)
Assignee | ||
Comment 35•10 years ago
Emulator bustage was caused by Bug 1178899. It should be fixed now; I was able to run a successful build:
https://tools.taskcluster.net/task-inspector/#a8ecyIG3QEuGN95D7BOXUg/2
Can we back out the backout?
Flags: needinfo?(wcosta) → needinfo?(cbook)
Assignee | ||
Comment 36•10 years ago
Just talked to selena; we are going to do that another way.
Flags: needinfo?(cbook)
Assignee | ||
Comment 37•10 years ago
Comment on attachment 8627515 [details]
MozReview Request: Bug 1177190: Update libcurl in docker images. r=selenamarie
Bug 1177190: Update libcurl in docker images. r=selenamarie
libcurl on CentOS 6 doesn't support keepalive, so we upgrade it.
The approach we take to avoid breaking buildbot machines is to
grab libcurl from CentOS 7, build it on CentOS 6 and upload rpms
to S3.
Attachment #8627515 -
Attachment description: MozReview Request: Bug 1177190: Update libcurl in docker images. r=selena → MozReview Request: Bug 1177190: Update libcurl in docker images. r=selenamarie
Attachment #8627515 -
Flags: review+ → review?(sdeckelmann)
Assignee | ||
Updated•10 years ago
Attachment #8627515 -
Flags: review?(sdeckelmann) → review?(dustin)
Assignee | ||
Comment 38•10 years ago
Comment on attachment 8627515 [details]
MozReview Request: Bug 1177190: Update libcurl in docker images. r=selenamarie
Bug 1177190: Update libcurl in docker images. r=selenamarie
libcurl on CentOS 6 doesn't support keepalive, so we upgrade it.
The approach we take to avoid breaking buildbot machines is to
grab libcurl from CentOS 7, build it on CentOS 6 and upload rpms
to S3.
Assignee | ||
Comment 39•10 years ago
Comment 40•10 years ago
https://reviewboard.mozilla.org/r/12253/#review10881
::: testing/docker/b2g-build/Dockerfile:18
(Diff revision 2)
> + cd -
You should be able to just 'yum install $url', which avoids loading the RPMs onto disk.
Comment 41•10 years ago
Comment on attachment 8627515 [details]
MozReview Request: Bug 1177190: Update libcurl in docker images. r=selenamarie
It'd be good to have some comments in there regarding why these aren't installed from a yum repo, too.
FWIW, there's another option to enforce keepalive for everything:
http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/#libkeepalive
Why the Linux kernel doesn't do this by default, I don't know. Would the Internet collapse from an extra TCP round trip every 5 minutes? The number of TCP connections that last that long is a vanishingly small portion of all TCP connections. But I digress.
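For reference, this is what enabling keepalive looks like for a single socket. As far as I understand the HOWTO, libkeepalive applies the same kind of socket options to every socket via LD_PRELOAD. Below is a minimal sketch with a hypothetical helper name. The 300 second idle time echoes the "every 5 minutes" figure above, and the probe interval and count are arbitrary illustration values. The TCP_KEEP* options are guarded because not every platform exposes them.

```python
import socket


def enable_tcp_keepalive(sock, idle_seconds=300, interval_seconds=60,
                         probe_count=5):
    """Turn on TCP keepalive probes for an existing socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs below exist on Linux; skip them where unavailable.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL,
                        interval_seconds)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probe_count)
    return sock


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as demo:
        enable_tcp_keepalive(demo)
        print(demo.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
```

With probes enabled, a peer or load balancer that silently drops the connection is noticed by the kernel rather than leaving the client waiting indefinitely.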
Attachment #8627515 -
Flags: review?(dustin) → review+
Comment 42•10 years ago
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
status-firefox42: --- → fixed
Resolution: --- → FIXED
Updated•10 years ago
Flags: needinfo?(jocheng)
Updated•10 years ago
Flags: needinfo?(faramarz)
Updated•7 years ago
Component: General Automation → General