Closed
Bug 767739
Opened 13 years ago
Closed 13 years ago
Possible issue with hg http servers blocking 13.0.2 chemspill
Categories
(Developer Services :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: hwine, Assigned: fox2mike)
References
Details
Attachments
(2 files)
build-vpn servers hanging after connect.
Updated•13 years ago
Assignee: server-ops-devservices → mburns
Summary: hg http servers not hanging, blocks 13.0.2 chemspill → hg http servers hanging, blocks 13.0.2 chemspill
This has impacted 12 "repack" builders so far, all when pulling over http. For example:
http://buildbot-master12.build.mozilla.org:8001/builders/release-mozilla-release-linux64_repack_5%2F6/builds/2/steps/run_script/logs/stdio
Assignee
Updated•13 years ago
Assignee: mburns → shyam
Assignee
Comment 2•13 years ago
Can't see anything obviously wrong, but ganglia says the load on hgweb1 shot up at 1330 (compared to before). I've disabled hgweb1 in the pool for now.
No idea why releng builders are NOT using their exclusive 3 machine pool of hg-internal.dmz.scl3.mozilla.com, but that's for another bug.
Assignee
Comment 3•13 years ago
Corresponding to load, there's a spike in network traffic as well. This could have been legitimate traffic too, fwiw. This is also why issues like this are easier to debug if releng builders use machines assigned to them (so we know no public traffic hits those machines).
Assignee
Comment 4•13 years ago
Zeus also shows monitor failures for hgweb1, but about an hour before ganglia shows any issues. I'm inclined to disregard those as not related.
The entries in the log:
23/Jun/2012:12:28:15 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: Node 10.22.74.32 has failed - A monitor has detected a failure
23/Jun/2012:12:22:17 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: Node 10.22.74.32 has failed - A monitor has detected a failure
23/Jun/2012:12:22:17 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: Node 10.22.74.32 has failed - A monitor has detected a failure
23/Jun/2012:12:22:17 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: Node 10.22.74.32 has failed - A monitor has detected a failure
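To make the "about an hour before" comparison concrete, here is a minimal Python sketch that parses the four entries quoted above and measures the gap to the ganglia load spike. The regular expression and the names `LINE_RE`, `parse_line`, and `load_spike` are illustrative only, not part of Zeus or ganglia. It also assumes the "1330" ganglia time from comment 2 is on the same day and in the same -0700 timezone as the log, which the bug does not state.

```python
import re
from datetime import datetime

# The four log entries quoted above, copied verbatim.
LOG_LINES = [
    "23/Jun/2012:12:28:15 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: Node 10.22.74.32 has failed - A monitor has detected a failure",
    "23/Jun/2012:12:22:17 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: Node 10.22.74.32 has failed - A monitor has detected a failure",
    "23/Jun/2012:12:22:17 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: Node 10.22.74.32 has failed - A monitor has detected a failure",
    "23/Jun/2012:12:22:17 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: Node 10.22.74.32 has failed - A monitor has detected a failure",
]

TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Hypothetical pattern inferred from the four lines above only;
# other Zeus log messages may not match it.
LINE_RE = re.compile(
    r"^(?P<ts>\S+ [+-]\d{4}) (?P<level>\S+) "
    r"Pool (?P<pool>[^,]+), Node (?P<node>\S+): (?P<msg>.*)$"
)


def parse_line(line):
    """Split one log line into timestamp, level, pool, node, and message."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    fields = match.groupdict()
    fields["ts"] = datetime.strptime(fields["ts"], TS_FORMAT)
    return fields


events = [parse_line(line) for line in LOG_LINES]
first = min(event["ts"] for event in events)
last = max(event["ts"] for event in events)

# Assumption: ganglia's "1330" means 13:30 on 23 Jun 2012, -0700.
load_spike = datetime.strptime("23/Jun/2012:13:30:00 -0700", TS_FORMAT)
gap = load_spike - last

print(f"{len(events)} failures for {events[0]['node']} between {first} and {last}")
print(f"last monitor failure precedes the load spike by {gap}")
```

Under that timezone assumption, the last monitor failure (12:28:15) comes a little over an hour before the 13:30 load spike, which matches the description in this comment.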
Assignee
Updated•13 years ago
Summary: hg http servers hanging, blocks 13.0.2 chemspill → Possible issue with hg http servers blocking 13.0.2 chemspill
Assignee
Comment 5•13 years ago
My theory here :
hgweb1 probably had legit traffic and was doing work, then got hit by multiple l10n repo requests from the builders and was slower than usual to respond. Since the builders seem to be hitting hg.m.o, they're going to be competing with regular hg http traffic, but that shouldn't be causing much of an issue.
I've now kicked httpd on hgweb1 and added it back to the pool.
hwine confirms on IRC that all his builders are good and have finished what they're doing.
Going to close this bug out.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 6•13 years ago
(In reply to Shyam Mani [:fox2mike] from comment #5)
> My theory here :
>
> hgweb1 probably had legit traffic, was doing work..and then got hit by
> multiple l10n repo requests from the builders and was slower than usual to
> respond. Since the builders seem to be hitting hg.m.o, they're going to be
> contesting with regular hg http traffic, but that shouldn't be causing much
> issue.
>
> I've now kicked httpd on hgweb1 and added it back to the pool.
Possibly related, see https://bugzilla.mozilla.org/show_bug.cgi?id=767762#c3
> hwine confirms on IRC that all his builders are good and have finished what
> they're doing.
>
> Going to close this bug out.
Updated•11 years ago
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services