Closed Bug 767739 Opened 13 years ago Closed 13 years ago

Possible issue with hg http servers blocking 13.0.2 chemspill

Categories

(Developer Services :: General, task)

Type: task
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hwine, Assigned: fox2mike)

Attachments

(2 files)

build-vpn servers hanging after connect.
Assignee: server-ops-devservices → mburns
Summary: hg http servers not hanging, blocks 13.0.2 chemspill → hg http servers hanging, blocks 13.0.2 chemspill
Assignee: mburns → shyam
Can't see anything obviously wrong, but ganglia says the load on hgweb1 shot up at 1330 (compared to before). I've disabled hgweb1 in the pool for now. No idea why releng builders are NOT using their exclusive 3 machine pool of hg-internal.dmz.scl3.mozilla.com, but that's for another bug.
Corresponding to load, there's a spike in network traffic as well. This could have been legitimate traffic too, fwiw. This is also why issues like this are easier to debug if releng builders use machines assigned to them (so we know no public traffic hits those machines).
Zeus also shows monitor failures for hgweb1, but about an hour before ganglia shows any issues. I'm inclined to disregard those as not related. The entries in the log:
23/Jun/2012:12:28:15 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: Node 10.22.74.32 has failed - A monitor has detected a failure
23/Jun/2012:12:22:17 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: Node 10.22.74.32 has failed - A monitor has detected a failure
23/Jun/2012:12:22:17 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: Node 10.22.74.32 has failed - A monitor has detected a failure
23/Jun/2012:12:22:17 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: Node 10.22.74.32 has failed - A monitor has detected a failure
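For anyone wanting to check the "about an hour before" timing rather than eyeball it, here is a rough Python sketch that parses the entries quoted above and measures how long before the ganglia load spike each one occurred. The regex is inferred only from the four lines in this comment, not from any Zeus log format documentation, and it assumes the 1330 spike time is in the same -0700 offset as the log timestamps. The names `LOG_LINE`, `parse_entry` and `SPIKE` are made up for this example.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the quoted entries.
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Pattern inferred from the quoted entries only (an assumption, not a spec).
LOG_LINE = re.compile(
    r"(?P<ts>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"(?P<level>\w+) "
    r"Pool (?P<pool>[\w-]+), "
    r"Node (?P<node>[\d.]+:\d+): "
    r"(?P<msg>.*)"
)

def parse_entry(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], TS_FORMAT)
    return entry

FAILURE = "Node 10.22.74.32 has failed - A monitor has detected a failure"
SAMPLE = [
    "23/Jun/2012:12:28:15 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: " + FAILURE,
    "23/Jun/2012:12:22:17 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: " + FAILURE,
    "23/Jun/2012:12:22:17 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: " + FAILURE,
    "23/Jun/2012:12:22:17 -0700 SERIOUS Pool hgweb-http, Node 10.22.74.32:80: " + FAILURE,
]

# Load spike time reported from ganglia; the timezone is assumed.
SPIKE = datetime.strptime("23/Jun/2012:13:30:00 -0700", TS_FORMAT)

entries = [e for e in map(parse_entry, SAMPLE) if e is not None]

# Minutes between each monitor failure and the load spike.
gaps = [(SPIKE - e["ts"]).total_seconds() / 60 for e in entries]

for e, gap in zip(entries, gaps):
    print(f"{e['node']} ({e['pool']}): {gap:.1f} min before load spike")
```

The same approach should work on a longer log excerpt, provided the other lines follow the layout seen here; lines that do not match are skipped rather than raising.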
Summary: hg http servers hanging, blocks 13.0.2 chemspill → Possible issue with hg http servers blocking 13.0.2 chemspill
My theory here : hgweb1 probably had legit traffic, was doing work..and then got hit by multiple l10n repo requests from the builders and was slower than usual to respond. Since the builders seem to be hitting hg.m.o, they're going to be contesting with regular hg http traffic, but that shouldn't be causing much issue. I've now kicked httpd on hgweb1 and added it back to the pool. hwine confirms on IRC that all his builders are good and have finished what they're doing. Going to close this bug out.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 767745
(In reply to Shyam Mani [:fox2mike] from comment #5)
> My theory here :
>
> hgweb1 probably had legit traffic, was doing work..and then got hit by
> multiple l10n repo requests from the builders and was slower than usual to
> respond. Since the builders seem to be hitting hg.m.o, they're going to be
> contesting with regular hg http traffic, but that shouldn't be causing much
> issue.
>
> I've now kicked httpd on hgweb1 and added it back to the pool.

Possibly related, see https://bugzilla.mozilla.org/show_bug.cgi?id=767762#c3

> hwine confirms on IRC that all his builders are good and have finished what
> they're doing.
>
> Going to close this bug out.
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services