Closed Bug 571305 Opened 14 years ago Closed 14 years ago

hg is burping

Categories

(Infrastructure & Operations :: RelOps: General, task)

x86
macOS
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: chizu)

Details

I had a few builds fail at different times because they could not clone from hg, with:
> abort: premature EOF reading chunk (got 9239 bytes, expected 24241)

gavin posted this on #developers
[10 12:02:22] <nagios> [74] dm-hg01:http - hg.mozilla.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds

It also takes a long time to load http://hg.mozilla.org/try and TBPL for try is unusable.
http://tests.themasta.com/tinderboxpushlog/?tree=MozillaTry
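
For anyone who wants to confirm the timeout by hand rather than wait for nagios, here is a minimal sketch of an HTTP probe with a 10-second timeout. It is not the actual nagios plugin: the function name `check_http`, the injectable `fetch` parameter, and the OK message format are assumptions for illustration; only the 10-second limit and the wording of the timeout message come from the alert above.

```python
import socket
import time
import urllib.error
import urllib.request

# Exit codes follow the usual Nagios plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2


def check_http(url, timeout=10.0, fetch=urllib.request.urlopen):
    """Probe url and return (exit_code, message).

    fetch is injectable (hypothetical parameter) so the check can be
    exercised without touching the network.
    """
    start = time.monotonic()
    try:
        with fetch(url, timeout=timeout) as response:
            response.read(1)  # make sure the server actually sends a body
    except socket.timeout:
        return CRITICAL, "CRITICAL - Socket timeout after %d seconds" % timeout
    except urllib.error.URLError as exc:
        # urllib wraps connect-phase timeouts in URLError.
        if isinstance(exc.reason, socket.timeout):
            return CRITICAL, "CRITICAL - Socket timeout after %d seconds" % timeout
        return CRITICAL, "CRITICAL - %s" % exc.reason
    except OSError as exc:
        return CRITICAL, "CRITICAL - %s" % exc
    elapsed = time.monotonic() - start
    return OK, "OK - response in %.2f seconds" % elapsed
```

Calling `check_http("http://hg.mozilla.org/try")` returns a status tuple that can be printed or fed into whatever alerting is at hand.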

I am raising the severity since we will have to close try, which these days sees heavier use by developers than even mozilla-central.
Severity: critical → blocker
Assignee: server-ops → aravind
The problems in the last hour have been the dm-vcview hosts swapping. Restarting httpd fixes it for a while but it continues to leak memory.
For context, the try server was reset this morning in bug 570265.
The only deviation from the documented steps was setting the group to scm_level_1 instead of hg_mozilla.

Tinderbox.m.o was also restarted, in case that is related somehow.

For reference:
https://wiki.mozilla.org/index.php?title=ReleaseEngineering%3AResetTryServer&diff=229657&oldid=208141
Aravind and I noticed that dm-vcview04 was having network problems and lacked the nagios monitors that would alert us when it becomes unavailable. It handles a lot of the traffic, so the extra load was probably killing the other three.

I think this is the known bnx2x driver problem we've had on other servers. Next step is to update that and get dm-vcview04 into nagios.
Assignee: aravind → thardcastle
It's in nagios now and the driver has been upgraded. If dm-vcview04 has any further problems they'll page on-call, so we'll watch for that.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Nothing has changed as far as I can see. For example, tbpl on try can't be loaded, and the actual hg repository is not accessible either.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Ehsan, I think that is much more likely to be fallout from bug 570265 than one of the machines behind hg.m.o being broken.
(In reply to comment #6)
> Ehsan, I think that is much more likely to be fallout from bug 570265 than one
> of the machines behind hg.m.o being broken.

It might be.  I'm not talking about technical details.  But this is a critical problem, which is blocking developers.  I for one have three sets of patches right now that I need try server results for, and I'm blocked on them.  I'm sure that other developers have the same problem as well.

If I should file another bug, I will gladly.  If there is anything else that I need to do, please let me know!
Ok, separating the issues. Resolving this as fixed on the basis of the health of dm-vcview04. I'll reopen bug 570265 and blockerize it for the try issue.
Status: REOPENED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Summary: hg is burping and try is not feeling well → hg is burping
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations