Status

mozilla.org Graveyard
Server Operations
RESOLVED FIXED
9 years ago
3 years ago

People

(Reporter: bhearsum, Assigned: chizu)

Description

9 years ago
The try repo has once again grown more heads than it can handle, so we need to reset it. We're looking to do this during the downtime on the morning of August 20th.

Comment 1

9 years ago
Sorry to make this critical, but the try server seems to be unusable until this is fixed.

  TERM=linux
  USER=cltbld
  _=/tools/buildbot/bin/buildbot
 closing stdin
 using PTY: False
requesting all changes
abort: HTTP Error 414: Request-URI Too Large
elapsedTime=0.617869
program finished with exit code 255
=== Output ended ===
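
For readers wondering how extra heads could lead to a 414, here is a rough back-of-the-envelope sketch. It assumes the client lists every head's 40-character node ID in the request URI, separated by a single character, which is a guess consistent with the error above but not confirmed in this bug. The function names, the 100-byte base length, and the 8190-byte limit are illustrative assumptions, not measured values for hg.m.o.

```python
# Sketch only: estimates request-URI growth as a repo gains heads.
# ASSUMPTION: each head is sent in the URI as a 40-char hex node ID,
# joined by one separator character. Not confirmed for hg.m.o.

NODE_HEX_LEN = 40  # a Mercurial changeset ID is a 40-character SHA-1 hex string


def uri_length(num_heads, base_len):
    """Estimated URI length for a request that lists num_heads head IDs."""
    if num_heads <= 0:
        return base_len
    separators = num_heads - 1
    return base_len + num_heads * NODE_HEX_LEN + separators


def max_heads(limit, base_len):
    """Largest number of heads whose estimated URI still fits within limit."""
    budget = limit - base_len
    if budget < NODE_HEX_LEN:
        return 0
    # n heads need 40*n + (n - 1) characters, so n <= (budget + 1) / 41.
    return (budget + 1) // (NODE_HEX_LEN + 1)


if __name__ == "__main__":
    # Hypothetical numbers: 100 bytes of fixed URI, 8190-byte server limit.
    print(max_heads(limit=8190, base_len=100))
```

Under those assumed numbers, a couple of hundred heads would be enough to exceed the limit, which is why resetting the repo is expected to make this particular error go away.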
Severity: normal → critical
I'm downgrading this back to normal severity because the evidence is that the 414 error is another symptom of hg.m.o not working properly (bug 511258). There are only four of these errors across all the try builds in the last day or so, compared with many more failed clones in the style of bug 511258 and lots of successful clones, so it's not a systematic problem. That's not to say that cleaning out all the heads isn't necessary, just that the extra heads aren't the root cause.
Severity: critical → normal
One theory about the underlying hg problem is that an intensive server process can crowd out other server processes for memory, so I think it might be worth stripping the extra heads here.  I would imagine that the extra heads add significantly to the working set of the server process, so while we get more RAM installed in the hg hosts (someone is on the way to the colo, I believe!), could we strip these heads as well?

Comment 4

9 years ago
5 out of 7 try server columns failed for me last time I pushed, so this isn't just sporadic.
We went ahead and tried this, but it hasn't been successful. After triggering two try server runs (16 builds in total), there were 8 successful clones and 8 transaction aborts on premature EOFs. Given that's 50/50, I think we should start hammering on specific proxy+endpoint combinations until we find the culprit. Let's do that in bug 511258.
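
One way to organize that hammering, as a minimal sketch: record each clone attempt as a (proxy, endpoint, succeeded) tuple and rank the combinations by failure rate. The function names, the 0.5 threshold, and the sample host names below are all hypothetical; nothing here reflects the actual hg.m.o topology.

```python
from collections import Counter


def failure_rates(results):
    """Map each (proxy, endpoint) pair to its fraction of failed clones.

    results is an iterable of (proxy, endpoint, ok) tuples, where ok is
    True for a successful clone and False for an aborted one.
    """
    totals = Counter()
    failures = Counter()
    for proxy, endpoint, ok in results:
        key = (proxy, endpoint)
        totals[key] += 1
        if not ok:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


def suspects(results, threshold=0.5):
    """Return the pairs whose failure rate is at or above threshold, sorted."""
    rates = failure_rates(results)
    return sorted(key for key, rate in rates.items() if rate >= threshold)


if __name__ == "__main__":
    # Hypothetical sample data; the real proxy and endpoint names are unknown.
    sample = [
        ("proxy-a", "backend-1", True),
        ("proxy-a", "backend-1", True),
        ("proxy-a", "backend-2", False),
        ("proxy-a", "backend-2", False),
        ("proxy-b", "backend-1", True),
        ("proxy-b", "backend-2", False),
    ]
    print(suspects(sample))
```

If the failures cluster on one endpoint regardless of proxy (as in the made-up sample), that points at the endpoint; if they cluster on one proxy, the proxy is the more likely culprit.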

On the Releng side, I had to restart the master on sm-try-master, since a reconfig didn't convince the HgPoller to forget about the most recent revision from the old repo (à la bug 500246).
Assignee: bhearsum → thardcastle
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Component: Release Engineering → Server Operations
OS: Mac OS X → All
QA Contact: release → mrz
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard