Closed Bug 491092 Opened 15 years ago Closed 15 years ago

Reset hg repo used by TryServer

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Assigned: aravind)

Details

Can you please reset the hg repo that is used by TryServer? 

The hg repo used by TryServer has been in use since TryServer came online last year, and it now has too many heads for pushloghtml to handle (currently ~962 heads).
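For readers less familiar with Mercurial: a head is a changeset that no other changeset has as a parent. Try pushes are built on whatever mozilla-central revision the developer had and are never merged, so each push tends to leave another head behind. The following is a minimal sketch of that definition. It uses a hypothetical in-memory parent map, not a real repository (on a real repo, `hg heads` lists them).

```python
from typing import Dict, List, Set


def find_heads(parents: Dict[str, List[str]]) -> Set[str]:
    """Return the changesets that no other changeset lists as a parent.

    `parents` maps each changeset id to the ids of its parents
    (Mercurial changesets have zero, one, or two parents).
    """
    has_child: Set[str] = set()
    for node_parents in parents.values():
        has_child.update(node_parents)
    return {node for node in parents if node not in has_child}


# Hypothetical history: "base" stands in for a mozilla-central revision, and
# two unrelated try pushes each build on it without any merge.
history = {
    "base": [],
    "push1": ["base"],
    "push2": ["base"],
}
print(sorted(find_heads(history)))  # ['push1', 'push2']
```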

This will require a brief reboot/restart of the TryServer, so it will need to be announced as part of the downtime notice. If you let us know when you plan to do this, we'll start the blog/newsgroup posts.

(As an aside, going forward, we should do this on some recurring basis.)
(In reply to comment #0)
> (As an aside, going forward, we should do this on some recurring basis.)

How do people feel about planning time for this on the first Monday of every month? Mondays are the best time (excluding weekends) because we get the fewest pushes then.
(In reply to comment #1)
> (In reply to comment #0)
> > (As an aside, going forward, we should do this on some recurring basis.)
> 
> How do people feel about planning time for this on the first Monday of every
> month? Mondays are the best time (excluding weekends) because we get the fewest
> pushes then.

Once a month should be plenty. However, instead of Monday, I'd prefer Tuesday evening, during the regular Tuesday evening downtime. If we can get the TryServer master to boot up in a working state, we don't even need to be around for it, right?
(In reply to comment #2)
> (In reply to comment #1)
> > (In reply to comment #0)
> > > (As an aside, going forward, we should do this on some recurring basis.)
> > 
> > How do people feel about planning time for this on the first Monday of every
> > month? Mondays are the best time (excluding weekends) because we get the fewest
> > pushes then.
> 
> Once a month should be plenty. However, instead of Monday, I'd prefer Tuesday
> evening, during the regular Tuesday evening downtime? 

This would work as long as we don't do it until 10pm or so. Our longest builds (mac/win32 unittest) take 2-2.5h to run, so I'd want everything pushed before the downtime starts to have enough time to complete. 10pm is 3h after the downtime starts, so that WFM.
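The timing argument reduces to simple arithmetic. The sketch below assumes a 7pm downtime start and uses the upper end of the quoted 2-2.5h build time. It also assumes a job starts as soon as it is pushed, ignoring any queueing. The date is a placeholder.

```python
from datetime import datetime, timedelta


def earliest_safe_reset(downtime_start: datetime, longest_build: timedelta) -> datetime:
    """Earliest time by which a job pushed just before downtime_start should have finished."""
    return downtime_start + longest_build


# Assumed values: downtime starting at 7pm on a (hypothetical) Tuesday, and the
# upper end of the 2-2.5h range quoted for the mac/win32 unittest builds.
start = datetime(2009, 5, 5, 19, 0)
longest = timedelta(hours=2, minutes=30)
safe = earliest_safe_reset(start, longest)
print(safe.strftime("%H:%M"))  # 21:30

reset_at = datetime(2009, 5, 5, 22, 0)
print(reset_at - safe)  # 0:30:00 of margin at 10pm
```

A 10pm reset therefore leaves about half an hour of slack beyond the worst-case build.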

> If we can get TryServer
> master to boot up in a working state, we don't even need to be around for it,
> right?

It already does AFAIK. And yeah, given that, we wouldn't have to be around.

On a side note, I'm currently running some numbers on how long it will take the try repo to reach the point where HTTP clones fail; more to come there.
Some additional data:
* HTTP cloning fails once a repository has 96 or more heads.
* During March we had 216 pushes to try, during April we had 211.

We definitely need to fix the try server code so it doesn't break at 96 heads, but we might want to consider resetting the repository on a weekly basis, too.
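The two data points above give a rough estimate of how quickly a freshly reset repo would hit the limit. This sketch assumes every push adds exactly one new head and that pushes are spread evenly over the month. Both are simplifications.

```python
import math


def days_until_head_limit(pushes: int, days: int, head_limit: int = 96) -> int:
    """Days of pushes needed to reach head_limit, assuming one new head per push
    and pushes spread evenly over the period."""
    return math.ceil(head_limit * days / pushes)


print(days_until_head_limit(216, 31))  # March: 14
print(days_until_head_limit(211, 30))  # April: 14
```

At roughly two weeks to the limit, a weekly reset would stay comfortably below it, while a monthly one would not.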
Is it possible to identify old branches and 'hg strip' them?
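One way to pick candidates is sketched below. Keep in mind that `hg strip REV` removes REV together with all of its descendants. To drop an entire multi-changeset push, you would therefore strip the push's first changeset rather than its head, since stripping only the head could leave its parent behind as a head. The record format and the `pushes_to_strip` helper are hypothetical. The sketch also assumes each push is an independent branch off mozilla-central, and it only prints the commands rather than running them.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def pushes_to_strip(pushes: List[Tuple[str, datetime]], now: datetime,
                    max_age: timedelta) -> List[str]:
    """Return the first changeset of every push older than max_age.

    Assumes each push is an independent branch off mozilla-central, so
    stripping its first changeset removes the whole push and nothing else.
    """
    cutoff = now - max_age
    return [first_node for first_node, pushed_at in pushes if pushed_at < cutoff]


# Hypothetical pushlog-style records: (first changeset of the push, push time).
pushes = [
    ("a1b2c3d4e5f6", datetime(2009, 4, 1, 14, 5)),
    ("0f1e2d3c4b5a", datetime(2009, 5, 4, 11, 30)),
]
for node in pushes_to_strip(pushes, now=datetime(2009, 5, 6), max_age=timedelta(days=14)):
    print(f"hg strip {node}")  # hg strip a1b2c3d4e5f6
```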
Assignee: server-ops → aravind
Seems to me that it would be simpler to just delete it and re-clone it from mozilla-central.
Follow-up from the group meeting and a later chat with Aravind:

1) One idea today was to reconfig the server to disable the hg poller, delete/replace the hg repo, and then reconfig the server to re-enable the hg poller. The hope was that this would keep pending try jobs queued and so make the downtime "invisible" to developers. However, Aravind suggested this would still lose pending jobs, because they wouldn't apply cleanly to the newly recreated repo, so it's not worth the extra effort.

2) If we still need a downtime for this, can we power the server down completely and move it off the eql-logic disks at the same time?

3) There was some discussion about when would be a good/bad time to do this. To start the discussion, what about doing this during the usual IT downtime on Thursday night?



(As an aside, I know we're going into some detail here, but it feels right, since we're doing this for the first time and are looking at doing it on a recurring basis. Taking the time to figure it out properly is good!)
(In reply to comment #7)
> 3) There was some discussion on when is a good / bad time to do this. To start
> discussions, what about doing this during the usual IT downtime Thursday night? 
> 

On Thursday, April 30th we had the following pushes at the following times: 17:38:31, 18:38:59, 20:37:21, 22:48:44.
On Thursday, April 23rd: 21:08:27, 22:12:28
(We don't have data for anything earlier)

Compared with other times of the day, this is a very light load. Our biggest try load appears to be 10am - 4pm pacific time.

Given all of that, I think this is an ideal time. What do others think?
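To make the comparison concrete, the following small sketch counts how many of the pushes listed above fall after an assumed 7pm downtime start (the start time later used in the downtime notice). The counting helper is illustrative only.

```python
from datetime import time
from typing import List

# Push times quoted in the comment above.
pushes = {
    "Thu Apr 30": ["17:38:31", "18:38:59", "20:37:21", "22:48:44"],
    "Thu Apr 23": ["21:08:27", "22:12:28"],
}


def pushes_after(times: List[str], start: time) -> List[str]:
    """Return the push times at or after `start`."""
    return [t for t in times if time.fromisoformat(t) >= start]


downtime_start = time(19, 0)  # assumed 7pm start
for day, times in pushes.items():
    print(day, len(pushes_after(times, downtime_start)))
# Thu Apr 30 2
# Thu Apr 23 2
```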
Flags: needs-downtime+
Do we care about saving the old contents of the repo? (I'm not sure.)
(In reply to comment #8)
> (In reply to comment #7)
> > 3) There was some discussion on when is a good / bad time to do this. To start
> > discussions, what about doing this during the usual IT downtime Thursday night? 
> > 
> 
> On Thursday, April 30th we had the following pushes at the following times:
> 17:38:31, 18:38:59, 20:37:21, 22:48:44.
> On Thursday, April 23rd: 21:08:27, 22:12:28
> (We don't have data for anything earlier)
> 
> Compared with other times of the day, this is a very small amount. Our biggest
> try load appears to be 10am - 4pm pacific time.
> 
> Given all of that, I think this is an ideal time. What do others think?

Downtime notice sent to the newsgroups for 7pm tomorrow (Thursday the 7th).
(In reply to comment #9)
> Do we care about saving the old contents of the repo?  (I'm not sure.)

We're explicitly not saving the old contents of the repo. It seemed to me that if patches were good, they were landed in "real" repos, and if patches were bad, they were still being worked on in the developer's repo. What's on TryServer is totally transient.

If that's not what developers expect, let me know, and I can post to the newsgroups to clarify.
I wonder whether, when a developer pushes from an old repository without "-r tip", they may accidentally push several heads, making this problem return faster.
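Here is a simplified model of this concern. Without `-r`, `hg push` sends every local changeset the remote lacks, whereas `hg push -r tip` sends only tip and its ancestors. The sketch ignores Mercurial's real discovery protocol and its checks on creating new remote heads. The graph and helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


def ancestors(parents: Dict[str, List[str]], revs: Iterable[str]) -> Set[str]:
    """All changesets reachable from revs (including revs themselves)."""
    seen: Set[str] = set()
    stack = list(revs)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def outgoing(local: Dict[str, List[str]], remote: Set[str],
             revs: Optional[List[str]] = None) -> Set[str]:
    """Changesets a push would send: ancestors of revs (everything local if
    revs is None) that the remote does not have yet."""
    if revs is None:
        revs = list(local)
    return ancestors(local, revs) - remote


# Hypothetical developer clone with a stale head left over from old work.
local = {
    "base": [],
    "old_experiment": ["base"],
    "tip_patch": ["base"],
}
remote = {"base"}
print(sorted(outgoing(local, remote)))                 # ['old_experiment', 'tip_patch']
print(sorted(outgoing(local, remote, ["tip_patch"])))  # ['tip_patch']
```

In this model, the plain push lands two new heads on try, while the `-r tip` push lands one.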
I'd be surprised if anyone keeps multiple heads in an m-c clone, since you have to explicitly remember not to push the old ones to m-c. But I could be wrong.
I do, because there's a head for every saved MQ state (when you do hg qsave -e -c to merge a queue forward).

FWIW, I have linked from bug comments to tryserver changesets on a couple of occasions, but nothing that I think is critical to keep around.

But I really hope we just fix the tools so that a many-headed monster is ok, and don't continue this try-repo-pruning thing forever.
I talked about this a bit with one of the other hg guys. We came up with the idea of an extension that replaces the hgweb unbundle protocol command so that pushed bundles are saved independently; each bundle could then be stored/transferred separately, so that you don't end up with such a many-headed repo.

Some things are just always going to be slow with this many heads, and there's not really any way around that.
(In reply to comment #5)
> Is it possible to identify old branches and 'hg strip' them?

(This would look like a "simple", "no downtime" solution. :-)
Locally, I used to create heads and then strip them, before I switched to using mq.)
Wiped the try repo clean and cloned from mozilla-central.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard