491092 - Reset hg repo used by TryServer

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Reporter

Description

•

15 years ago

Can you please reset the hg repo that is used by TryServer? 

The hg repo being used by TryServer has been used since TryServer came online last year, and now has too many heads for pushloghtml to be able to handle (currently ~962 heads).

This will require a brief reboot/restart of the TryServer, so will need to be announced as part of the downtime notice. If you let us know when you plan to do this, we'll start blogs/newsgroup posts.

(As an aside, going forward, we should do this on some recurring basis.)

bhearsum@mozilla.com (:bhearsum)

Comment 1

•

15 years ago

(In reply to comment #0)
> (As an aside, going forward, we should do this on some recurring basis.)

How do people feel about planning time for this on the first Monday of every month? Mondays are the best time (excluding weekends) because we get the fewest pushes then.

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Reporter

Comment 2

•

15 years ago

(In reply to comment #1)
> (In reply to comment #0)
> > (As an aside, going forward, we should do this on some recurring basis.)
> 
> How do people feel about planning time for this on the first Monday of every
> month? Mondays are the best time (excluding weekends) because we get the fewest
> pushes then.

Once a month should be plenty. However, instead of Monday, I'd prefer Tuesday evening, during the regular Tuesday evening downtime? If we can get TryServer master to boot up in a working state, we don't even need to be around for it, right?

bhearsum@mozilla.com (:bhearsum)

Comment 3

•

15 years ago

(In reply to comment #2)
> (In reply to comment #1)
> > (In reply to comment #0)
> > > (As an aside, going forward, we should do this on some recurring basis.)
> > 
> > How do people feel about planning time for this on the first Monday of every
> > month? Mondays are the best time (excluding weekends) because we get the fewest
> > pushes then.
> 
> Once a month should be plenty. However, instead of Monday, I'd prefer Tuesday
> evening, during the regular Tuesday evening downtime? 

This would work if we make sure we don't do it until 10pm or so. Our longest builds (mac/win32 unittest) take 2-2.5h to run, so I'd want to make sure that everything pushed before the downtime starts has enough time to complete. 10pm should be 3h after the downtime starts, so that WFM.
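
To spell out the timing, here is a minimal Python sketch. It assumes the downtime starts at 7pm (implied by 10pm being 3h after it starts) and takes the upper figure of 2.5h for the longest build. The calendar date and the function name are placeholders of mine, only there to make the arithmetic runnable.

```python
from datetime import datetime, timedelta


def latest_build_finish(downtime_start, longest_build_hours):
    """Time by which a build started right at downtime_start should be done."""
    return downtime_start + timedelta(hours=longest_build_hours)


# Placeholder date; only the time of day matters for this check.
downtime_start = datetime(2009, 1, 6, 19, 0)   # assumed 7pm start
proposed_reset = datetime(2009, 1, 6, 22, 0)   # proposed 10pm reset

finish = latest_build_finish(downtime_start, 2.5)
slack = proposed_reset - finish

print("Longest build finishes by:", finish.strftime("%H:%M"))
print("Slack before the reset:", slack)
```

With these numbers the reset happens after the slowest build should have finished, so anything pushed before the downtime begins gets a chance to complete.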

> If we can get TryServer
> master to boot up in a working state, we don't even need to be around for it,
> right?

It already does AFAIK. And yeah, given that, we wouldn't have to be around.

On a side note, I'm currently trying to run some numbers on how long it will take the try repo to get to a point where http clones will fail - more to come there.

bhearsum@mozilla.com (:bhearsum)

Comment 4

•

15 years ago

Some additional data:
* HTTP cloning fails after a repository has 96 heads or more.
* During March we had 216 pushes to try, during April we had 211.

We definitely need to fix the try server code so it doesn't break at 96 heads, but we might want to consider resetting the repository on a weekly basis, too.
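
As a rough check on that, here is a back-of-the-envelope sketch in Python. It assumes (this is an assumption, not something measured) that every push to try adds exactly one new head and that a freshly reset repo starts with a single head. The function name and parameters are only illustrative.

```python
HTTP_CLONE_HEAD_LIMIT = 96  # HTTP cloning fails at this many heads (from the data above)


def days_until_limit(pushes_in_month, days_in_month,
                     head_limit=HTTP_CLONE_HEAD_LIMIT, starting_heads=1):
    """Estimate the days until the repo reaches head_limit heads.

    Assumes each push adds exactly one head, which is a simplification.
    """
    heads_per_day = pushes_in_month / days_in_month
    return (head_limit - starting_heads) / heads_per_day


# Push counts from the data above; 31 and 30 are the lengths of March and April.
for month, pushes, days in [("March", 216, 31), ("April", 211, 30)]:
    estimate = days_until_limit(pushes, days)
    print(f"{month}: roughly {estimate:.1f} days until {HTTP_CLONE_HEAD_LIMIT} heads")
```

Under those assumptions both months come out at a little under two weeks, so a monthly reset on its own would not keep HTTP clones working.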

Chris AtLee [:catlee]

Comment 5

•

15 years ago

Is it possible to identify old branches and 'hg strip' them?
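
A minimal sketch of the "identify old heads" half, in Python. It assumes the head list comes from something like `hg heads --template "{node} {date|hgdate}\n"`, which prints a node, a Unix timestamp and a timezone offset per line. The function names, the age threshold and the sample hashes below are made up for illustration. The selected nodes would then be candidates for `hg strip` (which needs the mq extension enabled). Note that stripping only the head changeset would not remove the rest of a multi-changeset push.

```python
def parse_heads(output):
    """Parse lines of '<node> <unix timestamp> <tz offset>' into (node, timestamp) pairs."""
    heads = []
    for line in output.splitlines():
        if not line.strip():
            continue
        node, timestamp, _offset = line.split()
        heads.append((node, int(timestamp)))
    return heads


def select_stale_heads(heads, now, max_age_days, protected=()):
    """Return nodes of heads older than max_age_days, skipping protected nodes."""
    cutoff = now - max_age_days * 86400
    return [node for node, ts in heads if ts < cutoff and node not in protected]


# Made-up sample output; the hashes are placeholders, not real changesets.
SAMPLE_OUTPUT = """\
aaaaaaaaaaaa 1238000000 25200
bbbbbbbbbbbb 1241500000 25200
cccccccccccc 1241600000 25200
"""

heads = parse_heads(SAMPLE_OUTPUT)
# 'protected' would hold heads that must survive, e.g. the mozilla-central tip.
stale = select_stale_heads(heads, now=1241700000, max_age_days=7,
                           protected={"cccccccccccc"})
print("Candidates for stripping:", stale)
```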

Aravind Gottipati [:aravind]

Assignee

Updated

•

15 years ago

Assignee: server-ops → aravind

Benjamin Smedberg

Comment 6

•

15 years ago

Seems to me that it would be simpler to just delete it and re-clone it from mozilla-central.

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Reporter

Comment 7

•

15 years ago

Follow-up from the group meeting and a later chat with Aravind:

1) One idea today was to reconfig the server to disable the hg poller, delete/replace the hg repo, and then reconfig the server to re-enable the hg poller. The hope was that this would allow pending try jobs to remain queued up and so make the downtime "invisible" to developers. However, Aravind suggested this would still lose pending jobs, because they wouldn't apply cleanly to the newly recreated repo, so it is not worth the extra effort.

2) If we still need to do a downtime for this, can we power it down totally and move it off eql-logic disks at the same time?

3) There was some discussion on when is a good / bad time to do this. To start discussions, what about doing this during the usual IT downtime Thursday night? 



(As an aside, I know we're going into some detail here, but it feels right, as we are doing this for the first time and are looking at doing it on a recurring basis. Taking the time to figure it out properly is good!)

bhearsum@mozilla.com (:bhearsum)

Comment 8

•

15 years ago

(In reply to comment #7)
> 3) There was some discussion on when is a good / bad time to do this. To start
> discussions, what about doing this during the usual IT downtime Thursday night? 
> 

On Thursday, April 30th we had the following pushes at the following times: 17:38:31, 18:38:59, 20:37:21, 22:48:44.
On Thursday, April 23rd: 21:08:27, 22:12:28
(We don't have data for anything earlier)

Compared with other times of the day, this is a very small amount. Our biggest try load appears to be 10am - 4pm pacific time.

Given all of that, I think this is an ideal time. What do others think?
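
For anyone who wants to redo this check against a different window, here is a small Python sketch over the push times listed above. The 7pm cutoff is an assumed downtime start, and the function name is just illustrative.

```python
from datetime import time

# Push times copied from the two Thursdays listed above.
PUSHES = {
    "April 30": ["17:38:31", "18:38:59", "20:37:21", "22:48:44"],
    "April 23": ["21:08:27", "22:12:28"],
}


def pushes_at_or_after(times, cutoff):
    """Return the push times that fall at or after the cutoff time of day."""
    return [t for t in times if time.fromisoformat(t) >= cutoff]


cutoff = time(19, 0)  # assumed downtime start
for day, times in PUSHES.items():
    affected = pushes_at_or_after(times, cutoff)
    print(f"{day}: {len(affected)} of {len(times)} pushes at or after {cutoff}")
```

Pushes that land before the cutoff still matter if their builds have not finished by the time the repo is reset, so the cutoff is only part of the picture.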

Aravind Gottipati [:aravind]

Assignee

Updated

•

15 years ago

Flags: needs-downtime+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 9

•

15 years ago

Do we care about saving the old contents of the repo?  (I'm not sure.)

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Reporter

Comment 10

•

15 years ago

(In reply to comment #8)
> (In reply to comment #7)
> > 3) There was some discussion on when is a good / bad time to do this. To start
> > discussions, what about doing this during the usual IT downtime Thursday night? 
> > 
> 
> On Thursday, April 30th we had the following pushes at the following times:
> 17:38:31, 18:38:59, 20:37:21, 22:48:44.
> On Thursday, April 23rd: 21:08:27, 22:12:28
> (We don't have data for anything earlier)
> 
> Compared with other times of the day, this is a very small amount. Our biggest
> try load appears to be 10am - 4pm pacific time.
> 
> Given all of that, I think this is an ideal time. What do others think?

Downtime notice sent to newsgroups for 7pm tomorrow (thurs 7th).

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Reporter

Comment 11

•

15 years ago

(In reply to comment #9)
> Do we care about saving the old contents of the repo?  (I'm not sure.)

We're explicitly not saving the old contents of the repo. It seemed to me that if patches were good, they were landed in "real" repos, and that if patches were bad, they were still being worked on in the developer's repo. What's on TryServer is totally transient.

If that's not what developers expect, let me know, and I can post to newsgroups to clarify.

Karl Tomlinson (:karlt)

Comment 12

•

15 years ago

I wonder whether, when a developer pushes from an old repository without "-r tip", they may accidentally push several heads, making this problem return faster.

(not currently active) Ted Mielczarek

Comment 13

•

15 years ago

I'd be surprised if anyone keeps multiple heads in a m-c clone, since you have to explicitly remember not to push the old ones to m-c. But I could be wrong.

Benjamin Smedberg

Comment 14

•

15 years ago

I do, because there's a head for every saved MQ state (when you do hg qsave -e -c to merge a queue forward).

FWIW, I have linked from bug comments to tryserver changesets on a couple of occasions, but nothing that I think is critical to keep around.

But I really hope we just fix the tools so that a many-headed monster is ok, and don't continue this try-repo-pruning thing forever.

Dirkjan Ochtman (:djc)

Comment 15

•

15 years ago

I talked about this a bit with one of the other hg guys. We came up with the idea of an extension that swaps out the hgweb unbundle protocol command so that the incoming bundles are saved independently; each bundle could then be saved/transferred separately, so that you don't end up with such a many-headed repo.

Some things are just always going to be slow with this many heads, and there's not really any way around that.

Serge Gautherie (:sgautherie)

Comment 16

•

15 years ago

(In reply to comment #5)
> Is it possible to identify old branches and 'hg strip' them?

(This would look like a "simple" and "no downtime" solution :-)
Locally, I used to create heads then strip them, before I switched to using mq...)

Aravind Gottipati [:aravind]

Assignee

Comment 17

•

15 years ago

Wiped the try repo clean and cloned from mozilla-central.

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

9 years ago

Product: mozilla.org → mozilla.org Graveyard

Categories

(mozilla.org Graveyard :: Server Operations, task)

Tracking

(Not tracked)

People

(Reporter: joduinn, Assigned: aravind)

Security

(public)