Closed Bug 595275 Opened 14 years ago Closed 14 years ago

IT script created to prune or re-clone the try repo with no manual steps

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
All
task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lsblakk, Unassigned)

References

Details

(Whiteboard: [buildduty][try])

Looking for an automated solution for the slowing down of try pushlog due to the buildup of heads on try.  Can a script be put in place to strip heads periodically or automatically reclone the try repo on a regular basis?
Bumping the importance of this in light of things like bug 532412.  Anything that can help keep the try repo trimmed and reliable would make it more robust for the higher level of usage we're seeing.
Severity: normal → major
Assignee: server-ops → aravind
In the past, we have periodically cleaned out the try repo and cloned stuff from mozilla-central.  

Are we not doing that anymore?

If not, how do we decide which heads to cull?  Anything older than a week?  How do we distinguish between extraneous heads from user repos and those on mozilla-central?

And once we do cull heads, won't they be pushed right back in when folks push their changes to the try repo?
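One way to make the "anything older than a week?" question concrete is sketched below.  It is only an illustration: the function name, the (node, pushed_at) input shape, and the idea of passing in the set of mozilla-central nodes are assumptions, not something the try repo tooling is known to provide.

```python
from datetime import datetime, timedelta, timezone


def heads_to_cull(heads, central_nodes, now, max_age=timedelta(days=7)):
    """Return try-only heads older than max_age, oldest first.

    heads: iterable of (node, pushed_at) pairs; pushed_at is timezone-aware.
    central_nodes: set of nodes that also exist on mozilla-central.
    Both inputs are hypothetical; gathering them is left to the caller.
    """
    cutoff = now - max_age
    stale = [
        (pushed_at, node)
        for node, pushed_at in heads
        # Keep anything that is on mozilla-central or still recent.
        if node not in central_nodes and pushed_at < cutoff
    ]
    return [node for _, node in sorted(stale)]


if __name__ == "__main__":
    now = datetime(2010, 10, 17, tzinfo=timezone.utc)
    heads = [
        ("aaa111", now - timedelta(days=10)),  # old, try-only
        ("bbb222", now - timedelta(days=2)),   # recent, try-only
        ("ccc333", now - timedelta(days=30)),  # old, but on mozilla-central
    ]
    print(heads_to_cull(heads, central_nodes={"ccc333"}, now=now))
```

This only picks candidates by age; it does not address whether a culled head gets pushed back in later.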
(In reply to comment #2)
> In the past, we have periodically cleaned out the try repo and cloned stuff
> from mozilla-central.  
> 
> Are we not doing that anymore?

What this bug is looking for is a way to make that process automated and more frequent, since the increased usage of the try repo has led to a quicker buildup of heads that slow down pushlog.  We don't go back to try revisions for any reason after the results for that push have completed.  That may become something needed at a later date, but it should be < 2 weeks of storage time.  So if we have automatic re-cloning of the try repo during very slow usage times (e.g., late night on a weekend, Pacific time), we could put out the message that there is a known try-repo downtime every N weeks at a set time.
 
> And once we do cull heads, won't they be pushed right back in when folks push
> their changes to the try repo?

AFAIK people push from their m-c checkouts, so the heads build up over time; they don't come back all at once.  Again, I think a bi-weekly automatic re-cloning schedule should improve the head buildup issue.
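To announce a "known downtime every N weeks", the next window can be computed from a fixed first window.  A minimal sketch, assuming naive datetimes in Pacific local time (so daylight-saving shifts are ignored) and a hypothetical first window at midnight on Sunday 2010-10-17:

```python
from datetime import datetime, timedelta


def next_window(first_window, now, every_weeks=2):
    """Return the first scheduled window at or after `now`.

    Windows fall on first_window, first_window + every_weeks weeks, and so on.
    """
    if now <= first_window:
        return first_window
    period = timedelta(weeks=every_weeks)
    elapsed = now - first_window
    # Ceiling division: whole periods needed to reach or pass `now`.
    periods = -(-elapsed // period)
    return first_window + periods * period


if __name__ == "__main__":
    first = datetime(2010, 10, 17, 0, 0)  # hypothetical first window
    print(next_window(first, datetime(2010, 10, 20, 9, 30)))
```

The interval is a parameter, so a weekly or monthly schedule only changes `every_weeks`.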
Any new information, progress, or ETA on this?
@lukas - when do you need this by?
Closing the bug this blocks is a q3 goal - part of our 'old bugs' smackdown.
I had written up instructions to clone the try repo in the past (https://wiki.mozilla.org/ReleaseEngineering:ResetTryServer).  Those instructions are still good, and we will need to add a few more steps to deal with the in-memory try repo.  I can add those remaining steps to a script pretty easily.  When can we try this on the main try repo?
Flags: needs-treeclosure?
OS: Mac OS X → All
requesting treeclosure - want this in the next RelEng downtime, if possible.
releng - punt back when we have a date scheduled.
Assignee: aravind → nobody
Component: Server Operations → Release Engineering
QA Contact: mrz → release
Next tree closure is Sunday, 1-4pm PDT.
Assignee: nobody → server-ops
Component: Release Engineering → Server Operations
QA Contact: release → mrz
Flags: needs-downtime+
Whiteboard: 10/16/2010 @ 1pm
Putting this back to ServerOps since this bug is already being tracked by RelEng bug 529179.

Also perhaps the scripting that Aravind wants to try out (from comment 7) can be tested on a staging repo first?  The goal of this bug is to be able to clean up the try repo on a regular basis without a downtime being needed.
We don't have a staging repo for the try repo.  The try repo is unique in the way it's used and set up (with a tmpfs, etc.).  It would be impractical to do this in a staging env.
Assignee: server-ops → aravind
(In reply to comment #12)
> We don't have a staging repo for the try repo.  The try repo is unique in the
> way it's used and set up (with a tmpfs, etc.).  It would be impractical to do
> this in a staging env.

Good to know, in that case will you be able to use the upcoming tree-closure window on Sunday?
(In reply to comment #13)
> Good to know, in that case will you be able to use the upcoming tree-closure
> window on Sunday?

I thought the downtime was Saturday afternoon (1 PM Pacific).  I am planning on doing this at that time.
Never mind, Ravi/John said this is now being done on Sunday.
Whiteboard: 10/16/2010 @ 1pm → 10/17/2010 @ 1pm
Script ready @ /repo/hg/scripts/reset_try.sh.

For now I suggest logging an IT request bug and having some folks in IT run the script.  If you are ready for us to cron it, let me know when and how often, and I can add it to cron.

One run of the reset script takes about 30 minutes to complete.


Here is a sample run.
[root@dm-svn02 scripts]# time ./reset_try.sh
Disabling try on hg.m.o
Stopping httpd: [OK]
Deleting the current try repo
Cloning mozilla-central into the try repo
Fixing try repo permissions
Cleaning up pushlogdb
Syncing changes into the tmpfs repo
Starting httpd: [  OK  ]
All done

real    23m33.907s
user    0m4.990s
sys     0m26.572s
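The actual contents of reset_try.sh are not shown in this bug.  The sketch below only mirrors the order of some of the steps in the sample run, and every path and command in it is an assumption.  The "Disabling try", "Fixing try repo permissions", and "Cleaning up pushlogdb" steps are left out because nothing here says what they do.  The one design point it illustrates is restarting httpd even when a middle step fails.

```python
import subprocess

# Hypothetical restart step; the real script may restart httpd differently.
RESTART_HTTPD = ("Starting httpd", ["service", "httpd", "start"])


def reset_plan(try_repo, central_repo, tmpfs_repo):
    """Build the ordered (label, argv) steps; all paths are caller-supplied."""
    return [
        ("Stopping httpd", ["service", "httpd", "stop"]),
        ("Deleting the current try repo", ["rm", "-rf", try_repo]),
        ("Cloning mozilla-central into the try repo",
         ["hg", "clone", "--noupdate", central_repo, try_repo]),
        ("Syncing changes into the tmpfs repo",
         ["rsync", "-a", "--delete", try_repo + "/", tmpfs_repo + "/"]),
    ]


def run_reset(steps, restart, run=subprocess.check_call):
    """Run steps in order; always run the restart step, even on failure."""
    completed = []
    try:
        for label, argv in steps:
            print(label)
            run(argv)
            completed.append(label)
    finally:
        print(restart[0])
        run(restart[1])
    return completed


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    plan = reset_plan("/path/to/try", "/path/to/mozilla-central", "/path/to/tmpfs/try")
    run_reset(plan, RESTART_HTTPD, run=lambda argv: print("  " + " ".join(argv)))
```

Passing `run` as a parameter keeps the dry run and the real run on the same code path.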
Depends on: 605037
Pushing back to releng for schedule to auto-prune.  IT prefers this to be a manual process, not something out of cron.
Assignee: aravind → nobody
Component: Server Operations → Release Engineering
Flags: needs-downtime+
QA Contact: mrz → release
lsblakk:

Do you have any data on when (date/time) the try load is low enough that the prune job could be run?

That seems to me to be when we should schedule this with IT.
Whiteboard: 10/17/2010 @ 1pm → [buildduty][try]
Looking at the past couple of months of try usage here:

https://build.mozilla.org/buildapi/reports/pushes?starttime=1283324400&endtime=1287730800

It looks like midnight on Sunday is a good time: pushes are often 0.  If we made the job automatically close the try tinderbox page, so that a hook stops pushes to try while the cleanup happens, then we could feasibly make this entirely automated.
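A rough way to check the "midnight on Sunday" reading against push data is to count pushes per (weekday, hour) slot and rank the slots.  The sketch below assumes the push times are available as epoch seconds (the format of the starttime and endtime parameters in the report URL) and uses a fixed UTC-7 offset for Pacific time, which ignores daylight-saving changes.  The function name and input shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def quietest_slots(push_epochs, tz, top=3):
    """Rank (weekday, hour) slots by push count, quietest first.

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    Slots with no pushes at all are included with a count of 0.
    """
    counts = {(day, hour): 0 for day in range(7) for hour in range(24)}
    for ts in push_epochs:
        local = datetime.fromtimestamp(ts, tz)
        counts[(local.weekday(), local.hour)] += 1
    # Sort by count, then by slot so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return ranked[:top]


if __name__ == "__main__":
    pdt = timezone(timedelta(hours=-7))  # assumption: fixed PDT offset
    sample = [1283324400, 1283328000, 1283328600]  # made-up push times
    print(quietest_slots(sample, pdt))
```

Raw totals are enough when the sampled range covers each slot about equally often; otherwise the counts would need to be divided by how many times each slot occurs in the range.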
Summary: Automatically prune or re-clone the try repo → IT script created to prune or re-clone the try repo with no manual steps
I'm fixing the description of this to more accurately reflect what this part of the bug was about, setting it back to ServerOps just for tracking purposes, and closing it since the end goal was achieved here.  The discussion about how to implement this from a RelEng perspective can continue in bug 529179.
Assignee: nobody → server-ops
Status: NEW → RESOLVED
Closed: 14 years ago
Component: Release Engineering → Server Operations
QA Contact: release → mrz
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard