Closed
Bug 1053558
Opened 11 years ago
Closed 11 years ago
We should probably reset try to stop breaking hg.m.o
Categories
(Developer Services :: Mercurial: hg.mozilla.org, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: KWierso, Assigned: bkero)
All of these recent tree-wide closures seem to happen after tryserver has been open for a while. Backlog in #vcs makes it sound like resetting try would make these problems go away.
[16:37] bkero gps: I do have quite a few tracebacks ending in chmap.py', in 'update'
[16:37] bkero #19 file '/root/gunicorn/lib/python2.6/site-packages/mercurial/branchmap.py', in 'updatecache'
[16:37] bkero #23 file '/root/gunicorn/lib/python2.6/site-packages/mercurial/localrepo.
[16:42] gps bkero: that's the branch cache
[16:43] gps bkero: cache population time is proportional to number of heads
[16:43] gps so it might be time to reset try
We should try to schedule this for sometime soon so we can put this nightmare behind us.
Comment 1•11 years ago
The evidence (trace output and tracebacks from processes on pegged cores) suggests that the web heads are hitting a known Mercurial scaling problem with branch cache population on mega-headed repos.
Culling the heads is the mitigation strategy.
The hot function is http://selenic.com/repo/hg/file/8a7bd2dccd44/mercurial/branchmap.py#l146 (from 2.5.4).
This function has been significantly rewritten in newer versions and should scale farther than before:
http://selenic.com/repo/hg/file/44d6818b9cd9/mercurial/branchmap.py#l227
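Since the cost is tied to the number of heads, it may help to watch that number directly. The following is a minimal sketch, not something that runs on hg.m.o: `count_heads` and its `run` parameter are hypothetical names, and it only assumes that `hg heads --template` prints one node per line.

```python
import subprocess


def count_heads(repo_path, run=subprocess.check_output):
    """Return the number of open heads in the Mercurial repo at repo_path.

    `run` is injectable so the function can be exercised without hg installed.
    """
    # One line per head; -R points hg at the repository to inspect.
    out = run(["hg", "-R", repo_path, "heads", "--template", "{node}\n"])
    return len([line for line in out.decode("ascii").splitlines() if line.strip()])
```

Something like `count_heads("/repo/hg/mozilla/try")` could then be logged periodically to see how fast the head count grows between resets.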
Comment 2•11 years ago
(In reply to Gregory Szorc [:gps] from comment #1)
> The evidence (trace output and tracebacks from processes on pegged cores)
> supports a known Mercurial scaling problem with branch cache population on
> mega-headed repos is being hit on the web heads.
>
> Culling the heads is the mitigation strategy.
If it's only a number of heads problem, we could merge them.
Comment 3•11 years ago
(In reply to Mike Hommey [:glandium] from comment #2)
> If it's only a number of heads problem, we could merge them.
(and merge new heads as they come in)
Let's go with a try reset -- worst case, it will provide a data point.
To clarify the context of the IRC log in comment 0:
- that discussion is from a test harness running a different version of hg (3.1, versus 2.5.4 in production) under a different WSGI container (gunicorn, versus Apache mod_wsgi in production).
Also moving to new home of hg bugs
Assignee: server-ops-webops → nobody
Component: WebOps: Source Control → Repos and Hooks
Product: Infrastructure & Operations → Release Engineering
QA Contact: nmaul → hwine
(In reply to Mike Hommey [:glandium] from comment #3)
> (In reply to Mike Hommey [:glandium] from comment #2)
> > If it's only a number of heads problem, we could merge them.
>
> (and merge new heads as they come in)
FWIW, this was tried in the past and did not affect ssh push times. Since try pushes are no longer an issue (or the issue is masked), this may be worth a retry.
Comment 6•11 years ago
I would try merging heads before doing it for real.
Make a clone of the try repo, merge the heads. Then `rm .hg/cache/*` and `hg --time branches` and see what happens. If that is less than a few minutes, we are in business.
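The test described above could be scripted roughly as follows. This is only a sketch under assumptions: the function names are made up, conflicts are resolved with `internal:local` because file contents do not matter for a head-count test, and wall-clock time is measured from Python as a stand-in for `hg --time`. Run it against a throwaway clone, never the live repo.

```python
import glob
import os
import subprocess
import time


def run_hg(args, cwd):
    """Run `hg` with the given arguments inside `cwd` and return its stdout."""
    out = subprocess.check_output(["hg"] + list(args), cwd=cwd)
    return out.decode("utf-8", "replace")


def merge_all_heads(repo, hg=run_hg):
    """Merge every head into the first one; return the number of merges made."""
    heads = hg(["heads", "--template", "{node}\n"], repo).split()
    if len(heads) < 2:
        return 0
    hg(["update", "--clean", "-r", heads[0]], repo)
    for node in heads[1:]:
        # Assumption: keep the local side on file conflicts, since only the
        # resulting head count matters for this experiment.
        hg(["merge", "-r", node, "--tool", "internal:local"], repo)
        hg(["commit", "-u", "head-merge-test", "-m", "merge head " + node[:12]], repo)
    return len(heads) - 1


def time_cold_branches(repo, hg=run_hg):
    """Delete the cache files, then return the seconds `hg branches` takes."""
    for path in glob.glob(os.path.join(repo, ".hg", "cache", "*")):
        if os.path.isfile(path):
            os.remove(path)
    start = time.perf_counter()
    hg(["branches"], repo)
    return time.perf_counter() - start
```

After `hg clone` of try into a scratch directory, calling `merge_all_heads(scratch)` followed by `time_cold_branches(scratch)` gives the number to compare against the "few minutes" threshold.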
Comment 7•11 years ago
Comment 8•11 years ago
http://hg.stage.mozaws.net/mirrors/generaldelta/try/ is a live backup of Try. It goes back several Try resets :D
Comment 9•11 years ago
(In reply to Gregory Szorc [:gps] from comment #8)
> http://hg.stage.mozaws.net/mirrors/generaldelta/try/ is a live backup of
> Try. It goes back several Try resets :D
It'd be nicer if it used the same UI as hg.m.o, and it would be awesome if we 302'd try/ 404s to there.
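For what it's worth, the 302 idea could look something like the WSGI middleware below. It is a hypothetical sketch, not the hg.m.o configuration: the class name is invented, the mirror URL is the one from comment 8, and it assumes paths on the mirror map one-to-one onto paths under try/.

```python
import itertools

# Mirror URL taken from comment 8; the one-to-one path mapping is an assumption.
ARCHIVE_BASE = "http://hg.stage.mozaws.net/mirrors/generaldelta/try"


class ArchiveFallback(object):
    """Hypothetical WSGI middleware: 404s under `prefix` become 302s to an archive."""

    def __init__(self, app, prefix="/try", archive_base=ARCHIVE_BASE):
        self.app = app
        self.prefix = prefix
        self.archive_base = archive_base

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if not (path == self.prefix or path.startswith(self.prefix + "/")):
            return self.app(environ, start_response)

        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the status back until we know whether it is a 404.
            captured["status"] = status
            captured["headers"] = headers
            return lambda data: None  # the legacy write() callable is ignored here

        result = self.app(environ, capture)
        chunks = iter(result)
        # Pulling one chunk makes apps that respond lazily call start_response.
        first = next(chunks, b"")

        if captured.get("status", "").startswith("404"):
            if hasattr(result, "close"):
                result.close()
            location = self.archive_base + path[len(self.prefix):]
            if environ.get("QUERY_STRING"):
                location += "?" + environ["QUERY_STRING"]
            start_response("302 Found", [("Location", location), ("Content-Length", "0")])
            return [b""]

        # Not a 404: replay the captured status and stream the body unchanged.
        start_response(captured["status"], captured["headers"])
        return itertools.chain([first], chunks)
```

Wrapping the existing application object as `ArchiveFallback(application)` would leave every other repo untouched. One limitation of the sketch is that the wrapped iterable's `close()` is not forwarded on the pass-through path.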
Comment 11•11 years ago
2014-08-13-1830: [bkero@boris ~]$ parallel ssh {} rm -rf /repo/hg/mozilla/try ::: hgweb{1..10}.dmz.scl3.mozilla.com
2014-08-13-1830: [root@hgssh1 ~]$ /repo/hg/scripts/reset_try.sh
Resetting try is a disruptive event to developer worflows and must be coordinated with RelEng buildduty, and notifications sent to the CAB and dev mailing lists.
Proceed? (y/N): y
Okay, here we go!
Moving current try repo to /repo/hg/nonlive/try-reset-2014-08-13-1826
Cloning mozilla-central into the try repo
requesting all changes
adding changesets
adding manifests
adding file changes
added 199347 changesets with 1113408 changes to 165015 files
Trying to insert into pushlog.
Please do not interrupt...
Inserted into the pushlog db successfully.
real 32m36.422s
user 11m46.610s
sys 1m33.189s
Fixing try repo permissions
Cleaning up pushlog.db
All done
2014-08-13-1901: [hg@hgssh1 ~]$ /usr/local/bin/repo-push.sh /try
2014-08-13-1924: [hg@hgssh1 ~]$
Try has been reset
Comment 12•11 years ago
(In reply to Gregory Szorc [:gps] from comment #1)
> The evidence (trace output and tracebacks from processes on pegged cores)
> supports a known Mercurial scaling problem with branch cache population on
> mega-headed repos is being hit on the web heads.
...
> This function has been significantly rewritten in newer versions and should
> scale farther than before
Sounds like we should update Mercurial on hg.m.o then :-)
Filed bug 1053705.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: Release Engineering → Developer Services