Closed Bug 1053705 Opened 10 years ago Closed 10 years ago

Update Mercurial on hg.mozilla.org to v3.2.1

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1501] )

We're currently running 2.5.4 on hg.mozilla.org, to which we updated in bug 741353.

According to:

(In reply to Gregory Szorc [:gps] from bug 1053558 comment #1)
> The evidence (trace output and tracebacks from processes on pegged cores)
> supports that a known Mercurial scaling problem with branch cache population
> on mega-headed repos is being hit on the web heads.
> 
> Culling the heads is the mitigation strategy.
> 
> The hot function is
> http://selenic.com/repo/hg/file/8a7bd2dccd44/mercurial/branchmap.py#l146
> (from 2.5.4).
> 
> This function has been significantly rewritten in newer versions and should
> scale farther than before:
> 
> http://selenic.com/repo/hg/file/44d6818b9cd9/mercurial/branchmap.py#l227

...updating should help with bug 1040308 (and as a bonus, all of the dependents of bug 945383).
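As an illustrative toy model (this is NOT Mercurial's actual branchmap.py, just a sketch of the scaling difference the quote describes): the old code effectively recomputed the head-to-branch mapping from scratch, so cost grew with the total number of heads, while the rewritten code updates an existing cache incrementally, touching only revisions newer than the cached tip.

```python
# Toy model of branch cache population (NOT Mercurial's real code).
# naive_branchmap rebuilds the whole mapping on every call;
# incremental_branchmap only inspects heads added since the cached tip,
# which is the strategy the rewritten branchmap.py moved toward.

def naive_branchmap(heads, branch_of):
    """Rebuild the full branch -> heads mapping from scratch (O(len(heads)))."""
    cache = {}
    for h in heads:
        cache.setdefault(branch_of(h), []).append(h)
    return cache

def incremental_branchmap(cache, cached_tip, heads, branch_of):
    """Update an existing cache using only heads newer than cached_tip."""
    for h in heads:
        if h > cached_tip:  # only newly-added heads are inspected
            cache.setdefault(branch_of(h), []).append(h)
    return cache, max(heads)

# Pretend even revisions are on "stable" and odd ones on "default".
branch_of = lambda rev: "default" if rev % 2 else "stable"

full = naive_branchmap(range(6), branch_of)
cache, tip = incremental_branchmap(
    {"stable": [0, 2, 4], "default": [1, 3]}, 4, range(6), branch_of)
print(full == cache)  # incremental update matches the full rebuild
```

On a mega-headed repo the incremental version does proportionally less work per request, which is why culling heads mitigates the problem on 2.5.4 but upgrading removes it more durably.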

The latest is 3.1:
http://mercurial.selenic.com/downloads
http://mercurial.selenic.com/wiki/WhatsNew
Summary: Update Mercurial (Hg) to v3.1 → Update Mercurial on hg.mozilla.org to v3.1
Depends on: 741353
Upgrading the server is on the goals list for the Developer Services group.

You can help them attain that goal by increasing the test coverage of the custom code running on the server. Increasing test coverage reduces uncertainty and removes a huge barrier to change.

The version-control-tools repository has a unified testing environment (run-mercurial-tests.py). There is even the start of continuous integration (https://ci.mozilla.org/job/version-control-tools/).

What you can do to speed this up is:

1) Move all code used by the server into the version-control-tools repository
2) Hook up run-mercurial-tests.py to test all that code
3) Start writing tests for untested code (pass --cover to run-mercurial-tests.py to produce code coverage)

I'd also prefer we rewrite the hghooks tests to use Mercurial's ".t tests" (http://mercurial.selenic.com/wiki/WritingTests) so everything is more consistent.
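For reference, a .t test is just an annotated shell transcript: lines beginning with two spaces and `$ ` are commands, and the indented lines that follow are the expected output, which the test runner diffs against reality. A minimal (hypothetical) example exercising basic hg behavior might look like:

```
  $ hg init repo
  $ cd repo
  $ echo a > a
  $ hg commit -Am 'initial commit'
  adding a
  $ hg log --template '{desc}\n'
  initial commit
```

A hghooks test written this way would additionally configure the hook under test in `.hg/hgrc` and assert on the hook's rejection message in the expected output.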

Ping me if you have any questions or want code reviews.
Depends on: 1064602
The package for Mercurial 3.1.1 has been built on rhel6dev64.sandbox.phx1 and is already uploaded to mrepo.

The package has also been installed on the staging server (hg.allizom.org) and has passed all of the written tests:

# ./run-mercurial-tests.py
WARNING: Not running tests optimally. Specify -j to run tests in parallel.
..................................
# Ran 34 tests, 0 skipped, 0 warned, 0 failed.


The source for these individual tests can be checked at http://hg.mozilla.org/hgcustom/version-control-tools/
Summary: Update Mercurial on hg.mozilla.org to v3.1 → Update Mercurial on hg.mozilla.org to v3.1.1
Do we want 3.1.1 on our hosts still, despite the major regression :gps found in 3.1, which he expects will be fixed only once 3.2 ships?

Or do we expect that specific regression to not affect our server infra?
Flags: needinfo?(gps)
Flags: needinfo?(bkero)
gps and I discussed the upgrade last week in #vcs (logs available). We went over test coverage (and the fixing thereof) and the planned upgrade to 3.1.1. He said the upgrade should be safe, and that the revset performance regression should be watched but was unlikely to impact us.

Additionally the upcoming upgrade will only be for webheads, with the hgssh hosts remaining on 2.5.4 until further tests are written.
Flags: needinfo?(bkero)
Depends on: 1068520
Did the new update break the display of Mercurial pushlogs? They are empty now. See bug 1070637.
Depends on: 1071126
Depends on: 1071296
The revset perf regression only impacts operations that add a lot of changesets, e.g. clone. For that reason release automation should probably avoid <=3.1.1. However, I think server operation should mostly be fine.

Pierre-Yves looked into the revset issues extensively in the past few weeks. He's contributed a number of proper fixes to what will become 3.2. He's also contributed enough bandaids to make 3.1.2 usable.

We should target hg.mozilla.org and release automation to move to 3.1.2 shortly after it is released within the next week, just so we don't have exposure to the revset performance regression on hg.mozilla.org.
Flags: needinfo?(gps)
Depends on: 1075318
Depends on: 1075275
Component: Server Operations: Developer Services → Mercurial: hg.mozilla.org
Product: mozilla.org → Developer Services
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/174]
This bug will be used to track the upgrade of hgssh to 3.1.2, which we've all signed off on in the team meeting. We're just waiting on scheduling at this time.
Summary: Update Mercurial on hg.mozilla.org to v3.1.1 → Update Mercurial on hg.mozilla.org to v3.1.2
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/174] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1485] [kanban:engops:https://kanbanize.com/ctrl_board/6/174]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1485] [kanban:engops:https://kanbanize.com/ctrl_board/6/174] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1488] [kanban:engops:https://kanbanize.com/ctrl_board/6/174]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1488] [kanban:engops:https://kanbanize.com/ctrl_board/6/174] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1494] [kanban:engops:https://kanbanize.com/ctrl_board/6/174]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1494] [kanban:engops:https://kanbanize.com/ctrl_board/6/174] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1495] [kanban:engops:https://kanbanize.com/ctrl_board/6/174]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1495] [kanban:engops:https://kanbanize.com/ctrl_board/6/174] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1501] [kanban:engops:https://kanbanize.com/ctrl_board/6/174]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1501] [kanban:engops:https://kanbanize.com/ctrl_board/6/174] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1501]
Blocks: 1087431
Dave: My understanding is we've targeted 11 AM PST this Saturday, Nov 15 for this upgrade. I've never participated in a TCW before. Can you please tell me what I need to do?
Flags: needinfo?(dcurado)
hg 3.2.1 was released yesterday. It contains some important memory use fixes that we'd like to get deployed. Our continuous integration says 3.2 works just as well as 3.1.2. We trust CI. So, we're going to upgrade to 3.2.1.
Summary: Update Mercurial on hg.mozilla.org to v3.1.2 → Update Mercurial on hg.mozilla.org to v3.2.1
Hey Greg -- Sounds like a few different people will be doing work on saturday.
We're still working out a process for that, but what I have been doing is using
the oncall engineer in the MOC as the central coordinator during the maint. window.

That way, I can say: "OK, I'm going to do x y z change now, and you can expect to
see blah blah blah as a result."  If we all use the MOC as a central coordinator,
we can be pretty sure that I won't be making changes which screw you up while you
are doing your changes.  

As well, I like keeping the MOC staff in the loop, so that if they get alarms
due to the changes I am making, they know it is not actually something breaking.
(although I've certainly made some big chunks of breakage during maint windows!)

HTH -- if you'd like more detail, let me know.  Otherwise, I'd say we can just
coordinate a time via the etherpad, and (again) use the #moc folks to coordinate.
Dave: Thanks for the info! This sounds easy enough. I'll hang around #moc around 1100 PST on Saturday and will coordinate with them.
Depends on: 1063208
Ben: is there a way for me to snapshot the mount point on hgssh immediately before the upgrade so I can consult old versions of the filesystem in case I need to compare or roll back certain things?

I don't plan to do a full filesystem rollback. I just want to mitigate disaster :)
Flags: needinfo?(bkero)
It's WAFL, which is copy-on-write rather than point-in-time, so you can just go back to the most recent snapshot directory to recover. While there are tools that allow snapshot creation from the host (as opposed to from the filer), we don't have them installed anywhere. That'd be something we'd need to work out with the storage team if we needed to go that route.
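On NetApp WAFL filers, snapshots are typically exposed read-only through a hidden `.snapshot` directory at the root of the export, so a pre-upgrade comparison can be done without any rollback tooling (paths below are hypothetical, not our actual mount points):

```
$ ls /repo/hg/.snapshot/
nightly.0  nightly.1  weekly.0
$ diff -r /repo/hg/.snapshot/nightly.0/some-repo /repo/hg/some-repo
```

This gives the "consult old versions of the filesystem" safety net without creating a new snapshot from the host.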
Flags: needinfo?(bkero)
Depends on: 1098614
Blocks: 1099979
No longer depends on: 1063208
Everything is now running 3.2.1.

Closing bug.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(dcurado)
Resolution: --- → FIXED
Blocks: 1100027