Closed
Bug 1343902
Opened 8 years ago
Closed 8 years ago
Pushing to mozreview is very slow
Categories
(MozReview Graveyard :: Integration: Mercurial, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bzbarsky, Assigned: gps)
References
Details
Attachments
(1 file)
42.49 KB, image/png
I just tried doing a basic "time hg push -c someid review", with autopublish=True set.
This sits there for a while "looking for changes", then eventually pushes. The "time" output is:
2.159u 0.727s 0:53.11 5.4% 0+0k 0+35io 411pf+0w
So it sat there for close to 50 seconds of non-CPU time.
With autopublish=False it's even worse, because you have to wait for it to prompt you, so you can't do anything while you wait. The user experience actually ends up worse/slower than bzexport, which takes some doing. :(
This is pretty similar to the horrible push times we have on try every so often, and possibly for the same reasons (too many heads)?
Comment 1•8 years ago
before gps took leave he indicated to me that the number of heads on that repo shouldn't be a problem for the time being.
i'm not going to exclude that as the issue as reviewboard/gecko has ~44k heads.
what's interesting is it's intermittent - i haven't seen a slow push, but i also work opposite to peak times.
i asked MOC if there were any recent alerts triggered, and it looks like we're hitting memory limits.
Service Warning[03-05-2017 23:14:18] SERVICE ALERT: reviewboard-hg2.dmz.scl3.mozilla.com;Out of memory - killed process;WARNING;HARD;1;Log errors: Mar 5 23:12:41 reviewboard-hg2.dmz.scl3.mozilla.com kernel: [11324941.253462] Out of memory: Kill process 1168 (httpd) score 140 or sacrifice child
as this falls into the "not helping things" category i've filed bug 1345035 to increase the memory on that VM.
Reporter
Comment 2•8 years ago
> what's interesting is it's intermittent - i haven't seen a slow push, but i also work opposite to peak times.
I can do some random review pushes at different times of day and record the times taken if that would be useful...
Comment 3•8 years ago
(In reply to Byron Jones ‹:glob› from comment #1)
> before gps took leave he indicated to me that the number of heads on that
> repo shouldn't be a problem for the time being.
> i'm not going to exclude that as the issue as reviewboard/gecko has ~44k
> heads.
>
> what's interesting is it's intermittent - i haven't seen a slow push, but i
> also work opposite to peak times.
>
>
> i asked MOC if there were any recent alerts triggered, and it looks like
> we're hitting memory limits.
>
> Service Warning[03-05-2017 23:14:18] SERVICE ALERT:
> reviewboard-hg2.dmz.scl3.mozilla.com;Out of memory - killed
> process;WARNING;HARD;1;Log errors: Mar 5 23:12:41
> reviewboard-hg2.dmz.scl3.mozilla.com kernel: [11324941.253462] Out of
> memory: Kill process 1168 (httpd) score 140 or sacrifice child
>
> as this falls into the "not helping things" category i've filed bug 1345035
> to increase the memory on that VM.
Are you pushing to the gecko review repo, though? The head problem depends on the specific review repo you're pushing to, not on the total number of heads across all the repos together.
Assignee
Updated•8 years ago
Assignee: nobody → gps
Status: NEW → ASSIGNED
Assignee
Comment 5•8 years ago
gcox: I suspect slow I/O on reviewboard-hg2 may be partially to blame for the slowness here.
Could you tell me a bit about how the I/O is backed? SSD? Spinning disk? Are there provisioned IOPS? The most important questions are "can I/O performance easily be improved" and "what are our options?"
Comment 6•8 years ago
15k SAS disks. There are no provisioned IOPS; it's a shared pool.
"can I/O performance easily be improved" - not easily.
When I saw the bug, I went to look at the graphs. My instant reaction is that it's eyeballing an average of around 1000 IOPS over a 5-minute window. For comparison's sake, our 'worst' virtualized databases use around 2000 IOPS, and the cluster floats around 18000 IOPS. So, you're using a fair bit, but not an unheard-of amount.
Assignee
Comment 7•8 years ago
Thanks for the info.
That big spike a few minutes ago was likely me cloning the repo. It's good that we don't see spikes outside of that. That gives me some comfort that I/O isn't as big a problem as I thought.
I'll poke around and try to find another culprit.
Assignee
Comment 8•8 years ago
I cloned the review repo and was able to reproduce slow `hg unbundle` performance.
I tracked this down to a membership test on a Python list in Mercurial. I submitted a patch (https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095416.html) that changes the list to a set and this makes `hg unbundle` 14x faster (~18.5s to ~1.3s) on my machine (which has a faster CPU than reviewboard-hg). This change will hopefully be accepted into 4.1.2. If not, it should be part of 4.2.0.
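To illustrate the class of problem (this is a standalone sketch, not the actual Mercurial code from the patch): a membership test against a Python list is O(n), so doing one per incoming node degrades to O(n^2) overall, whereas a set lookup is O(1) on average. The head count and node values below are made up for the illustration.

```python
import timeit

# Hypothetical stand-in for the structure being probed per incoming node;
# the upstream patch simply builds a set from the list before the loop.
heads_list = ["%040x" % n for n in range(44000)]  # ~44k heads, as on reviewboard/gecko
heads_set = set(heads_list)
incoming = heads_list[::100]  # some nodes to look up against the heads

def check_list():
    # O(len(heads_list)) per lookup -> quadratic over many lookups
    return sum(1 for node in incoming if node in heads_list)

def check_set():
    # O(1) average per lookup
    return sum(1 for node in incoming if node in heads_set)

print("list:", timeit.timeit(check_list, number=10))
print("set: ", timeit.timeit(check_set, number=10))
```

With tens of thousands of heads, converting the list to a set ahead of the loop is exactly the kind of one-line change that produces the ~14x win described above.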
Until we deploy a Mercurial with the fix, we could work around the performance issue by doing dummy merge commits (like we do on the Try repo) to reduce the number of heads.
However, the change is trivial and I'm planning on deploying Mercurial 4.1 to production in the next few days. So I'm leaning towards creating a custom 4.1.1 package with this trivial one-liner applied.
Blocks: 1333616
Assignee
Comment 9•8 years ago
The patch has been accepted upstream and will be in 4.1.2, which should be released on April 1 (no joke).
Assignee
Comment 10•8 years ago
Since this is a pressing performance issue and since I'm not going to deploy 4.1 until next week, I hacked changegroup.py on the server to contain the fix (after verifying the code was similar, of course). This /may/ get reverted if a deploy is done. This is obviously a quick and dirty hack.
The actual changegroup application now takes <5s. However, it looks like discovery is still taking 10+s. So I may cull heads on the repo (like we've done with Try).
Long term fix for this is still bug 1055298. Facebook has authored an "infinitepush" extension that does most of what we want. I'm in a dialog with their engineers about making it work better with vanilla Mercurial, which is a soft blocker on us using it at Mozilla.
Anyway, I'll keep this bug open to track pruning heads. I should have that done by EOD.
Assignee
Comment 11•8 years ago
I have a process running on my home machine doing dummy merge commits to prune the number of heads. It appears to be running at ~1.6 commits/s (although it gets slightly faster over time as the head count drops, since there are a few O(n) operations wrt the number of heads). It pushes ~500 head closures every 5 minutes.
This will take several more hours to complete. But I'm pretty confident it will get the job done. So I'm going to resolve this bug.
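For reference, the head pruning is conceptually just a loop of dummy merges: stay on one commit, record another head as a second parent without touching any files, and commit, which closes that head. Here is a rough sketch of the idea under stated assumptions (hypothetical repo path and batch size; the actual script and its push cadence aren't shown in this bug):

```python
import subprocess

REPO = "/path/to/reviewboard-gecko-clone"  # hypothetical local clone path

def hg(*args):
    """Run an hg command against REPO and return its output lines."""
    out = subprocess.check_output(("hg", "-R", REPO) + args)
    return out.decode("utf-8", "replace").splitlines()

def close_heads(batch=500):
    heads = hg("heads", "--template", "{node}\n")
    base, others = heads[0], heads[1:1 + batch]
    hg("update", "-C", "-r", base)
    for head in others:
        # Dummy merge: mark `head` as a second parent of the working directory
        # without changing any files, then commit. The merge commit keeps the
        # current contents and removes `head` from the repo's head list.
        hg("debugsetparents", ".", head)
        hg("commit", "-m", "dummy merge to reduce head count")

if __name__ == "__main__":
    close_heads()
```

Each iteration removes one head, so pushing the result in batches steadily shrinks the head list that discovery and changegroup application have to walk.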
Before I go, a few final comments.
First, I was surprised by the O(n^2) CPU issue in Mercurial. I hadn't observed it before while measuring performance issues with Try. I think that's because I was always fixated on the network overhead from discovery.
Second, I feel I owe gcox a light apology for calling out I/O slowness. When I first started poking at the server, I performed a clone of the gecko review repo and noticed from `dstat` output that I/O performance didn't appear to be consistently as good as I've seen on the hgweb machines (which have SSDs). I was seeing a bunch of CPU time in I/O wait, which is a tell-tale sign of an I/O bottleneck. While the I/O on reviewboard-hg isn't as good as hgweb, it does appear to be more than adequate for most visible-to-end-user operations. So my initial hunch that I/O was partially responsible was wrong. Sorry for the mini fire drill, gcox.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED