After cloning https://hg.mozilla.org/try, attempting to hg pull https://hg.mozilla.org/try results in the client reporting an HTTP 413: Request Entity Too Large error. I suspect this has to do with Mercurial sending a large HTTP entity body to describe the many heads on the try repository. This problem can be worked around by using the ssh:// transport protocol for pulling from try. While this issue only seems to impact the try repository, I suspect it can occur on *any* repository. We likely don't see this error on other repositories due to the special massively multi-headed structure of the try repository. That being said, since repositories like releases/mozilla-beta continue to add new heads, they will almost certainly cross the current entity size threshold at some time and become unusable over HTTP. If I had infinite time, I would create a user repository and push/pull new heads until the repository broke due to this error. I would then estimate when mozilla-beta (likely the next repository to encounter this issue due to its branching model) would stop working, at that point becoming a fire drill.
The old heads on aurora/beta are closed as part of the process for moving to a new version. Do you know if closed heads still count towards the limit here? Per http://mozilla.github.io/process-releases/draft/development_specifics/ : hg commit -R mozilla-aurora -r default --close-branch -m 'closing old head'
Debugging the wire protocol exchange a bit, the problem occurs in two places. As originally filed, the error occurs when the client sends a "getbundle" command to the server. As part of this command, the client transfers a list of requested (to-pull) heads and common nodes in the request's HTTP headers. This list of heads was previously sent to the client by the server. The client transfers the list of nodes by formulating a data chunk, then splitting it into X-HgArg-N HTTP request headers (where N is a number) so that each header line is no more than 1024 bytes. When trying to pull the try repo into my monolithic repo, the client sent 134 X-HgArg headers with a total header value size of 136606 bytes. This error will occur when pulling a repository that has introduced many new heads. If you pull frequently, presumably the number of new heads will be small and the "getbundle" request won't hit the limit. You will not receive this error on initial clone because the client in this scenario effectively says "send all of the heads."
The second scenario for receiving this error occurs on subsequent pulls and can occur with *any* repository. This error occurs earlier in the pull, during node discovery. Before the client requests what to pull, it needs to figure out what to pull. AFAICT it does this by transmitting a list of its heads, and the server calculates what's missing. Again, this transmission of heads occurs via HTTP request headers, triggering the 413. The easy way to trigger this is to clone the try repo, then pull from e.g. mozilla-central.
As best I can tell, the client transmits all its heads as part of discovery. This includes closed branches. Therefore, mozilla-beta will eventually be susceptible to this limitation. I recommend we increase the HTTP request header size limit for hg.mozilla.org so this won't be an issue.
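The header splitting described in the comment above can be sketched roughly as follows. This is an illustration only, not Mercurial's actual implementation: the helper name split_into_hgarg_headers, the URL-encoding of the arguments, and the exact way the 1024-byte cap is accounted for are all assumptions.

```python
from urllib.parse import urlencode

HEADER_LINE_LIMIT = 1024  # per-line cap mentioned in the comment above


def split_into_hgarg_headers(args, limit=HEADER_LINE_LIMIT):
    """Split encoded command arguments across X-HgArg-N headers.

    Hypothetical helper: returns a list of (name, value) pairs such that
    "name: value\r\n" never exceeds `limit` bytes (for fewer than 1000 headers).
    """
    encoded = urlencode(sorted(args.items()))
    # Reserve room for a three-digit header name plus ": " and the CRLF.
    overhead = len("X-HgArg-000: \r\n")
    chunk = limit - overhead
    headers = []
    for n, start in enumerate(range(0, len(encoded), chunk), start=1):
        headers.append(("X-HgArg-%d" % n, encoded[start:start + chunk]))
    return headers


# Example input: 3000 made-up 40-character hex node IDs, space-separated.
fake_heads = " ".join("%040x" % i for i in range(3000))
headers = split_into_hgarg_headers({"heads": fake_heads, "common": "0" * 40})
total_value_bytes = sum(len(value) for _, value in headers)
```

The point of the sketch is that the request size grows linearly with the number of nodes sent, so any fixed limit on total header size is eventually exceeded.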
I've always merged my closed heads so that I wouldn't run into issues like this. I think there are scenarios where this leads to incorrect further merges down the line, but there could be something like a "closed" branch that closed branches get merged into.
Since there are other things than discovery protocol that are O(number of heads), I think just changing the request header size limit isn't a sufficient solution.
Release branches are heads by definition. IMO it doesn't make sense to merge these heads because there conceptually will always be a head for each Gecko version. I think we could be smarter about how we maintain these heads, including reducing the total number. But I don't think we'll ever limit ourselves to N heads because we'll always be introducing new Gecko versions. Anyway, I believe a discussion about our heads/branching/release structure is beyond the scope of this bug. We have a problem with the configuration of the hg.mozilla.org HTTP server that needs to be fixed. Let's limit the discussion to that.
A couple of points:
- try typically has this problem only after several thousand heads (currently 3357)
- mozilla-beta currently has 384 heads
- mozilla-beta gains heads:
  - per uplift (6 wks), so approx 10 per year
  - per beta release build (now 2x wk), so approx 4/wk, 200/yr
- pull of try is not recommended (see original blog post), ssh will work if needed
- http does work if you pull a specific rev (that's how the builds work over http)
So, we're a ways out from hitting this issue on m-b.
IT operates hg, and is responsible for any issues like this - moving to their component.
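The "ways out" estimate can be made concrete with a simple linear projection. The sketch below uses the figures from the comment above (384 heads, roughly 10 per year from uplifts plus 200 per year from beta builds). The 3000-head threshold is only a placeholder for "several thousand", not a measured limit, and the function name is hypothetical.

```python
def years_until_threshold(current_heads, heads_per_year, threshold):
    """Linear projection of when a repository reaches `threshold` heads."""
    if heads_per_year <= 0:
        raise ValueError("heads_per_year must be positive")
    remaining = max(threshold - current_heads, 0)
    return remaining / heads_per_year


# 384 heads today; ~10/yr from uplifts plus ~200/yr from beta release builds.
# The threshold of 3000 is an assumed stand-in for "several thousand heads".
estimate = years_until_threshold(384, 10 + 200, 3000)
```

Under these assumptions the projection comes out to a bit over twelve years, but it is only as good as the assumed threshold and growth rate.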
Assignee: nobody → server-ops-webops
Component: Release Engineering: Repos and Hooks → WebOps: Source Control
Product: mozilla.org → Infrastructure & Operations
QA Contact: nmaul
On the back end, this is an easy fix: add 'LimitRequestFields 300' to the apache config (where 300 is really any number > 150, atm). Unfortunately, nobody talks directly to the backend. I'm currently trying to figure out if/how to change Zeus' behavior, since that is what's actually throwing the 413 error.
Zeus has settings which match *some* of Apache's LimitRequestField* directives, but not the one we need. I'm waiting to hear back from Stingray support. I'm not sure if this is interesting or not, but I noticed that if one cloned try via SSH and then served *that* repo via HTTP, you could clone successfully without getting the 413 error. If you rsync try and serve via HTTP you do get the 413, though.
Any update on this? I'm looking to set up an automated process to monitor pushes to the try server and that process would need an SSH key to talk to try. I'd prefer to not go down that route. But if it's easiest...
Ah, sorry I'd forgotten to follow up! There is a Zeus setting we could change that would allow try to be cloned via http, but it would affect all hg traffic and quite likely increase memory usage on the Zeus nodes; the settings are not very fine grained, at all. We would also need to monitor the size of try and readjust the Zeus setting as it grew. Adding another ssh key to LDAP is vastly easier and comes with no further maintenance, so unless there is a new requirement for cloning try over http, that would be my suggestion.
I am now getting this when pushing to fx-team or inbound. I am unable to push my Firefox patches. My workflow is severely impacted. Upgrading to P1.
$ hg out -r . fx-team --debug --traceback
comparing with fx-team
using https://hg.mozilla.org/integration/fx-team
sending capabilities command
hg.mozilla.org certificate matched fingerprint af:27:b9:34:47:4e:e5:98:01:f6:83:2b:51:c9:aa:d8:df:fb:1a:27
query 1; heads
sending batch command
Traceback (most recent call last):
  File "/Users/gps/lib/python2.7/site-packages/mercurial/dispatch.py", line 134, in _runcatch
    return _dispatch(req)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/dispatch.py", line 806, in _dispatch
    cmdpats, cmdoptions)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/dispatch.py", line 586, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/extensions.py", line 196, in wrap
    return wrapper(origfn, *args, **kwargs)
  File "/Users/gps/lib/python2.7/site-packages/hgext/pager.py", line 138, in pagecmd
    return orig(ui, options, cmd, cmdfunc)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/extensions.py", line 196, in wrap
    return wrapper(origfn, *args, **kwargs)
  File "/Users/gps/lib/python2.7/site-packages/hgext/color.py", line 417, in colorcmd
    return orig(ui_, opts, cmd, cmdfunc)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/dispatch.py", line 897, in _runcommand
    return checkargs()
  File "/Users/gps/lib/python2.7/site-packages/mercurial/dispatch.py", line 868, in checkargs
    return cmdfunc()
  File "/Users/gps/lib/python2.7/site-packages/mercurial/dispatch.py", line 803, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/util.py", line 511, in check
    return func(*args, **kwargs)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/extensions.py", line 151, in wrap
    util.checksignature(origfn), *args, **kwargs)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/util.py", line 511, in check
    return func(*args, **kwargs)
  File "/Users/gps/lib/python2.7/site-packages/hgext/mq.py", line 3383, in mqcommand
    return orig(ui, repo, *args, **kwargs)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/util.py", line 511, in check
    return func(*args, **kwargs)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/commands.py", line 4375, in outgoing
    return hg.outgoing(ui, repo, dest, opts)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/hg.py", line 576, in outgoing
    o = _outgoing(ui, repo, dest, opts)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/hg.py", line 558, in _outgoing
    force=opts.get('force'))
  File "/Users/gps/lib/python2.7/site-packages/mercurial/extensions.py", line 196, in wrap
    return wrapper(origfn, *args, **kwargs)
  File "/Users/gps/src/hgsubversion/hgsubversion/__init__.py", line 113, in findcommonoutgoing
    return orig(*args, **opts)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/discovery.py", line 109, in findcommonoutgoing
    commoninc = findcommonincoming(repo, other, force=force)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/discovery.py", line 46, in findcommonincoming
    abortwhenunrelated=not force)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/setdiscovery.py", line 106, in findcommonheads
    batch.submit()
  File "/Users/gps/lib/python2.7/site-packages/mercurial/wireproto.py", line 76, in submit
    self._submitreq(req, rsp)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/wireproto.py", line 78, in _submitreq
    encresults = self.remote._submitbatch(req)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/wireproto.py", line 160, in _submitbatch
    rsp = self._call("batch", cmds=';'.join(cmds))
  File "/Users/gps/lib/python2.7/site-packages/mercurial/httppeer.py", line 171, in _call
    fp = self._callstream(cmd, **args)
  File "/Users/gps/lib/python2.7/site-packages/mercurial/httppeer.py", line 118, in _callstream
    resp = self.urlopener.open(req)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 413: Request Entity Too Large
abort: HTTP Error 413: Request Entity Too Large
Priority: -- → P1
Summary: hg pull results in HTTP Error 413: Request Entity Too Large (currently try only) → hg pull results in HTTP Error 413: Request Entity Too Large
Summary: hg pull results in HTTP Error 413: Request Entity Too Large → hg push/pull/out/in results in HTTP Error 413: Request Entity Too Large
Actually, push isn't impacted since that goes over ssh. The workaround is to have all server communication go over ssh. But I'm pretty sure IT doesn't want everyone hogging resources on the master server.
Summary: hg push/pull/out/in results in HTTP Error 413: Request Entity Too Large → hg pull/out/in results in HTTP Error 413: Request Entity Too Large
P2 since you can work around this by diverting traffic to SSH. Just my opinion on triage. I won't be offended if others disagree.
Priority: P1 → P2
Yeah, using ssh is only a stop-gap. We'll tweak the zeus setting on Tuesday, since it'll cause increased memory usage by zeus for all hg.m.o traffic. If something hits the fan this weekend and we need to put it in place:
> The tunable max_client_buffer (Default value 64k) can be increased to a value
> larger than the maximum content-length of the failing POST requests to avoid
> this error. Similarly, the traffic manager will reject requests with error 413
> whose HTTP headers are larger than max_client_buffer and increasing its value
> should help to fix this error. This can be done at the following link:
> Stingray Traffic Manager GUI --> Services --> Virtual Servers -->
> <Select concerned Virtual Server> --> Connection Management -->
> Memory Limits --> ' Set max_client_buffer' to appropriate value in bytes -->
> Update
> Please note that increasing this value will increase the memory consumption of
> the traffic manager.
Greg -- pulling from try was never envisioned (see original blog posts). Can you get your data by reading the pushlog for try, and pulling by revision as needed?
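For reference, pulling a single revision over HTTP (the approach suggested above) could be scripted along these lines. This is a minimal sketch: the helper names, the revision, and the local path are placeholders, and it assumes the revision of interest has already been obtained from the pushlog by some other means.

```python
import subprocess

TRY_URL = "https://hg.mozilla.org/try"


def pull_revision_command(revision, repo_path, url=TRY_URL):
    """Build the argument list for pulling one revision into a local repo."""
    return ["hg", "-R", repo_path, "pull", "-r", revision, url]


def pull_revision(revision, repo_path, url=TRY_URL):
    """Run the pull; raises CalledProcessError if hg exits non-zero."""
    subprocess.run(pull_revision_command(revision, repo_path, url), check=True)


# Placeholder revision and path, for illustration only.
example = pull_revision_command("0123456789ab", "/path/to/local/repo")
```

Requesting one revision with -r is what the builds are described as doing over HTTP earlier in this bug.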
While the problem was originally encountered on Try, I'm encountering it on non-Try repos (such as inbound and fx-team) with a local repo that has never pulled from Try. My local repo currently has 744 heads. 643 are non-closed. Try had 20,000+ before the last reset. I could probably calculate exactly how many heads are necessary to trigger this bug given a specified header size limit. (It's around what I'm running, since things were fine yesterday.)
Ok, setting changed on zeus. Greg, let me know if you still run into the error?
I *do not* see the error for inbound, fx-team, central, etc. I *do* see the error for Try. (But try is pathological - its limits are far beyond the normal repos.)
Excellent. I'd doubled the setting, so we should be good for a while. If we run into memory issues on zeus (seems unlikely; it's mostly cache/buffers), we can back it off. Will leave this open until mid next week to keep an eye on it.
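A rough check of why Try might still fail after the change, using numbers reported earlier in this bug. It assumes that "64k" means 65536 bytes and that the setting was at its default before being doubled; neither is confirmed here. The 136606-byte figure counts only X-HgArg header values, so the real request is somewhat larger.

```python
KIB = 1024


def exceeds_buffer(header_bytes, buffer_bytes):
    """True if the request headers would not fit in the buffer."""
    return header_bytes > buffer_bytes


try_pull_header_bytes = 136606        # X-HgArg total reported earlier in this bug
default_buffer = 64 * KIB             # assumes "64k" means 65536 bytes
doubled_buffer = 2 * default_buffer   # assumes the old value was the default

fails_before = exceeds_buffer(try_pull_header_bytes, default_buffer)
fails_after = exceeds_buffer(try_pull_header_bytes, doubled_buffer)
```

Under these assumptions both checks come out true, which would be consistent with Try still returning 413 while the smaller repositories work.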
I take it back. The fix only works over HTTPS. I'm still seeing 413 via HTTP.
Right, two different VS. Derp. Increased the buffer for http, as well. Carry on.
I think we're good here, at least for a while. Re-open if/when it happens again.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Component: WebOps: Source Control → General
Product: Infrastructure & Operations → Developer Services