Bug 767657
Opened 13 years ago
Closed 13 years ago
hg try repo is broken
Categories
(Developer Services :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ericz, Unassigned)
References
Details
(Whiteboard: [TreeClosure][Workaround in comment 21])
https://hg.mozilla.org/try/ won't load and https://tbpl.mozilla.org/?tree=Try&rev=b5a8c59ecf28 loads only headers.
Comment 1•13 years ago
https://hg.mozilla.org/try/rev/b5a8c59ecf28 fails to load. From what I'm seeing, that cset was pushed very recently (cf. http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/saurabhanandiit@gmail.com-b5a8c59ecf28/ and https://bugzilla.mozilla.org/show_bug.cgi?id=763468), and I see it in buildAPI.
Also, https://hg.mozilla.org/try itself is giving me a connection reset.
This prevents viewing try results for this push, while many other try pushes are still visible (oddly) if you load https://tbpl.mozilla.org/?tree=Try alone, rather than a specific cset.
Comment 2•13 years ago
FYI, I closed Try until we know the scope of the problem.
It basically makes it impossible for people to easily check their results anyway.
Whiteboard: [TreeClosure]
Comment 3•13 years ago
FWIW https://tbpl.mozilla.org/?tree=Try with &rev= has been like this since Jun 20 at least
Comment 4•13 years ago
It seems to me that try is just miserably slow. I am able to load some pages within /try, but very slowly.
See bug 676420 (and possibly others), where RelEng asked us to reset try.
Callek confirmed that after several attempts he was also able to load the page.
I didn't see any errors in the web logs. Nobody could provide an hg error message pointing to what the issue is, other than try being super slow.
If anyone else can confirm that this is indeed the problem, we can try resetting try.
CC'ing more ppl here.
Comment 5•13 years ago
For clarity, "several attempts" includes 2 attempts made after :dumitru was able to load the try main summary page (after a ~2 minute wait), during which I was still getting the "connection reset" error.
After that, the summary page loaded for me (albeit after a long wait), while https://hg.mozilla.org/try/rev/b5ab1913ee8f (and any other rev I pull out of a hat) fails after 15-20 seconds with *The connection to the server was reset while the page was loading.*
Leaving try closed for now since we still exceed the timeout for TBPL's AJAX calls to specific revs, which many devs rely on when pushing to try. I am running a |time hg push -f| to try, per :dumitru, as a sanity check; one way or the other, I'll leave my findings here.
Comment 6•13 years ago
Push went through, and pages are now loading for me just fine in <20 seconds. I'm reopening the tree for now but leaving this bug open for IT/others to chime in. Can be duped around if need be at this point.
Justin@ORION /d/sources/mozilla-central
$ time hg push ssh://hg.mozilla.org/try -f
pushing to ssh://hg.mozilla.org/try
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 9 changesets with 40 changes to 27 files (+1 heads)
remote: Tree try is CLOSED! (http://tinderbox.mozilla.org/Try/status.html)
remote: But you included the magic words. Hope you had permission!
remote: Looks like you used try syntax, going ahead with the push.
remote: If you don't get what you expected, check http://trychooser.pub.build.mozilla.org/ for help
with building your trychooser request.
remote: Thanks for helping save resources, you're the best!
remote: Trying to insert into pushlog.
remote: Please do not interrupt...
remote: Inserted into the pushlog db successfully.
real 8m2.266s
user 0m0.015s
sys 0m0.015s
Comment 7•13 years ago
Was there a push to try before this one that left things in a broken state? (just a wild guess here)
Updated•13 years ago
Severity: critical → normal
fyi - I've started a head merge in bug 767715. While the self-recovery shows this isn't the underlying problem, it may reduce some pressure on the try server.
Comment 9•13 years ago
Just as a point of reference:
When things suddenly seemed to work for me, the queries/jobs completed in ~2 minutes.
Right now I get (with manual counting) "The connection to the server was reset while the page was loading." after 36 seconds at https://hg.mozilla.org/try/rev/94bd4a5cef45
I just wanted to point out that this is ongoing. I think *this* is just a matter of the try repo size/heads and not related to LB load, though there could be a Zeus timeout involved that cuts things off sooner.
Hoping Hal's run in 767715 makes a difference.
Updated•13 years ago
Severity: normal → blocker
Comment 10•13 years ago
Why is this made a blocker again?
Comment 11•13 years ago
Sorry, I couldn't load try tbpl, but the problem ended up being in Firefox.
Severity: blocker → normal
I am seeing exactly the same problem trying to load
https://tbpl.mozilla.org/?tree=Try&rev=20e27ef3c670
or even just the patch itself:
https://hg.mozilla.org/try/rev/20e27ef3c670
Running curl eventually worked, but took 2:44 to complete.
When it fails, it reports that the connection was closed without any data being received.
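For anyone who wants to repeat the measurement without a stopwatch, here is a minimal Python sketch of the same timing. The helper names `fetch_body` and `timed_fetch` are made up for illustration, and the 300-second timeout is an arbitrary assumption, not a value taken from the servers.

```python
import time
import urllib.request
from typing import Callable, Tuple


def fetch_body(url: str, timeout: float = 300.0) -> bytes:
    """Download the page body; raises an exception if the connection fails."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def timed_fetch(
    url: str, fetch: Callable[[str], bytes] = fetch_body
) -> Tuple[float, int]:
    """Return (elapsed seconds, body size in bytes) for a single request."""
    start = time.monotonic()
    body = fetch(url)
    return time.monotonic() - start, len(body)


# Example against the rev from this comment (needs network access):
# elapsed, size = timed_fetch("https://hg.mozilla.org/try/rev/20e27ef3c670")
# print(f"{elapsed:.1f}s, {size} bytes")
```

When the connection is dropped before any data arrives, the call raises instead of returning, which makes it easy to tell the "slow" case apart from the "reset" case.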
Comment 15•13 years ago
This is a known problem caused by a bug in Mercurial. The changeset in question likely modifies some old sections of code, and calculating the deltas for that takes longer than the
The index page http://hg.mozilla.org/try takes a long time to load because that offending revision is being parsed as the top commit of the summary. If I bypass our Zeus load balancers, I can see that the page loads correctly, albeit very slowly.
Back in April I attended a Mercurial code sprint, and this is one of the issues that I raised with them. The creator of Mercurial, mpm, looked into the issue and made code modifications to correct this problem. However, the code changes are not staged to land in 2.3 since they did not pass unit tests, and he did not have time to find out why. Please note that the delta generation code for hgweb is different from the hg command line, so clones and updates should go uninterrupted.
tl;dr: It's a Mercurial problem, not our hosting. A fix is in the works, but we're at the mercy of Mercurial as to when this gets done.
Would it be possible to cache the full patches on our servers while the problem in mercurial is worked on?
Comment 17•13 years ago
First paragraph got cut off. The changeset in question likely modifies some old sections of code, and calculating the deltas for that takes longer than the load balancers allow for a timeout.
One method for fixing this in the web interface would be to stuff dummy commits in to push this large calculation out of the latest 10, so that it is no longer rendered on the web page.
[root@boris ~]# time curl -H "Host: hg.mozilla.org" http://hgweb4.dmz.scl3.mozilla.com/try/
<...>
<div class="page_nav">
summary |
<this is where the long wait happens>
Comment 18•13 years ago
Only the web interface should exhibit a problem. Actual cloning and usage of mercurial should be unaffected. I do not think that mercurial has a mechanism for simply 'caching full patches' on a remote. This is typically accomplished by creating a new head.
Additionally, the mercurial problem is not likely to be solved anytime soon, and if it is, it will not be backported to older (existing) mercurial versions.
What is the problem you are trying to solve?
I was thinking of putting an HTTP cache in front of the hg server so that only the first run of
curl -O http://hg.mozilla.org/try/rev/<rev>
takes a long time for any given rev.
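To make the idea concrete, here is a rough Python sketch of the caching behaviour being proposed: the slow render happens once per rev, and repeat requests are served from memory. `RevCache` and `fake_hgweb` are hypothetical names for illustration only; a real deployment would presumably use an off-the-shelf HTTP cache rather than code like this. The sketch assumes the page for a given changeset hash does not change between requests.

```python
from typing import Callable, Dict


class RevCache:
    """Memoize the rendered page for each changeset hash."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                 # slow upstream call (hypothetical)
        self._store: Dict[str, bytes] = {}  # rev hash -> cached response body
        self.upstream_calls = 0             # counts how often upstream was hit

    def get(self, rev: str) -> bytes:
        # Only the first request for a rev pays the upstream cost.
        if rev not in self._store:
            self.upstream_calls += 1
            self._store[rev] = self._fetch(rev)
        return self._store[rev]


def fake_hgweb(rev: str) -> bytes:
    # Stand-in for the slow hgweb render; not the real server.
    return f"patch for {rev}".encode()


cache = RevCache(fake_hgweb)
first = cache.get("20e27ef3c670")
second = cache.get("20e27ef3c670")  # served from the cache
```

Note that the first request for each rev would still have to complete upstream, so the cache would need to tolerate a slow response at least once per rev.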
Comment 21•13 years ago
Have added a link to this bug in the Try tree status message, since this issue came up several times on IRC.
Workaround:
(In reply to Justin Wood (:Callek) from bug 768225 comment #2)
> Just to be explicit (here) a workaround is to load the try tree without the
> &rev=* and then use the arrow at the bottom repeatedly until you find your
> push.
Whiteboard: [TreeClosure] → [TreeClosure][Workaround in comment 21]
Comment 22•13 years ago
This affects developers every day and makes their life harder every day.
Can we please attempt what espindola suggests in comment 16?
Is there anything else that can be attempted?
Have we just been degrading over time? Or would a reset of the try repo help?
(It might make no sense what I am asking for; feel free to disregard)
Heads have been reduced (bug 767715) without significantly improving the problem. That trick used to work, so something else appears to be happening now.
Depends on: 767715
Comment 24•13 years ago
Another trick: load by name, not by rev:
https://tbpl.mozilla.org/?tree=Try&pusher=rjesup@wgate.com
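If you want to generate these links from a script, a small Python sketch follows. The helper name `tbpl_url` is made up; the `tree`, `pusher` and `rev` query parameters are the ones that appear in the URLs in this bug, and nothing else about TBPL's query format is assumed.

```python
from typing import Optional
from urllib.parse import urlencode

TBPL = "https://tbpl.mozilla.org/"


def tbpl_url(tree: str, pusher: Optional[str] = None, rev: Optional[str] = None) -> str:
    """Build a TBPL link for a tree, optionally filtered by pusher or rev."""
    params = {"tree": tree}
    if pusher:
        params["pusher"] = pusher
    if rev:
        params["rev"] = rev
    # Keep "@" unescaped so the link matches the form used in this bug.
    return TBPL + "?" + urlencode(params, safe="@")


by_pusher = tbpl_url("Try", pusher="rjesup@wgate.com")
by_rev = tbpl_url("Try", rev="20e27ef3c670")
```

The by-pusher form is the one that works around this bug; the by-rev form is the one that currently times out.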
Comment 25•13 years ago
Can the offending changeset be stripped from the try repo? While it's true that there are workarounds for this bug, it doesn't make a lot of sense for us to wait until the next time that try gets reset!
Comment 26•13 years ago
(In reply to Ehsan Akhgari [:ehsan] from comment #25)
> Can the offending changeset be stripped from the try repo? While it's true
> that there are workarounds for this bug, it doesn't make a lot of sense for
> us to wait until the next time that try gets reset!
We have decided to reset try tonight, at about 8/9pm PT.
Comment 27•13 years ago
After the reset we might now be seeing bug 768847.
Comment 28•13 years ago
We're golden now IIUC.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•10 years ago
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services