Closed Bug 460054 Opened 17 years ago Closed 17 years ago

hg clone is slow and timing out after 30 minutes

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: sgautherie, Assigned: aravind)

References

Details

(Keywords: regression)

See (current):
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1224082727.1224084557.12741.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1224082743.1224084548.12727.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1224077505.1224078780.31550.gz

This issue has happened before. The previous diagnosis was that cloning takes longer than the allowed timeout: the clone starts, fetches part of the repository, breaks, restarts, fetches the next part, breaks again, and so on, until the repository is fully cloned some builds later. A simple workaround might be to enable verbose mode for this Hg command, but that would bloat the log :-|
I'd rather ask IT why it now takes more than 30 minutes to pull a fresh clone from within our own colo, even to an xserve with kick-ass hardware.
Assignee: nobody → server-ops
Component: Release Engineering → Server Operations
QA Contact: release → mrz
See bug 450648 for where the backend hg.m.o changes happened.
How about - the repo is huge? (540 MB). Requesting every single changeset for every build doesn't seem like a reasonable proposition to me (but then, you probably have your reasons). I moved the web front end to a couple of VMs. This is also a possible reason for this taking so long. Did this start happening since last night? (I moved them - http(s) - yesterday afternoon). I thought we created a ffxbld user that has read/write access to the repos? The ssh part still talks directly to the hg servers. You may have better luck talking ssh.
I think there's a more serious problem here. A clone of http://hg.mozilla.org/build/buildbot-configs timed out after 20 minutes. That repository only has 420 changesets in it and should take 10 seconds or so to clone. From the log:

hg clone http://hg.mozilla.org/build/buildbot-configs configs
requesting all changes
adding changesets
adding manifests
adding file changes
command timed out: 1200 seconds without output, killing pid 23030
process killed by signal 9

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1224091628.1224092866.2652.gz&fulltext=1
(In reply to comment #3)
> How about - the repo is huge? (540 MB). Requesting every single changeset for
> every build doesn't seem like a reasonable proposition to me (but then, you
> probably have your reasons). I moved the web front end to a couple of VMs.
> This is also a possible reason for this taking so long. Did this start
> happening since last night? (I moved them - http(s) - yesterday afternoon).
>
> I thought we created a ffxbld user that has read/write access to the repos?
> The ssh part still talks directly to the hg servers. You may have better luck
> talking ssh.

We don't re-clone for every build - but our nightlies _do_, because they are full clobber builds. Regular dep builds just do 'hg pull && hg up'.
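For illustration, the split between a clobber build (full clone) and a dep build ('hg pull && hg up') could be expressed as one helper. This is only a minimal sketch, assuming a POSIX shell; `sync_repo` is a hypothetical name and is not taken from the actual buildbot configuration.

```shell
# sync_repo URL DEST
# Hypothetical helper: full clone when there is no checkout yet,
# otherwise an incremental pull and update.
sync_repo() {
    url=$1
    dest=$2
    if [ -d "$dest/.hg" ]; then
        # Existing checkout (dep build): fetch only new changesets, then update.
        (cd "$dest" && hg pull && hg up)
    else
        # No checkout, e.g. after a clobber (nightly build): full clone.
        hg clone "$url" "$dest"
    fi
}
```

Usage would be something like `sync_repo http://hg.mozilla.org/mozilla-central mozilla-central`. Only the clone branch transfers the whole history, which is why only the clobber builds run into the timeout.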
(In reply to comment #5)
> We don't re-clone for every build - but our nightlies _do_, because they are
> full clobber builds. Regular dep builds just do 'hg pull && hg up'

Oops, sorry then. Wrong assumption on my part. I am looking into it. In general, however, you should expect those full clones to take longer and longer as more changesets accumulate.
Assignee: server-ops → aravind
(In reply to comment #6)
> oops, sorry then. Wrong assumption on my part. I am looking into it. In
> general you should however expect those full clones to start taking longer and
> longer as more and more changesets accumulate.

Absolutely, we just noticed that some of our nightly builds started failing today and it seemed related to the changes.
[aravind@boris tmp]$ time hg clone http://hg.mozilla.org/mozilla-central
destination directory: mozilla-central
requesting all changes
adding changesets
adding manifests
adding file changes
added 20508 changesets with 104681 changes to 40379 files (+2 heads)
updating working directory
34992 files updated, 0 files merged, 0 files removed, 0 files unresolved

real 8m12.897s
user 2m24.785s
sys 0m19.156s
[aravind@boris tmp]$

~/work/code/tmp $ rm -rf configs; time hg clone http://hg.mozilla.org/build/buildbot-configs configs
requesting all changes
adding changesets
adding manifests
adding file changes
added 421 changesets with 623 changes to 104 files
updating working directory
90 files updated, 0 files merged, 0 files removed, 0 files unresolved

real 0m1.300s
user 0m0.490s
sys 0m0.250s
~/work/code/tmp $

I have however been able to hang that clone once and had to kill it.
I bumped up the CPUs in the VMs serving hgweb. Let's see if the problem continues.
Just got three full clones from mozilla-central in 4, 6, and 11 minutes. Looks great to me.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Summary: Must increased the timeout for 'Hg clone' command (on Firefox boxes at least) → hg clone is slow (timing out after 30 minutes on nightly builds)
(In reply to comment #3)
> I moved the web front end to a couple of VMs.
> This is also a possible reason for this taking so long. Did this start
> happening since last night? (I moved them - http(s) - yesterday afternoon).

Ftr, I think the timeout case was noticed (on Firefox MacOSX !?) a little before this. But what I'm rather sure of is that I noticed "HTTP 500" errors [which I didn't file as a bug] on boxes only very very recently.
Same issue again today on the Linux and Linux 64-bit nightlies. I am also getting timeouts trying to do an hg clone on my own Linux system.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I guess it is not just a Linux client issue either. I also just had a timeout on my Windows build systems using the official Mozilla build version of hg.
To be clear, one of the nightlies timed out after 30 minutes:

requesting all changes
adding changesets
command timed out: 1800 seconds without output, killing pid 7196

and one of them got its connection reset:

requesting all changes
adding changesets
adding manifests
transaction abort!
rollback completed
abort: Connection reset by peer
(In reply to comment #14)
> transaction abort!
> rollback completed
> abort: Connection reset by peer

Afaict, this kind of message is really new, maybe even newer than the HTTP 500 (which I haven't seen lately); but I think it already happened before comment 9.
moz2-darwin8-slave01, started at 2008/10/16 08:02:02

/tools/python/bin/hg clone --rev 1b853ea8e4180c20b1ddb779fd38c22eb98060eb http://hg.mozilla.org/mozilla-central /builds/slave/trunk_darwin-1/build
requesting all changes
adding changesets
adding manifests
adding file changes
command timed out: 1200 seconds without output, killing pid 42116

Same again at 2008/10/16 08:23:20. On the current run, it got to the "adding file changes" stage in less than 10 minutes; let's see if it succeeds this time ...
... no. I restarted the buildbot slave after this and it sorted itself out, but the same error occurred in the 2008/10/16 11:32:46 run (moz2-darwin8-slave01).
I ended up putting the following kludge in my build script in order to be able to get the source:

while [ ! -d mozilla2 ]
do
    hg clone http://hg.mozilla.org/mozilla-central/ mozilla2
done
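The kludge above retries with no upper bound, so a server that stays down keeps the build script looping. A bounded variant might look like the following. This is a minimal sketch, assuming a POSIX shell; `retry` and `clone_once` are hypothetical helper names, and the attempt count and delay in the usage line are arbitrary.

```shell
# retry MAX DELAY CMD [ARGS...]
# Hypothetical helper: run CMD up to MAX times, sleeping DELAY seconds
# between failed attempts. Returns 0 on the first success, 1 otherwise.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# clone_once URL DEST
# Hypothetical helper: remove any leftover destination first, so that a
# partial checkout from an earlier attempt cannot be mistaken for success.
clone_once() {
    rm -rf "$2"
    hg clone "$1" "$2"
}
```

It could then be called as `retry 5 60 clone_once http://hg.mozilla.org/mozilla-central/ mozilla2`. Like the original loop, this only works around the symptom; it does nothing about the server-side slowness.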
Raising priority, this is impacting developers.
Severity: major → critical
I reverted the changes in 450648. So stuff should be back to normal.
Requests should now be going to a real server in the back end.
Status: REOPENED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Blocks: 450648
Keywords: regression
We had some failures this morning :(. Two at 2am, one at 3:15.

requesting all changes
adding changesets
adding manifests
transaction abort!
rollback completed
abort: Connection reset by peer
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
See also my comment on bug 460062 comment 6.
I've got a builder that just timed out cloning buildbot-configs, and a clone running on my own machine is currently stuck as well (it has been just sitting there for 3 minutes so far):

$> hg clone http://hg.mozilla.org/build/buildbot-configs
destination directory: buildbot-configs
requesting all changes
adding changesets
adding manifests
adding file changes
^Z
[1]+ Stopped hg clone http://hg.mozilla.org/build/buildbot-configs
[gozer@huigui tmp]$ bg
[gozer@huigui tmp]$ strace -p %1
Process 24506 attached - interrupt to quit
recvfrom(3,
And FYI, this clone operation is connected to : TCP [xxx]:58937->dm-hg02.mozilla.org:http (ESTABLISHED)
(In reply to comment #20)
> I reverted the changes in 450648. So stuff should be back to normal.

We're still seeing problems, even after the revert. Did anything else change?

(Tweaking summary, as this is hitting other builders, in addition to nightly builders.)
Summary: hg clone is slow (timing out after 30 minutes on nightly builds) → hg clone is slow and timing out after 30 minutes
Okay, let's give this one more shot, guys. If this doesn't work, then to solve bug 450648 I will have to use a different hostname for the frontend hgweb. I have been trying to avoid that by trying TCP forwarding, etc., and now I am down to reverse-proxying. Consider this the final attempt; after that I will just re-open the other bug and propose the frontend hostname change.
I am going to mark this resolved for now. Please re-open if needed.
Status: REOPENED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
(In reply to comment #11)
> But what I'm rather sure of is that I noticed "HTTP 500" errors [which I didn't
> file as a bug] on boxes only very very recently.

I filed bug 461873.

*** V.Fixed, otherwise.
Status: RESOLVED → VERIFIED
Product: mozilla.org → mozilla.org Graveyard

Hello, other people and I are having this problem, and it's preventing us from downloading the project and contributing.

I've tried many solutions without success. It always gets stuck at the same point and then it throws the same "abort: Connection reset by peer" error.

$> hg clone https://hg.mozilla.org/mozilla-central/ firefox
applying clone bundle from https://hg.cdn.mozilla.net/mozilla-central/5727cf4eeb374896c4c055a6b3b0b89bc33f2432.zstd-max.hg
adding changesets
adding manifests
adding file changes
transaction abort!
rollback completed
abort: Connection reset by peer

Hello! I just managed to get past the "abort: Connection reset by peer" error and download the full repo. After installing Mercurial, download the repo using this command:

hg clone --uncompressed https://hg.mozilla.org/mozilla-central/

Make sure you have 40 GB of free space before doing it! After that, just follow the steps described in the guide: https://docs.firefox-dev.tools/getting-started/build.html

Good luck

@Helena thanks! Your comment saved me some aggravation. It looks like this is an issue again. I looked this up after failing 3 times... 40 gigs? No wonder. I will look for a way to just get a smaller subset of changesets. :)
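One possible way to fetch the history in smaller pieces is to clone up to an early revision and then pull forward in steps, so that an interrupted transfer only loses the current step. The following is a minimal sketch, assuming a POSIX shell and that the server resolves the revision identifiers it is given. `incremental_clone` is a hypothetical helper, and the revision numbers in the usage line are placeholders, not known-good values for mozilla-central.

```shell
# incremental_clone URL DEST FIRST_REV [REV...]
# Hypothetical helper: clone history up to FIRST_REV without checking out
# files, pull up to each later REV in turn, then pull whatever remains and
# update the working directory.
incremental_clone() {
    url=$1
    dest=$2
    shift 2
    first=$1
    shift
    hg clone --noupdate --rev "$first" "$url" "$dest" || return 1
    for rev in "$@"; do
        hg --repository "$dest" pull --rev "$rev" "$url" || return 1
    done
    # Final pull without --rev fetches everything that is still missing.
    hg --repository "$dest" pull "$url" || return 1
    hg --repository "$dest" update
}
```

Usage would be something like `incremental_clone https://hg.mozilla.org/mozilla-central/ firefox 50000 100000 150000`. Note that this still downloads the complete history in the end; it only splits the transfer into smaller requests and does not reduce the disk space needed.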
