Closed
Bug 610399
Opened 14 years ago
Closed 13 years ago
Occasional disconnects from stage.m.o ("ssh_exchange_identification: Connection closed by remote host")
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: cshields)
References
Details
A few of our buildbot masters hit an error trying to scp files to stage.m.o at 05:29 this morning. The error message is "ssh_exchange_identification: Connection closed by remote host".
buildbot-master2 hit it at 2010-11-08 05:29
talos-master02 hit it at 2010-11-08 05:29
buildbot-master1 hit it at 2010-11-08 05:29
Updated•14 years ago
See Also: → 589542
Summary: Occasional disconnects from stage.m.o → Occasional disconnects from stage.m.o ("ssh_exchange_identification: Connection closed by remote host")
Comment 1•14 years ago
These are all VMs @ 650 Castro. That ESX host can be overloaded at times.
Comment 2•14 years ago
Could this be related to what dmoore said in bug 589542 about simultaneous connection attempts? See comment 12.
Updated•14 years ago
Assignee: server-ops → network-operations
Component: Server Operations → Server Operations: Netops
Comment 3•14 years ago
Not sure why this is assigned to netops. Did you look at the ssh log files on the host in question?
Comment 4•14 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1293000709.1293005632.4282.gz
Linux x86-64 mozilla-central leak test build on 2010/12/21 22:51:49
s: moz2-linux64-slave03
firefox-4.0b9pre.en-US.linux-x86_64.crashrepo 0% 0 0.0KB/s --:-- ETA
firefox-4.0b9pre.en-US.linux-x86_64.crashrepo 100% 24MB 24.3MB/s 00:01
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
Command ['ssh', '-o', 'IdentityFile=~/.ssh/ffxbld_dsa', 'ffxbld@stage.mozilla.org', 'rm -rf /tmp/tmp.leldE14175/'] returned non-zero exit code: 255
make[1]: *** [upload] Error 2
make[1]: Leaving directory `/builds/slave/cen-lnx64-dbg/build/obj-firefox/browser/installer'
make: *** [upload] Error 2
program finished with exit code 2
elapsedTime=9.418832
=== Output ended ===
Updated•14 years ago
Assignee: network-operations → server-ops
Component: Server Operations: Netops → Server Operations
Comment 5•14 years ago
Is this problem still occurring? I did a quick check of the logs and don't see any anomalies for today. If you can get me a specific time of occurrence that would help too.
Comment 6•14 years ago
I had this problem when I was doing a staging release last week:
> bash -c ssh -l stage-ffxbld -i ~cltbld/.ssh/ffxbld_dsa hg.mozilla.org clone he l10n-central/he
> ssh_exchange_identification: Connection closed by remote host
I hit it 47 times in just one second:
from: Thu Jan 6 13:32:02 2011
to: Thu Jan 6 13:32:03 2011

I am surprised that Axel was not already CCed on this bug. The L10n repacks in general are the jobs most likely to hit this problem, since within a 10-30 minute window we have 80 repacks being uploaded separately *per platform*. That makes L10n repacks more likely to hit it, but any job can, so it is a bit of a Russian roulette.

Corey, I still believe that this is related to ssh refusing connections as mentioned in comment 2.

I believe we could reproduce this by triggering "repo_setup" on staging in a loop, deleting and recreating repos. The problem happened to me after I triggered it a 3rd time in less than 20 minutes.
job#0 - 12:49 - 74 repos deleted from users/stage-ffxbld
job#1 - 13:09 - 74 repos deleted from users/stage-ffxbld
job#2 - 13:11 - 74 repos deleted from users/stage-ffxbld
      - 13:12 - wait 10 minutes for hg before recreating repos
      - 13:22 - 27 recreated repos in users/stage-ffxbld
      - 13:32 - 47 *FAILED* *HERE* to recreate repos in users/stage-ffxbld
job#3 - 13:57 - 74 repos deleted from users/stage-ffxbld
      - 13:59 - wait 10 minutes for hg before recreating repos
      - 14:09 - 74 recreated repos in users/stage-ffxbld

I hope this info helps.
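As a client-side illustration of the burst described above, here is a minimal sketch of how a job could retry an ssh command with jittered backoff so that parallel jobs do not all reconnect in the same second. This is not what the build automation actually does. The helper name run_with_retry and its parameters are hypothetical. The only fact relied on is that ssh exits with status 255 when ssh itself fails (a remote command could also return 255, which this sketch would retry as well).

```python
import random
import subprocess
import time

# ssh exits with 255 when the connection itself fails; otherwise it returns
# the exit status of the remote command.
SSH_CONNECTION_ERROR = 255


def run_with_retry(cmd, attempts=5, base_delay=1.0,
                   run=subprocess.call, sleep=time.sleep):
    """Run cmd; retry with jittered exponential backoff while it exits 255.

    Hypothetical helper: `run` and `sleep` are injectable so the logic can be
    exercised without a real ssh server.
    """
    code = SSH_CONNECTION_ERROR
    for attempt in range(attempts):
        code = run(cmd)
        if code != SSH_CONNECTION_ERROR:
            return code
        if attempt < attempts - 1:
            delay = base_delay * (2 ** attempt)
            # Random jitter spreads out clients that failed at the same time.
            sleep(delay + random.uniform(0, delay))
    return code
```

A call such as run_with_retry(["ssh", "ffxbld@stage.mozilla.org", "true"]) would try up to five times before giving up with the last exit status.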
Comment 7•14 years ago
l10n nightlies hit this multiple times pretty much every day. Details on failures, and estimates on parallel uploads at least from moco's releng side would be in builddb.
Comment 8•14 years ago
I've increased MaxStartups from the default (10) to 50 and reloaded sshd on stage. The postponed key messages in the logs went away immediately, so I think this is a good sign. Please verify that this is working for you tonight, Axel.
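To illustrate why raising the limit helps, here is a rough model of how sshd decides to refuse a new connection based on the number of unauthenticated connections already open. It is a sketch under assumptions, not sshd's code: the function name drop_percent is hypothetical, and the begin/rate/full parameters follow the "start:rate:full" form of MaxStartups described in sshd_config(5). A single value such as 10 or 50 is modelled here as begin == full, meaning every connection at or above the limit is refused.

```python
def drop_percent(startups, begin, rate, full):
    """Approximate chance (0-100) that sshd refuses a new connection.

    startups: unauthenticated connections currently open
    begin:    count at which random refusal starts
    rate:     refusal percentage at `begin`
    full:     count at which every new connection is refused
    """
    if startups < begin:
        return 0
    if startups >= full or rate == 100:
        return 100
    # Scale linearly from `rate` at `begin` up to 100 at `full`.
    extra = (100 - rate) * (startups - begin) // (full - begin)
    return rate + extra


# A burst of 47 pending connections against a single-value limit:
print(drop_percent(47, begin=10, rate=100, full=10))  # limit of 10
print(drop_percent(47, begin=50, rate=100, full=50))  # limit of 50
```

Under this model a burst of 47 pending connections is refused with a limit of 10 but accepted with a limit of 50, which is consistent with the failures in comment 6 stopping after the change.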
Comment 9•13 years ago
No complaints, so I'm closing this out. Please feel free to reopen if the problem comes back.
Assignee: server-ops → cshields
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard