Closed Bug 762954 Opened 13 years ago Closed 13 years ago

Slowness of data transfer between scl1 slaves and scl3 stage

Categories

(Infrastructure & Operations Graveyard :: NetOps, task)

x86
All
task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: afernandez)

Details

This started happening around 30 to 60 minutes ago. Is there any way to check what is happening? This will slow developers down significantly today.
'wget' '--progress=dot:mega' '-N' 'http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-debug/1339091180/firefox-16.0a1.en-US.win32.zip'
...
--2012-06-08 09:20:47-- http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-debug/1339091180/firefox-16.0a1.en-US.win32.zip
Resolving ftp.mozilla.org... 63.245.215.46
Connecting to ftp.mozilla.org|63.245.215.46|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28299313 (27M) [application/zip]
Saving to: `firefox-16.0a1.en-US.win32.zip'
0K ........ ........ ........ ........ ........ ........ 11% 25.2K 16m16s
3072K ........ ........ ........ ........ ........ ........ 22% 18.0K 17m5s
6144K ........ ........ ........ ........ ........ ........ 33% 26.3K 13m40s
9216K ........ ........ ........ ..
0K ........ ........ ........ ........ ........ ........ 4% 31.7K 32m56s
3072K ........ ........ ........ ........ ........ ........ 9% 28.8K 32m53s
6144K ........ ........ ........ ........ ........ ........ 14% 31.7K 30m40s
9216K ........ ........ ........ ........ ........ ........ 18% 27.8K 29m45s
12288K ........ ........ ........ ........ ........ ........ 23% 35.8K 27m6s
15360K ........ ........ ........ ........ ........ ........ 28% 33.1K 25m11s
18432K ........ .....
Assignee: server-ops → network-operations
Component: Server Operations → Server Operations: Netops
QA Contact: phong → ravi
This is possibly an existing issue with using HTTP for ftp.mozilla.org (I recall some talk yesterday about this). If possible, use ftp://ftp.mozilla.org for the time being. Looking into the http:// issue.
Assignee: network-operations → afernandez
We should not change this for production releng systems, since doing so could surface new issues. http:// is the SOP. We can revisit this later, but for now let's see what has changed and how to fix it. Thanks for looking into this.
I was caught up in other things, but the issue seems to have resolved itself. I will keep an eye on it, but please contact me or whoever is on call on IRC if it happens again. I will close the bug in ~8 hrs or so if the issue does not repeat.
FTR this is still happening. Not sure if it is widespread or just happening with some slaves and not others. For reference, I have seen some jobs take 20 minutes to download a file instead of the usual 5 to 10 seconds.
'wget' '--progress=dot:mega' '-N' 'http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-debug/1339091180/firefox-16.0a1.en-US.win32.tests.zip'
in dir c:\talos-slave\test\build (timeout 1200 secs)
watching logfiles {}
argv: ['wget', '--progress=dot:mega', '-N', 'http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-debug/1339091180/firefox-16.0a1.en-US.win32.tests.zip']
environment:
ALLUSERSPROFILE=C:\ProgramData
APPDATA=C:\Users\cltbld\AppData\Roaming
COMMONPROGRAMFILES=C:\Program Files\Common Files
COMPUTERNAME=TALOS-R3-W7-036
COMSPEC=C:\Windows\system32\cmd.exe
FP_NO_HOST_CHECK=NO
HOMEDRIVE=C:
HOMEPATH=\Users\cltbld
LOCALAPPDATA=C:\Users\cltbld\AppData\Local
LOGONSERVER=\\TALOS-R3-W7-036
MOZ_CRASHREPORTER_NO_REPORT=1
MOZ_NO_REMOTE=1
NO_EM_RESTART=1
NUMBER_OF_PROCESSORS=2
OS=Windows_NT
PATH=C:\mozilla-build;C:\mozilla-build\msys\bin;C:\mozilla-build\msys\local\bin;C:\mozilla-build\buildbotve\scripts;C:\mozilla-build\Python25;C:\mozilla-build\Python25\Scripts;C:\mozilla-build\hg;C:\mozilla-build\7zip;C:\mozilla-build\upx203w;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32;C:\WINDOWS;
PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
PROCESSOR_ARCHITECTURE=x86
PROCESSOR_IDENTIFIER=x86 Family 6 Model 23 Stepping 10, GenuineIntel
PROCESSOR_LEVEL=6
PROCESSOR_REVISION=170a
PROGRAMDATA=C:\ProgramData
PROGRAMFILES=C:\Program Files
PROMPT=$P$G
PSMODULEPATH=C:\Windows\system32\WindowsPowerShell\v1.0\Modules\
PUBLIC=C:\Users\Public
PWD=c:\talos-slave\test\build
SYSTEMDRIVE=C:
SYSTEMROOT=C:\Windows
TEMP=C:\Users\cltbld\AppData\Local\Temp
TMP=C:\Users\cltbld\AppData\Local\Temp
USERDOMAIN=TALOS-R3-W7-036
USERNAME=cltbld
USERPROFILE=C:\Users\cltbld
WINDIR=C:\Windows
XPCOM_DEBUG_BREAK=warn
using PTY: False
--2012-06-08 10:03:50-- http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-debug/1339091180/firefox-16.0a1.en-US.win32.tests.zip
Resolving ftp.mozilla.org... 63.245.215.46
Connecting to ftp.mozilla.org|63.245.215.46|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61320279 (58M) [application/zip]
Saving to: `firefox-16.0a1.en-US.win32.tests.zip'
0K ........ ........ ........ ........ ........ ........ 5% 16.1K 58m59s
3072K ........ ........ ........ ........ ........ ........ 10% 23.1K 47m17s
6144K ........ ........ ........ ........ ........ ........ 15% 29.3K 39m20s
9216K ........ ........ ........ ........ ........ ........ 20% 19.9K 37m41s
12288K ........ ........ ........ ........ ........ ........ 25% 36.8K 32m14s
15360K ........ ........ ........ ........ ........ ........ 30% 34.2K 28m22s
18432K ........ ........ ........ ........ ........ ........ 35% 25.7K 26m5s
21504K ........ ........ 37% 8.41K=16m41s
2012-06-08 10:20:32 (22.5 KB/s) - Connection closed at byte 23087666. Retrying.
--2012-06-08 10:20:32-- (try: 2) http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-debug/1339091180/firefox-16.0a1.en-US.win32.tests.zip
Connecting to ftp.mozilla.org|63.245.215.46|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 61320279 (58M), 38232613 (36M) remaining [application/zip]
Saving to: `firefox-16.0a1.en-US.win32.tests.zip'
[ skipping 21504K ]
21504K ,,,,,,,, ,,,,,,,, ........ ........ ........ ........ 41% 42.0K 21m12s
24576K ........ ........ ........ ........ ........ ........ 46% 25.8K 20m16s
27648K ........ ........ ........ ........ ........ ........ 51% 19.1K 21m1s
30720K ........ ........ ........ ........ ........ ........ 56% 18.7K 20m2s
33792K ........ ........ ........ ........ ........ ........ 61% 17.9K 18m28s
36864K ........ ........ ........ ........ ........ ........ 66% 21.4K 15m55s
39936K ........ ........ ........ ........ ........ ........ 71% 22.3K 13m20s
43008K ........ ........ ........ ....
I got a job that took 51 mins just to download the test package. FTR this is intermittent.
From what I see, zlb5 is still pushing 1gbps of outbound traffic, which is saturating the lb. This has been happening all week. What can we do about this? If we can't move to a 10g node, can we at least enforce bandwidth sharing on the zlb?
jd moved the other ftp vip to a 10g zeus node and now we are doing ~1.5-2gbps of traffic.
[cransom@admin1 ~]$ wget -O /dev/null http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-debug/1339091180/firefox-16.0a1.en-US.win32.tests.zip
--2012-06-08 11:30:50-- http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-debug/1339091180/firefox-16.0a1.en-US.win32.tests.zip
Resolving ftp.mozilla.org... 63.245.215.56
Connecting to ftp.mozilla.org|63.245.215.56|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61320279 (58M) [application/zip]
Saving to: `/dev/null'
100%[===============================================================================>] 61,320,279 19.0M/s in 3.1s
2012-06-08 11:30:53 (19.0 MB/s) - `/dev/null' saved [61320279/61320279]
Let me know if this ends up regressing.
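As a rough cross-check of the wget figures quoted in this bug, the sketch below estimates how long the 61320279-byte tests package takes at the average rate wget reported before the fix (22.5 KB/s) and at the rate measured after the move to the 10g node (19.0 MB/s). It assumes wget's K and M are 1024-based and that the rate is constant over the whole transfer. The helper name `transfer_seconds` is made up for illustration.

```python
KIB = 1024
MIB = 1024 * 1024

# Size taken from the "Length:" line in the wget logs in this bug.
TESTS_ZIP_BYTES = 61320279


def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Return the time needed to move size_bytes at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s


# Assumption: wget's "KB/s" and "MB/s" are multiples of 1024 bytes.
slow = transfer_seconds(TESTS_ZIP_BYTES, 22.5 * KIB)   # before the fix
fast = transfer_seconds(TESTS_ZIP_BYTES, 19.0 * MIB)   # after the fix

print(f"before: {slow / 60:.1f} min")  # before: 44.4 min
print(f"after:  {fast:.1f} s")         # after:  3.1 s
```

The estimate of roughly 44 minutes is in the same range as the 51-minute job mentioned earlier (which may also have included retries), and the 3.1 s figure matches what wget printed after the move.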
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Thanks a lot for the quick response! Is there any way that we can track things like this? Or get Nagios alerts? Or anything that releng could do differently? Maybe it is not possible!
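One possible shape for such an alert, offered only as a sketch and not as anything that was deployed for this bug: a small throughput probe that downloads part of a file and maps the measured rate to the conventional Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). The function names and the thresholds (warn below 1 MB/s, critical below 100 KB/s) are assumptions chosen for illustration.

```python
import time
import urllib.request

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Hypothetical thresholds in bytes/second; tune for the real environment.
WARN_RATE = 1024 * 1024   # warn below 1 MB/s
CRIT_RATE = 100 * 1024    # critical below 100 KB/s


def measure_throughput(url, max_bytes=5 * 1024 * 1024, timeout=30):
    """Download up to max_bytes from url and return the rate in bytes/second."""
    start = time.monotonic()
    received = 0
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        while received < max_bytes:
            chunk = resp.read(min(64 * 1024, max_bytes - received))
            if not chunk:
                break  # server closed the connection or file ended
            received += len(chunk)
    elapsed = time.monotonic() - start
    return received / elapsed if elapsed > 0 else float("inf")


def classify(rate, warn_rate=WARN_RATE, crit_rate=CRIT_RATE):
    """Map a rate in bytes/second to a (exit_code, status_line) pair."""
    kib_per_s = rate / 1024
    if rate < crit_rate:
        return CRITICAL, f"CRITICAL - {kib_per_s:.1f} KB/s"
    if rate < warn_rate:
        return WARNING, f"WARNING - {kib_per_s:.1f} KB/s"
    return OK, f"OK - {kib_per_s:.1f} KB/s"
```

A wrapper script could call `measure_throughput()` against a known file on ftp.mozilla.org from a host in scl1, print the status line from `classify()`, and exit with the returned code. With these thresholds, the 22.5 KB/s seen in the logs here would be reported as CRITICAL and the 19.0 MB/s seen after the fix as OK.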
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard