Closed Bug 1146983 Opened 10 years ago Closed 10 years ago

ftp.mozilla.org does not work for me very well

Categories

(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task)

Hardware: x86
OS: macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: sydpolk, Unassigned)

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/815] )

I have a home Jenkins instance that I use from my home office in Austin. It pings https://ftp.mozilla.org every few minutes to check for Firefox binaries; it uses mozdownload, so it only downloads a binary every 24 hours or so. However, in the last week, I can't get into ftp.mozilla.org reliably at all. Some logs are at the following pastebins: https://pastebin.mozilla.org/8826922 https://pastebin.mozilla.org/8826923 I just happened to change ISPs yesterday, but this was happening before that. Have I been blacklisted? Do I need to tell my Jenkins jobs to try every 2 hours instead of 1? 6 hours? 24 hours? Or is something else wrong?
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/815]
Those were actually upstream issues with the FTP Zeus cluster, unrelated to your service. Have you continued to receive these errors since, say, Friday?
Flags: needinfo?(spolk)
Friday is the last time that I was having good success rates.
Flags: needinfo?(spolk)
To clarify: Friday, you had good success rates; did it then worsen Saturday through today?
The last successful download for some of my projects was Friday morning about 6:00 AM. I have had very few successful downloads since.
Syd, when did you switch ISPs? Also, what is your IP address? We haven't made any changes to the ftp cluster since then, IIRC.
I switched ISPs yesterday, 3/23/15, which is Monday. This started before then. My current external IP address is 24.55.31.100. The IP address I was using before Monday was 99.116.238.118.
I think this is originating from the service protection class:

> A 503 Server Too Busy response will be sent if the connection fails any of the max_1_connections, max_10_connections, or max_connections_rate tests.

Neither IP address is whitelisted nor blacklisted.
So, the recommendation would be to slow down my attempts to download. I will try that and see if things are better tomorrow.
Verified that this is the case. From the error log:

[23/Mar/2015:12:28:31 -0700] DOS protection/protect-ftp triggersummary Too high a rate of connections from 24.55.31.100, dropped - 3 time(s)
(In reply to Syd Polk :sydpolk from comment #8)
> So, the recommendation would be to slow down my attempts to download. I will
> try that and see if things are better tomorrow.

The limit you're triggering is measured in tens of connections *per minute*, and looking at the load balancer logs, your server is indeed opening a new connection for *each request* when a new binary shows up. Any of the following should help here:

- combine all of those fetches into a single wget, so that it reuses the connection;
- code your app to respond to a 503 by backing off for 60+random(0..30) seconds and then retrying; or
- run a lightweight HTTP proxy, so that the proxy can reuse connections without you having to alter the app behavior.

We shouldn't bypass this limit, however, so I'd like to avoid that if possible.
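The back-off-and-retry suggestion above can be sketched as follows. This is a minimal illustration, not code from mozdownload; `fetch_with_backoff` and its `base_delay`/`jitter` parameters are hypothetical names (the parameters exist only so the 60+random(0..30) delay can be tuned):

```python
import random
import time

def fetch_with_backoff(fetch, max_retries=5, base_delay=60, jitter=30):
    """Retry a request whenever the load balancer answers 503 Server Too Busy.

    `fetch` is any zero-argument callable returning an object with a
    .status_code attribute. On 503 we sleep base_delay + random(0..jitter)
    seconds (60 + random(0..30) by default, per the suggestion above) and retry.
    """
    for attempt in range(max_retries):
        response = fetch()
        if response.status_code != 503:
            return response
        time.sleep(base_delay + random.uniform(0, jitter))
    raise RuntimeError("still rate-limited after %d attempts" % max_retries)
```

With roughly 20 hourly Jenkins jobs, the jitter also spreads the retries out so they don't all hit the connection-rate test in the same minute.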
The code that does this is mozdownload. mozdownload loads the directory it is interested in, and then potentially inspects every subdirectory to see if there is a build. Adding retries is a good way to mitigate the problem here. Calling mozdownload much less frequently should also help.
https://github.com/mozilla/mozdownload/blob/master/mozdownload/scraper.py#L309

Python's latest docs indicate that if you switch to urllib3 as the transport adapter, it will automatically reuse HTTP connections *as long as*:

> "Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set stream to False or read the content property of the Response object."

Since this code is using stream=True but is *missing* Response.close, it's entirely possible that mozdownload is leaking HTTP connections by leaving them open until keepalive times out on the server side, forcing the connection to be cleaned up.
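The stream=True point above can be illustrated with a short sketch. This assumes the `requests` library (whose transport pools connections via urllib3); `download` is a hypothetical helper, not mozdownload's actual code. The key detail is the `response.close()` in the `finally` block, which releases the streamed connection back to the session's pool instead of leaving it checked out until server-side keepalive expires:

```python
import requests

# One Session -> one urllib3 connection pool, so connections can be reused
# across requests instead of a new socket being opened each time.
session = requests.Session()

def download(url, dest):
    """Stream a file to disk, then release the socket back to the pool."""
    response = session.get(url, stream=True)
    try:
        response.raise_for_status()
        with open(dest, "wb") as fh:
            for chunk in response.iter_content(chunk_size=64 * 1024):
                fh.write(chunk)
    finally:
        # Without this close() (or fully reading .content), a stream=True
        # response keeps its connection checked out, so the next request
        # opens a brand-new one -- the leak described above.
        response.close()
```

Reading the body to completion also returns the connection, but an explicit close() is the safe pattern when a download might be abandoned partway through.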
The server is working the way it should. I should not be polling so often, and there may be a bug in mozdownload. Pasting in relevant bits of the IRC conversation for posterity:

sydpolk 11:09 well, I am hoping anybody can help me
11:11 sydpolk: what's your question?
sydpolk 11:11 I cannot get to ftp.mozilla.org
11:11 and haven't been able to for days
RyanVM|sheriffduty 11:11 weird
sydpolk 11:12 I have a job here at home that tries to mozdownload one/hour
11:12 and that has succeeded once in four days
11:12 just now, I tried curl http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-beta-linux64/ 10 times in a row, and it succeeded once
11:12 seems to make no difference whether I am on VPN or not
11:13 but when I am logged into a machine in MTV2, it works fine
11:13 mreavy|mtg is now known as mreavy
dustin 11:15 what error do you get?
sydpolk 11:15 Server too busy
dustin 11:15 can you pastebin an example?
sydpolk 11:19 https://pastebin.mozilla.org/8826920
dustin 11:19 can you add -v?
11:20 I wonder if you're being proxied
11:20 by your ISP
11:20 also, what does ftp.mozilla.org resolve to?
sydpolk 11:20 well, I changed ISPs yesterday, but I have been having this problem for a few days
dustin 11:20 it works fine for me, so it's something specific to you or your system
11:20 could be malware too I suppose
sydpolk 11:21 https://pastebin.mozilla.org/8826921
11:21 fails on second mac
11:21 fails on linux vm
dustin 11:21 hm
11:21 where are you located?
sydpolk 11:22 austin, tx
dustin 11:22 and on roadrunner.. shouldn't be that bad
sydpolk 11:23 failing on my windows 8.1 box
11:23 I don't think it is malware
dustin 11:23 yeah
11:23 I wonder if you've been blacklisted?
11:24 I *think* that error is what the frontend load balancers produce
11:24 and they have DDOS protection of sorts
11:24 it might be time to open a ticket in the product delivery component
sydpolk 11:25 ok. I can reduce my frequency of polling
11:25 but surprised that it happened with two different ISPs
dustin 11:25 I doubt once an hour would trigger it
sydpolk 11:25 well, I have about 20 jenkins jobs firing 1/hour
dustin 11:25 hm
sydpolk 11:25 bz ticket?
dustin 11:25 yeah
sydpolk 11:31 bug 1146983
atoll 14:16 is 'mozdownload' used in the releng processes?
atoll 14:16 sydpolk identified an issue with the ftp site that i'm concerned might stem from it leaking open connections
rail 14:17 I don't think so
sydpolk 14:17 mozdownload is used by mozmill and qa extensively
atoll 14:18 sydpolk: i don't know enough python to be *certain* about 1146983#c12, but if that turns out to be the case, then your error rate might go down a lot!
atoll 14:18 which would be a plus. we also don't rate limit mozilla public IP space, so this never would have been detected there.
sydpolk 14:20 right
14:21 I develop automation from my house and then deploy to mozilla public systems
atoll 14:21 nods
atoll 14:22 sydpolk: thank you for filing this!
sydpolk 14:23 np. been driving me nuts. Turning the pump slower seems to be solving my problem
14:23 but I know that people have been having problems using mozdownload
14:23 in the community
rail 14:26 I know that we had some issues with FTP last week
atoll 14:27 sydpolk: well, i think it's generating and keeping open 1 connection per request
14:28 sydpolk: and then the load balancer caps at 50 open, unless the requests take longer than .. like, 60 seconds or something
14:28 sydpolk: but only on python platforms where urllib3 is the default, since that's what implements silent connection pooling
sydpolk 14:28 all of my connections are usually with ubuntu 14
atoll 14:29 so adding r.close after the progress bar hits 100% releases the streaming connection back to the pool for further use
sydpolk 14:29 ok, I'll look at that
atoll 14:29 instead of effectively forcing the next request to open a *second* connection, maintaining the first open
sydpolk 14:29 right
atoll 14:29 this wouldn't cause the ftp issues last week, or be visible to anyone from moz ip space
14:29 but it should help anyone using it *elsewhere*
14:30 so i can't say "this is the magic fix" for any of last week's stuff. just a coincidental lucky find.
14:30 (and i could be wrong.)
sydpolk 14:30 thanks a lot; you've been a big help. things are already much better just dialing back how often my jobs poll.
atoll 14:30 yeah!
sydpolk 14:31 I'm glad I cache firefox binaries in my instance; I shudder to think what would happen if each test downloaded stuff itself.
atoll 14:31 well, heh
14:32 yeah
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard