Closed
Bug 1146983
Opened 10 years ago
Closed 10 years ago
ftp.mozilla.org does not work for me very well
Categories
(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task)
x86
macOS
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: sydpolk, Unassigned)
Details
(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/815] )
I have a home Jenkins instance I use from my home office in Austin. It tries to ping https://ftp.mozilla.org every few minutes to download Firefox binaries. It uses mozdownload, so it only downloads a binary every 24 hours or so.
However, in the last week, I can't get into ftp.mozilla.org reliably at all. Some logs are at the following pastebins:
https://pastebin.mozilla.org/8826922
https://pastebin.mozilla.org/8826923
I just happened to change ISPs yesterday, but this was happening before that.
Have I been blacklisted? Do I need to tell my Jenkins jobs to try every 2 hours instead of 1? 6 hours? 24 hours?
Or is something else wrong?
Those were actually upstream issues with the FTP Zeus cluster, unrelated to your service.
Have you continued to receive these errors since, say, Friday?
Flags: needinfo?(spolk)
Comment 2 • 10 years ago (Reporter)
Friday is the last time that I was having good success rates.
Flags: needinfo?(spolk)
To clarify: Friday, you had good success rates; did it then worsen Saturday through today?
Comment 4 • 10 years ago (Reporter)
The last successful download for some of my projects was Friday morning about 6:00 AM. I have had very few successful downloads since.
Comment 5 • 10 years ago
Syd,
When did you switch ISPs?
Also, what is your IP address?
We haven't made any changes to the ftp cluster since then IIRC.
Comment 6 • 10 years ago (Reporter)
I switched ISPs yesterday, Monday 3/23/15. This started before then.
My current external IP address is 24.55.31.100. The IP address I was using before Monday was 99.116.238.118.
I think this is originating from the service protection class.
> A 503 Server Too Busy response will be sent if the connection fails any of the max_1_connections, max_10_connections, or max_connections_rate tests.
Neither IP address is whitelisted or blacklisted.
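To picture what a connection-rate test like `max_connections_rate` does, here is a minimal sketch of a per-IP sliding-window counter. It is hypothetical: it is not the Zeus implementation, `ConnectionRateLimiter` and its parameters are made up for illustration, and it only models the rate test (not `max_1_connections` or `max_10_connections`).

```python
from collections import defaultdict, deque


class ConnectionRateLimiter:
    """Hypothetical per-IP connection-rate check (sliding-window counter).

    Loosely modelled on a test like max_connections_rate; this is NOT the
    Zeus implementation, and the limit/window values are illustrative only.
    """

    def __init__(self, max_connections, window_seconds):
        self.max_connections = max_connections
        self.window_seconds = window_seconds
        # ip -> timestamps of recently accepted connections, oldest first
        self._history = defaultdict(deque)

    def check(self, ip, now):
        """Decide whether a new connection from ip at time `now` is accepted.

        Returns 200 if accepted, or 503 ("Server Too Busy") if the ip already
        opened max_connections connections within the last window_seconds.
        """
        history = self._history[ip]
        # Drop timestamps that have slid out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_connections:
            return 503  # rejected connections are not recorded in this sketch
        history.append(now)
        return 200
```

The point of the model is that the limit counts *connections*, not downloads: a client that opens a fresh connection for every directory listing burns through its budget quickly, while one that reuses a single connection spends only one slot.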
Comment 8 • 10 years ago (Reporter)
So, the recommendation would be to slow down my attempts to download. I will try that and see if things are better tomorrow.
Verified that this is the case:
errors:[23/Mar/2015:12:28:31 -0700] DOS protection/protect-ftp triggersummary Too high a rate of connections from 24.55.31.100, dropped - 3 time(s)
Comment 10 • 10 years ago
(In reply to Syd Polk :sydpolk from comment #8)
> So, the recommendation would be to slow down my attempts to download. I will
> try that and see if things are better tomorrow.
The limit you're triggering is measured in tens of connections *per minute*, and looking at the load balancer logs, your server is indeed opening a new connection for *each request* when a new binary shows up.
If you can combine all of those fetches into a single wget (so that it reuses the connection), or code your app to respond to a 503 by backing off for 60+random(0..30) seconds and then retrying, or run a lightweight HTTP proxy so that the proxy can reuse connections without you having to alter the app behavior, then any of those things should help here.
We shouldn't bypass this limit, however, so I'd like to avoid that if possible.
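The second option (back off on a 503 for 60+random(0..30) seconds, then retry) might look roughly like this. It is a minimal sketch using only the standard library; `fetch_with_backoff` and its `max_attempts`, `opener`, and `sleep` parameters are hypothetical, and mozdownload's real code is structured differently.

```python
import random
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, max_attempts=5, opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, waiting 60 + random(0..30) seconds after each 503 before retrying.

    opener and sleep are injectable so the retry logic can be exercised offline.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Only "503 Server Too Busy" is worth retrying; give up on anything
            # else, or once the attempt budget is spent.
            if err.code != 503 or attempt == max_attempts:
                raise
            # Random jitter keeps jobs that failed together from retrying in
            # lockstep and tripping the rate limit again.
            sleep(60 + random.uniform(0, 30))
```

The jitter matters when many jobs fire at the same moment (for example, a batch of Jenkins jobs scheduled on the hour): without it they would all retry together and hit the limit again.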
Comment 11 • 10 years ago (Reporter)
The code that does this is mozdownload. mozdownload loads the directory it wants to look at, and then potentially requests every relevant subdirectory to see whether it contains a build.
Adding retries is a good way to mitigate the problem here. Calling mozdownload much less frequently should also help.
Comment 12 • 10 years ago
https://github.com/mozilla/mozdownload/blob/master/mozdownload/scraper.py#L309
The latest python-requests docs indicate that if you switch to urllib3 as the transport adapter, it will automatically reuse HTTP connections *as long as*:
"Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set stream to False or read the content property of the Response object."
Since this code uses stream=True but is *missing* Response.close, it's entirely possible that mozdownload is leaking HTTP connections: each one stays open until the server-side keepalive timeout forces it to be cleaned up.
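The effect described above can be reproduced without requests at all. The sketch below is a standard-library analogue, not mozdownload's code: it uses `http.client` against a throwaway local server that counts TCP connections, and compares reading every body over one persistent connection with opening a new connection per request and leaving it unread and unclosed. `CountingHandler`, `fetch_reusing`, `fetch_leaking`, and `count_connections` are hypothetical names.

```python
import http.client
import http.server
import threading


class CountingHandler(http.server.BaseHTTPRequestHandler):
    """Throwaway local server that counts how many TCP connections it accepts."""

    protocol_version = "HTTP/1.1"  # allow keep-alive so one connection can serve many requests
    connections = 0
    _lock = threading.Lock()

    def setup(self):
        super().setup()  # runs once per accepted TCP connection
        with CountingHandler._lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"x" * 4096
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_reusing(port, paths):
    """One persistent connection; each body is read fully before the next request."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        for path in paths:
            conn.request("GET", path)
            conn.getresponse().read()  # draining the body frees the connection for reuse
    finally:
        conn.close()


def fetch_leaking(port, paths):
    """A new connection per request, body unread and connection left open."""
    open_conns = []
    for path in paths:
        conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("GET", path)
        conn.getresponse()  # body never read, connection never closed
        open_conns.append(conn)
    return open_conns  # caller closes these eventually


def count_connections(fetch, paths):
    """Run fetch against a fresh local server and return how many connections it opened."""
    CountingHandler.connections = 0
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        for conn in fetch(server.server_address[1], paths) or []:
            conn.close()
        return CountingHandler.connections
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    paths = [f"/build-{i}/" for i in range(20)]
    print("reusing:", count_connections(fetch_reusing, paths))
    print("leaking:", count_connections(fetch_leaking, paths))
```

Run as a script, this reports 1 connection for `fetch_reusing` and 20 for `fetch_leaking`. With requests and `stream=True`, the analogous fix is the one suggested here: read the content or call `Response.close()` once the download finishes, so the pooled connection can be reused instead of a new one being opened.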
Comment 13 • 10 years ago (Reporter)
The server is working the way it should. I should not be polling so often, and there may be a bug in mozdownload. Pasting in relevant bits of irc conv for posterity.
sydpolk
11:09 well, I am hoping anybody can help me
11:11 sydpolk: what's your question?
sydpolk
11:11 I cannot get to ftp.mozilla.org
11:11 and haven't been able to for days
RyanVM|sheriffduty
11:11 weird
sydpolk
11:12 I have a job here at home that tries to mozdownload one/hour
11:12 and that has succeeded once in four days
11:12 just now, I tried curl http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-beta-linux64/ 10 times in a row, and it succeeded once
11:12 seems to make no difference whether I am on VPN or not
11:13 but when I am logged into a machine in MTV2, it works fine
dustin
11:15 what error do you get?
sydpolk
11:15 Server too busy
dustin
11:15 can you pastebin an example?
sydpolk
11:19 https://pastebin.mozilla.org/8826920
dustin
11:19 can you add -v?
11:20 I wonder if you're being proxied
11:20 by your ISP
11:20 also, what does ftp.mozilla.org resolve to?
sydpolk
11:20 well, I changed isp's yesterday, but I have been having this problem for a few days
dustin
11:20 it works fine for me, so it's something specific to you or your system
11:20 could be malware too I suppose
sydpolk
11:21 https://pastebin.mozilla.org/8826921
11:21 fails on second mac
11:21 fails on linux vm
dustin
11:21 hm
11:21 where are you located?
sydpolk
11:22 austin, tx
dustin
11:22 and on roadrunner.. shouldn't be that bad
sydpolk
11:23 failing on my windows 8.1 box
11:23 I don't think it is malware
dustin
11:23 yeah
11:23 I wonder if you've been blacklisted?
11:24 I *think* that error is what the frontend load balancers produce
11:24 and they have DDOS protection of sorts
11:24 it might be time to open a ticket in the product delivery component
sydpolk
11:25 ok. I can reduce my frequency of polling
11:25 but surprised that it happened with two different ISPs
dustin
11:25 I doubt once an hour would trigger it
sydpolk
11:25 well, I have about 20 jenkins jobs firing 1/hour
dustin
11:25 hm
sydpolk
11:25 bz ticket?
dustin
11:25 yeah
sydpolk
11:31 bug 1146983
atoll
14:16 is 'mozdownload' used in the releng processes?
atoll
14:16 sydpolk identified an issue with the ftp site that i'm concerned might stem from it leaking open connections
rail
14:17 I don't think so
sydpolk
14:17 mozdownload is used by mozmill and qa extensively
atoll
14:18 sydpolk: i don't know enough python to be *certain* about 1146983#c12, but if that turns out to be the case, then your error rate might go down a lot!
atoll
14:18 which would be a plus. we also don't rate limit mozilla public IP space, so this never would have been detected there.
sydpolk
14:20 right
14:21 I develop automation from my house and then deploy to mozilla public systems
atoll
14:21 nods
atoll
14:22 sydpolk: thank you for filing this!
sydpolk
14:23 np. been driving me nuts. Turning the pump slower seems to be solving my problem
14:23 but I know that people have been having problems using mozdownload
14:23 in the community
rail
14:26 I know that we had some issues with FTP last week
atoll
14:27 sydpolk: well, i think it's generating and keeping open 1 connection per request
14:28 sydpolk: and then the load balancer caps at 50 open, unless the requests take longer than .. like, 60 seconds or something
14:28 sydpolk: but only on python platforms where urllib3 is the default, since that's what implements silent connection pooling
sydpolk
14:28 all of my connections are usually with ubuntu 14
atoll
14:29 so adding r.close after the progress bar hits 100% releases the streaming connection back to the pool for further use
sydpolk
14:29 ok, I'll look at that
atoll
14:29 instead of effectively forcing the next request to open a *second* connection, maintaining the first open
sydpolk
14:29 right
atoll
14:29 this wouldn't cause the ftp issues last week, or be visible to anyone from moz ip space
14:29 but it should help anyone using it *elsewhere*
14:30 so i can't say "this is the magic fix" for any of last week's stuff. just a coincidental lucky find.
14:30 (and i could be wrong.)
sydpolk
14:30 thanks a lot; you've been a big help. things are already much better just dialing back how often my jobs poll.
atoll
14:30 yeah!
sydpolk
14:31 I'm glad I cache firefox binaries in my instance; I shudder to think what would happen if each test downloaded stuff itself.
atoll
14:31 well, heh
14:32 yeah
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Updated • 9 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard