We recently learned that Kaspersky is scanning ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/ once per day, finding ~14000 new files totaling about 270G, and attempting to download them via ftp. They say they get very low throughput, but the ftp server doing that many stats in nightly/ may still cause excessive load on the netapp. Please check the ftp logs to see whether this traffic coincides with the networking and ftp issues we are experiencing in SCL3 (bug 957502), which typically occur at about 7:30pm Pacific.
Created attachment 8366315: zlb ftp cluster traffic history
this was discussed in #infra earlier, but i wanted to include it in the bug. this attachment is a graph of the ftp cluster's traffic history over the last 24 hours. you'll see there is no major spike in traffic, and these numbers leave lots of room for growth in bandwidth. it'd be good to get a little more information about these slow requests, like where they're downloading from (location and/or ip).
Are those times Pacific? If so, peaking from 8pm to 11pm (exactly the timeframe in question) on a Sunday night as high as we do while building and then serving a nightly seems quite suspicious.
Yes, it's Pacific, but I'm not sure where you see a peak between 8 and 11pm. I could believe a ramp from 8 to 11, and then pretty flat until getting quieter at about 8am.
Based on the mana docs for product delivery the graph is just ftp:// traffic measured at the Zeus level, which RelEng generates precisely zero of (we're all http). For context, ganglia says we push up to 200MB/s for all of ftpN web heads, which is ftp/http/https for ftp.m.o/stage.m.o/pvtbuilds/CDN origins and probably more I don't know about.
cturra, I've asked Kaspersky about when, but not the 'where from'. They have (at least) these netblocks:
220.127.116.11 - 18.104.22.168
22.214.171.124 - 126.96.36.199
FWIW, some of the timeouts we see in the Pacific evening are slow d/l from ftp.m.o. That could be either slow tunnel between SCL3 and AWS, or slow http://ftp.mo. Recently I ran find over firefox/nightly and it took an hour to traverse it (philor, that ended Jan 26 14:36 PT if you're looking for a correlation).
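For whoever ends up checking the logs against those netblocks, here is a rough sketch of one way to count requests per hour from a set of address ranges. It is only an illustration under stated assumptions: the log format (an ISO timestamp followed by the client IP) is made up and is not the real ftp log layout, the ranges are placeholder documentation addresses rather than Kaspersky's, and the names `in_ranges` and `hourly_hits` are hypothetical.

```python
import ipaddress
from collections import Counter
from datetime import datetime

# Placeholder ranges (RFC 5737 documentation addresses), NOT the real netblocks.
RANGES = [
    ("192.0.2.0", "192.0.2.255"),
    ("198.51.100.0", "198.51.100.127"),
]


def in_ranges(ip, ranges=RANGES):
    """Return True if ip falls inside any inclusive (low, high) range."""
    addr = ipaddress.ip_address(ip)
    return any(
        ipaddress.ip_address(low) <= addr <= ipaddress.ip_address(high)
        for low, high in ranges
    )


def hourly_hits(lines, ranges=RANGES):
    """Count matching requests per hour of day.

    Assumes each line looks like '<ISO timestamp> <client ip> ...';
    lines that do not parse are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            stamp = datetime.fromisoformat(parts[0])
            matched = in_ranges(parts[1], ranges)
        except (ValueError, TypeError):
            # Bad timestamp, bad address, or mixed IPv4/IPv6 comparison.
            continue
        if matched:
            counts[stamp.hour] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2014-01-26T19:31:00 192.0.2.10 RETR a",
        "2014-01-26T19:45:00 198.51.100.5 RETR b",
        "2014-01-26T20:02:00 203.0.113.9 RETR c",
    ]
    for hour, hits in sorted(hourly_hits(sample).items()):
        print(f"{hour:02d}:00  {hits}")
```

If the hourly counts from the real ranges cluster around 7:30pm Pacific, that would support the correlation with bug 957502; a flat distribution would argue against it.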
we're all done here