Closed Bug 628986 Opened 13 years ago Closed 13 years ago

load average on stage is 105, causing failed builds

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86_64
Linux
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: justdave)

References

Details

Attachments

(1 file)

We're losing a whole bunch of nightlies because of it.
It's mostly post_upload.py instances, which are called when builds finish and upload their results to stage. Did something happen to make the disks really slow here?
Changes were made in the past couple of weeks to some of the mounts, in bug 614786. Dunno if it's related to this issue or not.
Assignee: server-ops → jdow
vsftpd on dm-ftp01 is getting hammered, with 800+ currently running. Going to try to settle that down.
Assignee: jdow → cshields
Blocks: 628996
killing apache on dm-download02 solved it.  the firefox tree is reverse-proxied on the back-end to dm-ftp01.  Reviewing the logs showed majority of the traffic was complete MARs for Firefox 2.0.0.20.  Added 2.0.0.20 back to the distribution list for mozilla-releases so it would sync out to mirrors, re-enabled apache on dm-download02, haven't had any additional significant load since...
Assignee: cshields → justdave
(In reply to comment #4)
> killing apache on dm-download02 solved it.  the firefox tree is reverse-proxied
> on the back-end to dm-ftp01.  Reviewing the logs showed majority of the traffic
> was complete MARs for Firefox 2.0.0.20.  Added 2.0.0.20 back to the
> distribution list for mozilla-releases 

Was Firefox 2.0.0.20 recently removed from mozilla-releases? Or is this a spike in demand for 2.0.0.20?


> so it would sync out to mirrors,
> re-enabled apache on dm-download02, haven't had any additional significant load
> since...
ok.
We currently know that every week for the last many weeks, we get hit with about 1.25M download attempts for 2.0.0.20-complete.  I am working on getting a spreadsheet that details things better, but if this build was recently taken out of the mirror network then that could explain the problem you are seeing.
Nelson, please attach the spreadsheet here when the query finishes.
Here's the download info. Broken down by version and download type, all versions of Fx 2.0 with 200 or more avg daily downloads in the analysis period are included.
Any chance we have IP addresses of the clients?
Yes, I can get that for the most recent data. It needs to move to a secure bug, though, to protect user privacy.
We should spin this issue off into one anyways -- this bug was tracking the outage.
We didn't remove Fx 2.0.0.20 from mozilla-releases any time recently. It would have been gone for many many months.
Perhaps the day has come where we disable all updates from 2.0.0.x -> 2.0.0.24, while leaving the major update from 2.0.0.24 to later branches. I've long held the suspicion that we have people who are stuck in a loop: they download an update, fail to apply it, download again, fail again. Can we prove or disprove that in metrics-land?
I have the IP data if someone can point me at a bug to post it in.  We are getting thousands of downloads per day from a handful of specific IPs.
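For the "stuck in a loop" question, here is a rough sketch of one way to check it against the access logs. It assumes Apache common/combined log format, and the function name, file path, and sample lines are all made up for illustration (the IPs come from the 192.0.2.0/24 documentation range). It counts requests per client IP per day for a given file and keeps only the clients at or above a threshold.

```python
import re
from collections import Counter

# Assumed Apache common/combined log prefix:
#   client-ip ident user [dd/Mon/yyyy:hh:mm:ss zone] "GET path ..."
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]*\] "GET (\S+)')


def repeat_downloaders(lines, needle, threshold):
    """Map (ip, day) -> request count for clients that requested a path
    containing `needle` at least `threshold` times on a single day."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group(3):
            ip, day = match.group(1), match.group(2)
            counts[(ip, day)] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


# Hypothetical sample data, not real log lines.
sample = [
    '192.0.2.10 - - [25/Jan/2011:10:00:01 -0800] "GET /fx/2.0.0.20.complete.mar HTTP/1.1" 200 9000000',
    '192.0.2.10 - - [25/Jan/2011:10:05:01 -0800] "GET /fx/2.0.0.20.complete.mar HTTP/1.1" 200 9000000',
    '192.0.2.10 - - [25/Jan/2011:10:10:01 -0800] "GET /fx/2.0.0.20.complete.mar HTTP/1.1" 200 9000000',
    '192.0.2.11 - - [25/Jan/2011:11:00:00 -0800] "GET /fx/2.0.0.20.complete.mar HTTP/1.1" 200 9000000',
    '192.0.2.10 - - [25/Jan/2011:12:00:00 -0800] "GET /fx/other.mar HTTP/1.1" 200 1000',
]

result = repeat_downloaders(sample, "2.0.0.20.complete.mar", threshold=3)
```

A high per-IP count on its own would not prove a failed-update loop, since a NAT or proxy can put many users behind one address, so this only narrows down which clients are worth a closer look.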
Bug 628185 removed a few other, more recent releases from the -releases module, making dm-download02 the only place to go to get those releases. The addition of those to dm-download02's load was probably just enough to push it over.
Did we have any determination of what was going on from metrics?  I don't want to hold this open in my queue, so if there's anything left to do here it needs to be assigned appropriately (or get a new bug filed for the postmortem that goes to the appropriate place so I can close this one).
The attached chart definitely shows significant traffic for downloads of 2.0.0.20.  We never received a bug asking for the IP addresses responsible.

I don't think there is anything more to be done on this bug so I'd think it could be closed out.
Guess so.  If you need anything else from me, reopen it.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard
