
Tinderbox getting intermittent "ERROR 503: Service Temporarily Unavailable." from stage.mozilla.org downloading tests/symbols

RESOLVED FIXED

Status

mozilla.org Graveyard
Server Operations
--
blocker
7 years ago
3 years ago

People

(Reporter: philor, Assigned: justdave)

Description

7 years ago
For example, http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1274966315.1274966653.26040.gz

--06:21:00--  http://stage.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux-debug/1274965447/firefox-3.7a5pre.en-US.linux-i686.crashreporter-symbols.zip
Resolving stage.mozilla.org... 10.2.74.116
Connecting to stage.mozilla.org|10.2.74.116|:80... connected.
HTTP request sent, awaiting response... 503 Service Temporarily Unavailable
06:24:09 ERROR 503: Service Temporarily Unavailable.

program finished with exit code 1

(nagios seems to be shouting about ftp.m.o, so you may well already know about it, but I needed a bug to point the "Tree's closed" message at)
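The wget run above gives up on the first 503 and the build step exits with code 1. As a rough illustration only, a download step could treat 503 as transient and retry with exponential backoff. This is a minimal sketch, not what Tinderbox does: `fetch_with_retry` and its `attempts`, `base_delay`, `opener`, and `sleep` parameters are hypothetical names introduced here.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=4, base_delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying on HTTP 503 with exponential backoff.

    attempts, base_delay, opener and sleep are illustrative parameters,
    not part of any Tinderbox or buildbot configuration.
    """
    for attempt in range(attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Only 503 is treated as transient; any other status, or a 503
            # on the final attempt, is re-raised to the caller.
            if err.code != 503 or attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Retrying only papers over short blips; it would not have helped while the backend was pinned at its client limit for minutes at a time.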
When it was down for me, I saw the following message in Firefox:
firefox-stage-backend.mozilla.org.
Assignee: server-ops → justdave
firefox-stage-backend is a reverse proxy that funnels all the httpd reads of the firefox staging iscsi mount through one place, so the mount doesn't have to be re-exported via nfs all over the place.  dm-ftp01 is what's actually serving it, and it's hitting maxclients right now (3000), mostly with traffic for 3.6.4-candidates.  The box itself doesn't seem to be particularly loaded though, so I've bumped maxclients up to 4000 and I'm keeping an eye on it.
dm-ftp01 seems to have leveled off with a system load bouncing around between 2 and 4 (which is well within limits since it's got 4 cores), no swapping happening, and around 3200 httpd clients (which is over the previous 3000 maxclients but not by much).  So I think we're clear.
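The reasoning in the comment above (load relative to core count, plus active clients relative to maxclients) can be written out as a small check. This is only a sketch using the figures quoted in the comment; the function names are hypothetical, and the "at most 1.0 load per core" threshold is a common rule of thumb assumed here, not something stated in this bug.

```python
def load_per_core(load_average, cores):
    """Return the system load average divided by the number of cores."""
    return load_average / cores


def client_headroom(active_clients, max_clients):
    """Return how many more httpd clients fit under the maxclients limit."""
    return max_clients - active_clients


def looks_healthy(load_average, cores, active_clients, max_clients):
    """Assumed rule of thumb: load per core <= 1.0 and clients below the limit."""
    return (load_per_core(load_average, cores) <= 1.0
            and active_clients < max_clients)


# Figures from the comment: load peaking at 4 on 4 cores, about 3200 clients.
print(looks_healthy(4, 4, 3200, 4000))  # new limit of 4000
print(looks_healthy(4, 4, 3200, 3000))  # previous limit of 3000
```

With the new limit the check passes with 800 clients of headroom; with the previous limit of 3000 the same traffic would already be over.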
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Just hit the 4000 maxclients limit again. It was right after the mpt outage, so I'm not sure whether it was just people playing catch-up or something else.

Load and memory are still doing okay, so I've bumped it to 5000
FYI, I'm currently getting ~6 KB/s downloading a Camino nightly; an hour ago I was getting consistent timeouts just trying to connect to ftp.m.o, and then ~200 bytes/s download.

Don't know if this is just crushing load of the 3.6.4 beta, or if it's something else; just wanted to mention it.

Depends on: 568591
Duplicate of this bug: 568525
aravind made some performance tweaks to the kernel TCP settings on dm-ftp01 that seem to have helped a lot.

The 3.6.4 betatest and releasetest update channels have been throttled at 50%.
Product: mozilla.org → mozilla.org Graveyard