Closed
Bug 851784
Opened 12 years ago
Closed 12 years ago
Automated tests downloading builds from ftp.m.o hitting intermittent "503 Server Too Busy" and timeout errors
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Assigned: afernandez)
Attachments: 1 file (image/png, 77.69 KB)
We were getting just a few of these, a couple an hour, after the fix for bug 851705, but now it's more like tens and rapidly increasing. Trees are already closed, but this would keep them closed even if the other tree-closing problem gets fixed.
Example logs of the two sorts of failure (these URLs are not the problem, they are logs of the problem happening - I always confuse people into thinking I'm saying that the logs don't load, when what I mean is "open this URL and read this log of the failure"):
https://tbpl.mozilla.org/php/getParsedLog.php?id=20718014&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=20718222&tree=Mozilla-Inbound
Updated 12 years ago (Assignee)
Severity: blocker → major
Comment 1 • 12 years ago (Assignee)
Do you have a subnet (or list of subnets) that we could possibly whitelist so that the measures taken in bug 851705 don't apply to the tree?
Comment 2 • 12 years ago (Assignee)
:philor, I added a list of subnets to the whitelist, which should alleviate the issue.
Please let us know if you still experience issues, thank you.
Feel free to bump up importance if it occurs again.
Severity: major → normal
Comment 3 • 12 years ago (Reporter)
It looks like, despite most trees being closed, we still got a couple of them at 06:41, probably when load picked up a little from nightly builds being tested. So I'd guess it's just a ticking time bomb, waiting for us to say "see, it's fine when there's nothing happening, we should make things happen again."
Comment 4 • 12 years ago (Reporter)
But I got talked into reopening, and within 90 minutes we'd built up enough load that we hit 10 of these.
Comment 5 • 12 years ago (Reporter)
More like 30 by now. The trees are still open, because I can retrigger the failing jobs, but that means I have to close them 4-6 hours before I leave, so I might as well page now rather than 4-6 hours before I go to sleep (which isn't all that long from now).
Severity: normal → blocker
Updated 12 years ago (Assignee)
Assignee: server-ops → afernandez
Comment 6 • 12 years ago (Assignee)
We have increased the previously set bandwidth cap from 500 Mbps to 700 Mbps.
This should fix the "ERROR 503: Server Too Busy." errors. If you still see issues, please let us know.
Updated 12 years ago (Assignee)
Assignee: afernandez → server-ops
Severity: blocker → normal
Comment 7 • 12 years ago (Assignee)
While watching the current bandwidth activity, it seems we reached the 700 Mbps cap at times. I increased the cap by another 300 Mbps, for a total of 1000 Mbps.
At random times we do still reach the new 1 Gbps cap, but usage doesn't stay pinned there, so it should be much better now. We could increase it further, but we prefer gradual increases so as not to cause load issues on the FTP cluster.
Comment 8 • 12 years ago (Assignee)
Doubled the cap to 2 Gbps.
Comment 9 • 12 years ago (Assignee)
:philor, have you experienced the same errors/issues since? Are we stable now?
Please advise, thank you.
Comment 10 • 12 years ago (Reporter)
We survived Sunday just fine, and have survived Monday morning fine, but we're not yet at full weekday load: 664 jobs are running, when full capacity is around 1200.
Comment 11 • 12 years ago (Assignee)
:philor, OK, please keep us updated. We looked at the last 90 days of bandwidth history, and it seems we never reached 2 Gbps; only twice did it pass 1.5 Gbps.
Comment 12 • 12 years ago (Reporter)
We did hit full weekday load several times today without incident (well, with lots of incidents that were not this one), so, much as I hate to jinx it, I think we can call this fixed. Thanks!
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated 12 years ago
Assignee: server-ops → afernandez
Updated 10 years ago
Product: mozilla.org → mozilla.org Graveyard