Closed Bug 968979 Opened 11 years ago Closed 11 years ago

Seeing 503s returned from Bouncer/download.mozilla.org in production

Categories

(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task)

Type: task
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

VERIFIED WORKSFORME

People

(Reporter: stephend, Assigned: ericz)

Details

(Whiteboard: [fromAutomation])

We're seeing a lot of 503s (at various times/intervals) from download.mozilla.org hosts; started this morning at 10:10 am (http://qa-selenium.mv.mozilla.com:8080/job/bouncer.prod/26140/), and is sporadically continuing.

Here's a sample exception:

tests/test_redirects.py:99:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <class unittestzero.Assert at 0x110ca2050>, first = 503, second = 200
msg = u"Redirect failed with HTTP status 503. \n Failed on http://download.mozilla.o...ad.mozilla.org/?lang=sr&product=firefox-16.0b6&os=osx\n X-Backend-Server: None"

    @classmethod
    def equal(self, first, second, msg=None):
        """
        Asserts that 2 elements are the same

        :Args:
         - First object to be tested
         - Second object to be tested
         - Message that will be printed if it fails
        """
>       assert first == second, msg
E       AssertionError: Redirect failed with HTTP status 503.
E       Failed on http://download.mozilla.org
E       Using {'lang': 'sr', 'product': 'firefox-16.0b6', 'os': 'osx'}.
E       Response URL: http://download.mozilla.org/?lang=sr&product=firefox-16.0b6&os=osx
E       X-Backend-Server: None
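For anyone trying to reproduce this outside the test suite, here is a minimal sketch of a standalone probe that reports the same fields as the failure message (status, X-Backend-Server, response URL). It is not the code in tests/test_redirects.py; the helper names `summarize` and `probe` are hypothetical, and it assumes plain `urllib` from the Python 3 standard library rather than whatever HTTP client the suite uses.

```python
import urllib.error
import urllib.parse
import urllib.request

# Base URL taken from the failure output; everything else here is illustrative.
BASE_URL = "http://download.mozilla.org"


def summarize(status, headers, url):
    """Build a one-line summary with the fields the failing test reports."""
    # headers.get() returns None when the header is absent, which matches
    # the "X-Backend-Server: None" seen in the failure message.
    backend = headers.get("X-Backend-Server")
    return "status=%s backend=%s url=%s" % (status, backend, url)


def probe(params, base_url=BASE_URL, timeout=10):
    """Request base_url with the given query params, following redirects."""
    url = base_url + "/?" + urllib.parse.urlencode(params)
    try:
        # The body is never read; the connection is closed on leaving the block.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return summarize(resp.status, resp.headers, resp.geturl())
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the error still carries code and headers.
        return summarize(err.code, err.headers, err.geturl())
```

Calling `probe({"lang": "sr", "product": "firefox-16.0b6", "os": "osx"})` in a loop from the affected network would show whether the 503s arrive with or without a backend header.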
Assignee: server-ops-webops → eziegenhorn
We're not seeing the same on the production Jenkins server:

https://ci.mozilla.org/job/bouncer.prod/

Are the tests the same?
Flags: needinfo?(stephen.donner)
(In reply to Chris Turra [:cturra] from comment #1)
> we're not seeing the same on the production jenkins server:
>
> https://ci.mozilla.org/job/bouncer.prod/
>
>
> are the tests the same?

Same repo, same runner -- the chief difference looks to be the frequency and environment; ours run from inside the Mountain View corp LAN (on a QA/Rel Eng VLAN?) -- that might've been part of the issue, I don't know. :rhelmer, can you compare the test-run configurations?
Flags: needinfo?(stephen.donner) → needinfo?(rhelmer)
(In reply to Stephen Donner [:stephend] from comment #0)
> We're seeing a lot of 503s (at various times/intervals) from
> download.mozilla.org hosts; started this morning at 10:10 am
> (http://qa-selenium.mv.mozilla.com:8080/job/bouncer.prod/26140/), and is
> sporadically continuing.
>
> Here's a sample exception:
>
> tests/test_redirects.py:99:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _ _
>
> self = <class unittestzero.Assert at 0x110ca2050>, first = 503, second = 200
> msg = u"Redirect failed with HTTP status 503. \n Failed on
> http://download.mozilla.o...ad.mozilla.org/?lang=sr&product=firefox-16.
> 0b6&os=osx\n X-Backend-Server: None"
>
> @classmethod
> def equal(self, first, second, msg=None):
> """
> Asserts that 2 elements are the same
>
> :Args:
> - First object to be tested
> - Second object to be tested
> - Message that will be printed if it fails
> """
> > assert first == second, msg
> E AssertionError: Redirect failed with HTTP status 503.
> E Failed on http://download.mozilla.org
> E Using {'lang': 'sr', 'product': 'firefox-16.0b6', 'os': 'osx'}.
> E Response URL:
> http://download.mozilla.org/?lang=sr&product=firefox-16.0b6&os=osx

That the X-Backend-Server is none is strange and seems like something network related?

> E X-Backend-Server: None
(In reply to Stephen Donner [:stephend] from comment #2)
> (In reply to Chris Turra [:cturra] from comment #1)
> > we're not seeing the same on the production jenkins server:
> >
> > https://ci.mozilla.org/job/bouncer.prod/
> >
> >
> > are the tests the same?
>
> Same repo, same runner -- the chief difference looks to be the frequency and
> environment; ours run from inside the Mountain View corp LAN (on a QA/Rel
> Eng VLAN?) -- that might've been part of the issue, I don't know. :rhelmer,
> can you compare the test-run configurations?

Sure will take a look. Haven't been able to repro the failure hitting the URL in comment 0 locally fwiw.
(In reply to Brandon Burton [:solarce] from comment #3)
> (In reply to Stephen Donner [:stephend] from comment #0)
> > We're seeing a lot of 503s (at various times/intervals) from
> > download.mozilla.org hosts; started this morning at 10:10 am
> > (http://qa-selenium.mv.mozilla.com:8080/job/bouncer.prod/26140/), and is
> > sporadically continuing.
> >
> > Here's a sample exception:
> >
> > tests/test_redirects.py:99:
> > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> > _ _
> >
> > self = <class unittestzero.Assert at 0x110ca2050>, first = 503, second = 200
> > msg = u"Redirect failed with HTTP status 503. \n Failed on
> > http://download.mozilla.o...ad.mozilla.org/?lang=sr&product=firefox-16.
> > 0b6&os=osx\n X-Backend-Server: None"
> >
> > @classmethod
> > def equal(self, first, second, msg=None):
> > """
> > Asserts that 2 elements are the same
> >
> > :Args:
> > - First object to be tested
> > - Second object to be tested
> > - Message that will be printed if it fails
> > """
> > > assert first == second, msg
> > E AssertionError: Redirect failed with HTTP status 503.
> > E Failed on http://download.mozilla.org
> > E Using {'lang': 'sr', 'product': 'firefox-16.0b6', 'os': 'osx'}.
> > E Response URL:
> > http://download.mozilla.org/?lang=sr&product=firefox-16.0b6&os=osx
>
> That the X-Backend-Server is none is strange and seems like something
> network related?
>
> > E X-Backend-Server: None

Can someone check the logs for the load balancer?
Flags: needinfo?(rhelmer)
Sorry, can you clarify the "source -> destination:port" information for hosts that are having trouble?
Additional data point: all seems well in New Relic:
https://rpm.newrelic.com/accounts/263620/applications/2621838

Apdex is 0.99, pretty steady, although it does appear to dip slightly every now and then... not in the time window identified, and not by enough to account for this (IMO).

Error rate is extremely low... 0.0002%. Around 5-12k requests per minute, and less than 0.1 errors per minute.

Server stats look fairly uniform. Response time on bouncer1.webapp.phx1 is a teensy bit higher than the other nodes, but at a glance it looks to be within the normal variance.

Overall throughput appears relatively constant (varies with time of day)... I don't see any unusual trends.
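As a quick sanity check on those New Relic figures, the arithmetic can be sketched as below. It only uses the numbers quoted above (0.0002% error rate, 5-12k requests per minute); the function name is hypothetical, and it assumes the error rate applies uniformly across the traffic range.

```python
# Figures quoted from the New Relic summary above.
ERROR_RATE = 0.0002 / 100.0          # 0.0002% expressed as a fraction
REQUESTS_PER_MINUTE = (5000, 12000)  # "around 5-12k requests per minute"


def errors_per_minute(rpm, rate=ERROR_RATE):
    """Expected errors per minute at a given request rate."""
    return rpm * rate


low, high = (errors_per_minute(rpm) for rpm in REQUESTS_PER_MINUTE)
print("expected errors/minute: %.3f to %.3f" % (low, high))
```

That works out to roughly 0.01 to 0.024 errors per minute, consistent with the "less than 0.1 errors per minute" reported, and far too low to explain a run of 503s if the failing requests had been reaching the application.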
Marking as WFM; this auto-resolved, apparently :-\
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Bumping to qa verified - transient :p
Status: RESOLVED → VERIFIED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
No longer blocks: 1976034