Closed
Bug 968979
Opened 11 years ago
Closed 11 years ago
Seeing 503s returned from Bouncer/download.mozilla.org in production
Categories
(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task)
Tracking
(Not tracked)
VERIFIED
WORKSFORME
People
(Reporter: stephend, Assigned: ericz)
Details
(Whiteboard: [fromAutomation])
We're seeing a lot of 503s (at various times/intervals) from download.mozilla.org hosts; they started this morning at 10:10 am (http://qa-selenium.mv.mozilla.com:8080/job/bouncer.prod/26140/) and are continuing sporadically.
Here's a sample exception:
tests/test_redirects.py:99:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <class unittestzero.Assert at 0x110ca2050>, first = 503, second = 200
msg = u"Redirect failed with HTTP status 503. \n Failed on http://download.mozilla.o...ad.mozilla.org/?lang=sr&product=firefox-16.0b6&os=osx\n X-Backend-Server: None"
    @classmethod
    def equal(self, first, second, msg=None):
        """
        Asserts that 2 elements are the same
        :Args:
         - First object to be tested
         - Second object to be tested
         - Message that will be printed if it fails
        """
>       assert first == second, msg
E       AssertionError: Redirect failed with HTTP status 503.
E        Failed on http://download.mozilla.org
E        Using {'lang': 'sr', 'product': 'firefox-16.0b6', 'os': 'osx'}.
E        Response URL: http://download.mozilla.org/?lang=sr&product=firefox-16.0b6&os=osx
E        X-Backend-Server: None
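For anyone trying to reproduce this outside the test suite, here is a minimal sketch of the check the traceback describes. It is not the actual code in tests/test_redirects.py. The names `build_url`, `check_redirect`, and the injected `fetch` callable are hypothetical, and it is written for Python 3 even though the traceback comes from a Python 2 run. It also shows why the report prints "X-Backend-Server: None": a header lookup with `.get()` returns None when the response carries no such header.

```python
from urllib.parse import urlencode

BASE_URL = "http://download.mozilla.org"


def build_url(base, params):
    """Build the request URL in the same shape as the 'Response URL' in the report."""
    return f"{base}/?{urlencode(params)}"


def check_redirect(fetch, base, params, expected=200):
    """Assert that the final status matches `expected`.

    `fetch` is a hypothetical callable taking a URL and returning
    (status_code, headers_dict); inject a real HTTP client or a stub.
    """
    url = build_url(base, params)
    status, headers = fetch(url)
    # .get() yields None when the header is absent from the response.
    backend = headers.get("X-Backend-Server")
    msg = (
        f"Redirect failed with HTTP status {status}.\n"
        f" Failed on {base}\n"
        f" Using {params}.\n"
        f" Response URL: {url}\n"
        f" X-Backend-Server: {backend}"
    )
    assert status == expected, msg
    return url


if __name__ == "__main__":
    params = {"lang": "sr", "product": "firefox-16.0b6", "os": "osx"}
    # Stub standing in for a response that has no backend header.
    try:
        check_redirect(lambda url: (503, {}), BASE_URL, params)
    except AssertionError as exc:
        print(exc)
```

Passing a stub for `fetch` keeps the sketch runnable without network access; a real client can be swapped in to probe the production host.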
Assignee
Updated•11 years ago
Assignee: server-ops-webops → eziegenhorn
Comment 1•11 years ago
We're not seeing the same failures on the production Jenkins server:
https://ci.mozilla.org/job/bouncer.prod/
Are the tests the same?
Flags: needinfo?(stephen.donner)
Reporter
Comment 2•11 years ago
(In reply to Chris Turra [:cturra] from comment #1)
> we're not seeing the same on the production jenkins server:
>
> https://ci.mozilla.org/job/bouncer.prod/
>
>
> are the tests the same?
Same repo, same runner. The chief differences look to be frequency and environment: ours run from inside the Mountain View corp LAN (on a QA/Rel Eng VLAN?), which might be part of the issue; I don't know. :rhelmer, can you compare the test-run configurations?
Flags: needinfo?(stephen.donner) → needinfo?(rhelmer)
Comment 3•11 years ago
(In reply to Stephen Donner [:stephend] from comment #0)
> We're seeing a lot of 503s (at various times/intervals) from
> download.mozilla.org hosts; started this morning at 10:10 am
> (http://qa-selenium.mv.mozilla.com:8080/job/bouncer.prod/26140/), and is
> sporadically continuing.
>
> Here's a sample exception:
>
> tests/test_redirects.py:99:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _ _
>
> self = <class unittestzero.Assert at 0x110ca2050>, first = 503, second = 200
> msg = u"Redirect failed with HTTP status 503. \n Failed on
> http://download.mozilla.o...ad.mozilla.org/?lang=sr&product=firefox-16.
> 0b6&os=osx\n X-Backend-Server: None"
>
> @classmethod
> def equal(self, first, second, msg=None):
> """
> Asserts that 2 elements are the same
>
> :Args:
> - First object to be tested
> - Second object to be tested
> - Message that will be printed if it fails
> """
> > assert first == second, msg
> E AssertionError: Redirect failed with HTTP status 503.
> E Failed on http://download.mozilla.org
> E Using {'lang': 'sr', 'product': 'firefox-16.0b6', 'os': 'osx'}.
> E Response URL:
> http://download.mozilla.org/?lang=sr&product=firefox-16.0b6&os=osx
> E X-Backend-Server: None
It's strange that X-Backend-Server is None; that seems like it could be something network-related?
Comment 4•11 years ago
(In reply to Stephen Donner [:stephend] from comment #2)
> (In reply to Chris Turra [:cturra] from comment #1)
> > we're not seeing the same on the production jenkins server:
> >
> > https://ci.mozilla.org/job/bouncer.prod/
> >
> >
> > are the tests the same?
>
> Same repo, same runner -- the chief difference looks to be the frequency and
> environment; ours run from inside the Mountain View corp LAN (on a QA/Rel
> Eng VLAN?) -- that might've been part of the issue, I don't know. :rhelmer,
> can you compare the test-run configurations?
Sure, will take a look. FWIW, I haven't been able to reproduce the failure locally by hitting the URL in comment 0.
Comment 5•11 years ago
(In reply to Brandon Burton [:solarce] from comment #3)
> (In reply to Stephen Donner [:stephend] from comment #0)
> > We're seeing a lot of 503s (at various times/intervals) from
> > download.mozilla.org hosts; started this morning at 10:10 am
> > (http://qa-selenium.mv.mozilla.com:8080/job/bouncer.prod/26140/), and is
> > sporadically continuing.
> >
> > Here's a sample exception:
> >
> > tests/test_redirects.py:99:
> > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> > _ _
> >
> > self = <class unittestzero.Assert at 0x110ca2050>, first = 503, second = 200
> > msg = u"Redirect failed with HTTP status 503. \n Failed on
> > http://download.mozilla.o...ad.mozilla.org/?lang=sr&product=firefox-16.
> > 0b6&os=osx\n X-Backend-Server: None"
> >
> > @classmethod
> > def equal(self, first, second, msg=None):
> > """
> > Asserts that 2 elements are the same
> >
> > :Args:
> > - First object to be tested
> > - Second object to be tested
> > - Message that will be printed if it fails
> > """
> > > assert first == second, msg
> > E AssertionError: Redirect failed with HTTP status 503.
> > E Failed on http://download.mozilla.org
> > E Using {'lang': 'sr', 'product': 'firefox-16.0b6', 'os': 'osx'}.
> > E Response URL:
> > http://download.mozilla.org/?lang=sr&product=firefox-16.0b6&os=osx
>
> That the X-Backend-Server is none is strange and seems like something
> network related?
>
> > E X-Backend-Server: None
Can someone check the logs for the load balancer?
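In case it helps whoever picks this up, here is a rough sketch of how 503s could be bucketed per minute once the logs are in hand. Everything about the input is an assumption: the sample lines are made up, use a Common Log Format-like layout with documentation-range addresses and a placeholder date, and the real load balancer log format may well differ. `count_503_per_minute` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches: [timestamp] "request line" status
# Assumes a Common Log Format-like layout, which may not match the real logs.
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3})\b')

# Made-up sample lines for illustration only; not taken from any real log.
SAMPLE_LINES = [
    '192.0.2.10 - - [01/Jan/2000:10:10:03 +0000] "GET /?product=example&os=osx&lang=sr HTTP/1.1" 503 0',
    '192.0.2.11 - - [01/Jan/2000:10:10:41 +0000] "GET /?product=example&os=win&lang=de HTTP/1.1" 302 0',
    '192.0.2.10 - - [01/Jan/2000:10:11:07 +0000] "GET /?product=example&os=osx&lang=sr HTTP/1.1" 503 0',
    '192.0.2.12 - - [01/Jan/2000:10:11:30 +0000] "GET /?product=example&os=linux&lang=fr HTTP/1.1" 503 0',
]


def count_503_per_minute(lines):
    """Return a Counter mapping 'dd/Mon/yyyy:HH:MM' to the number of 503 responses."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("status") == "503":
            # The first 17 characters of the timestamp are the date plus hour and minute.
            counts[match.group("ts")[:17]] += 1
    return counts


if __name__ == "__main__":
    for minute, count in sorted(count_503_per_minute(SAMPLE_LINES).items()):
        print(minute, count)
```

Lining the per-minute buckets up against the failing Jenkins run times would show whether the 503s are visible at the load balancer at all.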
Flags: needinfo?(rhelmer)
Comment 6•11 years ago
Sorry, can you clarify the "source -> destination:port" information for hosts that are having trouble?
Comment 7•11 years ago
Additional data point: all seems well in New Relic:
https://rpm.newrelic.com/accounts/263620/applications/2621838
Apdex is 0.99, pretty steady, although it does appear to dip slightly every now and then... not in the time window identified, and not by enough to account for this (IMO).
Error rate is extremely low... 0.0002%. Around 5-12k requests per minute, and less than 0.1 errors per minute.
Server stats look fairly uniform. Response time on bouncer1.webapp.phx1 is a teensy bit higher than the other nodes, but at a glance it looks to be within the normal variance.
Overall throughput appears relatively constant (varies with time of day)... I don't see any unusual trends.
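As a sanity check on those numbers, here is a quick back-of-the-envelope sketch. It assumes errors are independent and evenly spread, which is a simplification, and the run size of 1,000 requests is a hypothetical figure, not the real size of the bouncer test run.

```python
def errors_per_minute(requests_per_minute, error_rate_percent):
    """Expected errors per minute at a given throughput and error rate (in percent)."""
    return requests_per_minute * error_rate_percent / 100.0


def p_at_least_one_failure(n_requests, error_rate_percent):
    """Probability that at least one of n independent requests fails."""
    p = error_rate_percent / 100.0
    return 1.0 - (1.0 - p) ** n_requests


if __name__ == "__main__":
    rate = 0.0002  # percent, as reported by New Relic above
    # Roughly 0.01 to 0.024 errors per minute, consistent with "less than 0.1".
    print(errors_per_minute(5_000, rate), errors_per_minute(12_000, rate))
    # Hypothetical 1,000-request test run: about a 0.2% chance of seeing any failure.
    print(p_at_least_one_failure(1_000, rate))
```

At that error rate, repeated failures within a single test run would be very unlikely if the test traffic were failing at the same rate New Relic reports, which fits the view that the application-level stats do not account for what the tests saw.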
Reporter
Comment 8•11 years ago
Marking as WFM; this auto-resolved, apparently :-\
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Updated•9 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard