
browserid.org/verify is occasionally throwing 500 errors (caught at the load balancer)



Cloud Services
6 years ago
6 years ago


(Reporter: boozeniges, Assigned: petef)




(Whiteboard: [qa+])



6 years ago
On both the mozillaignite (https://github.com/rossbruniges/mozilla-ignite/tree/stage) and webmaker (https://github.com/rossbruniges/make.mozilla.org) projects we've been experiencing intermittent login failures since around yesterday afternoon (2pm GMT).

After a bit of digging and adding in extra logs we were able to find the following error being thrown:

django_browserid.base:INFO Verification URL: https://browserid.org/verify :/projects/mozilla/ignite/mozilla-ignite/vendor-local/src/django-browserid/django_browserid/base.py:118
django_browserid.base:DEBUG Failed to decode JSON. Resp: 500, Content: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<title>Service Unavailable</title>
<style type="text/css">
body, p, h1 {
  font-family: Verdana, Arial, Helvetica, sans-serif;
h2 {
  font-family: Arial, Helvetica, sans-serif;
  color: #b10b29;
<h2>Service Unavailable</h2>
<p>The service is temporarily unavailable. Please try again later.</p>
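
The log above shows the client trying to decode the load balancer's HTML error page as JSON. As a rough illustration only, a client could check the status code before decoding. `parse_verifier_response` is a hypothetical helper, not part of django-browserid, and the sketch assumes a successful verification returns a JSON object whose `status` field is "okay":

```python
import json


def parse_verifier_response(status_code, body):
    """Return (ok, detail) for a verifier HTTP response.

    Hypothetical helper for illustration; not django-browserid code.
    """
    if status_code != 200:
        # For example, the 500 "Service Unavailable" HTML page in the log.
        return False, "verifier returned HTTP %d" % status_code
    try:
        data = json.loads(body)
    except ValueError:
        # A 200 with a non-JSON body is still a failed verification.
        return False, "verifier returned a non-JSON body"
    if not isinstance(data, dict):
        return False, "verifier returned unexpected JSON"
    # Assumption: the verifier reports success as {"status": "okay", ...}.
    if data.get("status") != "okay":
        return False, data.get("reason", "verification failed")
    return True, data
```

With a check like this, the failure would be logged as an HTTP 500 from the verifier rather than as a JSON decoding problem.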


Both projects have recently been updated to use the new playdoh, so that Django can be deployed out of it, fixing this issue (https://github.com/mozilla/playdoh/issues/107).

Setting as major because both affected projects are hoping to deploy next week, and because this may also be an issue on other sites using browserID and django-browserID.
browserid.org is Mozilla Services. Moving...
Assignee: server-ops-infra → nobody
Component: Server Operations: Infrastructure → Operations
Product: mozilla.org → Mozilla Services
QA Contact: jdow → operations
Version: other → unspecified

Comment 2

6 years ago
Thanks Shyam - that was the one thing that I wasn't sure about :)
You're welcome, Ross. I've poked the services ops folks on IRC as well; if this is incorrect, they'll move it to the right place and look at it.

Comment 4

6 years ago
I think this is probably fixed now.  The verifier pool in scl2 had all backends marked as draining for some odd reason, so when GSLB gave you scl2's IP address for browserid.org, /verify calls would fail.

We need to start QAing the verifier service before we undrain a datacenter during a push.
Assignee: nobody → petef
Yes. I agree.
Seems like I need to update our Test Plan to add a section for Prod push specific stuff.
Probably something that could be automated...
Whiteboard: [qa+]

Comment 6

6 years ago
Filed bug 753828 to monitor zeus pools so we'll catch this before undraining a datacenter in the future.
Should be covered by a script that :jrgm has.
Also, running it once per colo per Prod push should be enough to verify everything is working as expected (getting a 200 back rather than a 500).
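
For illustration only, a once-per-colo check along these lines might look like the sketch below. It is not the script :jrgm has. The colo names, the per-colo URLs, and the empty POST body are all assumptions, and the request the real verifier needs in order to answer 200 is not specified in this bug.

```python
import urllib.error
import urllib.request


def http_status(url, data=b"", timeout=10):
    """POST to url and return the HTTP status code, or None if unreachable."""
    request = urllib.request.Request(url, data=data)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is what we want.
        return err.code
    except urllib.error.URLError:
        # Connection-level failure (DNS, refused, timeout).
        return None


def check_colos(endpoints, get_status=http_status):
    """Return the names of colos whose verifier did not answer with 200.

    endpoints maps a colo name to a verifier URL that reaches that colo
    directly (hypothetical; GSLB would otherwise pick the colo for you).
    """
    failed = []
    for colo, url in sorted(endpoints.items()):
        if get_status(url) != 200:
            failed.append(colo)
    return failed
```

An empty result from `check_colos` would mean every colo answered with a 200. Passing `get_status` as a parameter keeps the check testable without network access.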


6 years ago
Last Resolved: 6 years ago
Resolution: --- → FIXED
Marking as Verified since we appear to have everything in place, including tests for Prod.