Mission Control dev instance has problems since Friday's deploy
Categories: Cloud Services :: Mission Control, defect
Tracking: Not tracked
People: Reporter wlach, Unassigned
Details
Since Friday's updates, I've been seeing a large number of problems with the Mission Control dev server. It is quite strange: I'm seeing a lot of errors about atomic transactions not completing in the update_builds tasks, and when I SSH'd into the dev environment I saw that this query was hung:
21088 | 2 days 21:04:51.222387 | read_write | SELECT "django_migrations"."app", "django_migrations"."name" FROM "django_migrations"
I'm not sure exactly what this is about; I think it's something Django runs on startup. The suspicious part is that it seems to have been hung for almost 3 days.
Looking at New Relic, it appears the errors come from the fact that more recent versions of Django expect you to wrap create operations that may fail inside transaction.atomic:
https://docs.djangoproject.com/en/1.8/topics/db/transactions/#controlling-transactions-explicitly
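For context on the transaction.atomic issue: in Django, a nested transaction.atomic() block is backed by a database savepoint, and once a query fails inside an atomic block, Django refuses further queries in it (raising TransactionManagementError) until that block is exited. On PostgreSQL, the database itself also rejects further statements in a transaction after an error. Wrapping the create that may fail in its own inner atomic() and catching IntegrityError outside it rolls back only that savepoint, so the outer transaction can carry on. The sketch below shows the same savepoint pattern with Python's standard sqlite3 module so it runs without Django or PostgreSQL. The table and function names (builds, insert_if_new, record_builds) are hypothetical and not taken from the Mission Control code, and SQLite on its own would tolerate the failed INSERT even without the savepoint, so this illustrates the shape of the fix rather than reproducing the bug.

```python
import sqlite3


def insert_if_new(conn: sqlite3.Connection, name: str) -> bool:
    """Insert a row inside a savepoint; return False if it already existed.

    The savepoint plays the role of Django's nested transaction.atomic():
    a failed INSERT is undone on its own, leaving the outer transaction usable.
    """
    conn.execute("SAVEPOINT maybe_create")
    try:
        conn.execute("INSERT INTO builds (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError:
        # Undo everything since the savepoint, then drop the savepoint itself.
        conn.execute("ROLLBACK TO SAVEPOINT maybe_create")
        conn.execute("RELEASE SAVEPOINT maybe_create")
        return False
    conn.execute("RELEASE SAVEPOINT maybe_create")
    return True


def record_builds(conn: sqlite3.Connection, names: list[str]) -> list[bool]:
    """Hypothetical batch task: all inserts share one outer transaction."""
    conn.execute("BEGIN")
    try:
        created = [insert_if_new(conn, n) for n in names]
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return created


if __name__ == "__main__":
    # isolation_level=None: the sqlite3 module issues no implicit BEGIN,
    # so the explicit BEGIN/SAVEPOINT statements above control transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE builds (name TEXT NOT NULL UNIQUE)")
    print(record_builds(conn, ["b1", "b2", "b1"]))  # [True, True, False]
    print(conn.execute("SELECT COUNT(*) FROM builds").fetchone()[0])  # 2
```

In Django itself, the equivalent is roughly a try block containing "with transaction.atomic(): SomeModel.objects.create(...)", followed by "except IntegrityError:", as in the "Controlling transactions explicitly" section of the docs linked above.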
I'm not sure whether this explains the problems we're seeing, but I have a PR to fix this issue; we'll see how it goes after that lands:
Hi Will, Marcia pointed me to this bug (thanks!). Is this an external change that could not be tested on staging, or did we test it on staging before deploying it to MC stable?
http://missioncontrol.telemetry.mozilla.org/ vs https://data-missioncontrol.dev.mozaws.net/#/
I assume the latter is staging and the former is stable. Thoughts?
Comment 3 (Reporter) • 5 years ago
This change has not been deployed to stable; as stated, these problems are on the development instance.
Unfortunately, we are currently seeing some problems on the "stable" version (https://missioncontrol.telemetry.mozilla.org), so this bug has unusually high priority. Fortunately, it looks like my PR above worked, as the dev instance is displaying data again. I'll do a backfill, and if everything goes smoothly for the next 24 hours, I'll deploy this new, fixed version to production.
Comment 4 (Reporter) • 5 years ago
Yup, that pull request fixed things nicely, so https://data-missioncontrol.dev.mozaws.net/ should now be up to date again. I'll try to get :whd to do a deploy to the production site tomorrow.