Mission Control dev instance has had problems since Friday's deploy

RESOLVED FIXED

Status: defect (2 months ago)
People: Reporter: wlach, Unassigned
Tracking: Blocks 1 bug
Firefox Tracking Flags: Not tracked

Details

Since Friday's updates, I've been seeing a large number of problems with the Mission Control dev server. It's quite strange: the update_builds tasks are logging many errors about atomic transactions not completing, and when I SSHed into the dev environment I found this query hung:

21088 | 2 days 21:04:51.222387 | read_write | SELECT "django_migrations"."app", "django_migrations"."name" FROM "django_migrations"

I'm not sure exactly what this is about; I think it's a query Django runs at startup. The suspicious part is that it seems to have been hung for almost 3 days (2 days 21 hours, per the duration column).
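When triaging rows like the one above, it can help to flag long-running queries automatically. The sketch below assumes PostgreSQL and psql-style output where the second column is the query's running time, as in the row quoted here; such rows might come from the pg_stat_activity view (for example, selecting `now() - query_start`). The helper names `parse_pg_interval()` and `is_stuck()` and the one-hour threshold are hypothetical, chosen for illustration only.

```python
import re
from datetime import timedelta

# Matches common forms of PostgreSQL's default interval text output, e.g.
# "2 days 21:04:51.222387", "1 day 03:00:00" or "00:05:12".
# Years, months and negative intervals are not handled.
_INTERVAL_RE = re.compile(
    r"^(?:(?P<days>\d+) days? )?(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})(?:\.(?P<frac>\d{1,6}))?$"
)


def parse_pg_interval(text):
    """Convert a psql-formatted interval string into a timedelta."""
    match = _INTERVAL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised interval: {text!r}")
    # Pad the fractional part to 6 digits so it maps exactly onto microseconds.
    micros = int((match["frac"] or "").ljust(6, "0"))
    return timedelta(
        days=int(match["days"] or 0),
        hours=int(match["h"]),
        minutes=int(match["m"]),
        seconds=int(match["s"]),
        microseconds=micros,
    )


def is_stuck(row, threshold=timedelta(hours=1)):
    """Return True if a '|'-separated psql row's second column (assumed to be
    the query's running time) exceeds `threshold`."""
    columns = [col.strip() for col in row.split("|")]
    return parse_pg_interval(columns[1]) > threshold


row = ('21088 | 2 days 21:04:51.222387 | read_write | '
       'SELECT "django_migrations"."app", "django_migrations"."name" FROM "django_migrations"')
print(parse_pg_interval("2 days 21:04:51.222387"))  # 2 days, 21:04:51.222387
print(is_stuck(row))  # True
```

Splitting on "|" is a shortcut that only works because the query text here contains no pipe characters; querying the database directly would avoid parsing text at all.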

Looking at New Relic, the errors appear to stem from the fact that more recent versions of Django expect you to wrap create operations that may fail in their own transaction.atomic block:

https://docs.djangoproject.com/en/1.8/topics/db/transactions/#controlling-transactions-explicitly
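For context, Django implements a nested `transaction.atomic()` block with a database savepoint, so a failed create is rolled back on its own and the surrounding transaction stays usable. On PostgreSQL this matters because a failed statement otherwise aborts the whole transaction, and Django then refuses further queries until the enclosing `atomic` block ends. In Django the documented shape is `try: with transaction.atomic(): Model.objects.create(...)` followed by `except IntegrityError: ...`. The minimal sketch below shows the same savepoint mechanism using the standard-library `sqlite3` module as a stand-in database so it runs anywhere; the tables, `savepoint()` and `record_build()` are hypothetical and are not taken from Mission Control or the PR linked below.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def savepoint(conn, name="sp"):
    """Rough stand-in for a nested transaction.atomic(): work inside the block
    is rolled back as a unit on error, leaving the outer transaction open.
    `name` is interpolated into SQL, so it must be a trusted identifier."""
    conn.execute(f"SAVEPOINT {name}")
    try:
        yield
    except Exception:
        conn.execute(f"ROLLBACK TO SAVEPOINT {name}")
        conn.execute(f"RELEASE SAVEPOINT {name}")
        raise
    else:
        conn.execute(f"RELEASE SAVEPOINT {name}")


def record_build(conn, build_id, channel):
    """Hypothetical create step: write a log row, then insert the build.
    Returns True if the build was created, False if it already existed."""
    try:
        with savepoint(conn):
            conn.execute("INSERT INTO log (msg) VALUES (?)", (f"saw {build_id}",))
            conn.execute(
                "INSERT INTO build (build_id, channel) VALUES (?, ?)",
                (build_id, channel),
            )
        return True
    except sqlite3.IntegrityError:
        # The duplicate insert and the log row before it were rolled back together.
        return False


conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions by hand
conn.execute("CREATE TABLE build (build_id TEXT PRIMARY KEY, channel TEXT)")
conn.execute("CREATE TABLE log (msg TEXT)")

conn.execute("BEGIN")  # outer transaction, like an outer atomic() block
results = [record_build(conn, b, "release") for b in ("20170101", "20170101", "20170102")]
conn.execute("COMMIT")

print(results)  # [True, False, True]
print(conn.execute("SELECT COUNT(*) FROM build").fetchone()[0])  # 2
print(conn.execute("SELECT COUNT(*) FROM log").fetchone()[0])  # 2
```

SQLite, unlike PostgreSQL, keeps a transaction usable after a failed INSERT, so here the savepoint's effect is visible mainly in the log table: the log row written just before the duplicate insert is undone along with it, while the third build still commits.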

I'm not sure whether this explains the problems we're seeing, but I have a PR that fixes this issue; we'll see how things go after it lands:

https://github.com/mozilla/missioncontrol/pull/366

Flags: needinfo?(whd)

Oops, didn't mean to set needinfo on whd.

Flags: needinfo?(whd)

Hi Will, Marcia pointed me to this bug (thanks!). I'm wondering whether this was an external change that couldn't be tested on staging, or did we test it on staging before deploying it to MC stable?

http://missioncontrol.telemetry.mozilla.org/ vs https://data-missioncontrol.dev.mozaws.net/#/

I assume the latter is staging and the former is stable. Thoughts?

Flags: needinfo?(wlachance)

This change has not been deployed to stable; as stated above, these problems are on the development instance.

Unfortunately, we're currently also seeing some problems on the "stable" version (https://missioncontrol.telemetry.mozilla.org), so this bug has unusually high priority. Fortunately, it looks like my PR above worked: the dev instance is displaying data again. I'll run a backfill, and if everything goes smoothly for the next 24 hours, I'll deploy the fixed version to production.

Flags: needinfo?(wlachance)

Yup, that pull request fixed things nicely, so https://data-missioncontrol.dev.mozaws.net/ should now be up to date again. I'll try to get :whd to deploy it to the production site tomorrow.

Status: NEW → RESOLVED
Last Resolved: 2 months ago
Resolution: --- → FIXED