Closed
Bug 1353396
Opened 7 years ago
Closed 7 years ago
Hardware survey default view is not populating graphs
Categories
(Cloud Services :: Metrics: Product Metrics, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: thuelbert, Assigned: Dexter)
Details
(Whiteboard: [measurement:client])
1. Point your browser at https://metrics.mozilla.com/firefox-hardware-survey/
2. Eyeball the graphs.
Results: no values are shown.
Expected: the graphs are populated.
Notes: this is a regression, but I'm not sure when it broke.
Reporter
Updated•7 years ago
Assignee: nobody → alessio.placitelli
Points: --- → 1
Priority: -- → P1
Assignee
Updated•7 years ago
Whiteboard: [measurement:client]
Assignee
Comment 1•7 years ago
tl;dr - The website is up and running now, with the latest data.
This is a bit odd. As in the previous week, the job didn't fail on Airflow the last time it ran, but it produced a suspicious data unit:
> {
> "date": "2017-03-26",
> "broken": 0,
> "inactive": 1
> },
In other words, according to the ETL job, all users were inactive during the past week, which is obviously not true. In fact, re-running the job on a freshly spawned cluster produces the correct output.
Inspecting the output of the job that ran on Airflow doesn't provide any useful insight.
I'll investigate the root cause further tomorrow.
Assignee
Comment 2•7 years ago
The Airflow log provided no useful insight. To gather more evidence about the issue, I filed a PR [1] that makes the ETL job validate outgoing data before pushing it to S3; the PR also produces more informative logs. Please note that this PR doesn't fix the problem: it makes the scheduled job fail on invalid data so that we don't break the public-facing website, and an email is sent whenever the job fails. I've put together a postmortem document at [2].

[1] - https://github.com/mozilla/firefox-hardware-report/pull/26
[2] - https://docs.google.com/document/d/15n4VHHaxOBshFn3e8Eh2CkRwXF74_T8FXD71Dm1AXP8/edit
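A minimal sketch of the kind of pre-upload check described above, assuming each weekly data unit carries the `date`, `broken` and `inactive` fields shown in comment 1, and that `broken` and `inactive` are fractions of clients in [0, 1]. The names `REQUIRED_KEYS`, `validate_data_unit` and `check_before_upload` are hypothetical; this is an illustration, not the code from the PR [1].

```python
# Hypothetical names: REQUIRED_KEYS, validate_data_unit and check_before_upload
# are illustrative and not taken from the firefox-hardware-report code base.
# Assumption: "broken" and "inactive" are fractions of clients in [0, 1].
REQUIRED_KEYS = {"date", "broken", "inactive"}


def validate_data_unit(unit):
    """Return a list of problems found in one weekly data unit (empty if OK)."""
    problems = []
    missing = REQUIRED_KEYS - unit.keys()
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
        return problems
    for key in ("broken", "inactive"):
        value = unit[key]
        if not isinstance(value, (int, float)) or not 0 <= value <= 1:
            problems.append(f"{key} is not a ratio in [0, 1]: {value!r}")
    # A week in which every client counts as inactive is treated as
    # broken job output rather than as real data.
    if unit["inactive"] == 1:
        problems.append("all clients reported as inactive")
    return problems


def check_before_upload(units):
    """Raise on the first invalid unit so the scheduled job fails loudly
    instead of publishing bad data."""
    for unit in units:
        problems = validate_data_unit(unit)
        if problems:
            raise ValueError(f"invalid data for {unit.get('date')}: {problems}")
```

Running `validate_data_unit` on the data unit from comment 1 (`{"date": "2017-03-26", "broken": 0, "inactive": 1}`) returns `["all clients reported as inactive"]`, so `check_before_upload` would raise and the job would fail before anything reaches S3. Raising, rather than silently skipping the unit, is what lets the scheduler mark the run as failed and send the notification email.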
Assignee
Comment 3•7 years ago
The PR from comment 2 was merged. If something goes wrong when Airflow triggers the ETL job, we should now have enough information to figure out why it's failing.
Assignee
Comment 4•7 years ago
It looks like the fixes from bug 1355153 worked: the HW-report website updated correctly this week. Closing this as Fixed. For additional information about the outage, see the postmortem doc in comment 2.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED