Open Bug 1755307 Opened 3 years ago Updated 2 years ago

Recreate graphics_telemetry dashboard with SQL and Looker

Categories

(Data Platform and Tools :: General, task, P5)

Tracking

(Not tracked)

People

(Reporter: klukas, Unassigned, NeedInfo)

References

Details

(Whiteboard: [dataplatform])

The graphics_telemetry dashboard is populated via a Spark job and outputs static files to S3 for viewing at a URL. It started as a Databricks notebook and migrated to python_mozetl. It relies on a shim for the python_moztelemetry library as well, which was deprecated more than two years ago.

The shape of this job is pretty unlike other pieces of our modern data stack, and we've struggled to maintain it well. It has occasionally gotten into a failure state for weeks at a time, and it has required significant DE investment to debug the job and get it working again.

We should look at recreating this with our modern toolchain: ETL via SQL queries in bigquery-etl, and visualizations in Looker.

:miko - Can I get your perspective on making this change? In particular, the following questions come to mind:

  • Who is the audience for this dashboard? Does it need to be publicly accessible?
  • Can you provide a high-level spec for what the dashboard should provide? That would help untangle which parts of the Spark code are implementation details and which reflect the real intent of the logic.
Flags: needinfo?(mikokm)

JeffM can probably answer these questions better.

Flags: needinfo?(mikokm) → needinfo?(jmuizelaar)

The primary audience of the dashboard is the graphics team, though it may be used by others.

I haven't used Looker yet, but the main advantage it has over a sql.telemetry.mozilla.org dashboard is that it loads very quickly and has discoverable sub pages.

https://firefoxgraphics.github.io/telemetry/#view=hwsearch is also a pretty handy feature.

It may make sense to split the functionality into separate pieces and deprecate some of it.

Do we have a modern workflow for producing regularly updated JSON like https://analysis-output.telemetry.mozilla.org/gfx/telemetry-data/device-statistics.json?

Flags: needinfo?(jklukas)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #2)

> The primary audience of the dashboard is the graphics team, though it may be used by others.
>
> I haven't used Looker yet, but the main advantage it has over a sql.telemetry.mozilla.org dashboard is that it loads very quickly and has discoverable sub pages.
>
> https://firefoxgraphics.github.io/telemetry/#view=hwsearch is also a pretty handy feature.

Looker certainly offers more than what's available in STMO, but this is helpful context about what's important for usability here.

> It may make sense to split the functionality into separate pieces and deprecate some of it.
>
> Do we have a modern workflow for producing regularly updated JSON like https://analysis-output.telemetry.mozilla.org/gfx/telemetry-data/device-statistics.json?

We do have a process for publishing public JSON datasets, documented at https://docs.telemetry.mozilla.org/cookbooks/publishing_datasets.html

Flags: needinfo?(jklukas)

Moving this to a P5. This is good work to spec out, but we have a whole bunch of other planes to land this half, so we can't commit any resources here. It will take a data engineer a solid quarter to understand the needs here and start creating datasets that could meet them. I had recalled that this job also uses symbolication, which would not be easily reproduced in a SQL pipeline, but willkg has informed me that it does not do any symbolication.

Priority: -- → P5

I'll note that s3://telemetry-public-analysis-2 is on the chopping block for H2 (since SRE has a strong desire to remove all remaining AWS components of the pipeline), but after discussions with DE, we're going to table the "migrate to standard json datasets" approach in favor of simply moving https://analysis-output.telemetry.mozilla.org/ to point at a GCS bucket. This will be followed by a comprehensive migration of the various bespoke cases to the standard pipeline in 2023 or beyond.

(In reply to Wesley Dawson [:whd] from comment #5)

> I'll note that s3://telemetry-public-analysis-2 is on the chopping block for H2 (since SRE has a strong desire to remove all remaining AWS components of the pipeline), but after discussions with DE, we're going to table the "migrate to standard json datasets" approach in favor of simply moving https://analysis-output.telemetry.mozilla.org/ to point at a GCS bucket. This will be followed by a comprehensive migration of the various bespoke cases to the standard pipeline in 2023 or beyond.

In that case, we need to at least have this job write out to GCS instead of S3, right? Is that work being tracked elsewhere?
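
For illustration only, here is a minimal sketch of what "write out to GCS instead of S3" could look like at the path level, assuming the job names its outputs by URI and that object keys stay the same so URLs under https://analysis-output.telemetry.mozilla.org/ keep working. The bucket name `example-public-analysis` and the helper `s3_to_gcs_uri` are hypothetical, and treating the URL path `gfx/telemetry-data/device-statistics.json` as the S3 object key is also an assumption; none of this is taken from python_mozetl.

```python
# Hypothetical destination bucket; the real GCS bucket name is not given in this bug.
GCS_BUCKET = "example-public-analysis"

S3_PREFIX = "s3://"


def s3_to_gcs_uri(s3_uri: str, gcs_bucket: str = GCS_BUCKET) -> str:
    """Map s3://<bucket>/<key> to gs://<gcs_bucket>/<key>, leaving the key unchanged."""
    if not s3_uri.startswith(S3_PREFIX):
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    # Split "<bucket>/<key>" at the first slash; only the key is carried over.
    bucket, _, key = s3_uri[len(S3_PREFIX):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in S3 URI: {s3_uri!r}")
    return f"gs://{gcs_bucket}/{key}"


if __name__ == "__main__":
    # Assumed key layout, inferred from the public URL path.
    print(s3_to_gcs_uri(
        "s3://telemetry-public-analysis-2/gfx/telemetry-data/device-statistics.json"
    ))
```

Keeping the key untouched is the point of the sketch: only the storage backend changes, so consumers of the published JSON would not need to update their URLs.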

Flags: needinfo?(whd)

> In that case, we need to at least have this job write out to GCS instead of S3, right? Is that work being tracked elsewhere?

Yes, I meant to link to it: this is being tracked in https://mozilla-hub.atlassian.net/browse/DSRE-951

Flags: needinfo?(whd)
Whiteboard: [data-platform-infra-wg] → [dataplatform]

Jeff, the job recently broke, and we wanted to confirm that you all are still using the graphics telemetry dashboards (in which case we will un-archive python_moztelemetry and put a fix in).

Flags: needinfo?(jmuizelaar)