Closed Bug 1647328 Opened 5 years ago Closed 2 years ago

Convert performance release criteria query into incremental query

Categories

(Data Platform and Tools :: General, enhancement, P1)

enhancement
Points:
3

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: esmyth, Assigned: relud)

Details

Attachments

(3 files)

Request is to convert my WIP v3 performance release criteria query to an incremental query.

The query accomplishes several tasks:

  1. Query the perfherder table for in-scope test results,
  2. calculate or normalize a few fields (e.g. os name from platform name),
  3. flattens the data to the replicate level,
  4. joins the flattened table with a gsheet-backed table that defines test aggregation,
  5. aggregates replicate values, (as defined by the fields from the joined rc definition table), to the subtest level,
  6. aggregates the new subtests values to the test level,
  7. calculates a weekly moving average, value for the previous release, and % change for both.

I'm still working through some correctness issues, however, I wanted to get this on file to track the conversion.

Assignee: nobody → dthorn
Status: NEW → ASSIGNED
Points: --- → 3
Priority: -- → P1
Attached file view.sql

:esmyth I looked at the linked script, and when converted to a single query with CTEs it only scans ~42GB of data ($0.21 per run) with weeks_to_query = 52 and release_criteria_helper_exclude_key = "include".

If I'm not missing anything, this is so cheap that perhaps this would be well suited to being created as a BigQuery View (attached), rather than converting it to an incremental query?

Points: 3 → 2

:relud, as suggested, I tried the view approach (query, dashboard). This works, however it means that the page load for the initial dashboard load, any page change on the dashboard, and any filtering on the dashboard takes 2+ minutes as the queries execute in full. When test_results_all and test_results_summary are tables the initial page load is less than 10 seconds and changes are generally as fast or faster. Do you have any suggestions for how to improve the performance for the dashboard?

Flags: needinfo?(dthorn)

When test_results_all and test_results_summary are tables the initial page load is less than 10 seconds and changes are generally as fast or faster. Do you have any suggestions for how to improve the performance for the dashboard?

As I understand it, making those two into tables is really the only option when using Data Studio report. It seems reasonable to change the CREATE OR REPLACE VIEW statements for those into CREATE OR REPLACE TABLE statements, and schedule it to run daily.

While it's still under development, I'd recommend scheduling via the Schedule Query button in the BigQuery console. When it's ready for production, we can add it to https://github.com/bigquery-etl for code review and scheduling.

Flags: needinfo?(dthorn)

What are the next steps here?

Eric, are you ready for this to get scheduled?

Flags: needinfo?(esmyth)

I was about 90% done with the query at the end of July and it fell of my radar since. My goal is to finish the queries by EOM.

I'm not sure if it would make sense to perform another sanity check of my work before I handover for scheduling. The master query is here and the created tables and views all reside in the moz-fx-data-bq-performance:release_criteria dataset.

Flags: needinfo?(esmyth)

:relud, I think we're good for now. Revisions can be handled later as needed.

Not all of the tables and views created by this query are needed once converted. I used them to either work around memory constraints or to facilitate data validation and debugging. The following tables should be available once converted: rc_flattened_test_data, release_criteria, release_criteria_summary, stale_tests, and dashboard_health

Flags: needinfo?(dthorn)

cool, i'll work on converting that to a PR in bigquery-etl

Flags: needinfo?(dthorn)

Hey :relud, could you provide an estimate for when this work will be complete or the pr ready for review?

Flags: needinfo?(dthorn)

PR up for review

Flags: needinfo?(dthorn)

this is blocked on bug 1697646 and bug 1697652 for granting ETL sufficient permissions to run the queries

Points: 2 → 3
Component: Datasets: General → General
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: