Closed Bug 1592930 Opened 5 years ago Closed 5 years ago

Filter out weird buckets coming from Glean SDK < v19.0.0

Categories

(Data Platform and Tools :: General, defect, P2)

Points: 3

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Dexter, Assigned: klukas)

Attachments

(1 file)

In bug 1591938 we found that the Kotlin implementation of the Glean SDK generates buckets that are only 1 ns apart on Android SDK 22. This may be due to a different rounding implementation on that platform.
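As a rough illustration of what "buckets that are only 1 ns apart" looks like in the data, here is a minimal Python sketch that scans the lower bounds of a histogram's buckets and reports neighbouring pairs that sit suspiciously close together. The function name, the `max_gap_ns` parameter, and the sample values are hypothetical; this is not how the issue was diagnosed in bug 1591938.

```python
from typing import Iterable, List, Tuple


def find_adjacent_buckets(
    bucket_starts_ns: Iterable[int], max_gap_ns: int = 1
) -> List[Tuple[int, int]]:
    """Return neighbouring bucket lower bounds that are at most max_gap_ns apart."""
    # Deduplicate and sort so that each bucket is compared with its successor.
    ordered = sorted(set(bucket_starts_ns))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a <= max_gap_ns]


# Hypothetical bucket lower bounds in nanoseconds (illustrative values only).
suspicious = find_adjacent_buckets([1000, 1001, 2000, 4000, 4001])
```

A non-empty result only hints at the rounding problem; it does not say which of the two neighbouring buckets is the correct one.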

This problem will go away with Glean SDK version 19.0.0, which implements this part in Rust.

We decided to filter out this flaky bucket data from the views.

See Also: → 1591938

(In reply to Alessio Placitelli [:Dexter] from comment #0)

> We decided to filter out this flaky bucket data from the views.

Can you expand a bit more on this?

Flags: needinfo?(alessio.placitelli)

Hey Mike, can you add some more info about how to identify flaky buckets and how to filter them?

Flags: needinfo?(alessio.placitelli) → needinfo?(mdroettboom)

If we just want to remove the incorrectly-collected data, we would just remove all pings where client_info.telemetry_sdk_version < 19.0.0.
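The version check described here could be sketched in Python as follows. The helper names are hypothetical, and treating a missing or unparsable version as affected is an assumption made for this sketch, not something decided in the bug.

```python
from typing import Optional

# Per this bug, 19.0.0 is the first release where bucketing is done in Rust.
FIRST_FIXED_MAJOR = 19


def sdk_major_version(version: Optional[str]) -> Optional[int]:
    """Return the leading integer component of a version string, or None."""
    if not version:
        return None
    try:
        return int(version.split(".")[0])
    except ValueError:
        return None


def is_affected(version: Optional[str]) -> bool:
    """True when a ping should be treated as coming from an SDK older than 19.0.0.

    Assumption: missing or unparsable versions count as affected.
    """
    major = sdk_major_version(version)
    return major is None or major < FIRST_FIXED_MAJOR
```

Comparing the parsed major version rather than the raw strings matters: a plain string comparison would sort "9.0.0" after "19.0.0".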

To actually correct the buckets, I could probably manually generate a mapping by looking at broken vs. correct bucketing and we could update the numbers. Even then, we'd be introducing a very small amount of error, due to the differences in the original buckets. Since this only affects GV data which is pretty new anyway, I don't know if it's worth the trouble, though.

Flags: needinfo?(mdroettboom)

> Since this only affects GV data which is pretty new anyway, I don't know if it's worth the trouble, though.

What tables or products are affected by this? All android products?

> If we just want to remove the incorrectly-collected data, we would just remove all pings where client_info.telemetry_sdk_version < 19.0.0.

I assume we'll still be receiving some amount of pings with the old version for some time. Is that true?

Assignee: nobody → jklukas
Points: --- → 3
Priority: -- → P2

:Dexter - It is still unclear to me what criteria we want to use to filter out affected data. The easiest path seems to be to filter in user-facing views based on client_info.telemetry_sdk_version < 19.0.0, and I am inclined to move forward with that, but I still need to know whether all Glean tables are affected or just particular products.

Flags: needinfo?(alessio.placitelli)

Hey Mike, any chance you could fill in the details for Jeff? (see comment 5)

Flags: needinfo?(alessio.placitelli) → needinfo?(mdroettboom)

Comment 5 seems right to me, assuming we are just doing that for the histograms (and not losing other data from those old pings).

Flags: needinfo?(mdroettboom)

(In reply to Michael Droettboom [:mdroettboom] from comment #7)

> Comment 5 seems right to me, assuming we are just doing that for the histograms (and not losing other data from those old pings).

Based on this discussion, it sounds like we need to alter all metrics views similar to the following:

SELECT
  * REPLACE (
    (
      SELECT AS STRUCT
        metrics.* REPLACE (
          -- Keep timing distributions only when the SDK major version is >= 19
          IF(
            SAFE_CAST(SPLIT(client_info.telemetry_sdk_build, '.')[OFFSET(0)] AS INT64) >= 19,
            metrics.timing_distribution,
            NULL
          ) AS timing_distribution
        )
    ) AS metrics
  )
FROM
  `moz-fx-data-shared-prod.org_mozilla_fenix_stable.metrics_v1` AS m
WHERE
  DATE(submission_timestamp) = "2019-12-12"
LIMIT
  1000
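For readers less familiar with `SELECT * REPLACE`, the per-row effect of the query can be approximated in Python. This is a sketch only: rows are modelled as plain nested dicts, the function name is hypothetical, and Python's `int()` is only a rough stand-in for `SAFE_CAST(... AS INT64)`.

```python
import copy
from typing import Any, Dict


def mask_timing_distribution(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of row with metrics.timing_distribution nulled for SDK major < 19."""
    out = copy.deepcopy(row)
    version = (row.get("client_info") or {}).get("telemetry_sdk_build")
    try:
        major = int(str(version).split(".")[0])
    except ValueError:
        major = None  # roughly mirrors SAFE_CAST returning NULL
    # In SQL, NULL >= 19 evaluates to NULL, so IF() takes its else branch.
    if major is None or major < 19:
        metrics = out.get("metrics")
        if isinstance(metrics, dict) and "timing_distribution" in metrics:
            metrics["timing_distribution"] = None
    return out


# Hypothetical row; only timing_distribution should be hidden.
old_row = {
    "client_info": {"telemetry_sdk_build": "18.0.7"},
    "metrics": {"counter": {"a": 1}, "timing_distribution": {"t": {"sum": 5}}},
}
masked = mask_timing_distribution(old_row)
```

All other fields, including the remaining metric types, pass through unchanged, which is the point of replacing a single nested field instead of dropping whole pings.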

A few assumptions in there that I'd like to have validated:

  • Only metrics pings contain histograms
  • All histograms appear under metrics.timing_distribution
  • All metrics pings will contain a nested timing_distribution field

If any of the above is not correct, then it may not be feasible to do this generically as part of view generation, and we'll instead have to target individual tables for which we want to provide this filtering in their views. That said, we can probably check whether a metrics.timing_distribution field exists as part of the logic for creating views.
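A check for the existence of `metrics.timing_distribution` could look roughly like the sketch below. It assumes the table schema is available in the BigQuery JSON schema layout (a list of field objects with `name`, `type`, and nested `fields`); the function name and the sample schema are hypothetical and not taken from the actual view-generation code.

```python
from typing import Any, Dict, List, Sequence


def has_field(schema: List[Dict[str, Any]], path: Sequence[str]) -> bool:
    """Return True if the nested field path exists in a BigQuery-style JSON schema."""
    fields = schema
    for name in path:
        match = next((f for f in fields if f.get("name") == name), None)
        if match is None:
            return False
        # Descend into the nested RECORD fields, if there are any.
        fields = match.get("fields", [])
    return True


# Hypothetical, heavily trimmed schema for a metrics table.
sample_schema = [
    {"name": "client_info", "type": "RECORD", "fields": [
        {"name": "telemetry_sdk_build", "type": "STRING"},
    ]},
    {"name": "metrics", "type": "RECORD", "fields": [
        {"name": "timing_distribution", "type": "RECORD", "fields": []},
    ]},
]
apply_replacement = has_field(sample_schema, ["metrics", "timing_distribution"])
```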

> A few assumptions in there that I'd like to have validated:
>
>   • Only metrics pings contain histograms

In general, this is not true; however in this specific case, it is.

>   • All histograms appear under metrics.timing_distribution

Again, in general, not true; however, that appears to be the only piece that needs hiding.

>   • All metrics pings will contain a nested timing_distribution field

For Fenix and Fenix nightly, the metrics ping schema has a timing_distribution field. That is again not true in general.

Does that clear things up? Solving this specific case is all I believe is necessary here.

> Does that clear things up? Solving this specific case is all I believe is necessary here.

Based on what you just said, we only need to worry about org_mozilla_fenix*.metrics. I plan to apply the replacement as in the query above and restrict it to metrics pings from Fenix products.
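Restricting the replacement to `org_mozilla_fenix*.metrics` could be expressed as a simple predicate during view generation. The sketch below is an assumption about how such a filter might look, not the deployed change; the function name and the table-name convention (`metrics` or `metrics_v<N>`) are hypothetical.

```python
from fnmatch import fnmatchcase


def needs_histogram_filter(dataset: str, table: str) -> bool:
    """True for metrics tables in datasets matching org_mozilla_fenix*."""
    is_fenix = fnmatchcase(dataset, "org_mozilla_fenix*")
    # Assumed naming convention: "metrics" for views, "metrics_v<N>" for tables.
    is_metrics = table == "metrics" or table.startswith("metrics_v")
    return is_fenix and is_metrics


# Dataset and table names other than the first pair are illustrative.
targets = [
    ("org_mozilla_fenix_stable", "metrics_v1"),
    ("org_mozilla_fenix", "baseline_v1"),
    ("org_mozilla_reference_browser", "metrics_v1"),
]
selected = [t for t in targets if needs_histogram_filter(*t)]
```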

The view change is now deployed. Closing.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED