crash_aggregates' dimensions['channel'] has an awful lot of 'Other'

RESOLVED DUPLICATE of bug 1352091

Status

P2
normal
2 years ago
2 months ago

People

(Reporter: chutten, Assigned: mdoglio)

(Reporter)

Description

2 years ago
Queries (like [1]) looking at channels in crash_aggregates come up with the usual suspects (aurora, beta, nightly, release) and the usual "whatever" (no, default, government).

There's an awful lot of 'Other', though. It's the fifth most popular (after the usual suspects) in terms of rows in the dataset. It's the third most popular (only beaten by release and beta) in terms of usage_khours.

Does it mean ESR? If so, why isn't it part of the actual "esr" row? What does it mean? Is it just some clients messing with us?

[1]: https://sql.telemetry.mozilla.org/queries/1093/source

Updated

2 years ago
Points: --- → 1
Priority: -- → P2
Something funky is going on here. I think maybe one of the source datasets (main pings?) is using normalizedChannel (as computed upstream at [1]), while another (crash pings?) is using the raw channel name.

I'm not all that familiar with the CrashAggregateView code, but I did see a reference to normalizedChannel at [2]. Clearly something is using the pre-normalized value.

[1] https://github.com/mozilla-services/data-pipeline/blob/master/hindsight/modules/fx.lua#L23
[2] https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/CrashAggregateView.scala#L80
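To illustrate the suspected mismatch, here is a minimal sketch. It assumes a hypothetical `normalize_channel` that collapses every unrecognised name to 'Other'; the real logic lives upstream at [1] and may differ. The pings are made up. It only shows how aggregating one source by normalized channel and the other by raw channel produces keys that no longer line up.

```python
from collections import Counter

# Assumed set of recognised channels; the authoritative list is upstream at [1].
KNOWN_CHANNELS = {"release", "beta", "aurora", "nightly"}


def normalize_channel(raw):
    """Hypothetical normalizer: anything unrecognised collapses to 'Other'."""
    return raw if raw in KNOWN_CHANNELS else "Other"


def channel_counts(pings, normalize):
    """Count pings per channel, normalizing the raw name first if asked."""
    return Counter(
        normalize_channel(p["channel"]) if normalize else p["channel"]
        for p in pings
    )


# Made-up pings, for illustration only.
main_pings = [{"channel": "release"}, {"channel": "esr"}, {"channel": "default"}]
crash_pings = [{"channel": "release"}, {"channel": "esr"}]

main_counts = channel_counts(main_pings, normalize=True)     # normalized names
crash_counts = channel_counts(crash_pings, normalize=False)  # raw names

# Channels present on one side only cannot be paired up in the aggregates.
mismatched = set(main_counts) ^ set(crash_counts)
```

With this toy input, `mismatched` contains both 'Other' and 'esr': the same clients land in different buckets depending on which source they came from.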
Flags: needinfo?(mdoglio)
(Assignee)

Comment 2

2 years ago
Digging into this now
Assignee: nobody → mdoglio
Flags: needinfo?(mdoglio)
(Assignee)

Comment 3

2 years ago
It looks like we do indeed have a lot of esr, at least in the crash pings. See [1] for details. I don't know whether all those `default` values are expected, but we should probably assign them to a separate bucket as well.
Please let me know if you need me to investigate further.

[1] https://gist.github.com/maurodoglio/6d271cc5849b6655e47ef86d87a1517f
(Reporter)

Comment 4

2 years ago
So... normalizedChannel has 5 values? {release, beta, aurora, nightly, Other}

Could we add esr to it? esr seems to be a perfectly-valid channel we could reason about.
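A rough sketch of what adding esr would change, assuming normalization is a plain allow-list with an 'Other' fallback (an assumption; `make_normalizer` and the sample channel names are hypothetical, not the pipeline's actual code):

```python
# Assumed current allow-list, taken from the five values listed above minus 'Other'.
BASE_CHANNELS = ("release", "beta", "aurora", "nightly")


def make_normalizer(known):
    """Build a normalizer that maps any name outside `known` to 'Other'."""
    known = frozenset(known)

    def normalize(raw):
        return raw if raw in known else "Other"

    return normalize


current = make_normalizer(BASE_CHANNELS)
proposed = make_normalizer(BASE_CHANNELS + ("esr",))

# Made-up raw channel names, for illustration only.
raw_channels = ["release", "esr", "default", "esr", "nightly"]

before = [current(c) for c in raw_channels]
after = [proposed(c) for c in raw_channels]
```

Under these assumptions only the esr entries move out of 'Other'; names like `default` still fall through to it.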
(Assignee)

Comment 5

2 years ago
My only concern about changing the normalizedChannel configuration is that there may be jobs or re:dash queries that don't expect the new value. :mreid, do you have an opinion on that?
Flags: needinfo?(mreid)
It seems likely that there are things relying on the current behaviour of normalizedChannel, but I think the absence of ESR is a mistake that we should fix. So +1 on adding ESR, even if it impacts other queries.

We should also use the same "channel" strategy for both crashes and main pings in the aggregates.
Flags: needinfo?(mreid)

Updated

2 years ago
Summary: crash_aggrregates' dimensions['channel'] has an awful lot of 'Other' → crash_aggregates' dimensions['channel'] has an awful lot of 'Other'
Component: Metrics: Pipeline → Datasets: Crash Aggregates
Product: Cloud Services → Data Platform and Tools
(Reporter)

Updated

2 years ago
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1352091

Updated

2 months ago
Product: Data Platform and Tools → Data Platform and Tools Graveyard