Closed Bug 1297196 Opened 9 years ago Closed 9 years ago

French locale discrepancy between client_count dataset and other datasets (FHRv2, main_summary)

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kparlante, Assigned: rvitillo)

Details

Attachments

(1 file)

A discussion in Slack (#fx-metrics) revealed a discrepancy between a re:dash query using client_count and other datasets.

RT noticed that the locales of French users have an unexpected breakdown in client_count:
https://sql.telemetry.mozilla.org/queries/1034/source

Saptarshi's FHRv2 analysis had 96% 'fr' locale:
https://gist.github.com/saptarshiguha/2c0a4cf21d1776aa525def09f81457ba
https://docs.google.com/spreadsheets/d/1ypSpS9SLx5PK3hlpRoxNUEWKf53vFfk9oAYKXkn0xj8/edit#gid=2037446830
https://metrics.mozilla.com/protected/sguha/ctrylocale/

spenrose saw something similar in a notebook looking at main_summary (attached).
Summary: French locale discrepancy between client_count dataset and other datasets (ADI, main_summary) → French locale discrepancy between client_count dataset and other datasets (FHRv2, main_summary)
Points: --- → 1
Priority: -- → P1
(In reply to Katie Parlante from comment #0)
> RT noticed the locales of French users has an unexpected breakdown in
> client_count:
> https://sql.telemetry.mozilla.org/queries/1034/source

That query is incorrect as it's counting the number of combinations of dimensions and not the number of profiles. See [1] for a correct version.

[1] - https://sql.telemetry.mozilla.org/queries/1057/source
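To make the distinction in the comment above concrete, here is a minimal Python sketch of why counting rows of a roll-up table differs from counting profiles. Everything in it is an assumption made for illustration: the `RollupRow` layout, the dimension names, the client ids, and the use of plain Python sets in place of whatever compact cardinality sketch the real client_count table stores. It is not the schema of client_count and not the logic of either linked query.

```python
from collections import namedtuple

# Hypothetical roll-up row: one row per combination of dimensions, carrying
# the set of client ids seen for that combination. A plain set stands in for
# whatever cardinality sketch a real roll-up table would store.
RollupRow = namedtuple("RollupRow", ["country", "locale", "channel", "clients"])

# Made-up data: five 'fr' profiles spread over two rows, and a single
# 'en-US' profile that shows up under three different channels.
ROWS = [
    RollupRow("FR", "fr", "release", {"a", "b", "c", "d"}),
    RollupRow("FR", "fr", "beta", {"d", "e"}),
    RollupRow("FR", "en-US", "release", {"f"}),
    RollupRow("FR", "en-US", "beta", {"f"}),
    RollupRow("FR", "en-US", "nightly", {"f"}),
]


def rows_per_locale(rows, country):
    """Count roll-up rows per locale, i.e. combinations of dimensions."""
    counts = {}
    for row in rows:
        if row.country == country:
            counts[row.locale] = counts.get(row.locale, 0) + 1
    return counts


def profiles_per_locale(rows, country):
    """Merge the per-row client sets first, then count distinct profiles."""
    merged = {}
    for row in rows:
        if row.country == country:
            merged.setdefault(row.locale, set()).update(row.clients)
    return {locale: len(clients) for locale, clients in merged.items()}


if __name__ == "__main__":
    print(rows_per_locale(ROWS, "FR"))      # {'fr': 2, 'en-US': 3}
    print(profiles_per_locale(ROWS, "FR"))  # {'fr': 5, 'en-US': 1}
```

In this toy data the row count suggests that 'en-US' dominates, while merging the per-row sets shows that 'fr' accounts for five of the six profiles. The same kind of inversion is what a row-counting query over a roll-up table can produce.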
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
If a data source makes it easy to get a misleading answer with a query that is not obviously buggy, I do not think the underlying issue should be considered "Resolved Fixed."

I have been writing SQL for years and I cannot explain the semantics of the correct version.

(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #1)
> (In reply to Katie Parlante from comment #0)
> > RT noticed the locales of French users has an unexpected breakdown in
> > client_count:
> > https://sql.telemetry.mozilla.org/queries/1034/source
>
> That query is incorrect as it's counting the number of combinations of
> dimensions and not the number of profiles. See [1] for a correct version.
>
> [1] - https://sql.telemetry.mozilla.org/queries/1057/source
(In reply to Sam Penrose from comment #2)
> If a data source makes it easy to get a misleading answer with a query that
> is not obviously buggy, I do not think the underlying issue should be
> considered "Resolved Fixed."
>
> I have been writing SQL for years and I cannot explain the semantics of the
> correct version.

The client_count table does not contain individual measurements, but roll-ups; you might want to read [1]. We have several bugs open to improve our current documentation and one of the goals of Ryan Harter for this quarter is to describe the various datasets and when one should use one or the other.

That said, catching these bugs is precisely why we do peer-reviews. Generally the reviewer should have some experience with the dataset being queried. I am not sure how you ended up reviewing that query since you clearly have never dealt with the client_count dataset before.

[1] https://robertovitillo.com/2016/04/12/measuring-product-engagment-at-scale/
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #3)
> (In reply to Sam Penrose from comment #2)
> > If a data source makes it easy to get a misleading answer with a query that
> > is not obviously buggy, I do not think the underlying issue should be
> > considered "Resolved Fixed."
> >
> > I have been writing SQL for years and I cannot explain the semantics of the
> > correct version.
>
> The client_count table does not contain individual measurements, but
> roll-ups; you might want to read [1]. We have several bugs open to improve
> our current documentation and one of the goals of Ryan Harter for this
> quarter is to describe the various datasets and when one should use one or
> the other.
>
> That said, catching these bugs is precisely why we do peer-reviews.
> Generally the reviewer should have some experience with the dataset being
> queried. I am not sure how you ended up reviewing that query since you
> clearly have never dealt with the client_count dataset before.

A colleague used an official company channel to ask for help. I helped him.

> [1]
> https://robertovitillo.com/2016/04/12/measuring-product-engagment-at-scale/

If your colleagues should read that, perhaps it belongs on the company's Mana or Wiki.
(In reply to Sam Penrose from comment #4)
> If your colleagues should read that, perhaps it belongs on the company's
> Mana or Wiki.

Agreed.
Product: Cloud Services → Cloud Services Graveyard