Closed
Bug 1297196
Opened 9 years ago
Closed 9 years ago
French locale discrepancy between client_count dataset and other datasets (FHRv2, main_summary)
Categories: Cloud Services Graveyard :: Metrics: Pipeline, defect, P1
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: kparlante, Assigned: rvitillo
Attachments: 1 file (2.87 KB, text/plain)
Discussion in Slack (#fx-metrics) revealed a discrepancy between a re:dash query using client_count and other datasets.
RT noticed that the locales of French users have an unexpected breakdown in client_count:
https://sql.telemetry.mozilla.org/queries/1034/source
Saptarshi's FHRv2 analysis had 96% 'fr' locale:
https://gist.github.com/saptarshiguha/2c0a4cf21d1776aa525def09f81457ba
https://docs.google.com/spreadsheets/d/1ypSpS9SLx5PK3hlpRoxNUEWKf53vFfk9oAYKXkn0xj8/edit#gid=2037446830
https://metrics.mozilla.com/protected/sguha/ctrylocale/
spenrose saw something similar in a notebook looking at main_summary (attached).
Reporter
Updated•9 years ago
Summary: French locale discrepancy between client_count dataset and other datasets (ADI, main_summary) → French locale discrepancy between client_count dataset and other datasets (FHRv2, main_summary)
Assignee
Updated•9 years ago
Points: --- → 1
Priority: -- → P1
Assignee
Comment 1•9 years ago
(In reply to Katie Parlante from comment #0)
> RT noticed the locales of French users has an unexpected breakdown in
> client_count:
> https://sql.telemetry.mozilla.org/queries/1034/source
That query is incorrect as it's counting the number of combinations of dimensions and not the number of profiles. See [1] for a correct version.
[1] - https://sql.telemetry.mozilla.org/queries/1057/source
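To make the distinction concrete, here is a minimal Python sketch. The rows, field names, and client ids are invented for illustration, and plain Python sets stand in for whatever mergeable per-row summary the real client_count table stores; this is not the query at [1]. It contrasts counting rows per locale (combinations of dimensions) with merging the per-row summaries and taking the cardinality (profiles).

```python
from collections import defaultdict

# Hypothetical roll-up rows: one row per combination of dimensions.
# "clients" is a stand-in for the per-row summary of profiles; the real
# table's representation is an assumption not covered by this sketch.
ROLLUP = [
    {"country": "FR", "locale": "fr", "channel": "release", "clients": {"a", "b", "c"}},
    {"country": "FR", "locale": "fr", "channel": "beta", "clients": {"c"}},
    {"country": "FR", "locale": "en-US", "channel": "release", "clients": {"d"}},
    {"country": "FR", "locale": "en-US", "channel": "beta", "clients": {"d"}},
    {"country": "FR", "locale": "en-US", "channel": "nightly", "clients": {"d"}},
]


def rows_per_locale(rows):
    """Misleading: counts dimension combinations, like COUNT(*) grouped by locale."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["locale"]] += 1
    return dict(counts)


def profiles_per_locale(rows):
    """Intended: merge the per-row client summaries, then take the cardinality."""
    merged = defaultdict(set)
    for row in rows:
        merged[row["locale"]] |= row["clients"]
    return {locale: len(clients) for locale, clients in merged.items()}


if __name__ == "__main__":
    print(rows_per_locale(ROLLUP))      # {'fr': 2, 'en-US': 3}
    print(profiles_per_locale(ROLLUP))  # {'fr': 3, 'en-US': 1}
```

In this toy data a single en-US profile that shows up on three channels produces three rows, so the row count makes en-US look like the majority locale even though three of the four profiles are fr.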
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 2•9 years ago
If a data source makes it easy to get a misleading answer with a query that is not obviously buggy, I do not think the underlying issue should be considered "Resolved Fixed."
I have been writing SQL for years and I cannot explain the semantics of the correct version.
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #1)
> (In reply to Katie Parlante from comment #0)
> > RT noticed the locales of French users has an unexpected breakdown in
> > client_count:
> > https://sql.telemetry.mozilla.org/queries/1034/source
>
> That query is incorrect as it's counting the number of combinations of
> dimensions and not the number of profiles. See [1] for a correct version.
>
> [1] - https://sql.telemetry.mozilla.org/queries/1057/source
Assignee
Comment 3•9 years ago
(In reply to Sam Penrose from comment #2)
> If a data source makes it easy to get a misleading answer with a query that
> is not obviously buggy, I do not think the underlying issue should be
> considered "Resolved Fixed."
>
> I have been writing SQL for years and I cannot explain the semantics of the
> correct version.
The client_count table does not contain individual measurements, but roll-ups; you might want to read [1]. We have several bugs open to improve our current documentation and one of the goals of Ryan Harter for this quarter is to describe the various datasets and when one should use one or the other.
That said, catching these bugs is precisely why we do peer reviews. Generally the reviewer should have some experience with the dataset being queried. I am not sure how you ended up reviewing that query since you clearly have never dealt with the client_count dataset before.
[1] https://robertovitillo.com/2016/04/12/measuring-product-engagment-at-scale/
Comment 4•9 years ago
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #3)
> (In reply to Sam Penrose from comment #2)
> > If a data source makes it easy to get a misleading answer with a query that
> > is not obviously buggy, I do not think the underlying issue should be
> > considered "Resolved Fixed."
> >
> > I have been writing SQL for years and I cannot explain the semantics of the
> > correct version.
>
> The client_count table does not contain individual measurements, but
> roll-ups; you might want to read [1]. We have several bugs open to improve
> our current documentation and one of the goals of Ryan Harter for this
> quarter is to describe the various datasets and when one should use one or
> the other.
>
> That said, catching these bugs is precisely why we do peer-reviews.
> Generally the reviewer should have some experience with the dataset being
> queried. I am not sure how you ended up reviewing that query since you
> clearly have never dealt with the client_count dataset before.
A colleague used an official company channel to ask for help. I helped him.
> [1]
> https://robertovitillo.com/2016/04/12/measuring-product-engagment-at-scale/
If your colleagues should read that, perhaps it belongs on the company's Mana or Wiki.
Assignee
Comment 5•9 years ago
(In reply to Sam Penrose from comment #4)
> If your colleagues should read that, perhaps it belongs on the company's
> Mana or Wiki.
Agreed.
Updated•7 years ago
Product: Cloud Services → Cloud Services Graveyard