Focus Android reports 200k DAUs more on Looker (legacy) than redash (Glean)
Categories
(Data Platform and Tools :: General, defect)
Tracking
(Not tracked)
People
(Reporter: Dexter, Unassigned)
References
Details
This bug was filed because of this conversation with rtestard.
This Looker explore consistently shows 200k more DAUs than this query on the Glean 'metrics' ping (a similar query on the 'baseline' ping shows comparable numbers).
A mismatch of this magnitude is unexpected, so we need to understand what's going on.
My quick data lineage investigation:
I found that this dashboard is using the telemetry.nondesktop_clients_last_seen table, which is defined in bigquery-etl here, which in turn is using telemetry_derived.core_clients_last_seen, which is based on core_clients_daily, which uses telemetry.core, aka legacy telemetry.
Talking to Jan-Erik, he suggested a few hypotheses:
- Does the legacy dataset contain multiple channels that are not being filtered on?
- Both apps (Focus & Klar) marked as the same in the legacy dataset (whereas it's 2 different datasets for Glean)?
Comment 1•3 years ago
When breaking down DAU per country, the main difference between org_mozilla_focus.metrics (https://sql.telemetry.mozilla.org/queries/85845/source#212561) and telemetry.core_clients_daily (https://sql.telemetry.mozilla.org/queries/85711/source#212229) seems to be Indonesia. The Android Play Console reports that users in Indonesia make up a very low share, whereas legacy telemetry reports that 40% of DAU comes from ID.
Reporter | ||
Comment 2•3 years ago
Thank you Romain! This is indeed a good lead for this investigation. I slightly tweaked your query to only report the numbers without Indonesia, and they roughly match now, see here. While this is promising, and the number discrepancy is within the expected range, we should probably still check how things are handled on the dataset side and if there's any specific way to tell if it's a Focus fork.
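The per-country comparison described above can be sketched roughly as follows. This is a minimal illustration, not the actual Redash queries: the row shape `(client_id, country)` and the function names `dau_by_country` and `compare_dau` are assumptions made for the example.

```python
from collections import Counter


def dau_by_country(rows):
    """Count distinct client ids per country for one day of pings.

    `rows` is an iterable of (client_id, country) tuples (assumed shape).
    """
    seen = set()
    counts = Counter()
    for client_id, country in rows:
        if (client_id, country) not in seen:
            seen.add((client_id, country))
            counts[country] += 1
    return counts


def compare_dau(legacy_rows, glean_rows, exclude=()):
    """Return legacy DAU minus Glean DAU per country, skipping `exclude`."""
    legacy = dau_by_country(legacy_rows)
    glean = dau_by_country(glean_rows)
    countries = (set(legacy) | set(glean)) - set(exclude)
    # Counter returns 0 for countries missing from one side.
    return {c: legacy[c] - glean[c] for c in sorted(countries)}
```

Passing `exclude=("ID",)` mirrors the tweak of dropping Indonesia before checking whether the remaining numbers roughly match.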
Reporter | ||
Comment 3•3 years ago
I was just shown this document and its companion focus investigation which seems to confirm what you found out by yourself, Romain! Congrats :-)
Good news is that Glean is not inflated by that. The second good news is that we might be able to get some insights about what the fork is:
- In Focus, a Glean ping records the client id from legacy telemetry
- This allows us to roughly identify the fork by looking at the clients in Indonesia that are reported in legacy telemetry but not by Glean.
Maybe the resulting pings would allow us to understand what's going on :)
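The idea in the list above amounts to an anti-join between the two datasets. A minimal sketch, assuming legacy rows are available as `(client_id, country)` tuples and that the legacy client ids echoed in Glean pings have already been collected into an iterable; the function name `legacy_only_clients` is hypothetical.

```python
def legacy_only_clients(legacy_rows, glean_legacy_ids, country="ID"):
    """Return legacy client ids seen in `country` that Glean never reported.

    legacy_rows: iterable of (client_id, country) tuples (assumed shape).
    glean_legacy_ids: legacy client ids as recorded on the Glean side.
    """
    known = set(glean_legacy_ids)
    return sorted(
        {cid for cid, c in legacy_rows if c == country and cid not in known}
    )
```

The resulting ids are candidates for fork clients; their legacy pings would then be the ones to inspect.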
Comment 4•3 years ago
The metric that :Dexter points out is only recorded on the deletion-request ping, so I doubt it will be of much use in connecting legacy to Glean data. I happen to know that there is an identifier that might be more suitable for this: https://dictionary.telemetry.mozilla.org/apps/focus_android/metrics/activation_activation_id
This "activation_id" is a synonym for the legacy client id, recorded in Focus Android here: https://searchfox.org/mozilla-mobile/source/focus-android/app/src/main/java/org/mozilla/focus/telemetry/ActivationPing.kt#63
Reporter | ||
Comment 5•3 years ago
I took a short detour and tried to understand a bit more about this. Here's my understanding so far:
- the Android Console shows a big mismatch between its install data and what legacy telemetry reports; since the Android Console exclusively shows data for owned products/app ids, this corroborates the theory that forks are sending legacy telemetry data, which Glean filters out by design;
- this query on legacy data shows that most of the data comes from Focus Android versions < 20. That's interesting because:
- Before starting to use Gecko version numbers (e.g. around v 90), the last version number used by Focus Android was 8.18.0. Mozilla never used, for example, version 19.
- A significant amount of data is coming from Focus Android version 19
- This query was an initial attempt to isolate forks using Glean. It looks like potential forks have app build ids that are radically different from those of non-forks (e.g. non-fork ids look like 361400349, while fork ids are just two digits, usually 20)
- The value format for the default search engine differs between potential forks and non-forks. This seems to indicate that potential forks are stuck on an old, diverging codebase that was forked a while back from the main Focus Android codebase.
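The signals listed above could be combined into a rough fork heuristic. The sketch below is illustrative only: the function name `looks_like_fork` and the exact thresholds (major version above 8 and below 20, build id of at most two digits) are assumptions derived from the observations in this comment, not a rule taken from the actual queries.

```python
def looks_like_fork(app_version, build_id):
    """Flag a ping as a potential fork based on version and build id.

    Assumed signals:
    - major version above 8 and below 20, a range Mozilla never shipped
      (the last pre-Gecko version was 8.18.0);
    - a build id of at most two digits (non-fork ids look like 361400349).
    """
    try:
        major = int(str(app_version).split(".")[0])
    except ValueError:
        major = None  # unparsable version: rely on the build id alone

    unreleased_version = major is not None and 8 < major < 20
    short_build_id = len(str(build_id)) <= 2
    return unreleased_version or short_build_id
```

Either signal alone flags the ping here; requiring both would be stricter and may be preferable if false positives matter.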
Reporter | ||
Comment 6•3 years ago
We have a fairly good grasp of what this is about and we wrote a decision brief about it: https://docs.google.com/document/d/1dkuI6017UA_ItFgBnIApPVT1DkF3xVkkcn8jhp6uprY/edit#
Calling this investigation complete.
Assignee | ||
Updated•3 years ago