Validate Fenix Search Telemetry
Categories
(Data Science :: Review, task, P2)
Tracking
(Not tracked)
People
(Reporter: harter, Assigned: harter)
References
(Blocks 1 open bug)
Details
We need to verify that the newly implemented search telemetry for Fenix looks the way we expect. Once we know the new probes are reporting as expected, we need to prepare ETL to provide a dataset in the form of search_clients_daily for our mobile data.
We want this to all happen before the Fenix MVP launch in late June.
See this doc for more context: Fenix Search Telemetry
Assignee
Comment 1•6 years ago
FYI, :mconnor, I need example data for these probes before I can make progress on this. Is there a bug or issue where we're tracking the probe definitions?
Moving to P2 while we wait for the probes to be implemented or example data from the engineering team.
Currently we're sending information about the default search engine and search counts. You can see the Glean definitions here:
These show up as part of the metrics ping in re:dash.
For search.default_engine, the data comes in like this:
[["search.default_engine.submission_url","https://www.google.com/search?q=&ie=utf-8&oe=utf-8"],["search.default_engine.name","Google"],["search.default_engine.code","google-b-m"]]
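The datum above is a JSON array of [key, value] pairs. A minimal Python sketch for turning it into a dictionary and splitting the submission URL into its query parameters, which makes it easy to see exactly which parameters the URL carries. The helper names are illustrative only, not part of any Mozilla tooling.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Example datum copied from the metrics ping above.
RAW = ('[["search.default_engine.submission_url","https://www.google.com/search?q=&ie=utf-8&oe=utf-8"],'
       '["search.default_engine.name","Google"],'
       '["search.default_engine.code","google-b-m"]]')


def parse_default_engine(raw: str) -> dict:
    """Turn the list of [key, value] pairs into a {key: value} dict."""
    return dict(json.loads(raw))


def query_params(url: str) -> dict:
    """Return the URL's query parameters, keeping blank values such as q=."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


engine = parse_default_engine(RAW)
params = query_params(engine["search.default_engine.submission_url"])
print(engine["search.default_engine.name"], engine["search.default_engine.code"])
print(sorted(params))  # ['ie', 'oe', 'q']: only these three parameters are present
```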
I'm looking for an example for search_count right now.
Assignee
Comment 3•6 years ago
Great! Thanks, :boek. A couple of questions.
- When is the metrics ping sent? Every 24 hours?
- Do you have an example query against the metrics ping? That will help me review some example data.
- The default engine submission_url you provided doesn't appear to include an engine code. Is that an error?
- Please do include an example datum from search_count when you find one.
Assignee
Comment 4•6 years ago
I did some digging this morning to track down some example pings. Here's a re:dash query counting all search_counts that we've seen from Fenix.
A few questions, since I'm lacking some context on Fenix:
- We see very few searches (<300 per day). Is that expected?
- I only see searches with the key __other__, which does not match the format defined in metrics.yaml. These keys should be of the format engine.source, correct?
I also have a query showing the default_engine telemetry. Questions:
- We have three engines with exactly one user. Are these standard or custom engines? IIUC, we should not be reporting custom engines.
- From above, I don't see the partner code included in the submission_url for Google or DDG. I do, however, see a moz partner code in Bing's submission URL. Is this intentional? Can you confirm these searches are tagged with Mozilla's partner code?
Leaving the NI? on :boek to address these questions. Thanks all!!
Assignee
Updated•6 years ago
Hey :harter,
- I wouldn't be surprised. I don't know what our usage is on Nightly, but we should verify it against other telemetry.
- We're the first consumers of these APIs in the Glean SDK. So there definitely could be a bug there. I will have to investigate when I get back.
re: default_engine
- We currently don't allow custom engines in Fenix. So they are standard.
- The URL record is the same one we use to build the URL. If the code is missing there, we will definitely need a fix.
I opened the following bugs to track the work on the Fenix side:
https://github.com/mozilla-mobile/fenix/issues/2260
https://github.com/mozilla-mobile/fenix/issues/2261
Landed a fix yesterday for a bug that was causing only __other__ to be reported. We should start to see new data today.
Regarding default_engine: I verified that the URL we're using in the app includes the partner code. I'm looking into why it's getting trimmed off by the time it ends up in re:dash.
Assignee
Comment 7•6 years ago
Awesome! Thanks, :boek. Tomorrow I'll verify that the data look the way we expect.
Assignee
Comment 8•6 years ago
Thanks again for the fix, :boek. We do see new data coming in, but it looks like search_counts are keyed by engine_source instead of engine.source. Can you correct this?
Hey :harter,
The Glean SDK does not allow . in the keys (which is why it was always reporting __other__ before), so I changed the separator to an underscore. I'll ask the Glean team if we can update it.
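To make this failure mode concrete, here is a toy Python model of a labeled counter that records any label failing validation under the catch-all __other__ label. The validation rule (lowercase letters, digits and underscores only) is an assumption chosen to reproduce the behaviour described in this comment; it is not Glean's actual rule or code, and LabeledCounter and the "google.suggestion" key are hypothetical.

```python
import re
from collections import Counter

# Hypothetical validation rule that rejects dots, mirroring the behaviour
# described in this bug. Glean's real label rule may differ.
ASSUMED_LABEL_RE = re.compile(r"[a-z0-9_]+")
OTHER = "__other__"


class LabeledCounter:
    """Toy labeled counter: invalid labels are folded into __other__."""

    def __init__(self, pattern=ASSUMED_LABEL_RE):
        self.pattern = pattern
        self.counts = Counter()

    def add(self, label: str, amount: int = 1) -> None:
        # Keep the label only if the whole string matches the assumed rule.
        key = label if self.pattern.fullmatch(label) else OTHER
        self.counts[key] += amount


counter = LabeledCounter()
counter.add("google.suggestion")  # dot separator: rejected, lands in __other__
counter.add("google_suggestion")  # underscore separator: accepted as-is
print(dict(counter.counts))       # {'__other__': 1, 'google_suggestion': 1}
```

Under this model every engine.source key collapses into __other__, which matches what was seen before the separator was switched to an underscore.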
Comment 10•6 years ago
Follow up bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1552904
Assignee
Comment 11•6 years ago
Thanks!
Assignee
Comment 12•6 years ago
Hey all,
I hear we've landed patches for both the default_engine and search_count.engine issues described above. I took a look at the most recent data and we are seeing the corrected format in telemetry!
One note: it looks like we're still getting data in the older (erroneous) formats for both fields, even for recent app_versions. Is this expected? Is there a better way for me to stratify the data so we can confirm that recent releases are sending only clean data?
Here are my queries for reference:
- https://sql.telemetry.mozilla.org/queries/62482/source
- https://sql.telemetry.mozilla.org/queries/62481/source
Thanks!
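One way to stratify the data is to classify every search_count key by its separator and compute, per build, the share of searches in each format; a clean build should show few or no underscore-separated keys. The Python sketch below assumes rows of (app_build, key, count) have already been exported from a query such as the two linked above. The row layout, the build names, the example keys and the helper names are all hypothetical.

```python
from collections import defaultdict

OTHER = "__other__"


def key_format(key: str) -> str:
    """Classify a search_count key by its separator (hypothetical helper)."""
    if key == OTHER:
        return "other"
    if "." in key:
        return "dotted"      # engine.source: the intended format
    if "_" in key:
        return "underscore"  # engine_source: the older, erroneous format
    return "unknown"


def format_share_by_build(rows):
    """rows: iterable of (app_build, key, count).

    Returns {app_build: {format: share of that build's searches}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for app_build, key, count in rows:
        totals[app_build][key_format(key)] += count
    shares = {}
    for app_build, by_format in totals.items():
        total = sum(by_format.values())
        if total == 0:
            continue  # skip builds with no searches to avoid dividing by zero
        shares[app_build] = {fmt: n / total for fmt, n in by_format.items()}
    return shares


# Hypothetical rows for illustration only; not real Fenix data.
rows = [
    ("older-build", "google_suggestion", 60),
    ("older-build", OTHER, 40),
    ("newer-build", "google.suggestion", 95),
    ("newer-build", "google_suggestion", 5),
]
shares = format_share_by_build(rows)
print(shares["newer-build"])  # {'dotted': 0.95, 'underscore': 0.05}
```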
Comment 13•6 years ago
Hey :harter,
My only guess is that the events were created and then sent after the user had updated to a more recent version of Fenix. It shouldn't be possible to send an event separated by an _ in recent versions. I will check with the Glean team to see if they have any other ideas.
Comment 14•6 years ago
:boek's theory is plausible.
Glean persists events to disk as it goes. If any are remaining on disk when the application starts, these are sent immediately, and the queue on disk is cleared. So this would result in events in an old form being sent with a later build id on the first run of the application after upgrading.
However, this case should be pretty rare. Events are normally sent (and cleared) every time the app goes to the background (in response to the ON_STOP Android lifecycle event). So for them to be queued and sent on the next start of the app like this, the app would need to be killed without triggering a ping send. I don't know enough about Android to know when that would occur or how frequent it is in practice, just enough to say that it CAN.
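The hypothesis above can be illustrated with a toy Python model of an event queue that is flushed on ON_STOP and otherwise flushed on the next start. ToyEventStore, its methods, and the build names are hypothetical; the model only mirrors the behaviour described in this comment and is not how Glean is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEventStore:
    """Toy model of an event queue persisted to disk (not Glean's code)."""
    build_id: str
    pending: list = field(default_factory=list)  # stands in for the on-disk queue
    sent: list = field(default_factory=list)     # (build_id on the ping, event) pairs

    def record(self, event: str) -> None:
        self.pending.append(event)

    def flush(self) -> None:
        """Send every pending event tagged with the *current* build, then clear."""
        self.sent.extend((self.build_id, event) for event in self.pending)
        self.pending.clear()

    def on_stop(self) -> None:
        # Normal path: the app goes to the background and events are sent.
        self.flush()

    def restart(self, new_build_id: str) -> None:
        # Leftover events are sent on the next start, but the ping now
        # carries the build id of the upgraded app.
        self.build_id = new_build_id
        self.flush()


# Normal case: the event ships with the build that recorded it.
normal = ToyEventStore(build_id="old-build")
normal.record("google_suggestion")
normal.on_stop()

# Rare case: the app is killed before ON_STOP, then upgraded and restarted.
killed = ToyEventStore(build_id="old-build")
killed.record("google_suggestion")  # old-format key recorded before the upgrade
killed.restart("new-build")
print(killed.sent)  # [('new-build', 'google_suggestion')]
```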
Assignee
Comment 15•6 years ago
OK, for recent pings from recent builds, ~2% of searches are associated with a "bad" engine containing an underscore. I spot-checked some offending clients. The majority of users reporting offending engines on a recent build (11590616) send offending pings on exactly one submission_date, which substantiates boek's and mdroetboom's hypothesis that these are old events. Users sending broken engines on more than one date are sending pings with engine = __other__. Perhaps there is some fringe engine that's still breaking Glean's checks, but in general things look good.
Marking this resolved. Thanks for the help all!
Queries for reference:
- Broken engine rates: https://sql.telemetry.mozilla.org/queries/63185/source
- Clients with broken engines: https://sql.telemetry.mozilla.org/queries/63186/source
- An example client with broken engines: https://sql.telemetry.mozilla.org/queries/63187/source
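The spot check described in this comment (how many distinct submission dates each offending client reports on a given build) can be sketched in Python as follows. It assumes rows of (client_id, submission_date, app_build, engine) have already been pulled from a query like the ones linked above; the row layout, values and function names are hypothetical. "Broken" follows the definition used here, i.e. any engine containing an underscore, which also matches __other__.

```python
from collections import defaultdict


def bad_engine_dates_per_client(rows, build):
    """rows: iterable of (client_id, submission_date, app_build, engine).

    Returns {client_id: number of distinct submission dates on which the
    client sent a "broken" engine (one containing "_") from `build`}.
    """
    dates = defaultdict(set)
    for client_id, submission_date, app_build, engine in rows:
        # "_" in engine also matches the __other__ fallback label.
        if app_build == build and "_" in engine:
            dates[client_id].add(submission_date)
    return {client: len(days) for client, days in dates.items()}


def share_single_date(counts):
    """Fraction of offending clients that offend on exactly one date."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


# Hypothetical rows for illustration only; not real Fenix data.
rows = [
    ("c1", "day-1", "new-build", "google_suggestion"),
    ("c1", "day-2", "new-build", "google.suggestion"),
    ("c2", "day-1", "new-build", "__other__"),
    ("c2", "day-3", "new-build", "__other__"),
    ("c3", "day-2", "old-build", "bing_action"),
]
counts = bad_engine_dates_per_client(rows, "new-build")
print(counts)                     # {'c1': 1, 'c2': 2}
print(share_single_date(counts))  # 0.5
```

A high single-date share is consistent with leftover events from before the upgrade, while clients offending on many dates deserve a closer look.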
Comment 16•6 years ago
Hey :harter,
I wonder if we could get more information from the data team about the users that are still reporting engine = __other__. If we can pin it down to a common locale, we might be able to figure out what data is causing trouble.
Assignee
Comment 17•6 years ago
Sure! Do you mind opening a new bug for this investigation?
Here's an example query that shows the count of distinct users with an __other__ engine for a recent build.
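Following the locale idea from the previous comment, the same distinct-user count can be broken down per locale. A Python sketch, assuming rows of (client_id, locale, app_build, engine) exported from a query like the one mentioned above; whether locale is available alongside the engine, and all names and values shown, are assumptions for illustration.

```python
from collections import defaultdict


def other_users_by_locale(rows, build):
    """Count distinct clients reporting engine == "__other__" on `build`, per locale."""
    clients = defaultdict(set)
    for client_id, locale, app_build, engine in rows:
        if app_build == build and engine == "__other__":
            clients[locale].add(client_id)  # a set counts each client once
    # Sort by descending user count (then locale) so a dominant locale stands out.
    return sorted(((locale, len(ids)) for locale, ids in clients.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical rows for illustration only; not real Fenix data.
rows = [
    ("c1", "de-DE", "new-build", "__other__"),
    ("c1", "de-DE", "new-build", "__other__"),
    ("c2", "de-DE", "new-build", "__other__"),
    ("c3", "en-US", "new-build", "__other__"),
    ("c4", "en-US", "new-build", "google.suggestion"),
    ("c5", "fr-FR", "old-build", "__other__"),
]
print(other_users_by_locale(rows, "new-build"))  # [('de-DE', 2), ('en-US', 1)]
```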