Closed Bug 1543433 Opened 5 years ago Closed 5 years ago

Validate Fenix Search Telemetry

Categories

(Data Science :: Review, task, P2)

x86_64
Unspecified
task
Points:
2

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: harter, Assigned: harter)

References

(Blocks 1 open bug)

Details

We need to verify that the newly implemented search telemetry for Fenix looks like we expect it to. Once we know the new probes are reporting as expected we need to prepare ETL to provide a dataset in the form of search_clients_daily for our mobile data.

We want this to all happen before the Fenix MVP launch in late June.

See this doc for more context: Fenix Search Telemetry

Blocks: 1543434

FYI, :mconnor, I need example data for these probes before I can make progress on this. Is there a bug or issue where we're tracking the probe definitions?

Moving to P2 while we wait for the probes to be implemented or example data from the engineering team.

Flags: needinfo?(mconnor)
Priority: P1 → P2

Currently we're sending information about the default search engine and search counts. You can see the Glean definitions here:

  1. search_count
  2. search.default_engine

These show up as part of the metrics ping in re:dash.

For search.default_engine, the data comes in like this:

[["search.default_engine.submission_url","https://www.google.com/search?q=&ie=utf-8&oe=utf-8"],["search.default_engine.name","Google"],["search.default_engine.code","google-b-m"]]

I'm looking for an example for search_count right now.

Great! Thanks, :boek. A couple of questions.

  • When is the metrics ping sent? Every 24 hours?
  • Do you have an example query against the metrics ping? That will help me review some example data.
  • The default engine submission_url you provided doesn't appear to include an engine code. Is that an error?
  • Please do include an example datum from search_count when you find one.
Flags: needinfo?(mconnor) → needinfo?(j)
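
For reference, a minimal sketch of the check behind the submission_url question above. It assumes the engine code, when present, appears verbatim as a query parameter value in the URL; that is an assumption for illustration, not a statement about how each engine encodes it. The helper names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

PREFIX = "search.default_engine."

def parse_default_engine(pairs):
    """Turn [[key, value], ...] rows into a dict with the common prefix removed."""
    return {k[len(PREFIX):]: v for k, v in pairs if k.startswith(PREFIX)}

def url_carries_code(submission_url, code):
    """True if `code` shows up as a query parameter value (assumed encoding)."""
    query = urlsplit(submission_url).query
    return any(value == code for _, value in parse_qsl(query, keep_blank_values=True))

# The example datum quoted earlier in this bug.
pairs = [
    ["search.default_engine.submission_url",
     "https://www.google.com/search?q=&ie=utf-8&oe=utf-8"],
    ["search.default_engine.name", "Google"],
    ["search.default_engine.code", "google-b-m"],
]

engine = parse_default_engine(pairs)
has_code = url_carries_code(engine["submission_url"], engine["code"])
print(has_code)  # False: the code is not among the URL's query values
```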

I did some digging this morning to track down some example pings. Here's a re:dash query counting all search_counts that we've seen from Fenix.

A few questions since I'm lacking some context into Fenix:

  • We see very few searches (<300 per day). Is that expected?
  • I only see searches with the key __other__, which does not match the format defined in metrics.yaml. These keys should be of the format engine.source, correct?
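
A rough sketch of the key-format check described above, in case it helps anyone reproduce it outside re:dash. The regular expression is my own guess at what an engine.source key looks like, not something taken from metrics.yaml, and the sample keys and counts are made up.

```python
import re
from collections import Counter

# Assumed shape of a well-formed key: "<engine>.<source>" (hypothetical pattern).
EXPECTED_KEY = re.compile(r"^[a-z0-9-]+\.[a-z0-9_-]+$")

def classify_key(key):
    """Bucket a search_count key as 'other', 'engine.source', or 'unexpected'."""
    if key == "__other__":
        return "other"
    if EXPECTED_KEY.match(key):
        return "engine.source"
    return "unexpected"

def summarize(search_counts):
    """Sum search counts per bucket for a {key: count} mapping."""
    totals = Counter()
    for key, count in search_counts.items():
        totals[classify_key(key)] += count
    return dict(totals)

# Made-up sample, for illustration only.
summary = summarize({"__other__": 12, "google.suggestion": 3, "bing_action": 1})
print(summary)  # {'other': 12, 'engine.source': 3, 'unexpected': 1}
```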

I also have a query showing the default_engine telemetry. Questions:

  • We have three engines with exactly one user. Are these standard or custom engines? IIUC, we should not be reporting custom engines.
  • From above, I don't see the partner code included in the submission_url for Google or DDG. I do, however, see a moz partner code in Bing's submission URL. Is this intentional? Can you confirm these searches are tagged with Mozilla's partner code?

Leaving the NI? for :boek to address these questions. Thanks all!!

Group: mozilla-employee-confidential

Hey :harter,

  • I wouldn't be surprised. I don't know what our usage is on Nightly, but we should verify based on other telemetry.
  • We're the first consumers of these APIs in the Glean SDK. So there definitely could be a bug there. I will have to investigate when I get back.

re: default_engine

  • We currently don't allow custom engines in Fenix. So they are standard.
  • The URL record is the same one we use to build the URL. If the code is missing there, we will definitely need a fix.

I opened the following bugs to track the work on the Fenix side:
https://github.com/mozilla-mobile/fenix/issues/2260
https://github.com/mozilla-mobile/fenix/issues/2261

Flags: needinfo?(j)

Landed a fix yesterday for a bug that was only causing __other__ to be reported. We should start to see new data today.

Regarding default_engine: I verified that the URL we're using in the app includes the partner code. I'm looking into why it's getting trimmed off by the time it ends up in redash.

Awesome! Thanks, boek. I'll verify the data look like we expect them to tomorrow.

Thanks again for the fix, :boek. We do see new data coming in, but it looks like search_counts are keyed by engine_source instead of engine.source. Can you correct this?

Flags: needinfo?(j)

Hey :harter,

The Glean SDK does not allow . in the keys (which is why it was always reporting __other__ before), so I changed the separator to an underscore. I'll ask the Glean team if we can update it.

Flags: needinfo?(j)

Thanks!

Hey all,

I hear we've landed patches for both the default_engine and search_count.engine issues described above. I took a look at the most recent data and we are seeing the corrected format in telemetry!

One note, it looks like we're still getting data in the older (erroneous) formats for both fields even for recent app_versions. Is this expected? Is there a better way for me to stratify the data so we can confirm recent releases are exclusively sending clean data?

Here are my queries for reference:

Thanks!

Flags: needinfo?(j)

Hey :Harter,

My only guess is the events got created and then sent after the user had updated to a more recent version of Fenix. It shouldn't be possible to send an event separated by an _ in recent versions. I will check with the Glean team to see if they have any other ideas

:boek's theory is plausible.

Glean persists events to disk as it goes. If any are remaining on disk when the application starts, these are sent immediately, and the queue on disk is cleared. So this would result in events in an old form being sent with a later build id on the first run of the application after upgrading.

However, this case should be pretty rare. Events are normally sent (and cleared) every time the app goes to background (in response to the ON_STOP Android lifecycle event). So for them to be queued and sent on next start of the app like this, the app would need to be killed without triggering a ping send. I don't know enough about Android to know when that would occur and how frequent it is in practice, just enough to say that it CAN.
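
To make the sequence concrete, here is a toy model of the behaviour described above. It is only a sketch under the assumptions stated in this comment (persist on record, flush on background, flush leftovers on start) and is not Glean's actual implementation; the class, method names, and build ids are hypothetical.

```python
class ToyEventQueue:
    """Toy stand-in for an on-disk event queue; not Glean's real code."""

    def __init__(self):
        self.on_disk = []  # events persisted as they are recorded
        self.sent = []     # (build_id, events) tuples standing in for pings

    def record(self, event):
        self.on_disk.append(event)

    def _flush(self, build_id):
        # Send whatever is queued, stamped with the currently running build.
        if self.on_disk:
            self.sent.append((build_id, list(self.on_disk)))
            self.on_disk.clear()

    def on_stop(self, build_id):
        """App goes to background: send and clear the queue."""
        self._flush(build_id)

    def on_start(self, build_id):
        """App starts: leftovers from a previous run are sent immediately."""
        self._flush(build_id)

queue = ToyEventQueue()
queue.record("google_action")   # recorded by an old build (underscore key)
# The app is killed here without ON_STOP, then the user upgrades.
queue.on_start(build_id="new")  # old-format event goes out under the new build
queue.record("google.action")
queue.on_stop(build_id="new")
print(queue.sent)  # [('new', ['google_action']), ('new', ['google.action'])]
```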

OK, for recent pings from recent builds, ~2% of searches are associated with a "bad" engine containing an underscore. I spot-checked some offending clients. The majority of users reporting offending engines on a recent build (11590616) send offending pings on exactly one submission_date, which substantiates boek's and mdroetboom's hypothesis that these are old events. Users sending broken engines on more than one date are sending pings with engine = __other__. Perhaps there is some fringe engine that's still breaking Glean's checks, but in general things look good.
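
A minimal sketch of the two checks summarized above: the share of searches with an underscore in the engine key, and the number of distinct submission dates per offending client. The row layout, field names, and sample values are assumptions for illustration and do not come from the actual queries.

```python
from collections import defaultdict

def is_bad_engine(engine):
    # "Bad" here simply means the key contains an underscore.
    # This would also flag any legitimate key that happens to contain one.
    return "_" in engine

def bad_search_share(rows):
    """Fraction of all searches whose engine key is flagged as bad."""
    total = sum(r["count"] for r in rows)
    bad = sum(r["count"] for r in rows if is_bad_engine(r["engine"]))
    return bad / total if total else 0.0

def bad_dates_per_client(rows):
    """Number of distinct submission dates with a bad engine, per client."""
    dates = defaultdict(set)
    for r in rows:
        if is_bad_engine(r["engine"]):
            dates[r["client_id"]].add(r["submission_date"])
    return {client: len(seen) for client, seen in dates.items()}

# Made-up rows, for illustration only.
rows = [
    {"client_id": "a", "submission_date": "day-1", "engine": "google_action", "count": 1},
    {"client_id": "a", "submission_date": "day-2", "engine": "google.action", "count": 40},
    {"client_id": "b", "submission_date": "day-1", "engine": "__other__", "count": 1},
    {"client_id": "b", "submission_date": "day-2", "engine": "__other__", "count": 1},
    {"client_id": "c", "submission_date": "day-2", "engine": "bing.suggestion", "count": 57},
]

share = bad_search_share(rows)
per_client = bad_dates_per_client(rows)
print(share)       # 0.03
print(per_client)  # {'a': 1, 'b': 2}
```

In this made-up sample, client "a" looks like a one-off leftover event, while client "b" keeps reporting __other__ across dates.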

Marking this resolved. Thanks for the help all!

Queries for reference:

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(j)
Resolution: --- → FIXED

Hey :harter,

I wonder if we could get more information from the data team about the users that are still reporting engine = __other__. If we can pin it down to a common locale, we might be able to figure out what data is causing trouble.

Sure! Do you mind opening a new bug for this investigation?

Here's an example query that shows the count of distinct users with an __other__ engine for a recent build.
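
For anyone without re:dash access, a rough sketch of the same count in Python, broken down by locale as suggested above. The row fields (client_id, build_id, locale, engine) and the sample rows are hypothetical; the real ping schema may differ.

```python
from collections import defaultdict

def other_clients_by_locale(rows, build_id):
    """Count distinct clients reporting engine == '__other__' on a build, per locale."""
    clients = defaultdict(set)
    for r in rows:
        if r["build_id"] == build_id and r["engine"] == "__other__":
            clients[r["locale"]].add(r["client_id"])
    return {locale: len(ids) for locale, ids in sorted(clients.items())}

# Made-up rows, for illustration only.
rows = [
    {"client_id": "a", "build_id": "11590616", "locale": "en-US", "engine": "__other__"},
    {"client_id": "a", "build_id": "11590616", "locale": "en-US", "engine": "__other__"},
    {"client_id": "b", "build_id": "11590616", "locale": "de", "engine": "__other__"},
    {"client_id": "c", "build_id": "11590616", "locale": "de", "engine": "google.action"},
    {"client_id": "d", "build_id": "older", "locale": "de", "engine": "__other__"},
]

counts = other_clients_by_locale(rows, "11590616")
print(counts)  # {'de': 1, 'en-US': 1}
```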

See Also: → 1566764
Blocks: 1608997