1543433 - Validate Fenix Search Telemetry

Assignee

Description

•

6 years ago

We need to verify that the newly implemented search telemetry for Fenix looks like we expect it to. Once we know the new probes are reporting as expected we need to prepare ETL to provide a dataset in the form of search_clients_daily for our mobile data.

We want this to all happen before the Fenix MVP launch in late June.

See this doc for more context: Fenix Search Telemetry

Ryan Harter [:harter]

Assignee

Updated

•

6 years ago

Blocks: 1543434

Ryan Harter [:harter]

Assignee

Comment 1

•

6 years ago

FYI, :mconnor, I need example data for these probes before I can make progress on this. Is there a bug or issue where we're tracking the probe definitions?

Moving to P2 while we wait for the probes to be implemented or example data from the engineering team.

Flags: needinfo?(mconnor)

Priority: P1 → P2

__oldboek

Comment 2

•

6 years ago

Currently we're sending information about the default search engine and search counts. You can see the Glean definitions here:

These show up as part of the metrics ping in re:dash.

For search.default_engine, the data comes in like this:

[["search.default_engine.submission_url","https://www.google.com/search?q=&ie=utf-8&oe=utf-8"],["search.default_engine.name","Google"],["search.default_engine.code","google-b-m"]]
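For anyone checking a datum like this by hand, here is a minimal Python sketch that turns the key/value pairs into a dict and lists the query parameters present in the submission URL. The helper names (parse_default_engine, query_params) are hypothetical and not part of Glean or re:dash; the only input is the example value quoted above.

```python
import json
from urllib.parse import urlsplit, parse_qs

# The example datum quoted in this comment, as a JSON string.
RAW = (
    '[["search.default_engine.submission_url",'
    '"https://www.google.com/search?q=&ie=utf-8&oe=utf-8"],'
    '["search.default_engine.name","Google"],'
    '["search.default_engine.code","google-b-m"]]'
)

PREFIX = "search.default_engine."


def parse_default_engine(raw):
    """Turn the list of [key, value] pairs into a dict keyed by the short field name."""
    return {k[len(PREFIX):]: v for k, v in json.loads(raw) if k.startswith(PREFIX)}


def query_params(url):
    """Return the URL's query parameters, keeping empty ones such as q=."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


engine = parse_default_engine(RAW)
params = query_params(engine["submission_url"])
print(engine["name"], engine["code"], sorted(params))
```

For the example above, the only query parameters are q, ie and oe, so nothing in the recorded URL carries the engine code.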

I'm looking for an example for search_count right now.

Ryan Harter [:harter]

Assignee

Comment 3

•

6 years ago

Great! Thanks, :boek. A couple of questions.

When is the metrics ping sent? Every 24 hours?
Do you have an example query against the metrics ping? That will help me review some example data.
The default engine submission_url you provided doesn't appear to include an engine code. Is that an error?
Please do include an example datum from search_count when you find one.

Flags: needinfo?(mconnor) → needinfo?(j)

Ryan Harter [:harter]

Assignee

Comment 4

•

6 years ago

I did some digging this morning to track down some example pings. Here's a re:dash query counting all search_counts that we've seen from Fenix.

A few questions since I'm lacking some context into Fenix:

We see very few searches (<300 per day). Is that expected?
I only see searches with the key __other__, which does not match the format defined in metrics.yaml. These keys should be of the format engine.source, correct?
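A rough sketch of the key check described above, assuming that a well-formed key is exactly two non-empty parts separated by a single dot. The function name classify_key and the sample keys are hypothetical; the authoritative format is whatever metrics.yaml specifies.

```python
def classify_key(key):
    """Classify a search_count key as 'other', 'engine.source', or 'malformed'.

    Assumption: a valid key is '<engine>.<source>' with exactly one dot.
    """
    if key == "__other__":
        return "other"
    engine, sep, source = key.partition(".")
    if sep and engine and source and "." not in source:
        return "engine.source"
    return "malformed"


# Hypothetical sample keys, for illustration only.
for sample in ["google-b-m.action", "__other__", "google-b-m_action"]:
    print(sample, "->", classify_key(sample))
```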

I also have a query showing the default_engine telemetry. Questions:

We have three engines with exactly one user. Are these standard or custom engines? IIUC, we should not be reporting custom engines.
From above, I don't see the partner code included in the submission_url for google or DDG. I do however, see a moz partner code in Bing's submission URL. Is this intentional? Can you confirm these searches are tagged with Mozilla's partner code?

Leaving the NI? for :boek to address these questions. Thanks all!!

Ryan Harter [:harter]

Assignee

Updated

•

6 years ago

Group: mozilla-employee-confidential

__oldboek

Comment 5

•

6 years ago

Hey :harter,

I wouldn't be surprised. I don't know what our usage is on Nightly, but we should verify based on other telemetry.
We're the first consumers of these APIs in the Glean SDK. So there definitely could be a bug there. I will have to investigate when I get back.

re: default_engine

We currently don't allow custom engines in Fenix. So they are standard.
The URL we record is the same one we use to build the URL. If the code is missing there, we will definitely need a fix.

I opened the following bugs to track the work on the Fenix side:
https://github.com/mozilla-mobile/fenix/issues/2260
https://github.com/mozilla-mobile/fenix/issues/2261

Flags: needinfo?(j)

__oldboek

Comment 6

•

6 years ago

Landed a fix yesterday for a bug that was only causing __other__ to be reported. We should start to see new data today.

Regarding default_engine: I verified that the URL we're using in the app includes the partner code. I'm looking into why it's getting trimmed off by the time it ends up in redash.

Ryan Harter [:harter]

Assignee

Comment 7

•

6 years ago

Awesome! Thanks, boek. I'll verify the data look like we expect them to tomorrow.

Ryan Harter [:harter]

Assignee

Comment 8

•

6 years ago

Thanks again for the fix, :boek. We do see new data coming in, but it looks like search_counts are keyed by engine_source instead of engine.source. Can you correct this?

Flags: needinfo?(j)

__oldboek

Comment 9

•

6 years ago

Hey :harter,

The Glean SDK does not allow . in the keys (which is why it was always reporting __other__ before), so I changed the separator to an underscore. I'll ask the Glean team if we can update it.

Flags: needinfo?(j)

__oldboek

Comment 10

•

6 years ago

Follow up bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1552904

Ryan Harter [:harter]

Assignee

Comment 11

•

6 years ago

Thanks!

Ryan Harter [:harter]

Assignee

Comment 12

•

6 years ago

Hey all,

I hear we've landed patches for both the default_engine and search_count.engine issues described above. I took a look at the most recent data and we are seeing the corrected format in telemetry!

One note, it looks like we're still getting data in the older (erroneous) formats for both fields even for recent app_versions. Is this expected? Is there a better way for me to stratify the data so we can confirm recent releases are exclusively sending clean data?

Here are my queries for reference:

Thanks!

Flags: needinfo?(j)

__oldboek

Comment 13

•

6 years ago

Hey :Harter,

My only guess is the events got created and then sent after the user had updated to a more recent version of Fenix. It shouldn't be possible to send an event separated by an _ in recent versions. I will check with the Glean team to see if they have any other ideas

Michael Droettboom [:mdroettboom]

Comment 14

•

6 years ago

:boek's theory is plausible.

Glean persists events to disk as it goes. If any are remaining on disk when the application starts, these are sent immediately, and the queue on disk is cleared. So this would result in events in an old form being sent with a later build id on the first run of the application after upgrading.

However, this case should be pretty rare. Events are normally sent (and cleared) every time the app goes to background (in response to the ON_STOP Android lifecycle event). So for them to be queued and sent on next start of the app like this, the app would need to be killed without triggering a ping send. I don't know enough about Android to know when that would occur and how frequent it is in practice, just enough to say that it CAN.
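To make the scenario concrete, here is a toy Python model of the behaviour described in this comment. It is not Glean's code: ToyEventStore, ToyApp and the build ids are hypothetical stand-ins, and the model only captures three rules from the description (events are persisted as recorded, flushed on ON_STOP, and flushed on startup if any remain).

```python
from dataclasses import dataclass, field


@dataclass
class ToyEventStore:
    """Stands in for the on-disk event queue; it survives app restarts."""
    pending: list = field(default_factory=list)


@dataclass
class ToyApp:
    build_id: str
    store: ToyEventStore
    sent: list = field(default_factory=list)

    def start(self):
        # Anything left on disk from a previous run is sent immediately.
        self._flush()

    def record(self, key):
        # Events are persisted as they are recorded.
        self.store.pending.append(key)

    def on_stop(self):
        # Going to background normally sends and clears the queue.
        self._flush()

    def _flush(self):
        if self.store.pending:
            self.sent.append((self.build_id, list(self.store.pending)))
            self.store.pending.clear()


store = ToyEventStore()

old = ToyApp("old-build", store)
old.start()
old.record("google_action")  # old underscore format
# The app is killed here without on_stop(), so nothing is sent.

new = ToyApp("new-build", store)
new.start()  # leftover event is sent, tagged with the new build id
new.record("google.action")
new.on_stop()

print(old.sent)
print(new.sent)
```

In this model the old-format key is reported exactly once under the new build id, on the first start after the upgrade, and never again afterwards.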

Ryan Harter [:harter]

Assignee

Comment 15

•

6 years ago

OK, for recent pings from recent builds, ~2% of searches are associated with a "bad" engine containing an underscore. I spot-checked some offending clients. The majority of users reporting offending engines on a recent build (11590616) send offending pings on exactly one submission_date, which substantiates boek's and mdroettboom's hypothesis that these are old events. Users sending broken engines on more than one date are sending pings with engine = __other__. Perhaps there is some fringe engine that's still breaking Glean's checks, but in general things look good.

Marking this resolved. Thanks for the help all!

Queries for reference:

Broken engine rates: https://sql.telemetry.mozilla.org/queries/63185/source
Clients with broken engines: https://sql.telemetry.mozilla.org/queries/63186/source
An example client with broken engines: https://sql.telemetry.mozilla.org/queries/63187/source
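The spot check described in this comment can be sketched in Python roughly as follows. The rows are made-up illustration data, not query results, and the row layout (client_id, submission_date, engine key, search count) and function names are assumptions; the linked re:dash queries remain the actual source.

```python
from collections import defaultdict

# Hypothetical rows: (client_id, submission_date, engine_key, search_count).
ROWS = [
    ("a", "2019-06-01", "google_action", 2),
    ("a", "2019-06-02", "google.action", 5),
    ("b", "2019-06-01", "__other__", 1),
    ("b", "2019-06-02", "__other__", 1),
    ("c", "2019-06-01", "duckduckgo.suggestion", 11),
]


def is_broken(key):
    """A key counts as 'bad' here if it contains an underscore."""
    return "_" in key


def broken_rate(rows):
    """Share of all searches attributed to a broken engine key."""
    total = sum(n for _, _, _, n in rows)
    broken = sum(n for _, _, key, n in rows if is_broken(key))
    return broken / total if total else 0.0


def broken_dates_by_client(rows):
    """Map each offending client to the set of dates with a broken key."""
    dates = defaultdict(set)
    for client, date, key, _ in rows:
        if is_broken(key):
            dates[client].add(date)
    return dict(dates)


by_client = broken_dates_by_client(ROWS)
one_off = sorted(c for c, d in by_client.items() if len(d) == 1)
repeat = sorted(c for c, d in by_client.items() if len(d) > 1)

print(broken_rate(ROWS), one_off, repeat)
```

Clients that land in one_off fit the leftover-event explanation; clients in repeat are the ones worth a closer look.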

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

Flags: needinfo?(j)

Resolution: --- → FIXED

__oldboek

Comment 16

•

6 years ago

Hey :harter,

I wonder if we could get more information from the data team about the users that are still reporting engine = __other__. If we can pin it down to a common locale, we might be able to figure out what data is causing trouble.

Ryan Harter [:harter]

Assignee

Comment 17

•

6 years ago

Sure! Do you mind opening a new bug for this investigation?

Here's an example query that shows the count of distinct users with an __other__ engine for a recent build.

Alessio Placitelli [:Dexter]

Updated

•

6 years ago

Updated

•

5 years ago

Blocks: 1608997

Categories

(Data Science :: Review, task, P2)

Tracking

(Not tracked)

People

(Reporter: harter, Assigned: harter)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)
