Closed Bug 1495548 Opened 6 years ago Closed 5 years ago

[meta] Add a telemetry probe for SERP ad clicks into core Firefox

Categories

(Firefox :: Search, enhancement, P1)

Tracking

RESOLVED FIXED
Firefox 65

People

(Reporter: mkaply, Unassigned)

Details

(Keywords: meta)

User Story

AD PRESENCE PROBE: We can understand how often users see ads in search
Acceptance Criteria
* When a user performs an SAP search or qualified, partner-tagged follow-on, run an in-content check to see if the SERP contains one or more contextual text ads. If yes, increment a counter.
* Ad presence counts should be aggregate, by engine, and tied to client_ID. We should be able to do simple napkin math such as total ad displays counted for an engine divided by total user search counts for an engine. EXAMPLE: "Hooli: ad displays = 20. Hooli: search counts = 100. We see ads on roughly 20% of Hooli SERPs for Country XYZ."
* Store/pipeline this data such that we can easily calculate the percentages of users who see ads and those who do not for a given time period (say, a month)
* For users who do not see ads, we can go backwards in telemetry and see if at some point they did see ads
* We can run correlative analysis against installed add-ons, for example, to understand if certain add-ons or other factors prevent users from seeing ads



AD CLICK PROBE: We understand which users click on ads and how often
Acceptance Criteria
* When a user performs an SAP search or qualified, partner-tagged follow-on, and lands on a SERP with contextual ads, check to see if the user clicks on any ads. If yes, increment a counter
* Ad click counts should be aggregate, by engine, and tied to client_ID. We should be able to do simple napkin math such as total ad clicks counted for an engine divided by total user search counts for an engine. EXAMPLE: "Hooli: ad clicks = 30. Hooli: search counts = 1000. Users click on Hooli ads roughly 3% of the time."
* We should be able to determine which users *do* click on ads, and be able to run correlative analysis on other telemetry attributes, such as usage patterns, add-ons installed, search behavior, etc.
* We should be able to run analysis by country, etc. to see if ad click behavior is changing. Are more or fewer users clicking on ads? What is the trend for aggregate ad clicks? 
* We should be able to use this ad click probe in top-down analysis in Shield Studies to understand if search UI features or *any other* UI changes impact ad clicks and hence monetization in Firefox. 

Example: A new UI bookmarks/history feature cannibalizes search volume. We believe this is just non-monetizable volume, but is there an impact to revenue? The Shield study has two equally sized branches:
A. New UI (treatment)
B. Control

Results: The ad click probe shows us that aggregate ad clicks were 90,000 for treatment branch A and 100,000 for control branch B. Hence we deduce that the new UI feature could reduce search revenue by roughly 10%.
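
The napkin math in the acceptance criteria can be sketched as follows. This is only an illustration using the example figures from this user story; the helper names `rate` and `relative_change` are made up for the sketch and are not part of any telemetry tooling.

```python
def rate(events: int, searches: int) -> float:
    """Return events per search as a fraction; 0.0 when there were no searches."""
    return events / searches if searches else 0.0


def relative_change(treatment: int, control: int) -> float:
    """Return the change of treatment relative to control (negative means fewer)."""
    if control == 0:
        raise ValueError("control count must be non-zero")
    return (treatment - control) / control


# Figures taken from the examples in the user story above.
ad_display_rate = rate(20, 100)                   # Hooli ad displays / Hooli searches
ad_click_rate = rate(30, 1000)                    # Hooli ad clicks / Hooli searches
shield_delta = relative_change(90_000, 100_000)   # branch A (treatment) vs. branch B (control)

print(f"ad display rate: {ad_display_rate:.0%}")
print(f"ad click rate: {ad_click_rate:.0%}")
print(f"ad clicks, treatment vs. control: {shield_delta:.0%}")
```

The comparison between branches only works as a plain ratio because the two branches are equally sized; with unequal branches the counts would need to be normalized per client or per search first.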

Attachments

(1 file)

In a previous Cliqz test, we created an ad click probe to try to get a sense of how often users click on SERP ads:

https://github.com/past/searchvolmodel
https://github.com/past/searchvolmodel/blob/master/add-on/content/SerpMonitor.jsm#L153

We want to move that into core Firefox.
A really good approach for this would be to use the new actors infrastructure. One can create an actor in browser/actors/ and then declare it in nsBrowserGlue.js. The "matches" parameter can be used to limit it to the desired URLs, and then it can be declared to listen for click events. This will have very little overhead, as the actor will only be instantiated once that event happens on a matched page.

Example:
https://searchfox.org/mozilla-central/rev/6ddb5fb144993fb5de044e2e8d900d7643b98a4d/browser/components/nsBrowserGlue.js#35-39
Priority: -- → P3
Target Milestone: --- → Firefox 64
We have this one as a P1 in the Trello, I believe it should be a P1 here. 

In the August blog post, we announced that we'd measure both ad clicks and the presence of ads
https://blog.mozilla.org/data/2018/08/20/effectively-measuring-search-in-firefox/

I'm writing both requirements here. Potentially they build on, and are preconditions of, each other, just like the tagged in-content search probe was the basis of the first ad click probe:
A. In-content: Is this search qualifying (has codes, etc.)? If so, count.
B. Ad presence: Are ads present? If so, count.
C. Clicks: Did we see a click? If so, count.
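
A rough sketch of how the three stages could gate each other, assuming each SERP visit has already been reduced to three booleans. The `PageVisit` record and `count_stages` function are hypothetical names for this illustration, not the in-tree implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PageVisit:
    engine: str         # hypothetical engine identifier, e.g. "hooli"
    is_qualified: bool  # A: SAP search or partner-tagged follow-on
    has_ads: bool       # B: the SERP showed at least one ad
    ad_clicked: bool    # C: the user clicked an ad on that SERP


def count_stages(visits):
    """Count each stage per engine, only when the previous stage applied."""
    searches, with_ads, ad_clicks = Counter(), Counter(), Counter()
    for visit in visits:
        if not visit.is_qualified:
            continue  # not a qualifying search: nothing is counted
        searches[visit.engine] += 1
        if not visit.has_ads:
            continue  # no ads shown: a click cannot be an ad click
        with_ads[visit.engine] += 1
        if visit.ad_clicked:
            ad_clicks[visit.engine] += 1
    return searches, with_ads, ad_clicks


visits = [
    PageVisit("hooli", True, True, True),
    PageVisit("hooli", True, True, False),
    PageVisit("hooli", True, False, False),
    PageVisit("hooli", False, True, True),  # unqualified, ignored entirely
]
searches, with_ads, ad_clicks = count_stages(visits)
print(searches["hooli"], with_ads["hooli"], ad_clicks["hooli"])
```

Gating the stages this way keeps the counts nested (clicks never exceed ad displays, which never exceed searches), so the ratios between them stay meaningful.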

We can decide whether to split this work into multiple bugs, but let's estimate it first...


***** Requirements / user stories: added to the User Story field above (AD PRESENCE PROBE and AD CLICK PROBE) *****
User Story: (updated)
Priority: P3 → P1
Standard8 highlighted the need to discuss the implementation (trains, system add-on, etc.) and noted that this probe will be brittle and can break whenever a search provider changes its results page, hence the need to look at a solution that can be updated off trains. Potentially this is a blended solution that is part in the tree, part out of band (system add-ons, Fx cloud services, etc.).
The current implementation of the searchvolmodel basically assumes that once you're on a search partner page, clicking a link with a defined prefix in the path counts as an ad click. This is simpler than I remembered.

We can watch for links being added which point to the ad urls, and then watch for a click happening.

We could potentially extend the existing search engine mechanism with that prefix value, and then this could be delivered in the same way as existing search engine updates.

As long as we don't need to do any special in-page parsing for other search engines, this should work fine (and in any case, probably good enough for a v1).
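
To illustrate the prefix idea described above, here is a minimal sketch. The provider table, host name, and `/aclk` prefix are placeholders chosen for the example, and it assumes only the link's path is inspected; the real values would come from whatever is shipped with the search engine configuration.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical provider table: SERP host -> path prefixes treated as ad links.
AD_PREFIXES = {
    "www.hooli.example": ("/aclk",),
}


def is_ad_click(page_url: str, link_href: str) -> bool:
    """Return True if the click happened on a tracked SERP and the link path has an ad prefix."""
    host = urlsplit(page_url).hostname or ""
    prefixes = AD_PREFIXES.get(host)
    if not prefixes:
        return False  # not a search partner page we track
    # Resolve relative hrefs against the page before looking at the path.
    link_path = urlsplit(urljoin(page_url, link_href)).path
    return link_path.startswith(prefixes)


serp = "https://www.hooli.example/search?q=firefox"
print(is_ad_click(serp, "/aclk?sa=l&adurl=https://shop.example/"))
print(is_ad_click(serp, "/url?q=https://www.mozilla.org/"))
print(is_ad_click("https://other.example/search?q=firefox", "/aclk?sa=l"))
```

Because the table is plain data, it could in principle be updated alongside the search engine definitions without changing the matching code, which is the property the thread is after.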

Regarding the use of actors (comment 1), we might need to do some additional work to handle dynamically updating the list of sites we track (aka "matches"). This could be something new for the Actors manager to implement, but performance might be good enough if we handle it ourselves. I'll need to investigate this a bit more.
Javaun:

- Will this be opt-in or opt-out? 
- Are we going to be recording in private browsing mode?

Other assumptions I'm making:

- Both clicks and keyboard selections will be counted.
- Sidebar / non “main browser” locations also will be counted (unlikely to be many of these, but some extensions do allow loading in sidebar).
Flags: needinfo?(jmoradi)
1. Opt-out. We have clearance to treat this as any other search count.
2. Yes to PBM, just like normal search counts. We will log aggregate counts but not separate them into normal vs. PBM. This is consistent with other telemetry counts, including search. We can log those counts but not where they occurred, so there is no privacy leak. Since all aggregate counts are logged together, there is no evidence that the user ever went to PBM, since all counts could conceivably have taken place in normal windows.
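
A small sketch of what "aggregate, not separated by window type" means in practice. `AggregateCounts` is a toy class for this comment, not the telemetry API: the caller may know whether the window is private, but that fact never becomes part of the stored key.

```python
from collections import Counter


class AggregateCounts:
    """Toy per-engine counter that ignores whether the window is private."""

    def __init__(self):
        self._counts = Counter()

    def record(self, engine: str, in_private_window: bool) -> None:
        # in_private_window is deliberately unused: normal and PBM activity
        # land in the same bucket, so the stored data cannot reveal PBM use.
        self._counts[engine] += 1

    def snapshot(self) -> dict:
        return dict(self._counts)


counts = AggregateCounts()
counts.record("hooli", in_private_window=False)
counts.record("hooli", in_private_window=True)
print(counts.snapshot())
```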
Flags: needinfo?(jmoradi)
We announced this in August, so opening up the bug.
Group: mozilla-employee-confidential
For data review. NI Francois. Gdoc form and txt attached

https://docs.google.com/document/d/1INyWHXjVbBduz4PzUMRucw1q-XtDlKFbhLofZ3xkmsM/edit
Flags: needinfo?(francois)
Comment on attachment 9022774 [details]
Data Review_ Ad Click probe.txt

> Ad clicks are a more direct measure of revenue than anything we cu

This sentence got cut off in the txt version, as well as the gdoc version.

I assume you mean "anything we currently have." or something similar. Please correct this if you meant something else.
Flags: needinfo?(francois)
Comment on attachment 9022774 [details]
Data Review_ Ad Click probe.txt

Javaun: my NEEDINFO relates to the question in #4.


1) Is there or will there be **documentation** that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. I assume it will end up in Scalars.yaml.

2) Is there a control mechanism that allows the user to turn the data collection on and off?

Disabling telemetry should turn this off since it's a standard probe.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Javaun.

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?

The presence of ads on a search result page would be Category 1.

Clicking on ads is a form of product interaction (Category 2), since we don't track which ads (or search terms) the user is clicking on.

For completeness, there is maybe a small amount of browsing data (Category 3) being captured since we key the number of searches per provider. So in effect, we know the number of searches that a user does on specific search engine websites. While the number of Google searches a user makes isn't particularly revealing, there could be specialized search engines (e.g. porn-related) we don't want to track in this way.

Javaun, can you confirm that the only search engines we're keying on are the ones we partner with?

5) Is the data collection request for default-on or default-off?

Default ON.

6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc.  See the appendix for more details)?

No.

7) Is the data collection covered by the existing Firefox privacy notice?

Yes, assuming the list of search engines we are tracking is restricted (see question 4).

8) Does there need to be a check-in in the future to determine whether to renew the data?

No, permanent.
Flags: needinfo?(jmoradi)
Attachment #9022774 - Flags: review+
Keywords: meta
Summary: Add a telemetry probe for SERP ad clicks into core Firefox → [meta] Add a telemetry probe for SERP ad clicks into core Firefox
François: regarding number 4, yes, we would restrict logged aggregate counts to shipped partner engines. I believe for this version we are in fact only supporting one engine (Google). If we extended this to more engines, it would only be for engines we support via contractual partnerships.

NI'ing Standard8 (Mark). Mark, can you confirm that as we are right now, adding an OpenSearch plugin for "adultsite.com" will not add a search_counts histogram for that site? (Or will only do so for supported partner sites, which right now is probably just Google)
Flags: needinfo?(jmoradi) → needinfo?(standard8)
(In reply to Javaun Moradi [:javaun] from comment #12)
> NI'ing Standard8 (Mark). Mark, can you confirm that as we are right now,
> adding an OpenSearch plugin for "adultsite.com" will not add a search_counts
> histogram for that site? (Or will only do so for supported partner sites,
> which right now is probably just Google)

We have a specific (and separate) list of partners that we'll be managing for what we collect the counts on. Hence, adding extra plugins/add-ons will not affect that list, and we won't collect data for them.

Note, these won't be the SEARCH_COUNTS histogram, they'll be new keyed scalars (bug 1505411 will be adding them).
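
As a rough model of the behaviour described in this comment, assuming a fixed partner list that is managed separately from installed engines. The `PartnerKeyedCounter` class and the `PARTNER_ENGINES` contents are illustrative only (the thread mentions just Google for this version) and do not reflect the actual Telemetry keyed-scalar API.

```python
from collections import defaultdict

# Hypothetical partner list, managed separately from user-installed engines.
PARTNER_ENGINES = frozenset({"google"})


class PartnerKeyedCounter:
    """Toy keyed counter that only accepts keys from a fixed partner list."""

    def __init__(self, allowed_keys):
        self._allowed = frozenset(allowed_keys)
        self._values = defaultdict(int)

    def add(self, key: str, amount: int = 1) -> bool:
        """Increment the key if it is a partner; return whether it was recorded."""
        if key not in self._allowed:
            return False  # e.g. an OpenSearch plugin the user added
        self._values[key] += amount
        return True

    def snapshot(self) -> dict:
        return dict(self._values)


ad_clicks = PartnerKeyedCounter(PARTNER_ENGINES)
ad_clicks.add("google")
ad_clicks.add("adultsite.com")  # ignored: not on the partner list
print(ad_clicks.snapshot())
```

Checking the key against the list before writing means an unlisted engine never shows up in the data at all, not even as a key with a zero count.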
Flags: needinfo?(standard8)

This was completed via bug 1505411 several months ago, hence closing this out.

Bug 1511065 remains as an enhancement to be done at some stage.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: Firefox 64 → Firefox 65