Closed Bug 1476455 Opened 6 years ago Closed 5 years ago

Pocket Reach - Telemetry

Categories

(Firefox :: New Tab Page, enhancement, P2)

RESOLVED FIXED
Iteration:
69.1 - May 13 - 26

People

(Reporter: klong, Unassigned)

We would like to include some additional telemetry around content recommendations to better allow us to improve the product and resolve bugs faster.

Both content personalization and the decision-making process for which stories to show or not show the user happen client-side within Firefox. This has been a great way for us to put forward a privacy-focused personalization system, and it is a principle we’re committed to. After a few months with this initial implementation in the wild, we’ve identified some information bottlenecks that prevent us from iterating quickly on the product and significantly hamper our ability to resolve issues.

First, we currently have no way to understand the potential audience size of a particular story without guessing at a personalization configuration and then tuning it live in the wild. As a result, personalization is very slow to iterate on (we can only run one such test at a time) and carries product risk and the potential for a bad user experience, since some stories may be shown to more or fewer people than expected.

Additionally, specific to sponsored stories (spocs), there are a number of reasons why a story may not be shown to a user. Currently, when a story isn’t shown at the volume we expected, we have no way of diagnosing the problem or understanding at which of the numerous logic steps the client decides not to show it. This again makes it hard for us to move quickly to resolve bugs and issues impacting users.

In order to be able to improve personalization and the experience of both organic and sponsored content recommendations, we need to be able to:

1. Test personalization settings without the need to display an article to a user
2. Understand the reason why a sponsored story was or was not shown to a user.

This requires two new pieces of telemetry data.

Note #1: For the purposes of this ticket, a "personalization configuration" has two components: (1) a list of domains (and associated weights), and (2) a parameter set (config) that determines how the user’s browsing history should be scored in combination with the domain list.

Note #2: For this data to be useful for analysis, it should use the same pocket-id that is used with the impression/click tracking in place of the client-id.

(1) Personalization Configuration Scores

Every time a new tab is loaded, we would like to log an event with the score for every configuration in the feed response.

Accomplishing this will require some changes to how domain_affinities are passed in the feed response (and thus how Activity Stream accesses domain affinities to rank content). Pocket will move the domain_affinities lists in the /v3/firefox/global-recs response from the spoc objects to the `settings` of the feed (mapped with a unique identifier):

https://gist.github.com/kennylong/ccb4b09925f31ab21a025c278bdcb5d7

Note: This has the added benefit of reducing the size of the request significantly by reducing the amount of duplicate information returned.

In this example case, the scoring data included for one tab load might be something like:

+--------------------+------------------+-------+
|   parameter_set    |  domain_affinity | score |
+--------------------+------------------+-------+
| default            | business-234     |  0.60 |
| fully-personalized | business-234     |  0.95 |
| default            | sports-235       |  0.60 |
| fully-personalized | sports-235       |  0.00 |
| etc                |                  |       |
+--------------------+------------------+-------+
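As a rough illustration of how such rows could be produced, here is a minimal Python sketch that scores every (parameter set, domain affinity list) pair for one tab load. Everything in it is an assumption made for illustration: the shape of the parameter sets and domain lists, the `toy_scorer` stand-in (the real scoring logic lives in Activity Stream and is not reproduced here), and the example domains. Only the column names and the example identifiers come from the table above.

```python
from typing import Callable, Dict, List, NamedTuple


class ConfigScore(NamedTuple):
    parameter_set: str
    domain_affinity: str
    score: float


def score_configurations(
    parameter_sets: Dict[str, dict],
    domain_affinities: Dict[str, Dict[str, float]],
    scorer: Callable[[dict, Dict[str, float]], float],
) -> List[ConfigScore]:
    """Return one row per (domain affinity list, parameter set) pair."""
    rows = []
    for affinity_id, weights in domain_affinities.items():
        for set_name, params in parameter_sets.items():
            rows.append(ConfigScore(set_name, affinity_id, round(scorer(params, weights), 2)))
    return rows


# Hypothetical stand-in for the real scoring function: a fixed floor plus a
# gain applied to the weights of domains the user has visited.
VISITED = {"example-finance.test"}


def toy_scorer(params: dict, weights: Dict[str, float]) -> float:
    overlap = sum(w for domain, w in weights.items() if domain in VISITED)
    return min(1.0, params["floor"] + params["gain"] * overlap)


# Hypothetical configuration shapes; the real feed format may differ.
parameter_sets = {
    "default": {"floor": 0.6, "gain": 0.0},
    "fully-personalized": {"floor": 0.0, "gain": 1.0},
}
domain_affinities = {
    "business-234": {"example-finance.test": 0.95},
    "sports-235": {"example-sports.test": 0.9},
}

rows = score_configurations(parameter_sets, domain_affinities, toy_scorer)
for row in rows:
    print(row)
```

With these made-up inputs the four rows happen to match the example table; the point is only that one event per tab load can carry the full cross product of configurations.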

(2) User Spoc Scoring Logic

In addition to understanding the reach of personalization configurations, we would like to understand where in the sponsored content logic users are getting filtered out. To do that, for each new tab load, we would like to provide an event that indicates why a spoc was shown or not shown on the tab.

The preconditions that must hold for a spoc to be shown are as follows:

Does the user have their domain affinities calculated?
Based on the spocsPerNewTab setting: Was the load eligible to see a spoc?
Do any spocs score above their min_score?
Were any of the spocs prevented from being shown due to the frequency cap?
Were any of the spocs not shown because there was another higher scoring spoc available?

If a spoc was not shown, we should be able to understand on each load where within that decision tree it was stopped.
Given the request in bug 1476458, the filter logic in #2 above needs an additional check (added as #6 below):

1. Does the user have their domain affinities calculated?
2. Based on the spocsPerNewTab setting: Was the load eligible to see a spoc?
3. Do any spocs score above their min_score?
4. Were any of the spocs prevented from being shown due to the frequency cap?
5. Were any of the spocs not shown because there was another higher scoring spoc available?
6. Were any of the spocs filtered out because they were expired? (bug 1476458)
Nan, could you weigh in on this?
Flags: needinfo?(najiang)
(In reply to kenny from comment #0)

The second metric should be straightforward to handle.

> (1) Personalization Configuration Scores
> 
> Every time a new tab is loaded, we would like to log an event with the score
> for every configuration in the feed response.

Sending this metric on every new tab load sounds a bit redundant. Currently, Activity Stream calculates user domain affinity only once per day, and the scores for each individual story are calculated and cached on each fetch of the Pocket feed (every 30 minutes).

So I'd recommend sending this only once per fetch, with the payload structured as follows:

{
  "event": "TOP_STORIES_SCORES",
  "value": {
     "default": [{"id": 1000, "score": 0.5}, {"id": 1001, "score": 0.3}, ...],
     "fully-personalized": [{"id": 1000, "score": 0.7}, {"id": 1001, "score": 0.4}, ...]
  }
}
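A minimal Python sketch of assembling a payload of this shape once per feed fetch is shown below. The function name `build_scores_payload` and the input format (a mapping from configuration name to `(story id, score)` pairs) are hypothetical; only the event name and the `id`/`score` fields are taken from the structure above.

```python
import json
from typing import Dict, List, Tuple


def build_scores_payload(scores_by_config: Dict[str, List[Tuple[int, float]]]) -> dict:
    """Build one TOP_STORIES_SCORES event from cached per-story scores."""
    return {
        "event": "TOP_STORIES_SCORES",
        "value": {
            config: [{"id": story_id, "score": score} for story_id, score in scores]
            for config, scores in scores_by_config.items()
        },
    }


# Example scores as they might be cached after a single fetch (made-up values).
payload = build_scores_payload(
    {
        "default": [(1000, 0.5), (1001, 0.3)],
        "fully-personalized": [(1000, 0.7), (1001, 0.4)],
    }
)
serialized = json.dumps(payload)
print(serialized)
```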

This doesn't require changing the current Pocket feed. To test settings without displaying an article to a user, we can add a boolean field "hidden" to each story so that AS reports the score but does not display the story to the user.

Let me know what you think on this.
Flags: needinfo?(najiang)
I synced with Kenny offline on the first metric. They want to log this event on each new tab load so that they can answer questions like, "X users with a score of 0.3 or higher for `fully-personalized`-`id 1001` opened Y tabs".

My proposed metric in Comment 3 only focused on the individual stories and their scores, which won't be able to answer that question without joining against the impression_stats_daily table. Given the size of the impression_stats_daily table, doing frequent table joins could be pretty slow and inefficient.

The original proposal is a better fit in this case, and I'd suggest adding a new ping type (i.e. a new table) for this, as it's mainly for Pocket optimization and has little overlap with any of the existing tables.
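To illustrate why per-load events avoid the join, here is a small Python sketch that answers the "X users opened Y tabs" question directly from such events. The `ScoreEvent` fields, including `tab_load_id`, are assumptions about what a dedicated ping might carry, not the schema of any existing table.

```python
from typing import Iterable, NamedTuple, Tuple


class ScoreEvent(NamedTuple):
    pocket_id: str      # identifier shared with impression/click tracking
    tab_load_id: int    # hypothetical per-user counter of new tab loads
    parameter_set: str
    story_id: int
    score: float


def reach(
    events: Iterable[ScoreEvent], parameter_set: str, story_id: int, min_score: float
) -> Tuple[int, int]:
    """Return (distinct users, tab loads) at or above min_score for one config/story pair."""
    users = set()
    tab_loads = set()
    for e in events:
        if e.parameter_set == parameter_set and e.story_id == story_id and e.score >= min_score:
            users.add(e.pocket_id)
            tab_loads.add((e.pocket_id, e.tab_load_id))
    return len(users), len(tab_loads)


# Made-up events: one row per tab load, configuration, and story.
events = [
    ScoreEvent("a", 1, "fully-personalized", 1001, 0.4),
    ScoreEvent("a", 2, "fully-personalized", 1001, 0.4),
    ScoreEvent("b", 1, "fully-personalized", 1001, 0.1),
    ScoreEvent("c", 1, "fully-personalized", 1001, 0.3),
    ScoreEvent("c", 1, "default", 1001, 0.9),
]

users, tabs = reach(events, "fully-personalized", 1001, 0.3)
print(f"{users} users opened {tabs} tabs")
```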
For clarification, would we still include the scores for each personalization config + story (id) pair? And use the "hidden" method to test reach without showing a spoc?

Separately, we'll want to:

(1) Make sure that the "hidden" spocs don't get counted towards any frequency caps.

(2) Add the "spoc was hidden" to the list of reasons a spoc was filtered out in the "User Spoc Scoring Logic". So the updated list would be:

1. Does the user have their domain affinities calculated?
2. Based on the spocsPerNewTab setting: Was the load eligible to see a spoc?
3. Do any spocs score above their min_score?
4. [NEW] Were any of the spocs filtered out due to the "hidden" flag?
5. Were any of the spocs prevented from being shown due to the frequency cap?
6. Were any of the spocs not shown because there was another higher scoring spoc available?
7. Were any of the spocs filtered out because they were expired? (bug 1476458)
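The checks above could be sketched roughly as follows in Python, returning one reason per spoc for a single tab load. The `Spoc` fields, the reason strings, and the function name are all hypothetical. Two ordering choices are also assumptions rather than something specified in this bug: the expiry check runs before ranking so that an expired spoc cannot win, and a score exactly equal to `min_score` is treated as passing. The function only reads impression counts, so a "hidden" spoc never contributes to a frequency cap here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Spoc:
    id: int
    score: float
    min_score: float
    hidden: bool = False
    impressions: int = 0     # impressions already counted for this spoc
    frequency_cap: int = 1   # hypothetical cap on impressions
    expired: bool = False


def explain_spocs(spocs: List[Spoc], affinities_ready: bool, load_eligible: bool) -> Dict[int, str]:
    """Map each spoc id to 'shown' or the first check that filtered it out."""
    # Checks 1 and 2 apply to the whole tab load, not to individual spocs.
    if not affinities_ready:
        return {s.id: "no_domain_affinities" for s in spocs}
    if not load_eligible:
        return {s.id: "load_not_eligible" for s in spocs}

    reasons: Dict[int, str] = {}
    candidates: List[Spoc] = []
    for s in spocs:
        if s.score < s.min_score:
            reasons[s.id] = "below_min_score"
        elif s.hidden:
            reasons[s.id] = "hidden"
        elif s.impressions >= s.frequency_cap:
            reasons[s.id] = "frequency_cap"
        elif s.expired:
            reasons[s.id] = "expired"
        else:
            candidates.append(s)

    # Among the survivors, only the highest-scoring spoc is shown.
    if candidates:
        winner = max(candidates, key=lambda s: s.score)
        for s in candidates:
            reasons[s.id] = "shown" if s is winner else "outscored"
    return reasons


example = [
    Spoc(1, 0.9, 0.5),
    Spoc(2, 0.7, 0.5),
    Spoc(3, 0.2, 0.5),
    Spoc(4, 0.95, 0.5, hidden=True),
    Spoc(5, 0.8, 0.5, impressions=1),
    Spoc(6, 0.99, 0.5, expired=True),
]
outcome = explain_spocs(example, affinities_ready=True, load_eligible=True)
print(outcome)
```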
(In reply to kenny from comment #5)
> For clarification, would we still include the scores for each
> personalization config + story (id) pair? And use the "hidden" method to
> test reach without showing a spoc?

Sure, I think we can still do that if you find it useful. We also need a clear spec for the behavior of the "hidden" stories.
Scott, I'm adding this to the current iteration; feel free to move it as necessary depending on your workflow and plans.
Assignee: nobody → sdowne
Iteration: --- → 63.3 - Aug 6
Priority: -- → P2
Iteration: 63.3 - Aug 6 → ---
Severity: normal → enhancement
Priority: P2 → P1
Blocks: 1512725
Assignee: sdowne → nobody
Iteration: --- → 68.2 - Apr 1 - 14
No longer blocks: 1512725
Iteration: 68.2 - Apr 1 - 14 → 68.3 - Apr 15 - 28
Priority: P1 → P2
Iteration: 68.3 - Apr 15 - 28 → 69.1 - May 13 - 26
No longer blocks: pocket-newtab-69
No longer blocks: pocket-newtab-68

Hey Scott, is this something we want to do in 69?

Flags: needinfo?(sdowne)

We should ask Kenny for his thoughts on this too.

My thought is that 69 is too soon, and I think more time is needed to rescope this bug now that this one has landed: https://bugzilla.mozilla.org/show_bug.cgi?id=1535717

This bug was asking for a handful of things, one of them being described as: "If a spoc was not shown, we should be able to understand on each load where within that decision tree it was stopped."

The above linked bug is described as: "Implement a new AS telemetry event that records whether or not a SPOC was loaded. If it was not loaded, include the reason why."

Unless I'm wrong, those seem fairly similar, and a chunk of this bug was potentially already done in another bug. If that's the case, I'm thinking the following.

  1. Slightly lower priority, given that part of it has already landed.
  2. Needs rescoping to see what still needs to be done, and whether it's still worth doing.
  3. It's been 10 months since filing; is it all still something we want?
  4. This bug is probably too big to be reasonably tackled as is. Given that 1535717 landed as a separate bug, are there other opportunities to split the asks in this bug into smaller chunks? Those would probably be easier to land and to reason about.

We should get this bug closed, and some smaller bugs filed to tackle what's left in chunks, for 70 at the earliest.

Flags: needinfo?(sdowne) → needinfo?(kenny)

I agree with Scott that this can be closed. The work in that separate bug is sufficient for the SPOC reason side of things, and any future bugs related to "reach" will come out of the exploration the Pocket Data & Learning team is doing with regard to differential privacy.

Flags: needinfo?(kenny)

Closing this in favor of new bugs when we're ready to tackle this in an upcoming nightly cycle.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Component: Activity Streams: Newtab → New Tab Page