Closed Bug 972933 Opened 10 years ago Closed 6 years ago

Generate token for impression/click submission that allows for estimated unique counts

Categories

(Firefox :: New Tab Page, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: MarcoM, Unassigned)

References

Details

(Whiteboard: [tiles])

1.  Goal
* As a product owner I need to know the unique impressions for the directory tiles so that we can accurately price tiles and rework tiles users aren't seeing.

2.  Acceptance Criteria (AC):
* Unique impressions per region while directory tiles are visible are sent back to Mozilla through a channel such as FHR

3.  Notes/Supporting Documentation:
* Unique impressions is the number of unique user sessions that have seen the tiles.
** e.g. if 4 users load the page but 2 of them load it 10 times, our unique impression number is still 4 for that day.
* Impressions are only of value while directory tiles are visible; once they have been removed or replaced, the data is only interesting for general stats.
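To make the distinction concrete, here is a minimal sketch of total versus unique impressions for the example above. The user identifiers and the `countImpressions` helper are hypothetical and only illustrate the arithmetic; they are not a proposal for how data would be collected.

```javascript
// Hypothetical one-day log: one entry per New Tab page load that showed tiles.
// Two users load the page 10 times each, two users load it once.
const loads = [
  ...Array(10).fill("user-a"),
  ...Array(10).fill("user-b"),
  "user-c",
  "user-d",
];

// Total impressions count every load; unique impressions count each user once.
function countImpressions(userIds) {
  return { total: userIds.length, unique: new Set(userIds).size };
}

const dailyCounts = countImpressions(loads);
```

Here `dailyCounts.total` is 22 while `dailyCounts.unique` is 4.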
I agree that this and bug 972936 should probably be channeled through FHR (as we already do for search metrics). It is a channel we already have, and users need to be able to opt out of sending any data, which FHR already implements. We also don't want a long list of separate data opt-outs/opt-ins; it should stay comprehensible.
Blocks: 973273
Depends on: 973426
A few questions:

a) Is this unique impressions *per tile* or unique impressions *per New Tab page view*? The two are different.
b) Unique impressions over what time period(s)? Day, week, month, quarter, year ... ?

> should probably be channeled through FHR

Maybe. I think that requires further thought and discussion, to ensure that we are meeting expectations with regard to user benefit, tracking policy, etc.
(In reply to John Jensen from comment #2)
> A few questions:
> 
> a) Is this unique impressions *per tile* or unique impressions *per New Tab
> page view*? The two are different.

This is impression per tile.

> b) Unique impressions over what time period(s)? Day, week, month, quarter,
> year ... ?

Will get you this answer.  Generally it's daily, monthly, annually, and an ad-hoc range that might correlate to a campaign from a Sponsor Tile owner.
Depends on: 974474
No longer depends on: 974474
No longer depends on: 973426
Converting to a work item following discussion with Bryan.
No longer blocks: 973273
Whiteboard: [story] [tiles] → p=0
Whiteboard: p=0 → [tiles] p=0
Status: NEW → ASSIGNED
Assignee: nobody → clarkbw
Whiteboard: [tiles] p=0 → [tiles] p=13 s=it-30c-29a-28b.3
I see there was a bug WONTFIXED for submitting data to FHR, but it lacked context. It sounds like one line of thinking was that cookies for a user id would be required and that this would conflict with data submission to FHR.

Is there a reason why we /need/ cookies as opposed to regular logging using IP+UA? There's noise with that technique but there's also noise with cookies, and perhaps for our use case, that might be acceptable if it avoids issues of unique cookie ids.
An alternative to a cookie is an explicit id that Firefox creates/controls (perhaps generated per site/subdomain) that is turned off completely when the user turns on DNT.

Part of it is having defaults that work for our use case but still provide users choice.
Whiteboard: [tiles] p=13 s=it-30c-29a-28b.3 → [tiles] p=13 s=it-30c-29a-28b.3 [qa+]
FWIW, I suggested the cookie as an implementation strategy for the following reasons:

* we don't have to invent new UI controls: the existing cookie manager would be sufficient.
* the server can handle DNT
* we automatically get reasonable expiration properties (expires after N days of non-use, etc)

But I'm not wedded to it if a non-cookie ID has other benefits.
Reading through this and related bugs, the only uniqueness seems to be for counting how many people ever see Directory Tiles content. So it's unrelated to measuring impressions and clicks.

To satisfy the unique impression count requirement, couldn't we just remember the last date a directory tile was shown? This isn't tied to reporting any browsing history.
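As a rough sketch of the "remember the last date" idea: the client keeps only the last day a directory tile was shown and flags the first view of each day, so a server could count daily uniques by counting flags. The `recordTileView` helper, its field names, and the choice of UTC days are all assumptions for illustration.

```javascript
// Hypothetical helper: given the previously stored day (or null) and the
// current time, return the new day to store and whether this view is the
// first one seen today (i.e. should be reported as a daily unique).
function recordTileView(lastShownDay, now) {
  const today = now.toISOString().slice(0, 10); // UTC day, e.g. "2014-03-01"
  return { lastShownDay: today, isFirstToday: lastShownDay !== today };
}

// Usage: only the first view on a given day is flagged.
const first = recordTileView(null, new Date("2014-03-01T08:00:00Z"));
const second = recordTileView(first.lastShownDay, new Date("2014-03-01T20:00:00Z"));
const nextDay = recordTileView(second.lastShownDay, new Date("2014-03-02T09:00:00Z"));
```

This only covers daily counts; weekly or monthly uniques would need a stored date per reporting period.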
Let's leave this bug as just counting people who see Directory Tiles. I'll file a separate bug for Directory Tile impressions, as the two are related but not required to be together.
For the immediate future we'll be able to get an understanding of user count from the impressions via Telemetry.
Whiteboard: [tiles] p=13 s=it-30c-29a-28b.3 [qa+] → [tiles] p=13 s=it-31c-30a-29b.1 [qa+]
QA Contact: paul.silaghi
I'm going to close this bug out. The discussion is important, but we are going to have to forgo this metric for now when using Telemetry as our data channel.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
No longer blocks: fxdesktopbacklog
Whiteboard: [tiles] p=13 s=it-31c-30a-29b.1 [qa+] → [tiles]
(In reply to Bryan Clark (Firefox Search PM) [:clarkbw] from comment #11)
> have to forgo this metric for now when using Telemetry as our data channel.
Reopening: As mentioned in bug 975235 comment 3, we'll be using a non-Telemetry tiles-metrics server to count impressions and clicks, so we'll also be able to count uniques there as well. To be clear, this would be for non-Release builds at least for now.

tspurway has some ideas of using HyperLogLog for probabilistic counting of submissions that also improves privacy compared to standard cookies. At a high level, a user is randomly placed into one of some predetermined number of buckets and reports a logarithmically-weighted random number. Most users will be colliding with the same bucket number and weighted random number, but across many users, there will be some that report a lower probability random number, which increases our estimated cardinality.

We can use this to estimate unique impressions and clicks on tiles without being able to uniquely identify users. This could fit into two bytes of "id" (at most 65536 possible values) for the millions of Firefox users. We can tune the actual number of bits to achieve the desired accuracy/error rate for a given population size.
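For illustration, a minimal sketch of how such a two-byte token might be laid out. The 12/4 bit split, the constant names, and the `packToken`/`unpackToken` helpers are assumptions, not something decided in this bug.

```javascript
// Assumed layout: high 12 bits = bucket index, low 4 bits = zero count.
const BUCKET_BITS = 12; // 2^12 = 4096 buckets (assumption)
const ZERO_BITS = 4;    // zero counts 0..15 (assumption)

// Combine a bucket index and zero count into one 16-bit integer.
function packToken(bucket, zeroes) {
  return (bucket << ZERO_BITS) | zeroes;
}

// Split a 16-bit token back into its two fields.
function unpackToken(token) {
  return {
    bucket: token >>> ZERO_BITS,
    zeroes: token & ((1 << ZERO_BITS) - 1),
  };
}
```

With this split the largest token is `packToken(4095, 15)`, which is 65535, so many users necessarily share the same token value.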
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Flags: firefox-backlog?
Blocks: tiles-dev
Component: Firefox Operations → New Tab Page
Depends on: 975235
Product: Tracking → Firefox
Hardware: x86_64 → All
Summary: Unique impressions metrics for tiles → Generate token for impression/click submission that allows for estimated unique counts
Version: --- → Trunk
The typical use of HyperLogLog is to take an existing UID, hash it, then split it into the index/bucket piece and the rest/run piece, which is used to count the number of leading zeroes (resulting in the logarithmic distribution).
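A minimal sketch of that hash-and-split step, assuming a 32-bit hash with the top 12 bits used as the bucket index. The FNV-1a style hash is used here only because it is short to write (it works on UTF-16 code units, which is fine for ASCII ids); the hash choice, `INDEX_BITS`, and the helper names are assumptions, and a real deployment would likely want a better-mixing hash.

```javascript
// FNV-1a style 32-bit hash over the string's UTF-16 code units (illustrative).
function fnv1a32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // 32-bit multiply by the FNV prime
  }
  return h >>> 0; // force unsigned
}

const INDEX_BITS = 12; // assumed number of bits for the bucket index

// Split a 32-bit hash into a bucket index and a leading-zero count.
function splitHash(hash32) {
  const restBits = 32 - INDEX_BITS;
  const bucket = hash32 >>> restBits;        // top INDEX_BITS bits
  const rest = (hash32 << INDEX_BITS) >>> 0; // remaining bits, left-aligned
  // Math.clz32(0) is 32, so cap at the number of bits actually available.
  const zeroes = Math.min(Math.clz32(rest), restBits);
  return { bucket, zeroes };
}

// Usage: derive the (bucket, zeroes) pair from a hypothetical stable id.
const pair = splitHash(fnv1a32("example-uid"));
```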

If we choose to randomly generate the index/bucket and zeros-count instead of deriving from a hash of a stable/persistent identifier, we could just do:

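// Uniform bucket index in [0, NUM_BUCKETS)
bucket = Math.floor(Math.random() * NUM_BUCKETS)
// Geometric zero count, P(zeroes = k) = 2^-(k+1), capped at MAX_ZEROES
zeroes = Math.min(MAX_ZEROES, Math.floor(-Math.log2(1 - Math.random())))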

However, the places we could stash the (bucket, zeroes) tuple in the profile directory (pref, localStorage, file) have different levels of persistence, and the tuple could be trivially edited by someone poking around.
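One partial mitigation for edited values is to range-check the stored tuple when reading it back and regenerate it if it is malformed. A minimal sketch follows; the constant values and the `isValidToken` helper are assumptions. A check like this only rejects out-of-range values, so it cannot detect someone who swaps in a different valid tuple.

```javascript
const NUM_BUCKETS = 4096; // assumed value
const MAX_ZEROES = 15;    // assumed value

// Returns true only if the stored object holds an in-range integer tuple.
function isValidToken(stored) {
  return stored != null &&
    Number.isInteger(stored.bucket) &&
    stored.bucket >= 0 && stored.bucket < NUM_BUCKETS &&
    Number.isInteger(stored.zeroes) &&
    stored.zeroes >= 0 && stored.zeroes <= MAX_ZEROES;
}
```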
Flags: firefox-backlog? → firefox-backlog+
Assignee: clarkbw → nobody
Depends on: 1042214
Depends on: 1043669
Blocks: 1030832
From tspurway: I think if we hardcode 12 bits HLL resolution, it will have a very simple and robust implementation. 12 bits gives us good accuracy (<1% error rates) with compact tables (4K per key).
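For reference, a minimal sketch of what a server-side estimator with 12 bits of resolution could look like, using the standard HyperLogLog raw estimate plus the usual small-range (linear counting) correction. The function names are hypothetical, and large-range correction and merging across keys are left out. As a point of comparison, the commonly cited standard error for HyperLogLog is about 1.04/sqrt(m), which is roughly 1.6% for m = 4096, so the "<1%" figure may refer to a different measure.

```javascript
const HLL_BITS = 12;
const NUM_REGISTERS = 1 << HLL_BITS; // 4096 one-byte registers, i.e. 4K per key

// One sketch per key (e.g. per tile per day); every register starts at 0.
function createSketch() {
  return new Uint8Array(NUM_REGISTERS);
}

// Record one submission: keep the maximum rank (zero count + 1) per bucket.
function addToken(registers, bucket, zeroes) {
  const rank = zeroes + 1;
  if (rank > registers[bucket]) registers[bucket] = rank;
}

// Estimate the number of distinct submitters from the registers.
function estimate(registers) {
  const m = registers.length;
  const alpha = 0.7213 / (1 + 1.079 / m); // bias constant for m >= 128
  let sum = 0;
  let empty = 0;
  for (const r of registers) {
    sum += Math.pow(2, -r);
    if (r === 0) empty++;
  }
  const raw = (alpha * m * m) / sum;
  // Small-range correction: fall back to linear counting while registers are empty.
  if (raw <= 2.5 * m && empty > 0) return m * Math.log(m / empty);
  return raw;
}
```

A usage loop would call `addToken` once per received token and `estimate` when a count is needed; re-submitting the same token never changes the registers, which is what makes repeat impressions from one client collapse into a single unique.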
Blocks: 1050643
No longer blocks: 1030832
No longer blocks: tiles-dev
Status: REOPENED → RESOLVED
Closed: 10 years ago → 6 years ago
Resolution: --- → WONTFIX