Closed Bug 1159571 Opened 9 years ago Closed 9 years ago

Use server provided frequency caps for daily and lifetime totals

Categories

(Firefox :: New Tab Page, defect)

defect
Not set
normal
Points:
8

Tracking

()

VERIFIED FIXED
Firefox 40
Iteration:
40.3 - 11 May
Tracking Status
firefox40 --- verified
firefox41 --- verified

People

(Reporter: Mardak, Assigned: mzhilyaev)

References

(Blocks 1 open bug)

Details

(Whiteboard: .002)

Attachments

(2 files, 5 obsolete files)

We'll want two types of persistent frequency caps: a daily cap and a total cap. These caps come from the server instead of being hardcoded as initially implemented in bug 1140496.
jterry mentioned it would be nice to be able to spread out usage of frequency capping, i.e., show each suggestion once before showing the same thing a second time.

And to confirm, we do want daily and lifetime totals, e.g., 3 daily and 5 total -- this could result in a user seeing a suggestion for 2 days (3, 2) or 5 days (1, 1, 1, 1, 1) or even longer if some days had no impressions (1, 0, 0, 1, 0, 0, 3).

For now, we'll continue frequency capping by url as per bug 1140496.

There's potentially multiple ways to frequency cap on a daily basis:
- show at most daily-cap within 24 hours from first showing
- reset the counter at midnight
- 24 hour rolling (e.g., if we reset at time T, potentially the user sees a tile 3 times at T-1 then 3 more times at T+1)
Ideally, we'd want flexibility for frequency capping for various type combinations. The factors we need to consider are:
- is the sponsorship for awareness or response?
- is the revenue type for the sponsorship by impressions or by clicks?

If the campaign is CPC, we should need tighter control in setting impression and click caps. 

If the revenue type is by impressions, there's a fair balance we need to strike between the possibility of conversion and profitability. The less we display the sponsorship, the effectiveness will rise, however, it might take longer to reach revenue goal. 

There are a number of studies done on this topic which is worth looking into:
https://advertising.microsoft.com/user/en-us/researchlibrary/researchreport/OptimalFrequency.pdf
http://static.googleusercontent.com/media/www.google.fr/fr/fr/doubleclick/pdfs/Better-Brand-Engagement-with-Display-Formats-Feb-2012.pdf

Generally, 3 per 24 hours per user is optimal, but these study are based on IAB standard size display ads which is quite different than our ad products. We should test out what is optimal for our audience depending on product type, revenue type and eventually channel type.
(In reply to Kevin Ghim from comment #2)
> If the campaign is CPC, we should need tighter control in setting impression
> and click caps. 
What kind of tighter controls?
(In reply to Ed Lee :Mardak from comment #3)
> (In reply to Kevin Ghim from comment #2)
> > If the campaign is CPC, we should need tighter control in setting impression
> > and click caps. 
> What kind of tighter controls?

As in, if the campaign is CPC, cap is 2 vs. CPM cap is 5. Not sure what the number is for us, but we'd want to back up our assumptions from frequency vs CTR reporting. Is that possible?
I'm confused about what additional controls you need over using server provided frequency cap values.
(In reply to Ed Lee :Mardak from comment #5)
> I'm confused about what additional controls you need over using server
> provided frequency cap values.

Perhaps I can answer by asking some questions I have. For 39, can we manually set a frequency cap amount per impression and/or clicks per window? If not, we should consider that for future versions. Additionally, can we get reporting on frequency cap performance? Both questions should provide direction towards optimal capping amount per variable type (product, revenue, audience, etc).
Whiteboard: .? → .002
Assignee: nobody → edilee
Iteration: --- → 40.3 - 11 May
Points: --- → 8
I'm not sure what "frequency cap performance" means. With server provided frequency caps for daily and total views, we can set that on a per tile basis. We can measure impressions and clicks to calculate CTR as usual.

So this means we can serve a tile with frequency caps of 3daily+10total and another tile with 5daily+5total or whatever combination and see if that affects CTR.
Blocks: 1160372
Assignee: edilee → mzhilyaev
I believe that we add count of shows to the click ping - we will know the optimal frequency cap early through the campaign.  So we can correct it during the campaign.  Which brings the question of showing doing a round-robin through available suggested tiles rather then showing same suggested tile all the time.
Round robin probably isn't necessary. A call to _updateSuggested will pick the next one at random at this point and spreads out views more than just repeating the same.
Need info 

1) Is directoryId a unique tile identifier?
Do we recycle these id?
Is it possible to get a different tile with same directoryID?

2) If tile contains no daily/total frequency cap, should there be defaults?
If so which ones?
Flags: needinfo?(oyiptong)
(In reply to maxim zhilyaev from comment #10)
> Need info 
> 
> 1) Is directoryId a unique tile identifier?
> Do we recycle these id?
> Is it possible to get a different tile with same directoryID?
> 
The directoryId is unique and will never be reused.
> 2) If tile contains no daily/total frequency cap, should there be defaults?
If so which ones?

We should ask :jterry what a sensible default is.
I think 3/24 is the standard for the daily freq. cap.
Flags: needinfo?(oyiptong)
Depends on: 1161192
Attached patch v1. new frequency capping (obsolete) — Splinter Review
Attachment #8601629 - Flags: review?(msamuel)
Blocks: 1161656
updated documentation at browser/docs/DirectoryLinksProvider.rst
Attachment #8601629 - Attachment is obsolete: true
Attachment #8601629 - Flags: review?(msamuel)
Attachment #8601751 - Flags: review?(msamuel)
Comment on attachment 8601751 [details] [diff] [review]
v2. frequency caps - updated documentation

Review of attachment 8601751 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/modules/DirectoryLinksProvider.jsm
@@ +762,5 @@
> +   * @param json file path
> +   * @return a promise resolved to a valid object or undefined upon error
> +   */
> +  _readJsonFile: function DirectoryLinksProvider_readJsonFile(filePath) {
> +    let jsonObj;

`let jsonObj = {}` here ensures when an error occurs we still return a valid json object rather than `undefined`

@@ +792,5 @@
> +      // check _frequencyCapFilePath for existence
> +      let doesFileExists = yield OS.File.exists(this._frequencyCapFilePath);
> +      if (doesFileExists) {
> +        let fileObject = yield this._readJsonFile(this._frequencyCapFilePath);
> +        if (fileObject) {

I think you can skip this check if you make the change I mention above in _readJsonFile(). So you won't need line 791: `this._frequencyCaps = {};`

And you can directly write `this._frequencyCaps = yield this._readJsonFile(this._frequencyCapFilePath);`

@@ +852,5 @@
> +   *        all cap objects with lastUpdated less than (now() - timeDelta)
> +   *        will be removed. This is done to remove frequency cap objects
> +   *        for unused tile urls
> +   */
> +  _pruneFrequencyCapUrls: function DirectoryLinksProvider_pruneFrequencyCapUrls(timeDelta = DEFAULT_PRUNE_TIME_DELTA) {

Pruning based on a time delta seems a little arbitrary to me. Why is this delta chosen? What happens if a URL was last updated more than 10 days ago so we delete its frequency cap info and stop showing the tile even though it didn't reach its cap? Maybe we should prune for urls that have already reached their caps?
(In reply to Marina Samuel [:emtwo] from comment #15)
> Pruning based on a time delta seems a little arbitrary to me.
This is more of a failsafe of making sure we don't store frequency counts for old urls forever.

> Maybe we should prune for urls that have already reached their caps?
If we prune too quickly, then we'll end up showing something that reached its total cap -> prune -> now uncapped again.
Talked to Marina and we agreed to keep _readJsonFile() unchanged because it may be useful to check if file read fails.  We also decided to keep file existence check logic in _readFrequencyCapFile() method.
Attached patch v3. minor documentation tweak (obsolete) — Splinter Review
Attachment #8601751 - Attachment is obsolete: true
Attachment #8601751 - Flags: review?(msamuel)
Attachment #8602235 - Flags: review?(adw)
Attached patch v4. yet another minor doc change (obsolete) — Splinter Review
yes another one liner
Attachment #8602235 - Attachment is obsolete: true
Attachment #8602235 - Flags: review?(adw)
Attachment #8602238 - Flags: review?(adw)
Comment on attachment 8602238 [details] [diff] [review]
v4. yet another minor doc change

Review of attachment 8602238 [details] [diff] [review]:
-----------------------------------------------------------------

Technically this looks fine, but I think there's kind of a big problem around leaking the user's history.  Sorry for not thinking of this earlier when you sketched the idea to me, Max.  (I'd ask you about it in IRC first but it's the end of the day and I see you're not there, and I promsied a review today.)  I also have code comments below that I typed up earlier.

There are a couple of aspects.  You're storing capObject.clicked, so it's possible to tell that the user visited capObject's URL.  But even without that, you're storing the simple fact that a suggested link was shown in newtab at all.  (Because the URLs that are the keys in the _frequencyCaps object are suggested URLs, not URLs from the user's history, correct?)  So while it's not possible to tell what the user has in their history (setting aside the capObject.clicked problem), it is possible to guess at the kinds of pages that might be in their history based on the ones that were suggested.

Am I wrong about any of that?

I think it's OK to store all that information as long as the `clicked` suggested pages remain in the user's history and as long as the pages that prompted the suggested sites remain in the user's history, but once those pages are deleted from history, I don't think it's acceptable to keep that info on disk, not even for the 10-day window in the patch.

There are a couple of technical options, but it probably gets complicated when you think about purging suggested URLs when the pages that prompted the suggestions are deleted.  You might be able to use the Places annotation service.  Then your data would be hooked into the history service automatically and you wouldn't have to worry about leaking anything (with the aforementioned complication around suggested URLs).  But I don't think the annotation service is used very much.  We could ask Marco for his opinion.  http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/nsIAnnotationService.idl

The other option is to do what you're doing but listen for history deletions so you can update your data.  In that case I think we should add onDeleteURI (from nsINavHistoryObserver) to PlacesProvider and have it broadcast onLinkChanged.  DirectoryLinksProvider already observes PlacesProvider, so it would be notified when a link is deleted.  Then you would update your data.

You and Ed should figure something out, if what I'm saying is reasonable.

::: browser/docs/DirectoryLinksProvider.rst
@@ +183,5 @@
>    Suggested Tile if the user has the site in one of the top 100 most-frecent
>    pages. Only preapproved array of strings that are hardcoded into the
>    DirectoryLinksProvider module are allowed.
> +- ``frequency_caps`` - an object consisting of daily and total frequency caps
> +  that limit a number of times a Suggested Tile can be shown in the new tab

a number -> the number

::: browser/modules/DirectoryLinksProvider.jsm
@@ +448,5 @@
>      let ping = new XMLHttpRequest();
>      ping.open("POST", pingEndPoint + (action == "view" ? "view" : "click"));
>      ping.send(JSON.stringify(data));
>  
> +    return Task.spawn(function() {

function* ()

@@ +542,5 @@
>  
>        // Allow for one link suggestion on top of the default directory links
>        this.maxNumLinks = links.length + 1;
>  
> +      // prune frequency caps off outdated urls

off -> of (I think?)

@@ +761,5 @@
> +   * Reads json file, parses its content, and returns resulting object
> +   * @param json file path
> +   * @return a promise resolved to a valid object or undefined upon error
> +   */
> +  _readJsonFile: function DirectoryLinksProvider_readJsonFile(filePath) {

Nit: You could use Task.async to make this function a little nicer.  That's basically a style choice though, so up to you.

https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/Task.jsm#async%28%29

@@ +785,5 @@
> +   * @return a promise resolved upon load completion
> +   *         on error or non-exstent file _frequencyCaps is set to empty object
> +   */
> +  _readFrequencyCapFile: function DirectoryLinksProvider_readFrequencyCapFile() {
> +    return Task.spawn(function() {

Nit: Same nit here about Task.async.  Instead of

foo: function(arg) {
  return Task.spawn(function* () {
  });
}

you can write

foo: Task.async(function* (arg) {
}) // note you don't need bind() here

@@ +789,5 @@
> +    return Task.spawn(function() {
> +      // empty _frequencyCaps object - it will be set to file content
> +      this._frequencyCaps = {};
> +      // check _frequencyCapFilePath for existence
> +      let doesFileExists = yield OS.File.exists(this._frequencyCapFilePath);

Just read the file without checking if it exists first.  OS.File.read will reject its promise with an error in that case, which you handle.

@@ +870,5 @@
> +  _isTodayTimestamp: function DirectoryLinksProvider_isTodayTimestamp(timestamp) {
> +    let showOn = new Date(timestamp);
> +    let today = new Date();
> +    // call timestamps identical if both day and month are same
> +    return showOn.getDate() == today.getDate() && showOn.getMonth() == today.getMonth();

Please check the year too just to prevent the possibility of embarrassing bugs.  And "timestamps identical" isn't right, is it?  It's "days identical."

@@ +891,5 @@
> +      // update lastShownDate
> +      capObject.lastShownDate = Date.now();
> +    }
> +
> +    // bump both dialy and total counters

dialy -> daily

@@ +893,5 @@
> +    }
> +
> +    // bump both dialy and total counters
> +    capObject.totalViews ++;
> +    capObject.dailyViews ++;

Please remove the space before ++.
Attachment #8602238 - Flags: review?(adw)
(In reply to Drew Willcoxon :adw from comment #20)
> it is possible to guess at the kinds of pages that might be in their
> history based on the ones that were suggested.
That's true, and to be clear, one could try to infer types of pages as opposed to specific pages or sites. Because the suggestion isn't necessarily tied to a specific page, I'm not sure if there's a good way to watch page deletions to trigger removing the fact that a suggestion was shown.

Would it be better if the url was not in plaintext? Similar to blocked tiles are recorded as a md5 hash?
maksik, the md5 hash I'm referring to in NewTabUtils:
https://hg.mozilla.org/mozilla-central/file/617dbce26726/toolkit/modules/NewTabUtils.jsm#l61

We could probably reuse that code for some simple/fast-ish hashing of strings.
We can md-hash tile urls, but how does it save us from attack?
I believe Drew's concern is a user who deleted a bunch of history, and then his computer got arrested.
The FBI agent gets frequency caps file from user's machine, gets json file from mozilla, run's json file suggested tiles urls through md5 hash and deciphers which tiles where shown to a user and potentially when.

My knee jerk reaction would be to encrypt the frequency cap file, but that does not help either, for if FX can decrypt the file upon restart, so can FBI agent. 

Theoretically we could store urls that triggered a show of a particular tile in frequency cap file.
Then, upon deletion we remove deleted url from the list and remove frequency cap entry when all trigger urls are deleted and we can't find another trigger url in suggestedLinksMap.  This has an issue with false positives: we are essentially removing frequency_cap info for tile completely, so next time FX reloads json file, views and clicks are set to 0, and if the user visits triggering page again, he will see the ad that he might've clicked already.
Max, I agree with all of that, including your suggested fix and its drawback.  (And as we talked about, you don't have to resort to scenarios involving the FBI.  We're Mozilla, and we try to protect the user's privacy as much as we can.  (I'm not implicating you and Ed or anything.  You're good people! :-)))

Sorry to cause more trouble for you guys.
No longer blocks: 1160372
Summary of our meeting: I had two concerns, (1) `clicked` reveals which suggested pages you've visited, (2) the URLs by themselves may reveal the kinds of sites you have visited.  I'm dropping (2) because the "kinds of sites" are very broad categories like news sites.  (1) on the other hand reveals your actual history, and we'll try to fix that by observing when those URLs are deleted, and responding by deleting them from the data.
Max, here's an example of how to add a visit to a page: http://mxr.mozilla.org/mozilla-central/source/browser/base/content/test/newtab/browser_newtab_bug722273.js#40  I think `place` has to include a `visits` array with at least one visit.  PlacesUtils comes from PlacesUtils.jsm.  Method definition here: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/mozIAsyncHistory.idl#175

To remove the visit you add, you have to use this other interface: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/History.jsm#264  You can access that from PlacesUtils.history.remove().
Attachment #8603174 - Attachment is patch: true
Attachment #8603174 - Attachment mime type: text/x-patch → text/plain
Comment on attachment 8603174 [details] [diff] [review]
v5. fix reviewer comments, handle page deletion and clear history events

Review of attachment 8603174 [details] [diff] [review]:
-----------------------------------------------------------------

I was expecting PlacesProvider.onDeleteURI to call PlacesProvider._callObservers("onLinkChanged") like we talked about.  I think you're right though that clearing recent history does not trigger individual calls to onDeleteURI for each of the deleted pages, so listening for that too is the right thing to do.  But in that case I would expect you to call PlacesProvider._callObservers("onManyLinksChanged").

But this is fine, it should work, it's good enough to land.  Thanks!
Attachment #8603174 - Flags: review?(adw) → review+
(In reply to Drew Willcoxon :adw from comment #28)

> I was expecting PlacesProvider.onDeleteURI to call
> PlacesProvider._callObservers("onLinkChanged") like we talked about. 

I took a harder look into calling onLinkChange upon page deletion or clear history, and decided against it.
Apart from removing clicked flag, nothing really may change in Directory provider upon page deletion.
So, I felt perhaps we should not bother the rest of the code recomputing suggested tiles, if all we do is removing a flag from click tracking object.  Thanks for the review
Attachment #8602238 - Attachment is obsolete: true
Attachment #8603174 - Attachment is obsolete: true
ni? per https://wiki.mozilla.org/Firefox/Data_Collection

(In reply to Pulsebot from comment #31)
> https://hg.mozilla.org/integration/fx-team/rev/ded137b69283
There's no additional data being collected, but the server provides an extra object containing frequency caps for daily/total.
Flags: needinfo?(benjamin)
Blocks: 1163142
https://hg.mozilla.org/mozilla-central/rev/ded137b69283
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 40
f+
Flags: needinfo?(benjamin)
STR:

1) Go to about:config and set browser.newtabpage.directory.src = https://people.mozilla.org/~elee/suggested.unicorn3.json

2) Visit addons.mozilla.org and several other sites until the first 8 tiles (not including suggestions) are history tiles. A suggested tile should appear.

3) Open 9 more tabs - verify that the same suggested tiles appear.

4) On the 11th tab, verify the suggested tile changes (different wording)

5) Open 9 more tabs. On the 21st tab, verify no more suggested tiles appear.

6) Change the system date to another date in the future. Open 2 more tabs. Verify there is a suggested tile. Repeat steps 3-5
Status: RESOLVED → VERIFIED
I can confirm that the fix is now live on latest Aurora, build ID: 20150528004000, using the STR from the above comment.

I've tested on Windows 7 64-bit, Ubuntu 14.04 32-bit and Mac OS X 10.9.5.
You need to log in before you can comment on or make changes to this bug.