Add telemetry for cookie jar sizes in a state partitioned world investigation
Categories: Core :: Networking: Cookies, task, P2
Tracking: fixed in firefox122
People: Reporter: emz; Assigned: edgul
References: Blocks 1 open bug
Whiteboard: [necko-triaged][necko-priority-queue]
Attachments (2 files):
- 48 bytes, text/x-phabricator-request
- 4.42 KB, text/plain (chutten: data-review+)
We expect that since we shipped Total Cookie Protection (TCP), which partitions cookies, the number of cookies set in profiles has increased. We should quantify this using telemetry.
If we store many more cookies per top-level site with TCP, there is a risk that recently used cookies get purged too early, leading to data loss and an overall bad user experience when browsing.
This topic also came up at TPAC 2022 where folks from Safari/Webkit mentioned that they had concerns over partitioning 3rd-party by default because of potentially larger cookie jars.
Relevant prefs:
network.cookie.maxNumber
network.cookie.maxPerHost
Comment 1•2 years ago
Hi Paul,
How important do you think this is? Do you want the necko team to work on this? Or is this more related to the privacy team?
Thanks.
Comment 2•2 years ago
Hi Dan, do you have any thoughts about this one? Should we make a decision or should we add more telemetry first?
Comment 3•2 years ago
There are definitely likely bad outcomes here, and they could be addressed in different ways. We don't seem to have much existing telemetry on cookie use, so we surely need to add some.
In addition to adjusting the limits, we'll need to adjust our purging algorithms to account for 3rd-party partitioning. It's much better to purge 3rd-party-partitioned cookies before we purge 1st-party cookies for that same domain, at least up to a point where we want to keep recent and active partitioned cookies over old, unused 1st-party stuff.
How do we count 3rd-party partitioned cookies? Do all the partitioned google-analytics cookies count together against the same maxPerHost? In practice, very popular inclusions like that might end up getting flushed constantly even if they only use a handful of cookies per partition. Maybe that's OK? If they're included that much they're probably trackers, right? But if it's also a popular 1st-party domain then people will get logged out of that site a lot. If the "maxPerHost" is counted per unique 1st/3rd-party pair, then some sites will easily cause the global max to fill up and cause the user to get logged out of other primary sites they rely on.
Do we need to increase the global maxNumber? Likely. How many people are at that limit? How much storage is being used by their cookies? How much higher does it need to be? Will that much more storage space harm users? (Likely not on space, given how much we let sites store in IDB and service workers, but maybe in time cost.) We can't answer without telemetry.
My gut feel is that we need to count "maxPerHost" separately in a same-site context vs a partitioned one, with possibly different "max" values. Not sure if it makes more sense to count each partition as a separate "site" with a small max, or lump them all together counting against the same max (possibly larger than the unpartitioned max). Note that the spec suggests "at least 50 per domain" but we've used 180 since bug 1460251 (no evidence given) and 150 for some time before that.
Note that we call it maxPerHost but it's actually a domain/site/eTLD+1 limit to prevent malicious sites from spinning up unlimited subdomains to try to purge another site's cookies. We've used a per-domain limit since before RFC 6265 (2011) standardized as such in reaction to real web shenanigans. The earlier RFC 2965 (2000) had a "per unique host or domain name" limit.
It looks like our current maxPerHost is counted against the base domain-origin attributes pair; partitioning could really explode the total number of cookies in use.
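A back-of-envelope model makes the concern concrete. This is illustrative Python, not measurement: the site counts and the assumption that every partition fills its flat per-host quota are made up for the sketch; only the 180 limit comes from the discussion above.

```python
# Hypothetical worst-case model of cookie-jar growth under partitioning.
# MAX_PER_HOST mirrors the 180 default discussed above; everything else
# is an illustrative assumption.
MAX_PER_HOST = 180

def worst_case_cookies(top_level_sites, embedded_third_parties):
    """Upper bound on stored cookies if every (third party, top level)
    partition fills its per-host quota, plus first-party cookies."""
    first_party = top_level_sites * MAX_PER_HOST
    partitioned = top_level_sites * embedded_third_parties * MAX_PER_HOST
    return first_party + partitioned

# Unpartitioned: 20 sites can hold at most 20 * 180 = 3600 cookies.
print(worst_case_cookies(20, 0))   # 3600
# Partitioned: the same 20 sites, each embedding 5 third parties,
# can in theory hold 20 * 180 + 20 * 5 * 180 = 21600.
print(worst_case_cookies(20, 5))   # 21600
```

Even a handful of embedded third parties multiplies the theoretical jar size, which is why the global maxNumber question above matters.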
What telemetry would I want?
- total number of cookies, for sure.
- some way of judging how those are spread out.
- what is the highest number of cookies per (domain+attributes)?
- what is the average or median number of cookies per (domain+attributes)?
- how many unique (domain+attribute) groups are there?
- maybe we should flag users who do or don't have containers enabled, because users with containers on top of TCP will of course have more groups. It depends on whether they're using it to isolate particular sites (like Facebook Container), which somewhat duplicates what TCP is doing and might not impact cookie use, or using it to create alternative personas, which might.
- ignoring origin attributes, how many unique (domains) are there?
- Do we want to measure cookie recency? I think given our purge philosophy we might find that MOST users have a near-max number of cookies unless they regularly clear all their cookies. So maybe not just the total number of cookies, but the total with a last-used time in the last week, the last two weeks, the last month?
I'm not sure how easy it is to gather those stats without a perf impact. The "maxPerHost" number seems to be only checked when we're adding a cookie, for example, so I'm not sure there's a good time to stop and iterate through unique hosts to get that number.
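Those aggregates could all come from a single pass over the cookie table. A sketch in Python, assuming a hypothetical flat data model; the real implementation would iterate the C++ cookie service's in-memory list, and the field names here are placeholders.

```python
import statistics
import time
from collections import Counter

WEEK = 7 * 24 * 3600

def cookie_stats(cookies, now=None):
    """cookies: iterable of dicts with 'domain', 'origin_attributes'
    (an opaque serialized key) and 'last_accessed' (unix seconds).
    Returns the aggregates sketched in comment 3 in one pass."""
    now = now or time.time()
    per_group = Counter((c["domain"], c["origin_attributes"]) for c in cookies)
    recent = sum(1 for c in cookies if now - c["last_accessed"] <= WEEK)
    counts = list(per_group.values()) or [0]   # avoid max()/median() on empty
    return {
        "total": sum(counts),
        "max_per_group": max(counts),
        "median_per_group": statistics.median(counts),
        "unique_groups": len(per_group),
        "unique_domains": len({c["domain"] for c in cookies}),
        "used_last_week": recent,
    }

cookies = [
    {"domain": "example.com", "origin_attributes": "",
     "last_accessed": time.time()},
    {"domain": "example.com",
     "origin_attributes": "^partitionKey=(https,news.site)",
     "last_accessed": time.time() - 30 * 24 * 3600},
]
print(cookie_stats(cookies))
```

The perf worry above still applies: this is O(total cookies) per collection, which is why triggering it rarely (see the idle-daily suggestion later in the bug) matters.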
Reporter
Comment 4•2 years ago
Sorry for the late response here. Talking to Tim I learned that there has already been a discussion about dividing this work between the Necko and the Privacy team. As a first step Necko plans on adding telemetry.
Comment 5•2 years ago
Ed, do you have time to work on this?
Tim and I had a discussion about this and we think the first step would be adding some telemetry, as below:
- Partitioned cookie size vs. non-partitioned cookie size
- Per-domain partitioned cookie size
- Per-domain non-partitioned cookie size
- How often we purge cookies
Comment 6•2 years ago
Yes, I will take this on.
Comment 7•2 years ago
After a quick review of the discussion, I have some questions.
- a. Regarding the first requested telemetry item, "Partitioned cookie size vs. non-partitioned cookie size": is this simply a recording of whether the user is using partitioning, i.e. if ETP Strict is enabled? Or is it possible to have both partitioned and non-partitioned cookies? If so, would that be done with ETP exceptions?
  b. Do we even need this telemetry, or can it be "gleaned" (sorry) from the following two telemetry items (per-domain non/partitioned cookie size)?
- Regarding per-domain cookie size telemetry: for partitioned cookies, will we send a list of `origin+TopLevelOrigin` pairs and their corresponding cookie counts? For non-partitioned, just a list of origins with their corresponding cookie counts? Is there a privacy concern here with sending the raw origins? Can we hash the origins to preserve privacy and uniqueness?
- I'm not familiar with telemetry dashboards, but I'm assuming we can just record every time we purge and filter by user agent in the dashboard to find the average number of calls per user. BTW, I'm also assuming that by purging we specifically mean TCP's purging, and that we are not concerned with general periodic purging of expired cookies, manual triggering, or automatic clearing on close. Does this make sense?
- I'm not clear on the details of how running containers affects the partition keys exactly. Similarly, I'm not sure which attributes we are talking about with `domain+attributes`. It seems pretty clear that these will affect the values we are sending, so maybe we need something a little more complex than just an `origin+TLOrigin : count`, like `origin+TLOrigin+DesiredAttributes : count`. A little clarity on what exactly should go here would help.
- I'm hoping the code will provide a little more insight on this one, but if anyone knows off-hand the best place/time we should be triggering the telemetry, particularly the non-purge items, that might save me some time. If overhead is low enough, maybe we just send on shutdown before any automatic cookie cleanup?
Reporter
Comment 8•2 years ago
(In reply to Ed Guloien [:edgul] from comment #7)
> After a quick review of the discussion, I have some questions.
> a. Regarding the first requested telemetry item, "Partitioned cookie size vs. non-partitioned cookie size": is this simply a recording of whether the user is using partitioning, i.e. if ETP Strict is enabled? Or is it possible to have both partitioned and non-partitioned cookies? If so, would that be done with ETP exceptions?
We want to know the distribution of cookie sizes labelled/categorized by partitioned and unpartitioned. When recording the size of a cookie, to determine whether it's partitioned we can check if it has a non-empty partitionKey in OriginAttributes. It is possible to have both partitioned and non-partitioned cookies; it depends on the context they're set in (1p vs 3p). We don't have to check whether the user has enabled TCP. ETP exceptions are also not necessary to look at; the partitionKey should be enough.
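A minimal sketch of that classification, assuming the serialized OriginAttributes suffix syntax; the real check would read the partitionKey field of the C++ OriginAttributes struct directly rather than parse strings.

```python
def is_partitioned(suffix: str) -> bool:
    """A cookie counts as partitioned iff its OriginAttributes carry a
    non-empty partitionKey, regardless of ETP settings.
    `suffix` is a serialization like
    "^userContextId=2&partitionKey=(https,example.com)" (assumed shape)."""
    attrs = dict(kv.split("=", 1) for kv in suffix.lstrip("^").split("&") if kv)
    return bool(attrs.get("partitionKey"))

print(is_partitioned("^partitionKey=(https,example.com)"))  # True
print(is_partitioned("^userContextId=2"))                   # False: no partitionKey
```

Note that a container cookie without a partitionKey still counts as unpartitioned here, matching the point that the partitionKey alone decides.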
> b. Do we even need this telemetry, or can it be "gleaned" (sorry) from the following two telemetry items (per-domain non/partitioned cookie size)?
Yes, that could work. If we have per-domain data we can add that up to get global data. Though maybe it's easier to simply add an additional global histogram. I have doubts whether we can collect per-domain data for specific domains though. Bucketing "per domain" might be okay.
> Regarding per-domain cookie size telemetry: for partitioned cookies, will we send a list of `origin+TopLevelOrigin` pairs and their corresponding cookie counts? For non-partitioned, just a list of origins with their corresponding cookie counts? Is there a privacy concern here with sending the raw origins? Can we hash the origins to preserve privacy and uniqueness?
We most likely can't collect specific sites / origins as part of release telemetry. We want to know the sizes grouped by site, not the actual site. This can help inform potential per-site cookie size limits we want to enforce.
> I'm not familiar with telemetry dashboards, but I'm assuming we can just record every time we purge and filter by user agent in the dashboard to find the average number of calls per user. BTW, I'm also assuming that by purging we specifically mean TCP's purging, and that we are not concerned with general periodic purging of expired cookies, manual triggering, or automatic clearing on close. Does this make sense?
The telemetry code should take care of data aggregation as long as you use the correct telemetry type, e.g. histograms etc.
TCP does not perform cookie purging. There is tracker cookie purging (see PurgeTrackerService.sys.mjs). However, for this bug we're only concerned about purging by the cookie code itself, e.g. when the cookie jar fills up to the max. The other mechanisms are out of scope.
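A rough model of the in-scope purge event, with the eviction policy simplified to least-recently-used (the cookie service's actual heuristics also weigh expiry, so this only sketches where the counter this bug wants would be incremented):

```python
def add_cookie(jar, cookie, max_per_host, purge_counter):
    """jar: list of (name, last_accessed) tuples for one base domain + OA.
    When adding pushes the jar over the limit, evict the least recently
    used cookie and bump purge_counter - the event we want to count.
    purge_counter is a one-element list so the increment is visible to
    the caller."""
    jar.append(cookie)
    while len(jar) > max_per_host:
        oldest = min(jar, key=lambda c: c[1])   # smallest last_accessed
        jar.remove(oldest)
        purge_counter[0] += 1
    return jar

purges = [0]
jar = []
for i in range(5):
    add_cookie(jar, (f"c{i}", i), 3, purges)
print(len(jar), purges[0])   # 3 2
```

Instrumenting at this point counts purges as they happen, instead of trying to reconstruct them from snapshots.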
> I'm not clear on the details of how running containers affects the partition keys exactly. Similarly, I'm not sure which attributes we are talking about with `domain+attributes`. It seems pretty clear that these will affect the values we are sending, so maybe we need something a little more complex than just an `origin+TLOrigin : count`, like `origin+TLOrigin+DesiredAttributes : count`. A little clarity on what exactly should go here would help.
We could use OriginAttributes as a boundary (or key) for this telemetry, rather than just the partitionKey. Containers also affect global cookie limits, so they should be considered too. This key should only be used for bucketing and not exposed in telemetry, because it may contain sensitive data, such as the site in the partitionKey.
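One way to honor that, sketched in Python: group locally by the full (domain, OriginAttributes) key, then throw the keys away so only the distribution of counts leaves the client (e.g. into a histogram).

```python
from collections import Counter

def per_jar_counts(cookies):
    """cookies: iterable of (base_domain, oa_suffix) pairs. The
    (domain, OA) key is used only for local grouping; what this returns
    - and what would be submitted - is a bare list of counts with no
    domains or partition keys attached."""
    groups = Counter(cookies)
    return sorted(groups.values(), reverse=True)

counts = per_jar_counts([
    ("example.com", ""),
    ("example.com", "^partitionKey=(https,a.com)"),
    ("example.com", "^partitionKey=(https,a.com)"),
])
print(counts)   # [2, 1]
```

The sensitive key never appears in the output, yet the shape of the distribution (max, median, number of jars) is preserved.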
> I'm hoping the code will provide a little more insight on this one, but if anyone knows off-hand the best place/time we should be triggering the telemetry, particularly the non-purge items, that might save me some time. If overhead is low enough, maybe we just send on shutdown before any automatic cookie cleanup?
You could collect this data somewhere in our cookie code. You could listen for idle-daily to trigger the calculation/collection once a day. We should still be mindful of performance and not use too much CPU for too long. Alternatively, we could do it on idle after startup instead. That might be sufficient.
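Independent of the observer wiring, the once-a-day throttle reduces to a timestamp check. A sketch with a hypothetical pref name; in Firefox this would hang off the idle-daily notification rather than be polled.

```python
import time

# Hypothetical pref used only for this sketch.
LAST_RUN_PREF = "network.cookie.telemetry.lastCollectedAt"
DAY = 24 * 3600

def maybe_collect(prefs, collect, now=None):
    """Run `collect` at most once per 24 hours, persisting the last run
    time in `prefs` (a dict standing in for the pref service)."""
    now = now or time.time()
    if now - prefs.get(LAST_RUN_PREF, 0) < DAY:
        return False            # ran recently; skip to bound CPU cost
    collect()
    prefs[LAST_RUN_PREF] = now
    return True

prefs, runs = {}, []
print(maybe_collect(prefs, lambda: runs.append(1), now=100000.0))  # True
print(maybe_collect(prefs, lambda: runs.append(1), now=101000.0))  # False
```

Persisting the timestamp means a restart within the same day does not trigger a second expensive scan.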
Comment 9•2 years ago
Thanks for Paul's feedback.
I have nothing to add.
Assignee
Comment 10•2 years ago
Assignee
Comment 11•2 years ago
Comment 12•2 years ago
Comment on attachment 9364003 [details]
data-collection-request-cookie-partition-telemetry.md
PRELIMINARY NOTE:
Please confirm that when you say these counts are "by host+OA" that this collection does not in fact collect the host+OA? (Also, for my curiosity, what is an OA in this context?)
Also: Since these are Glean collections in mozilla-central, did you know you could use ./mach data-review to generate the data review template for you? It both saves time and makes for a more consistent presentation of information for the Data Steward, so it's wins for everyone! (there's more info in the announcement blogpost).
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes. This collection can be controlled through the product's preferences.
If the request is for permanent data collection, is there someone who will monitor the data over time?
No. This collection will expire in Firefox 128.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 1, Technical.
Is the data collection request for default-on or default-off?
Default on for all channels.
Does the instrumentation include the addition of any new identifiers?
No.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does the data collection use a third-party collection tool?
No.
Result: datareview+
Assignee
Comment 13•2 years ago
Yes, we are only collecting the counts: the number of cookies within each host+OA, with no host+OA identifiers.
OA stands for OriginAttributes, which is basically a set of fields that describe the context where a cookie is used. This information is used heavily to distinguish and separate (partition) otherwise similar-looking data. For example, you might have cookies for any given website in multiple containers, across private and non-private windows, or set directly vs. from iframes.
https://searchfox.org/mozilla-central/source/__GENERATED__/__android-armv7__/dist/include/mozilla/dom/ChromeUtilsBinding.h#1097-1103
Thanks for the ./mach tip. I'll use it next time for sure.
Comment 14•2 years ago
Comment 15•2 years ago
bugherder