Open Bug 231852 Opened 17 years ago Updated 6 days ago

ETag: filtering to counter web tracking

Categories

(Core :: Networking: Cache, defect, P3)

defect

Tracking

()

REOPENED

People

(Reporter: bmo, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: privacy, Whiteboard: [necko-backlog])

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113

A recent post to Bugtraq
(http://cert.uni-stuttgart.de/archive/bugtraq/2004/01/msg00166.html) has revived
the discussion about a long-standing privacy issue with Mozilla and other web
browsers. Martin Pool discovered this problem and first publicized his findings
in March 2000
(http://cert.uni-stuttgart.de/archive/bugtraq/2000/03/msg00365.html). His
proof-of-concept code, meantime, now lives at http://sourcefrog.net/projects/meantime/

Mozilla needs some sort of defense against this. Perhaps whitelist/blacklist
support for enabling cache-related features?

Reproducible: Always

Assignee: general → darin
Component: Browser-General → Networking: Cache
QA Contact: general → cacheqa
How is ETag linked to other HTTP functions? Can we simply block this header?
Summary: need defense against meantime-style web tracking → ETag: filtering to counter web tracking
Confirming, not sure what the solution should be though.
Status: UNCONFIRMED → NEW
Ever confirmed: true
-> default owner
Assignee: darin → nobody
Hello,

Could private browsing ignore ETags, EVEN IF this would impact caching, to prevent the ETag ID from being used for persistent tracking?

Here is the Wikipedia page about that header: http://en.wikipedia.org/wiki/HTTP_ETag#Tracking_using_ETags
Given that there are plenty of other ways to do tracking with cached resources, it doesn't seem productive to do this unless we have something that will address other such problems as well.  That said, I'm not optimistic that there's a reasonable way to solve them (where "reasonable" means something other than disabling all caching).
(In reply to comment #5)
> (where "reasonable"
> means something other than disabling all caching).

When privacy is a concern, I don't mind disabling all caching, or at least destroying the cache every 5 minutes...
(In reply to comment #5)
> Given that there are plenty of other ways to do tracking with cached
> resources, it doesn't seem productive to do this unless we have something
> that will address other such problems as well. 

What other ways? If you turn off scripting and cookies, how can the server receive back some previously set unique ID (other than an ETag)?

> That said, I'm not
> optimistic that there's a reasonable way to solve them (where "reasonable"
> means something other than disabling all caching).

Should it not be possible to disable/blacklist/whitelist just ETag support? The cache would then not send ETags and would fall back to dates.
I think the way to solve this is to have a pref (off by default) that prevents validation headers from being sent. So when a user has changed this pref, the cache still works for resources that have expiration time set up. It does not prevent all fingerprinting but significantly reduces the entropy you can put in the cache.
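A minimal sketch of the pref proposed above, assuming a simplified cache entry with an absolute expiry time. `CacheEntry`, `plan_request` and the `send_validators` flag are hypothetical names for illustration, not Gecko's cache API or an actual pref. Fresh entries are still served without a round trip; only when an entry is stale does the flag decide whether the server-supplied ETag and Last-Modified values are echoed back.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CacheEntry:
    body: bytes
    expires_at: float                    # absolute expiry derived from Expires/max-age
    etag: Optional[str] = None           # server-chosen, may carry an identifier
    last_modified: Optional[str] = None  # server-chosen, may carry an identifier too


def plan_request(entry: Optional[CacheEntry], send_validators: bool,
                 now: Optional[float] = None) -> Tuple[str, Optional[Dict[str, str]]]:
    """Return ("cache", None) for a fresh entry, else ("network", headers).

    send_validators models the proposed pref (hypothetical name, default on).
    """
    now = time.time() if now is None else now
    if entry is not None and now < entry.expires_at:
        # Fresh by explicit expiration: served locally, nothing is sent.
        return "cache", None
    headers: Dict[str, str] = {}
    if entry is not None and send_validators:
        # Default behaviour: conditional request echoing server-chosen values.
        if entry.etag is not None:
            headers["If-None-Match"] = entry.etag
        if entry.last_modified is not None:
            headers["If-Modified-Since"] = entry.last_modified
    # Pref off (or nothing cached): plain, unconditional GET.
    return "network", headers


entry = CacheEntry(b"...", expires_at=100.0, etag='"abc123"')
print(plan_request(entry, send_validators=True, now=50.0))
# ('cache', None)
print(plan_request(entry, send_validators=True, now=150.0))
# ('network', {'If-None-Match': '"abc123"'})
print(plan_request(entry, send_validators=False, now=150.0))
# ('network', {})
```

With `send_validators=False`, content that lacks explicit expiration information is always downloaded in full, trading bandwidth for never returning server-chosen values.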
Blocks: 906448
Even the old evercookie (http://samy.pl/evercookie/), which made major headlines some time ago, uses ETags. (The tracking bug for evercookie is already linked here.)
Another demo site demonstrates how to use ETag in the exact same way as ordinary cookies:
http://lucb1e.com/rp/cookielesscookies/
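For readers unfamiliar with the technique, the following is a minimal, self-contained Python simulation (no real network traffic) of how a tracker can use the ETag exactly like a cookie: it mints a unique value on the first visit and reads it back from the If-None-Match header on later visits. The class names, the example URL and the ID format are illustrative assumptions, not code taken from the demos linked above. The toy browser revalidates on every fetch, which is what the tracker asks for with `Cache-Control: no-cache`.

```python
import secrets


class TrackingServer:
    """Hypothetical tracker that uses the ETag as a cookie substitute."""

    def __init__(self):
        self.visits = {}  # tracking ID (ETag value) -> requests seen

    def handle(self, request_headers):
        """Return (status, response_headers) for the tracking resource."""
        tag = request_headers.get("If-None-Match")
        if tag in self.visits:
            # Returning visitor: the browser echoed back the ID we issued.
            self.visits[tag] += 1
            return 304, {"ETag": tag}
        # New visitor: mint a unique ID and hand it out as the "validator".
        tag = '"' + secrets.token_hex(8) + '"'
        self.visits[tag] = 1
        return 200, {"ETag": tag, "Cache-Control": "no-cache"}


class Browser:
    """Toy cache that stores ETags and always revalidates with If-None-Match."""

    def __init__(self):
        self.cache = {}  # URL -> stored ETag

    def fetch(self, url, server):
        request_headers = {}
        if url in self.cache:
            request_headers["If-None-Match"] = self.cache[url]
        status, response_headers = server.handle(request_headers)
        if "ETag" in response_headers:
            self.cache[url] = response_headers["ETag"]
        return status

    def clear_cache(self):
        self.cache.clear()


server = TrackingServer()
browser = Browser()
url = "https://tracker.example/pixel.gif"
print(browser.fetch(url, server))  # 200: first visit, new ID issued
print(browser.fetch(url, server))  # 304: ID echoed back, visits linked
browser.clear_cache()
print(browser.fetch(url, server))  # 200: stored ETag gone, fresh ID issued
print(len(server.visits))          # 2: the tracker now sees two "visitors"
```

No cookie or script is involved; the identifier survives for as long as the cache entry does, and clearing the cache breaks the link.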
This problem is now making the news [3].

ETags have been used in recent years by major websites, e.g. Hulu [1], to track users. Commercial ad networks and trackers [2], as well as other websites, already use them too.
[1] http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1898390
[2] http://pastebin.com/FhUYuRsb
[3] http://www.heise.de/newsticker/meldung/User-Tracking-im-Web-Forscher-warnt-vor-heimtueckischer-Tracking-Technik-2048507.html
http://www.heise.de/newsticker/meldung/Websites-hebeln-Anti-Cookie-Massnahmen-aus-1288914.html

As long as we have bugs like this, we can't offer features like "Private Window" or cookie prefs.
Unlike cookies, the browser sends ETags despite all privacy settings.

This isn't merely theoretical anymore, and this must be stopped now.
Severity: enhancement → major
Comment on attachment 612681 [details] [diff] [review]
Proposed patch to fix this (validation headers pref)

Thanks for the patch, Camilo Viecco.

However, I'm not sure that this is the right approach. If I read this correctly, this will disable the server trip entirely, meaning we will use the cached version of the document. This may be the right thing to do in some cases, but maybe not all. It would make the browser faster, and help privacy, but we might get stale documents. Whether that happens depends on other cache factors, which I don't know.

The alternative would be to make the server check, but not send the ETag (or not store it, or both), so that we always fetch the document once we decide to make the server roundtrip. This would increase traffic, but would not increase latency all that much. Correctness would be the same.

Both of these solutions would be all or nothing and completely remove ETag handling. This is fine for people who are aware of the ETag issue, but I'm hoping for something we can enable for all users by default.

Can somebody think of a good way to tell apart real cache ETags from tracking ETags? Something that is implementable?
I wrote:
> Can somebody think of a good way to tell apart real cache ETags from tracking ETags?

Whatever the solution is, it cannot be a heuristic. Any heuristic would be worked around in no time. The fact that ETags are used at all shows how much effort goes into working around the browser, so we need something that *cannot* be worked around, but that can at the same time be enabled for all Firefox users. Ideas?

comment 5:
> Given that there are plenty of other ways to do tracking with cached resources,
> it doesn't seem productive to do this unless we have something that will address other
> such problems as well.

Yes, we need to address them one after the other. But we can't ignore the problem, the implications are too big.
(In reply to Ben Bucksch (:BenB) from comment #10)
> Comment on attachment 612681 [details] [diff] [review]
> Proposed patch to fix this (validation headers pref)
> 
> Thanks for the patch, Camilo Viecco.
> 
> However, I'm not sure that this is the right approach. If I read this
> correctly, this will disable the server trip entirely, meaning we will use
> the cached version of the document. This may be the right thing to do in
> some cases, but maybe not all. It would make the browser faster, and help
> privacy, but we might get stale documents. Whether that happens depends on
> other cache factors, which I don't know.

Ben, the way this patch works is that if Gecko thinks it will need to make a request
for the document (the cached copy is considered stale), it will NOT include the validation
headers (If-None-Match with the ETag, or If-Modified-Since). Thus it will make the browser slower for content that
does NOT use expiration information for the cache. Stale content is never used.

IE does what you describe below. 

> 
> The alternative would be to make the server check, but not send the ETag (or
> not store it, or both), so that we always fetch the document once we decide
> to make the server roundtrip. This would increase traffic, but would not
> increase latency all that much. Correctness would be the same.
> 
> Both of these solutions would be all or nothing and completely remove ETag
> handling. This is fine for people who are aware of the ETag issue, but I'm
> hoping for something we can enable for all users by default.
> 
> Can somebody think of a good way to tell apart real cache ETags from
> tracking ETags? Something that is implementable?
Keywords: privacy
Whiteboard: [necko-backlog]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
I just want to point out that clearing the cache clears the ETags, contrary to some articles (e.g. https://www.ghacks.net/2017/12/09/a-solution-to-etag-tracking-in-firefox/) which point to the "Cookieless cookies" PoC and claim that Firefox keeps the ETags even when the cache is cleared.

The "Cookieless cookies" PoC actually cheats: it identifies visitors by IP address rather than by ETag anyway, as demonstrated in the following comments:
https://www.ghacks.net/2017/12/09/a-solution-to-etag-tracking-in-firefox/#comment-4305249
https://www.ghacks.net/2017/12/09/a-solution-to-etag-tracking-in-firefox/#comment-4306403
Duplicate of this bug: 1472119

In the latest nightlies, a lot of work has been done to let users prevent tracking through a new privacy settings page.
Preferences have been added to about:config to block third-party cookies, prevent fingerprinting, and control what is sent in the Referer header. Maybe it is time to revisit this issue? It sounds like a logical next step in preventing tracking.

Here are two separate ideas how to handle this problem:

  1. Ignore the ETag and store the HTTP response's retrieval timestamp instead. Then, for subsequent requests, send If-Modified-Since with that stored timestamp. While it's not semantically identical to ETag, it seems close enough for the common real-life use cases.
  2. Replace a relevant GET request (one that would carry the ETag) with a HEAD request (without the ETag), then locally compare the received ETag with the stored one to decide whether to issue a subsequent GET (again without the ETag) to retrieve the requested resource, or to use the cached copy instead.
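The two ideas above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not how Necko's cache is structured: the network is abstracted as a hypothetical `send(method, url, headers)` callable returning `(status, headers, body)`, and both cache classes are invented for this example. In the first, the only validator sent is a date generated by the client itself; in the second, the stored ETag is compared locally and never leaves the browser.

```python
from email.utils import formatdate
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport: (method, url, request_headers) -> (status, response_headers, body)
Transport = Callable[[str, str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


class TimestampCache:
    """Idea 1: drop the ETag, revalidate with the client's own retrieval time."""

    def __init__(self, send: Transport):
        self.send = send
        self.entries: Dict[str, Tuple[str, bytes]] = {}  # url -> (HTTP date, body)

    def get(self, url: str, now: float) -> bytes:
        headers: Dict[str, str] = {}
        if url in self.entries:
            # A client-generated date: no server-chosen token is echoed back.
            headers["If-Modified-Since"] = self.entries[url][0]
        status, _resp_headers, body = self.send("GET", url, headers)
        if status == 304 and url in self.entries:
            return self.entries[url][1]
        # ETag and Last-Modified from the response are deliberately not stored.
        self.entries[url] = (formatdate(now, usegmt=True), body)
        return body


class HeadCompareCache:
    """Idea 2: validator-free HEAD, then compare ETags locally."""

    def __init__(self, send: Transport):
        self.send = send
        self.entries: Dict[str, Tuple[Optional[str], bytes]] = {}  # url -> (ETag, body)

    def get(self, url: str) -> bytes:
        if url in self.entries:
            stored_etag, body = self.entries[url]
            _status, resp_headers, _ = self.send("HEAD", url, {})
            if stored_etag is not None and resp_headers.get("ETag") == stored_etag:
                return body  # unchanged: serve the cached copy
        # Unconditional GET; the stored ETag never leaves the browser.
        _status, resp_headers, body = self.send("GET", url, {})
        self.entries[url] = (resp_headers.get("ETag"), body)
        return body
```

Idea 1 depends on the server evaluating If-Modified-Since against its own modification time, and idea 2 costs an extra round trip whenever the resource has changed.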

Also, it might be useful to add a new switch on the privacy settings page, giving the user a choice between one of the above options and the unmodified cache behavior (i.e. sending the ETag), with a brief warning about the risk of cookieless/ETag tracking.

Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1590107

ehsan: did you mean to mark this bug as depending on bug 1590107 instead of being a duplicate of it?

(In reply to Marc Bejarano from comment #20)

> ehsan: did you mean to mark this bug as depending on bug 1590107 instead of being a duplicate of it?

I did mean to resolve it as a duplicate: not in the sense that the bug report is a duplicate report (which it obviously isn't), but in the sense that the cross-site tracking vector that this bug discusses will be fixed once the HTTP cache partitioning solution being tracked in bug 1590107 is deployed. In that sense the cross-site tracking aspect of this bug will be "fixed by" that bug. We usually use the duplicate status to capture this in Bugzilla, but that's just a convention, and one could argue a dependency could capture the same meaning. I'm generally ambivalent about the distinction; my main goal here was to communicate to you and others on this very old bug that finally something is happening in this space. :-)

Of course, the same-site tracking vector that this bug discusses won't be fixed by bug 1590107, as it will still be possible for a third-party subresource to store a unique identifier in the ETag of a cached response and use it to track the user's browsing activity across the pages of a single top-level origin. Same-site tracking isn't currently part of Mozilla's anti-tracking policy, and the work to make it impossible will be much broader than just addressing ETag-based tracking.
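To illustrate the distinction drawn in the two comments above, here is a toy Python model of a partitioned cache, assuming entries are keyed by (top-level site, resource URL). `PartitionedCache` is a hypothetical class, and its key is a simplification rather than Firefox's actual cache-key format. A tracker embedded on two different sites gets two independent entries, so its ETag cannot link them; on a single site the stored ETag is still replayed.

```python
from typing import Dict, Optional, Tuple


class PartitionedCache:
    """Toy cache keyed by (top-level site, resource URL).

    Illustrates the idea of cache partitioning only; it is not
    Firefox's actual cache-key format.
    """

    def __init__(self):
        self.etags: Dict[Tuple[str, str], str] = {}

    def store(self, top_level_site: str, url: str, etag: str) -> None:
        self.etags[(top_level_site, url)] = etag

    def stored_etag(self, top_level_site: str, url: str) -> Optional[str]:
        return self.etags.get((top_level_site, url))


cache = PartitionedCache()
tracker = "https://tracker.example/t.gif"
cache.store("news.example", tracker, '"id-1"')

# Same tracker embedded on another site: separate entry, so no shared ID.
print(cache.stored_etag("shop.example", tracker))  # prints None
# Same top-level site again: the ID is still replayed (same-site tracking).
print(cache.stored_etag("news.example", tracker))  # prints "id-1"
```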

But it seems to me that there has been a lot of effort to reduce even same-site tracking, and I don’t see why this shouldn’t be a goal, or why we shouldn’t keep this bug open. Indeed, this bug depends on the new one for cross-site tracking, but it needs more to be fully fixed.

(In reply to :ehsan akhgari from comment #21)

> I did mean to resolve it as a duplicate: not in the sense that the bug report is a duplicate report (which it obviously isn't), but in the sense that the cross-site tracking vector that this bug discusses will be fixed once the HTTP cache partitioning solution being tracked in bug 1590107 is deployed.

Okay. Let's mark it as what it is, then.

Status: RESOLVED → REOPENED
Depends on: 1590107
Resolution: DUPLICATE → ---
Blocks: 1590107
No longer depends on: 1590107