Closed Bug 368255 Opened 17 years ago Closed 11 years ago

shouldn't send Google's cookie with SafeBrowsing API requests (sandbox it instead)

Categories

(Toolkit :: Safe Browsing, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 27
Tracking Status
blocking2.0 --- -
status2.0 --- wanted

People

(Reporter: bartml, Assigned: ckerschb)

References

()

Details

(Keywords: privacy)

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (compatible; Konqueror/3.5) KHTML/3.5.5 (like Gecko)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; pl; rv:1.8.1) Gecko/20061010 Firefox/2.0  (and later)

FF2 introduced anti-phishing protection feature. "Basic mode" is enabled by default. Browser connects regularly (approximately each half hour) to Google's server (there is even no GUI for choice here, see bug 342188). Google's cookie is sent with each update request. This has obvious privacy implications (e.g. Google is able to track usage of the browser by particular user, like when (which hours), how long and where (IP) user uses the browser).

Log given in URL was created using 2.0, but situation is identical in regard of cookie and antiphishing in 2.0.0.1 and 2.0.0.2 (i.e.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2pre) Gecko/20070125 BonEcho/2.0.0.2pre
) and (IIRC) Gran Paradiso.

Google's cookie is set automatically in first run, because default homepage in official builds is Google (well, mozilla.com site, but thanks to magic of redirections browser visits google.com domain anyway).


Reproducible: Always

Steps to Reproduce:
1. Download browser.
2. Run it.
3. PROFIT! (ok, this part is for Google Inc. only)
Actual Results:  
Google's cookie is set in first run and then sent with each request for update (each half hour approx.).

Expected Results:  
Cookie is not sent.
So you're actually reporting three problems, right?

1) You don't like the fact that Google hosts Firefox's start page
2) You don't like the fact that Safebrowsing requires downloading an updated list each time
3) You don't like Google sending cookies to you

You can change all of these through preferences. Change the home page, turn off safe browsing, block cookies from Google.

BTW. puting jokes inside 'steps to reproduce' doesn't really help with fixing the bugs.
Anyone to confirm this bug?
BartZilla's "steps to reproduce" haven't really help here.
This is intend behavior of the feature. WONTFIX for me. Furthermore there is a UI to stop this "privacy invasion". As stated in comment #1.
Keywords: privacy
Flags: blocking-firefox3?
We looked into sandboxing the connection before, and it was pretty complex as I recall. Not blocking as per the options in comment 1, and as per the privacy policy accompanying the SafeBrowsing service which explicitly prohibits Google from using the cookie information on SafeBrowsing API requests:

(cf: "Google will receive standard log information, including a cookie, as part of this process. Google will not associate the information that Phishing Protection logs with other personal information about you.")
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: blocking-firefox3? → blocking-firefox3-
Summary: sending Google's cookie with each request for update in default antiphishing mode → shouldn't send Google's cookie with SafeBrowsing API requests (sandbox it instead)
Re comment #1:
This bug is about sending cookie with each request for update, which is unnecessary and has privacy implications. And my "Steps to Reproduce" is not a joke -- nothing more is needed to reproduce this bug.

Re comment #2:
I've prepared a little HOWTO about this bug. See it here: http://bb.homelinux.org/en/firefox/howtobug368255.html
Feel free to ask if something is still unclear. Oh, and you may be also interested in this little project: http://bb.homelinux.org/firefox/sb/

Re comment #3:
Oh, sure, I know that this is intended behaviour. It was clear for me after reading comments in bug 345146. Nevertheless, sending cookie is not necessary and has serious privacy implications when you realize who is (default and only, despite GUI that suggests otherwise) "data provider" for antiphishing (Google).

Re comment #4:
Nice excuses, Mike, but actually it was not _that_ complex. I've created a patch that fixes this issue, against Firefox 2.0.0.11 sources (actually, it patches cleanly all versions of 2.0.0.x, but I tested it only with 2.0.0.11 and some earlier version...). After applying this patch Firefox doesn't send back cookie with requests for update in "basic antiphishing protection" mode (which is enabled by default), but it still sends cookie in "advanced antiphishing mode" and in "normal circumstances" like visting site in .google.com domain.

(Note that I've changed my e-mail in bugzilla preferences. It seems that Google took over mailboxes in "gazeta.pl" domain. (See eg. http://poczta.gazeta.pl/odnowa/ , in Polish) It is harder and harder with each day to keep private data away from Google's hands...)
(In reply to comment #5)
> Created an attachment (id=299788) [details]
> don't send cookie in "basic antiphishing" mode
...
> Re comment #4:
> Nice excuses, Mike, but actually it was not _that_ complex. I've created a
> patch that fixes this issue, against Firefox 2.0.0.11 sources (actually, it
> patches cleanly all versions of 2.0.0.x, but I tested it only with 2.0.0.11 and
> some earlier version...).

Bart, would you mind making this patch against CVS trunk (HEAD)? That would help it to move forward to review stage.
I don't know if this will have an impact on your patch when you update it to be against the trunk, but bear in mind that the remote lookup code (that sends each URL to Google for checking) has been disabled in nightlies, and by the time Fx 3 ships, the code (and the prefpane option) should be gone (as long as bug 388652 gets fixed, that is :)
Not sending a cookie would break the request back off code on the server side.  That is, if too many client requests come in for table updates, the server will tell some of them to stop asking for updates.

It's better to have most users get some data rather than no users getting any data because the servers are being DOSed.
Why does that require cookies?  Wouldn't responding to a random percentage of requests with "please back off" work just as well?
(In reply to comment #9)
> Why does that require cookies?  Wouldn't responding to a random percentage of
> requests with "please back off" work just as well?

Not easily.  An HTTP 5xx will trigger the client to retry after a minute, but if there are three 5xx responses in a row, the client will back off.  If the server wanted to tell a percentage to back off, it wouldn't work since the chance of the same client getting three 5xx in a row would be rare (this wound actually increase the number of client requests because of the retry).
Sounds like a dedicated "back off" signal (perhaps a specific 5xx status) should be added to the protocol, so it doesn't take three requests for a client to figure out that the server wants it to back off.  That would help even if we do keep cookies for some other reason.
Tony, if Google is not going to accept any patches fixing this issue, then how about changing Status of this bug to WONTFIX? At least it will be clear that you are not interested in any patches related with this.
Attachment #301207 - Flags: review?(tony)
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX
Comment on attachment 301207 [details] [diff] [review]
patch against trunk, as requested in comment #6

given that this breaks the backoff architecture, there's no way we can take this patch at this time.  It might be worth doing things differently, though I am not worried that google is misusing this data, but that's not going to happen in time for Firefox 3.
Attachment #301207 - Flags: review?(tony) → review-
Sorry, but IMO this is just another lame excuse.
Changes to "backoff architecture" are already there. Namely, FF3 implements new so-called "safebrowsing" protocol (v.2, http://code.google.com/p/google-safe-browsing/wiki/Protocolv2Spec). With each request for update server (Google) tells client (Firefox) the number of seconds before the client should contact the server again. See http://code.google.com/p/google-safe-browsing/wiki/Protocolv2Spec#3.5.2._Response_Body, "n:" parameter. Support for this parameter is already implemented, see eg. http://mxr.mozilla.org/firefox/source/toolkit/components/url-classifier/src/nsUrlClassifierDBService.cpp#2487
Status: RESOLVED → REOPENED
Flags: blocking-firefox3- → blocking-firefox3?
Resolution: WONTFIX → ---
While we could tell a particular connection to back off with the new protocol, there's other good reasons to leave the cookie in.

1. We can be more fair in how we tell clients to back off. E.g. if you've already been backed off before, we should probably back off someone else before we tell you to back off even longer.
2. Perhaps more importantly, we can use it to improve service quality. There's a lot of important questions that this helps us answer, e.g. "How long is it taking a client to get a full copy of the list", "How out of date are clients given our current strategy for throttling and rate at which we send the data"? "On average, how long do clients wait between connections and given that do we need to make changes to the way we serve data" etc.

When the client contacts the Google servers for an update, you're basically saying "Here's the version of the list I have, give me more data". A cookie is sent along with that, and that helps us understand aggregate usage patterns. By removing that cookie, you don't seem to gain much privacy (after all, you're not sending private data, you're just basically saying "I have the following version of the list, give me more data"), and yet it causes many complications for determining whether we're doing a good job of keeping clients up to date. It seems like a poor trade-off, especially given how much we've already done and continue to do to ensure user privacy. I think everyone, both at Google and on the Mozilla team, want to take every feasible step to ensure that users' privacy is maintained and respected, this proposal just doesn't seem to be that feasible in terms of the trade-off of lost data quality utility vs. practically zero privacy gain.
I don't think the problem here is that safebrowsing sends _a_ cookie, but instead that it sends the main google.com cookie (that ties the user to iGoogle, GMail, etc.). If safebrowsing had its own separate cookie and did not send any other google.com cookies along with it (basically, it was sandboxed), this wouldn't be a problem and nobody would be complaining about privacy issues.

Cookies are a normal part of the web, sure, but sending a cookie that could possibly be linked to an actual user is a problem.
Is it possible to use a different cookie? Probably, but really what would this accomplish? First of all, the privacy policies in place explicitly state what we can and can not do with the logging data, and what you are talking about is prohibited. But beyond that, what data are you really leaking? The only information the safebrowsing update requests leak is the fact that a client is online - not active, but rather connected, which for many computers is always. E.g. my computer has firefox always open, so yes, in theory if Google turned evil, violated privacy policies, betrayed user trust, and doomed its future as a company (which is not something we're going to do), someone could from the log data figure out that user ifette leaves Firefox open all the time. I fail to see how this is a privacy concern. I would be much more worried about my ISP logging every single HTTP request I make than I would be worried about Google knowing that I had a habit of leaving Firefox running.
What about gethash requests that could theoretically be used to trace what URLs a user is visiting? http://code.google.com/p/google-safe-browsing/wiki/Protocolv2Spec#3.7._HTTP_Request_for_Full-Length_Hashes

Sure, there are safeguards against such a thing (bug 419117), but this is just an example of data that Google could possibly get besides how long a computer has been turned on.
This will not block the final release of Firefox 3.
Flags: blocking-firefox3? → blocking-firefox3-
Can we use the new LOAD_ANONYMOUS flag that bug 389508 added (as per bug 313414, comment #16) for this?
(In reply to comment #20)
> Can we use the new LOAD_ANONYMOUS flag that bug 389508 added (as per bug
> 313414, comment #16) for this?

That would be no different than the existing patch in this bug (apart from it being simpler). It would prevent us from sending any cookie headers, which means that we lose the benefits from comment 15.

As far as I can tell, the only acceptable fix to this bug as summarized involves implementing a way to keep the cookies set/sent for the safebrowsing requests separate from the rest of the Google cookies despite the hosts being the same. This is possible using the existing http-on-modify-request/http-on-examine-response techniques, but there are of course a costs to doing that work (code complexity, maintenance burden, risk of breaking something). Whether that cost is outweighed by the benefits isn't immediately obvious to me, especially given the fact that if Google really was evil, they could use IP address correlation to do similar tracking (granted, it would be less reliable given NAT, and perhaps less efficient).
Though I'm not particularly worried about Google abusing this information, does sending a cookie significantly improve the QoS they're able to provide? As stated in comment #15, we're already sending the version of the list we have, so as far as I can tell all Google has to do is prioritize older versions of the list. If an inordinate amount of users who haven't used FF in a while request an update to their list, certainly other users may be told to back off unfairly often, but with the advantage that the users who need it most get the updated list immediately. (it wouldn't surprise me if they're already doing something like this, I just fail to see the significance of the cookie to them)
Version: unspecified → Trunk
Doug: how are we doing this (not sending the cookie) for Geolocation? Can you see a clear path to us doing the same thing here?
Flags: blocking-firefox3.6?
Sorry, wrong dougt - :dougt, see comment 23?
the geolocation protocol doesn't use http cookies.  The access token is sent in the data structure that we send to google.  I think their protocol was designed such that (a) you didn't need to worry about this problem, and (b) some device platforms do not have access to change the http cookies.

There was some discussion of creating a "chrome-only" cookie.  Such a cookie would allow firefox to talk to a service without passing cookies set by content.
Does the SafeBrowsing protocol require cookies?
According to the comments in this bug, it does not. Comment 8 - comment 11 indicate that providing a cookie allows the server to be more intelligent about backing off requests when the servers are under load. Comment 15 - comment 17 indicate that Google can obtain significant quality-of-service data from being able to associate clients with requests.
It would be good to know where that quality of service data is, so that the users who are contributing to it can also learn from it.
Sounds like all of these issues could be equally served with a cookie that is not the same as the user's Google cookie. The issue here is connecting SafeBrowsing activity with the personally identifiable information contained in the Google cookie (ie: the user's Google account).

Ian/Tony: would you agree?
A separate sandboxed cookie seems like it should work fine.
(In reply to comment #30)
> A separate sandboxed cookie seems like it should work fine.

BartZ - feel like trying your hand at this patch, while we get an answer to shaver's question in comment 28?
(In reply to comment #31)
> (In reply to comment #30)
> > A separate sandboxed cookie seems like it should work fine.
> 
> BartZ - feel like trying your hand at this patch, while we get an answer to
> shaver's question in comment 28?

Cookies are not necessary at all. Sending them is an unacceptable privacy violation (as is the whole Google's so-called "safebrowsing" enabled by default). For this reason (among others [1]) I am not going to write any patches that "sandbox" them (ie. that keeps separate set of Google cookies and sends them in "sb"-related requests) [0].

The only acceptable outcome for me is to disable so-called "safebrowsing" completely. The issue with cookies is actually only the tip of the privacy violation iceberg. (Apart from cookies, browser also sends another unique identifier with "sb"-related requests, namely wrkey; it is saved in urlclassifierkey3.txt in profile directory; it "survives" unchanged also when entering and leaving Private Browsing mode; visit URL [2] for demo. So explanations from Google in this bug that they really need cookie - only for QoS purposes - are, well, not very convincing, since they could use wrkey for this purpose.)

My initial report was written long time ago. Since that time the underlying protocol has changed significantly (in FF 3 / 3.5 there is actually completely another protocol implemented than in FF 2). In order to learn more about it, in more details, I implemented this protocol on a server side [2] - despite the prohibitive language from Google on the site with the spec. (To be clear: my project doesn't use/connect to Google's servers at all since it implements server side functionality itself. After reconfiguring the client side - ie. Firefox - the communication is between my server and a client, there are no connections with Google at all.)

I hope I will release source code of this project soon [3] and write extensive documentation and explanations. I'm writing this reply mainly to address Johnathan's request.

For now, let me summarize the issues with Google's so-called "safebrowsing" as is implemented and enabled _by default_ in Firefox (SINCE FF 3.0!):
- it is possible for a "safebrowsing" server to reliably differentiate between different users (browser's profiles, to be exact); cookie plays a big role in this (and is related with personal information if user uses any Google services), but - as I said above - there is also the issue with wrkey
- server may feed the clients (or chosen client(s)) with data that allows server-side to reliably "guess" (with very high probability) the visited URL (well, at least in some well-defined cases - but by "some cases" I mean "in quite many cases"); the list of URLs to monitor must be selected (it is not possible for a "safebrowsing" server to get the whole user's history [4]); the list may be updated rapidly, though
- "guessed" URLs may be easy connencted to the same user; also, server is notified immediately when a user visits given URL (ie. server gets the exact time of a visit)
- user may not be notified in any way when (if) sending of this notification happens
- it all works also when "Private Browsing" is enabled (cookies are "sandboxed" in this case, though - but that's the general feature of the Private Browsing).

You may check yourself these issues after reconfiguring Firefox as described in URL [2]. (I realize that this whole thing is not very user-friendly, so I guess this advice is for the people who know what they are doing. Read all info on this page and ask if something is unclear.)

In my opinion it is enough to disable Google's so-called "safebrowsing" - if Mozilla really cares about users' privacy, as it officially claims.


Footnotes:
[0] Please note that an initial title of this bug was "sending Google's cookie with each request for update in default antiphishing mode" and from the beginning I was thinking only about preventing sending cookies completely, not "sandboxing" them. Beltzner changed title of this bug to the current text in November 2007, see "History" link near the top of this page.

[1] Another reason is this: how are you going to manage this separate set of cookies? For example: will these "sandboxed" cookies be present in cookie manager? How are you going to differentiate them (from user point of view) between "normal" Google cookies and "sandboxed" set of cookies? How will these "sandboxed" cookies behave when dealing with Private Browsing / clearing private data / clearing cookies / clearing only cookies from .google.com / blocking only cookies from .google.com? Etc., etc... (BTW - the same set of issues already exists with Google's "geolocation" token which is, effectively, a cookie. See eg. bug 491759.)

[2] http://bb.homelinux.org/firefox/sb2/  (note that this is a different URL - and different project, since it implements completely another protocol - than mentioned in my comment #5)

[3] Probably under GNU Affero General Public License, version 3 or later (as published by the FSF).

[4] Unless all of the visited URLs happen to be monitored by the server, of course -- but such case is unlikely.
Trying to summarise your points, Bartłomiej:

*Cookies are completely unneeded.
*The MAC allows the server to track you
http://code.google.com/p/google-safe-browsing/wiki/Protocolv2Spec#4._MAC
*Since "Before blocking the site, Firefox will request a double-check to ensure that the reported site has not been removed from the list since your last update." the server can trace visits to any url it wants to.


Proposed actions to remove those threats:
*Remove cookies from safebrowsing requests.
*Add a maximum expiry time for the key, to ensure it gets rotated often enough.
*Ability to request disable the use of MAC in the client.
*Safebrowsing requests should be disabled on Private Browsing mode, using only the data already present.
*There should be a preference for what to do when an url is blocked: Ask the provider for confirmation, Block without asking, Update the safebrowsing db, Offer a dialog where the use can manually ask the provider (could be integrated in blockedSite.xhml)


I'd also like to see Firefox able to work with an already downloaded copy, but I have opened a different bug (Bug 511081) for that.
I sense a lot of anti-google sentiments, and I doubt that anything I say will change any of that. However, there are some factual errors that I would like to clear up here:

1. People have asked what the wrkey parameter is. We are not using SSL (at the time we published the protocol, using SSL would have been prohibitively expensive for us). As such, we establish a shared secret via SSL once (the "wrapped key", or wrkey), and then use that to authenticate messages (updates etc) so that a MITM can't inject false data into your SafeBrowsing database.

2. Our infrastructure for DoS prevention uses cookies, among other things. Trying to re-write shared google-wide infrastructure to use "wrkey" instead of a cookie is not a pleasing thought.

3. The data in the local database is not enough to determine whether or not to block a site. "Block without asking, ask for confirmation, update the db" is not a meaningful choice. For each URL that is on the list, we take a SHA-256 hash (e.g. a 256-bit hash). To conserve bandwidth (and also keep the database at a reasonable size), we send down only the first 32-bits of the hash to clients. When you see a match in the local database, the question is not "Is this site still listed or do I have old data?", the question is "Is this the actual URL that is listed, or is it just a hash collision on 32-bits?"

4. We retain the original logs for only two weeks. We are not trying to spy on users, build a profile of all the URLs that you have visted, etc. We have determined that two weeks of logs gives us enough data to monitor the quality of our service and get the metrics we need - after that two weeks, the logs are dropped, and all we retain are statistics that we've aggregated, e.g. how many users did we have on day X, how many users were how out of date on day X, how many QPS did we have at peak, how much bandwidth, etc. Nothing about individual users. 

5. We're not getting every URL you visit. We're only getting URLs that match the (partial) data in the local database, so we could not build up a complete profile of all that you've viewed anyways.

If you don't trust google, I doubt anything I've said is going to convince you. I would love to invite you over to my desk and show you that the logs are in fact dropped after two weeks, that we're really not spying on you, etc, however that is not feasible for me to do. So, if you don't trust what I'm telling you, you're welcome to turn the feature off, or to use an alternate provider (although I don't know of anyone else giving away comparable data for free to whomever wants to use it.) If we ever were to violate user's trust or privacy, I would fully expect Mike or Jonathan to disable SafeBrowsing - but believe it or not, we're not trying to spy on you, rather we're trying to keep users (all users, not just FF users) safe from phishing and malware. In order to do that, we need to make sure that we are providing a quality, reliable service. Part of that is making sure that our infrastructure can withstand denial of service attacks, part of this is monitoring how the service itself works (are users getting updated frequently enough? what's our hash collision rate? how long does it take for a new user to get all the data? etc.) For these purposes, having cookie data is very useful. We're not trying to track you or tie this information to your Google account etc, and so a sandboxed cookie works fine, but having no cookie compromises our abilities in this area and so is not a good solution.
Depends on: 511933
Dan: I don't think that this is in scope for 1.9.2, but please renominate if you think we can do it. If we don't get this for 1.9.2, we definitely want it for 1.9.3
blocking2.0: --- → ?
Flags: wanted-firefox3.6+
Flags: blocking-firefox3.6?
Flags: blocking-firefox3.6-
If the experiment <A HREF=http://bb.homelinux.org/firefox/sb2/"> here </A> presented by BartZilla in Comment 5 were successful, then perhaps a simple way to disable safebrowsing would be to set the following values in about:config to null:

browser.safebrowsing.malware.reportURL
browser.safebrowsing.provider.0.gethashURL
browser.safebrowsing.provider.0.keyURL
browser.safebrowsing.provider.0.lookupURL
browser.safebrowsing.provider.0.reportErrorURL
browser.safebrowsing.provider.0.reportGenericURL
browser.safebrowsing.provider.0.reportMalwareErrorURL
browser.safebrowsing.provider.0.reportMalwareURL
browser.safebrowsing.provider.0.reportPhishURL
browser.safebrowsing.provider.0.reportURL
browser.safebrowsing.provider.0.updateURL
browser.safebrowsing.warning.infoURL

I have done this without any stability problems so far (I will re-post if this changes). What do you guys think about this workaround?
If you want to disable safebrowsing, you don't have to go through all that trouble - there's a checkbox in the preferences window under "Security" ("Block reported attack site/web forgeries"). That isn't what this bug is about.
(In reply to comment #35)
> Dan: I don't think that this is in scope for 1.9.2, but please renominate if
> you think we can do it. If we don't get this for 1.9.2, we definitely want it
> for 1.9.3

I agree that we want it - not sure we can block on it. Marking wanted for now.
blocking2.0: ? → -
status2.0: --- → wanted
Mmm, OK. I just realized I never replied to Mike's comment. There are a couple ways we could do this for 1.9.3. I'd probably go with a double-keying approach (surprise!), which can be independent of the third-party work. We can make it a session-only thing to avoid changes to the db schema (and thus more risk). This would mean the safebrowsing cookies get reset on browser close. Ian, would that be a big deal?

Also, Ian, if things have changed wrt using the wrkey instead of cookies, please let us know. Disabling cookies for safebrowsing transactions would obviously be a simpler solution on our side.
We should isolate this cookie from the rest of the Firefox cookie store so it's only sent with Safe Browsing traffic.  I *think* this will provide the DoS detection Google wants without tying this cookie to users' regular browsing traffic.  Ian, please let me know if I've got this wrong.

There's other places where this isolation is useful too, so we should put in some plumbing to enable multiple cookie jars, which can be used for the PREF cookie isolation here:

https://wiki.mozilla.org/Privacy/Features/Multiple_Cookie_Jars

On another note, I think if a user disables third party cookies the cookie in question won't be sent with Safe Browsing traffic... but I haven't confirmed this is true.
Sid,

As I stated in comments 30 and 34, I am fine with isolating it in a separate cookie jar.
(In reply to Sid Stamm [:geekboy] from comment #40)
> We should isolate this cookie from the rest of the Firefox cookie store so
> it's only sent with Safe Browsing traffic.  I *think* this will provide the
> DoS detection Google wants without tying this cookie to users' regular
> browsing traffic.  Ian, please let me know if I've got this wrong.
> 
> There's other places where this isolation is useful too, so we should put in
> some plumbing to enable multiple cookie jars, which can be used for the PREF
> cookie isolation here:
> 
> https://wiki.mozilla.org/Privacy/Features/Multiple_Cookie_Jars
> 
> On another note, I think if a user disables third party cookies the cookie
> in question won't be sent with Safe Browsing traffic... but I haven't
> confirmed this is true.


Josh is adding things to the cookie service which may help with this.
I think this is a grossly underestimated privacy disaster, and it isn't because I don't trust Google. I would be extremely surprised if Google were foolish enough to make the business decision to abuse this data.

The problem is, the existence of this functionality in Firefox means that 3rd parties who can control Google through legal (or other) means have access to a list of every IP address a user connects from! (And, of course, such a list can be used to establish a user's presence at a given time and place to a degree of certainty which is considered useful for many applications.)

Google could be served with a legal demand such as an NSL requiring them to record this data for much longer (and to not tell anyone), or to hand it over in real time to someone else who retains is. There is no way for Mozilla to be sure this isn't happening. I doubt Ian Fette or anyone else at Google can say they are absolutely certain it isn't happening.

If Mozilla is going to keep SafeBrowsing enabled by default, they should refuse to send any unique identifiers with these automatic connections!
This bug is tracking the implementation of a separate cookie jar for safebrowsing. Discussion of the underlying principles should be moved to mozilla.privacy.
Assignee: nobody → sarentz
I will be doing some work on this bug in the coming week. Working together with Sid and Jonas to move this forward.
Status: REOPENED → ASSIGNED
... in the coming weekS ... :-)
I am not finding much time to actually work on this unfortunately.
Assignee: sarentz → nobody
We'll probably need to hack up the connection here to dump cookies into a "safebrowsing" jar:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierStreamUpdater.cpp#96

Jason, I see you were involved with implementing jars for apps.  Is there an easy way to say to necko, "hey, when you make this connection, put cookies in jar x"?
Flags: needinfo?(jduell.mcbugs)
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #42)
> Josh is adding things to the cookie service which may help with this.

Ehsan, are you referring to bug 722850?  I'm trying to identify how to bind a channel/connection to a specific "jar".
Flags: needinfo?(ehsan)
(In reply to comment #49)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #42)
> > Josh is adding things to the cookie service which may help with this.
> 
> Ehsan, are you referring to bug 722850?  I'm trying to identify how to bind a
> channel/connection to a specific "jar".

Yes, although that's not exactly an implementation of general purpose cookie jars.
Currently cookie jars are chosen based on app id (derived from the loadgroup/notification callbacks on the channel, from nsILoadContext) and private browsing status. I suppose we could synthesize a load context which has an app id, since nothing else in desktop/mobile firefox makes use of that to my knowledge.
Thanks Josh.  Eventually we want to have general purpose cookie jars; off the top of your head can you think of an easy way to create an arbitrary load context that causes all cookies and localstorage to get dumped into a given jar?  I'm thinking maybe we could make a note in the principal that for non-apps, there may be a "cookie jar" involved...
Sid: it's trivial to create nsILoadContext-looking object from JS:

  http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/head_channels.js#205

Doing this from C++, I think you want to tweak the stub "LoadContext" class we created for e10s use and give it a new constructor that just takes the appId, etc members directly instead of from a SeralizedLoadContext  Have it default to setting mIsNotNull=true and all the other params (except AppID) to false/null.  And change the comment here to indicate that the class is going to be used for more than just e10s: 

  http://mxr.mozilla.org/mozilla-central/source/docshell/base/LoadContext.h#20

Just set the LoadContext as the callbacks for the channel, and it'll get whatever appID you pass in and use the appropriate cookie jar (there's no special logic for creating the jar: we just create a new one when we see a new appID).

The interesting question is what appId to pass in, and how to make sure it doesn't conflict with any B2G apps (I assume at some point we'll do safebrowsing requests in B2G if we don't already).  You definitely do *not* want to use NECKO_NO_APP_ID: that's the default namespace for desktop.  And you don't want NECKO_UNKNOWN_APP_ID, since we assert when we see it in many places.  

I think we need to add some appID constants for uses like this: the best place for them is probably here, i.e add nsIScriptSecurityManager:SAFEBROWSING_APP_ID

   http://mxr.mozilla.org/mozilla-central/source/caps/idl/nsIScriptSecurityManager.idl#228

And we may want to set aside a range of values just below UINT32_MAX and guarantee that B2G never tries to use those for apps.  I propose we keep values above UINT32_MAX - 100 as that range (unless someone thinks it needs to be bigger). 

(Note: necko currently duplicates the NO_APP and UNKNOWN_APP definitions here:

  http://mxr.mozilla.org/mozilla-central/source/netwerk/base/public/nsNetUtil.h#1308

but I don't think we need to add the SAFEBROWSING_APPID there, and we should probably just purge the dupes at some point).
Flags: needinfo?(jduell.mcbugs)
Justin

Looks like we'll need to reserve some range of AppIds for uses like Safebrowsing (see last half of comment 53).  So we need to make sure B2G never tries to assign that range of Ids to apps.  Do you know where that code lives?
Flags: needinfo?(justin.lebar+bug)
jlebar is asking me why we need a cookie jar for this, and I don't have a great answer.  I've scanned through the comments here, and it seems like the cookie jar would

1) not pass user's regular google.com cookies as part of the safebrowsing request. (But presumably if user does any google.com traffic shortly before/after it wouldn't be hard to correlate their IP with the safebrowsing request?)

This seems like at best a mild improvement.  Am I right in thinking that we'd get slightly better privacy if we also periodically deleted the cookie, so it doesn't provide a track of all the IP addresses (aka locations) of the user over time?  (Of course if correlating with google.com traffic provides the user's regular google.com identity, then it would easy to reconstruct the location too.  But at least this requires more effort).  It's not clear to me how often Google's safebrowsing DoS infrastructure could handle us deleting the cookie (honestly I'm confused as to why they really need the cookie in the 1st place).

It still seems like a pretty thin privacy win overall, but easy enough to do.

I wonder if cookies are considered 'communications metadata' (since they are headers, not HTTP body) by our friendly overlords at the NSA?  That's a scary thought.

Anyway, the place to change the B2G code to make sure it doesn't hand out the reserved range of ID's I'm proposing is here:

   http://mxr.mozilla.org/mozilla-central/source/dom/apps/src/Webapps.jsm#2097

This code is :fabrice's, so ask him for review.
Flags: needinfo?(justin.lebar+bug)
Flags: needinfo?(ehsan)
nsCookieService::RemoveCookiesForApp would be the easiest and best way to get purge safebrowsing cookies if we go down that route.

Also, as comment 32 points out, app's cookies don't show up in the cookie manager.  I'm not sure how to deal with that in general, but we ought to at least delete the cookie when the user asks to clear cookies (and/or recent history).  I assume the place to do that is in the Firefox code somewhere.
(In reply to Jason Duell (:jduell) from comment #55)
> jlebar is asking me why we need a cookie jar for this, and I don't have a
> great answer.

Goal = Isolate the safebrowsing traffic from my regular traffic.  If we don't, and someone blocks third party cookies (and never visits google.com) they get a google.com cookie.  This confuses them and also leads to interesting situations when blocking third party cookies from "non-visited" sites.

It's a background service.  I argue that we should separate it from (foreground) browsing.

>  I've scanned through the comments here, and it seems like the
> cookie jar would
> 
> 1) not pass user's regular google.com cookies as part of the safebrowsing
> request. (But presumably if user does any google.com traffic shortly
> before/after it wouldn't be hard to correlate their IP with the safebrowsing
> request?)

Correct.  This is most beneficial for folks who don't frequently visit Google.  Your auth cookies and others won't be sent to safebrowsing (just the solo cookie set by safebrowsing).

> This seems like at best a mild improvement.  Am I right in thinking that
> we'd get slightly better privacy if we also periodically deleted the cookie,
> so it doesn't provide a track of all the IP addresses (aka locations) of the
> user over time?  

Yes, we could do this too.  Google says they need the cookie to provide the service. Easiest thing for us is to sandbox the safebrowsing connections.  Second easiest is to periodically delete the cookie.  I think we can do both, but have to agree on the deletion period with Google.
Depends on: 897516
Since https://bugzilla.mozilla.org/show_bug.cgi?id=897516 has landed, I assume this one should be WONTFIX?
Sounds like FIXED to me?
You're right, looks like the idea of omitting the cookies completely for sb requests was abandoned.
(In reply to Adam Moore from comment #60)
> You're right, looks like the idea of omitting the cookies completely for sb
> requests was abandoned.

Not so much abandoned as it was ruled out since cookies are required to use the service.

Yeah, gavin, I agree this is fixed by bug 897516 (sb is "sandboxed" from the regular web state as requested in the bug summary).  We could periodically delete the cookie too (> 2 weeks apart) but I think that's a follow up bug if we want to do that.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago11 years ago
Resolution: --- → FIXED
(In reply to Sid Stamm [:geekboy or :sstamm] from comment #61)
> (In reply to Adam Moore from comment #60)
> > You're right, looks like the idea of omitting the cookies completely for sb
> > requests was abandoned.
> 
> Not so much abandoned as it was ruled out since cookies are required to use
> the service.
> 
> Yeah, gavin, I agree this is fixed by bug 897516 (sb is "sandboxed" from the
> regular web state as requested in the bug summary).  We could periodically
> delete the cookie too (> 2 weeks apart) but I think that's a follow up bug
> if we want to do that.

Please, no follow up bugs that are specific to expiring Google cookies :) There's bug 844623 for expiring third-party cookies and https://lh5.googleusercontent.com/X2GaHWRPj3zd3KzG1IUnWbTUaDAEbHvXaD6E5m8ZgdZaMu91GJ5tMxNiVTU7V4wTgAV1RBTcIPhU7lWaxr8A5NOYFRjRjkJHPriXvVXzIQ9tshIs7STz8E8FjQ (from http://monica-at-mozilla.blogspot.com/2013/10/cookie-counting.html) makes it clear that we need stricter expiration times for cookies in general.
Assignee: nobody → mozilla
Target Milestone: --- → Firefox 27
Product: Firefox → Toolkit
(In reply to Ian Fette (Google) from comment #34)
> I sense a lot of anti-google sentiments, and I doubt that anything I say
> will change any of that. However, there are some factual errors that I would
> like to clear up here:

> 2. Our infrastructure for DoS prevention uses cookies, among other things.
> Trying to re-write shared google-wide infrastructure to use "wrkey" instead
> of a cookie is not a pleasing thought.

I still have not seen any explanation why the questionable cookie has to be
against "google.com". For DOS prevention and QOS control any other or more
specific domain would work trivially?

Also not convinced why the cookie needs to identify every single user perfectly
uniquely and why expiry needs to be that long.

> 3. The data in the local database is not enough to determine whether or not
> to block a site. "Block without asking, ask for confirmation, update the db"
> is not a meaningful choice. For each URL that is on the list, we take a
> SHA-256 hash (e.g. a 256-bit hash). To conserve bandwidth (and also keep the
> database at a reasonable size), we send down only the first 32-bits of the
> hash to clients. When you see a match in the local database, the question is
> not "Is this site still listed or do I have old data?", the question is "Is
> this the actual URL that is listed, or is it just a hash collision on
> 32-bits?"


 
> 5. We're not getting every URL you visit. We're only getting URLs that match
> the (partial) data in the local database, so we could not build up a
> complete profile of all that you've viewed anyways.

if I am understanding your points 3. and 5. correctly it would be very easy for 
you to get a list of interesting URLs which a particular user is visiting. It is 
sufficient to make sure that his local database contains hash collisions for every 
interesting URL to cause the browser to check those URLs remotely.

> If you don't trust google, I doubt anything I've said is going to convince
> you. 

It's not just google, it makes spying much easier for anyone sniffing traffic 
somewhere in the middle of the connection.
(In reply to Richard Z. from comment #63)

> It's not just google, it makes spying much easier for anyone sniffing
> traffic somewhere in the middle of the connection.

Turned out to be a valid concern. NSA uses the PREF cookie to track users.
http://www.washingtonpost.com/blogs/the-switch/wp/2013/12/10/nsa-uses-google-cookies-to-pinpoint-targets-for-hacking/
Apologies  for bug spam, but it is worth noting this bug was marked resolved 2013-10-24 comment 61 with comment 62 as a final observation. 

I suspect it is rather pointless making comments here now in a closed bug. 
If you have any further concerns it may be best to write a new bug report.
You need to log in before you can comment on or make changes to this bug.