Last Comment Bug 321624 - cookie per-host limit doesn't include all cookies from a domain
: cookie per-host limit doesn't include all cookies from a domain
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking: Cookies (show other bugs)
: Trunk
: All All
: -- normal (vote)
: ---
Assigned To: dwitte@gmail.com
:
: Patrick McManus [:mcmanus]
Mentors:
Depends on: 344809
Blocks: 536230
  Show dependency treegraph
 
Reported: 2005-12-27 08:43 PST by Adam Greenberg
Modified: 2010-09-07 17:47 PDT (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch v1 (66.85 KB, patch)
2009-11-19 17:22 PST, dwitte@gmail.com
no flags Details | Diff | Splinter Review
patch v2 (64.79 KB, patch)
2009-11-30 16:25 PST, dwitte@gmail.com
sdwilsh: review+
Details | Diff | Splinter Review
patch v3 (72.73 KB, patch)
2009-12-16 13:30 PST, dwitte@gmail.com
dwitte: review+
Details | Diff | Splinter Review

Description Adam Greenberg 2005-12-27 08:43:06 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

This behavior appears in Firefox 1.5, 1.6a, and Mozilla 1.7.12. Suppose I have a website, a.foo.com.  I find that I can set 50 cookies to .foo.com.  If I then set 50 cookies to a.foo.com, they overwrite the first 50.  If I then set 50 cookies to  .foo.com, I find that I have 100 cookies.  How does this work?  How can I debug website cookie setting issues when Firefox does something like this?

I also find that Firefox will accept 50 cookies for a.foo.com, 50 for b.foo.com, and then 50 for .foo.com.  Could someone either fix Firefox to consistently handle cookies, or at least give me a definitive statement about the expected cookie behavior.

In the above tests, all the cookies get different names, all are persistent, and all get set to path "/".

Reproducible: Always

Steps to Reproduce:
1.  Set 50 cookies (all with different names, persistent, path = "/") to .foo.com.
2.  Set 50 cookies (different names from above) to a.foo.com.
3.  List cookies sent to a.foo.com.

Actual Results:  
I see the 50 cookies from step 2.

Expected Results:  
I expectedt to 100 cookies (from steps a and b) above.

If I reverse the order of steps a and b, I see all 100 cookies.
Comment 1 dwitte@gmail.com 2007-11-16 22:39:54 PST
so, the reason is that our implementation of the "limit number of cookies per host" feature is a little broken... we store all the cookies in a flat hash, keyed by host string. when a host (say a.foo.com) tries to set a cookie, we look up similar hosts in the hash, namely a.foo.com and foo.com, and count up the total number of cookies set by them. however, there's no way to look up subdomains of that, say b.a.foo.com, so those don't get counted.

so, let's say b.a.foo.com sets 50 cookies. when a.foo.com tries to set a cookie, the b.a.foo.com ones won't be included in the count, so it can set another 50. and then foo.com could set another 50.

it'd be nice to use the effective tld service to fix this (see https://bugzilla.mozilla.org/show_bug.cgi?id=385299#c4); then we could define a proper "per base domain" limit (i.e. 50 cookies for foo.com and all subdomains). as a web developer, would this make sense to you?

this fix would also involve either moving away from a flat hash structure, or (my preferred option) keeping a separate cache of cookie count, keyed by base domain, for checking this limit quickly and avoiding a full hash traversal.
Comment 2 dwitte@gmail.com 2007-11-26 12:12:08 PST
what IE does:
http://blogs.msdn.com/ie/archive/2007/08/29/update-to-internet-explorer-s-cookie-jar.aspx

"Note that IE’s cookie limit is applied on per-domain basis. If http://example.com sets 20 cookies, each with Domain=example.com, and http://subdomain.example.com also sets 20 cookies, each with Domain=subdomain.example.com, then 40 cookies will be sent on subsequent requests to subdomain.example.com."

so they don't even count subdomains/superdomains, they just look at how many cookies are set for the current host/domain.
Comment 3 Adam Greenberg 2008-01-03 09:57:05 PST
The IE approach offers a useful way for complex sites (those with multiple cookie setting hosts) to determine the cookie universe for any given host.  20 cookies for each (sub)domain is small, but the dominant IE position means we will have to design to that limitation anyway.  Providing a fixed 50 cookie limit spread over a top level domain and all subdomains makes site cookie management more complex without any obvious reward beyond limiting the browser resources committed to site cookie storage.
Comment 4 dwitte@gmail.com 2009-05-01 01:52:34 PDT
What we might end up doing here is have a per-host limit and then a larger limit on the eTLD+1. Not quite sure yet.
Comment 5 dwitte@gmail.com 2009-11-19 17:22:42 PST
Created attachment 413485 [details] [diff] [review]
patch v1

Changes the cookie hash to be keyed on base domain, and adds said column to the sqlite db. This makes all lookups go through a host -> base domain step first. While this is cheap enough to be insignificant for the handful of calls per pageload, it does involve an OS call, so we don't want to be doing it thousands of times on startup - hence the db column addition. (It kinda sucks to have a separate column for a value that is computed purely from another column, but there's not much else we can do here. Unless we implement our own cheap method of determining whether a given host is an IP address or not.)

Passes existing tests, I'll look at adding a few more tomorrow.

(We need this for bug 444600 in order to prevent DoS, and it also makes the semantics of countCookiesFromHost and getCookiesFromHost a bit saner.)
Comment 6 dwitte@gmail.com 2009-11-20 18:03:09 PST
Comment on attachment 413485 [details] [diff] [review]
patch v1

This failed some tests where we're dealing with host aliases like 'localhost'. The problem is that GetBaseDomain will return NS_ERROR_INSUFFICIENT_DOMAIN_LEVELS on such a string. So we need to decide what to do for host aliases. I think we want to treat them exactly like regular hostnames, i.e. allowing subdomains, computing their base domain, etc. So the rules for hosts will be:

1) Determine if the host is an IPv4 or IPv6 address, in any of their various representations. If it is, require exact string matches, and do not apply subdomain logic. We currently use PR_StringToNetAddr for this purpose, which we may want to rewrite as a platform-independent function for speed. (Note that bug 344809 already proposes doing this, for IPv4 addresses only, for purposes of eliminating cross-plat idiosyncrasies.)

2) Otherwise, we have a hostname. Try to determine the base domain. If this succeeds, great. If it fails because there are insufficient domain levels, then the host is either a TLD ('com', 'co.uk') or an alias ('localhost'). As the eTLD service stands, we can't distinguish the two, so we should just use the host directly as the key. (Setting a cookie for 'com' won't affect regular sites, since their cookie lookups will be done with a legit base domain; only lookups for 'com' itself will return them. So insufficient-domain-level keys will be isolated into their own little corner.)

I'm going to do some performance tests of PR_StringToNetAddr to see if we can do all this base domain work on startup, and avoid changing the table schema. If it's more than a few ms worth then we probably want it precomputed and stored.
Comment 7 dwitte@gmail.com 2009-11-23 17:02:28 PST
From my linux opt build on a single-core machine @ 2.13GHz:

Time to read 3000 cookies, current trunk: 14.4ms
Time to compute 3000 base domains (excluding PR_StringToNetAddr call): 3.5ms
Time for 3000 calls to PR_StringToNetAddr: 55.9ms

If we reimplement PR_StringToNetAddr, and it's cheap, then we're looking at a 3.5ms penalty on a 14.4ms read time for computing the base domains on startup. (Subtract from that the overhead inherent in adding a base domain column to the db, which is about 2ms.) So computing the base domain can be cheap. Let's do it that way and avoid the schema change.
Comment 8 dwitte@gmail.com 2009-11-30 16:25:12 PST
Created attachment 415292 [details] [diff] [review]
patch v2

As advertised, without schema change. With my patch in bug 344809, the cost of PR_StringToNetAddr is insignificant, and this will be plenty fast enough.
Comment 9 Shawn Wilsher :sdwilsh 2009-12-08 14:44:19 PST
Comment on attachment 415292 [details] [diff] [review]
patch v2

>+++ b/netwerk/cookie/src/nsCookieService.cpp
>@@ -67,16 +67,17 @@
> #include "prprf.h"
> #include "nsNetUtil.h"
> #include "nsNetCID.h"
> #include "nsAppDirectoryServiceDefs.h"
> #include "mozIStorageService.h"
> #include "mozStorageHelper.h"
> #include "nsIPrivateBrowsingService.h"
> #include "nsNetCID.h"
>+#include "prnetdb.h"
Add this with the rest of the nspr headers please.

>+  nsListIter(nsCookieEntry *aEntry, PRUint32 aIndex)
>    : entry(aEntry)
>+   , index(aIndex)
>+  {
>+  }
Can you add java-doc style comments for this please.  I think I understand what these are used for, but would like it to be documented.

>+  nsCookie* Cookie() const { return entry->GetCookies()[index]; }
nit: pointer style inconsistent re: method return type.
nit: your other methods are now always putting braces on a new line, so you should do the same here.
Can you add a debug assertion that ensures that the index is in range?

>+  PRUint32       index;
nit: nsCookieEntry::ArrayType::index_type (or appropriate typedef to reference the same)

>+  nsEnumerationData(PRInt64 aCurrentTime, PRInt64 aOldestTime)
>    : currentTime(aCurrentTime)
>    , oldestTime(aOldestTime)
nit: java-doc style comments please

>   return mDBState->dbConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
>     "CREATE TABLE moz_cookies ("
>-    "id INTEGER PRIMARY KEY, name TEXT, value TEXT, host TEXT, path TEXT,"
>-    "expiry INTEGER, lastAccessed INTEGER, isSecure INTEGER, isHttpOnly INTEGER)"));
>+    "id INTEGER PRIMARY KEY, name TEXT, value TEXT, "
>+    "host TEXT, path TEXT, expiry INTEGER, lastAccessed INTEGER, "
>+    "isSecure INTEGER, isHttpOnly INTEGER)"));
Can you do this like we do in other code so it is easier to spot changes to the schema please?  See http://mxr.mozilla.org/mozilla-central/source/toolkit/components/downloads/src/nsDownloadManager.cpp#589

>+  const nsCookieEntry::ArrayType &cookies = aEntry->GetCookies();
>+  for (PRUint32 i = 0; i < cookies.Length(); ++i) {
nit: nsCookieEntry::ArrayType::index_type or appropriate typedef (plus other places you do this)

>@@ -933,53 +946,60 @@ nsCookieService::Add(const nsACString &a
>+  // get the base domain for the host URI.
>+  // e.g. for "www.bbc.co.uk", this would be "bbc.co.uk".
>+  nsCAutoString baseDomain;
>+  nsresult rv = GetBaseDomainFromHost(aDomain, baseDomain);
>+  if (NS_FAILED(rv))
>+    return rv;
switch this to NS_ENSURE_SUCCESS since it shouldn't really fail

>+nsCookieService::GetBaseDomain(nsIURI    *aHostURI,
>+                               nsCString &aBaseDomain,
>+                               PRBool    &aIsIPAddress)
>+  if (NS_FAILED(rv))
>+    return rv;
switch this to NS_ENSURE_SUCCESS since it shouldn't really fail at this point

>+// returns PR_TRUE if 'a' is equal to or a subdomain of 'b',
>+// assuming no leading or trailing dots are present.
>+static inline PRBool IsSubdomainOf(const nsCString &a, const nsCString &b)
s/PRBool/bool/, and then change to true/false in the method please
> NS_IMETHODIMP
> nsCookieService::CookieExists(nsICookie2 *aCookie,
>                               PRBool     *aFoundCookie)
>+  rv = GetBaseDomainFromHost(host, baseDomain);
>+  if (NS_FAILED(rv))
>+    return rv;
switch this to NS_ENSURE_SUCCESS since it shouldn't really fail

>+++ b/netwerk/cookie/src/nsCookieService.h
>+    nsresult                      GetBaseDomain(nsIURI *aHostURI, nsCString &aBaseDomain, PRBool &aIsIPAddress);
>+    nsresult                      GetBaseDomainFromHost(const nsACString &aHost, nsCString &aBaseDomain);
>+    PRBool                        SetCookieInternal(nsIURI *aHostURI, nsIChannel *aChannel, const nsCString& aBaseDomain, PRBool aIsIPAddress, nsDependentCString &aCookieHeader, PRInt64 aServerTime, PRBool aFromHttp);
>+    void                          AddInternal(const nsCString& aBaseDomain, nsCookie *aCookie, PRInt64 aCurrentTimeInUsec, nsIURI *aHostURI, const char *aCookieHeader, PRBool aFromHttp);
>+    void                          RemoveCookieFromList(const nsListIter &aIter);
>+    PRBool                        AddCookieToList(const nsCString& aBaseDomain, nsCookie *aCookie, PRBool aWriteToDB = PR_TRUE);
>+    PRBool                        IsForeign(const nsCString &aBaseDomain, PRBool aHostIsIPAddress, nsIURI *aFirstURI);
>+    PRUint32                      CheckPrefs(nsIURI *aHostURI, nsIChannel *aChannel, const nsCString &aBaseDomain, PRBool aIsIPAddress, const char *aCookieHeader);
>+    PRBool                        CheckDomain(nsCookieAttributes &aCookie, nsIURI *aHostURI, const nsCString &aBaseDomain, PRBool aIsIPAddress);
>+    PRBool                        FindCookie(const nsCString& aBaseDomain, const nsAFlatCString &aHost, const nsAFlatCString &aName, const nsAFlatCString &aPath, nsListIter &aIter, PRInt64 aCurrentTime);
>+    PRUint32                      CountCookiesFromHostInternal(const nsCString &aBaseDomain, nsEnumerationData &aData);
Can you add java-doc style comments for your new methods please?  This will make it easier for others to jump into this code too!
Also, s/PRBool/bool/ please for these.

I'd like to see a test for this new behavior (or point to one that already covers this.  r=sdwilsh with this fixed.
Comment 10 dwitte@gmail.com 2009-12-16 13:30:14 PST
Created attachment 417997 [details] [diff] [review]
patch v3

Adds some extra comments and a test for the new behavior. I'll save the PRBool->bool changes for a separate patch, since it's nice to decouple functional patches from stylistic changes for backporting purposes, and that kind of change really wants to be module-at-once.
Comment 11 dwitte@gmail.com 2009-12-16 13:31:26 PST
Waiting on bug 344809 to land this.

Note You need to log in before you can comment on or make changes to this bug.