Closed Bug 444600 (cookiemonster2) Opened 16 years ago Closed 15 years ago

Cookies go missing after a few days

Categories: Core :: Networking: Cookies (defect; priority not set; severity normal)
Status: RESOLVED FIXED
People: Reporter: gerv; Assigned: dwitte
Keywords: fixed1.9.1, verified1.9.0.7
Attachments: 2 files, 3 obsolete files

This bug report is sadly somewhat vague, but was prompted by talk of similar problems in m.d.a.firefox.

I keep losing my cookies for some sites. Facebook, Google, and Barclays Bank's "remember account number" cookie are three examples. It doesn't seem to happen on other sites. 

I'm not sure when it started; it could well have been when I switched from Firefox 2 to the Firefox 3 betas. 

It's hard to give steps to reproduce, because it only happens occasionally - perhaps one instance a week. I don't have enough data to know if it happens for each site independently, or all of them at the same time. If I see it again, I will check.

I have network.cookie.lifetimePolicy set to 0 and network.cookie.lifetime.days set to 90. network.cookie.cookieBehavior is 0.

What can I do to diagnose this further? (Unfortunately, I am about to start travelling, and will be switching machines, which may complicate matters.)

Gerv
dwitte, did we make any changes here that might cause this?
I've been experiencing something similar, and just bumped my network.cookie.maxNumber pref to 10000 (from the built-in default of 1000), and my problems seem to have become much less frequent.

This leads me to suspect that the eviction algorithm isn't prioritizing sites I visit most often over sites I visit once, or something like that.
See also bug 403372, the "cookie monster" bug.
(In reply to comment #1)
> dwitte, did we make any changes here that might cause this?

there are certainly changes that could cause this, yes - see the cookie monster bug for details. if we're still having problems like this it's possible there's either a bug or more tweaking to do in the eviction algorithm. unfortunately i don't have time to diagnose at the moment since i'm knee deep in thesis work, but i can try to point someone in the right direction...
Given that in practice cookies are limited to 4k, and that heavy browsing racks them up pretty quickly, 1000 seems like a fairly small number for the global default. Have we considered raising it?

Of course, that would only mask the problem - we need to find out what's going on here too.

Gerv
Someone should probably start logging cookies, and when they see this happen, get the log posted.  This is how we tracked down the cookie monster bug.
Shawn's comment was only missing the cookie logging URL :-)
http://www.mozilla.org/projects/netlib/cookies/cookie-log.html
chatted with dolske and sdwilsh about this yesterday... there are two likely causes:

1) people's browsing habits are simply churning too many cookies to make 1,000 a suitable limit.

2) there's some other bug in the eviction algo.

with "heavy browsing" (lots of unique sites in a short amount of time) we could easily be hitting the limit within a few days, so anything you visited before then would get evicted. this is easily verified two ways: i) install sqlite manager (https://addons.mozilla.org/en-US/firefox/addon/5817), open cookies.sqlite, go to "browse and search", sort by the lastAccessed column, and post the lowest & highest values here. they're in microseconds, so take the difference between them and divide by 1000000*3600*24 to get days. ii) email me your sqlite file (and a voucher for a starvin' student steak dinner) and i'll do the above.

i suspect we're just hitting the limit, so we could consider bumping it again. if we went to 10k, typical mem usage would be 1-2MB and the pathological max would be 50MB, so we want to be careful doing this. or we could implement a different limit system, based on days or MB or all three (count, age, and size).
When we checked yesterday, the oldest of my 1000 cookies was 3 days old...

$ sqlite3 cookies.sqlite
sqlite> select lastAccessed from moz_cookies order by lastAccessed asc limit 1;
1218041035524059
sqlite> .quit
# (drop last 6 digits to convert to unix epoch)
$ perl -e 'print scalar localtime(1218041035) . "\n"'
Wed Aug  6 09:43:55 2008

I've bumped up network.cookie.maxNumber to 3000, and will see what happens. (My cookies.sqlite is currently 200K.)
FWIW, with maxNumber at 10000, the lastAccessed range is 5 days. Since I only visit my bank site every week or so, this is definitely a problem.
oddly enough, though, I only have 1571 cookies in the database... perhaps cookie eviction is not the problem I am experiencing... the symptoms for me are "I get logged out of wiki.mozilla.org, developer.mozilla.org, and my bank, and the wikis don't even remember my login".
I'm suffering basically the same symptoms as #10.

With the default maxNumber of 1000: 4 days of history, but just 737 cookies in the table!

sqlite> select lastAccessed from moz_cookies order by lastAccessed asc limit 1;
1218123054043666

sqlite> select count(*) from moz_cookies;
737
perl -e 'print scalar localtime(1218123054) . "\n"'
Thu Aug  7 11:30:54 2008

(In reply to comment #12)
> With the default maxNumber of 1000: 4 days of history, but just 737 cookies
> in the table!

that's not unusual - only persistent cookies are stored in the db, and even counting session + persistent you'll still have < 1000 cookies at all times (it's a hard limit). anywhere from 900 - 1000 total cookies would be reasonably expected, and a few hundred of those will be session.

(In reply to comment #11)
> oddly enough, though, I only have 1571 cookies in the database... perhaps
> cookie eviction is not the problem I am experiencing... the symptoms for me
> are "I get logged out of wiki.mozilla.org, developer.mozilla.org, and my
> bank, and the wikis don't even remember my login".

1571 / 10000 is indeed odd! how long have you been running with a 10k limit? do the oldest lastAccessed values sound correct (did you visit those sites 5 days ago)? how many cookies do you have from your bank's domain? (go into cookie manager and use the search field with your bank's base domain - don't count from the db directly.)

can you send me your sqlite file so i can play with it? i might be able to tell something from it, but logging would also help here (http://developer.mozilla.org/en/docs/Creating_a_Cookie_Log). can you turn that on and then send me the log as soon as you notice a cookie missing?
I just got back from holiday on Tuesday, and have been using my browser the normal amount for the last three days. (Except that I did access 800-odd Bugzilla sites yesterday as part of a link check, which is abnormal.)

sqlite> select lastAccessed from moz_cookies order by lastAccessed asc limit 1;
1219318737627499
sqlite> .quit
gerv@otter:default.djv$ perl -e 'print scalar localtime(1219318737) . "\n"'
Thu Aug 21 12:38:57 2008

So currently, my cookies.sqlite file has about 26 hours of cookies in it, and I have 569 cookies stored.

network.cookie.maxNumber is default.

If this seems odd, I have the cookie logging turned on, and can supply a log privately.

Gerv
The results of the following query would also be useful:
PRAGMA integrity_check;
Shawn: result is "ok".

Gerv
Gerv, can you mail me the cookie log?

If you visited 800 sites in a day, with a cap of 1000 cookies it's entirely possible that you would have received enough cookies to evict pretty much anything older than a day, but looking at the cookie log would tell the tale.
please mail me a copy too; i'll look if i get a chance.
Dan, Mike: mail sent.

Gerv
Alias: cookiemonster2
Attached patch wip patch (obsolete) — Splinter Review
ok, here's a stab at this...

the basic problem is that people browse enough these days to burn through a 1,000 cookie limit in just a couple days. but upping the limit to 10,000 cookies for everyone isn't that smart - expressing it in terms of a time limit makes more sense (e.g. 30 days).

this patch does a couple things:
1) adds a network.cookie.purgeAge pref, that defaults to 30 days.
2) purges cookies when *both*:
   a) the number of cookies exceeds network.cookie.maxNumber (default 1000) and
   b) there are cookies older than network.cookie.purgeAge.

this means there can be cookies older than the purge age if the max number hasn't been hit yet, and (the important case for this bug) there can be more cookies than the max number as long as they're all younger than the purge age.

it also does something perf-related: currently we evict individual cookies as new ones come in and we find there's no space for them. this turns out to be inefficient, since once we bump into the cookie limit we end up purging one by one for nearly every new cookie. this patch changes the bit-by-bit eviction logic into more of a mass purge - when we exceed both the number and age limits by 10%, we purge until we're back down to the age limit or number limit. this should reduce eviction-related hash traversals to a few per day, rather than potentially many per pageload.
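(to make the trigger concrete, here's a minimal C++ sketch of the logic just described - all identifiers here are illustrative, not necessarily those in the attached patch:)

// minimal sketch of the purge trigger; names are illustrative.
void
nsCookieService::PurgeIfNecessary(PRInt64 aCurrentTimeUsec)
{
  // purge only when we exceed BOTH limits, with 10% slop, so eviction
  // runs a few times a day rather than on nearly every pageload.
  if (mCookieCount <= mMaxNumberOfCookies + mMaxNumberOfCookies / 10)
    return;

  PRInt64 purgeAge = mCookiePurgeAge; // default 30 days, in microseconds
  if (aCurrentTimeUsec - mOldestLastAccessed <= purgeAge + purgeAge / 10)
    return;

  // mass purge: evict from oldest lastAccessed upward until we're back
  // down to the age limit or the number limit, whichever binds first.
  RemoveOldestUntil(mMaxNumberOfCookies, aCurrentTimeUsec - purgeAge);
}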
Assignee: nobody → dwitte
Status: NEW → ASSIGNED
we should do this for 1.9.2; for 1.9.1 we could just bump the cookie limit to 2000-3000.

may also want to think about only partially reading the cookie db into memory, since if it gets real big (as it may do with this patch) it's silly to have the whole thing sitting in RAM.
Thanks a whole bunch to Dan for pointing me at this bug (and poring through my logfiles); it's been driving me crazy.  I just set my maxNumber to 10000.

Thinking about it... what, exactly, is the point of cookie eviction these days?  With the DB in memory, more cookies do take up more memory. But memory itself is virtual, so who cares?  Is it a backstop against some pathological case of non-expiring cookies?  Does the db slow down significantly with size, or cause excessive paging if it has to skip over all the now-useless cookies?
fyi, mconnor made a proposal about this a while back: https://wiki.mozilla.org/User:Mconnor/CookieExpiration
(In reply to comment #22)

i'll answer your last questions first:

> Is it a backstop against some pathological case of
> non-expiring cookies?

more a limit against accumulating cruft.

> Does the db slow down significantly with size, or cause
> excessive paging if it has to skip over all the now-useless cookies?

not really; the hash is O(1) in the absence of collisions, but reading in from the db at startup is of course O(n) (and removing expired cookies before that is O(n log n)). and various operations scale with the max number of cookies per host, but that's 50 and we're not changing it here. eviction operations are (almost) O(n) and this patch changes them to be far less frequent.

>  With the DB in memory, more cookies do take up more memory. But memory itself
> is virtual, so who cares?

the cookie hash entry table will itself always be pinned in memory (given that it's a hash, perhaps unused parts of it can be swapped out - however, hashes tend to distribute things evenly, so it's likely that most of it will be pinned). the actual cookie data won't, but again, it'll be randomly interspersed with other data allocated over the course of your browser session, may or may not be swappable, and will contribute to fragmentation.

regardless, we strongly care about memory usage. for a 10k table size, the theoretical max memory usage of the cookieservice is 50MB - but in practice, thanks to the weak law of large numbers, it'll be real close to 2MB. that's still significant, so if there's a better way to limit the table size, it's worth investigating.
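(rough arithmetic behind those figures, assuming the ~4KB spec cap per cookie plus overhead for the pathological case, and a couple hundred bytes for a typical cookie - estimates, not measurements:

  pathological: 10,000 cookies * ~5KB  = ~50MB
  typical:      10,000 cookies * ~200B = ~2MB)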

also, performance of things like the cookiemanager UI will fall down the tubes if you actually use it with 10k cookies. we'll need some batching love there at some point...
Attached patch patch v1 (obsolete) — Splinter Review
final patch - see comment 20 for basic idea. a few extra notes:

1) also bumps total cookie limit to 2000, from 1000
2) includes mochitest for eviction behavior (and removes obsolete cpp test)
3) adds a "batch-deleted" notification to notify consumers in one hit about purged cookies (notifying one-by-one about purged cookies is extremely slow). extensions that care about this will need to be updated. i'm debating whether to change the "added" and "deleted" notifications to always provide an nsIArray (perhaps of 1 element) of relevant cookies, or just add "batch-deleted" for this case where we need it. doing the former would break all the consumers that use currently use it, but that has the advantage of informing them of the otherwise silent change. opinions would be appreciated.

i've done quite a bit of perf testing on this, and as it stands it's pretty quick - likely a few milliseconds for an entire purge. the sqlite operations dominate (hash operations are <1ms). the overhead of doing fancy things here, like making one big DELETE statement and pushing it onto another thread using ExecuteAsync, actually hurts perf under the test conditions i used - but we can revisit that when we look at making all the cookie db operations async.
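(for extension authors wondering what the update looks like, here's a hypothetical C++ consumer of the new notification - the class and its bookkeeping are made up; only the "batch-deleted" data string and the nsIArray subject come from the patch description above:)

#include "nsIObserver.h"
#include "nsIArray.h"
#include "nsCOMPtr.h"
#include "nsStringAPI.h"

// hypothetical consumer of the "batch-deleted" notification; the class
// and what it does with the array are illustrative only.
class CookiePurgeObserver : public nsIObserver
{
public:
  NS_DECL_ISUPPORTS

  NS_IMETHOD Observe(nsISupports *aSubject, const char *aTopic,
                     const PRUnichar *aData)
  {
    if (aData && nsDependentString(aData).EqualsLiteral("batch-deleted")) {
      // for a batch purge the subject is an nsIArray of the purged
      // cookies, rather than a single cookie as with "deleted".
      nsCOMPtr<nsIArray> purged = do_QueryInterface(aSubject);
      PRUint32 length = 0;
      if (purged)
        purged->GetLength(&length);
      // ...refresh any UI or cache once for all 'length' cookies...
    }
    return NS_OK;
  }
};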
Attachment #352524 - Attachment is obsolete: true
Attachment #356372 - Flags: review?(sdwilsh)
simple patch to bump the limit to 3000 for the 1.9.0 branch. we should probably do this in the meantime...
Attachment #356374 - Flags: superreview?
Attachment #356374 - Flags: review?
Attachment #356374 - Flags: superreview? → superreview?(mconnor)
Attachment #356374 - Flags: review? → review?(mconnor)
Comment on attachment 356374 [details] [diff] [review]
bump limit to 3000 for 1.9.0 branch (checked in)

I think we want this for 1.9.1 as well, unless the other patch is going to be ready...
Attachment #356374 - Flags: superreview?(mconnor)
Attachment #356374 - Flags: superreview+
Attachment #356374 - Flags: review?(mconnor)
Attachment #356374 - Flags: review+
Attachment #356374 - Flags: approval1.9.1?
Attachment #356374 - Flags: approval1.9.0.7?
Comment on attachment 356374 [details] [diff] [review]
bump limit to 3000 for 1.9.0 branch (checked in)

requesting a1.9.0.7 and 1.9.1 - low risk change to bump the cookie limit, to reduce the occurrence of "my cookies went missing!" reports. we may want to take the bigger patch on 1.9.1 also, but for now, we should do this.
Taras - you might want to see how this will impact mobile before it lands, so we can try to fix things ahead of time
Attachment #356372 - Flags: review?(sdwilsh) → review+
Comment on attachment 356372 [details] [diff] [review]
patch v1

You have a commented-out printf that needs to be removed, but otherwise r=sdwilsh
Attachment #356372 - Flags: superreview?(mconnor)
Whiteboard: [needs 1.9.1 landing, baking before 1.9.0 approval?]
Comment on attachment 356374 [details] [diff] [review]
bump limit to 3000 for 1.9.0 branch (checked in)

Yeah, this makes sense for 1.9.1 IMO.
Attachment #356374 - Flags: approval1.9.1? → approval1.9.1+
Comment on attachment 356374 [details] [diff] [review]
bump limit to 3000 for 1.9.0 branch (checked in)

Approved for 1.9.0.7, a=dveditz for release-drivers.
Attachment #356374 - Flags: approval1.9.0.7? → approval1.9.0.7+
Whiteboard: [needs 1.9.1 landing, baking before 1.9.0 approval?]
Comment on attachment 356374 [details] [diff] [review]
bump limit to 3000 for 1.9.0 branch (checked in)

landed on 1.9.0.7, 1.9.1, and trunk. leaving bug open to land the real fix on trunk and hopefully 1.9.1...
Attachment #356374 - Attachment description: bump limit to 3000 for 1.9.0 branch → bump limit to 3000 for 1.9.0 branch (checked in)
If I read this correctly, testing this is as simple as seeing if a user is able to have 3000 cookies on a build made after this patch landed?
yup, that's correct. this is covered by unit tests though so no need for litmus here, or anything, really...
I didn't see a unit test added or changed for 1.9.0.7.
wasn't mozilla/netwerk/test/TestCookie.cpp modified per attachment 356374 [details] [diff] [review]?
(In reply to comment #37)
> wasn't mozilla/netwerk/test/TestCookie.cpp modified per attachment 356374 [details] [diff] [review]?

Yes, it was. I didn't notice the path and had glossed over it as part of the other code.
Verified that the automated test is passing through tinderbox.
Flags: wanted1.9.1?
Attachment #356372 - Flags: superreview?(mconnor) → superreview+
Comment on attachment 356372 [details] [diff] [review]
patch v1

Thought I'd already marked this, sr=me
Attachment #356372 - Attachment description: patch v1 → patch v1 (checked in to trunk)
Comment on attachment 356372 [details] [diff] [review]
patch v1

landed on trunk. will request a1.9.1 once this has baked a little...
Backed out due to test failure:
http://hg.mozilla.org/mozilla-central/rev/3a31158051a4
http://hg.mozilla.org/mozilla-central/rev/881bcbdb3cf0

*** 42063 ERROR TEST-UNEXPECTED-FAIL | /tests/extensions/cookie/test/test_eviction.html | incorrect number of cookies - got 2, expected 1051
*** 42064 ERROR TEST-UNEXPECTED-FAIL | /tests/extensions/cookie/test/test_eviction.html | incorrect number of cookies - got 2, expected 1001
*** 42065 ERROR TEST-UNEXPECTED-FAIL | /tests/extensions/cookie/test/test_eviction.html | incorrect number of cookies - got 2, expected 1001

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1236243114.1236246754.30048.gz
We've gotten complaints of lost cookies for Chromium as well, and on http://code.google.com/p/chromium/issues/detail?id=8850 we are considering implementing the behavior described on this bug.  But it doesn't look like it's actually landed.  Dan, any plans to try and fix + re-land soon, or is this in limbo?
Peter: Dan's "name" seems to suggest that he's "busy graduating".

mconnor: who can drive this forward in Dan's absence?

Gerv
this should definitely land as soon as possible, but i'm flat out finishing up until June. (the blocker is to figure out why the added tests are failing - likely a problem with the tests themselves.) assistance would be appreciated...
I talked to Dan tonight and I think he got someone to work on this (yay)
BTW, one potential issue in what got checked in (and then backed out) is that it didn't lower the cookie limits from the 3000 they're currently at down to 2000.  Dunno if any of the unit tests tested that (I didn't look), but it seems like it'd be good to make sure to do anyway, since the startup/resident memory costs of 2000 cookies are lower than 3000.  (Chromium is planning to use 2000.)
without min-age, we're a lot more comfortable with 3k cookies, given that one can easily acquire a couple thousand cookies in less than a week.  Clearly 2000 is smaller than 3000, but by that token, 1000 is smaller than 2000, so why not revert to that?  I don't think the memory overhead is significant enough to scale back for release.
Yes, I agree that without min-age 3000 is probably better than 2000.  I was speaking only of what to do when landing min-age.  Once you have min-age, it'd be nicer to keep a lower limit, not just for memory overhead, but for startup perf (which IMO is even more important).
For it to make a difference, people would need to set fewer than 3000 cookies in <min-age>, of course.  So it'll affect lighter web users, but not heavy users, I predict.  Still worth doing, but don't overestimate the impact. ;)
Yep, completely agreed.

I need to gather some high-level aggregate data on how many cookies users get and how fast, in order to plot what cookie populations look like over time...
OK, here's a concern.

If you set cookies for a wildcard domain ("*.evil.com"), I think you can flood an unbounded number of cookies here, because the per-host limits aren't tripped for wildcards (or are they?).
Are you talking about regular domain cookies, or specifically trying to set domain cookies with a wildcard in them? (Domain cookies are usually denoted by just .foo.com, with no wildcard character.)

If the former, the 50-per-host limit applies to domain and host cookies combined and will catch them. (Although in mozilla, it's a bit more complicated, since due to the way the cookie table is arranged we can only tally cookies for domains equal to or above the domain a cookie is currently trying to set - so we won't count subdomains. This is a bit of a bug. IE takes the easy way out and just counts separate hosts, and doesn't try to do a tally across a domain.)

If the latter, our domain enforcement check won't allow it. The host setting the cookie must be a subdomain of the domain it's trying to set, which would fail here. (This is part of RFC2109.)
Yes, sorry, I meant domain cookies (".evil.com").

Can I (finitely) flood cookies by creating lots of subdomains?  I'm not sure how the tallying works from your description.  Let's say I create a.b.c.....evil.com (insert more subdomains in the "...").  Then I have a finite but arbitrary number of levels in the hierarchy where I can set domain cookies.  Is this limited to 50 per level or 50 for all of .evil.com?
As you describe it, the limit is 50 for all of .evil.com, and that generally works.

But say you set 50 cookies for a.b.c.evil.com, and they all get set. Then later, you set 50 cookies for evil.com - they'll be allowed since the cookieservice currently doesn't have a way to know about the original 50. (If you then try to set cookies for d.evil.com, the 50 evil.com cookies can be counted since they're a superdomain of d.evil.com and the limit will apply.) This is a bug I plan to fix, though, so the spirit here is "50 per domain"...
It seems like, then, that this patch allows finite-but-arbitrary cookie flooding this way, as long as an attacker starts by setting cookies on the most-specific subdomain first, until/unless you make the cookie service able to count subdomains.

Another possible attack along similar lines: Visit a000.evil.com, which sets a host cookie on a000.evil.com, and redirects to a001.evil.com (then repeat).  You could also do that last step using XHR or iframes or some other mechanism.

I don't think this should be checked in without addressing that problem.  It certainly blocks us from doing a similar checkin in Chromium.

FWIW, 50 might be kind of low for an entire TLD+1 (what about userxxx.livejournal.com, or the various products hosted at xxx.google.com, or something?).  Perhaps 50 host cookies or 200 cookies across an entire TLD+1?
We might not want to take this on branch without addressing it, but I certainly wouldn't hold it up on trunk - such attacks could already be done up to the cookie limit of 3000, and have been possible for a long time.

Your a000.evil.com attack won't work in mozilla, since we do count down to TLD+1. It'll only work effectively if there are subdomains many levels deep.

From what I've seen in practice, from looking at many cookie db's, the average number of cookies per site is around 5, up to around 10 or 15 for the outliers. So 50 isn't unreasonable, though I'd be fine with having a higher limit for TLD+1.
Wait, so if I've set a host cookie on a000.evil.com, then when I attempt to set a cookie on a001.evil.com, it knows there's already one cookie set on some other subdomain of evil.com?  How is that possible?  And if it's possible, how does it know to find the a000.evil.com cookie to collect it?

Also, while attacks to the cookie limit can be used to blow away your existing cookies, that's all they can do -- with this patch, you allow attackers to store an arbitrary amount of cookie data in memory and on disk, which won't get purged until after 30 days.  That seems far worse.
Sorry, misread - the host cookie attack you describe will work.

I don't think it's that big a deal for trunk, since I'll have a fix in for the TLD+1 check soon, and I've never seen one of these attacks, ever - or seen anyone complain of them. That doesn't mean they're not out there, just that I'd rather we get some trunk baking of this patch. Branch is a different story, as I said.
Sure, I doubt any attacks exist in the wild right now.  The payoff of wiping a user's cookie store is immaterial.  Shoving a ton of data into memory or on disk would be more worthwhile, even just for the DoS effect.

I'm vaguely curious how you plan to deal with the subdomain checking problem.  In Chromium, our cookie storage is basically a hash map.  I proposed that we could add simple notation at each level of how many cookies were stored in subdomains, but while that tells you when the limits have been reached, it doesn't tell you how to find the cookies on other subdomains of your parent domain so as to prune the oldest ones (of course, you could brute-force this by traversing the whole cookie store...).

We might address this by changing to a tree-based storage, with domain components sorted in reverse order ("com", "google", "www").  This would let you easily enforce limits with just about any algorithm you wished, at the cost of making lookup slightly slower (not sure if that would matter).
Yeah, we'll switch from a hashmap of host to a hash based on eTLD+1. This means any lookup or store operations in the hash incur the overhead of fetching the eTLD, but in mozilla that's fast - for the typical case, it adds one or two hash lookups.

We could do a tree but I think that'd be overkill... for the common case of getting cookies for a pageload, you need to look for domain cookies anyway, so there's not a whole lot to be gained from branches below the eTLD+1.
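(a rough C++ sketch of what that lookup looks like - mHostTable, CookieEntry and friends are made-up names; only nsIEffectiveTLDService is a real mozilla interface:)

// rough sketch of hashing cookies by eTLD+1; names are illustrative.
nsresult
CookieStore::CountCookiesForBaseDomain(const nsACString &aHost,
                                       PRUint32 *aCount)
{
  // the per-lookup overhead: map the host to its eTLD+1 ("base domain")
  // first. in mozilla this is fast - essentially another hash lookup.
  nsCAutoString baseDomain;
  nsresult rv = mTLDService->GetBaseDomainFromHost(aHost, 0, baseDomain);
  NS_ENSURE_SUCCESS(rv, rv);

  // each hash entry holds the cookie list for one eTLD+1, so tallying
  // "50 per domain" is one lookup plus a walk of a single vector.
  CookieEntry *entry = mHostTable.GetEntry(baseDomain);
  *aCount = entry ? entry->mCookies.Length() : 0;
  return NS_OK;
}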
I see.  Then each hash bucket is just a vector of cookies or something?

Definitely CC me on any bugs filed (or yet to be filed) about this.
Yeah (which is how we do it now).

cc'ed you on bug 321624.
Attachment #356372 - Attachment description: patch v1 (checked in to trunk) → patch v1
Attached patch patch v2 (obsolete) — Splinter Review
Test failures fixed. The problem was int->float promotion in a comparison, which resulted in unpredictable off-by-one behavior. Proper rounding fixed it up.
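(For the curious, a contrived C++ illustration of this class of bug - not the actual patch code:)

// contrived example of the int->float promotion pitfall. float has a
// 24-bit mantissa, so large integers silently lose precision when
// promoted, and comparisons go off by one unpredictably.
PRInt64 count = 16777217;        // 2^24 + 1
float   limit = 16777216.0f;     // 2^24
if (count <= limit) {
  // taken! 'count' is promoted to float and rounds to 16777216.0f,
  // so the test passes even though count > limit numerically.
}
// the fix: do the comparison in integer space, rounding explicitly
// where a fractional factor (like the 10% slop) is involved.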

Carrying over r+sr; ready for landing on trunk.
Attachment #356372 - Attachment is obsolete: true
Attachment #389594 - Flags: superreview+
Attachment #389594 - Flags: review+
Comment on attachment 389594 [details] [diff] [review]
patch v2

http://hg.mozilla.org/mozilla-central/rev/af278cec894d

Once this bakes a while, we can consider it for 1.9.1.
Attachment #389594 - Attachment description: patch v2 → patch v2 (checked in to trunk)
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Backed out because of test_eviction.html failure on Windows

http://hg.mozilla.org/mozilla-central/rev/31be91acb2b2
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #389594 - Attachment description: patch v2 (checked in to trunk) → patch v2
Attached patch patch v3Splinter Review
http://hg.mozilla.org/mozilla-central/rev/227094a1e613
Attachment #389594 - Attachment is obsolete: true
Attachment #417835 - Flags: superreview+
Attachment #417835 - Flags: review+
Landed. Followup patches are pending in bug 321624 and bug 344809, which we'll want to get in ASAP.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Dan Veditz, Christian Legnitto and I discussed landing this on 1.9.2. (It originally landed there, but bounced, and was never re-landed.) Based on the fact that this should really go along with bug 321624 (to prevent a DoS corner case), which is a large patch, and the fact that we already bumped the limit on 1.9.2 to 3000, I don't think we need to take this.

I'm sure some people are hitting it, but we're releasing 4.0 in a few months, so there's not a huge upside here.