Closed Bug 403372 (cookie-monster) Opened 17 years ago Closed 17 years ago

cookie eviction should be in lastaccessed order

Categories

(Core :: Networking: Cookies, defect, P2)

Tracking

VERIFIED FIXED
mozilla1.9beta2

People

(Reporter: dwitte, Assigned: dwitte)

Details

(Keywords: dataloss, regression)

Attachments

(1 file, 1 obsolete file)

so, the "mysterious cookie loss" bug that we've been chasing for a while has been nailed down - ispiked has furnished me with several 300MB+ cookie logs, taken over the course of several days. they show that session cookies, set once at the beginning of the browser session, are eventually evicted when the overall cookie limit (1000) is hit. this takes several days of heavy browsing to happen, which made it tricky to reproduce and log.

cookie eviction used to be done by lastaccessed timestamp, which is updated every time nsCookieService::GetCookie*() is called, i.e. every time stored cookies are sent back to the webserver. when we switched to mozstorage (bug 230933), doing this would have incurred a large perf penalty for GetCookie*(), since it requires updating the database each time for a frequently called method. thus we switched to a simpler solution: creation time eviction.

i didn't think this would be an issue in practice, since cookies are typically reset by the server (to extend expiry time, update data, etc) every now and then, which would update the creation time. thus, sites you visit often would keep these cookies fresh. it turns out this was wrong, and certain cookies are set just once, so after you turn over 1000 cookies they'll be gone.

the fix here will be to move back to lastaccessed eviction, and to find a way to do this without a large perf hit. i have some ideas and numbers which i'll post later.

-> me, for M10
Flags: blocking1.9?
Keywords: dataloss, regression
Alias: cookie-monster
Marking P2 due to data loss...
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
simple fix is to add a lastAccessed column to the cookie table, set this time appropriately when cookies are set, and then update the lastAccessed time when cookies are returned to a server via GetCookie*(). this would involve putting an sqlite statement here:

http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/netwerk/cookie/src/nsCookieService.cpp&rev=1.71#1176

which would be "UPDATE moz_cookies SET lastAccessed = ?1 WHERE id = ?2". now, |id| is the rowid of the cookie in the table, which should make this operation pretty fast (in relative terms).

i ran some perf tests with this query in place, calling GetCookie 1000 times with each call returning 3 cookies:

- without the query (current trunk), 1000 calls took 3ms.
- with the query, and using a transaction within GetCookieInternal(), ~200ms.
- with the query, and a transaction wrapping the 1000 loop, 30ms.

0.2ms per GetCookie call is gonna show up on radar, given that this can be called many times per pageload. the 30ms number is unachievable, but represents what we could do in an ideal world, i.e. amortizing the hit-the-disk cost over many updates.

i think the best thing to do here is lazily update the db on a timer, which can be really long - say 10 minutes. losing 10 minutes of lastAccessed data on crash isn't a big deal, so we can get away with it. the data will be correct in the in-memory hash, and we can just walk the list, find dirty cookies, and update those in one transaction. depending on how many cookies need updating this should take a few ms in total. this strategy might even be worth looking at for things like SetCookie (but obviously on a much shorter delay, like 10s, i.e. "when pageload is done").

ftr, sdwilsh also suggested using a multithreaded approach when we upgrade to sqlite 3.5.1, or perhaps using an in-memory db and ATTACHing it to the on-disk db periodically (see http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/storage/public/mozIStorageService.idl&rev=1.5#61).
Attached patch patch v1 (obsolete) — Splinter Review
shaver, bless him, convinced me to take a simpler approach... update the timestamps directly in GetCookieInternal(), but only if they're stale by a certain amount (say, 1 minute). for a pageload, then, we'd update the db maybe twice (once for the page's cookies, and once for advertisers' cookies). this will break down if you load lots of sites and manage to set 1000 cookies in 1 minute, but who cares in what order your cookies are evicted in that case? i'm open to suggestions as to what the threshold should be... in keeping with the spirit of eviction, it should be some small fraction of the time it takes the average user to set 1000 cookies. ispiked managed that in a couple of days.

this patch also increments the schema version of the db. i've tested this patch with the cases a) importing cookies.txt -> schema v2, b) upgrading schema v1 to v2, c) then going back to a v1 build, d) then going forward to a v2 build. it's all smooth, with one caveat: any cookies you set in c) will have null lastAccessed times, which will be interpreted as zero in d). thus those will be the first ones evicted. and given that we completely broke the ability for people to go back to an older build after using an sqlite build anyway, this is small potatoes.

"cookie monster on a leash"...
Attachment #288947 - Flags: superreview?(shaver)
Attachment #288947 - Flags: review?(comrade693+bmo)
quick perf test with this patch: loading about 14 different sites (that i already had cookies for) generated 1190 GetCookie calls and 40 db transactions (each with an average of 2.5 cookies freshened), and each siteload thus generated an average of 3 transactions. so this has pretty much eliminated the update cost!
Comment on attachment 288947 [details] [diff] [review]
patch v1

sr=shaver. We may find that we want to defer all updating until after onload has fired, by just keeping a list of IDs and timestamps, but managing that sort of "after load worklist" is probably more difficult than we'd like right now.
Attachment #288947 - Flags: superreview?(shaver) → superreview+
Comment on attachment 288947 [details] [diff] [review]
patch v1

>- if (dbSchemaVersion > COOKIES_SCHEMA_VERSION) {
>+ if (dbSchemaVersion < COOKIES_SCHEMA_VERSION) {
>+   // upgrading.
>+
>+   if (dbSchemaVersion == 1) {
>+     // add the lastAccessed column to the table
>+     rv = mDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
>+       "ALTER TABLE moz_cookies ADD lastAccessed INTEGER"));
>+     NS_ENSURE_SUCCESS(rv, rv);
>+   }
>+
>+   // update the schema version
>+   rv = mDBConn->SetSchemaVersion(COOKIES_SCHEMA_VERSION);
>+   NS_ENSURE_SUCCESS(rv, rv);
>+
>+ } else {

I'd rather you use a switch statement for this like mwu had the download manager do. It's a lot easier to read the code (especially when you get a few migrations in). I realize that the cookie code probably won't need that much, but it'd be nice to fix that now before we really start to mess up blame by fixing it later.
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/toolkit/components/downloads/src/nsDownloadManager.cpp&rev=1.146#271

>  return mDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
>    "CREATE TABLE moz_cookies ("
>    "id INTEGER PRIMARY KEY, name TEXT, value TEXT, host TEXT, path TEXT,"
>-   "expiry INTEGER, isSecure INTEGER, isHttpOnly INTEGER)"));
>+   "expiry INTEGER, lastAccessed INTEGER, isSecure INTEGER, isHttpOnly INTEGER)"));

You should consider placing each column on its own line (have the comma before the field name though) like DM and places do:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/toolkit/components/downloads/src/nsDownloadManager.cpp&rev=1.146#500

>  PRInt64 expiry = stmt->AsInt64(5);
>- PRBool isSecure = stmt->AsInt32(6);
>- PRBool isHttpOnly = stmt->AsInt32(7);
>+ PRInt64 lastAccessed = stmt->AsInt64(6);
>+ PRBool isSecure = stmt->AsInt32(7);
>+ PRBool isHttpOnly = stmt->AsInt32(8);

You should consider doing what Mardak did with the DM code here - declare a PRUint32 i = 0, and increment it each time, so if you decide to add something in the middle like you do here, you don't mess up blame.

>+void
>+nsCookieService::UpdateDB(const nsCookie *aCookie)
>+{
>+  if (!mStmtUpdate)
>+    return;
>+
>+  // use our cached sqlite "update" statement
>+  mozStorageStatementScoper scoper(mStmtUpdate);

This is generally only used if you could return early - having mStmtUpdate->Reset(); at the end of the function is preferable here (and clearer!).

Note: First and last comments I'd like addressed. The middle two are your call.
Attachment #288947 - Flags: review?(comrade693+bmo) → review+
I backed this out, as it caused (at least) a 3ms Tp regression on Linux. I didn't look at the other platforms, so I don't know if it's cross-platform or not. http://build-graphs.mozilla.org/graph/query.cgi?tbox=bl-bldlnx03_fx-linux-tbox-head&testname=pageload&autoscale=1&size=&units=ms&ltype=&points=&showpoint=2007%3A11%3A17%3A01%3A25%3A32%2C265&avg=1&days=1
no change on winnt or macosx, just linux. i really don't know why this would show up in pageload tests, because the sqlite part won't even be hit. looking into it...
Can you attach a patch of what was checked in? The only thing I can think of is that something like the db is being migrated each time.
Attachment #288947 - Attachment is obsolete: true
landed, with comments addressed and a couple of small bugfixes i missed. (had to update the unit test, too.) the Tp issues i had during the first landing have been cleared up - no Tp impact on any platform now (as it should be!). thanks for everyone's help here!

people may continue to see small flailings of cookiemonster until their cookies have been evicted and replaced by ones with a properly defined lastaccessed time - but, once that happens, things should be bug-free. please report any further sightings of blue hair or scuffling noises from garbage cans without hesitation.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
(In reply to comment #7)
> >+void
> >+nsCookieService::UpdateDB(const nsCookie *aCookie)
> >+{
> >+  if (!mStmtUpdate)
> >+    return;
> >+
> >+  // use our cached sqlite "update" statement
> >+  mozStorageStatementScoper scoper(mStmtUpdate);
> This is generally only used if you could return early - having
> mStmtUpdate->Reset(); at the end of the function is more preferable here (and
> clearer!).

This wasn't addressed (although now it's in a different function nested in an if block - the semantics are still backwards from how everyone else uses it).
(In reply to comment #13)
> This wasn't addressed (although now it's in a different function nested in an
> if block - the semantics are still backwards from how everyone else uses it).

spoke with sdwilsh; he's ok with sticking with the patch as-is, for consistency with other uses in the file and safety in case someone throws in an early return sometime. (that's why we have things like nsCOMPtr!)
Flags: in-testsuite?
Verified FIXED using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b3pre) Gecko/2007121405 Minefield/3.0b3pre. I've been running trunk builds since dwitte's patch landed and haven't experienced any cookie loss whatsoever. Also marking as in-testsuite+ since it looks like dwitte updated the cookie tests to make sure last accessed (not last created) cookies were evicted first if there are too many cookies.
Status: RESOLVED → VERIFIED
Flags: in-testsuite? → in-testsuite+
I hope you mean "least recently accessed". But yeah, this does seem fixed :-)
I'm seeing "mysterious cookie loss" in the 3.0.1 release on a Mac. I'm posting here, rather than opening a new bug, because I have no particular fail case, and yet it seems to match some of the symptoms here; a large group of cookies (but not all of them) just disappears every once in a while. The other day, I suddenly had to re-login to washingtonpost.com, salon.com, Google, and a bunch of other sites.

I'm hoping some of the devs on this thread can give some advice on how to track it down. I've been saving Live HTTP Headers from the login process on these sites, so I can compare the specified expiration dates against the actual ones. And I've turned on cookie logging, for whatever that's worth. But, frankly, it's overwhelming; many sites set a dozen or more cryptically-named cookies at login, most of which don't affect their "remember me" status.

Can anyone point me at useful resources - maybe a list of common cookie names and meanings? I imagine that many sites have similar authentication back ends, and thus use similar cookie names. Anything like that...?
(In reply to comment #17)
cookie names and such turn out to be irrelevant when debugging these things; the log contains all the information required. if you send me your log after you notice cookie loss, i can take a look and see if anything's out of order... sending me your cookies.sqlite would also help, since the size of the cookie table has bearing here.
(In reply to comment #18)
I'm trying to collect the logs with NSPR_LOG_MODULES=cookie:4. But on the Mac, it seems that Firefox only reopens the log file when it quits - at which point it overwrites the log on the next startup. I tried using logrotate, but it apparently gets confused about the state of the file... Any tips on keeping the logs around? Here's my /etc/logrotate.conf:

/Users/jay/Library/Logs/Firefox.log {
  rotate 999
  daily
  copytruncate
  nocompress
}
Is there anyone that can take a look at a set of cookie logs I have and see what the issue is? All my cookies are removed every 2-3 days, and it's incredibly, incredibly frustrating to have to re-login to the same site that I logged into yesterday. Or this morning. It's as if this bug was never fixed in the first place.