Closed Bug 230933 Opened 21 years ago Closed 17 years ago

move cookies to mozstorage

Categories: Core :: Networking: Cookies, defect
Hardware: x86 / All
Severity: normal
Status: RESOLVED FIXED
Target Milestone: mozilla1.9alpha6
People: (Reporter: mvl, Assigned: dwitte)
References: (Blocks 1 open bug)
Attachments: (2 files, 2 obsolete files)

Spin-off from bug 178993. To store http-only, we need to change the format of
cookies.txt. But we can't, for compatibility reasons. So a new file needs to be
created.
And to do it right, it should allow more attributes to be saved, like last-use time.

very short summary of ideas so far:
- just add httponly and last-use time
- add a few more unassigned fields, to be used later (bools and ints)
- make the format flexible: store what mozilla knows about, and merge when
saving so as not to lose data
- should have versioning; cookies2.txt has a versioned filename
- maintain cookies.txt, to be used by older mozillae

read bug 178993 for more details.
Blocks: 208985
Blocks: 248590
Blocks: 201936
i've added a bunch of bugs as dependent on this, so we can keep track of what
reasons we have for implementing a new fileformat (and what other things we
might want to do while we're at it). a short summary:

bug 208985 - support Set-Cookie2.
bug 248590 - persist cookie creation time, to enhance behavioral consistency
with older mozillae.
bug 188850 - persist least-recently-used time, to allow a more performant
implementation of LRU cookie deletion.
bug 201936 - persist above two fields for purposes of user information (cookiemgr).
I know similar problems have been solved before for other components and config
files, can someone link to some fixed bugs?
vlad has said that storage is (alpha-) ready for consumers, so once 1.1 ships i
think we can look at moving cookies over to it. we'll probably continue to write
out the legacy cookies.txt to keep compatibility for 3rd party apps.

volunteers for testing storage would be appreciated. one aspect darin mentioned
is perf - is storage sufficiently powerful and performant to entrust
nsCookieService's memory store to it? or should cookies retain its memory list
and lazy-flushing behavior?
(In reply to comment #3)
> vlad has said that storage is (alpha-) ready for consumers, so once 1.1 ships i
> think we can look at moving cookies over to it. we'll probably continue to write
> out the legacy cookies.txt to keep compatibility for 3rd party apps.

Probably more than alpha ready even, it's gotten a pretty good beating on
calendar, though there's probably still some API niceness/cleanup that could happen.

> volunteers for testing storage would be appreciated. one aspect darin mentioned
> is perf - is storage sufficiently powerful and performant to entrust
> nsCookieService's memory store to it? or should cookies retain its memory list
> and lazy-flushing behavior?

Good question; I don't know the kinds of queries cookies would do, so I don't
know how well they could be optimized using indexes.  So I think the best answer
I can give is "try it and see"...

For bookmarks I'm going to try to work with sqlite directly, but that's far less
performance intensive...
more info on storage at http://wiki.mozilla.org/Mozilla2:Unified_Storage
Keep in mind that cookies needs an in-memory datastore for session cookies
anyways.  I don't think that using sqlite for session cookies makes sense, so we
probably want to preserve the existing in-memory data structure.  Whether that
means that we load persistent cookies from sqlite into the same in-memory data
structure or provide some way to aggregate the two data structures, I'm not sure.

I know that storage is being used by calendar, but it is still sort of a work in
progress.  The "review storage" bug (see bug 273050) hasn't seen any traction
since I posted review comments last year.  That said, I suspect that there are
mostly just cosmetic changes in order.  The interfaces probably will change some.
good point about session cookies. given that we don't want two different
querying methods for persistent and session cookies, it's probably best to stick
with our hash store.

how efficient is sqlite for that use case - where we use it purely for storage,
and discount its memory-caching/querying abilities? do we pay for that
functionality even if we don't use it?
(In reply to comment #7)
> good point about session cookies. given that we don't want two different
> querying methods for persistent and session cookies, it's probably best to stick
> with our hash store.
> 
> how efficient is sqlite for that use case - where we use it purely for storage,
> and discount its memory-caching/querying abilities? do we pay for that
> functionality even if we don't use it?

Not sure what you mean by pay; you're going to be doing SELECT queries to read
the data out and INSERT/UPDATE statements to flush the data to the store, so
they're basically the same.. the only difference is the frequency of the calls,
and perhaps how much data you'd read (e.g. you'll probably SELECT * to read data
and then read one statement, which will be faster than doing a select with some
kind of WHERE hostname LIKE '%foo.com', and you'll probably be flushing all the
INSERT/UPDATEs in one transaction, as opposed to doing them one by one).  That
should give you a net performance increase as far as time spent in sqlite goes.
sorry for being vague - by pay, i was thinking memory footprint. for example,
the maximum size of the cookie file atm is around 4mb (although real-world ones
wouldn't come near that). if sqlite maintains its own in-memory cache of the
file data, we'd be doubling up on it... not sure if that's a big deal or not,
though. (if it does, it'd be useful to know if there's a way of disabling the
cache.)
Ah, I see what you mean... sqlite will maintain its own memory cache per
database, up to a runtime-specified size.  However, I assume cookies would just
live as a set of tables in an overall profile storage database; so any caching
would be used overall by that database.  The cookies bits would be evicted as
necessary, so I don't think it would be that big of a deal.
mvl and i have talked about how we'd implement this whole storage thing for cookies.

one issue is versioning - storage is nice in that it allows an older mozilla to
use the cookie store, without stomping attributes it doesn't know about (e.g.
that were introduced in a newer mozilla). we should take advantage of that. this
also means that we have to keep the db synced (update cookies on the fly, and
flush to disk periodically); we can't just nuke-and-rebuild when it comes time
to flush.

one potential problem here is preserving the unknown attributes might not make
sense - say a cookie is set for httponly, and then is re-set in an older mozilla
(which does not understand httponly) to be available to all sites. preserving
the httponly attribute in this case would be wrong. but i don't think that'll be
a big deal in practice.

so, on startup we'll have to import the db, which means adding any attributes
that don't exist - see, for example, the lastaccessed counter generation code we
have at the moment. whenever a cookie is added/removed/changed, we sync that to
the db immediately (as an "update" to preserve existing attributes), and lazily
flush like we do now.

i'm guessing that flushing to disk is handled internally to storage, and is
suitably lazy (c.f. 5sec delay that cookies uses now)?
-> default owner
Assignee: darin → nobody
taking. i have a patch in hand to move cookies over to mozstorage; still have some testing and perf work to do before it's ready.
Status: NEW → ASSIGNED
Assignee: nobody → dwitte
Status: ASSIGNED → NEW
Target Milestone: --- → mozilla1.9alpha6
Attached patch storage patch v1 (obsolete) — Splinter Review
first cut at storage for cookies. some notes:

the design is pretty simple; i kept the in-memory hash we use atm, and just replaced the file reading/writing code. this means we have to keep two data structures synced, but the hashes really are a lot faster for lookups. (i have lots of perf data i'll post later.)

during pageload, there are two perf cases: getting cookies, and setting cookies. getting cookies will be unaffected, since we still use hashes. for the typical case of setting a few cookies, perf should be much faster, since we update only the cookies that changed, rather than waiting 5 seconds and dumping the whole cookie file out to disk. (async writing is critical here.)

i took advantage of the unique "creation id" we store for each cookie, and used it as the row id for the db. this makes db lookup of a specific cookie very fast. the only flaw here is overflow, but you'd have to set 1,000,000 cookies per second for 300,000 years to overflow that 64 bit int.

we currently have a "last accessed" timestamp per cookie, which is updated every time a cookie is modified or sent back to a server, and is used to evict old cookies once we hit the per-host or total limit. this is a pain to do with storage, because we'd have to update this value in the db every time a cookie is sent to a server (slow). i've changed the algo to just evict based on creation id instead, which should work fine since eviction is arbitrary and up to us anyway. this way, we don't need to update anything, and the behavior seems sane.

i decided to delete the user's old cookies.txt file after import; there's not much point for it to stick around, and it contains private data that shouldn't get orphaned and rot on the disk. also, dataloss for cookies isn't that critical, since they can easily be set again. similarly, if the sqlite db has a schema version higher than it understands (the downgrading case), we just delete the table and start over. if anyone has arguments to the contrary, i'd like to hear them.

ready for testing and review!
Summary: Allow for storing more cookie attributes by creating a new file → move cookies to mozstorage
ugh, ignore the ridiculous changes to Makefile.in, they're not supposed to be there.
> we currently have a "last accessed" timestamp per cookie, which is updated
> every time a cookie is modified or sent back to a server, and is used to evict
> old cookies once we hit the per-host or total limit. this is a pain to do with
> storage, because we'd have to update this value in the db every time a cookie
> is sent to a server (slow). i've changed the algo to just evict based on
> creation id instead, which should work fine since eviction is arbitrary and up
> to us anyway. this way, we don't need to update anything, and the behavior
> seems sane.

I'm not 100% sure I understand this change, but it doesn't sound sane.  I think it is going to break some web applications.  I know of a few web apps that create lots of cookies, knowing in advance that some of the browsers are going to get fed up and start throwing away cookies.  In order to make sure that the most important cookies are not discarded, they reset only the critically important cookies with every response.

For example, a login cookie might be reset with every response.  Less important cookies holding information about individual short lived transactions are allowed to expire.

In your patch, does resetting a cookie with the same name and value, or even the same name and a different value, update the creation ID?  If so, your approach sounds reasonably safe.  If the creation ID stays constant even when the server resets the cookie, you may want to look for an alternative.
that's correct; any change to a cookie (including expiry time!) will result in a new nsCookie object to replace the old one, since they're immutable to consumers once created. when critical cookies are evicted and the consumer resets them, they will receive fresh creation id's.
Attached patch storage patch v2 (obsolete) — Splinter Review
a few changes.

this uses creation time (in microseconds) as the unique id, with a counter to enforce uniqueness (since the system clock isn't necessarily monotonic). this way, we store the creation time of cookies for free, which would be nice to display to the user.

in the db downgrade case, we check if all the expected columns are there, and just carry on if they are. this should make switching between mozilla versions transparent. if columns have been deleted, we drop the table and start over.
Attachment #267699 - Attachment is obsolete: true
Attached patch storage patch v3Splinter Review
minor revisions. ready for review!
Attachment #267942 - Attachment is obsolete: true
Attachment #268007 - Flags: superreview?(mconnor)
Attachment #268007 - Flags: review?(sdwilsh)
Comment on attachment 268007 [details] [diff] [review]
storage patch v3

General nit s/nsInt64/PR[I|Ui]nt64/g, but you can do a followup bug if you want since all these files seem to use it still.  See http://groups.google.com/group/mozilla.dev.tech.nspr/msg/d48490f4c0781b92

>+nsresult
>+nsCookieService::InitDB()
>+{
>+  // cache a connection to the cookie database
This comment should really go above where you make the call to OpenDatabase

>+    PRInt32 dbSchemaVersion;
>+    {
>+      // scope the statement, so the write lock is released when finished
>+      nsCOMPtr<mozIStorageStatement> stmt;
>+      rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING("PRAGMA user_version"),
>+                                    getter_AddRefs(stmt));
>+      NS_ENSURE_SUCCESS(rv, rv);
>+
>+      PRBool hasResult;
>+      rv = stmt->ExecuteStep(&hasResult);
>+      NS_ENSURE_SUCCESS(rv, rv);
>+      NS_ENSURE_TRUE(hasResult, NS_ERROR_FAILURE);
No result here means that user_version is not set.  If it isn't, we can still figure out if the DB is usable and act accordingly.  As the code is now, the user would have to delete the database themselves for this to ever work again.

>+        // downgrading.
>+        // if columns have been added to the table, we can still use the ones we
>+        // understand safely. if columns have been deleted or altered, just
>+        // blow away the table and start from scratch!
>+        
Please explicitly state that if someone modifies how a column is used, they need to rename it so downgrading works properly.

r=me with these fixed
Attachment #268007 - Flags: review?(sdwilsh) → review-
Attachment #268007 - Flags: superreview?(mconnor)
nits fixed. thanks for the quick review!
Attachment #268014 - Flags: superreview?(mconnor)
Attachment #268014 - Flags: review?(sdwilsh)
Attachment #268014 - Flags: review?(sdwilsh) → review+
Blocks: 348008
Blocks: 236133
Comment on attachment 268014 [details] [diff] [review]
storage patch v3.1

>+        if (NS_FAILED(rv)) {   
>+          // our columns aren't there - drop the table!
>+          rv = mDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING("DROP TABLE moz_cookies"));
>+          NS_ENSURE_SUCCESS(rv, rv);

do we want to copy the sqlite file so that someone who blows up their file on a downgrade this way can recover?


>+  // check whether to import or just read in the db
>+  if (!tableExists)
>+    rv = ImportCookies();
>+  else
>+    rv = Read();
>+
>+  return rv;
>+}

if (!tableExists)
  return ImportCookies();
  
return Read();

Generally seems more readable.



>+  // if it's a non-session cookie and hasn't just been read from the db, write it out.
>+  if (aWriteToDB && !aCookie->IsSession() && mStmtInsert) {
>+    // use our cached sqlite "insert" statement
>+    mozStorageStatementScoper scoper(mStmtInsert);
>+
>+    nsresult rv = bindCookieParameters(mStmtInsert, aCookie);
>+    if (NS_FAILED(rv)) {
>+      NS_WARNING("binding parameters failed!");

style whining: return PR_TRUE here, and ditch the else?  too-much-nested-if ;)

>+    } else {
>+      PRBool hasResult;
>+      rv = mStmtInsert->ExecuteStep(&hasResult);
>+      if (NS_FAILED(rv))
>+        NS_WARNING("db insert failed!");
>+    }
>+  }
> 
>   return PR_TRUE;

overall, sr=me, looks like sdwilsh did the hard stuff here.
Attachment #268014 - Flags: superreview?(mconnor) → superreview+
checked in for alpha, with nits fixed. i didn't do the file backup yet; probably want fresh reviews for that. might add that in at a later point.

fixed!

notes to anyone considering modifying the cookie sqlite db schema in future: see comments in nsCookieService::InitDB() re upgrading and downgrading!
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
(In reply to comment #23)
> i didn't do the file backup yet; probably want fresh reviews for that. might add
> that in at a later point.

Probably would be a good idea to file a bug for this, lest it be forgotten.
This totally broke cookies in Camino, as far as I can tell.
Please file a separate bug for Camino, as camino-specific bugs are not a blocker.
(In reply to comment #26)
> Please file a separate bug for Camino, as camino-specific bugs are not a
> blocker.

Yeah, I was on my way to do this too. Bug 385074 has been filed.
Not user-visible, regressions can be tested separately, so in-testsuite-.
Flags: in-testsuite-
as promised in comment 14, some comments on perf:

transactions make a big difference. with synchronous writing on (sqlite default), adding 1000 cookies one-by-one (via INSERT), in a loop, took (all times in microseconds):

no transaction: 1100000 (yes, 1.1 seconds!)
no transaction, but synchronous=OFF: 106000
transactions (synchronous makes no difference): 6900

which justifies why we use async writing, to optimize the relatively common case of a pageload setting 5-10 cookies over a few seconds. note that, while we use transactions inside the SetCookie functions themselves, it's not possible to set all cookies for a pageload under one transaction - so we have to optimize that case as well, which async helps with a lot.

also, these tests were for the case of no indexing on the table, with default caching and paging. starting from the last number (6900) and playing around with what we index the table on:

transactions, with index on host string: 13000
transactions, with index on 64 bit int (expiry): 12500
transactions, with index on unique rowid: 5600

i.e. the cost of indexing on either a string or non-unique integer is significant, but indexing on the row id itself is very cheap (c.f. creation time). very similar numbers result for removal of 1000 entries:

removing, with match on name, path and host strings: 290000
removing, as above, but indexing on host: 15700
removing, index on host, and only matching host: 12500
removing, index on 64 bit int (expiry): 15700
removing, index on unique rowid: 5600

thus when removing a specific entry, any kind of indexing gives a 20x speedup. whether the index is by string or int doesn't matter much, but indexing by rowid is far cheaper.

looking at caching settings (PRAGMA cache_size), i didn't see much difference in insertion time for cache sizes between 2 and 2000 pages. paging (PRAGMA page_size) did have an effect on insertion time; large pages (>2k) doubled it. pages of 512 - 1024 bytes seemed to work best (tested with sync on), which probably corresponds to the OS page size. based on these findings, i left page_size and cache_size at the sqlite defaults.

looking at mem usage vs cache_size, the largest cookie file is around 4MB (1000 cookies * 4k each). that's significant, so we want to limit the cache somewhat. default cache size is 2000 pages which added around 400k to mem usage over the no cache value. increasing cache to 10000 pages made mem usage jump 4MB (the whole db). the default value seemed reasonable (the in-mem cookie list is going to chew up 4MB anyway, so a 400kb database cache is small potatoes).

finally, comparing lookup times: 1000 lookups (matching on host, name, and path) take 356us using the in-memory hashtable (fast!). SELECTing those same entries (again, matching by host, name, and path; also indexing on host, to be fair) takes 13000us. this explains why i stuck with our hash structure; the db is much slower for query operations.
(all these tests were done on a firefox opt build, otherwise default build options, on a core 2 duo 6400 @ 2.13GHz).
Depends on: 439384