Closed Bug 547031 Opened 10 years ago Closed 9 years ago

Improve async error handling in cookies

Categories

(Core :: Networking: Cookies, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: dwitte, Assigned: dwitte)

References

Details

Attachments

(6 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #536978 +++

Now that we're about to have spiffy new error handlers for async statement execution, we should do something sensible with them. For instance:

1) On corruption errors, drop the table and repopulate the entire database from memory.
2) On filesystem read/write errors, and other such low-level things, we should probably close our database connection and run entirely from memory. We may try rebuilding the database, but there's only so much we can do.
3) On "file already open" errors (i.e. an AV scanner messing with us), just close the connection and forget about it. Trying to delete or rename the file is likely to fail, and at that point, it's Not Our Problem.
4) On consistency failures (such as "row doesn't exist"), we should probably do 1). This really should never happen, because this indicates a logic error in the code. We certainly want to assert and log it clearly.
Assignee: nobody → dwitte
I think we want this for 2.0, since (right now) we have less error handling than we used to, and we've definitely seen issues of database corruption in previous releases.
blocking2.0: --- → ?
(I'd recommend betaN+ since we need some good testing on this.)
BetaN+, due to the data loss nature of this.
blocking2.0: ? → betaN+
This makes it so DBStates are refcounted, allowing the default state to change. Firefox may not support switching profiles, but we at least need to not horribly break if one does so with Gecko. Without this, any kind of asynchronous error handling would be prone to complete failure -- we could end up operating on the database from a different profile.

This doesn't really introduce any more complexity, it's just a bunch of refactoring.
Attachment #487811 - Flags: review?(sdwilsh)
This makes TryInitDB() much more explicit about what the various failure modes are. Basically, any error opening the database or on subsequent operations will result in throwing away the database and retrying.

It also makes it so we rename a corrupt database, rather than throwing it away. I don't have a strong opinion on this... it's certainly useful in some cases to have that database around (it's helped us diagnose migration bugs, for instance), but it's also a privacy concern to have orphaned data on disk. My rationale here is that it should happen so rarely that the advantage of having it for debugging purposes outweighs the downside of having orphaned data.
Attachment #487812 - Flags: review?(sdwilsh)
GetSyncDBConn() can fail with SQLITE_BUSY if a write transaction (e.g. the DELETE kicked off from Read(), or a regular INSERT/DELETE/UPDATE operation) is in progress. If we do it earlier, we're fine. WAL saves us from the possibility of those writes blocking readers, so the read statements themselves shouldn't fail to execute.
Attachment #487813 - Flags: review?(sdwilsh)
Found a silly bug where we don't include a 'WHERE baseDomain NOTNULL" clause in the SELECT statement, like we do in Read(). We kick off a DELETE of those rows in Read(), but the SELECT in EnsureReadComplete() could execute first, and we really don't want those rows showing up there.
Attachment #487815 - Flags: review?(sdwilsh)
Those are the prelude patches to the actual meat. I've got the final part written, but I'm still writing tests to prove it works. (There are a ton of cases to test.)
Don't know why I didn't see crashes earlier, but reentrancy on observers (which we do in our tests) breaks horribly right now. This makes it so we fire notifications after all list mutations are complete, such that the observers may mutate at leisure. I dropped the "cookie-db-read" notification from EnsureReadComplete() because there's no way to make it safe -- it's called from a bunch of places itself, for which list mutation would be bad. And we don't need it for tests, because it's the synchronous case.
Attachment #488125 - Flags: review?(sdwilsh)
Attached patch part 6: async error handling (obsolete) — Splinter Review
This actually implements the logic to rebuild the database. I've played it conservative and made it so that, if we fail during an async read or a full sync read (i.e. EnsureReadComplete()), we throw away all the results. There's likely good data to be had, but we don't know which hosts we have complete data for and which we don't. Anything read previously via EnsureReadDomain() will be kept.

As an added bonus, this is the final patch, and the whole patchqueue passes tests on try!
Attachment #488903 - Flags: review?(sdwilsh)
... and this one's actually the right one.
Attachment #488903 - Attachment is obsolete: true
Attachment #488930 - Flags: review?(sdwilsh)
Attachment #488903 - Flags: review?(sdwilsh)
Attachment #488930 - Attachment is patch: true
Attachment #488930 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 487811 [details] [diff] [review]
part 1: multiple DBStates

>+++ b/extensions/cookie/test/unit/test_cookies_privatebrowsing.js
>@@ -0,0 +1,136 @@
>+/* Any copyright is dedicated to the Public Domain.
>+ * http://creativecommons.org/publicdomain/zero/1.0/
>+ */
Please copy boilerplate from http://www.mozilla.org/MPL/boilerplate-1.1/

>+  // Create URIs and channels pointing to foo.com and bar.com.
>+  let spec1 = "http://foo.com/foo.html";
>+  let spec2 = "http://bar.com/bar.html";
>+  let uri1 = NetUtil.newURI(spec1);
>+  let uri2 = NetUtil.newURI(spec2);
You don't create any channels, and there's no need to store the specs in a local.  Please update the comment and get rid of the local spec variables.

>+++ b/netwerk/cookie/nsCookieService.cpp
>   // cache frequently used statements (for insertion, deletion, and updating)
>-  rv = mDBState->dbConn->CreateStatement(NS_LITERAL_CSTRING(
>+  rv = mDefaultDBState->dbConn->CreateStatement(NS_LITERAL_CSTRING(
>     "INSERT INTO moz_cookies ("
This is only used asynchronously, right?  Use CreateAsyncStatement please.

>-  rv = mDBState->dbConn->CreateStatement(NS_LITERAL_CSTRING(
>+  rv = mDefaultDBState->dbConn->CreateStatement(NS_LITERAL_CSTRING(
>     "DELETE FROM moz_cookies "
ditto

>-  rv = mDBState->dbConn->CreateStatement(NS_LITERAL_CSTRING(
>+  rv = mDefaultDBState->dbConn->CreateStatement(NS_LITERAL_CSTRING(
>     "UPDATE moz_cookies SET lastAccessed = :lastAccessed "
ditto

>   // finalize our statements and close the db connection for the default state.
>   // since we own these objects, nulling the pointers is sufficient here.
We don't do any finalizing anymore, so this comment is wrong.

>+  // Null out the db. If it's open, this will synchronously close it.
>+  mDefaultDBState->dbConn = NULL;
This will fail if you've executed any async statements on it, I'm pretty sure.

>+    if (mDBState && mDBState->dbConn &&
>+        !nsCRT::strcmp(aData, NS_LITERAL_STRING("shutdown-cleanse").get())) {
>+      // Clear the cookie db if we're in the default DBState.
As I recall, we don't need the nsCRT:: anymore.

>     } else if (NS_LITERAL_STRING(NS_PRIVATE_BROWSING_LEAVE).Equals(aData)) {
>+      NS_ASSERTION(mDefaultDBState, "don't have a default state");
>+      NS_ASSERTION(mDBState == mPrivateDBState, "not in private state");
>+      NS_ASSERTION(!mPrivateDBState->dbConn, "private DB connection not null");
>+
>+      // Clear the private DBState, and restore the default one.
>+      mPrivateDBState = NULL;
>+      mDBState = mDefaultDBState;
Hmm, why don't we just call Observe in InitDBStates when private browsing is enabled like we used to?

> NS_IMETHODIMP
> nsCookieService::RemoveAll()
> {
>   RemoveAllFromMemory();
> 
>   // clear the cookie file
>   if (mDBState->dbConn) {
>-    NS_ASSERTION(mDBState == &mDefaultDBState, "not in default DB state");
>+    NS_ASSERTION(mDBState == mDefaultDBState, "not in default DB state");
Wouldn't it be legal to call this while in private browsing mode though?

> void
> nsCookieService::AsyncReadComplete()
> {
>   // We may be in the private browsing DB state, with a pending read on the
>   // default DB state. (This would occur if we started up in private browsing
>   // mode.) As long as we do all our operations on the default state, we're OK.
>+  NS_ASSERTION(mDefaultDBState->pendingRead, "no pending read");
>+  NS_ASSERTION(mDefaultDBState->readListener, "no read listener");
Should also assert mDefaultDBState here, correct?

> void
> nsCookieService::CancelAsyncRead(PRBool aPurgeReadSet)
> {
>   // We may be in the private browsing DB state, with a pending read on the
>   // default DB state. (This would occur if we started up in private browsing
>   // mode.) As long as we do all our operations on the default state, we're OK.
>+  NS_ASSERTION(mDefaultDBState->pendingRead, "no pending read");
>+  NS_ASSERTION(mDefaultDBState->readListener, "no read listener");
ditto

>+  rv = mDefaultDBState->stmtReadDomain->BindUTF8StringByIndex(0, aBaseDomain);
Would prefer to see BindUTF8StringByName here

>+++ b/netwerk/cookie/nsCookieService.h
>+    void                          CheckPrivateBrowsing();
This isn't implemented.

r=sdwilsh
Attachment #487811 - Flags: review?(sdwilsh) → review+
> >+++ b/extensions/cookie/test/unit/test_cookies_privatebrowsing.js
> >@@ -0,0 +1,136 @@
> >+/* Any copyright is dedicated to the Public Domain.
> >+ * http://creativecommons.org/publicdomain/zero/1.0/
> >+ */
> Please copy boilerplate from http://www.mozilla.org/MPL/boilerplate-1.1/
This applies to all your patches here.  I won't comment further about it.
Comment on attachment 487812 [details] [diff] [review]
part 2: TryInitDB() cleanup

>+++ b/netwerk/cookie/nsCookieService.cpp
>+  // Attempt to open and read the database. If TryInitDB() returns false, we
>+  // retry; if it returns true, then things either failed catastrophically
>+  // (with no further action required, including cleanup) or succeeded fine.
This comment is most certainly not correct given:
>+  OpenDBResult result = TryInitDB(false);

>+// Attempt to open and read the database. If 'aRecreateDB' is true, try to
>+// move the existing database file out of the way and create a new one. Return
>+// RESULT_OK if opening or creating the database succeeded; RESULT_RETRY if
>+// the database cannot be opened, is corrupt, or some other failure occurred
>+// that might be resolved by recreating the database; or RESULT_FAILED if there
>+// was an unrecoverable error and we must run without a database. If
>+// RESULT_RETRY or RESULT_FAILED is returned, the caller should perform cleanup
>+// of the default DBState.
This belongs in the header file in a javadoc style comment.

r=sdwilsh
Attachment #487812 - Flags: review?(sdwilsh) → review+
(In reply to comment #12)
> >   // finalize our statements and close the db connection for the default state.
> >   // since we own these objects, nulling the pointers is sufficient here.
> We don't do any finalizing anymore, so this comment is wrong.
This is addressed in part 3, so don't worry about it.
Comment on attachment 487813 [details] [diff] [review]
part 3: get sync db connection early

r=sdwilsh
Attachment #487813 - Flags: review?(sdwilsh) → review+
Comment on attachment 487815 [details] [diff] [review]
part 4: fix EnsureReadComplete()

r=sdwilsh
Attachment #487815 - Flags: review?(sdwilsh) → review+
Comment on attachment 488125 [details] [diff] [review]
part 5: fix observers

r=sdwilsh
Attachment #488125 - Flags: review?(sdwilsh) → review+
Comment on attachment 488930 [details] [diff] [review]
part 6 v2: async error handling

r=sdwilsh
Attachment #488930 - Flags: review?(sdwilsh) → review+
(In reply to comment #12)
> Please copy boilerplate from http://www.mozilla.org/MPL/boilerplate-1.1/

But it's the same as the PD boilerplate already!

> This is only used asynchronously, right?  Use CreateAsyncStatement please.

Done!

> >+  // Null out the db. If it's open, this will synchronously close it.
> >+  mDefaultDBState->dbConn = NULL;
> This will fail if you've executed any async statements on it, I'm pretty sure.

Yeah, that's OK. We never do that, and if we did, Connection::Close would assert all over the place.

> As I recall, we don't need the nsCRT:: anymore.

Hmm? I thought we did for PRUnichar* strings.

> Hmm, why don't we just call Observe in InitDBStates when private browsing is
> enabled like we used to?

I thought it was a little obfuscated to call Observe when it's clearer just to explicitly write out those two lines, but I'm not really fussed either way.

> > NS_IMETHODIMP
> > nsCookieService::RemoveAll()
> > {
> >   RemoveAllFromMemory();
> > 
> >   // clear the cookie file
> >   if (mDBState->dbConn) {
> >-    NS_ASSERTION(mDBState == &mDefaultDBState, "not in default DB state");
> >+    NS_ASSERTION(mDBState == mDefaultDBState, "not in default DB state");
> Wouldn't it be legal to call this while in private browsing mode though?

Yes, but it's guarded by mDBState->dbConn, which is always null in PB.

All other comments addressed.
(In reply to comment #20)
> > This will fail if you've executed any async statements on it, I'm pretty sure.

(The comment above that code wasn't very helpfully explanatory, I've made it clearer.)
(In reply to comment #20)
> But it's the same as the PD boilerplate already!
Yours is three lines, and the boilerplate is two.  Subtle, sure, but if gerv ever needs to go through a reformat stuff, it's helpful to have it be copied and pasted verbatim from the boilerplate page.

> Hmm? I thought we did for PRUnichar* strings.
Pretty sure it just works for both.  Try it please.
(In reply to comment #22)
> Yours is three lines, and the boilerplate is two.

I changed all the files in extensions/cookie/test/unit.

> Pretty sure it just works for both.  Try it please.

Nope. You're thinking 'wcscmp', but we don't use that, and I don't think it's portable across platforms (though it may be by now!). 'strcmp' is part of libc and is declared 'const char*', and C doesn't allow function name overloading.
Depends on: 612072
You need to log in before you can comment on or make changes to this bug.