Open Bug 1158387 Opened 5 years ago Updated 11 months ago

Cookie DB shouldn't opt in to data corruption with "PRAGMA synchronous = OFF"

Categories

(Core :: Networking: Cookies, defect, P3)

defect

Tracking

()

REOPENED
Tracking Status
firefox55 --- affected
firefox56 --- affected

People

(Reporter: jld, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-backlog])

Attachments

(2 files)

As bug 1157950 comment #1 points out, the cookie service also uses the SQLite directive "PRAGMA synchronous = OFF", with a comment indicating this is "for performance".   The SQLite documentation (https://www.sqlite.org/pragma.html#pragma_synchronous) warns that “With synchronous OFF (0), … the database might become corrupted if the operating system crashes or the computer loses power before that data has been written to the disk surface.”

Also (and unlike the permission manager), the cookie service is already using the write-ahead log option, so changing this to "synchronous = NORMAL" wouldn't impose the overhead of syncing every transaction, and would limit the effects of OS crash / power failure / etc. to rolling back a small amount of recent cookie transactions (rather than arbitrary corruption).
I agree, with WAL doesn't make sense to se synchronous off.
I reported Bug 1192886, and one replies said it's a duplicate of this bug, but I don't have the knowledge to tell... please mark that bug as dup if it's the case....
Whiteboard: [necko-backlog]
This should be as simple as removing the one line that executes "PRAGMA synchronous = OFF" and then (just to be cautious) running it through try.  Ask :mak for review
Assignee: nobody → juhsu
Whiteboard: [necko-backlog] → [necko-active]
I think it may be worth doing a Talos comparison, especially on Tp tests, just to avoid surprises.
Seems no more janks happened, push to try to figure out any appearance of surprise.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fd067b1b30c4612e8e0b2f88e60a29a1002b8737
Attached patch syncOff - v1Splinter Review
Attachment #8869882 - Flags: review?(mak77)
Comment on attachment 8869882 [details] [diff] [review]
syncOff - v1

Review of attachment 8869882 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, Thank you!
Attachment #8869882 - Flags: review?(mak77) → review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/4f0f6745c23d
Remove "PRAGMA synchronous = OFF" to avoid corruption in cookie service. r=mak
Keywords: checkin-needed
Backout by kwierso@gmail.com:
https://hg.mozilla.org/mozilla-central/rev/4cf4e0f20cee
Backed out changeset 4f0f6745c23d for failing extensions/cookie/test/unit/test_cookies_async_failure.js on Windows 7 VM opt. r=backout
investigating the db-close hang

FWIW, thousands of warning happens
http://searchfox.org/mozilla-central/rev/6c2dbacbba1d58b8679cee700fd0a54189e0cf1b/dom/base/ThirdPartyUtil.cpp#186

worth to check afterwards.
Flags: needinfo?(juhsu)
why does the test contain 2 function* run_test_4(generator)? A merge problem?
I think it's a performance jank after removing SYNC=OFF (i.e., SYNC=FULL)

The stressed cookie setting[1] took >30 seconds for SYNC=FULL in my local machine. (also SYNC=NORMAL)
For reference, it took <2 seconds for the SYNC=OFF mode.

My suggestion is WONTFIX.

What do you think, :jduell, :mak?

[1] http://searchfox.org/mozilla-central/rev/6c2dbacbba1d58b8679cee700fd0a54189e0cf1b/extensions/cookie/test/unit/test_cookies_async_failure.js#214-217
Flags: needinfo?(mak77)
Flags: needinfo?(jduell.mcbugs)
(In reply to Junior[:junior] from comment #14)
> I think it's a performance jank after removing SYNC=OFF (i.e., SYNC=FULL)
> 
> The stressed cookie setting[1] took >30 seconds for SYNC=FULL in my local
> machine. (also SYNC=NORMAL)
> For reference, it took <2 seconds for the SYNC=OFF mode.

For sure not doing any kind of checks is faster than ensuring the data is written to disk... But a test trying to insert thousands of entries doesn't sound like a good enough reason to opt-in user cookies to corruption. Maybe the test is exaggerated compared to the real world?
Flags: needinfo?(mak77)
(In reply to Marco Bonardo [::mak] from comment #13)
> why does the test contain 2 function* run_test_4(generator)? A merge problem?

It's dup. They are almost identical except the lower one with one more check [1]
Removing the upper one works for me.

[1] http://searchfox.org/mozilla-central/source/extensions/cookie/test/unit/test_cookies_async_failure.js#479-482
Also, it would likely help to have a batch insertion method, here you are doing thousands of inserts without even a transaction, that means each of these inserts IS a transaction.
As an experiment, would you mind checking how much the situation would improve with a transaction? That'd be interesting.
it's very hackish, but would just be for measuring purposes.
in nsCookieService::Observe add "start-cookie-batch" and "end-cookie-batch" topics. in the first one execute "BEGIN" and in the second one "COMMIT". Then Services.cookies.QueryInterface(nsIObserver).observe(null, "XXX-cookie-batch", ""); around the 3k inserts.
(In reply to Marco Bonardo [::mak] from comment #18)
> As an experiment, would you mind checking how much the situation would
> improve with a transaction? That'd be interesting.
> it's very hackish, but would just be for measuring purposes.
> in nsCookieService::Observe add "start-cookie-batch" and "end-cookie-batch"
> topics. in the first one execute "BEGIN" and in the second one "COMMIT".
> Then Services.cookies.QueryInterface(nsIObserver).observe(null,
> "XXX-cookie-batch", ""); around the 3k inserts.

Both OFF and FULL take 0.7x - 0.8x second. Pretty good.
ni to ensure you see this.
Flags: needinfo?(mak77)
Thank you for testing my suggestion! Looks like it was like I suspected. We could either do nothing, use a hack like the above one (with scarier topic names) or introduce an API for batch insertion. I can provide suggestions related to Storage, not decisions related to Cookies, for those I just fall-through to the module peers.
Flags: needinfo?(mak77)
Comment on attachment 8870357 [details] [diff] [review]
Experiment Code: oneTransaction - please do not land

Review of attachment 8870357 [details] [diff] [review]:
-----------------------------------------------------------------

So this switch will mean that we need to be careful to use transactions any time we do a potentially large number of SQL statements.  We've seen that in the test_async_failure case, but we should also audit the cookieService.cpp to make sure there aren't any other places where we need to start/end a transaction instead of doing a whole bunch of small atomic SQL statements.

One place I'm wondering about is RebuildCorruptDB(), which potentially iterates over the hashtable and inserts each cookie in it into the SQLite database:

  https://dxr.mozilla.org/mozilla-central/rev/f81bcc23d37d7bec48f08b19a9327e93c54d37b5/netwerk/cookie/nsCookieService.cpp#1896

But it does this with a mozIStorageBindingParamsArray, which seems to let you run the same SQL statement a bunch of times with an array of different parameters--we only actually Execute() the statement once, with the ParamsArray. I'm not sure if that means they all happen as one transaction (I hope so but let's verify).

Junior--please go over the cookie code and check to see if there are any other cases where we might call the same SQL statement a lot of times in a row (I didn't have time to audit the whole file).

::: extensions/cookie/test/unit/test_cookies_async_failure.js
@@ +212,5 @@
>    // Load the profile and populate it.
>    do_load_profile();
> +
> +  let service = Services.cookies.QueryInterface(Ci.nsIObserver);
> +  service.observe(null, "start-cookie-batch", "");

Hacking in code to nsCookieService to start/end transactions based on nsIObserver notification is clever :)  But I think we're better off not having that code in the tree.  Instead, can we just modify the test to use db.executeSimpleSQL() to insert the cookies? (within a transaction--I'm assuming executeSimpleSQL can start/end transactions).
Marco: can you look at comment 21 and tell me whether a single Execute that does a bunch of SQL statements (via mozIStorageBindingParamsArray) does them all as a single transaction, or whether we should wrap that call in a transaction start/end?
Flags: needinfo?(jduell.mcbugs) → needinfo?(mak77)
(In reply to Jason Duell [:jduell] (needinfo me) from comment #21)
> So this switch will mean that we need to be careful to use transactions any
> time we do a potentially large number of SQL statements.

Potentially that's valid also for the synchronous = off case, but then the writes are so cheap that it matters less.

> But it does this with a mozIStorageBindingParamsArray, which seems to let
> you run the same SQL statement a bunch of times with an array of different
> parameters--we only actually Execute() the statement once, with the
> ParamsArray. I'm not sure if that means they all happen as one transaction
> (I hope so but let's verify).

Yes it runs it in a transaction, it's done here:
http://searchfox.org/mozilla-central/rev/35b37316149108d53a02fb8498e250ea8a22ab5d/storage/mozStorageAsyncStatementExecution.cpp#597
statementsNeedTransaction() goes through each statement and asks it if it may need a transaction, then it counts how many requests it got, if that is > 1, it will create a transaction (since each statement IS a transaction by itself, if the result is 1 we don't need a further transaction).
The needsTransaction() method first check if the statement is read-only, then it returns either 1 (single write) or the paramsArray length (http://searchfox.org/mozilla-central/rev/35b37316149108d53a02fb8498e250ea8a22ab5d/storage/mozStorageStatementData.h#133)

 Instead, can we just modify the test to use
> db.executeSimpleSQL() to insert the cookies? (within a transaction--I'm
> assuming executeSimpleSQL can start/end transactions).

yes, just execute BEGIN and COMMIT queries.
Flags: needinfo?(mak77)
Attachment #8869882 - Attachment description: syncOff - v1 → Part1: syncOff - v1
Attachment #8870357 - Attachment description: oneTransaction → Experiment Code: oneTransaction - please do not land
> Junior--please go over the cookie code and check to see if there are any
> other cases where we might call the same SQL statement a lot of times in a
> row (I didn't have time to audit the whole file).

At first glances of CookieService, activities using |AddCookieToList| are the candidates.
They are AsyncReadComplete, EnsureReadDomain, EnsureReadComplete and ImportCookies.

> Hacking in code to nsCookieService to start/end transactions based on
> nsIObserver notification is clever :)  But I think we're better off not
> having that code in the tree.  Instead, can we just modify the test to use
> db.executeSimpleSQL() to insert the cookies? (within a transaction--I'm
> assuming executeSimpleSQL can start/end transactions).

The thing is we don't expose nsCookieService::mDefaultDBState.
Hence, if we think nsIObserver trick is ugly, we might
1) expose nsCookieService::mDefaultDBDtate
2) create API to set the BEGIN/COMMIT
3) open another db in test and put the cookies from scratch

I don't have a perfect solution. Do you have any suggestion, Jason?
Flags: needinfo?(jduell.mcbugs)
Instead of the nsIObserver approach I'd just add 'startTransaction()' and 'endTransaction()' methods to nsICookieService.idl.  Document them with something like

  // Begin/end SQLite transaction.  By default each call to CookieService that affects the underlying 
  // SQLite database (addCookie, removeCookie, etc) runs in a separate transaction.  If you do this many 
  // times in a row it's faster to wrap them all in a single transaction by calling startTransaction(), 
  // doing your calls, and then endTransaction().  Example: test scripts that need to construct a large 
  // cookie database.
Flags: needinfo?(jduell.mcbugs)
And just in case...

 // Each call to startTransaction must have a matching endTransaction.  Do not nest transactions!
(In reply to Jason Duell [:jduell] (needinfo me) from comment #25)
> Instead of the nsIObserver approach I'd just add 'startTransaction()' and
> 'endTransaction()' methods to nsICookieService.idl.  Document them with
> something like
> 
>   // Begin/end SQLite transaction.  By default each call to CookieService
> that affects the underlying 
>   // SQLite database (addCookie, removeCookie, etc) runs in a separate
> transaction.  If you do this many 
>   // times in a row it's faster to wrap them all in a single transaction by
> calling startTransaction(), 
>   // doing your calls, and then endTransaction().  Example: test scripts
> that need to construct a large 
>   // cookie database.

Looks great. Thanks for your help :)
(In reply to Junior[:junior] from comment #24)
> > Junior--please go over the cookie code and check to see if there are any
> > other cases where we might call the same SQL statement a lot of times in a
> > row (I didn't have time to audit the whole file).
> 
> At first glances of CookieService, activities using |AddCookieToList| are
> the candidates.
> They are AsyncReadComplete, EnsureReadDomain, EnsureReadComplete and
> ImportCookies.
> 

|ImportCookies| pass the |mozIStorageBindingParamsArray|, which indicates a batch operation.
(In reply to Jason Duell [:jduell] (needinfo me) from comment #25)
> Instead of the nsIObserver approach I'd just add 'startTransaction()' and
> 'endTransaction()' methods to nsICookieService.idl.

note, there is a pitfall here, if a consumer invokes startTransaction but forgets to invoke endTransaction, your wall will grow forever and a bunch of assumptions may break. That's we we usually only expose "wrapping" methods that end tha transaction once done, even in case of throws errors/exceptions.
ehr, s/wall/WAL/
(In reply to Marco Bonardo [::mak] from comment #29)
> (In reply to Jason Duell [:jduell] (needinfo me) from comment #25)
> > Instead of the nsIObserver approach I'd just add 'startTransaction()' and
> > 'endTransaction()' methods to nsICookieService.idl.
> 
> note, there is a pitfall here, if a consumer invokes startTransaction but
> forgets to invoke endTransaction, your wall will grow forever and a bunch of
> assumptions may break. That's we we usually only expose "wrapping" methods
> that end tha transaction once done, even in case of throws errors/exceptions.

Is "wrapping" transaction like [1]?
[1] http://searchfox.org/mozilla-central/rev/507743376d1ba753d14ab6b9305b7c6358570be8/netwerk/cookie/nsCookieService.cpp#1001

However, we'd like to expose to js test.
Any better way instead of a delegation to mozIStorageConnection::begin/commitTransaction()?
(In reply to Junior[:junior] from comment #31)
> Is "wrapping" transaction like [1]?
> [1]
> http://searchfox.org/mozilla-central/rev/
> 507743376d1ba753d14ab6b9305b7c6358570be8/netwerk/cookie/nsCookieService.
> cpp#1001
> 

Yes, that's a RAII, the transaction is either committed or rollbacked when the object is destroyed if it wasn't committed before. that avoids having pending open transactions.
nsCookieService::RemoveCookiesWithOriginAttributes needs to be modified, too.
Depends on: 1370454
Depends on: 1370456
Depends on: 867798
Attachment #8869882 - Attachment description: Part1: syncOff - v1 → syncOff - v1
Keywords: checkin-needed
Backed out for making win64 xpcshell nearly permafail like the log below:
https://treeherder.mozilla.org/logviewer.html#?job_id=106229856&repo=mozilla-inbound

https://hg.mozilla.org/mozilla-central/rev/bb6c41fbe8bc87bb14df5d56ac048cc540e853f9
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla55 → ---
I can reproduce locally. Might be the unfinished asyncClose. Let's see.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c8bd72c5ed7d88554e5c7b8f83604f10436b1dee
(In reply to Junior[:junior] from comment #39)
> I can reproduce locally. Might be the unfinished asyncClose. Let's see.
s/can/cannot
It doesn't work to disable asyncClose.
The fail test cases are completely different in comment 39.
Before this got backed out, this regression was noticed:

== Change summary for alert #7223 (as of June 12 2017 02:10 UTC) ==

Regressions:

  4%  tp5n nonmain_normal_fileio windows7-32 opt e10s     363,876,785.90 -> 377,626,874.25
  4%  tp5n nonmain_normal_fileio windows7-32 pgo e10s     366,784,790.80 -> 380,124,378.58

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7223
and there is a corresponding improvement:
== Change summary for alert #7247 (as of June 13 2017 00:11 UTC) ==

Improvements:

  4%  tp5n nonmain_normal_fileio windows7-32 pgo e10s     379,413,111.71 -> 363,556,657.50

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7247
Back to this after cookie DB OMT (bug 870460)
Whiteboard: [necko-active] → [necko-backlog]
Depends on: 1372302
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
Assignee: juhsu → nobody
You need to log in before you can comment on or make changes to this bug.