Closed Bug 408914 Opened 17 years ago Closed 17 years ago

Disable sqlite async IO

Categories

(Toolkit :: Storage, defect, P1)

Tracking

RESOLVED FIXED
mozilla1.9beta3

People

(Reporter: vlad, Assigned: sdwilsh)

References

Details

(Keywords: qawanted)

Attachments

(3 files)

Disable our async IO; there's a theory that it was causing corruption in some cases, so let's see what the perf hit is if we just turn it off.
Attachment #293774 - Flags: review? → review?(mconnor)
Comment on attachment 293774 [details] [diff] [review]
do the disabling

wow, hg diff sucks :(

I see why you don't need to get rid of the code in mozStorageConnection->Initialize, so no complaints there.  My only question: should we be taking out the SQLITE_ENABLE_REDEF_IO define in the source Makefile?  It may impact performance.
http://mxr.mozilla.org/seamonkey/source/db/sqlite3/src/Makefile.in#79

r=sdwilsh regardless.
Attachment #293774 - Flags: review?(mconnor) → review+
Please make sure this gets tested with network-mounted home drives (especially on Windows)... that was IIRC one of the prime motivating factors for async-io in the first place.
also please make sure pageload tests are done with cookies on (talos should suffice if this gets checked in for testing). testing should also be done on the cookiemanager UI doing 'remove all'.
Really, anything that uses storage (places, download manager, cookies, permission manager) should be checked to see if there is a noticeable perf loss.
you can use this to test permission manager UI - put this in your profile dir, open permmgr (prefs->content->image exceptions), wait for the window to load (will take a while), then remove all. cookie manager works pretty much the same, so this test should suffice for both.
Attachment #293788 - Attachment is patch: false
Attachment #293788 - Attachment mime type: text/plain → application/x-gzip
The plan is to just check this in and let this go out to nightlies without trying to do one-off performance testing.

I think SQLITE_ENABLE_REDEF_IO is a super tiny performance hit (vtable jump, I think), but removing it would require me to comment out more code since I think removing it prevents a bunch of stuff from being defined.

(what's wrong with hg diff?)
That's fair - more code would have to be commented out.

Not enough context in the diff was my issue.
Comment on attachment 293774 [details] [diff] [review]
do the disabling

I suppose this needs approval to land...
Attachment #293774 - Flags: approval1.9?
Comment on attachment 293774 [details] [diff] [review]
do the disabling

a=beltzner for 1.9, on the understanding that someone's going to watch for the appropriate perf metrics after it lands
Attachment #293774 - Flags: approval1.9? → approval1.9+
Vlad - I take it you'll take care of that?
Keywords: checkin-needed
As per sdwilsh, landing this.

Checking in storage/src/mozStorageService.cpp;
/cvsroot/mozilla/storage/src/mozStorageService.cpp,v  <--  mozStorageService.cpp
new revision: 1.12; previous revision: 1.11
done
Keywords: checkin-needed
Hardware: PC → All
Version: unspecified → Trunk
Looks like maybe some trending upwards for Ts and Tp after this landed, from looking at http://people.mozilla.org/~johnath/pdb/.
So, I guess the question (for drivers) is "Is this too much?"  Regardless, if we can identify the hotspots (dtrace maybe?), we could get those consumers to use multithreaded writing.
I'm not sure if I can see a clear before/after on those graphs. Anyone got a numerical breakdown we can look at?

If it's a case of "we know that this prevents data corruption", I'd lean towards taking a slight Tp/Ts hit (since slower but not broken seems to beat faster and maybe losing user data) but hard to tell looking at those graphs.

(In reply to comment #14)
> So, I guess the question (for drivers) is "Is this too much?"  Regardless, if
> we can identify the hotspots (dtrace maybe?), we could get those consumers to
> use multithreaded writing.

How do we start this? Can we get some sayrer time to look into it?
(In reply to comment #15)
> 
> (In reply to comment #14)
> > So, I guess the question (for drivers) is "Is this too much?"  Regardless, if
> > we can identify the hotspots (dtrace maybe?), we could get those consumers to
> > use multithreaded writing.
> 
> How do we start this? Can we get some sayrer time to look into it?
> 

I don't think any time is required from me. FWIW, I don't see anything alarming in the flawed old-style perf data that johnath's page shows. Talos data would be more enlightening: Ts, Txul, Tp.

Profile corruption is not acceptable, so there's no trade-off discussion to be had here. I'm sure we all know variations on the following quote, but let's repeat it for fun: "Make it correct, make it clear, make it concise, make it fast. In that order."

A benchmark against the FUEL history/bookmark APIs would be the best way to measure this. It would show hotspots in the presence of fewer confounders, and be easy to compare to Fx2.
Assignee: nobody → comrade693+bmo
Flags: blocking1.9?
Whiteboard: [needs patch]
Target Milestone: --- → mozilla1.9 M11
If this does cause profile corruption then we should absolutely fix it - if it causes slowdowns there are ways to work around it via threading the UI...
Flags: blocking1.9? → blocking1.9+
Priority: -- → P1
the talos boxes seem to all show about a 1% Tp regression: qm-mini-ubuntu01, 02, 05, qm-mini-xp01. (all about a 6ms increase on a 600ms Tp.) Ts seems unaffected, didn't look at Txul since i doubt that'd change.

aside from Tp, there are still other things we can work on - in particular the cookieviewer and permissionviewer ui's don't do any batching of operations (comment 6), so clicking 'remove all' on a huge list will take a while. removing 50,000 permissions currently takes 34s with async on; now it takes 121s, 3.5x as long. i'm not sure how history/bookmarks does things but hopefully they already use batching where it matters...
i meant to add, i can't see any noticeable Tp regression on windows talos.
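To make the batching point above concrete: with synchronous writes every autocommitted statement pays its own fsync, so a bulk "remove all" should run inside one explicit transaction and pay that cost once. A minimal sketch against the raw SQLite C API; the moz_hosts table name and the helper function are illustrative only, not the actual permission manager UI code.

#include <sqlite3.h>
#include <string>
#include <vector>

// Sketch only: batch a bulk "remove all" into a single transaction so the
// whole operation costs one fsync instead of one per deleted row.
static int removeAllPermissions(sqlite3 *db,
                                const std::vector<std::string> &hosts) {
  // Without an explicit transaction SQLite autocommits (and syncs) every DELETE.
  int rc = sqlite3_exec(db, "BEGIN TRANSACTION", nullptr, nullptr, nullptr);
  if (rc != SQLITE_OK) return rc;

  sqlite3_stmt *stmt = nullptr;
  rc = sqlite3_prepare_v2(db, "DELETE FROM moz_hosts WHERE host = ?1",
                          -1, &stmt, nullptr);
  if (rc == SQLITE_OK) {
    for (const std::string &host : hosts) {
      sqlite3_bind_text(stmt, 1, host.c_str(), -1, SQLITE_TRANSIENT);
      rc = sqlite3_step(stmt);   // SQLITE_DONE on success
      sqlite3_reset(stmt);       // reset before the next bind
      if (rc != SQLITE_DONE) break;
    }
    sqlite3_finalize(stmt);
  }

  // One COMMIT (one fsync) covers the whole batch; roll back on any failure.
  bool ok = (rc == SQLITE_OK || rc == SQLITE_DONE);
  sqlite3_exec(db, ok ? "COMMIT" : "ROLLBACK", nullptr, nullptr, nullptr);
  return ok ? SQLITE_OK : rc;
}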
Attached patch v1.0
Say bye-bye.
Attachment #295883 - Flags: review?(vladimir)
Whiteboard: [needs patch] → [has patch][needs review vlad]
Just a reminder that our Talos tests run with empty profiles.  Have we done basic tests with big profiles to make sure history, bookmarks, etc. menus don't suffer?

I still think we need this change - just want to understand if it has created more perf tuning work for us.
Comment on attachment 295883 [details] [diff] [review]
v1.0

Looks fine.
Attachment #295883 - Flags: review?(vladimir) → review+
Whiteboard: [has patch][needs review vlad] → [has patch][has review][can land]
I think we need to still see what (if anything) was hit by this change per comment 21

Checking in config/system-headers;
new revision: 3.32; previous revision: 3.31
Checking in db/sqlite3/README.MOZILLA;
new revision: 1.17; previous revision: 1.16
Checking in db/sqlite3/src/Makefile.in;
new revision: 1.32; previous revision: 1.31
Checking in db/sqlite3/src/sqlite.def;
new revision: 1.4; previous revision: 1.3
Removing db/sqlite3/src/sqlite3file.h;
new revision: delete; previous revision: 1.10
Checking in storage/src/Makefile.in;
new revision: 1.8; previous revision: 1.7
Removing storage/src/mozStorageAsyncIO.cpp;
new revision: delete; previous revision: 1.20
Checking in storage/src/mozStorageConnection.cpp;
new revision: 1.31; previous revision: 1.30
Checking in storage/src/mozStorageService.cpp;
new revision: 1.13; previous revision: 1.12
Checking in storage/src/mozStorageService.h;
new revision: 1.6; previous revision: 1.5
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Whiteboard: [has patch][has review][can land]
Depends on: 411780
I just want to make a note that since sqlite async IO was disabled, my shutdown times have dropped from a typical value of 1m40s to under 5 seconds.
did this have any performance impact on firefox 2 -> firefox 3 migration (with a large bookmarks.html or history.dat) or dcamp's url classifier (when getting a large amount of data from the server)?
I just noticed that you guys did this. I am extremely concerned! We originally put this in because of crazy insane regressions. bsmedberg is right. At Google, as with most other organizations running Linux, we have a networked filesystem that contains home directories. With synchronous writes and fsyncs in this situation, the browser is basically unusable. If this problem is not fixed, I predict you will lose almost all of these users. When I added this, it was considered absolutely critical, enough for me to stop working on features and spend almost a month on it.

Even on local filesystems, you are still doing an fsync on the disk/network every 5 seconds or so, which I would be EXTREMELY concerned about performance-wise. Even if you don't see much average delta in PLT, you are going to really kill performance on loaded systems/networks and the whole browser will be much more janky.

Steve England: what are you doing that it takes so long to shutdown? Running PLT or something? This absolutely is a concern, and there are some things that can be done about it if you want to keep async I/O (do less commits, be careful not to do any DB operations on shutdown, etc). However, most people do not see these times. Async I/O only moves time around. If your shutdown is one minute faster, you lost that one minute during browsing.
Brett, my shutdown problem was frequently reproducible even after making a new firefox profile, starting firefox, and then closing it. And it was also reproducible even after removing the default bbc rss feed and disabling anti-phishing and testing again. I don't even know what PLT is, all I know is that now my shutdown times are no longer totally unreasonable :-)
For what it's worth, I've seen lots of people complain about the shutdown slowness before.
Shawn/Vlad did we test with homedirs on network file systems?  I haven't seen any difference in perceived perf on local disk after the change.  
Keywords: qawanted
(In reply to comment #29)
> Shawn/Vlad did we test with homedirs on network file systems?  I haven't seen
> any difference in perceived perf on local disk after the change.  
I think that fell through the cracks.  It's on my agenda to bug QA about this today.
QA will work through the testing here.  However, can you provide some testcases or scenarios to give us some direction for the testing?
I ran a quick test with builds from the 6th and the 8th and it seemed OK. I could have messed up, though, or maybe the NFS on my computer is better than it used to be.
hi brett, we've been looking through this bug, but not really sure how we can help with testing.  running perf tests over networked homedirs isn't exactly the easiest thing to test.   Do you have any suggestions on what we can do to help?  Or should we assume you will look into more testing here, and rely on your verifications?
(In reply to comment #33)
> hi brett, we've been looking through this bug, but not really sure how we can
> help with testing.  running perf tests over networked homedirs isnt exactly the
> easiest thing to test.   Do you have any suggestions on what we can do to help?
>  Or should we assume you will look into more testing here, and rely on your
> verifications?
> 

Tony - we could use your help - perhaps running with a big existing profile (e.g. lots of history) with the homedir on a NFS mount.  Try the basic functional testing and see if you ever get any big hangs.

Shawn/Brett is there a way to log all SQLite writes and fsyncs so we can see how frequent they are, where they originate from, and the time it takes to complete?
(In reply to comment #34)
> Shawn/Brett is there a way to log all SQLlite writes and fsyncs so we can see
> how frequent they are, where they originate from, and the time it takes to
> complete?
Not that I know of, unless we wanted to implement our own VFS module to log it.
okay, we surely can test with a large profile against homedir on a nfs mount.  Carsten will be taking point on this, so please be available for any questions he may have.  We'll load up the large profile on a debug nightly build, and run a few functional tests in litmus against our history testcases this week.
It does a fsync every time a transaction is committed.

I thought there was a timer that commits every 5 seconds or so, so a transaction is kept open. However, I could not find this code in the file. If true:

(a) This is very bad for synchronous file writing (current state) since it does a *lot* of them. Especially over NFS this is bad, but even on local filesystems it will make the app janky in a way that isn't covered by page load tests.

(b) It would also explain the terrible shutdown times when using asynchronous writing (previous state). It would accumulate a lot of commits on the background thread, which are slow. When you shutdown, these do not have a chance to catch up.

I would highly recommend running on the local system in a high-IO situation, which should make things like fsync much slower. This is the main thing that normal users will see, and it is not covered by the page cycler tests. I have used http://www.iometer.org/ for this type of thing on Windows (it will thrash your disk while you run tests).
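A hedged sketch of the commit-batching scheme described in the comment above: keep a transaction open on the writing connection and commit it roughly every five seconds, so many statements share one fsync. The BatchedWriter class and the five-second interval are made up for illustration (this is not the actual places/mozStorage code), and a crash loses at most the uncommitted window, which is exactly the durability trade-off debated later in this bug.

#include <sqlite3.h>
#include <chrono>

// Illustrative only: accumulate writes inside an open transaction and commit
// about every five seconds, so one fsync covers many statements.
class BatchedWriter {
 public:
  explicit BatchedWriter(sqlite3 *db) : mDb(db) {}

  int Write(const char *sql) {
    if (!mInTransaction) {
      int rc = sqlite3_exec(mDb, "BEGIN TRANSACTION", nullptr, nullptr, nullptr);
      if (rc != SQLITE_OK) return rc;
      mInTransaction = true;
      mOpenedAt = std::chrono::steady_clock::now();
    }
    int rc = sqlite3_exec(mDb, sql, nullptr, nullptr, nullptr);
    MaybeCommit();
    return rc;
  }

  // Call from a periodic timer and at shutdown so nothing is left queued.
  int Flush() {
    if (!mInTransaction) return SQLITE_OK;
    mInTransaction = false;
    return sqlite3_exec(mDb, "COMMIT", nullptr, nullptr, nullptr);  // the fsync happens here
  }

 private:
  void MaybeCommit() {
    if (mInTransaction &&
        std::chrono::steady_clock::now() - mOpenedAt >= std::chrono::seconds(5)) {
      Flush();
    }
  }

  sqlite3 *mDb;
  bool mInTransaction = false;
  std::chrono::steady_clock::time_point mOpenedAt;
};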
No longer depends on: 408072
Blocks: 402615
I have to completely agree with Brett Wilson. I see this on a local disk under high i/o situations. Kernel compiles, virtualization disk image creation, virtualization heavy disk i/o, etc.

I filed a bug about it before I found this one: bug 421482.

Overall it sounds like sqlite is just a poor choice for Firefox. It is just too i/o intensive. You either go async and delay the writes, or you go sync and performance is horrible in certain situations. This reminds me of the Google Sync extension. It too tries to write a lot of the same data, if you let it, to a remote server. I abandoned it, because it caused closes to be too slow.
Except that async writes were leading to data corruption...

Tomcat - have you had any luck testing this yet?
(In reply to comment #39)
> Except that async writes were leading to data corruption...

This has never really been explained to me. There are two possibilities. First, that there is a bug in the async code. If this is true, it should be fixed. The second possibility is that people accidentally change their file manually while it is open in Firefox because the async I/O didn't keep the file locked. The solution is to keep the file locked. There has been a bug on this.

The attitude on these bugs, which seems to be "async I/O causes data corruption, therefore it must be disabled", is the wrong one. There is nothing inherent in the async I/O that will cause corruption.
There's also the prolonged shut-down times we kept getting bugs filed on because we had to wait for the async thread to join with the main thread, and the fact that a statement that returned successfully might not actually make it into the database because we crashed (losing the D in ACID).

Really, now that sqlite supports multithreaded connections with a shared cache, if a consumer needs to use async writes, they can do their writes on a different thread.  Maybe for the next release storage can provide an abstraction for them, but as it stands now, this won't change for 1.9.
(In reply to comment #41)
> There's also the prolonged shut-down times we kept getting bugs filed on
> because we had to wait for the async thread to join with the main thread

Please see comment 37 for why shutdown took so long with async I/O. Batching transactions and committing every 5 seconds or so will make the shutdown problem go away. The reason it gets behind is doing all those fsyncs which queue up.


> the fact that a statement that returned successfully could not actually make
> it into the database because we crashed (losing the D in ACID).

What's wrong with this? If you crash you lose the last 5 seconds of history modifications, so what? For 90% of the database things the browser does, durable really has no meaning. Asking the user to wait for an additional amount of time per page load to ensure that the visit count increment is committed to disk before continuing is crazy. If you moved all of history to a background thread like you suggest doing for "future components," you would have the same situation: modifications might not make it out to disk even though the database itself *would* be "durable."

If the user modifies a bookmark or deletes something, you can still join with the thread to get a real transaction (I thought I had filed a bug on adding this, but I couldn't find one).


> Really, now that sqlite supports multithreaded connections with a shared
> caches, if a consumer needs to use async writes, they can do their writes on a
> different thread.

History is and will always be the most critical consumer of sqlite because it is constantly writing while you are browsing. Saying that this can be solved when a more critical component comes along is silly.

I/O is *the* most important component of client program performance with today's very fast processors and large memories. Blocking the UI while browsing (not even querying or interacting with history!) on I/O just makes for a laggy product and a bad experience.

I haven't heard any good arguments for removing async I/O that don't boil down to being too lazy to make it work properly. It probably *is* too late in the cycle for doing a change like this now, but it's unfortunate that nobody asked or cared about this when it could have made a difference.

Sorry for being so strong about this. I am concerned that all the work I did on sqlite and places for Mozilla might actually end up making the product worse than better because of the additional jankyness introduced during normal browsing (no matter how awesome the history UI is).
(In reply to comment #42)
> Please see comment 37 for why shutdown took so long with async I/O. Batching
> transactions and committing every 5 seconds or so will make the shutdown
> problem go away. The reason it gets behind is doing all those fsyncs which
> queue up.
But that requires everyone, including add-on authors, to use storage "correctly", which, as I look at more and more code, they can't even do now.

> What's wrong with this? If you crash you lose the last 5 seconds of history
> modifications, so what? For 90% of the database things the browser does,
> durable really has no meaning. Asking the user to wait for an additional amount
> of time per page load to ensure that the visit count increment is committed to
> disk before continuing is crazy. If you moved all of history to a background
> thread like you suggest doing for "future components," you would have the same
> situation: modifications might not make it out to disk even though the database
> itself *would* be "durable."
I'm sorry, but mozilla isn't just a browser.  It's also an application platform.  Expecting everyone to conform to what the browser needs is *not* the right thing to do, even if Firefox is the flagship product.

> History is and will always be the most critical consumer of sqlite because it
> is constantly writing while your are browsing. Saying that this can be solved
> when a more critical component comes along is silly.
We use storage all over the place now.  History, cookies, site specific preferences, phishing and malware.  I think there's more, but there are so many now that I can't even keep track of them anymore.  Cookies uses PRAGMA synchronous = OFF because the module owner felt that the risk of data loss was low enough for their uses.  This helps perf immensely.  My point is that each consumer should be evaluating how safe they want their writes to be - we should provide them with the safest way by default and they can reduce it from there.

> I/O is *the* most important component of client program performance with
> today's very fast processors and large memories. Blocking the UI while browsing
> (not even querying or interacting with history!) on I/O just makes for a laggy
> product and a bad experience.
Right, but I still feel that it should be up to each consumer to deal with this and maybe provide something down the line.

I should note another issue that came up in the past with async io.  On windows it was basically impossible to remove a database file reliably because you never knew when sqlite was finally done with it.  Even if you explicitly closed the database, it was still open (and close would succeed, which was even more annoying for consumers).  You may think this is just a theoretical thing, but it actually hit us in our unit tests (for storage and download manager at least) and affected Seamonkey as well.
(In reply to comment #43)
> 
> But that requires everyone, including add-on authors, to use storage
> "correctly", which as I look at more and more code, they can't even do that
> now.

Shawn, could you be more specific here? Could you give a concrete example of add-on authors doing something incorrectly?

> I'm sorry, but mozilla isn't just a browser.  

History is unique and high-traffic enough that an async interface might be justified. No need to over generalize.

> Cookies uses PRAGMA synchronous = OFF

So, I'm guessing history does not. What data do we risk losing? Just some recently visited sites? Or a whole database corrupted?


(In reply to comment #43)
> But that requires everyone, including add-on authors, to use storage
> "correctly", which as I look at more and more code, they can't even do that
> now.

What's so much different? Now if you don't do this transaction trick, you hang the browser at random times waiting for disk syncs. If people don't know what they are doing, they will not use transactions at all and every single write will be a fsync. If the extension is doing as many operations as history such that shutdown will be slowed, it will slow down the whole browser measurably without async I/O! The fact is that if somebody doesn't know what they are doing with sqlite, performance will suck period.

BTW history should really batch transactions regardless of whether async I/O is in or out. Failing to do this will result in performance problems period.

> I'm sorry, but mozilla isn't just a browser.  It's also an application
> platform.  Expecting everyone to conform to what the browser needs is *not* the
> right thing to do, even if Firefox is the flagship product.
> 
> > History is and will always be the most critical consumer of sqlite because it
> > is constantly writing while your are browsing. Saying that this can be solved
> > when a more critical component comes along is silly.

> We use storage all over the place now.  History, cookies, site specific
> preferences, phishing and malware.  I think there's more, but there are so many
> now that I can't even keep track of them anymore.  Cookies uses PRAGMA
> synchronous = OFF because the module owner felt that the risk of dataloss was
> low enough for their uses.  This helps perf immensely.   My point is that each
> consumer should be evaluating how safe they want their writes to be - we should
> provide them with the safest way be default and they can reduce it from there.

I don't agree with the last statement. A lot of authors want it to be possible to write fast code easily. Without async I/O, this becomes more difficult for people who want to do a good job (I was involved in two or three of the components you listed above, and it's still really hard).

By taking away the async I/O, you are taking away the option for consumers to say that they want async I/O. It is actually really hard to put services on background threads in Mozilla. With async I/O you can give people more options and give people more performant runtime by default.

If you want it to be safe by default, make mozIStorageStatement.commitTransaction sync to the I/O thread and add a new "Soft Commit" function. This provides the current level of safety while adding additional options (since "options" seems to be your major concern).


> > I/O is *the* most important component of client program performance with
> > today's very fast processors and large memories. Blocking the UI while browsing
> > (not even querying or interacting with history!) on I/O just makes for a laggy
> > product and a bad experience.
> Right, but I still feel that it should be up to each consumer to deal with this
> and maybe provide something down the line.

You seem to agree that it is a bad experience! I do not understand your desire to release a sucky product and then "maybe provide something down the line."


> I should note another issue that came up in the past with async io.  On windows
> it was basically impossible to remove a database file reliably because you
> never knew when sqlite was finally done with it.  Even if you explicitly closed
> the database, it was still open (and close would succeed, which was even more
> annoying for consumers).  You may think this is just a theoretical thing, but
> it actually hit us in our unit tests (for storage and download manager at
> least) and affected Seamonkey as well.

We already covered the slow shutdown problem keeping the file open. You could still always sync to the I/O thread if you needed to close the file and do something with it. So when you say "basically impossible" you mean "I don't want to write the code to sync to the I/O thread." Do you see what I meant by "I haven't heard any good arguments for removing async I/O that don't boil down
to being too lazy to make it work properly"?

Robert wrote:
> So, I'm guessing history does not. What data do we risk losing? Just some
> recently visited sites? Or a whole database corrupted?

The whole database corrupted.
(In reply to comment #44)
> Shawn, could you be more specific here? Could you give a concrete example of
> add-on authors doing something incorrectly?
Most common problem I see is add-on authors not wrapping statement operations in try-finally blocks to make sure they always reset the statement.

> History is unique and high-traffic enough that an async interface might be
> justified. No need to over generalize.
> 
> > Cookies uses PRAGMA synchronous = OFF
> 
> So, I'm guessing history does not. What data do we risk losing? Just some
> recently visited sites? Or a whole database corrupted?
See http://sqlite.org/pragma.html#pragma_synchronous for the details.  If the operating system crashes or power is lost before the data makes it onto the disk, we could end up with a corrupted database or lose some data.  dwitte makes the argument that operating systems journal their file io, so the chances of data corruption here are pretty darn low.  Dietrich would be better suited to explain this decision than me however.
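For concreteness, the pragma in question is a per-connection setting issued as ordinary SQL; a minimal sketch, with cookies.sqlite as a purely illustrative filename.

#include <sqlite3.h>

// Sketch only: the synchronous level is chosen per connection.
int openCookieDb(sqlite3 **db) {
  int rc = sqlite3_open("cookies.sqlite", db);
  if (rc != SQLITE_OK) return rc;

  // FULL (the default) fsyncs on every commit; NORMAL syncs less often;
  // OFF never syncs, trading crash safety for write speed as discussed
  // later in this bug.
  return sqlite3_exec(*db, "PRAGMA synchronous = OFF",
                      nullptr, nullptr, nullptr);
}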

(In reply to comment #45)
> What's so much different? Now if you don't do this transaction trick, you hang
> the browser at random times waiting for disk syncs. If people don't know what
> they are doing, they will not use transactions at all and every single write
> will be a fsync. If the extension is doing as many operations as history such
> that shutdown will be slowed, it will slow down the whole browser measurably
> without async I/O! The fact is that if somebody doesn't know what they are
> doing with sqlite, performance will suck period.
So, instead of removing the underlying problem, let's keep putting in more and more workarounds that complicate the code?  That may not be your point, but that's how I'm reading that.  It'd be great and all if this did work perfectly, but even then we get bugs like 406657 that ask to expose an API to know when the async queue is getting too full.

I'll also toss in that *all* threads are forced to use the async thread for their io - *even* background threads.

> BTW history should really batch transactions regardless of whether async I/O is
> in or out. Failing to do this will result in performance problems period.
That's something that really should be brought up with the places team especially if a bug isn't already filed.  At the same time, I'm not sure how cookies would be any different...

> By taking away the async I/O, you are taking away the option for consumers to
> say that they want async I/O. It is actually really hard to put services on
> background threads in Mozilla. With async I/O you can give people more options
> and give people more performant runtime by default.
How is it so hard to put services on background threads in Mozilla exactly?  Are our thread API's that bad, or is it that people just don't understand threading and concurrency well?  If it's the former, we should probably fix that, but if it's the latter perhaps we should get some documentation on how to go about doing that.

> If you want it to be safe by default, make
> mozIStorageStatement.commitTransaction sync to the I/O thread and add a new
> "Soft Commit" function. This provides the current level of safety while adding
> additional options (since "options" seems to be your major concern).
But what if they executed the SQL "COMMIT;"?  We could do a string comparison on their SQL every time a statement is made/executed, but that'd really suck.

> You seem agree that it is a bad experience! I do not understand your desire to
> release a sucky product and then "maybe provide something down the line."
Right, I do see it as a bad experience.  Right now I see this tradeoff:  we have this bug, and we have bugs like bug 398912 which cause a hang because suddenly a network drive went offline.  Because of the async writes, statements would return an OK status, when really they weren't OK.  For the same reason, we had bug 408072.
So we can have hangs and crashes, or a small performance degradation.  I would choose the small performance degradation any day, and in this case the drivers agreed with me.

> We already covered the slow shutdown problem keeping the file open. You could
> still always sync to to the I/O thread if you needed to close the file and do
> something with it. So when you say "basically impossible" you mean "I don't
> want to write the code to sync to the I/O thread." Do you see what I meant by
> "I haven't heard any good arguments for removing async I/O that don't boil down
> to being too lazy to make it work properly"?
Sure, we could sync the threads when someone wants to close.  Of course, that will take an indeterminate amount of time for the caller since they have no idea how full the write queue is, blocking whichever thread tried to close it in the first place.
Writing the code to sync would have been a non-issue.  It would have been one function call to FlushAsyncIO.

I'm not saying async io is bad across the board.  What I am saying is that our particular implementation wasn't working well as we kept tossing more and more consumers at it.  Fred Brooks once said "plan to throw one away; you will, anyhow."  What we had before worked alright.  It had some issues, but didn't scale well as we threw more and more at it.  We can learn from that and create something new from its shortcomings, or try to salvage it and make it work with our current situation (I still don't think it's possible to solve the issue of scalability, however).
(In reply to comment #46)
> So, instead of removing the underlying problem, let's keep putting in more and
> more workarounds that complicate the code?  That may not be your point, but
> that's how I'm reading that.

Isn't that what you did here?  Forcing synchronous I/O on everything is a workaround that complicates the code needed for performance; it's much easier to provide a synchronous wrapper to an async API than vice versa.

> It'd be great and all if this did work perfectly,
> but even then we get bugs like 406657 that ask to expose an API to know when
> the async queue is getting too full.

Sure, maintaining system services is hard, and especially in early days you will end up needing to rationalize API requests.  You think you're not going to get bugs like "provide asynchronous API for database access" now?  Or that extension authors won't write stuff that grinds the browser to a halt?

I don't see any commentary in the bug you cite about alternate ways to solve the problem, so I'm not sure why you even think that providing that API is the only way to solve that problem.  A commit notification for transaction (see below) would let the callers self-throttle against async I/O progress just fine, no?

> I'll also toss in that *all* threads are forced to use the async thread for
> their io - *even* background threads.

Is that inherent to having asynchronous I/O, or a limitation of sqlite3, or a matter of API fiat?

I'm being disingenuous here: I know that it's not inherent to having async I/O, and I know that sqlite has threadsafety capabilities, so I'm really pretty sure it's a matter of API fiat on the Mozilla side.

> How is it so hard to put services on background threads in Mozilla exactly? 

Ever tried it?  It's all event-queue proxy mess, hoping you win the "component I care about interacting with is threadsafe, and not just marked that way in the refcounting" lottery, and then you get the performance pain.

> Are our thread API's that bad, or is it that people just don't understand
> threading and concurrency well?

People apparently don't understand threading and concurrency well enough to make async I/O work with appropriate synchronization capabilities for the callers, so I don't know why the latter would surprise you. :P

> If it's the former, we should probably fix
> that, but if it's the latter perhaps we should get some documentation on how to
> go about doing that.

Requiring explicit threading to get async behaviour is pretty developer-hostile, even if we had better threading APIs in the product.  Asynchrony should not require threading; it doesn't on the web for the past decade, and event/callback behaviour is all over our code for similar reasons.

> > If you want it to be safe by default, make
> > mozIStorageStatement.commitTransaction sync to the I/O thread and add a new
> > "Soft Commit" function. This provides the current level of safety while adding
> > additional options (since "options" seems to be your major concern).
> But what if they executed the SQL "COMMIT;"?  We could do a string comparison
> on their SQL every time a statement is made/executed, but that'd really suck.

What if they did?  If they ask for explicit commit, they get it.

> Right, I do see it as a bad experience.  Right now I see this tradeoff:  we
> have this bug, and we have bugs like bug 398912 which cause a hang because
> suddenly a network drive went offline.

You think that bug is inherent to async I/O?  You took away the symptom, but if writing to the drive causes fail, then doing it on the main thread when the user navigates to a new page is going to cause the same fail, right?

I'm not convinced it's a trade-off, and nothing in the analysis in that bug seems to indicate that it's not just a bad interaction between sqlite and whatever "sync" operation is being done by his network drive software.  In that case async I/O is just the victim -- as it obviously would be because it's the one doing all the sqlite operations in that model!

> Because of the async writes, statements
> would return an OK status, when really they weren't OK.

An API that associates a commit notification with a transaction would let callers get application-synchronous behaviour without imposing fsync calls on the rest of the application, or risking detrimental effects on other applications due to how the OS implements fsync().  It's not a hard API to specify or verify, and I very much doubt it's a hard API to implement even without help from sqlite internals.

Callers who need ACID and not just ACI should be able to get it, but I don't know why that means we should impose the costs of D on callers that don't need it.

> For the same reason, we had bug 408072.
> So we can have hangs and crashes, or a small performance degradation. 

Bug 408072 is "calling code doesn't like it when we have an error" (he didn't actually crash) or maybe "error reporting for async DB access sucks", which is again not inherent to doing that I/O asynchronously!  What operations did we perform on the async I/O thread that won't happen on the main thread?  Certainly not clear from that bug, and bug 408072's comment 20 sounds like dwitte agrees that the core problem still exists.

Again, a transaction-committed notification would give a way to convey error status to callers, such that they can decide how to react.  And I suspect pretty strongly that an API which reset a statement when the transaction was complete would be welcome, and reduce error habitat.  Easy enough to build on top of commit notifications.
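For what it's worth, SQLite already exposes a per-connection commit callback, sqlite3_commit_hook, that a transaction-committed notification could plausibly be layered on. A hedged sketch follows; the CommitObserver type is a made-up stand-in for whatever observer interface storage would expose, and note that the hook fires when SQLite commits the transaction, which under an async layer would still be before the bytes reach the disk.

#include <sqlite3.h>
#include <cstdio>

// Hypothetical observer standing in for a mozStorage-style notification.
struct CommitObserver {
  void OnTransactionCommitted() { std::printf("transaction committed\n"); }
};

// sqlite3_commit_hook() invokes the callback whenever a transaction commits
// on this connection; returning non-zero would turn the commit into a
// rollback, so we always return 0 and just forward the notification.
static int commitCallback(void *aClosure) {
  static_cast<CommitObserver *>(aClosure)->OnTransactionCommitted();
  return 0;
}

void watchCommits(sqlite3 *db, CommitObserver *observer) {
  sqlite3_commit_hook(db, commitCallback, observer);
}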

> I would choose the small performance degradation any day, and in 
> this case the drivers agreed with me.

Did the drivers do their own analysis, or take your false dichotomy between "async I/O" and "sane error handling" at face value?

> Sure, we could sync the threads when someone wants to close.  Of course, that
> will take an indeterminate amount of time for the caller since they have no
> idea how full the write queue is, blocking whichever thread tried to close it
> in the first place.

Guess what?  Doing any synchronous operation on the db can take an indeterminate amount of time!  If you have a bunch of worker threads doing fsync()-laden writes to the databases in unbatched transactions, you don't know how long it's going to take to finish processing your new transaction.

Guess what else?  You've made all callers synchronous, and moved the scheduling and error management responsibility into all of those callers.  (If the async I/O thread can't report errors, how are each of these extensions' own async sqlite threads going to do it?)

It seems to me like you're saying "it was too hard for the mozStorage module developers to make async I/O work, but extension developers and the individual component authors will be able to make async I/O work for them".  You should think more highly of yourself and your mozStorage peers. :)

> I'm not saying async io is bad across the board.

You've been saying that async I/O is why we have these other bugs, here and in those very bugs, actually.

> What I am saying is that our
> particular implementation wasn't working well as we kept tossing more and more
> consumers at it. 

Wasn't working well because of the number of consumers, or because those consumers exposed more use cases for performant, reliable I/O and we threw up our hands to give up on the performant part?

I'm certainly guilty of taking "this is because we do async I/O" at face value rather than digging into it, and I regret doing so now, because asynchronous database access is a _necessity_, and leaving it on the plate of every caller is counter-productive.  (Remember, you can't even get them to reset their statements!  Getting them to effectively batch I/O, dispatch errors, build notification models, and track queue size seems like a taller order.)

> (I still don't think it's possible to solve the
> issue of scalability, however).

What exactly do you mean by scalability, and how does being async make it _harder_ to scale?  That certainly seems counter-intuitive given what we know about the effects of synchronous operations in other software systems.  Everyone else in the world works to make things _less_ synchronous when they want them to scale better, no?
There are a couple of things here:

- Disabling async IO does not remove the chance of drive failure causing problems, but it does reduce it as (obviously) there is no longer a queue of things waiting to be written.  The time slot during which drive failure can cause corruption is much smaller.  But, as people have said, it does not completely eliminate the problem.

- The long delay on shutdown is an issue, but I think one of the problems there was that it was hard to reproduce, and so was hard to analyze why exactly it was taking so long.  If we were shutting down, it seems like we should be able to pretty quickly flush out the queue, and if the queue had gotten really, really long, it would be interesting to figure out why.

- One of the biggest problems with async IO is that, given SQLite's design, it's only possible to enable it globally for the entire app.  It hooks in at the layer of SQLite's core IO routines, of which there is one set per process.  That is not to say that it's not possible to inform callers when their data was written to disk, e.g. if a sync property was set on the connection, each commit() would wait for notification from the async thread that the writing happened.

... well, that doesn't seem to be true any more.  You can now register multiple VFS layers, and specify which one to use at open time.  The next problem still remains, but should probably read "no other VFS layer".  But we can ensure that in the app.
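A minimal sketch of that per-connection VFS selection, assuming an SQLite new enough to have sqlite3_open_v2 and sqlite3_vfs_register: clone the default VFS under a new name (the "moz-async-vfs" name is made up) and pass it at open time. A real async or logging shim would additionally override xOpen and the file's io methods instead of reusing the default pointers.

#include <sqlite3.h>

// Sketch only: register a second VFS and pick it per connection at open time.
// Connections opened without naming a VFS keep using the default one.
int openWithCustomVfs(const char *path, sqlite3 **db) {
  static sqlite3_vfs customVfs;
  static bool registered = false;   // not thread-safe; illustration only
  if (!registered) {
    customVfs = *sqlite3_vfs_find(nullptr);   // copy the default VFS
    customVfs.zName = "moz-async-vfs";
    sqlite3_vfs_register(&customVfs, /* makeDefault = */ 0);
    registered = true;
  }

  return sqlite3_open_v2(path, db,
                         SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE,
                         "moz-async-vfs");
}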

- Another problem is that using async IO basically means that no other application can have shared access to a database file.  This could be fine, but it would be nice to be able to support this for consumers that want it -- as brett said, this could probably done with correct locking.

- There are also a few minor programming traps; needing to use the same filename for each connection to the same database file.  This could be easily avoided by doing some OS-specific work to see whether the filenames reference the same file or not.

- The major advantage is, essentially, performance: specifically, being able to write database code in a simple manner and have it not lock up the UI thread.  This is a huge win, and I think we need to revisit async IO in the future to get back this advantage and the other advantages mentioned in previous posts.

Given all that, I still think that disabling async IO was the right thing for 1.9.  I didn't have time to work on it, and neither did sdwilsh; the issues are certainly fixable, but I believe this can be done in a followup release.  The change should be transparent to consumers, with one caveat -- I think that we should add a 'durable' property to the connection object (probably a flag on creation, given that we may have to select a different VFS) which will default to FALSE, but that consumers can set to TRUE if they want us to do extra work to try to ensure the D part of ACID.  Right now, there would be no difference, but when we add async IO back, that would let us do the correct flushes on commit  for those connections.

Given the multiple VFS layers, I think we can add it back in a straightforward way and use it where it makes sense or on demand -- but for 1.9.0.x/1.9.1.
> - The major advantage is, essentially, performance: specifically, being able to
> write database code in a simple manner and have it not lock up the UI thread. 
> This is a huge win, and I think we need to revisit async IO in the future to
> get back this advantage and the other advantages mentioned in previous posts.
> 

The same performance advantage can be had by having a separate reader/writer thread for places where the UI is really blocked by the IO.  On server database systems it is well understood that disabling fsync gains you commit speed at the cost of durability.  If our performance issue here is *read* io then if we can thread that or make only reads async then everything is good.

What is the specific case in which the current implementation is causing trouble?
Vlad summed up what I was going to say, probably much better than I could have.  I'm just going to address a few of shaver's points that were missed.

(In reply to comment #47)
> > I'll also toss in that *all* threads are forced to use the async thread for
> > their io - *even* background threads.
> Is that inherent to having asynchronous I/O, or a limitation of sqlite3, or a
> matter of API fiat?
At the time, it was a combination of all of the above really.  Like vlad mentions though, with the new VFS stuff that sqlite has this can be made much better.

> Ever tried it?  It's all event-queue proxy mess, hoping you win the "component
> I care about interacting with is threadsafe, and not just marked that way in
> the refcounting" lottery, and then you get the performance pain.
I know about the whole "is it actually threadsafe" bits, but I was unaware that we had to also deal with event-queue proxy mess.  That sucks :(

> What if they did?  If they ask for explicit commit, they get it.
I'm not sure I fully understand what you mean here - are you saying we should check if the SQL is that and flush, or just carry on?

> You think that bug is inherent to async I/O?  You took away the symptom, but if
> writing to the drive causes fail, then doing it on the main thread when the
> user navigates to a new page is going to cause the same fail, right?
Sure, but now we get immediate feedback (some component returns an error code in the error console) instead of some point down the line that the alert box comes up.

> Callers who need ACID and not just ACI should be able to get it, but I don't
> know why that means we should impose the costs of D on callers that don't need
> it.
If you don't want the D, there's the lovely PRAGMA commands previously mentioned that do in fact speed things up greatly at the cost of D (dwitte had done some tests when he was analyzing for cookies - it's in a bug comment somewhere)

> It seems to me like you're saying "it was too hard for the mozStorage module
> developers to make async I/O work, but extension developers and the individual
> component authors will be able to make async I/O work for them".  You should
> think more highly of yourself and your mozStorage peers. :)
Not necessarily too hard.  There is/was certainly a lack of cycles as there is no one person who can devote large amounts of their time to storage.  I only picked it up because I was working on the download manager and noticed issues with it.  In the mozilla world that seems to make people the owner real fast when a module is essentially untouched.

> You've been saying that async I/O is why we have these other bugs, here and in
> those very bugs, actually.
I was talking about our particular implementation - I probably should have made that clearer in hindsight.

> Wasn't working well because of the number of consumers, or because those
> consumers exposed more use cases for performant, reliable I/O and we threw up
> our hands to give up on the performant part?
Poorly performing consumers certainly made the problem worse, but the more we threw at it, the worse it'd get.  Each disk operation would be placed in the queue and processed in the order received from *all* threads and *all* connections.  It's not that difficult to saturate this at all...

(In reply to comment #48)
> Given the multiple VFS layers, I think we can add it back in a straightforward
> way and use it where it makes sense or on demand -- but for 1.9.0.x/1.9.1.
Did you mean "not for 1.9.0.x/1.9.1"?

(In reply to comment #49)
> > - The major advantage is, essentially, performance: specifically, being able to
> > write database code in a simple manner and have it not lock up the UI thread. 
> > This is a huge win, and I think we need to revisit async IO in the future to
> > get back this advantage and the other advantages mentioned in previous posts.
> > 
> 
> The same performance advantage can be had by having a separate reader/writer
> thread for places where the UI is really blocked by the IO.  On server database
> systems it is well understood that disabling fysnc gains you commit speed at
> the cost of durability.  If our performance issue here is *read* io then if we
> can thread that or make reads only aysnc then everything is good.  

Sure, in an ideal world, you can do reader/writer threads.  But given the crappy state of threading in our platform, that's basically rocket science at the moment.  The async IO bits did not affect read performance at all; they were entirely to avoid blocking on write/sync.

> What is the specific case that the current implementation is causing trouble?

Which? The current (disabled) async IO implementation?

> Did you mean "not for 1.9.0.x/1.9.1"?

No, I really meant that we can re-enable async IO in 1.9.0.x/1.9.1 once we port things to the new VFS layer, write a pile of tests, and better understand what exactly the issues are (e.g., around shutdown).
(In reply to comment #51)
> No, I really meant that we can re-enable async IO in 1.9.0.x/1.9.1 once we port
> things to the new VFS layer, write a pile of tests, and better understand what
> exactly the issues are (e.g., around shutdown).
I'm not particularly fond of that because I can't see a way for us to provide the same API and allow people to opt into durability or not.  If we can solve that, then by all means this is a perfectly acceptable solution.
(In reply to comment #50)
> > Callers who need ACID and not just ACI should be able to get it, but I don't
> > know why that means we should impose the costs of D on callers that don't need
> > it.
> If you don't want the D, there's the lovely PRAGMA commands previously
> mentioned that do in fact speed things up greatly at the cost of D (dwitte had
> done some tests when he was analyzing for cookies - it's in a bug comment
> somewhere)

bug 230933 comment 29. to summarize: with a fast disk, on a quiet day, each synchronous INSERT statement takes around 1ms. combine that with callers that don't (or can't) use batching, and things can get slow. the PRAGMA to speed this up was synchronous=OFF.

we can espouse all we want about the sadness of the situation and how we really should have an asynchronous capability (and it's a tragedy!), but the fact is that there wasn't anyone in sight with time to do it right so we punted it. are we going to hold firefox 3 for it now? doubtful: from comment 18 we saw a 1% Tp regression because of it, and much larger regressions in contrived tests (e.g. 3500% when removing absolutely enormous permission lists), but that's probably not enough to hold a release this close to endgame.

adding async capability to 1.9.0.x seems a bit scary if we're going to have API changes, unless we save up these kind of things for a 1.9.1 (which could simply be a 1.9.0.x release with a bunch of rolled-up compat-breaking changes). but that's a whole different discussion...
We don't break any APIs in patch releases unless they are inherently buggy in a bad way (security hole, e.g.).

We don't break frozen APIs until Mozilla 2.

What about 1.9.1? We have experience with Mozilla 1.8.1 that may be relevant. But first, why would we break APIs by adding async capability? Why wouldn't we merely add APIs?

/be
Hit "Commit" just as I noted this bug was RESOLVED FIXED -- such bugs make poor threaded-message newsgroup simulations. This is ripe for m.d.platform or better, doncha think?

/be
(In reply to comment #53)
> 3500% when removing absolutely enormous permission lists), but that's probably

typo; this should be 250%.
(In reply to comment #55)
> Hit "Commit" just as I noted this bug was RESOLVED FIXED -- such bugs make poor
> threaded-message newsgroup simulations. This is ripe for m.d.platform or
> better, doncha think?
Posted to m.d.platform.  Google groups link here:
http://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/97c355a132e4cfe1/6a4cfba886484512#6a4cfba886484512
Filed bug 422127 for batching transactions in history. This should reduce the fsyncs caused by history in the synchronous I/O case, and it should eliminate the shutdown problems in the asynchronous case.
(In reply to comment #50)
> > Callers who need ACID and not just ACI should be able to get it, but I don't
> > know why that means we should impose the costs of D on callers that don't need
> > it.
> If you don't want the D, there's the lovely PRAGMA commands previously
> mentioned that do in fact speed things up greatly at the cost of D (dwitte had
> done some tests when he was analyzing for cookies - it's in a bug comment
> somewhere)

This is incorrect. SYNCHRONOUS=off will permit corruption of the database. It cannot be used for any data you actually care about. Non-durable only means that a commit may or may not actually be a commit depending on if we crash soon after it, but the commit will either have gone through or not. Synchronous=off means half the transaction could have gone through, potentially corrupting sqlite's file structure, and making the database unusable.

Because cookies can be regenerated in most cases, that might be a good candidate for non-synchronous (as long as you can detect corruption and nuke the file), but it's much more than "non-durable." This is not a reasonable choice for most data stores.
Depends on: 421482
No longer depends on: 421482
Note bug 429986 to provide a means of async IO.