Bug 421482: Firefox 3 uses fsync excessively

Status: RESOLVED FIXED (opened 16 years ago, closed 16 years ago)
Component: Toolkit :: Storage
Type: defect
Platform: x86, Linux
Priority: Not set
Severity: major
Reporter: mozilla-bugzilla
Assigned: sdwilsh
Keywords: perf, regression
Whiteboard: [RC2+][has patch][has review]
Attachments: 7 files, 3 obsolete files

User-Agent:       Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.12) Gecko/20080208 Fedora/2.0.0.12-1.fc8 Firefox/2.0.0.12
Build Identifier: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.12) Gecko/20080208 Fedora/2.0.0.12-1.fc8 Firefox/2.0.0.12

  I have been using Firefox 3 nightlies for a while. I have recently been experiencing really bad system responsiveness, and have been pulling my hair out trying to figure out why. I knew Firefox 3 seemed to be the worst affected. I then found a mention that the issue I was seeing was related to fsync.

  I used strace to check Firefox 3 and found it was calling fsync about eight times for every new page. I tried going back to firefox-2.0.0.12, and found it didn't behave the same way. I then renamed ~/.mozilla and uninstalled all my plugins to be sure it wasn't an extension or plugin. Firefox 3 still did it.

  Excessive fsync during a kernel compile causes Firefox 3 to become completely unresponsive until the fsyncs are complete. In some cases, if something else I/O intensive is going on, Firefox 3 will freeze until the other I/O has completely finished. If it gets really bad, other applications start freezing.

Tried and has the problem:
Firefox 3.0b4, nightly, 2008305
Firefox 3.0b5pre, the latest nightly, 20080306.
Firefox 3.0b3

Tried and didn't have the problem:
Firefox 3.0b2

Reproducible: Always

Steps to Reproduce:
1. Install Firefox 3 on Linux
2. Run Firefox 3
3. strace Firefox 3 - strace -p pid 2>&1 | grep -i fsync
4. Load multiple pages in Firefox 3
Actual Results:  
fsync(43)                               = 0
fsync(43)                               = 0
fsync(42)                               = 0
fsync(43)                               = 0
fsync(43)                               = 0
fsync(42)                               = 0
fsync(43)                               = 0
fsync(43)                               = 0

Expected Results:  
Nothing
Version: unspecified → Trunk
Keywords: perf
Product: Firefox → Core
QA Contact: general → general
See also bug 417732.
This is probably a consequence of the patch for bug 408914.
(In reply to comment #2)
> This is probably a consequence of the patch for bug 408914.
Likely not since we called fsync just as often, it just wasn't as safe for data.  This is really an issue with sqlite and we can't do a thing about it.
Nathan, if you can try a few more 3.0b3pre builds to figure out the exact
regression date that would be very helpful.  Thanks.
I'm kind of surprised -- fsync is expensive, but not system-becomes-unresponsive expensive, or at least it shouldn't be.  From your grep up there, was it just the 8 fsyncs? or did you mean to imply that there were a bunch?  Timestamps would really be useful there as well.
It may actually be six, as someone else has said elsewhere. But I think I was seeing eight fsyncs per page load. It normally isn't an issue, but if the system is busy doing a lot of other heavy I/O like compiling a kernel, then Firefox 3.0b3 and later stop refreshing the window completely for however long the fsync takes to complete. Sometimes that is a few seconds, sometimes it can be over a minute.
The amount of delay depends on the filesystem that you're using.  

For example, I am using Reiser4 on my Linux laptop.  fsync ("temporarily", for years now) makes R4 do a full sync which can take minutes.  ext3 also suffers from this because in ordered mode, the entire journal has to be committed before a fsync can complete.

I also note that this excessive fsync usage requires my laptop drive to power up for every page load when on battery power.

I strongly recommend using the sqlite pragma commands to disable data security.  Use temporary copies of the database and replace the original on shutdown, or something.
I tried launching an IO intensive task:

ionice -c1 nice -n-20 cp -r a_big_directory_on_your_profile_partition tmp

And then stracing the firefox process for fsync calls with -T to show the time spent in fsync:

strace -f -tt -T -e trace=fsync -p <insert_firefox-bin_pid>

I got the results:

[pid  8173] 02:11:15.198325 fsync(51)   = 0 <0.002007>
[pid  8173] 02:11:15.200456 fsync(51)   = 0 <0.000195>
[pid  8173] 02:11:15.201024 fsync(40)   = 0 <0.001842>
[pid  8173] 02:11:35.571782 fsync(51)   = 0 <8.908382>
[pid  8173] 02:11:44.480487 fsync(51)   = 0 <32.004947>
[pid  8173] 02:12:18.019528 fsync(40)   = 0 <7.829440>
[pid  8173] 02:12:25.969565 fsync(51)   = 0 <1.077681>
[pid  8173] 02:12:27.047426 fsync(51)   = 0 <0.315800>

That means firefox-bin is freezing for 32 seconds at times. That's an extreme situation, but even a freeze of a few seconds during less intensive IO can be very annoying. My partition containing the profile is ext3 mounted with "rw,noatime,data=ordered".
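To total the time spent in fsync from a trace like the one above, a small script can help. This is a minimal sketch in Python, assuming the line layout printed by `strace -f -tt -T` as shown in this comment; the function name `fsync_time_by_fd` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches e.g. "[pid  8173] 02:11:44.480487 fsync(51)   = 0 <32.004947>"
FSYNC_LINE = re.compile(r"fsync\((\d+)\)\s+=\s+-?\d+.*<(\d+\.\d+)>")

def fsync_time_by_fd(lines):
    """Return {fd: total seconds spent in fsync} for the given strace lines."""
    totals = defaultdict(float)
    for line in lines:
        m = FSYNC_LINE.search(line)
        if m:
            totals[int(m.group(1))] += float(m.group(2))
    return dict(totals)

# Three of the slow calls from the trace above.
sample = [
    "[pid  8173] 02:11:35.571782 fsync(51)   = 0 <8.908382>",
    "[pid  8173] 02:11:44.480487 fsync(51)   = 0 <32.004947>",
    "[pid  8173] 02:12:18.019528 fsync(40)   = 0 <7.829440>",
]
totals = fsync_time_by_fd(sample)
for fd, seconds in sorted(totals.items()):
    print(f"fd {fd}: {seconds:.3f}s")
```

Grouping by file descriptor makes it easier to match the slow calls against `lsof` output and see which database is responsible.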
I'm using a build with SQLITE_NO_SYNC defined in sqlite3.c and I don't see the interface freeze any more while using the browser during IO operations (I see this as a huge improvement for me). But this has a price (quoting a comment in sqlite3.c):

** The SQLITE_NO_SYNC macro disables all fsync()s.  This is useful
** for testing when we want to run through the test suite quickly.
** You are strongly advised *not* to deploy with SQLITE_NO_SYNC
** enabled, however, since with SQLITE_NO_SYNC enabled, an OS crash
** or power failure will likely corrupt the database file.

So this is a compromise between UI-responsiveness and data protection. I'm not sure what could be done to improve the responsiveness while not losing data safety. Maybe this is naive, but is the database corruption risk real in practice?
(In reply to comment #9)
> safety. Maybe this is naive, but is the database corruption risk real in
> practice?
Yes
  In the url below I think it describes the basic problem as fsync, and how using .range-end instead of fsync is the better way to go, but still not foolproof. On the other hand this is more at the sqlite level. But this does highlight how sqlite may have been a poor choice for Mozilla to make.

http://marc.info/?l=linux-kernel&m=120518537822850&w=2
  This seems to be the kernel bug covering the overarching problem: how poorly ext3 ordered mode handles fsync from a performance perspective. From my own experience it seems more like a problem with ext3 in general, in that ext3's writeback mode is better, but still unacceptable.

http://bugzilla.kernel.org/show_bug.cgi?id=9546
Blocks: 408914
Keywords: regression
Bug 408914 didn't cause this - the number of fsyncs would be the same regardless of us using async or not.
No longer blocks: 408914
Oh, sorry, I thought comment 9 implied that...
(In reply to comment #3)
> (In reply to comment #2)
> > This is probably a consequence of the patch for bug 408914.
> Likely not since we called fsync just as often, it just wasn't as safe for
> data.  This is really  an issues with sqlite and we can't do a thing about it.

You can batch transactions like I discussed in 408914 comment 42. You can trade off fewer fsyncs for more history being lost in the event of a crash.
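The batching trade-off mentioned above can be sketched with Python's standard `sqlite3` module. This is an illustration only, not the Places code: the `visits` table and both function names are hypothetical, and the commit counts stand in for the syncs each commit would trigger.

```python
import os
import sqlite3
import tempfile

def record_visits_individually(conn, urls):
    """One transaction per row: every commit pays for its own syncs."""
    for url in urls:
        conn.execute("INSERT INTO visits (url) VALUES (?)", (url,))
        conn.commit()
    return len(urls)  # number of commits issued

def record_visits_batched(conn, urls):
    """All rows in one transaction: a single commit covers the whole batch."""
    with conn:  # commits on success, rolls back on exception
        conn.executemany("INSERT INTO visits (url) VALUES (?)", [(u,) for u in urls])
    return 1  # number of commits issued

urls = [f"http://example.com/{i}" for i in range(20)]
with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "demo.sqlite"))
    conn.execute("CREATE TABLE visits (url TEXT)")
    conn.commit()
    commits_individual = record_visits_individually(conn, urls)
    commits_batched = record_visits_batched(conn, urls)
    total_rows = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
    conn.close()
print(commits_individual, commits_batched, total_rows)
```

The batched version stores the same rows with one commit instead of twenty, at the cost that a crash before the commit loses the whole batch.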
Did async IO call fsync on a background thread? Using lots of fsync would be less/not an issue if it did not freeze the main UI thread.
Status: UNCONFIRMED → NEW
Component: General → Places
Ever confirmed: true
Product: Core → Firefox
QA Contact: general → places
In the past when I looked at Beta 2, which as I understand was async, I saw no fsyncs with strace. Now they might have been triggered on closing the browser or in some other case I didn't notice, but with a few minutes of testing, no fsync.
(In reply to comment #17)
> In the past when I looked at Beta 2, which as I understand was async, I saw no
> fsyncs with strace. Now they might have been triggered on close the browser or
> some other case I didn't notice, but with a few minutes of testing, no fsync.

I think you were mistaken unless there was some error. Sqlite should have always been generating fsyncs, whether writes were asynchronous or not. Since the fsyncs were asynchronous, sometimes that thread got behind (these would batch up and delay shutdown), so you may have noticed fewer fsyncs for a given operation.
As I remember, in a few minutes of testing by going to bookmarked pages I got no fsyncs. Finally I closed the browser and got three. So that probably explains the slow shutdowns before. They were syncing until browser close.
I also experience a lot of IO using Firefox 3 - with Firefox 2 this has never been a problem.

It not only slows down overall system responsiveness, but also blocks the main UI thread and therefore makes Firefox completely unresponsive.
I did some stracing; the problem seems to be the high number of seeks/writes and the fsyncs after them, see the attachment ffio.txt.
Any connection with bug 430530?
I don't think this is related. I could reproduce the UI freeze whenever there is lots of IO activity while the url-classifier update should only happen sporadically.
hmm, misread sorry. Yeah I guess comment 22 could be related. Clemens, you could  do a "lsof -p <pid of firefox>" while there is lots of disk activity, to check what file corresponds to the file descriptor used by the write/_lseek calls.
Hi again,

I see all the fsyncs on handle 38 and 48, right after a few:
.............
[pid  2931] _llseek(38, 5038080, [5038080], SEEK_SET) = 0
[pid  2931] write(38, "\r\0\0\0\3\2 \0\2\340\2w\2 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid  2931] fsync(38)                   = 0
.............

Those handles are:
firefox-b 2931   ce   38uw  REG        8,3  5042176 21594391 /home/ce/.mozilla/firefox/48q58vq8.default/places.sqlite
firefox-b 2931   ce   48u   REG        8,3        0 21595038 /home/ce/.mozilla/firefox/48q58vq8.default/places.sqlite-journal

Maybe I should add that places.sqlite is quite large; it weighs 4.8 MB.

Hope that helps, lg Clemens
This bug is really annoying - it takes about 2-4 min until the IO storm is over.
During this time the whole system is unresponsive, and Firefox almost unusable.
Here's a workaround if you don't want to recompile, and are willing to take the following risk:

"With synchronous OFF (0), SQLite continues without pausing as soon as it has handed data off to the operating system. If the application running SQLite crashes, the data will be safe, but the database might become corrupted if the operating system crashes or the computer loses power before that data has been written to the disk surface."

Type this in the error console "Code" field and press "Evaluate":

/* Service lookups do not depend on the file, so do them once. */
var dirSvc = Components.classes["@mozilla.org/file/directory_service;1"]
                       .getService(Components.interfaces.nsIProperties);
var storageService = Components.classes["@mozilla.org/storage/service;1"]
                               .getService(Components.interfaces.mozIStorageService);
for each (var f in ["content-prefs", "cookies", "downloads", "formhistory", "permissions", "places", "search", "urlclassifier2", "urlclassifier3"]) {
    /* Locate <profile directory>/<name>.sqlite. */
    var file = dirSvc.get('ProfD', Components.interfaces.nsIFile);
    file.append(f + ".sqlite");
    /* Open the database and set synchronous to 0 (OFF). */
    var conn = storageService.openDatabase(file);
    conn.executeSimpleSQL("pragma synchronous = 0;");
    conn.close();
}


By the way, did anyone test the  "synchronous = NORMAL;" mode? Maybe that's a better compromise between performance and security:

"When synchronous is NORMAL, the SQLite database engine will still pause at the most critical moments, but less often than in FULL mode. There is a very small (though non-zero) chance that a power failure at just the wrong time could corrupt the database in NORMAL mode. But in practice, you are more likely to suffer a catastrophic disk failure or some other unrecoverable hardware fault."
Why does sqlite have to do so many reads/seeks/writes at all?

I understand it has to fsync() to guarantee everything is properly written down, but why so much writing at one time?
Does Firefox instruct it to do so much, or does it do so to reorganize data?

Furthermore why does this block the whole UI? If the fsync() cannot be removed without risk, why not run it on a background-thread?

lg Clemens
> Maybe I should add that places.sqlite is quite large, it weight 4,8mb.

This is a typical-to-low size for this file after doing several months of browsing.

> Does firefox instruct it to do so much, or does it to reorganize data?

Firefox instructs it to do so. It does many small transactions, each of which is committed, and sqlite does two fsyncs for each one. As I mentioned above, it should batch them as I described in bug 408914 comment 42, in addition to getting the fsyncs off of the UI thread.

> By the way, did anyone test the  "synchronous = NORMAL;" mode?

I do not think anybody has tested this.
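For anyone who wants to try the comparison themselves, here is a rough timing sketch using Python's standard `sqlite3` module. It is not a Firefox benchmark: the `visits` table, the function name `time_small_commits`, and the count of 50 commits are all assumptions for illustration, and the timings depend heavily on the filesystem and on other IO running at the same time.

```python
import os
import sqlite3
import tempfile
import time

def time_small_commits(mode, n=50):
    """Time n one-row transactions under the given synchronous mode."""
    if mode not in ("OFF", "NORMAL", "FULL"):
        raise ValueError(f"unknown synchronous mode: {mode!r}")
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "bench.sqlite"))
        conn.execute(f"PRAGMA synchronous = {mode}")
        conn.execute("CREATE TABLE visits (url TEXT)")
        conn.commit()
        start = time.perf_counter()
        for i in range(n):
            conn.execute("INSERT INTO visits VALUES (?)", (f"http://example.com/{i}",))
            conn.commit()  # each row is its own transaction
        elapsed = time.perf_counter() - start
        rows = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
        conn.close()
    return elapsed, rows

results = {mode: time_small_commits(mode) for mode in ("OFF", "NORMAL", "FULL")}
for mode, (elapsed, rows) in results.items():
    print(f"{mode}: {rows} commits in {elapsed:.4f}s")
```

Running it while a heavy copy or compile is in progress on the same partition should come closer to the conditions reported in this bug than an idle system would.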
(In reply to comment #28)
> Here's a workaround if you don't want to recompile, and are willing to take the
> following risk:
> 
> "With synchronous OFF (0), SQLite continues without pausing as soon as it has
> handed data off to the operating system. If the application running SQLite
> crashes, the data will be safe, but the database might become corrupted if the
> operating system crashes or the computer loses power before that data has been
> written to the disk surface."
I strongly discourage folks from doing that because the database can become corrupted.

(In reply to comment #30)
> > By the way, did anyone test the  "synchronous = NORMAL;" mode?
> 
> I do not think anybody has tested this.
Actually, dwitte looked into this for the cookie service (which does use synchronous = OFF), and there was a very small gain with 1000 writes not batched in a transaction.
This bug has resulted in the wide panning of the stability of the Ubuntu Hardy release and a general distaste for Firefox 3 on Linux. If FF3 releases with this, it will be a PR disaster. Setting ? blocking firefox-3.
Flags: blocking-firefox3?
There are some key fixes to both the safebrowsing and places code since beta 5 which should heavily mitigate the symptoms here.  Please test a current nightly before nominating bugs.
Flags: blocking-firefox3?
This bug should not affect stability, though there were many bugs in FF3b5 that could have, especially in concert with popular extensions like Firebug.  Can you provide more detail about the instability, and why you think it's related to this bug?  Perhaps crash reports/stacks?

(Having people ship a beta to lots of people who aren't expecting a beta can easily result in distaste, but that wasn't really our decision. :/ )
I have tested current nightlies on Debian; no improvements. The symptoms are exactly the same as have been described by so many others in this bug. Firefox brings a modern system to its knees, as it stands. As the removal of the nomination was based on a wrong assumption about my testing, renominating.
Flags: blocking-firefox3?
Ubuntu users are not the most technically minded Linux users, so you will often see them attributing this to the wrong cause.  I think the most frequent form of complaint is that "ubuntu spends too much time in iowait" which is merely a manifestation of the fact that Firefox is, indeed, waiting for i/o.
This is not an Ubuntu bug; it just happens that they stupidly shipped beta 5 as stable. fsync behaves the way that it does on every distro and every filesystem in use. If anything is worse on Ubuntu, it's due to their incorrect use of the CGROUP scheduler options resulting in poor I/O scheduling during heavy load.

All the testing I have done has been on Debian 2.6.25 which has the appropriate scheduler configuration. In the latest nightlies, it's still really, really bad. Other tasks can be done while Firefox is flushing, but it's painful.

Testing was done with a user profile with about two months of accumulated browsing history.
I'm a Debian and Ubuntu user (and developer), and my Firefox and system have been a growing pain to use in the last weeks.  Firefox is constantly doing IO to disk, and while iotop shows Firefox as the worst offender (and other apps almost all idle), it's doing relatively little IO.

I suspect that the old method of writing all data to a new file and then moving this file to replace the older version put less pressure on the system than the current fsyncs introduced by the sqlite backend.  Certainly Linux could improve to support this, but Firefox could in the meantime work around this, perhaps by not committing the sqlite db so often, or doing it async, or moving back to the previous backend on Linux?
My understanding is that this is worse on ext3 than on filesystems that don't use Linux's jbd (which I think means reiser4, maybe xfs?).  Can anyone here verify that they see this severe behaviour on non-ext3 filesystems?  Thanks.
Note that some of the observed disk IO problems may have been due to bug 430530, which would exacerbate the fsync issue. Ubuntu backported the fix around May 3, so previous reports that didn't investigate the specific cause may no longer be an issue. [But obviously that's not the whole problem, since comments in this bug have observed the issue with the Places DB.] 
Also, can people who are observing this problem try the commands in comment 28, but with normal instead of 0?  It would be good to know if that's an effective mitigation.
hrm, I don't think the commands in comment 28 actually do anything.  The synchronous pragma isn't stored in the db file, it's just a property of the connection.

Ignore that, I'm wrong.

(In reply to comment #42)
> hrm, I don't think the commands in comment 28 actually do anything.  The
> synchronous pragma isn't stored in the db file, it's just a property of the
> connection.
> 

(In reply to comment #39)
> My understanding is that this is worse on ext3 than on filesystems that don't
> use Linux's jbd (which I think means reiser4, maybe xfs?).  Can anyone here
> verify that they see this severe behaviour on non-ext3 filesystems?  Thanks.

See my comment 7 (https://bugzilla.mozilla.org/show_bug.cgi?id=421482#c7).
This defaults all storage connections to synchronous = NORMAL, and is settable via a pref (toolkit.storage.synchronous) with valid values of 0 (OFF), 1 (NORMAL), and 2 (FULL).

Submitted to the tryserver - will post a link once builds start to surface.
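The mapping from the pref value to the pragma can be pictured roughly as below. This is a sketch in Python, not the patch itself: the function name `apply_synchronous_pref` is hypothetical, and falling back to NORMAL for an out-of-range value is an assumption of this sketch rather than documented patch behavior.

```python
import os
import sqlite3
import tempfile

# Pref values listed in the comment above: 0 (OFF), 1 (NORMAL), 2 (FULL).
PRAGMA_BY_PREF = {0: "OFF", 1: "NORMAL", 2: "FULL"}
DEFAULT_PREF = 1  # NORMAL

def apply_synchronous_pref(conn, pref_value):
    """Apply a pref-style value to one connection and return what SQLite reports.

    Unknown values fall back to NORMAL (an assumption of this sketch).
    """
    if pref_value not in PRAGMA_BY_PREF:
        pref_value = DEFAULT_PREF
    conn.execute(f"PRAGMA synchronous = {PRAGMA_BY_PREF[pref_value]}")
    return conn.execute("PRAGMA synchronous").fetchone()[0]

with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "demo.sqlite"))
    applied = [apply_synchronous_pref(conn, v) for v in (0, 1, 2, 7)]
    conn.close()
print(applied)
```

Reading the pragma back after setting it is a cheap way to confirm which mode a given connection is actually using.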
Regarding comment #37 and comment #39, I understand ext3 is particularly bad with fsync(), doing a whole filesystem sync instead of just the file in question. Which is appalling, and not Firefox's fault. I would expect XFS to get this right (just the file's dirty blocks, thanks) but cannot test that just now. You could ask Dave Chinner for an authoritative answer.

Regarding comment #38, writing a new file is probably not possible with sqlite (well, I suppose it might be, but as a stream of transactions, not remotely an efficient way to generate a file), and more importantly - IT DOES NOT SCALE.
A major gripe of mine with FF2 was that the history file and browser state would get rewritten frequently. It forced me to keep my history short, and since my browser state routinely had tens of windows with tens (and occasionally hundreds) of tabs in various windows, this was a disaster.

A suggestion for mitigating ext3 poor fsync() implementation might be as follows: obviously, do as few fsyncs as possible, but _also_ try to anticipate them. If one is likely, call sync() in another thread, intending to get the
filesystem's dirtiness low for when the fsync comes around. Being in another thread it won't stall the GUI and it may make the stall on fsync() much less.

If you made places do browser state saves in a ticklike fashion, you could then do a sync 1 second ahead of the state save and also whenever a user hit the "bookmark" button. It would require a small layer on top of the sqlite stuff to
hold in memory a chunk of uncommitted state.

Of course I speak freely here as someone who isn't implementing this.

Hard to say what effect this would have on a busy system - might be minor if we're lucky.
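The "anticipate the fsync" idea above could look roughly like this. It is only a sketch in Python, with hypothetical helper names (`presync_in_background`, `durable_write`); `os.sync()` exists only on Unix, so the sketch substitutes a no-op elsewhere, and whether the early sync actually shortens the later fsync is exactly the open question raised in this comment.

```python
import os
import tempfile
import threading

def presync_in_background():
    """Start a whole-system sync() on a worker thread and return the thread."""
    sync = getattr(os, "sync", lambda: None)  # os.sync is Unix-only
    worker = threading.Thread(target=sync, name="presync", daemon=True)
    worker.start()
    return worker

def durable_write(path, data):
    """Write data and fsync it, as a stand-in for the later state save."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return os.path.getsize(path)

with tempfile.TemporaryDirectory() as tmp:
    worker = presync_in_background()  # kicked off ahead of the anticipated save
    # ... the UI thread would keep handling events here ...
    worker.join(timeout=5.0)  # sketch only: a real UI thread would not block on this
    written = durable_write(os.path.join(tmp, "state.bin"), b"state")
print(written)
```

The point of the worker thread is that the expensive flush happens while the UI thread is still free, so the fsync that follows has less dirty data to wait for.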
I think that calling sync on a busy system is causing people's eyes to bleed; doing that in another thread will only save us, and the complaints about the rest of the desktop melting down like a Saharan sundae won't really be addressed at all.

We should, IMO, fsync less frequently via NORMAL by default, and let people who value pep over safety flip the pref to OMG_YOU_CANT_BE_SERIOUS for themselves.  Ideally, we would put that in the user-agent and breakpad reports, for easy, er, triage of dataloss reports associated with that choice. :)
(In reply to comment #48)
> I think that calling sync on a busy system is causing people's eyes to bleed;
> [...]
> We should, IMO, fsync less frequently via NORMAL by default, and let people who
> value pep over safety flip the pref to OMG_YOU_CANT_BE_SERIOUS for themselves. 
> Ideally, we would put that in the user-agent and breakpad reports, for easy,
> er, triage of dataloss reports associated with that choice. :)

I will happily be a guinea pig for the OMG setting. How robust is sqlite3 against incomplete stuff? I don't mind losing my last X minutes of state but would have some concerns about having a trashed db, particularly the bookmarks (which I think are intertwined with the rest of the state, yes?)
(In reply to comment #49)
> I don't mind losing my last X minutes of state but
> would have some concerns about having a trashed db, particularly the bookmarks
> (which I think are intertwined with the rests of the state, yes?)

In the event of a power failure or blue screen, the entire database could be corrupted and unreadable, or it could suffer some less severe error.
Changing the setting from FULL to NORMAL will result in one less fsync per page write in a journal file.  When it is set to FULL, sqlite does an fsync prior to a commit to flush the rollback journal to disk, and then does an fsync to flush the rollback journal header.  When it is set to NORMAL, only the second fsync is done, so this should reduce the problem a bit.
I tried to count the number of fsync calls while running the TalosStandalone tp test.
I modified the firefox script to log a system call summary:
... /usr/bin/strace -c -f -o/tmp/log_$$ "$dist_bin/$MOZILLA_BIN" "$@"

toolkit.storage.synchronous=0 (OFF): 1, 1
toolkit.storage.synchronous=1 (NORMAL): 192, 192, 192
toolkit.storage.synchronous=2 (FULL): 282, 291, 288

So that's about 1/3 fewer fsync calls when using NORMAL instead of FULL in that situation.
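The one-third figure can be checked directly from the counts reported above. The short Python calculation below uses only those numbers, averaging the runs for each setting.

```python
from statistics import mean

# fsync counts reported in the comment above, one entry per test run.
runs = {
    "OFF": [1, 1],
    "NORMAL": [192, 192, 192],
    "FULL": [282, 291, 288],
}

avg = {mode: mean(counts) for mode, counts in runs.items()}
reduction = 1 - avg["NORMAL"] / avg["FULL"]
print(f"NORMAL issues {reduction:.1%} fewer fsyncs than FULL")
```

With an average of 287 calls for FULL against 192 for NORMAL, the reduction comes out at about 33%.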
(In reply to comment #31)
> (In reply to comment #28)
> > Here's a workaround if you don't want to recompile, and are willing to
> > take the following risk:
> > 
> > "With synchronous OFF (0), SQLite continues without pausing as soon as it
> > has handed data off to the operating system. If the application running
> > SQLite crashes, the data will be safe, but the database might become
> > corrupted if the operating system crashes or the computer loses power
> > before that data has been written to the disk surface."
>
> I strongly discourage folks from doing that because the database can become
> corrupted.

How can the database become corrupted on an ordered filesystem?

Or I guess my question is really: why is an fsync required to prevent
corruption on an ordered filesystem?

http://bugzilla.kernel.org/show_bug.cgi?id=9546#c23
(In reply to comment #50)
> (In reply to comment #49)
> > I don't mind losing my last X minutes of state but
> > would have some concerns about having a trashed db, particularly the bookmarks
> > (which I think are intertwined with the rests of the state, yes?)
> 
> In the event of a power failure or blue screen, the entire database could be
> corrupted and unreadable, or any less severe error.

I'm happy that an OS may eat my data - that's what backups are for (note to self - backup the sqlite files(!)). I'm not happy for an app crash to eat my data (losing recent data ok, corrupting the db not ok). The "With synchronous OFF (0), SQLite continues without pausing as soon as it has handed data off to the operating system" suggests to me that an app crash should not corrupt the database.

So, I'm happy to run with synchronous OFF. How do I choose that mode from user land (outside the app, perhaps via the prefs.js file - ideal - or via an envvar)?
(In reply to comment #53)
> How can the database become corrupted on an ordered filesystem?

I thought that "ordered", if we're running on such a filesystem, normally refers
only to the journaling of the filesystem metadata and not to user writes.
How about keeping a transaction open for, say, a minute, and commit it on some sort of timer or when a certain amount of changes have been made.  Also build SQLite with the VFS options for Safe Append.  I am fairly sure most Linux journaled FS implement that.  Maybe not XFS but I believe that's being fixed.

Inside a transaction, with pragma synchronous=NORMAL and with "safe append" compiled in, I believe SQLite will not flush at all until the COMMIT command.

It would be spiffy to change SQLite to have some transaction mode where it could commit parts of an uncommitted transaction on crash reload.  Some sort of nested transaction, an outer BEGIN NOT-REALLY-A-TRANSACTION; :) and some inner BEGIN; COMMIT; groups, and every 5 minutes a commit on the outer transaction.  Each of the inner, real transactions would have to fully succeed or fail, but partial processing of NOT-REALLY would be OK.
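The first suggestion in this comment, keeping a transaction open and committing on a timer or after a certain number of changes, might be sketched like this in Python. The class `BatchingWriter` and its parameters are hypothetical names for illustration; it does not implement the nested "not really a transaction" idea, and anything still pending at a crash is lost.

```python
import sqlite3
import time

class BatchingWriter:
    """Hold one transaction open; commit after max_pending changes or max_age seconds."""

    def __init__(self, conn, max_pending=100, max_age=60.0, clock=time.monotonic):
        self.conn = conn
        self.max_pending = max_pending
        self.max_age = max_age
        self.clock = clock
        self.pending = 0
        self.opened_at = None
        self.commits = 0

    def execute(self, sql, params=()):
        """Run one write statement inside the current open transaction."""
        if self.pending == 0:
            self.opened_at = self.clock()
        self.conn.execute(sql, params)  # sqlite3 opens the transaction implicitly
        self.pending += 1
        self.maybe_commit()

    def maybe_commit(self):
        """Commit if the batch is big enough or old enough; a timer can call this too."""
        if self.pending == 0:
            return
        too_many = self.pending >= self.max_pending
        too_old = self.clock() - self.opened_at >= self.max_age
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Commit whatever is pending, for example at shutdown."""
        self.conn.commit()
        self.commits += 1
        self.pending = 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (url TEXT)")
writer = BatchingWriter(conn, max_pending=10)
for i in range(25):
    writer.execute("INSERT INTO visits VALUES (?)", (f"http://example.com/{i}",))
writer.flush()
row_count = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(writer.commits, row_count)
```

Here 25 inserts result in three commits (two full batches of ten plus the final flush) instead of 25, which is the trade of durability for fewer syncs discussed throughout this bug.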
(In reply to comment #55)
> (In reply to comment #53)
> > How can the database become corrupted on an ordered filesystem?
> 
> I thought normally "ordered" filesystem, if we're running on one, refer to
> only the journaling of the filesystem metadata and not to user writes.

Right, thanks.  Only the metadata, and "When it's time to write the new
metadata out to disk, the associated data blocks are written first".
http://www.ibm.com/developerworks/linux/library/l-fs8.html#4

"ordered" is enough for SQLITE_IOCAP_SAFE_APPEND, but not enough to ensure
that the sqlite journal gets written _before_ modifications to the database
file.  So even if SQLITE_IOCAP_SAFE_APPEND is set, COMMIT would still fsync:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/db/sqlite3/src/sqlite3.c&rev=1.18&mark=23088-23090#23060

So I guess this comes down to how much the "D" in "ACID" is needed,
(discussion starting around bug 408914 comment 42, which looks like it
resulted in bug 429986), or perhaps a periodic "D" through batching commits
every ten minutes or something (comment 56 and bug 422127).
I've been testing out the patch, https://bugzilla.mozilla.org/attachment.cgi?id=320806, and it makes a world of difference; without the patch Firefox trunk would often lock up. We need this patch in 3.0-release.
I just registered an account to request inclusion of the patch in #45 but #58 just beat me to it.

A few things I want to point out though (I hope this is not too flame-ish):

Reading http://www.sqlite.org/atomiccommit.html is a bit depressing. Even if the assumptions are reasonable, it's pure guesswork with no real checks done. One thing that worried me was the "sector size" thing. Last time I checked (on NTFS, using DiskMon) writing a single byte in a file caused 4K to be written to disk (not sure if that's the filesystem cluster size or the machine page size, since both are the same). This might depend on which system call is used to do the writes (in this test WriteFile() was used, and a quick inspection reveals that that's what sqlite3.c uses). This would mean that the 512-byte "sector" assumption is incorrect. I expect other filesystems and operating systems to do the same, since this is usually implemented via a memory-mapped cache where any write to a page marks it as dirty, and pages are 4K on most architectures. I think this currently isn't a problem in Mozilla since it overrides the default page size and uses 4K via pragma.

In any case, the assumptions made are overly pessimistic. For one thing, there's no such thing as a "partially written sector" on modern hardware. You get either the old data, the new data, or (very very unlikely) a CRC error. SQLITE_IOCAP_SAFE_APPEND is definitely safe to use: in the event of a failure while you are expanding a file, you'll not get random data or data from deleted files such as old journals (that'd be, amongst other things, a security hole in the OS). At worst you'll get all zeros, which will be detected as invalid (sqlite guarantees this as every journal page has its own page number and a checksum written on itself).

To implement transactions, you need just one barrier. If you want durability, then you need another, yet I can't imagine anything in a browser needing durability. Besides, both Windows and Linux end up flushing very fast to disk (both do it in less than 5 seconds by default IIRC).
Hello folks,
  I'd like to make a comment from an end-user perspective. First of all, I want to note that I observe excessive IO on Windows as well.

But my question here comes to this: how come my browser ended up being a bank account database system? I mean, it's not an ATM! I, as a user, don't want to pay such an amount of CPU for my bookmarks; I mean, I don't care about 'bookmarks' if it costs me system responsiveness and my browser experience!

I understand all the ACID stuff, but I'm wondering why FF needs to do such an amount of db activity while I'm happily ignorant browsing the net.

What is actually being updated?
I'm trying to answer this by myself: I'm intercepting all the sqlite3_exec and will try to come up with some analysis soon.
Attached file sqlite3_exec calls while typing an URL (obsolete) —
raw data, no analysis yet. Look for 'SQL:' entries (sorry my other debugging stuff).
Anyway, MAN! I just typed 'google', and see the amount of db activity (delete, update), just for typing an url! (not hitting enter).
Sorry guys, my first time on this.
This is the log when navigating 2 pages. Again, just consider lines beginning with "SQL:".
Seems like I'm missing something, since only BEGIN DEFERRED and COMMIT TRANSACTION statements were captured. Any suggestion? I'll see during Sunday.
Flags: wanted1.9.0.x+
Whiteboard: [RC2?]
(In reply to comment #62)
> Created an attachment (id=321478) [details]
> sqlite3_exec calls while typing an URL
> 
> raw data, no analysis yet. Look for 'SQL:' entries (sorry my other debugging
> stuff).
> Anyway, MAN! I just typed 'google', and see the amount of db activity (delete,
> update), just for typing an url! (not hitting enter).
> Sorry guys, my first time on this.
> 

Hrm - I'm not seeing similar behavior.   I wonder if this is from the urlclassifier (anti-phishing/malware database), which updates in a background thread.  You can disable both in the security tab.
In that case, shouldn't it take place AFTER I hit enter/go? I really think that it's accessing the bookmarks and history, because of the table it accesses. I'm uploading fresh data here, because I don't see any exec while browsing the net, so I'm wondering if there is some trigger. Data will be uploaded in few minutes, containing the output of sqlite3_total_changes() after each exec.
(this is the first time I mess with SQLite at all, so I might be missing important stuff here. Please let me know if so).
look for SQL: in the text log.
I typed 'google' and didn't hit enter/go.
Attachment #321478 - Attachment is obsolete: true
I navigated 2 pages. Somebody might want to replicate on Linux maybe? I don't expect any difference there, since the SQLite calls are in platform-independent code, I guess.

Still look for 'SQL:' text in log messages. 'count=x' is the output of sqlite3_total_changes() after each sqlite3_exec.

Some observations: please note the number of statements executed as 'blank' statements; what does it mean?
Attachment #321479 - Attachment is obsolete: true
I might have uploaded the wrong log file, but my question is:
considering that I added a log message in the sqlite3_exec function, and a log message at the end of the function showing the number of updated records, why am I getting this?

[3468] SQL: BEGIN DEFERRED
[3468] SQL: count=211
[3468] SQL: COMMIT TRANSACTION
[3468] SQL: count=221
[3468] SQL: BEGIN DEFERRED
[3468] SQL: count=88
[3468] SQL: 
[3468] SQL: count=88
[3468] SQL: 
[3468] SQL: count=89
[3468] SQL: 
[3468] SQL: count=90
[3468] SQL: 
[3468] SQL: count=91
[3468] SQL: COMMIT TRANSACTION
[3468] SQL: count=92

In other words, where are the SQL statements that cause the counter to grow? They don't go through sqlite3_exec.

Any hint?
They go through sqlite's statement API. sqlite3_exec is for one-shot statements that aren't precompiled or re-used.
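To illustrate the point about the statement API, here is a minimal sketch using Python's sqlite3 module rather than the C API used in the browser. It assumes (to the best of my understanding) that this module compiles and steps statements instead of calling sqlite3_exec; the table and URLs are made up for the example. The change counter grows even though nothing here would show up in a log hook placed inside sqlite3_exec.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (url TEXT)")  # hypothetical table

changes_before = conn.total_changes

# One parameterized statement, compiled once and stepped once per row.
conn.executemany(
    "INSERT INTO visits (url) VALUES (?)",
    [("http://a.example/",), ("http://b.example/",), ("http://c.example/",)],
)
conn.commit()

# Python's counterpart of sqlite3_total_changes() for this connection.
changes_after = conn.total_changes
rows_changed = changes_after - changes_before
```

So a counter that moves between BEGIN and COMMIT without any logged SQL is what one would expect when the writes come from precompiled statements.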
We are not going to force a respin for this.  If we do an RC2, we will get that patch, otherwise we will take it in 3.0.1.

We would like linux distros to take this patch for their builds if at all possible for 3.0.
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.1+
Flags: blocking-firefox3?
Flags: blocking-firefox3-
Note:  This patch defaults to NORMAL for *all* platforms.  Each platform would have to add the right pref in the browser's preference file to toggle this to the value it wants (as the patch is currently written at least).
(In reply to comment #58)
> I've been testing out the patch,
> https://bugzilla.mozilla.org/attachment.cgi?id=320806  and it makes a world of
> difference, without the patch Firefox trunk would often lock up. We need this
> patch in 3.0-release.

Can you confirm, please, that the patch makes a world of difference even with the default value (1 - NORMAL) for toolkit.storage.synchronous?  (It surprises me that removing only 1 of 3 fsyncs would make such a difference.)

(Beware: it looks like the browser may need to be restarted for a preference change to take effect.)
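For readers unfamiliar with the setting being discussed, this is a small sketch of what choosing a synchronous level amounts to at the SQLite level, written with Python's sqlite3 module. It is not the patch itself; the function names and file name are hypothetical. The numeric values (0 = OFF, 1 = NORMAL, 2 = FULL) are SQLite's own.

```python
import os
import sqlite3
import tempfile

# SQLite's numeric codes for PRAGMA synchronous.
SYNCHRONOUS_LEVELS = {"OFF": 0, "NORMAL": 1, "FULL": 2}

def open_connection(path, level="NORMAL"):
    """Open a connection and apply the requested synchronous level."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA synchronous = %d" % SYNCHRONOUS_LEVELS[level])
    return conn

def synchronous_level(conn):
    """Read back the level currently in effect on this connection."""
    return conn.execute("PRAGMA synchronous").fetchone()[0]

db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")

normal_conn = open_connection(db_path)
normal_level = synchronous_level(normal_conn)
normal_conn.close()

off_conn = open_connection(db_path, "OFF")
off_level = synchronous_level(off_conn)
off_conn.close()
```

The setting belongs to the connection, not the database file, so it has to be applied each time a connection is opened. That would be consistent with a pref change only taking effect after a restart.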
Attachment #320806 - Flags: review?(shaver)
Assignee: nobody → sdwilsh
Component: Places → Storage
Flags: blocking-firefox3-
Product: Firefox → Toolkit
QA Contact: places → storage
Whiteboard: [RC2?] → [RC2?][has patch][needs review shaver]
Comment on attachment 320806 [details] [diff] [review]
default to synchronous = NORMAL for storage connections

r=shaver, recommend that we take this at the earliest possible ship vehicle.
Attachment #320806 - Flags: review?(shaver) → review+
Don't forget: Thrashing the disk is particularly bad for MOBILE users-- wrecking battery life. People are already noticing. It's not _just_ lock-up of the GUI....
(In reply to comment #7)
> I strongly recommend using the sqlite pragma commands to disable data
> security.  Use temporary copies of the database ...

I wonder whether this is something that should be considered as a moderately
easy-to-implement solution.

Are database sizes (of those that we care enough about) small enough that this
would be reasonable?  (I'm afraid webappstore may not be.)

1) USE synchronous OFF

   This still provides up-to-date durability and consistency against
   application interruption, but the database may become corrupted on
   filesystem interruption (OS crash, power failure, etc.).

2) Periodically copy (back up) the database file at a time when it is
   consistent (at a filesystem level - it need not be flushed to disk).  This
   may be after a commit.

   If the copy of the database is made to a _new_ file, I would expect (but
   I'm not certain) that the filesystems (that we care about) would provide
   the necessary consistency.

   Durability against filesystem interruption would only be as far as the most
   recent back up, the period of which might depend on file size.

This should mean that no fsyncs are required.  The filesystem can spin up
the drive and write to disk at its leisure.

The difficult part of this solution, I assume would be detection of a
corrupted database requiring fallback to the backup.  From what I hear, it
sounds like sqlite checksums will detect this during lookup, but not
necessarily when opening the database (see bug 434805).
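A rough sketch of steps 1) and 2) above, in Python. Note the substitution: instead of copying the file at the filesystem level, this uses SQLite's online backup API (Connection.backup, Python 3.7+), which takes a consistent snapshot by itself. The file names and the bookmarks table are hypothetical, and no fsync is issued anywhere, as proposed.

```python
import os
import sqlite3
import tempfile

def backup_database(live_conn, backup_path):
    """Copy a consistent snapshot of the live database to backup_path."""
    tmp_path = backup_path + ".tmp"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)            # discard a half-written earlier attempt
    dest = sqlite3.connect(tmp_path)   # always write the copy to a new file
    try:
        live_conn.backup(dest)         # SQLite online backup API
    finally:
        dest.close()
    os.replace(tmp_path, backup_path)  # swap the finished copy into place

workdir = tempfile.mkdtemp()
live = sqlite3.connect(os.path.join(workdir, "places-demo.sqlite"))
live.execute("PRAGMA synchronous = OFF")   # step 1: no fsync on commit
live.execute("CREATE TABLE bookmarks (url TEXT)")
live.execute("INSERT INTO bookmarks VALUES ('http://example.org/')")
live.commit()

backup_file = os.path.join(workdir, "places-demo.backup.sqlite")
backup_database(live, backup_file)         # step 2: run after a commit

restored = sqlite3.connect(backup_file)
backup_rows = restored.execute("SELECT url FROM bookmarks").fetchall()
restored.close()
live.close()
```

Writing to a temporary name and renaming means a reader never sees a partially written backup, though without a flush the rename itself is only as durable as the filesystem makes it.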
Just one question: will I have to hire a DBA for reading the newspaper?

Now without irony: do we want to pay for all this DB stuff for history and bookmarks? IMVHO, it's too high a cost.

Automatic backups? What's the cost? Did you do any measurement of resource consumption? Another very simple question: can the DB size ever decrease, or is it always growing?

If FF is going to support this, I think there should be a setting for turning it off for those users not willing to pay this overhead for these features. It's just one opinion.
Just checked on my mysql server, it requires 0 fsyncs to commit data to its database. Which is what you get when you use the correct tools to make the OS do what you want.

The problem noted in this bug is a problem in SQLite, not in firefox. They should be using the O_SYNC argument to open, instead of flushing the whole disk all the time.
For those of you on linux seeing performance issues, I have a few questions that I'd love to have answered:
1) are you using a distro provided build of Firefox?
2) if so, can you go to about:buildconfig and let me know if it was compiled with "--enable-system-sqlite"?
3) if it was, can you please tell me the version number of your sqlite?
(In reply to comment #79)
> For those of you on linux seeing performance issues, I have a few questions
> that I'd love to have answered:
> 1) are you using a distro provided build of Firefox?
> 2) if so, can you go to about:buildconfig and let me know if it was compiled
> with "--enable-system-sqlite"?
> 3) if it was, can you please tell me the version number of your sqlite?

I'm not seeing the issue myself but I can explain how Ubuntu packages firefox wrt sqlite.

The recently released 8.04/Hardy (and the upcoming 8.04.x updates) are not
using system sqlite for firefox 3 as our sqlite is too old (4.3.2). The next Ubuntu (8.10/Intrepid) currently in development already has sqlite 3.5.9 in so
firefox 3 is built with --enable-system-sqlite (there's a test in our build
system).

As for Hardy, we decided today to take attachment 320806 [details] [diff] [review] downstream in our RC1 build so performance should be better for our users.

(In reply to comment #80)
> The recently released 8.04/Hardy (and the upcoming 8.04.x updates) are not
> using system sqlite for firefox 3 as our sqlite is too old (4.3.2). The next
> Ubuntu (8.10/Intrepid) currently in development already has sqlite 3.5.9 in so
> firefox 3 is built with --enable-system-sqlite (there's a test in our build
> system).
4.3.2?

Note: we haven't qa'd 3.5.9 yet ourselves to make sure it works out well.  We've had issues with various versions between what we have now and up to 3.5.8.

What do you mean by "there's a test in our build system"?  A version test to make sure it's compatible, or what?

> As for Hardy, we decided today to take attachment 320806 [details] [diff] [review] downstream in our RC1
> build so performance should be better for our users.
I'm glad.  Last I knew we were planning on distros to do this since this bug seems to hit linux harder than other systems.
typo, I meant 3.4.2 in Hardy.

> What do you mean by "there's a test in our build system"?  A version test to
> make sure it's compatible, or what?

just a version test toggling --{en,dis}able-system-sqlite
(In reply to comment #78)
> The problem noted in this bug is a problem in SQLite, not in firefox. They
> should be using the O_SYNC argument to open, instead of flushing the whole disk
> all the time.
 
If I'm not mistaken, O_SYNC would actually require data syncs after every write, not only twice per transaction, while fsync is not sync and so shouldn't necessarily flush the whole disk.
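The difference between the two approaches can be sketched as follows (Python, Unix-style file descriptors; the function and file names are made up). With O_SYNC every write() is asked to reach stable storage before returning, while the fsync variant buffers all writes and pays for a single flush at the end. Both leave identical file contents; only the number of synchronous waits differs.

```python
import os
import tempfile

def write_then_fsync(path, chunks):
    """Buffered writes followed by one explicit flush."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for chunk in chunks:
            os.write(fd, chunk)   # lands in the OS cache only
        os.fsync(fd)              # one synchronous wait for everything
    finally:
        os.close(fd)

def write_with_o_sync(path, chunks):
    """Every write waits for the device (where O_SYNC is available)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        for chunk in chunks:
            os.write(fd, chunk)   # one synchronous wait per call
    finally:
        os.close(fd)

workdir = tempfile.mkdtemp()
pages = [b"page-%d\n" % i for i in range(4)]
fsync_path = os.path.join(workdir, "fsync.bin")
osync_path = os.path.join(workdir, "osync.bin")
write_then_fsync(fsync_path, pages)
write_with_o_sync(osync_path, pages)

with open(fsync_path, "rb") as f:
    fsync_bytes = f.read()
with open(osync_path, "rb") as f:
    osync_bytes = f.read()
```

Timing the two under concurrent I/O load on the filesystem in question would be the interesting measurement; this sketch only shows the shape of each approach.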
(In reply to comment #10)
> (In reply to comment #9)
> > safety. Maybe this is naive, but is the database corruption risk real in
> > practice?
> Yes

This claim deserves a few more words.  Since these sqlite databases are not
shared with other concurrent users, the only thing these fsyncs accomplish
is provide some degree of protection in case of a machine crash.  Considering
the infrequency of that event, the minor consequences of corruption of browser
history (or even bookmarks), making people pay real latency time and power
costs seems to require better-than-monosyllabic justification.

By the way, does sqlite know about fdatasync()?
(In reply to comment #84)
> By the way, does sqlite know about fdatasync()?

Yes, but maybe also no.  There is an #ifdef HAVE_FDATASYNC block in the code, and has been for a long time.  But I wouldn't count on any particular build of sqlite to use it.  I'm pretty sure that versions prior to 3.5.0 bundled a configure script that did not check for fdatasync.  I dimly recall that the 3.5.x series has an improved configure script, but I don't have that version to hand and wouldn't assume this was corrected without checking it.  The version bundled with mozilla has been shorn of its configure script and does not appear to get *any* HAVE_* defines, not even from mozilla's own configure script.

Do you know whether fdatasync() has the same performance problems that fsync() does, on Linux/ext3?  I don't.
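The HAVE_FDATASYNC pattern described above is a compile-time feature check. A runtime analogue in Python, purely as an illustration (the helper name is hypothetical), prefers fdatasync() where the platform exposes it and falls back to fsync() otherwise:

```python
import os
import tempfile

def sync_data(fd):
    """Flush file data, preferring fdatasync() where it exists.

    Returns the name of the call that was used.
    """
    if hasattr(os, "fdatasync"):   # not every platform provides it
        os.fdatasync(fd)
        return "fdatasync"
    os.fsync(fd)                   # safe fallback: flushes data and metadata
    return "fsync"

fd, scratch_path = tempfile.mkstemp()
os.write(fd, b"some journal bytes")
call_used = sync_data(fd)
os.close(fd)
os.remove(scratch_path)
```

Which call ends up being used says nothing about how fast it is on a given filesystem, which is the open question in the comment above.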
(In reply to comment #84)
> (In reply to comment #10)
> > (In reply to comment #9)
> > > safety. Maybe this is naive, but is the database corruption risk real in
> > > practice?
> > Yes
> 
> This claim deserves a few more words.  Since these sqlite databases are not
> shared with other concurrent users, the only thing these fsyncs accomplish
> is provide some degree of protection in case of a machine crash.  Considering
> the infrequency of that event, the minor consequences of corruption of browser
> history (or even bookmarks), making people pay real latency time and power
> costs seems to require better-than-monosyllabic justification.
> 

"I've lost my bookmarks" for a long time has been the #1 support request from folks - often due to profile corruption which does cause real harm and can come from power loss, crashes, any number of reasons.  This is one of the reasons we wanted a more robust data store and also built in backups for bookmarks.  It is a balance (safety and performance) but given modern systems we should be able to get both.

There was an unrelated issue in that certain versions of linux distributions were linking against the system sqlite - which unfortunately was version 3.5.8, which has a bad performance regression (http://www.sqlite.org/cvstrac/tktview?tn=3015) that hits us particularly hard.  That issue has understandably been confused with this one in causing performance problems.

We need to make sure we have *data* that people are seeing real system problems directly related to fsync issues before we jump to any conclusions.  



Whiteboard: [RC2?][has patch][needs review shaver] → [RC2?][has patch][has review][can land]
firefox 3.0rc1 is just as bad. I seem to be using a copy using non-system sqlite. Going to try the suggested patch.
firefox 3.0rc1 with the 320806 patch isn't any better.
(In reply to comment #86)
[snip]
> from power loss, crashes, any number of reasons.  This is one of the reasons we
> wanted a more robust data store and also built in backups for bookmarks.  It is
> a balance (safety and performance) but given modern systems we should be able
> to get both.

What about the following:
 - during the session, we keep new bookmarks to a flat text file.
 - a background thread copies the flat file data into the DB, with all the necessary blocking IO (but just blocking to the copying thread)
 - when FF closes, we wait for the copying thread to finish any pending data copy from the file to the DB. After it finishes, we remove the text file.

 - when FF starts up, it checks whether the text file exists. If it does, a crash occurred. In that case, a message tells the user that data from the previous session is still waiting to be copied to the DB, and asks the user to please wait until the job is done. OR, even better, when FF starts, the background copying thread performs a diff to detect which was the last copied bookmark, and starts from there.

 - bookmark search operations are performed in the DB. When opening the bookmarks window, the UI waits for the background thread to finish updating the DB (which should not take that much).

What do you think? We would be doing an appending operation to a text file, a read operation from that file and write in the DB in background, and a recover capability in the less frequent situations of crashes.
And, IO waits take place in also less frequent situations: start-up after a crash, FF close, and bookmarks window opening.

 ?
Guys, sorry if this looks too dumb. I'm just trying to help, so I made a graphic of the idea.
Looks like I applied the patch to the wrong package. Though I am having trouble compiling that package.
Ok, I got it to compile, and it is maybe a touch better, but not significantly better.
Please stop spamming the bug with ridiculous suggestions.  Nobody is going to dump sqlite at this stage of development.  This bug exists to make sure that our use of sqlite is correct and optimal.
(In reply to comment #85)
> (In reply to comment #84)
> > By the way, does sqlite know about fdatasync()?

> Do you know whether fdatasync() has the same performance problems that fsync()
> does, on Linux/ext3?  I don't.

man fdatasync sounded hopeful for a partial solution, as I thought metadata would not need flushing too often

  fdatasync() is similar to fsync(), but does not flush modified metadata
  unless that metadata is needed in order to allow a subsequent data retrieval
  to be correctly handled.  For example, changes to st_atime or st_mtime
  (respectively, time of last access and time of last modification; see
  stat(2)) do not require flushing because they are not necessary for a
  subsequent data read to be handled correctly.  On the other hand, a change
  to the file size (st_size, as made by say ftruncate(2)), would require a
  metadata flush.

but it did not solve the problem.

fdatasync times in seconds
(from "strace -p 350 -tt -T -e trace=fsync,fdatasync")
while recursively copying a directory on the same ext3 filesystem with
data=ordered.

17:00:27.360754 fdatasync(39)           = 0 <6.385316>
17:00:33.910754 fdatasync(26)           = 0 <0.002104>
17:03:35.927404 fdatasync(39)           = 0 <15.165946>
17:03:51.094504 fdatasync(26)           = 0 <11.332884>
17:04:09.910734 fdatasync(39)           = 0 <15.647136>
17:04:25.876871 fdatasync(26)           = 0 <3.729123>
17:05:27.681558 fdatasync(39)           = 0 <4.816529>
17:05:32.624890 fdatasync(26)           = 0 <8.096333>
17:05:49.064045 fdatasync(39)           = 0 <20.433656>
17:06:09.563202 fdatasync(26)           = 0 <0.062285>

(The shorter times were actually while I was finding larger directories to
copy.)
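For anyone collecting similar traces, a small Python sketch for summarizing them per file descriptor. It assumes output in the "strace -tt -T" layout shown above; the regular expression and function name are my own, and the sample is the transcript from this comment.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 17:00:27.360754 fdatasync(39)           = 0 <6.385316>
SYNC_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<call>fsync|fdatasync)\((?P<fd>\d+)\)\s+=\s+(?P<ret>-?\d+)"
    r".*<(?P<secs>\d+\.\d+)>\s*$"
)

def summarize(trace_text):
    """Return {fd: {"calls", "total", "worst"}} for fsync/fdatasync lines."""
    stats = defaultdict(lambda: {"calls": 0, "total": 0.0, "worst": 0.0})
    for line in trace_text.splitlines():
        match = SYNC_LINE.match(line.strip())
        if not match:
            continue                      # ignore anything that is not a sync call
        secs = float(match.group("secs"))
        entry = stats[int(match.group("fd"))]
        entry["calls"] += 1
        entry["total"] += secs
        entry["worst"] = max(entry["worst"], secs)
    return dict(stats)

SAMPLE = """\
17:00:27.360754 fdatasync(39)           = 0 <6.385316>
17:00:33.910754 fdatasync(26)           = 0 <0.002104>
17:03:35.927404 fdatasync(39)           = 0 <15.165946>
17:03:51.094504 fdatasync(26)           = 0 <11.332884>
17:04:09.910734 fdatasync(39)           = 0 <15.647136>
17:04:25.876871 fdatasync(26)           = 0 <3.729123>
17:05:27.681558 fdatasync(39)           = 0 <4.816529>
17:05:32.624890 fdatasync(26)           = 0 <8.096333>
17:05:49.064045 fdatasync(39)           = 0 <20.433656>
17:06:09.563202 fdatasync(26)           = 0 <0.062285>
"""

summary = summarize(SAMPLE)
```

The per-descriptor worst case is probably the number that matters for UI pauses, more than the call count.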
Attachment #322061 - Attachment is obsolete: true
Anecdotal observation (your mileage may vary):

Concerned users can, meanwhile, relieve some of the delay stress by tuning the relevant ext3 file systems to journal_data mode.

e.g. "tune2fs -o journal_data /dev/sda1" and reboot once, where your file system device may vary. (Note that I also made my journal bigger)

This seems to allow the fsync to return when the data and metadata is written/flushed to the file system journal as opposed to needing to flush the journal to the file system in general.  Various web searches on the topic seem to indict the "ordered data" concept in needing to do this cumulative flush, with commentary on the topic going back as far as the turn of the century [see http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html]. The common default ext3 is journal_data_ordered which implies the sequence ordered flushing observed along with metadata-only journaling.

I have been using the Ubuntu distro with the firefox 3 Beta 5 for weeks with no noticeable pauses.  I was completely unaware of this as a possible issue because I was not seeing such pauses.

Note that fdatasync _should_ still be subject to the data write ordering issues that fsync exhibits if I understand what is happening correctly.

I would call this a linux tuning issue not a firefox problem.

I don't have any hard metrics for this at this time.
Whiteboard: [RC2?][has patch][has review][can land] → [RC2?][has patch][has review][needs approval]
I have been plagued for ages with the nightlies being unstable and just having a grey screen. This goes in intermittent batches and requires frequent 'force quit' and then crashes on restart. It's always accompanied by lots of disk activity so I guess this is the correct bug.

I had it on Ubuntu Gutsy and now Hardy!

I eliminated accessibility (AT-SPI) being the cause.

It seems that keeping my bloglines page open in a tab and restarting is a very good way to cause a hang and crash. When I don't do that I have fewer problems. So I postulated if Ajax related ads also have GMail saved.

I'd like to see a fix shipped as at times it is completely unusable. Even if it's an interaction with linux file sys

I will try the journal fix in comment 96 and report back
(In reply to comment #86)
> > This claim deserves a few more words.  Since these sqlite databases are not
> > shared with other concurrent users, the only thing these fsyncs accomplish
> > is provide some degree of protection in case of a machine crash.  

> "I've lost my bookmarks" for a long time has been the #1 support request from
> folks - often due to profile corruption which does cause real harm and can come
> from power loss, crashes, any number of reasons.

The question is how many of these support requests actually came from cases
where this eager fsync business would have made a difference: not just
mozilla crashes, but machine crashes that preclude the OS from routinely
flushing its I/O buffers.

If it turns out that many of these requests did involve machine crashes,
then sqlite still begs the question, for fsync()ing any old data file
format, be it old style text or whatever, would have provided the same
degree of protection.

> We need to make sure we have *data* that people are seeing real system problems
> directly related to fsync issues before we jump to any conclusions.  

Each of those strace transcripts above seems completely convincing.
(In reply to comment #99)
> Each of those strace transcripts above seems completely convincing.
Counting the number of fsyncs is hardly convincing data.  That's like counting the number of mallocs - it can provide insight but doesn't give you anything you can really use.
Shawn, the transcripts don't only show the number of f*sync calls,
but their duration in the <angle brackets> too.
(In reply to comment #101)
> Shawn, the transcripts don't only show the number of f*sync calls,
> but their duration in the <angle brackets> too.
The only fsync data I see with numbers for duration is from comment 94 where the commenter recursively copied a directory - that was not sqlite doing fsyncs.
See comment #8.  Each strace was focused on the firefox process.
A mere recursive copy wouldn't generate fsyncs anyway.
The number and frequency of fsyncs is pretty convincing to me when taken with the known-pessimal behaviour of ext3-fsync when under concurrent I/O load.

What data would you prefer, and how do you propose getting it?  Instrumenting the kernel to report on what's causing I/O wait times isn't impossible (kprobes might suffice without any source changes), but I'm not going to do it for you. :)

I'd recommend spinning up the Linux system of your choice, using ext3 with data=ordered, and starting a compile on the same partition as your profile.

Filesystem developers know that sqlite's fsync frequency and ext3/jbd's whole-journal behaviour in ordered mode interact poorly, and recommend that we reduce the frequency of fsyncs to mitigate.  Modern systems provide the atomic-sector update and append love that are partial motivation for the conservative fsyncing, and with this patch distro vendors can select OFF if they know more than we can about their fs configuration and capabilities at this time.  With data=journal we don't need the fsync at all.  sqlite could select that with an ioctl on the file when it's created, maybe we should ask for an API to do that sort of "set sufficiently synchronous" thing -- it could fall back to the fsync parade if it didn't know how to handle it better in an fs-specific way.

(fdatasync is the same as fsync in ext3 today, though there are patches of February-2008 vintage that will help repair that in the possibly-ext4 future.  A nice idea, though!)
(In reply to comment #103)
> See comment #8.  Each strace was focused on the firefox process.
> A mere recursive copy wouldn't generate fsyncs anyway.
Missed that one.  What that doesn't tell us though is what thread those are running on.  If those fsync calls are not on the main thread (we do have things that write on background threads), they are not the cause of the UI locking up.
From drh regarding O_SYNC flag on open:
http://www.mail-archive.com/sqlite-users@sqlite.org/msg34128.html
Shawn: it will be the cause of the brutal effects on the rest of the system regardless of which thread it's on.  We know that places causes frequent fsyncs, we know that it does that on the main thread, and we know that the block-duration of the fsyncs is related to the amount of outstanding I/O on that filesystem in the common (ext3/jbd + ordered) case.  People have patched their builds to test the fsync-delay correlation (thanks Sylvain), and produced compelling results.

The gun is smokier here than it ever was for async I/O causing corruption, to choose an example entirely not at random.  If you don't want to believe what I and other domain experts are telling you, or what people are reporting in tests on their systems, then you should spin up Linux and do more diagnostics yourself, and promptly.
We know that O_SYNC would require more disk writes, but it wouldn't cause us to block on _all_ outstanding data being written to oxide.  If the multiple writes are to the same OS page, then only the last one would need to be written with sync to get the effects we desire, I believe.  (My recollection is that O_SYNC and fsync don't actually get as far as oxide before returning, as of Linux 2.6.recent.  They get to the disk's cache, and you have to use hdparm or other craziness if you care that much. mysql does not, as a data point.)
(In reply to comment #107)
You seem to be making the assumption that I don't think this is a problem, which is inaccurate.  Schrep, who is a 1.9 driver (yes, I'm aware you are as well), asked for "*real* data" in comment 86, and talking with him on irc last night indicated that fsync counts wasn't what he was looking for.  I figured since he's in a timezone that's three hours behind us, it would be constructive to point out that the number of fsyncs isn't what he was looking for since getting the "right" fix sooner rather than later for this issue would be a good thing.  However, that seems to have painted me as "the man who thinks this isn't an issue."  Lesson learned - I'll stay out of this.
I'm assuming that you personally believe that more data is needed to indicate that this is a real problem, or that fsyncs are the cause of it, since you asked for more data and dismissed what was here as insufficient.

Number of fsyncs is irrelevant to global system performance, except where the number is > 0, meaning that we're causing all I/O on that filesystem to be flushed before anyone's I/O can make progress.  Having fewer fsyncs may result in longer "peak pause", in fact, since it will be waiting for 3 sec of accumulated I/O to flow out instead of 1 sec's worth.  Once we're beyond the buffer write-out period, reducing frequency might make sense since on average we'll see the same peak pauses but fewer of them.  (I think that's 5 seconds or so, but there are a lot of scheduling knobs which are new since I lived and died by jbd journal performance, so I could be out of date.)

Gathering "real data" seems trivial to me, if "real data" means "I observe it myself rather than relying on reports in bugs" -- boot linux, verify FS config, start FF, start write-intensive task, navigate.
Since my question regarding whether machine crashes were the only
real source of concern wasn't answered, how about pondering this
question instead.  If in fact data-losing crashes were so frequent,
why do most unix/posix programs (other than concurrency-laden
programs like databases) NOT call fsync on their precious files?

It was not rhetorical, but let me answer anyway: it's because the
OS is usually reliable enough not to require this.   Even
linux/ext3/jbd/ordered will flush out data in a few seconds in
the background.  That's just what operating systems do.

So what do you think makes firefox's files so much more precious
and/or vulnerable than, say, my email, or source code, or compiled
binaries?
(In reply to comment #110)
> I'm assuming that you personally believe that more data is needed to indicate
> that this is a real problem, or that fsyncs are the cause of it, since you
> asked for more data and dismissed what was here as insufficient.
Please re-read my comment then - specifically "talking with him [schrep] on irc last night indicated that fsync counts wasn't what he was looking for".
We don't set policy for those other files, and we don't deal with data-loss reports pertaining to them, so the difference in handling isn't relevant.  sqlite *is* a database, and has sensitivity to the integrity of the file which may not be matched by those other formats or use cases.  You should ask in their bug trackers, not ours.  Concurrency isn't relevant to fsync, unless you're on an operating system that doesn't share buffers between different programs opening the same file.  I do not recommend that you run such an operating system, if you can still find one.

On configurations and operating systems that do not have the pathological fsync behaviour shown by ext3, fsync here is a reasonable tradeoff preferring to err on the side of data integrity, and does not have poor systemic effects.  Flushed-in-N-seconds speaks to durability, but not integrity, and it's the latter that motivates the decision we have inherited from sqlite; many comments in this bug have made that point, quite explicitly.  Unfortunately, ext3's behaviour here leads to bad effects for some Linux users, and unfortunately it's not something that came to our attention until late in the development cycle.

With the patch in this bug (which will be in 3.0.1 if not our 3.0, and in distribution packages of 3.0 for most distros) you can make your own decision about how to weigh integrity vs. performance, so I think your concerns are addressed, and have been since well before your most recent comment.
(This situation may have been exacerbated in Linux 2.6, which made fsync do drive write barriers in many situations.  Sadly, even with fsync you're not sure to get an actual flush, since that's predicated on the inode changing, and mtime is a second granularity, waah.  So on ext3, because of the combination of stop-the-world-I-want-to-fsync and no-means-no-but-fsync-means-maybe, OFF is likely the right setting for us.  Having to detect ext3 filesystems, and possibly their journalling mode, is something I'd rather not do without Linux-distributor or other expert assistance, though.  If we got optional use of EXT2_JOURNAL_DATA_FL at the sqlite layer we would probably be in the best available situation -- except that basically nobody does that, so we'd be in not-well-tested paths in the kernel itself.  Anyway, when Linux fixes fsync to make it _truly_ durable the cost will only increase, especially if ext3 isn't better at this case, so we should work to break our fsync addiction, and if possible help other sqlite users as well.

We should almost certainly just batch at the app level as well, but that's not a 3.0-timeframe option, unlike the patch in this bug.)
You can detect ext3 easily with statfs() [*not* statvfs(), unfortunately] but I am not sure about detecting data=ordered.  Parsing /proc/mounts is the least bad way I can think of.
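The /proc/mounts approach could look roughly like this in Python. The function name and the sample mount table are hypothetical. One caveat I am not sure about: the kernel may not list data=ordered explicitly when it is the default, so the sketch reports "unspecified" rather than guessing.

```python
def ext3_data_mode(mounts_text, mount_point):
    """Return the data= journalling mode for an ext3 mount point.

    Returns None if mount_point is not an ext3 mount, and "unspecified"
    if it is ext3 but no data= option is listed.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()          # device, mount point, fstype, options, ...
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        if fields[2] != "ext3":
            result = None              # a later mount shadows earlier entries
            continue
        result = "unspecified"
        for option in fields[3].split(","):
            if option.startswith("data="):
                result = option[len("data="):]
    return result

# Hypothetical contents of /proc/mounts, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
/dev/sda2 /home ext3 rw,data=journal 0 0
/dev/sda3 /boot ext3 rw 0 0
proc /proc proc rw 0 0
"""

root_mode = ext3_data_mode(SAMPLE_MOUNTS, "/")
home_mode = ext3_data_mode(SAMPLE_MOUNTS, "/home")
boot_mode = ext3_data_mode(SAMPLE_MOUNTS, "/boot")
proc_mode = ext3_data_mode(SAMPLE_MOUNTS, "/proc")
```

On a real system one would read the text from /proc/mounts and first work out which mount point the profile directory lives on, which this sketch leaves out.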
It doesn't look like O_SYNC would work out well:
http://www.mail-archive.com/sqlite-users@sqlite.org/msg34152.html
(In reply to comment #78)
> 
> The problem noted in this bug is a problem in SQLite, not in firefox. They
> should be using the O_SYNC argument to open, instead of flushing the whole disk
> all the time.
> 

Tried that. Using O_SYNC runs at half the speed of fsync() on my SuSE 10.1 box.
Under what concurrent I/O load?
Whiteboard: [RC2?][has patch][has review][needs approval] → [RC2?][RC2+][has patch][has review][needs approval]
Sorry folks I was on a plane for most of today - since it looks like I stirred up some fun between Shaver and Shawn - let me clarify what I want to ensure before we touch anything:

1) Karl or Sylvain any chance you can re-run your tests with 

NORMAL
FULL
OFF

and post the results?

2) I wanted to make sure that "UI freezing" behavior is clearly correlated to this specific issue, since there was already one false positive.  The URLClassifier can be the cause of the i/o load, so comment 8 doesn't mean the UI is freezing during these times (unless you saw that directly, Sylvain).  Now, as Shaver points out, this will make your system suck anyway, so we should do something about it.

3) That setting to NORMAL helps and/or fixes it.  From my reading here, for folks with the right config of ext3, NORMAL isn't likely to fix it enough.  Test #1 should answer this question easily.

I wanted 1-3 before we changed the default for everyone because there is an additional risk of corruption (however low) - so we should make sure changing this helps.

I'm definitely in favor of adding the pref so that distros and/or individual users have a way out.  Richard Hipp  suggested using temp tables (which don't sync) with batch copies to the main table occasionally.  This ensures there is no corruption, really reduces the write problem, only at the risk of some temporal data loss (last x minutes) which is totally acceptable.  As Shaver says that's too big for right now - we can do that for 3.1.

Thanks to everyone here for collecting data and adding their insight. 
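The temp-table idea mentioned above might be sketched like this with Python's sqlite3 module. This is only an illustration of the batching pattern, not a proposal for the actual schema: the table names, columns, and function names are invented, and the claim that temp tables avoid syncs is taken from the comment rather than verified here.

```python
import sqlite3

def open_history(path):
    """Open the database with a durable table and a temp staging table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS visits (url TEXT, visited_at REAL)")
    conn.execute("CREATE TEMP TABLE pending_visits (url TEXT, visited_at REAL)")
    conn.commit()
    return conn

def record_visit(conn, url, when):
    """Cheap per-page write: touches only the temp table."""
    conn.execute("INSERT INTO pending_visits VALUES (?, ?)", (url, when))
    conn.commit()

def flush_pending(conn):
    """Occasional batch copy into the main table, in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO visits SELECT url, visited_at FROM pending_visits"
        )
        conn.execute("DELETE FROM pending_visits")

history = open_history(":memory:")  # in-memory only to keep the demo self-contained
for i in range(3):
    record_visit(history, "http://example.org/page%d" % i, 1000.0 + i)

durable_before = history.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
flush_pending(history)
durable_after = history.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
pending_after = history.execute("SELECT COUNT(*) FROM pending_visits").fetchone()[0]
```

A crash between flushes loses whatever is still in pending_visits, which is the "last x minutes" of temporal data loss described above; queries that need fresh data would have to read both tables.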
(In reply to comment #76)
> 
> 1) USE synchronous OFF
> 
> 2) Periodically copy (back up) the database file
> 
> The difficult part of this solution, I assume would be detection of a
> corrupted database requiring fallback to the backup. 

This suggestion makes a lot of sense to me.  I've been thinking about this problem a lot lately - mostly wondering why I've never seen it before.  I have worked on lots of apps before that do multiple SQLite transactions per second without any problems.  (The SQLite database that backs the SQLite website does one or two transactions per second around the clock 24/7 - all on one slice out of 32 of a virtual machine)  Dunno why some people are having problems.

But clearly, turning synchronous off and making backups would work.

As for the question of how to detect corruption:  Surely FF has some indicator when it is coming up that the last shutdown was not clean.  In those cases run either

     PRAGMA integrity_check(1);
     PRAGMA quick_check(1);

If there are any problems, integrity_check will find them.  Quick_check will find most problems but might (in theory - unlikely in practice) miss cases where an index is out of date with its table.  On the other hand, quick_check runs faster.  I just tested on my Linux box using a cold disk cache and integrity_check checked a 67MB database in 10 seconds and quick_check checked a 32MB database in 1.5 seconds.

So, to my mind, a good solution would be (1) turn off synchronous (2) make periodic automatic backups (3) check the database integrity during startup after an abnormal shutdown and revert to the backup if problems are seen.

We (the SQLite developers) will keep working on better follow-on solutions.  We are looking at some changes for SQLite 3.6.0 that *might* help.  But I don't think those kinds of solutions will be ready for FF3.0.0.
Comment on attachment 320806 [details] [diff] [review]
default to synchronous = NORMAL for storage connections

a+ schrep for 3.0.1 or RC2.  Please land on CVS trunk.
Attachment #320806 - Flags: approval1.9+
I can land this tomorrow - should we be landing in mozilla-central at the same time?
Status: NEW → ASSIGNED
Whiteboard: [RC2?][RC2+][has patch][has review][needs approval] → [RC2+][has patch][has review][can land]
Life's too short to read through all this, but...

yes, ext3 does suck-by-design.  Always has.

I would commend the use of the sync_file_range() syscall on Linux.  It
can be used to sync a subsection of a file and will not trigger the
write-the-whole-world behavior.  It will just sync the stuff you want
sunk.
(In reply to comment #123)
> I would commend the use of the sync_file_range() syscall on Linux.  It
> can be used to sync a subsection of a file and will not trigger the
> write-the-whole-world behavior.  It will just sync the stuff you want
> sunk.

Thanks, Andrew.

Will that sync blocks appended to a file?

man sync:

"Therefore,
 unless the application is strictly performing  overwrites  of  already-
 instantiated disk blocks, there are no guarantees that the data will be
 available after a crash."

No, it will not.  sync_file_range() purely syncs the data portion
of the file, not the metadata which tells the fs driver where the data
lies.  It's basically suited to database-style applications which 
instantiate the data file's blocks during setup phase.

If the file does need to grow during regular operation then I'd suggest
that it be grown in "large" hunks - say, extend it (with write()) by a
megabyte at a time.  Then fsync it to get the metadata written.  Then
proceed to use sync_file_range() for the now-non-extending small writes.
So the large-write-plus-fsync is "rare".

Extending the file in large hunks will also improve its layout.  Often
very much.

There are ways of doing these things, but it's not a particularly happy
story, I'm afraid.  ext4 and xfs improve on some of these things.
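A rough sketch of that scheme, in Python for illustration only: the file is extended a megabyte at a time with real writes and fsynced once, after which in-place overwrites are synced with sync_file_range(). Python has no wrapper for that call, so this sketch reaches it through ctypes and assumes a glibc that exports it; where it is missing or fails, the sketch falls back to fsync(). The helper names are hypothetical. As noted above, sync_file_range() does not write the metadata, so it is only meaningful for blocks that were already allocated.

```python
import ctypes
import ctypes.util
import os

CHUNK = 1024 * 1024  # grow the file one megabyte at a time

# Flag values from <linux/fs.h>; all three together equal 7.
SYNC_FILE_RANGE_WAIT_BEFORE = 1
SYNC_FILE_RANGE_WRITE = 2
SYNC_FILE_RANGE_WAIT_AFTER = 4


def _load_sync_file_range():
    """Return the C library's sync_file_range, or None where it is unavailable."""
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        func = libc.sync_file_range
    except (OSError, AttributeError, TypeError):
        return None
    func.argtypes = [ctypes.c_int, ctypes.c_int64, ctypes.c_int64, ctypes.c_uint]
    func.restype = ctypes.c_int
    return func


_sync_file_range = _load_sync_file_range()


def ensure_allocated(fd, needed_size):
    """Extend the file in CHUNK-sized steps with real writes, then fsync once."""
    current = os.fstat(fd).st_size
    if needed_size <= current:
        return current
    target = -(-needed_size // CHUNK) * CHUNK  # round up to a whole chunk
    os.lseek(fd, current, os.SEEK_SET)
    remaining = target - current
    zeros = bytes(CHUNK)
    while remaining > 0:
        remaining -= os.write(fd, zeros[:min(CHUNK, remaining)])
    os.fsync(fd)  # the rare, expensive sync that also writes the metadata
    return target


def overwrite_and_sync(fd, offset, data):
    """Overwrite already-allocated bytes and sync only that byte range."""
    os.pwrite(fd, data, offset)
    if _sync_file_range is not None:
        flags = (SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE
                 | SYNC_FILE_RANGE_WAIT_AFTER)
        if _sync_file_range(fd, offset, len(data), flags) == 0:
            return "sync_file_range"
    os.fsync(fd)  # fallback when the range sync is missing or fails
    return "fsync"
```

A caller would invoke ensure_allocated() before any write that might pass the current end of the file, and overwrite_and_sync() for everything else.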
(In reply to comment #119)
> 1) Karl or Sylvain any chance you can re-run your tests with 
> 
> NORMAL
> FULL
> OFF

Method:

With profile on a filesystem with data=ordered

% ./run-mozilla.sh =zsh

Then while recursively copying a directory, on same filesystem,

% strace -t -T -e trace=fsync -o '|tee -a ~/tmp/fsyncs.log' ./firefox-bin -no-remote -P minefield http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Option-Summary.html

And walk through a few pages.

Results:

(Don't read too much into the times on the left except to group fsyncs.
 Times on the right are time spent in each call.)

NORMAL
16:19:09 fsync(49)                      = 0 <15.738214>
16:19:25 fsync(50)                      = 0 <14.467910>
16:19:39 fsync(26)                      = 0 <9.161445>
16:20:17 fsync(49)                      = 0 <67.377323>
16:21:24 fsync(26)                      = 0 <10.760249>
16:21:53 fsync(49)                      = 0 <5.626006>
16:21:59 fsync(26)                      = 0 <5.077534>
16:22:44 fsync(49)                      = 0 <0.330282>
16:22:45 fsync(26)                      = 0 <0.284402>
16:23:03 fsync(49)                      = 0 <2.226487>
16:23:06 fsync(26)                      = 0 <1.631153>
16:23:55 fsync(49)                      = 0 <1.576737>
16:23:56 fsync(26)                      = 0 <0.417255>
16:24:16 fsync(44)                      = 0 <4.956035>
16:24:21 fsync(45)                      = 0 <1.739825>
16:24:23 fsync(46)                      = 0 <0.073756>

FULL
16:00:15 fsync(45)                      = 0 <50.312335>
16:01:05 fsync(46)                      = 0 <3.244974>
16:01:08 fsync(45)                      = 0 <1.266897>
16:01:09 fsync(26)                      = 0 <1.224191>
16:01:31 fsync(45)                      = 0 <3.518809>
16:01:35 fsync(45)                      = 0 <0.334473>
16:01:35 fsync(26)                      = 0 <0.059593>
16:01:47 fsync(45)                      = 0 <0.233312>
16:01:48 fsync(45)                      = 0 <0.222275>
16:01:48 fsync(26)                      = 0 <0.205298>
16:02:17 fsync(45)                      = 0 <0.574248>
16:02:18 fsync(45)                      = 0 <0.079697>
16:02:18 fsync(26)                      = 0 <0.140334>
16:02:43 fsync(45)                      = 0 <3.131806>
16:02:46 fsync(45)                      = 0 <0.997825>
16:02:47 fsync(26)                      = 0 <0.460446>
16:03:31 fsync(41)                      = 0 <0.038480>

OFF
15:48:27 fsync(40)                      = 0 <19.517323>

(That fsync with OFF is only at shutdown.)

Results are fairly random and depend on whether what is being read is in the fs
cache and how long data has been queueing.

> 2) I wanted to make sure that "UI freezing" behavior is clearly correlated
> to this specific issue since there was already one false positive.  The
> URLClassifier can be the cause of the i/o load so comment 8 doesn't mean the
> UI is freezing during these times (unless you saw that directly Sylvain).
> Now as Shaver points out this will make your system suck anyway so we should
> do something about it.

I can't really correlate freeze times with results for fsync times, because my
last kernel upgrade caused strace to slow processes down (NO_HZ as a swag).

However I can confirm that the times reported for fsync are consistent with
the delays that I see while not running strace.

I can also confirm that setting to OFF removes the "UI freezing" behavior
while FF is browsing.

> 3) That setting to NORMAL helps and/or fixes it.  From my reading here for
> folks with the right config of ext3 NORMAL isn't likely to fix it enough.
> Test

I can't detect any reproducible difference with NORMAL c/w FULL.
Really I doubt it makes much difference.

> I wanted 1-3 before we changed the default for everyone because there is an
> additional risk of corruption (however low) - so we should make sure changing
> this helps.

NORMAL may actually be safer than FULL on ext3 given comment 114.

SQLITE_IOCAP_SAFE_APPEND may give a safer situation on data=ordered ext3 (and
also save the same 1 fsync of 3 FWIW).

However, we don't want SQLITE_IOCAP_SAFE_APPEND set for xfs or data=writeback
ext3.

I think the best way to achieve the safest situation would be to use FULL, but
detect the filesystem in unixDeviceCharacteristics and set
SQLITE_IOCAP_SAFE_APPEND appropriately.

That said, I have no reason to believe the risk with NORMAL is significant.

> 
> I'm definitely in favor of adding the pref so that distros and/or individual
> users have a way out.

Yes, we should definitely get the pref in so that people can choose.
(In reply to comment #120)
> As for the question of how to detect corruption:  Surely FF has some indicator
> when it is coming up that the last shutdown was not clean.

A problem while FF was running could be detected as you suggest.

However, I don't know how to quickly detect a situation such as:

1) FF quits.
2) power cut.

Perhaps the easiest solution is to do one fsync per modified db file before
quit.  (Then there is no need to detect corruption occurring after FF quits.)

>      PRAGMA integrity_check(1);
>      PRAGMA quick_check(1);

Thank you.

--

Andrew's sync_file_range suggestion sounds like a workable solution to ext3's issues.
That sounds like it could be implemented (on Linux >= 2.6.17) in the vfs
sqlite3_io_methods xSync and xWrite (for detecting file length changes and
growing in large hunks).  I haven't yet thought through how best to do
unixFileSize when vfs file length is not real file length, which I assume
would mean we wouldn't have SQLITE_IOCAP_SAFE_APPEND.
Blocks: 417732
(In reply to comment #123)
> 
> I would commend the use of the sync_file_range() syscall on Linux. 

Is sync_file_range() the same as fdatasync()?  If so, SQLite has provisions to use fdatasync() instead of fsync() if compiled with -DHAVE_FDATASYNC=1.

However, upon reviewing the code, I'm not sure that SQLite calls fdatasync() at the right times.  It should use fsync() whenever a file size changes and fdatasync() when only the content of the file changes.  But I don't think it does that correctly, and this is not something that we verify in the automated test suite.  Nevertheless, we could patch this up and do a quick 3.5.10 release or else add a branch to whatever older release FF3.0.0 is using if fdatasync() is seen as a significant win over fsync().  I'll run some tests and report back...
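The rule described here (fsync() when the file size changed, fdatasync() when only the content changed) might look like the following sketch. It is written in Python for illustration and is not SQLite's actual code; the function name and the returned label are hypothetical, and platforms without fdatasync() simply use fsync().

```python
import os


def sync_after_write(fd, size_before):
    """Use fsync() if the file length changed, otherwise the cheaper fdatasync()."""
    size_after = os.fstat(fd).st_size
    if size_after != size_before or not hasattr(os, "fdatasync"):
        os.fsync(fd)  # length changed: the metadata must reach the disk too
        return "fsync"
    os.fdatasync(fd)  # same length: only the data needs to be written
    return "fdatasync"
```

The caller records the file size before writing and passes it in afterwards, so the decision costs one extra fstat() per sync.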
(In reply to comment #128)
> (In reply to comment #123)
> > 
> > I would commend the use of the sync_file_range() syscall on Linux. 
> 
> Is sync_file_range() the same as fdatasync()?

The sync_file_range man page would imply so:

"SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER
              This is a traditional fdatasync(2) operation."

but it doesn't sound like it really is from comment 104, or the fdatasync man page:

 "does not flush modified metadata
  unless that metadata is needed in order to allow a subsequent data retrieval
  to be correctly handled."

> ... if fdatasync() is seen as a significant win over fsync().
> I'll run some tests and report back...

See comment 94 and 95.

sync_file_range() is quite unlike fdatasync()

fdatasync will help, although it's theoretically less efficient.
fdatasync will write back all the dirty pages but will not sync
the metadata unless there are actually metadata changes pending (ie:
the file got extended).

So it pretty much does what you want, because it's the sync of the metadata
which hurts.

If however growth of the file is common then fdatasync will commonly
sync the metadata and there will be no gain.

(And could someone please fix that "midair collision" thing
in bugzilla?)

yes, the manpage is incorrect.  sync_file_range() will never
sync metadata whereas fdatasync() will do so if it has changed.
I'll let Michael know.
(In reply to comment #86)
> "I've lost my bookmarks" for a long time has been the #1 support request from
> folks - often due to profile corruption which does cause real harm and can come
> from power loss, crashes, any number of reasons.  This is one of the reasons we
> wanted a more robust data store and also built in backups for bookmarks.  It is
> a balance (safety and performance) but given modern systems we should be able
> to get both.

OK, but in the interests of being pragmatic (I don't even see the point of databases for such (unimportant!) things as history -- text files renamed atomically seemed to be good enough for data consistency on unix for 30 years), wouldn't it make sense to keep a backup copy of each database?  If there is trouble reading the current database, then alert the user that they should shut down the machine properly next time, that they are likely going to lose some of their last changes (big deal, they might lose a few days of history.  Better than corrupting the entire database, and better than waiting an eternity just to open a new tab), and that a backup copy will be restored for them.  Restore that backup, continue on as normal.  A backup should be written every time mozilla is started, once it is fully loaded and everything is verified to be consistent.  Only then, allow write access to the original file.
FWIW, an SQLite database file can be safely copied (as a backup) as long as it is being held open for reading by at least one database connection.  The reader will lock the file to ensure that it does not change while the copy is taking place.

I ran a speed test on my 2006-vintage linux box and I can easily copy 185MB of database per second (assuming no fsyncs are done at the end ;-)) So one would think that making a backup copy of database files periodically won't take too much time nor keep the database locked for too long.  On the other hand, that same linux box will fsync() in 20 milliseconds - literally 1000 times faster than what other people are reporting it takes to fsync() on their boxes.  So maybe on a heavily loaded machine you will only be able to backup about 185KB per second.  I dunno...
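A minimal sketch of copying under a read lock, using Python's sqlite3 module for illustration. It assumes the default rollback-journal mode; the function name and the temporary-file naming are hypothetical. The read transaction keeps a shared lock on the file, so no writer can commit while the copy is in progress.

```python
import os
import shutil
import sqlite3


def backup_database(db_path, backup_path):
    """Copy the database file while a read transaction holds a shared lock."""
    # isolation_level=None lets us issue BEGIN and COMMIT ourselves.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("BEGIN")
        # The first read inside the transaction takes the shared lock.
        conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
        tmp_path = backup_path + ".tmp"
        shutil.copyfile(db_path, tmp_path)
        os.replace(tmp_path, backup_path)  # never leave a half-written backup
        conn.execute("COMMIT")
    finally:
        conn.close()
```

Copying to a temporary name and renaming it means an interrupted copy cannot overwrite the previous good backup.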

(In reply to comment #128)

> However, upon reviewing the code, I'm not sure that SQLite calls fdatasync() at
> the right times.  It should use fsync() whenever a file size changes and
> fdatasync() when only the content of the file changes.  But I don't think it
> does that correctly, and this is not something that we verify in the automated
> test suite.  Nevertheless, we could patch this up and do a quick 3.5.10 release
> or else add a branch to whatever older release FF3.0.0 is using if fdatasync()
> is seen as a significant win over fsync().  I'll run some tests and report
> back...
> 

If this performs well we'd very much appreciate and take the change in a 3.5.4.3 branch.   We'll upgrade to 3.6 probably in the FF3.1 time frame later this year.
Comment 126 makes it clear that NORMAL is not appreciably better than FULL.  I'd suggest we leave the default the same for now unless we can change it easily per-platform until we get results from comment 128.  Still want the pref in there.
Checking in storage/src/mozStorageConnection.cpp;
new revision: 1.34; previous revision: 1.33
Checking in storage/test/unit/test_storage_connection.js;
new revision: 1.10; previous revision: 1.9
Whiteboard: [RC2+][has patch][has review][can land] → [RC2+][has patch][has review][needs to land in mozilla-central]
The proposed backup-and-run-without-fsync also solves any and all NFS-related sqlite problems from Firefox's perspective. At the moment--even with fsync--Mozilla crashing with a profile stored on an NFS-mounted home directory results in corruption.

I know a lot of people hate me for making this bug such a public focus, but, anyway, thanks to everyone who threw their energy behind finding a solution.
(In reply to comment #135)
> I'd suggest we leave the default the same for now....

hmmm: Shawn's checkin (comment 136) changes the default to "NORMAL".

You forgot to quote the "unless".

We can change it easily per-platform, and distros can tune their own versions easily as well.
(In reply to comment #132)
> Text files renamed atomically seemed to be good enough for data consistency
> on unix for 30 years,

metadata-only journaling filesystems without the ordered mode of ext3
(e.g. xfs), and also filesystems without journalling, still really need fsync
to provide reliable ordering of operations.

https://launchpad.net/ubuntu/+bug/37435/comments/4
http://archives.postgresql.org/pgsql-admin/2007-05/msg00001.php

> wouldn't it make sense to keep a backup copy of each database?

If we don't have some sort of sync on the database, then yes.  A backup at
least would be made less frequently and thus require much fewer syncs (or no
syncs on some filesystems).
(In reply to comment #137)
> The proposed backup-and-run-without-fsync also solves any and all NFS-related
> sqlite problems from Firefox's perspective. At the moment--even with
> fsync--Mozilla crashing with a profile stored on an NFS-mounted home directory
> results in corruption.

If you have actually seen this happen, then please make sure that there is a (separate) bug filed.

SQLite hands the data off to the filesystem in the correct order AFAICT.  An application crash should not interfere with the filesystem's handling of the data (but network or OS interrupt would do).

I suspect filesystems that don't have a working fsync should have regular backups of all files, if people care about the data on the filesystem.
(In reply to comment #141)
> (In reply to comment #137)
> > The proposed backup-and-run-without-fsync also solves any and all NFS-related
> > sqlite problems from Firefox's perspective. At the moment--even with
> > fsync--Mozilla crashing with a profile stored on an NFS-mounted home directory
> > results in corruption.
> 
> If you have actually seen this happen, then please make sure that there is a
> (separate) bug filed.
> 
> SQLite hands the data off to the filesystem in the correct order AFAICT.  An
> application crash should not interfere with the filesystem's handling of the
> data (but network or OS interrupt would do).

There's a giant warning on the sqlite project pages that sqlite should never, ever be used with NFS; it's not a bug in Mozilla and it should come as no surprise to anyone that sqlite databases on NFS are being corrupted.

> I suspect filesystems that don't have a working fsync should have regular
> backups of all files, if people care about the data on the filesystem.

NFS is and always has been funky.
Jason, I see no sign of that giant warning.  Whatever file-locking
related cautions there are, they don't seem to matter as far as
firefox is concerned because the files are not shared amongst
instances/profiles, so there is no cross-process concurrency
to lock against.
Indeed, the only mention of NFS that I see on the sqlite.org site is

You should avoid putting SQLite database files on NFS if multiple processes might try to access the file at the same time.

from the FAQ, and there are several NFS-related tickets in the tracker dealing with locking infelicities in various implementations; none of them seem to contain "OMG DON'T GO IN THERE" sorts of advice.

If you are finding that Firefox crashes are corrupting one of the sqlite databases, on NFS or any other filesystem, please file a bug on it, and consider sending the database to Shawn or myself so we can get people doing some analysis.
I applied the patch to xulrunner rc1 and firefox rc1. I suspect I only needed to apply it to xulrunner. I built both, and installed both. Then I created the toolkit.storage.synchronous as a string in about:config, and set it to OFF. I also disabled attack/forgery options. Then I ran strace -p pid 2>&1 | grep fsync. Then I ran dd if=/dev/zero of=test-file bs=1M count=2048. I still see the interface freeze, and I still see fsyncs.

What am I doing wrong?
> toolkit.storage.synchronous as a string in about:config, and set it to OFF.

You must set it to an integer of value 0 if you want PRAGMA synchronous = OFF in sqlite (look at the switch in attachment 320806 [details] [diff] [review] for the mapping).

Yep-- it takes integer values 2="FULL" (the old default); 1="NORMAL" (the new default); 0="OFF" (enormous improvement of "the problem", but dangerous if not backed up).
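For illustration, the integer-to-pragma mapping described above can be sketched in Python as below. This is not the code from the attachment: the function name is hypothetical, and falling back to NORMAL for anything that is not a known integer (such as the string "OFF") is an assumption of this sketch.

```python
import sqlite3

# Pref value -> SQLite synchronous level, as listed in the comment above.
SYNCHRONOUS_BY_PREF = {0: "OFF", 1: "NORMAL", 2: "FULL"}
DEFAULT_PREF = 1  # NORMAL, the new default


def apply_synchronous_pref(conn, pref_value):
    """Apply PRAGMA synchronous for an integer pref; other values use the default."""
    if not isinstance(pref_value, int) or pref_value not in SYNCHRONOUS_BY_PREF:
        pref_value = DEFAULT_PREF  # assumed fallback, e.g. for a string pref
    mode = SYNCHRONOUS_BY_PREF[pref_value]
    conn.execute(f"PRAGMA synchronous = {mode}")
    return mode
```

Reading the setting back with PRAGMA synchronous returns the same integer, which makes it easy to confirm that the pref took effect.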
> This bug exists to make sure that our use of sqlite is correct and optimal.

So get back to work, do something about that and don't reply with ridiculous comments.
(In reply to comment #148)
> So get back to work, do something about that and don't reply with ridiculous
> comments.

Why can't you understand that this is fixed right now?  The patch is ready to go as soon as possible, in RC 2 or 3.0.1.  Who is forcing you to use it now instead of Firefox 2?
Frank Eigler wrote, above:
> So what do you think makes firefox's files so much more precious
> and/or vulnerable than, say, my email, or source code, or compiled
> binaries?

I took this at face value, and now regret it; in case others do the same I thought I'd follow up.

Of the three categories there, one of them doesn't need as much care (compiled binaries), because it can be reconstructed completely -- there's no inherent data loss, just time loss, as long as the source code remains intact.

But for source code, and email, you are indeed dealing with possibly-irreplaceable user data; if your backup regimen is good enough then you might not lose much, but you're very likely to lose something.

It turns out that the authors of programs like mutt and evolution (mail reading programs) and emacs and vim (editors that some Linux users may have heard of) also use fsync to preserve the integrity of the user data they're dealing with.  In at least vim's and emacs' cases, there is a way provided to disable this syncing if the user wishes to live more dangerously, or knows things about her environment that the programs can't rely on portably.  Mutt is adding _more_ fsyncs, because of concerns about data integrity, in fact.

It's unfortunate that fsync interacts so badly with a very common Linux configuration, and I do think it's important that we do what we can to mitigate those effects safely.  But the characterization of Firefox as being naively paranoid about user data is unfair, and I'm sorry that I may have contributed to it through my comments in this thread.
(In reply to comment #150)
> But for source code, and email, you are indeed dealing with
> possibly-irreplaceable user data; if your backup regimen is good enough then
> you might not lose much, but you're very likely to lose something.

We're talking about some number of *seconds* of potentially lost work, by
letting the OS flush its buffers in a writeback manner.  A backup regimen
of any reasonable sort will not make a few seconds' difference.

> It turns out that the authors of programs like mutt and evolution (mail 
> reading programs) and emacs and vim also use fsync to preserve the integrity
> of the user data they're dealing with.

To the extent this is true (for my copy of mutt, strace shows no fsyncs, and
I'm turning off emacs/vim's fsyncing too for latency-avoidance reasons --
thanks for the links in your blog), it is still a big step from there to
firefox wanting to fsync with amazing frequency (comment #8, near-zero 
inter-fsync time); or to all the basic UNIX utilities; or indeed to the
version control system mozilla itself uses.

I'd say the characterization of the normal case as "living more dangerously"
is no less unfair than the one you complain about.

By the way, those suitably equipped may find running the following systemtap
script entertaining or at least informative:

#! /usr/bin/stap
probe syscall.fsync { time[tid()] = gettimeofday_ms() }
probe syscall.fsync.return { 
   printf("%s %s %d\n", ctime(gettimeofday_s()),
                        execname(),
                        gettimeofday_ms() - time[tid()])}
global time
fdatasync/sync_file_range test program

This first creates a file of length 1 then does one fsync on the new file.
Then the file is continually modified without changing the length and synced
after each modification using one of three methods (somewhat randomly
selected): fsync/fdatasync/sync_file_range.

The I/O load for the test results below was produced using dd with a small
blocksize to limit the I/O some:

dd if=/dev/zero of=large bs=64 count=$((3*1024*1024*1024/64))

I used ltrace instead of strace as my strace didn't find sync_file_range (and
my glibc-2.5 libraries don't seem to have a sync_file_range function), so
sync_file_range appears below as "syscall(277".

rm -f datasync-test.tmp &&
ltrace -t -T -e trace=,fsync,fdatasync,syscall ./a.out

16:12:59 fsync(3) = 0                   <11.864858>
16:13:13 fdatasync(3) = 0               <14.706356>
16:13:30 fsync(3) = 0                   <12.832373>
16:13:45 syscall(277, 3, 0, 1, 7) = 0   <0.343116>
16:13:49 fdatasync(3) = 0               <8.231468>
16:14:01 syscall(277, 3, 0, 1, 7) = 0   <2.347144>
16:14:06 fsync(3) = 0                   <6.938656>
16:14:16 fdatasync(3) = 0               <8.359644>
16:14:27 fsync(3) = 0                   <5.928242>
16:14:35 syscall(277, 3, 0, 1, 7) = 0   <0.009531>
16:14:39 fdatasync(3) = 0               <7.356126>
16:14:50 fsync(3) = 0                   <6.402128>
16:14:59 syscall(277, 3, 0, 1, 7) = 0   <0.802706>
16:15:03 syscall(277, 3, 0, 1, 7) = 0   <2.985404>
16:15:08 fsync(3) = 0                   <4.722020>
16:15:15 fdatasync(3) = 0               <6.532945>
16:15:24 fdatasync(3) = 0               <2.294488>
16:15:30 fsync(3) = 0                   <7.986250>
16:15:40 syscall(277, 3, 0, 1, 7) = 0   <1.409809>
16:15:45 fdatasync(3) = 0               <5.404190>

The results are consistent with fdatasync being implemented as fsync on ext3.

They show the potential for considerable savings from growing (and shrinking)
files in large hunks and using sync_file_range (which also should reduce the impact on the rest of the filesystem).
Depends on: 435712
Bug #435712 implements automatic backup and restore of the places database. If the database is discovered to be corrupted, the backup is tested to see if it is also
corrupted. If it is not, it replaces the corrupted database. If both are corrupted, the old behavior kicks in: create a new database and rerun the import operation.

I believe this patch would make it safe to completely disable fsync() for places. At worst, a single session's browsing history is lost.
> 16:12:59 fsync(3) = 0                   <11.864858>
> 16:13:13 fdatasync(3) = 0               <14.706356>
> 16:13:30 fsync(3) = 0                   <12.832373>
> 16:13:45 syscall(277, 3, 0, 1, 7) = 0   <0.343116>
> ...
> The results are consistent with fdatasync being implemented as fsync on ext3.

I think this must be a bug.  But maybe I missed something.

I forwarded #152 to linux-fsdevel and linux-ext4, see if the other guys
can check my homework.

Of course even if it is a bug and if by some miracle we fix it,
that doesn't help anything much for next year or two.
While realizing this is primarily a Linux bug ID,
I have noticed on XP that the sqlite files become very fragmented.  Perhaps all this fsync() activity has a tendency to create fragmentation of the file as it grows, since it probably doesn't allow the OS to engage in lazy allocation.  If sqlite tends to touch many parts of a file during operation, there will be many more disk seeks because of the fragmentation.  Since I didn't see a single mention of 'fragment' in this bug, I thought I should at least add it to the discussion.

If fragmentation is part of the problem, simply copying the file (especially in the growth stage for urlclassifier)  and then keeping the copy can achieve a simple defragmentation.

Contig.exe from MS can be used with -a to find out the fragments in a file.  Without -a, it will defragment the file using OS primitives.
(In reply to comment #155)
> Since I didn't see a single
> mention of 'fragment' in this bug, I thought I should at least add it to the
> discussion.

Typically that's a good sign that you want to file another bug to discuss it.  Please don't let this bug become "all the things I've noticed about sqlite"; it's long and noisy enough as it is.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Please leave this bug open until I can land it in mozilla-central.  It's likely to be forgotten otherwise.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Keywords: fixed1.9
Whiteboard: [RC2+][has patch][has review][needs to land in mozilla-central] → [RC2+][has patch][has review]
We'll be merging back to HG (bug 433426) so we don't need to explicitly track these landings at this time.
Status: REOPENED → RESOLVED
Closed: 16 years ago
Keywords: fixed1.9
Resolution: --- → FIXED
Jumping in with a suggestion:

The justification for durability seems to be based on the importance of _bookmarks_ to the user. In that case, how about separating the DB for bookmarks from that of the history? 

The DB for bookmarks which is rarely updated can be kept golden at all times, while the DB for history, which is more frequently updated, need not be golden and take some corruption in the case of rare system-crash events.

This way, SQLite can still be used (which, I assume, means less code/design changes) and integrity of data also maintained (for that data where it is critical).
There is no need to make further suggestions in this bug.  If you wish to propose alternative architecture for a major part of the browser, please use one of the newsgroups-cum-mailing-lists for that purpose.  Reading about Places' architecture and design on MDC is probably a prerequisite.
It may be of interest to note that the frequent fsync() calls made by Firefox can cause fairly severe I/O stalls (and resulting GUI hangs) on multi-user machines like LTSP thin client servers. Firefox is hardly the only culprit, but its heavy use gives it the most significant impact by far.

I've done something horrible and just no-op'd fsync() for firefox with an LD_PRELOAD wrapper. Bookmark loss is what backups are for in this kind of environment, and I can recommend that as a last-ditch workaround for others in thin client environments who're facing issues.

It's great to see that patches for this are in the works; I'll try to apply and test the changes in our environment shortly.
It would probably be better to create a new bug, but this is a logical place for other people to look when something similar occurs.

Please see http://portableapps.com/node/14100 for reports of Firefox 3 Portable being extremely slow.  I use this version as well; my places database is 7MB and the browser is almost unusable.

So I posted there a script which sets the SYNCHRONOUS pragma to OFF. I hope I have warned the users well enough about the results of running such script:

http://portableapps.com/node/14100#comment-87499

I offer to create an extension which will set this pragma to OFF on start up and by default create a zip archive of the whole places database.
With Firefox 3 (>= rc2) you could just tell your users to go to about:config and set toolkit.storage.synchronous to integer value 0, which will have nearly the same effect (it will set the SYNCHRONOUS pragma to OFF for all databases after restart). You could do an extension to expose that configuration through a UI, with the accompanying warnings.
(In reply to comment #162)
> Please see http://portableapps.com/node/14100 for reports of Firefox 3 Portable
> to be extremely slow, I use this version as well, my places database has 7MB
> and the browser is almost unusable. 
> 
> So I posted there a script which sets the SYNCHRONOUS pragma to OFF. I hope I
> have warned the users well enough about the results of running such script:
Given that it's being done on startup, when we aren't doing much writing at all to the databases, that issue probably has nothing to do with this.  There are a few posts about it being that way always, but the original poster indicated it was on startup. What makes you so sure it's this and not poor read/write speeds?

> I offer that I would create an extension, which will set this pragma to OFF on
> start up and by default create a zip archive of the whole places database.
That's not really safe or reliable in any way (see bug 435712).
(In reply to comment #164)
> > So I posted there a script which sets the SYNCHRONOUS pragma to OFF. I hope I
> > have warned the users well enough about the results of running such script:
> Given that it's being done on startup, when we aren't doing much writing at all
> to the databases, that issue probably has nothing to do with this.  There are a
> few posts about it being that way always, but the original poster indicated it
> was on startup. What makes you so sure it's this and not poor read/write
> speeds?

Yes, the experience described in the post varies, but most of the comments do not refer to startup only. I was using Portable Firefox 2 (PF2) for two months with very good performance; after a switch to PF3 the performance went down drastically, with the processor almost at 0% but FF frozen for long seconds on each tab opened or page visited. With toolkit.storage.synchronous set to 0 it is the same as it was with PF2 - or maybe even better (heavy JS pages load faster).

> That's not really safe or reliable in any way (see bug 435712).

An extension may be a better means of doing this. Dumping an 8MB database into .sql takes me 3 seconds (on the USB drive) - I would rather spend 3 seconds once a day than 5 seconds on each page load. The dump should be a good means of database integrity verification I guess - and I can access the data with a text editor.
Flags: blocking1.9.0.1+
Also, note bug 442967, where I'm about to post an experimental build to help with this problem.
Experimental builds from bug 442967 are available now. Any testing from folks experiencing this problem is very much appreciated! From bug 442967:

BIG-LETTERED-WARNING:
Do not use your normal profile - create a copy of it.  This modifies the places
database schema, and downgrading will cause some strange behavior.

Builds here:
https://build.mozilla.org/tryserver-builds/2008-08-28_11:12-sdwilsh@shawnwilsher.com-try-df9b4f955b9/
I upgraded to hg version 21180:f21f3e9d1001, and I'm still getting the same pause, except that now there's no IO spike after the CPU spike finishes. The problem has always been sporadic for me. When it happens, it does so quite often, especially when I open a window or load a new page. The fact that the IO spike has gone away leads me to suspect that it's related to this issue.
This bug has been marked as fixed, but after spending some time reading over the comments, I can't find comments that would indicate it was.  I came to be interested in this not because of performance problems from massive fsync storms, but because I'm trying to minimize my system's power consumption and noise generation by spinning down my hard disk, among other things, and it looks like firefox frequently fsyncs the sqlite journal which causes my disk to spin up, which otherwise does not seem necessary.

I see several comments referring to toolkit.storage.synchronous, but I see no toolkit.storage* in my about:config.  Is it possible to disable the syncing even if it means losing recent history on a system crash?  I am running 3.0.4 on Ubuntu Intrepid.
if you don't care about your data then i'd suggest you write a wrapper script for firefox.

preconditions:
1. you have a ramdisk
2. you don't actually care about your data

script:
0. cd to the firefox application directory
1. verify firefox isn't already running (if it is, you'll want to use xremote - otherwise you're about to incur major dataloss)
2. rsync your profile from its ~/directory to ramdisk
3. ./run-mozilla.sh ./firefox-bin -P path/to/profile/in/ramdisk
4. rsync your profile from ramdisk back to its ~/directory

this should result in 0 significant fsync's for profile data while firefox is running and 0 protection for firefox data against your system crashing.

however, as long as your system doesn't crash during the rsync operations, your data shouldn't be particularly volatile.
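
The steps above can be sketched as a small Python wrapper. This is a rough sketch under the same preconditions (a ramdisk, and no concern for data loss); the function names are hypothetical, the browser command line is supplied entirely by the caller, and the check from step 1 that Firefox is not already running is deliberately left out.

```python
import subprocess


def build_steps(profile_dir, ramdisk_dir, firefox_cmd):
    """Return the wrapper's three commands in order (hypothetical helper)."""
    src = profile_dir.rstrip("/") + "/"
    ram = ramdisk_dir.rstrip("/") + "/"
    return [
        ["rsync", "-a", "--delete", src, ram],   # step 2: profile -> ramdisk
        list(firefox_cmd),                       # step 3: run the browser
        ["rsync", "-a", "--delete", ram, src],   # step 4: ramdisk -> profile
    ]


def run_wrapper(profile_dir, ramdisk_dir, firefox_cmd, run=subprocess.run):
    """Copy the profile in, run the browser, then copy the profile back."""
    copy_in, browser, copy_out = build_steps(profile_dir, ramdisk_dir, firefox_cmd)
    run(copy_in, check=True)       # stop here if the copy to the ramdisk fails
    try:
        run(browser, check=False)  # blocks until the browser exits
    finally:
        run(copy_out, check=True)  # copy back even if the browser crashed
```

A caller would pass something like the command from step 3 as firefox_cmd. If the machine loses power before the final rsync, everything written during the session is gone.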
(In reply to comment #169)
> ...
> I'm trying to minimize my system's power consumption and
> noise generation by spinning down my hard disk, among other things, and it
> looks like firefox frequently fsyncs the sqlite journal which causes my disk to
> spin up, which otherwise does not seem necessary.

Try bug 407325.  This bug is fixed, even if the subject is misleading.

(In reply to comment #170)
> if you don't care about your data then i'd suggest you write a wrapper script
> for firefox.

Not a useful suggestion for people who use devices that depend on battery and who want firefox to "just work".

Ciao!
(In reply to comment #169)
> I see several comments referring to toolkit.storage.synchronous, but I see no
> toolkit.storage* in my about:config.

This preference is not listed unless explicitly set.  (So you'll have to add it.)

> Is it possible to disable the syncing even if it means losing recent history
> on a system crash?  I am running 3.0.4 on Ubuntu Intrepid.

Setting toolkit.storage.synchronous to 0 will risk losing all history including bookmarks unless Firefox's profile is on a filesystem with full data journaling
(or other backup mechanisms are in place).

For ext3 mounted with data=journal, and toolkit.storage.synchronous = 0, only very recent data should be at risk.
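
For reference, SQLite itself exposes this trade-off as PRAGMA synchronous. The following is a minimal sketch using Python's sqlite3 module, assuming the pref value maps directly onto the pragma levels (0 = OFF, 1 = NORMAL, 2 = FULL); the helper name is hypothetical and this is not Firefox's own code.

```python
import sqlite3


def open_with_synchronous(path, level):
    """Open a database and set PRAGMA synchronous to level (0, 1 or 2).

    Hypothetical helper; 0 = OFF, 1 = NORMAL, 2 = FULL in SQLite.
    """
    if level not in (0, 1, 2):
        raise ValueError("level must be 0, 1 or 2")
    conn = sqlite3.connect(path)
    # With synchronous = 0 SQLite stops waiting for data to reach the disk.
    conn.execute("PRAGMA synchronous = %d" % level)
    return conn
```

Reading the pragma back with conn.execute("PRAGMA synchronous") returns the level currently in effect for that connection.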
(In reply to comment #169)
> This bug has been marked as fixed, but after spending some time reading over
> the comments, I can't find comments that would indicate it was.  [...]

IIUC it's implicit in comment #158, which was posted at the same time the FIXED resolution was last set.

> I see several comments referring to toolkit.storage.synchronous, but I see no
> toolkit.storage* in my about:config.  Is it possible to disable the syncing
> even if it means losing recent history on a system crash?  I am running 3.0.4
> on Ubuntu Intrepid.

- The pref descriptions have been added at http://kb.mozillazine.org/About:config_entries in the last few days.
- RESOLVED FIXED means fixed-on-trunk, which for Firefox means (at the time I'm posting this) Firefox 3.2a1pre, or in general anything from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/ . 3.0.4 is not "trunk" but "Mozilla1.9.0" or "Firefox 3.0" branch.
- Even if your version has it, not every preference appears by default in about:config. For some of them, if you want a nondefault setting, you have to use Right-click => New => then the appropriate type (Integer, String or Boolean) and the desired value.
So in the event of a system crash while using toolkit.storage.synchronous=0, the history and bookmarks will be lost entirely?  Or is there a semi recent backup normally made for falling back to?
I set this value to 0 and I'm still seeing firefox call fdatasync() on places.sqlite-journal.
I don't see how this bug can be considered FIXED.

fsync() is not a syscall that normal user applications such as a web browser should ever use on a file descriptor that goes to disk.  Certainly wanting data writes to be ordered is not an excuse for actually forcing them to disk.

I think on current Linux systems, it's a reasonable assumption to make that data writes are ordered if recovery from a system crash is desired, or otherwise files that are open during a system crash are not something the user wants protected.

My understanding is that fsync() use through libsqlite is entirely superfluous on systems with ordered data:  all it does is cause unnecessary disk writes, wear out flash chips where those are used, and cause weird situations where after a system crash, the places database will be more likely to be intact than actual files created explicitly by the user - which we don't fsync().

I'm tempted to create a Linux-specific bug:  fsync(), fdatasync(), and sync() should simply never be called by a normal web browser, and where they are, significant sleep times in the syscalls should be expected;  if sqlite really is so badly-written that it is common for the database to become mangled beyond any recovery, that frankly suggests to me we shouldn't use it.
I wrote the patch referenced in comment #153. Since that time, both xfs and ext4 have become stable alternatives to ext3. And both filesystems properly flush only the contents of the file referenced in the fsync(FD) call instead of the entire commit buffer. So, Mozilla did its part and the rest was fixed in the kernel.

I would suggest that you upgrade your systems to one of these two filesystems.
(In reply to comment #177)
> fsync() is not a syscall that normal user applications such as a web browser
> should ever use on a file descriptor that goes to disk.
That's an awfully bold claim you are making without backing it up with any facts.  Care to explain why you have this opinion?

> Certainly wanting data
> writes to be ordered is not an excuse for actually forcing them to disk.
It's not just about making sure they are ordered, it's also about making sure they are on disk so you can properly restore yourself from a crash.  See http://en.wikipedia.org/wiki/ACID

> I think on current Linux systems, it's a reasonable assumption to make that
> data writes are ordered if recovery from a system crash is desired, or
> otherwise files that are open during a system crash are not something the user
> wants protected.
A reasonable assumption is not an API guarantee.  When it comes to user data, we cannot assume everything is going to be OK, we have to be guaranteed it will.  We assumed that fsync would have a sane behavior on all filesystems, and look where it got us.

> My understanding is that fsync() use through libsqlite is entirely superfluous
> on systems with ordered data:  all it does is cause unnecessary disk writes,
> wear out flash chips where those are used, and cause weird situations where
> after a system crash, the places database will be more likely to be intact than
> actual files created explicitly by the user - which we don't fsync().
I'm sorry, but your understanding is not correct.

> I'm tempted to create a Linux-specific bug:  fsync(), fdatasync(), and sync()
> should simply never be called by a normal web browser, and where they are,
> significant sleep times in the syscalls should be expected;  if sqlite really
> is so badly-written that it is common for the database to become mangled beyond
> any recovery, that frankly suggests to me we shouldn't use it.
And I'm sure that bug would be marked as WONTFIX.  The fact that SQLite uses fsync has nothing to do with how "badly-written" it is, but because it is ACID compliant.  If you want to throw away the durability, you can do so at your own risk by setting the pref this bug added to zero.


(In reply to comment #178)
> So, Mozilla did its part and the rest was fixed in the kernel.
Not only did we supply this patch, but we've also gone and greatly reduced the number of writes and fsync calls we make for Firefox 3.1.  Additionally, we've moved several of the remaining calls off of the main thread so we no longer hang the UI while performing them.  That may still cause issues for other applications on filesystems such as ext3, but we really cannot do anything about that.

> I would suggest that you upgrade your systems to one of these two filesystems.
As I understand it, an ext3 system can be switched to ext4 pretty easily.
(In reply to comment #179)
> (In reply to comment #177)
> > fsync() is not a syscall that normal user applications such as a web browser
> > should ever use on a file descriptor that goes to disk.
> That's an awfully bold claim you are making without backing it up with any
> facts.  Care to explain why you have this opinion?

Certainly.  Fsync() provides better-than-file-system reliability: user applications, on the other hand, are limited to file-system reliability:  the whole point of having a less-than-perfectly reliable file system is to trade off reliability for performance.  That is a decision made by the environment in which firefox is running and one we should not override unless explicitly instructed to do so.

Fsync() is optional in POSIX, and both a null implementation and an implementation which simply sleeps until data has been written are acceptable. I would recommend you carefully read POSIX's documentation for fsync() at http://www.opengroup.org/onlinepubs/009695399/functions/fsync.html, which makes clear that it is an extraordinary syscall (thus an extension) which is expected to involve long periods of sleep and should not be used by programmes which do not strictly require it.

What fsync() is for, I think, is simply to make sure that where a network transaction is made confirming to another machine that a database transaction has happened, a subsequent power failure would not result in a state where the confirmation has been made but no disk write ever happened.  A mail daemon might use fsync() to make sure mails do not get lost in transit, with the remote machine deleting their copy because they got a successful confirmation the message was now locally stored.

In the event of a system crash, it is fair to say the local user does not get a "success" message:  their system just crashed.

Put another way, fsync() violates layering:  normally, user applications use only the abstraction of the on-disk file system made available to them through normal syscalls, and it is up to the user to define, system-wide, how closely that abstraction is kept in sync with what might be recoverable after a crash:  they have the option of keeping it totally in sync, keeping it in a state from which it might not be recoverable, or various settings in between, of which the only one which really makes sense is "recoverable to what it looked like at a recent point, but not necessarily at the most recent point at which user interaction happened".  On Linux, those three correspond to synchronous mounting, an unjournalled filesystem, and a journalled file system with ordered data writes.  A journalled file system with unordered data writes is not expected to leave files which were opened for writing in a consistent state:  it might be slightly better than ext2, but not in any guaranteed fashion.

> > Certainly wanting data
> > writes to be ordered is not an excuse for actually forcing them to disk.
> It's not just about making sure they are ordered, it's also about making sure
> they are on disk so you can properly restore yourself from a crash.  See
> http://en.wikipedia.org/wiki/ACID

We don't make sure data we create on disk is actually written to disk:  If you save a file in firefox, that file isn't fsync()ed.  I think it's unreasonable to say that "page visited" information is somehow more valuable than information explicitly created and saved by the user:  in both cases, it is up to the operating system to decide whether and when to sync the data - which might require a disk to spin up, network access, or hardware costs.

> > I think on current Linux systems, it's a reasonable assumption to make that
> > data writes are ordered if recovery from a system crash is desired, or
> > otherwise files that are open during a system crash are not something the user
> > wants protected.
> A reasonable assumption is not an API guarantee.  When it comes to user data,
> we cannot assume everything is going to be OK, we have to be guaranteed it
> will.  We assumed that fsync would have a sane behavior on all filesystems, and
> look where it got us.

Firefox cannot guard against file system corruption as long as it uses the file system at all:  Firefox's reliability is (usually) limited to file system reliability and operating system reliability.  It is inappropriate to expend system resources to ensure reliability better than that in places where data is created implicitly (places.sqlite), but not in places where data is created explicitly (scrapbook, "save as ...", user settings, etc.)

> > My understanding is that fsync() use through libsqlite is entirely superfluous
> > on systems with ordered data:  all it does is cause unnecessary disk writes,
> > wear out flash chips where those are used, and cause weird situations where
> > after a system crash, the places database will be more likely to be intact than
> > actual files created explicitly by the user - which we don't fsync().
> I'm sorry, but your understanding is not correct.

How is it incorrect?  If sqlite keeps the file system in a non-recoverable state, but hopes that this non-recoverable state is never written to disk because of its frequent fsync() use, it is simply unreliable.

> > I'm tempted to create a Linux-specific bug:  fsync(), fdatasync(), and sync()
> > should simply never be called by a normal web browser, and where they are,
> > significant sleep times in the syscalls should be expected;  if sqlite really
> > is so badly-written that it is common for the database to become mangled beyond
> > any recovery, that frankly suggests to me we shouldn't use it.
> And I'm sure that bug would be marked as WONTFIX.

Quite probably, but I must say that it strikes me odd that a high-reliability POSIX extension really only useful for network daemons which send out remote confirmations that an action has been recorded permanently should be considered vital for a web browser.

I'm surprised people appear to have given up entirely on spinning down their laptop hard drives.  Are people seriously aware that their web browser would be the program forcing the disk to continually spin up?

>  The fact that SQLite uses
> fsync has nothing to do with how "badly-written" it is, but because it is ACID
> compliant.  If you want to throw away the durability, you can do so at your own
> risk by setting the pref this bug added to zero.

Unfortunately, no.  Even with the pref, starting with --debug and breaking on fsync produces plenty of output, both during startup and subsequently (on current CVS);  I'm going to track that down, now.

> (In reply to comment #178)
> > So, Mozilla did its part and the rest was fixed in the kernel.
> Not only did we supply this patch, but we've also gone and greatly reduced the
> number of writes and fsync calls we make for Firefox 3.1.  Additionally, we've
> moved several of the remaining calls off of the main thread so we do not hang
> the UI in the process of performing them still.  That still may cause other
> applications issues on filesystems such as ext3, but we really cannot do
> anything about that.
> 
> > I would suggest that you upgrade your systems to one of these two filesystems.
> As I understand it, an ext3 system can be switched to ext4 pretty easily.

It's odd just how intent you seem to be to keep firefox usable on only a small subset of systems:  the current checklist appears to be:
 - ext4
 - no hard drive spin-down
 - no old flash storage (where frequent writes might reduce lifetime)
 - fast fsync (which means either no other I/O access or firefox being prioritised).

All this to protect against events which are rare to begin with (system crash or power failure), which users accept to result in some data loss (they're aware that a document open in openoffice will only be saved to disk every so often, and they also know they will lose web forms and other temporary data) anyway;  and I'm still not sure whether we're merely ensuring recoverability to a consistent state, or really making the decision that any state seen by the user must be recoverable-to.

In the latter case, I would urge you to work on web forms being recoverable, that recovery being enabled by default, and a disk write always happening in between me pressing a key and that key appearing in a web form, first.

We definitely need a state-of-the-art fsync()-free web browser on some Linux systems, such as those which want to spin down their hard drives.  Currently, the only way to get that is to build sqlite without any syncing, which is obviously dangerous.
(In reply to comment #180)
I'm going to prefix my response with this statement:
I don't think you and I are going to agree here ever.  I'm arguing from the perspective of an application developer trying to cope with users getting corrupted profiles on crashes, whereas you seem to be arguing from an operating system perspective (or something else that isn't the same perspective as me).  As a result, our arguments are driven by a different set of priorities.

> Certainly.  Fsync() provides better-than-file-system reliability: user
> applications, on the other hand, are limited to file-system reliability:  the
> whole point of having a less-than-perfectly reliable file system is to trade
> off reliability for performance.  That is a decision made by the environment in
> which firefox is running and one we should not override unless explicitly
> instructed to do so.
But coming from an application developer who is trying to deal with a large number of bug reports about user profile corruption because their system crashed, it's necessary.  When it comes to the user's profile, we care about reliability a lot.

> Fsync() is optional in POSIX, and both a null implementation and an
> implementation which simply sleeps until data has been written are acceptable.
> I would recommend you carefully read POSIX's documentation for fsync() at
> http://www.opengroup.org/onlinepubs/009695399/functions/fsync.html, which makes
> clear that it is an extraordinary syscall (thus an extension) which is expected
> to involve long periods of sleep and should not be used by programmes which do
> not strictly require it.
I take back my claim about a guarantee, but it still provides us with a better situation than the filesystem.  I don't agree that applications should not use fsync.  In fact, from the very document you linked me to:
The fsync() function should be used by programs which require modifications to a file to be completed before continuing; for example, a program which contains a simple transaction facility might use it to ensure that all modifications to a file or files caused by a transaction are recorded.

To me, this indicates that it's perfectly acceptable for an application to use this.
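
As a concrete illustration of the "simple transaction facility" that passage describes, here is a minimal Python sketch. The function name and the ".tmp" suffix are assumptions made for the example, and it is not how Firefox or SQLite actually store data.

```python
import os


def durable_write(path, data):
    """Replace the contents of path with data, all or nothing.

    Hypothetical helper: the new bytes are forced to disk before the
    old file is replaced, so a crash leaves either the old or the new
    version on disk, never a half-written one.
    """
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push it to the device
    os.replace(tmp, path)      # atomic rename over the old file
```

Without the os.fsync() call the rename could reach the disk before the data does, which is exactly the ordering problem being argued about here.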

> Put another way, fsync() violates layering *snip*
Sure, but abstractions are only so useful.  There are many situations where an abstraction doesn't let you accomplish something you need to do.

> We don't make sure data we create on disk is actually written to disk:  If you
> save a file in firefox, that file isn't fsync()ed.  I think it's unreasonable
> to say that "page visited" information is somehow more valuable than
> information explicitly created and saved by the user:  in both cases, it is up
> to the operating system to decide whether and when to sync the data - which
> might require a disk to spin up, network access, or hardware costs.
This is quite possibly a valid point.  I think the exception is made because user history can not be re-obtained, whereas most downloads can be.

> Firefox cannot guard against file system corruption as long as it uses the file
> system at all:  Firefox's reliability is (usually) limited to file system
> reliability and operating system reliability.  It is inappropriate to expend
> system resources to ensure reliability better than that in places where data is
> created implicitly (places.sqlite), but not in places where data is created
> explicitly (scrapbook, "save as ...", user settings, etc.)
Sure, and we can't guard against performing instructions wrong either.  Of course, we aren't guarding against file system corruption, but protecting ourselves from letting the file get into a state that we can't parse.  (by we, I mean SQLite)

> How is it incorrect?  If sqlite keeps the file system in a non-recoverable
> state, but hopes that this non-recoverable state is never written to disk
> because of its frequent fsync() use, it is simply unreliable.
SQLite uses fsync as a synchronization point.  It needs to manage the journal file and the database file, and it uses fsync to ensure that the changes happen in the right order.
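
The ordering can be illustrated with a toy rollback journal in Python. This is a deliberately simplified sketch that journals the whole file at once; the function names are hypothetical and it does not reproduce SQLite's actual page-based journal format.

```python
import os


def _write_synced(path, data):
    """Write data to path and wait until it has been flushed to disk."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def commit(db_path, new_data):
    """Replace the database contents, keeping a journal for recovery."""
    journal = db_path + "-journal"
    with open(db_path, "rb") as f:
        old = f.read()
    _write_synced(journal, old)       # 1. old contents are safe on disk
    _write_synced(db_path, new_data)  # 2. only then touch the database
    os.remove(journal)                # 3. deleting the journal commits


def recover(db_path):
    """Roll back a half-finished commit; returns True if one was found."""
    journal = db_path + "-journal"
    if not os.path.exists(journal):
        return False
    with open(journal, "rb") as f:
        old = f.read()
    _write_synced(db_path, old)
    os.remove(journal)
    return True
```

If the sync in step 1 were skipped, the system could write the new database contents before the journal, and a crash at that moment would leave nothing to roll back to.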

> Quite probably, but I must say that it strikes me odd that a high-reliability
> POSIX extension really only useful for network daemons which send out remote
> confirmations that an action has been recorded permanently should be considered
> vital for a web browser.
Its usefulness is a matter of opinion, demonstrated quite clearly by our disagreement.

> Unfortunately, no.  Even with the pref, starting with --debug and breaking on
> fsync produces plenty of output, both during startup and subsequently (on
> current CVS);  I'm going to track that down, now.
I have data to show that we've done a lot of work - http://markmail.org/message/jkc2nnaxrvdmpn7d

For what it is worth, if you are using CVS, you won't be seeing that work.  CVS was used for Firefox 3.0, but for 3.1 and beyond, we've moved to mercurial.

> All this to protect against events which are rare to begin with (system crash
> or power failure), which users accept to result in some data loss (they're
> aware that a document open in openoffice will only be saved to disk every so
> often, and they also know they will lose web forms and other temporary data)
> anyway;  and I'm still not sure whether we're merely ensuring recoverability to
> a consistent state, or really making the decision that any state seen by the
> user must be recoverable-to.
Not rare enough.  We've moved in this direction because of the number of corrupted profile bugs we've gotten as a result of crashes.  Users shouldn't have to accept data loss if we can avoid it.

> In the latter case, I would urge you to work on web forms being recoverable,
> that recovery being enabled by default, and a disk write always happening in
> between me pressing a key and that key appearing in a web form, first.
I think you are exaggerating here, but we do have some degree of form restoration (not for SSL since it's a security risk), but it's not per key AFAIK.
Since my inbox gets full and I'm somewhat struck by this, I'd like to make a suggestion:
Why not make an option for this? Ideally on a per-database basis. Typically I don't care about places as much as I do about bookmarks or browser options.
AFAIK* sqlite, just as most DB's, lets the user specify whether it should sync.

I for instance wouldn't even care about bookmarks since I use foxmarks. The phishing protection DB et al are also defined externally, so I question the value of extra protection, especially if some systems hang on it.

*Disclaimer: I didn't do tests, but sqlite manager shows me a 'fullsync flag' and the 'synchronous' mode.
(In reply to comment #182)
> Why not make an option for this?
That's *exactly* what this bug was about.  Note that it's fixed (in Firefox 3.0 and beyond).
Sorry for the noise, I incorrectly assumed it was not in 3.0.

(In reply to comment #183)
Adding the option doesn't remove the problems I described in #474578. Though it certainly helps. Since you tagged it as a duplicate, you may want to reconsider this.
(In reply to comment #180)
> I'm surprised people appear to have given up entirely on spinning down their
> laptop hard drives.  Are people seriously aware that their web browser would be
> the program forcing the disk to continually spin up?
> 
+1.
Using powertop, simply closing some tabs on Firefox reduces power usage from over 30W to about 15W on my laptop. The decrease is linear with the decrease in kernel wakeups due to Firefox.

This might not be entirely due to fsync, though -- if I'm not navigating to any new page, then Places is not updated, and so fsync should never be called, I'd think.
That hasn't got *anything* to do with fsync, and *everything* to do with Adobe Flash.

I wish you people would quit spamming this bug with your ridiculous, uneducated, ahistorical, completely incorrect theories about how Unix filesystems work.
Note to the comment #178, ext4 is not stable. I have personally started using it for my /home partition on multiple machines. I have in the process found multiple bugs in ext4. To get a kernel where ext4 at least seems stable I had to go to 2.6.29rc7 with an additional patch to fix an inode corruption bug. Even the latest 2.6.27 kernels with lots of ext4 backported patches weren't good enough. I recommend staying away till the next official distribution release that uses it, at least.
(In reply to comment #187)
> Note to the comment #178, ext4 is not stable. I have personally started using
> it for my /home partition on multiple machines. I have in the process found
> multiple bugs in ext4. To get a kernel where ext4 at least seems stable I had
> to go to 2.6.29rc7 with an additional patch to fix an inode corruption bug. Even
> the latest 2.6.27 kernels with lots of ext4 backported patches weren't good
> enough. I recommend staying away till the next official distribution release
> that uses it, at least.

Can you please elaborate on what specific bugs you found and how serious they were?  I've been using ext4 for about 6 months now on laptops and servers, with no visible issues.
  The first major issue I had was I would do a reboot under normal circumstances, and then get massive errors on my /home filesystem. In one case the corruption of the filesystem was so bad I had to format it, and rebuild from backup.

http://bugzilla.kernel.org/show_bug.cgi?id=12151

The second issue I had may have been the same issue, but behaved a bit differently. It basically had the same results of fsck errors on reboot, but without bad consequences. It was accompanied by the error below. Note the error below was in dmesg, but the system just continued. At least with Fedora the default on ext3/ext4 error is continue. I used tune2fs to change mine to remount read-only. Continue is a really bad idea, because I didn't notice the error below for three days. During that time the corruption was getting worse. When I mentioned the errors=continue default to the main developers they were like, "Oh that is a really bad default, we should fix that."

EXT4-fs error (device md3): ext4_mb_generate_buddy: 
EXT4-fs: group 9812: 7604 blocks in bitmap, 7516 in gd

The next error I have seen on two systems, and I have read of at least one if not two people with the same problem. It causes processes to go zombie, and the system becomes wedged. Both were during high I/O on /home filesystems. One of the other people started multiple threads on fedora-test-list. That person seems to have taken my advice and come back with the same results.

Search for Wedg in the page:
https://www.redhat.com/archives/fedora-test-list/2009-March/thread.html


The next problem I didn't experience, but there was a flood of people on the bug report.

https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781
http://linux.slashdot.org/article.pl?sid=09/03/11/2031231

 
The final issue was the inode corruption. I haven't experienced this, but was alerted to it by the developers.


The guy who started the mailing list threads and I both had problems with kernel-2.6.27.19-170.2.35.fc10.x86_64, which is the latest Fedora kernel. I tried 2.6.28.1 Fedora kernel packages with some of the other issues. kernel-2.6.29-0.61.rc8.fc10.x86_64 has been working very well for me.


Here is the changelog history from the kernel I am running.

rpm -q --changelog kernel-2.6.29-0.61.rc8.fc10.x86_64 | grep -i ext4
- Add pending ext4 patch to silence fallback allocation warning message.
- Copy ext4 fixes from rawhide:
  linux-2.6-ext4-extent-header-check-fix.patch
  linux-2.6-ext4-flush-on-close.patch
- Copy ext4 ENOSPC fix from rawhide.
- Fix ext4 hang on livecd-creator (F10#484522)
- Pull back ext4 updates from 2.6.28-rc3-git6
- Delay capable() checks in ext4 until necessary. (#467216)
- ext4 updates from stable patch queue
- Update to upstream ext4 code destined for 2.6.28.
- Add in latest ext4 patch queue - rename to ext4 as well.
- Add pending ext4 patch queue; adds fiemap interface
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b4pre) Gecko/20090326 Ubuntu/8.10 (intrepid) Shiretoko/3.5b4pre

With and without toolkit.storage.synchronous set to 0 I see fsync() calls every time I delete a history item.

Is this OK or should this bug be reopened?
(In reply to comment #190)
> With and without toolkit.storage.synchronous set to 0 i see fsync() calls every
> time i delete a history item.
Are you restarting your browser after you change the setting?
(In reply to comment #191)
> Are you restarting your browser after you change the setting?

Yes, I just double checked. I checked with toolkit.storage.synchronous both as an integer and as a string (which one is correct?) and I see 3 fsync() calls with strace (and the HDD LED blinks every time).
So I expect performance gains.

Thanks.
https://lists.mozilla.org/pipermail/dev-platform/2014-May/004762.html
maybe relevant.

mozilla code calls fsync where fdatasync is deemed appropriate.

TIA
Thanks for the pointer! Nice looking numbers.

The previous comments in this bug show that SQLite uses fsync over fdatasync for ACID compliance.

Perhaps that's something that should be revisited as we're in 2015 and fdatasync might be trustworthy enough now for our purposes.

NI? Mak for feedback on whether it's worth opening a new bug to re-evaluate fsync vs fdatasync.
Flags: needinfo?(mak77)
sqlite used to use fdatasync on linux before 3.7.10, and made the conscious choice not to use it in 3.7.10, three years ago. OTOH, it happens that it was reenabled on platforms that pass the AC_CHECK_FUNCS(fdatasync) test two days ago[1], so in fact, next version of sqlite will have it... if we add an AC_CHECK_FUNCS to our configure. And now looking further, I see the reason it was disabled in the first place is because Android's libc doesn't have it[2].

1. https://github.com/mackyle/sqlite/commit/55e2a047b98fac03643753d07d06c3bf157d77ff
2. https://github.com/mackyle/sqlite/commit/e29af45dcfbe5e00fc0a947a6a5009917783115e
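
The availability check described above has a rough runtime analogue, sketched here in Python rather than autoconf. The helper name is hypothetical; the point is only that fdatasync() is used where the platform's libc provides it and fsync() is the fallback.

```python
import os


def sync_data(fd):
    """Flush a file's data to disk, preferring fdatasync() when present.

    Hypothetical helper. Returns the name of the call that was used.
    """
    if hasattr(os, "fdatasync"):
        # fdatasync() may skip flushing metadata (such as timestamps)
        # that is not needed to read the data back.
        os.fdatasync(fd)
        return "fdatasync"
    os.fsync(fd)
    return "fsync"
```

On a platform whose libc lacks fdatasync(), the hasattr() test fails and the function quietly falls back to fsync().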
The problem isn't fsync vs fdatasync.  The problem is that Mozilla has decided that preserving browsing history is of paramount importance and uses an overkill transactional database to store it instead of dumping a simple data structure to disk, and moving it to a backup after it has hit the disk on its own.

For that matter, it seems this bug has been marked as fixed for years, for no apparent reason since it was never fixed.
The discussion is happening in the wrong place (srly, a 2009 reso/fixed bug?),
the first thing to do is to move it to a more appropriate place.
Comment 197 is right, it's not worth adding anything more to the discussion here, since it's the wrong place. Please file a separate bug and we'll evaluate once we have the proper SQLite version in place.
Flags: needinfo?(mak77)
Ok, I opened a new bug.
Bug 1120444 - Use fdatasync properly instead of fdata where appropriate 

TIA
Of course, I meant to say fsync instead of fdata:

Bug 1120444 - Use fdatasync properly instead of fsync where appropriate 

TIA