Last Comment Bug 684513 - firefox-bin hangs after close, strace shows it slowly reading places.sqlite backwards in 4k chunks
: firefox-bin hangs after close, strace shows it slowly reading places.sqlite b...
Status: RESOLVED WORKSFORME
[Snappy:P1]
: perf
Product: Toolkit
Classification: Components
Component: Places (show other bugs)
: 6 Branch
: x86_64 Linux
: -- normal with 1 vote (vote)
: ---
Assigned To: Vladan Djeric (:vladan)
:
: Marco Bonardo [::mak]
Mentors:
: 661368 (view as bug list)
Depends on: 722243 693667
Blocks:
  Show dependency treegraph
 
Reported: 2011-09-03 10:09 PDT by Josh Triplett
Modified: 2014-09-21 09:17 PDT (History)
27 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments

Description Josh Triplett 2011-09-03 10:09:09 PDT
For a little while now, I've noticed a couple of problems: searches in the awesomebar often take an excessive amount of time before producing results (if they produce any at all), and Firefox often sticks around in the background for a while before exiting (noticeable because attempting to re-launch it brings up the "already running" dialog).  Some investigation shows that these two problems seem related, and that I can reproduce both reliably.

In an instance of Firefox I already had running, I opened a new tab and typed "foo" in the awesomebar, which matches various things in my history.  The first result didn't show up for several seconds, with additional results showing up every few seconds afterward.  A few orders of magnitude slower than usual.

I closed that tab, opened a new tab, and typed some gibberish, which shouldn't match anything.  After it spun on that for a few seconds, I closed that tab.

I opened a new tab, and typed "foo" again.  This time, no results appeared after quite a few seconds.

I then closed the browser entirely.  ps showed firefox-bin still running, using 100% CPU.

strace -f -p $(pidof firefox-bin) showed this:

[pid  4752] lseek(3, 22478848, SEEK_SET) = 22478848
[pid  4752] read(3, "[data omitted]"..., 4096) = 4096
[pid  4752] lseek(3, 22474752, SEEK_SET) = 22474752
[pid  4752] read(3, "[data omitted]"..., 4096) = 4096
[pid  4752] lseek(3, 22470656, SEEK_SET) = 22470656
[pid  4752] read(3, "[data omitted]"..., 4096) = 4096
[pid  4752] lseek(3, 22466560, SEEK_SET) = 22466560
[pid  4752] read(3, "[data omitted]"..., 4096) = 4096
[pid  4752] lseek(3, 22458368, SEEK_SET) = 22458368
[pid  4752] read(3, "[data omitted]"..., 4096) = 4096
[pid  4752] lseek(3, 22454272, SEEK_SET) = 22454272
[pid  4752] read(3, "[data omitted]"..., 4096) = 4096
[pid  4752] lseek(3, 22450176, SEEK_SET) = 22450176
[pid  4752] read(3, "[data omitted]"..., 4096) = 4096
[pid  4752] lseek(3, 3293184, SEEK_SET) = 3293184
[pid  4752] read(3, "[data omitted]"..., 4096) = 4096
[pid  4752] lseek(3, 2174976, SEEK_SET) = 2174976
[pid  4752] read(3, "[data omitted]"..., 4096) = 4096
[pid  4752] lseek(3, 22446080, SEEK_SET) = 22446080
[pid  4752] read(3, "[data omitted]"..., 4096) = 4096
[pid  4752] lseek(3, 22441984, SEEK_SET) = 22441984
[pid  4752] read(3, "[data omitted]"..., 4096) = 4096

Watching these events scroll by, I noticed that they occurred relatively slowly.

/proc/4752/fd/3 pointed to places.sqlite.

Whatever it's doing, reading a file from back to front (with a few random skips
around) seems like a really bad idea.  Also note that I have more than
enough memory to hold the entire file at once, many times over.

When firefox-bin finally finished, I looked at my profile directory, and found
this:

-rw-r--r-- 1 josh josh  73400320 2011-09-03 09:22 .mozilla/firefox/2fr6gqfu.default/places.sqlite
-rw-r--r-- 1 josh josh    458752 2011-09-03 09:25 .mozilla/firefox/2fr6gqfu.default/places.sqlite-shm
-rw-r--r-- 1 josh josh 221454152 2011-09-03 09:25 .mozilla/firefox/2fr6gqfu.default/places.sqlite-wal

Note the presence of the -shm and -wal files with Firefox no longer running.

I then re-launched Firefox and immediately closed it.  firefox-bin didn't stick
around this time, and the -shm and -wal files vanished.

I re-launched Firefox, and recreated the issue again: searched for gibberish,
cleared the search, searched for "foo", waited long enough that the first
result should have appeared, closed the browser.

I again saw firefox-bin spinning in the background, and ran strace, this time
remembering to use -tt for timing to quantify how slowly this occurred:

[pid  8066] 09:37:15.927097 lseek(31, 22130688, SEEK_SET) = 22130688
[pid  8066] 09:37:15.927275 read(31, "[data omitted]"..., 4096) = 4096
[pid  8066] 09:37:15.934218 lseek(31, 2695168, SEEK_SET) = 2695168
[pid  8066] 09:37:15.934403 read(31, "[data omitted]"..., 4096) = 4096
[pid  8066] 09:37:15.949713 lseek(31, 22122496, SEEK_SET) = 22122496
[pid  8066] 09:37:15.949838 read(31, "[data omitted]"..., 4096) = 4096
[pid  8066] 09:37:15.955098 lseek(31, 22126592, SEEK_SET) = 22126592
[pid  8066] 09:37:15.955267 read(31, "[data omitted]"..., 4096) = 4096
[pid  8066] 09:37:15.967533 lseek(31, 22118400, SEEK_SET) = 22118400
[pid  8066] 09:37:15.967701 read(31, "[data omitted]"..., 4096) = 4096
[pid  8066] 09:37:15.998122 lseek(31, 22114304, SEEK_SET) = 22114304
[pid  8066] 09:37:15.998304 read(31, "[data omitted]"..., 4096) = 4096
[pid  8066] 09:37:16.000188 lseek(31, 49950720, SEEK_SET) = 49950720
[pid  8066] 09:37:16.000309 read(31, "[data omitted]"..., 4096) = 4096
[pid  8066] 09:37:16.003916 lseek(31, 3780608, SEEK_SET) = 3780608
[pid  8066] 09:37:16.004052 read(31, "[data omitted]"..., 4096) = 4096

It appears that I can reliably reproduce this problem.
Comment 1 Josh Triplett 2011-09-04 15:18:40 PDT
*** Bug 661368 has been marked as a duplicate of this bug. ***
Comment 2 Marco Bonardo [::mak] 2011-09-05 01:54:22 PDT
IS the reported version correct? Firefox 6?
Comment 3 Josh Triplett 2011-09-05 09:08:44 PDT
(In reply to Marco Bonardo [:mak] (Away 6-18 Sept.) from comment #2)
> IS the reported version correct? Firefox 6?

That's the version I'm currently running.  I've also previously reproduced it with Firefox 7 when that was in Aurora.
Comment 4 (dormant account) 2011-09-19 13:42:11 PDT
Cool bug. Can you  export NSPR_LOG_MODULES=mozStorage:5 and then run firefox to figure out sql statement[s] are causing this?
Comment 5 Josh Triplett 2011-09-29 15:27:23 PDT
(In reply to Taras Glek (:taras) from comment #4)
> Cool bug. Can you  export NSPR_LOG_MODULES=mozStorage:5 and then run firefox
> to figure out sql statement[s] are causing this?

I exported NSPR_LOG_MODULES=mozStorage:5 , ran firefox from that shell, and reproduced the bug (confirmed with strace), but firefox produced no output.  To confirm, I exported NSPR_LOG_FILE=firefox.log , and firefox produced an empty firefox.log file.

Also, note that the above two reproductions occurred with Firefox 7:
Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20100101 Firefox/7.0
Comment 6 (dormant account) 2011-09-29 17:09:05 PDT
Looks like you need a debug build for this to work. Try one of these -debug builds http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mak77@bonardo.net-506df90a2c98/

(Note:make sure you have a backup of your profile before you mess with development builds)
Comment 7 Josh Triplett 2011-09-30 00:46:03 PDT
(In reply to Taras Glek (:taras) from comment #6)
> Looks like you need a debug build for this to work. Try one of these -debug
> builds
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mak77@bonardo.net-
> 506df90a2c98/
> 
> (Note:make sure you have a backup of your profile before you mess with
> development builds)

Before I go through the process of trying to run a development build, a quick question: I do have debug symbols available, so can I gather the information you need with gdb or similar somehow?
Comment 8 (dormant account) 2011-09-30 09:07:10 PDT
(In reply to Josh Triplett from comment #7)
> (In reply to Taras Glek (:taras) from comment #6)
> > Looks like you need a debug build for this to work. Try one of these -debug
> > builds
> > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mak77@bonardo.net-
> > 506df90a2c98/
> > 
> > (Note:make sure you have a backup of your profile before you mess with
> > development builds)
> 
> Before I go through the process of trying to run a development build, a
> quick question: I do have debug symbols available, so can I gather the
> information you need with gdb or similar somehow?

I just need the sql query, should be to get it pretty easily by attaching gdb and poking around the backtrace.
Comment 9 Josh Triplett 2011-10-01 16:49:21 PDT
Took a while to figure out how to dig the information out of gdb; ended up using gdb to call sqlite3_sql(pStmt) at a frame with access to the opaque sqlite3_stmt.  Looks like it had this query in progress:

SELECT h.url, h.title, f.url, EXISTS(SELECT 1 FROM moz_bookmarks WHERE fk = h.id) AS bookmarked, ( SELECT title FROM moz_bookmarks WHERE fk = h.id AND title NOTNULL ORDER BY lastModified DESC LIMIT 1 ) AS btitle, ( SELECT GROUP_CONCAT(t.title, ',') FROM moz_bookmarks b JOIN moz_bookmarks t ON t.id = b.parent AND t.parent = :parent WHERE b.fk = h.id ) AS tags, h.visit_count, h.typed, h.id, :query_type, t.open_count FROM moz_places h LEFT JOIN moz_favicons f ON f.id = h.favicon_id LEFT JOIN moz_openpages_temp t ON t.url = h.url WHERE h.frecency <> 0 AND AUTOCOMPLETE_MATCH(:searchString, h.url, IFNULL(btitle, h.title), tags, h.visit_count, h.typed, bookmarked, t.open_count, :matchBehavior, :searchBehavior)  ORDER BY h.frecency DESC, h.id DESC LIMIT :maxResults

The backtrace showed this:

#0  0x00007f01d48a050e in sqlite3VdbeRecordCompare (nKey1=9, pKey1=0x7f01b6a7c039, pPKey2=0x7f01b76fb8a0) at sqlite3.c:61295
#1  0x00007f01d48c0276 in sqlite3BtreeMovetoUnpacked (pRes=<optimized out>, biasRight=<optimized out>, intKey=<optimized out>, pIdxKey=<optimized out>, pCur=<optimized out>) at sqlite3.c:52776
#2  sqlite3BtreeMovetoUnpacked (pCur=0x7f01b6a0cfc8, pIdxKey=0x7f01b76fb8a0, intKey=0, biasRight=0, pRes=0x7f01b76fb890) at sqlite3.c:52676
#3  0x00007f01d48cb5ea in sqlite3VdbeExec (p=0x7f01b7956ab8) at sqlite3.c:67013
#4  0x00007f01d48d11d9 in sqlite3Step (p=0x7f01b7956ab8) at sqlite3.c:61954
#5  sqlite3_step (pStmt=0x7f01b7956ab8) at sqlite3.c:62027
#6  sqlite3_step (pStmt=0x7f01b7956ab8) at sqlite3.c:62015
#7  0x00007f01d57f8ba0 in mozilla::storage::stepStmt (aStatement=0x7f01b7956ab8) at /tmp/buildd/iceweasel-7.0.1/storage/src/mozStoragePrivateHelpers.cpp:288
#8  0x00007f01d57f7637 in mozilla::storage::AsyncExecuteStatements::executeStatement (this=0x7f01b08e7270, aStatement=0x7f01b7956ab8)
    at /tmp/buildd/iceweasel-7.0.1/storage/src/mozStorageAsyncStatementExecution.cpp:365
#9  0x00007f01d57f7902 in mozilla::storage::AsyncExecuteStatements::executeAndProcessStatement (this=0x7f01b08e7270, aStatement=0x7f01b7956ab8, aLastStatement=true)
    at /tmp/buildd/iceweasel-7.0.1/storage/src/mozStorageAsyncStatementExecution.cpp:312
#10 0x00007f01d57f7a2e in mozilla::storage::AsyncExecuteStatements::bindExecuteAndProcessStatement (this=0x7f01b08e7270, aData=<optimized out>, aLastStatement=true)
    at /tmp/buildd/iceweasel-7.0.1/storage/src/mozStorageAsyncStatementExecution.cpp:294
#11 0x00007f01d57f7f87 in mozilla::storage::AsyncExecuteStatements::Run (this=0x7f01b08e7270) at /tmp/buildd/iceweasel-7.0.1/storage/src/mozStorageAsyncStatementExecution.cpp:627
#12 0x00007f01d596c41c in nsThread::ProcessNextEvent (this=0x7f01b837b5b0, mayWait=1, result=0x7f01b76fbe6c) at /tmp/buildd/iceweasel-7.0.1/xpcom/threads/nsThread.cpp:617
#13 0x00007f01d594237c in NS_ProcessNextEvent_P (thread=<optimized out>, mayWait=<optimized out>) at /tmp/buildd/iceweasel-7.0.1/build-xulrunner/xpcom/build/nsThreadUtils.cpp:245
#14 0x00007f01d596be0a in nsThread::ThreadFunc (arg=0x7f01b837b5b0) at /tmp/buildd/iceweasel-7.0.1/xpcom/threads/nsThread.cpp:272
#15 0x00007f01d6476b93 in ?? () from /usr/lib/x86_64-linux-gnu/libnspr4.so.0d
#16 0x00007f01d79d9b40 in start_thread (arg=<optimized out>) at pthread_create.c:304
#17 0x00007f01d772436d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#18 0x0000000000000000 in ?? ()

All the other threads appeared blocked in various ways, such as PR_WaitCondVar, glib's main loop, or an epoll_wait.
Comment 10 Marco Bonardo [::mak] 2011-10-03 02:05:58 PDT
this is most likely the missin ANALYZE problem, try following the steps I explain here http://blog.bonardo.net/2011/09/30/is-your-firefor-freezing-at-regular-intervals
Comment 11 (dormant account) 2011-10-03 09:44:18 PDT
(In reply to Marco Bonardo [:mak] from comment #10)
> this is most likely the missin ANALYZE problem, try following the steps I
> explain here
> http://blog.bonardo.net/2011/09/30/is-your-firefor-freezing-at-regular-
> intervals

Why is places autocomplete code running on shutdown?
Comment 12 Marco Bonardo [::mak] 2011-10-03 10:08:12 PDT
it is not, this is an async query that is slow due to the missing analyze, so slow that it ends late and increases the queries queue. All of this nonsense perf issues are just due to broken statistics in the database, otherwise these queries would take milliseconds :(
Comment 13 (dormant account) 2011-10-03 10:37:39 PDT
(In reply to Marco Bonardo [:mak] from comment #12)
> it is not, this is an async query that is slow due to the missing analyze,
> so slow that it ends late and increases the queries queue. All of this
> nonsense perf issues are just due to broken statistics in the database,
> otherwise these queries would take milliseconds :(

That makes me very sad.

Is there a bug to run analyze we can dupe this against(ie 686025)?
Comment 14 Josh Triplett 2011-10-03 10:50:29 PDT
Why would a missing analyze cause sqlite to slowly read the table *backward* in small chunks?

(In any case, this could explain why the awesomebar gets so un-awesomely slow.)

Also, why can't Firefox abort the async query?
Comment 15 (dormant account) 2011-10-03 11:05:13 PDT
(In reply to Josh Triplett from comment #14)
> Why would a missing analyze cause sqlite to slowly read the table *backward*
> in small chunks?

I'm guessing here that you have a query planner on crack(ie mak's blog post) + internal fragmentation in the db. Look at the "vacuum" prefs in about:config. What values do you have for those?

Also how big is your places database? How much ram do you have?
Comment 16 Marco Bonardo [::mak] 2011-10-03 11:11:56 PDT
(In reply to Taras Glek (:taras) from comment #13)
> Is there a bug to run analyze we can dupe this against(ie 686025)?

bug 690354 is what causes analyze to never run, but ideally I'd like to never run it that is bug 683876. Bug 686025 is just a consequence like this one



(In reply to Josh Triplett from comment #14)
> Why would a missing analyze cause sqlite to slowly read the table *backward*
> in small chunks?

Because SQLite thinks the table is empty, so it starts doing a linear scan of all the rows, then it finds it really has thounsands rows to read.

> Also, why can't Firefox abort the async query?

I never said that, but it depends at which stage Cancel() is invoked, depending on the thread traffic the scheduler may have already started the scan.
Comment 17 Josh Triplett 2011-10-03 11:22:29 PDT
(In reply to Taras Glek (:taras) from comment #15)
> (In reply to Josh Triplett from comment #14)
> > Why would a missing analyze cause sqlite to slowly read the table *backward*
> > in small chunks?
> 
> I'm guessing here that you have a query planner on crack(ie mak's blog post)
> + internal fragmentation in the db.

Reading it backward still seems pathologically bad. :)

> Look at the "vacuum" prefs in
> about:config. What values do you have for those?

places.last_vacuum: 1288031428
storage.vacuum.last.index: 1
storage.vacuum.last.places.sqlite: 1317323598

> Also how big is your places database? How much ram do you have?

places.sqlite takes up 81M on disk.  I have 4GB of memory.
Comment 18 Marco Bonardo [::mak] 2011-10-04 03:34:17 PDT
vacuum was on 29 Sept, sounds fine.
Did you run the suggestion in comment 10 and did that improve things?
Comment 19 Josh Triplett 2011-10-04 08:28:24 PDT
(In reply to Marco Bonardo [:mak] from comment #18)
> vacuum was on 29 Sept, sounds fine.
> Did you run the suggestion in comment 10 and did that improve things?

I hadn't yet; I wanted to let the discussion continue around it until that seemed like the consensus before I did something that might prevent me from continuing to reproduce the bug and help here.  I'll go ahead and try it now.
Comment 20 Josh Triplett 2011-10-04 08:34:39 PDT
No, running the instructions linked in comment 10 did not help.  I could still trivially reproduce the problem.

Should I run something other than "Expire"?
Comment 21 Marco Bonardo [::mak] 2011-10-12 08:06:36 PDT
(In reply to Josh Triplett from comment #20)
> No, running the instructions linked in comment 10 did not help.  I could
> still trivially reproduce the problem.
> 
> Should I run something other than "Expire"?

nope, not if the issue was analyze. Other entries do database cleanup btw, they won't hurt.
Comment 22 Josh Triplett 2011-10-17 00:32:34 PDT
Anything more I can do to help resolve this bug?
Comment 23 (dormant account) 2011-10-17 07:29:36 PDT
(In reply to Josh Triplett from comment #22)
> Anything more I can do to help resolve this bug?

It's on my todo list. You can get started on debugging this if you are keen :)
Comment 24 (dormant account) 2011-12-15 11:52:12 PST
Vlad any updates on this?
Comment 25 Vladan Djeric (:vladan) 2011-12-15 13:03:48 PST
(In reply to Taras Glek (:taras) from comment #24)
> Vlad any updates on this?

I looked into it and I'm at a bit of a loss on how to proceed. I've seen the same Places query execute very slowly firsthand on my debug builds (>100ms) but not for several seconds.

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/nsPlacesAutoComplete.js#222

My current best guess is that this monster query is inherently slow and it causes major performance issues when operating on large tables.

Josh: If the issue is still reproducible, could I ask you to check the number of records in your moz_bookmarks, moz_places and moz_favicons tables?

You can do it from the shell after installing the sqlite3 package:
1. sqlite3 ~/.mozilla/firefox/<profile name ending with .default>/places.sqlite
2. select * FROM ((select count(*) from moz_bookmarks), (select count(*) from moz_places), (select count(*) from moz_favicons));

Alternatively, you can install the SQLite Manager extension:
https://addons.mozilla.org/en-US/firefox/addon/sqlite-manager/
Comment 26 Josh Triplett 2011-12-15 13:07:47 PST
Still reliably reproducible with Firefox 9.

sqlite> select * FROM ((select count(*) from moz_bookmarks), (select count(*) from moz_places), (select count(*) from moz_favicons));
870|115636|7382
Comment 27 Vladan Djeric (:vladan) 2011-12-15 14:15:16 PST
(In reply to Josh Triplett from comment #26)
> Still reliably reproducible with Firefox 9.
> 
> sqlite> select * FROM ((select count(*) from moz_bookmarks), (select
> count(*) from moz_places), (select count(*) from moz_favicons));
> 870|115636|7382

Thank you. The size of your moz_places might explain why the autocomplete query is taking so long. I was able to reproduce "reading the database backwards" behaviour by typing in a URL I hadn't visited previously. I saw the DB being read backwards in 32KB chunks over about 3.5 MB with a few random jumps in between. I didn't notice any appreciable delays but my moz_places only has 10% as many records.

As a solution, we could probably safely abort places.sqlite transaction on shutdown after a 1 or 2 second grace period. Also, it seems Marco Bonardo might already have fixes for the moz_places bloat.
Comment 28 Josh Triplett 2011-12-15 16:01:04 PST
With only 115k history entries, and the entirety of places.sqlite taking only 71MB (and thus easily fitting entirely in RAM), it should never take multiple seconds to do an awesomebar search.  I agree that places.sqlite transactions should not prevent firefox from closing, but it bothers me even more when I type in the awesomebar and don't get an instant response.

Note that this doesn't happen all the time.  Most of the time I can type and get an instant response.  However, whenever I type something that doesn't match any places entry, the awesomebar takes quite a while (just retested: 30+ seconds of 100% CPU) to recover and start answering queries instantly again.
Comment 29 (dormant account) 2011-12-15 16:14:14 PST
71*1024*1024/4096=  ~18K read() syscalls, which might be translating into 18K disk seeks. I wonder why your sqlite db isn't in fs cache by the time you exit browser.

One reason your reads are so bad is because we switched to a 32K pagesize in firefox 4, but for whatever reason Firefox is unable to vacuum your db to switch it to 32K pages. If you make a backup of your db and then run "pragma page_size=32768; vacuum" performance should be less terrible.

Josh, can you reproduce this terrible perf in sqlite shell?

It's still not clear to me why SQLite is scheduling IO backwards, we'll get to the bottom of that. This reading backwards trick cleverly prevents kernel readahead from doing anything useful :)
Comment 30 Josh Triplett 2011-12-15 16:32:50 PST
(In reply to Taras Glek (:taras) from comment #29)
> 71*1024*1024/4096=  ~18K read() syscalls, which might be translating into
> 18K disk seeks. I wonder why your sqlite db isn't in fs cache by the time
> you exit browser.

I can reproduce this problem even with the database entirely warm in cache.  Also, I have an SSD. :)

> One reason your reads are so bad is because we switched to a 32K pagesize in
> firefox 4, but for whatever reason Firefox is unable to vacuum your db to
> switch it to 32K pages. If you make a backup of your db and then run "pragma
> page_size=32768; vacuum" performance should be less terrible.

I can try that, though as you suggest it ought to have happened automatically.  It seems unlikely to make a huge difference, though; this performance problem seems entirely CPU-bound, not IO-bound.

> Josh, can you reproduce this terrible perf in sqlite shell?

I can try, if you can suggest a query equivalent to the one Firefox runs but with the custom parameters/functions filled in so that I can execute it outside of Firefox.

> It's still not clear to me why SQLite is scheduling IO backwards, we'll get
> to the bottom of that. This reading backwards trick cleverly prevents kernel
> readahead from doing anything useful :)

Yeah, that certainly exacerbates the problem.
Comment 31 Josh Triplett 2011-12-15 16:45:14 PST
(In reply to Josh Triplett from comment #30)
> (In reply to Taras Glek (:taras) from comment #29)
> > One reason your reads are so bad is because we switched to a 32K pagesize in
> > firefox 4, but for whatever reason Firefox is unable to vacuum your db to
> > switch it to 32K pages. If you make a backup of your db and then run "pragma
> > page_size=32768; vacuum" performance should be less terrible.
> 
> I can try that, though as you suggest it ought to have happened
> automatically.  It seems unlikely to make a huge difference, though; this
> performance problem seems entirely CPU-bound, not IO-bound.

Just tried it; I can still reliably reproduce this problem, and I see no obvious difference in performance.  Still takes 30-60 seconds to recover after typing gibberish.
Comment 32 Marco Bonardo [::mak] 2011-12-15 17:42:53 PST
(In reply to Taras Glek (:taras) from comment #29)
> One reason your reads are so bad is because we switched to a 32K pagesize in
> firefox 4, but for whatever reason Firefox is unable to vacuum your db to
> switch it to 32K pages. If you make a backup of your db and then run "pragma
> page_size=32768; vacuum" performance should be less terrible.

nope, VACUUM can't change page_size of a WAL database, you have to:
PRAGMA journal_mode=TRUNCATE;PRAGMA page_size=32768; VACUUM; PRAGMA journal_mode=WAL;
We can't do this in Firefox since it requires that all statements are finalized, we can't ensure that yet (and I'm not sure we will be able in future, due to add-ons) so it must be done when Firefox is closed :(

> It's still not clear to me why SQLite is scheduling IO backwards, we'll get
> to the bottom of that. This reading backwards trick cleverly prevents kernel
> readahead from doing anything useful :)

Maybe because it's searching from more frecent to less frecent results, and it's more likely that relevant results are at the end of the database.

Actually this happens because we can't use fulltext-indexing, so to match we have to actually go through each results and match. This means something that doesn't match will have to walk all the pages.
A possible workaround may be to reintroduce chunking and cancel the query after a certain amount of ms.
Comment 33 Josh Triplett 2011-12-15 18:39:01 PST
(In reply to Marco Bonardo [:mak] from comment #32)
> (In reply to Taras Glek (:taras) from comment #29)
> > One reason your reads are so bad is because we switched to a 32K pagesize in
> > firefox 4, but for whatever reason Firefox is unable to vacuum your db to
> > switch it to 32K pages. If you make a backup of your db and then run "pragma
> > page_size=32768; vacuum" performance should be less terrible.
> 
> nope, VACUUM can't change page_size of a WAL database, you have to:
> PRAGMA journal_mode=TRUNCATE;PRAGMA page_size=32768; VACUUM; PRAGMA
> journal_mode=WAL;
> We can't do this in Firefox since it requires that all statements are
> finalized, we can't ensure that yet (and I'm not sure we will be able in
> future, due to add-ons) so it must be done when Firefox is closed :(

OK, just tried that.  I can still reliably reproduce the bug, and I see no difference in the amount of time needed to recover awesomebar performance after typing gibberish.

> > It's still not clear to me why SQLite is scheduling IO backwards, we'll get
> > to the bottom of that. This reading backwards trick cleverly prevents kernel
> > readahead from doing anything useful :)
> 
> Maybe because it's searching from more frecent to less frecent results, and
> it's more likely that relevant results are at the end of the database.

> Actually this happens because we can't use fulltext-indexing, so to match we
> have to actually go through each results and match. This means something
> that doesn't match will have to walk all the pages.

That shouldn't matter.  Searching all the content without any index should still take only a fraction of a second:

$ time grep gibberish places.sqlite

real	0m0.059s
user	0m0.044s
sys	0m0.016s

Even if sqlite took 10 times as long, it should still take less than a second.

> A possible workaround may be to reintroduce chunking and cancel the query
> after a certain amount of ms.

Possibly a good idea in general, but not a fix for this bug.
Comment 34 (dormant account) 2011-12-16 14:41:06 PST
Josh, just to make sure. When you run "pragma page_size" on your db in sqlite, what do you get? Sqlite should at the very least be reading backwards in 32K chunks now :)
Comment 35 Josh Triplett 2011-12-16 15:06:11 PST
(In reply to Taras Glek (:taras) from comment #34)
> Josh, just to make sure. When you run "pragma page_size" on your db in
> sqlite, what do you get? Sqlite should at the very least be reading
> backwards in 32K chunks now :)

Correct, it now reads backwards in 32K chunks.  As before, it skips around a bit, but mostly reads backwards.
Comment 36 Emanuel Hoogeveen [:ehoogeveen] 2011-12-18 22:58:28 PST
(In reply to Marco Bonardo [:mak] from comment #32)
> nope, VACUUM can't change page_size of a WAL database, you have to:
> PRAGMA journal_mode=TRUNCATE;PRAGMA page_size=32768; VACUUM; PRAGMA
> journal_mode=WAL;
> We can't do this in Firefox since it requires that all statements are
> finalized, we can't ensure that yet (and I'm not sure we will be able in
> future, due to add-ons) so it must be done when Firefox is closed :(
Could the maintenance service from bug 481815 be leveraged for this and other database maintenance tasks, at least on Windows?
Comment 37 James May [:fowl] 2011-12-19 01:15:28 PST
Services generally try to avoid touching user data, and if properly set up, don't even have access to it.
Comment 38 Vladan Djeric (:vladan) 2012-05-14 14:00:46 PDT
Josh, can you still reproduce this issue? If so, can you confirm whether the following build solves your problem? It interrupts awesomebar SQL queries on shutdown.

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/vdjeric@mozilla.com-1619cc524896/try-linux64-debug/firefox-15.0a1.en-US.linux-x86_64.tar.bz2
Comment 39 Josh Triplett 2012-05-22 15:13:39 PDT
(In reply to Vladan Djeric (:vladan) from comment #38)
> Josh, can you still reproduce this issue?

Yes.

> If so, can you confirm whether the
> following build solves your problem? It interrupts awesomebar SQL queries on
> shutdown.
> 
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/vdjeric@mozilla.
> com-1619cc524896/try-linux64-debug/firefox-15.0a1.en-US.linux-x86_64.tar.bz2

I hesitate to try a 15.0a1 build when I currently run Firefox 11, given that I need to try it on my main profile to reproduce the problem.  I'll give that build a try on a temporary copy of my profile and let you know if I can reproduce the problem.
Comment 40 Vladan Djeric (:vladan) 2012-05-23 07:05:55 PDT
> I hesitate to try a 15.0a1 build when I currently run Firefox 11, given that
> I need to try it on my main profile to reproduce the problem.  I'll give
> that build a try on a temporary copy of my profile and let you know if I can
> reproduce the problem.

Sounds great, thanks!
Comment 41 Josh Triplett 2012-05-31 15:43:19 PDT
I can't actually reproduce this problem anymore since upgrading to Firefox 12.
Comment 42 Bryan Quigley 2013-07-08 14:03:57 PDT
Is this still an issue for anyone?
Comment 43 Wayne Mery (:wsmwk, NI for questions) 2014-09-21 09:17:23 PDT
(In reply to Bryan Quigley from comment #42)
> Is this still an issue for anyone?

WFM per reporter comment 41

Note You need to log in before you can comment on or make changes to this bug.