Places makes reftest run slow

RESOLVED FIXED

Status

()

Toolkit
Places
--
major
RESOLVED FIXED
9 years ago
9 years ago

People

(Reporter: bz, Unassigned)

Tracking

({perf})

Trunk
x86
Mac OS X
Points:
---
Bug Flags:
in-testsuite -

Firefox Tracking Flags

(Not tracked)

Details

STEPS TO REPRODUCE:

1) Run reftest/mochitest a few times on a profile
2) Start a reftest run
3) Watch the test number in the title bar

EXPECTED RESULTS: test number increments smoothly while tests run

ACTUAL RESULTS: test number increments smoothly for about 20 tests, then pauses for a while (order of several seconds or so).  We have about 2800 reftests, so that's 140 pauses.

Put another way, running reftest against a clean profile (where the pauses just start appearing towards the end of the test run a little bit) takes 7 minutes 40 seconds.  Doing the same against my normal testing profile takes 17 minutes 30 seconds.  So this issue is approximately doubling my reftest runnning time.

I profiled two of these pauses, and both look identical: all the time is spent under nsNavHistory:LazyTimerCallback calling nsNavHistory::CommitLazyMessages calling nsNavHistory::AddURIInternal calling nsNavHistory::AddVisitChain calling nsNavHistory::InternalAddVisit calling mozStorageStatement::Execute calling mozStorageStatement::ExecuteStep calling sqlite3_step calling sqlite3Step calling sqlite3VdbeExec.

40% of the time is spend inside sqlite3VdbeExec itself.  The rest is spent under various stuff it calls (sqlite3BtreeKeySize, minmaxStep, sqlite3BtreeNext, sqlite3VdbeMemRelease, sqlite3VdbeCursorMoveto, etc, etc).

I can tell you what the bottom-up profile looks like too, if desired, but it's all inside sqlite stuff.

Is there anything obvious we can do to make this better somehow?  This presumably affects browsing too, and our new-profile-creating performance tests don't really catch it I bet.
moving toward adding visits and favicons with async statements rather than with lazyTimer is probably the best we can do, we have to add informations to the db somewhere, at least that would happen in a separate mozStorage thread, without blocking the UI. We already have a plan for that targeted 3.2
(In reply to comment #0)
> Is there anything obvious we can do to make this better somehow?  This
> presumably affects browsing too, and our new-profile-creating performance tests
> don't really catch it I bet.
Not sure I totally agree.  Reftests are loading pages rapidly, which isn't something a user normally does.  What basically ends up happening is a lot of work gets built up in our lazy queue because you load a bunch of pages during the delayed write period. With that said, I'm wondering if bug 480211 will end up helping you even though expiration didn't show up in the stack during the two pauses you saw.
Shawn, the delayed write is happening every 20 reftests or so.  This means that we're adding at most 40 visits.

I assume the time taken to add is linear in number of visits queued up?  If so, adding a single visit takes on the order of 100ms.  If the user happens to load a page with frames, we're going to be adding multiple visits at once, and it'll cause a noticeable UI lag.  Unless I'm missing something?
if frames are considered embed visits, those should be faster since are only added to memory tables.
Ah, ok.

There's another way to look at this if you want: this is more than doubling the CPU cycles needed to load a set of pages.  Now granted, these are pretty small pages.  But still, more than doubling.  With more normal user browsing habits the time would be spread in between page loads more, but it'd still be sucking up CPU.
(In reply to comment #4)
> if frames are considered embed visits, those should be faster since are only
> added to memory tables.
Everything is added to in-memory tables first.

(In reply to comment #3)
> Shawn, the delayed write is happening every 20 reftests or so.  This means that
> we're adding at most 40 visits.
In what kind of a time period are we talking about?
(In reply to comment #6)
> (In reply to comment #4)
> Everything is added to in-memory tables first.

yeah i meant embeds are also never synced down to disk
See numbers above.  7 minutes 40 seconds == 460 seconds to load about 2800 reftests.  That's 160ms per reftest pair on average.  20 tests would then be about 3 seconds.  That's in the vanilla profile.  In the profile with history, 1050 seconds to load all the tests gives 375ms per pair, or about 7.5 seconds pause start to pause start.

The lazy timer looks to be set for 3 seconds, and we reset it on every lazy message add, though starting with the third lazy message we just let it be.  So that's very much consistent with the above numbers: 40 pageloads in about 3 seconds, then the timer fires and spends about 3.5 seconds adding those 40 URIs to history, then we start pageloads again and set up the lazy timer for good within 3 * (3s / 40) == 0.25 seconds or so.
(In reply to comment #8)
Well, you just did all the research and explaining I was going to do with your timing number :)

With that said, I'm not sure we can do much of anything here.  SQLite is going to be taking as long as it is because it has to update a number of indexes.  I don't think any of these pages have referrers or or are link transitions, so we aren't doing any lookups.  This is just raw insert speed. :(
On the face of it, taking 3.5 seconds to insert 40 URIs sounds like a pretty slow database. Boris: this is all CPU-bound right? The CPU's not going idle?
Not sure how to tell, offhand...  CPU usage in general for a reftest run against the profile with existing history is about 82% according to |time|.  The shark profile also showed the expected number of samples for the time of the wait...  I don't _think_ shark shows samples as part of the program execution time if we're in i/o wait.  But I'd welcome any ideas for how to better measure this.

And yes, taking hundreds of ms per database insert sounds really slow to me.
On insert, if we already have a place, we have three indexes to update:
moz_historyvisits_temp_placedateindex (2 col)
moz_historyvisits_temp_fromindex (1 col)
moz_historyvisits_temp_dateindex (1 col)

If it's a new place (we have never seen it before), we'll have a number of indexes to update as well:
moz_places_temp_url_uniqueindex (1 col)
moz_places_temp_faviconindex (1 col)
moz_places_temp_hostindex (1 col)
moz_places_temp_visitcount (1 col)
moz_places_temp_frecencyindex (1 col)

Plus, the triggers that we use to actually insert into the historyvisits table:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsPlacesTriggers.h#141

And if we have to add something to the places table (haven't seen the uri before):
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsPlacesTriggers.h#69

This turns out to be a lot of work for every visit, especially if we haven't seen the uri in the past.
What seems strange to me is that a clean profile (where all the visits are new) is a lot faster than an old profile (where the visits are not new).
Updating a large index is going to take longer to update than a virtually empty one.
And to be more clear, AFAIK, indexes are binary trees, so insertion/finding is going to be O(log(n)), so if we have a bazillion pages in there, it's going to take time.
OK, so inserting 40 entries into the history visits table and updating its 3 indexes takes maybe 7s for a lot of history. That still seems like a lot of CPU time to get not a lot done.

Possibly this is just endemic with using something as general as SQL here, or maybe SQLite is just slow, but whoever's to blame, it seems like things could be a lot better.

Updated

9 years ago
Depends on: 481257
So, this may be our fault.  In our insertion triggers, we have a bit of SQL that looks like this:
MAX((SELECT IFNULL(MAX(id), 0) FROM moz_places_temp), (SELECT IFNULL(MAX(id), 0) FROM moz_places)) + 1
We use this to determine the new insertion id.  The trigger for moz_historyvisits is very similar.
The bad news is that how this is written, it requires a full table scan of each table.  According to drh, this is the likely culprit since we are spending a bunch of time in sqlite3VdbeExec.

We can let the optimizer use the index instead of a table scan if we change those bits to the following:
MAX(IFNULL((SELECT MAX(id) FROM moz_places_temp), 0), (IFNULL(SELECT MAX(id) FROM moz_places), 0)) + 1
Note the change of location for IFNULL.

I've spun off bug 481257 for this issue.  Not going to fix it here, because it may end up being that it's not the cause of this (but it is an issue!).
For what it's worth, the patch for the trigger issue I just put in bug 481257 seems to resolve this bug for me: reftest runs in about 8 minutes instead of 17.5.
Awesome!
Fixed by the checkin for bug 481257.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.