Closed Bug 392399 Opened 17 years ago Closed 17 years ago

use multi-column index on moz_historyvisits to improve performance

Categories

(Firefox :: Bookmarks & History, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 3 beta3

People

(Reporter: moco, Assigned: mak)

References

Details

(Keywords: perf)

Attachments

(2 files, 6 obsolete files)

add last_visit_date column to moz_places table (to improve history UI performance)

The current url loc patch (bug #389491) does chunking over the moz_historyvisits table.  

This has two issues:

1)  this table can be much bigger than moz_places (especially with history set to expire in 180 days).
2)  this table doesn't contain items you've never visited (if you want to show bookmarks in autocomplete, and you've expired history, you would see them.)

Perhaps we should:

1)  add a last_visit_date column to moz_historyvisits (and index it) and increment it for any non-embedded visit.  I'd suggest we lazily increment the visit table, but non-lazily increment the last_visit_date in the moz_places table.

2)  use the moz_places table (and not the moz_historyvisits) table, and chunk over that.

This could save us from us from having to JOIN against moz_historyvisits, but give us the same result.  We'd probably use that column (and avoid the JOIN) for other queries, too, such as:  the history menu and the history sidebar,

Potential issue:

When deleting a visit from the history sidebar, if it doesn't delete all visits for that place, would mean the last_visit_date could be incorrect.  (There might be a bug about how delete should delete all visits, but I'm not sure if that is true for group by date).  so delete might have to query the moz_visits table to update the last_visit_date value upon deletion of a specific visit.

as for migration, this would be something we could do in place:  add the new column and populate it from the existing moz_history_visits table.

mano / dietrich, what do you think?
it'd be good to get at least a ballpark estimate of how this would affect perf for a 180 day history, but this generally sounds ok. i think mano wanted to add a column somewhere as well, so please coordinate.
I wanted to remove a column somewhere (in particular, folder_type from moz_bookamrks).
dietrich:  agreed, having some perf numbers to back this up for a large history is a requirement.

I'll coordinate schema changes with mano so we can do this in one fell swoop.
when dietrich and I last discussed this, we realized there might be a problem with doing this if we want to give extra value to the transition type (link, typed, bookmark, etc), which is stored in the moz_history_visits.

we could store "the highest" transition type in moz_places (as I'm proposing we do with the "most recent" visit date.

going to hold off on this for now, and see what we need (as more UI changes happen to the location bar, per faaborg's mockups.)
on a related note, see bug #394038

instead of storing last_visit_date in moz_places, the plan is to store a global frencency value.

this bug may become wontfix.
(In reply to comment #1)
> it'd be good to get at least a ballpark estimate of how this would affect perf
> for a 180 day history, but this generally sounds ok. i think mano wanted to add
> a column somewhere as well, so please coordinate.
> 

I have simulated history based on real user data. There were about 500 places, visited more than 75000 times within a year or so. The super-optimized query http://mxr.mozilla.org/firefox/source/toolkit/components/places/src/nsNavHistory.cpp#2249 needed 564ms just to display 10 last items in history. This query is performed when you click on "History" menu, which is not related to frecency (comment #5) in any way in my understanding.

I was able to optimize this down to 127ms, but this is still terrible. I have tried many variants, but the only way how to boost the performance seems adding last_visit_date to moz_places. In this case the original super-optimized query is the fastest one and takes only 3.27ms - even without adding the index on the added last_visit_date column.

There is actually a way how to speed even this - copy the hidden column from moz_places to moz_historyvisits, but this is just by 5% faster and would require more changes in the code.

SUGGESTION
----------

I'm not sure whether there is any objection against triggers, but at least the following code nicely shows what should be changed.

ALTER TABLE moz_places
    ADD last_visit_date INTEGER;

UPDATE moz_places
    SET last_visit_date =
        (SELECT MAX(visit_date)
         FROM   moz_historyvisits
         WHERE  moz_places.id=place_id
           AND  visit_type NOT IN (0,4));

CREATE TRIGGER moz_historyvisits_ai AFTER INSERT ON moz_historyvisits
FOR EACH ROW WHEN NEW.visit_type NOT IN (0,4)
BEGIN
    UPDATE moz_places
       SET last_visit_date = NEW.visit_date
     WHERE id = NEW.place_id;
END;

In the query, MAX(v.visit_date) gets simply replaced with last_visit_date.
yes, with triggers we could do many things, also some partial expiration (for example favicons, embedded, etc), but i'm not sure if someone will put triggers in at this stage of the release. we could open a new report like "investigate on use of triggers to speed up places" for future investigation, since this has to be tested quite deep to avoid having cpu spikes when trigger gets in

Here are some numbers with small, mid size and big database and with favicons added to my tests, tuned queries are linked from this document:

http://spreadsheets.google.com/a/allpeers.com/pub?key=pZeib3u49OYXE8DvRD5RTHA

We can update last_visit_date whenever we update visit_count. Can we use current time, or do we need to use the time from moz_historyvisits?
Ondrej, thanks for looking into this.

I'm unable to view that google doc (looks like I need an allpeers login?)

> This query is performed when you click on "History" menu, which is not 
> related to frecency (comment #5) in any way in my understanding.

You are correct, that query has nothing to do with frecency.

> I was able to optimize this down to 127ms, but this is still terrible.

Can you attach a patch for that?  

I wouldn't call 127 ms is terrible, given the size of your database.  (Is the delay even noticable?)

> the only way how to boost the performance seems adding last_visit_date to 
> moz_places. In this case the original super-optimized query
> is the fastest one and takes only 3.27ms - even without adding the index 
> on the added last_visit_date column.

Holy smokes!

Additionally, having last_visit_date in moz_places might allow us to speed up some history sidebar / places organizer (when selecting history) queries. 

> this bug may become wontfix

clearly, this is not a wontfix.
we'll need to recalculate the last_visit_date upon visit (but only when type <> 0 or <> 4 TRANSITION_EMBED), expiration and removal of visits.  

Does your patch already do that?
> I'm unable to view that google doc (looks like I need an allpeers login?)

Here is the document republished: http://spreadsheets.google.com/pub?key=pZeib3u49OYXE8DvRD5RTHA

>> I was able to optimize this down to 127ms, but this is still terrible.

> Can you attach a patch for that?

I can prepare patch, but I'm not sure for which alternative. Please look at the document, there are 4 tuned alternatives. I personally would ignore the "+Hidden" ones, they do not add that much. Please note that adding favicons to my test database increased time to 204ms (from 127ms).

> I wouldn't call 127 ms is terrible, given the size of your database.  
> (Is the delay even noticable?)

Please note that the times that I refer to are times spent in sqlite API only. It bypasses mozStorage, so in the real life it would be always slower (but this slowdown should be rather constant, because the amount of data transferred between sqlite and UI is the same). With database time 300ms, the delay when entering History menu was very noticable to me. With menus, you expect immediate reaction.

> we'll need to recalculate the last_visit_date upon visit (but only when type <>
> 0 or <> 4 TRANSITION_EMBED), expiration and removal of visits.  
> 
> Does your patch already do that?

Not at the moment, I need a decision which alternative to take and bug should probably be assigned to me, so that not more people work on the same. 

mh, i had not take a look at history menu query lately, it can be really optimized as a first step, without changing anything in the db, the moving from my testing with 1100 places and 180000 visits its about from 60ms to 30ms (this values are pure query time, without output time).
Imho trigger should be a second step, after patching the current query, because trigger will add a delay on inserts in moz_historyvisits that should be calculated somehow.
this is the new query i tested, moving from 60ms to 30ms:

SELECT h.id, h.url, h.title, h.rev_host, h.visit_count,
(SELECT MAX(visit_date) FROM moz_historyvisits WHERE place_id = h.id),
f.url, null, null
FROM moz_places h
LEFT OUTER JOIN moz_favicons f ON h.favicon_id = f.id
WHERE h.id IN
(SELECT DISTINCT p.id
        FROM moz_places p
        JOIN moz_historyvisits ON place_id = p.id
        WHERE hidden <> 1 AND visit_type <> 4 AND visit_type <> 0
        ORDER BY visit_date DESC
        LIMIT 10)
ORDER BY 6 DESC
;

i have tried to get where the delay come from in the new query i built up. The subquery is really fast, about 8ms, what takes that to 30ms is the SELECT MAX(visit_date) subquery... 
i tried to get why that happens, the db uses  moz_historyvisits_pageindex to filter this query on place_id, and then it has to do a linear search through visit_date to find the max

so, my new idea is change the moz_historyvisits_pageindex(place_id) index to:

CREATE INDEX moz_historyvisits_pagedate ON moz_historyvists(place_id, visit_date)

This change will speed up all queries that join on place_id and filter on visit_date (a lot of queries). then we could investigate if this unified index will also make obsolete the visit_date index.

changing the index bring my query from 30ms -> 8ms with 1300 places and 200000 visits

Ondrej Brablc could you try this change on your db (so use your new query but add this index) and tell what timings do you see to view the difference with trigger method?
typo:

CREATE INDEX moz_historyvisits_pagedateindex ON moz_historyvisits(place_id,
visit_date)
(In reply to comment #13)
> so, my new idea is change the moz_historyvisits_pageindex(place_id) index to:
> 
> CREATE INDEX moz_historyvisits_pagedate ON moz_historyvists(place_id,
> visit_date)

I do not see any significant difference with this index. May be you have different distribution of visits per place. In your recent 10 there may be a places which have very long history. In my case it is 14% of visits:

select (SELECT count(*)
FROM  moz_historyvisits h1 INNER JOIN (SELECT DISTINCT place_id
        FROM moz_historyvisits, moz_places
       WHERE place_id = moz_places.id
         AND moz_places.hidden <> 1
         AND visit_type NOT IN (0,4)
       ORDER BY visit_date
       DESC LIMIT 10) h2 ON h1.place_id = h2.place_id)*100 /
       (SELECT count(*) FROM moz_historyvisits);

Anyway, change of the index can speed it up by 0.5ms in the subselect, but 1 subselect takes me 1ms. So we can save 10*0.5ms - 5ms. This is not measurable, and the whole query takes me 200ms, so 5ms, do not make a difference.

So I suggest that we chose from following solutions:

1) Tune the SQL to use subselect: 
http://www.allpeers.com/download/mozilla/places/test_history_top10%23t1_subselect.sql

2) Add last_visit_date and appropriate changes:
http://www.allpeers.com/download/mozilla/places/test_history_top10%23t3_lastvisit.sql

I will check where and how much would we contribute from the added field and whether we need an index on it.

The subquery takes usually less than 1ms. Adding this index, can speed it up to 0.5ms. So it looks like 10*0.5ms speedup - but in my case from 235

I understand the reasoning and would expect this to speed it up, but it is not the case, at least not for me. The times are almost the same, even though I can see the new index is used. If this helps in your case, than you may have different distribution of visits (may be your recent visits are for sites that are visited very frequently).

 
> This change will speed up all queries that join on place_id and filter on
> visit_date (a lot of queries). then we could investigate if this unified index
> will also make obsolete the visit_date index.
> 
> changing the index bring my query from 30ms -> 8ms with 1300 places and 200000
> visits
> 
> Ondrej Brablc could you try this change on your db (so use your new query but
> add this index) and tell what timings do you see to view the difference with
> trigger method?
> 

could you try my query instead of yours with the new index?
i don't get any advantage with that index with your query, while i'm getting big with mine... i think that is because you have added a check on visit_type for max(visit_date) that should no be useful since it's already filtered by h.id (that is filtered already on visit_type)

if you can share your profile i'd like to do some test with it
(In reply to comment #16)
> could you try my query instead of yours with the new index?
> i don't get any advantage with that index with your query, while i'm getting
> big with mine... i think that is because you have added a check on visit_type
> for max(visit_date) that should no be useful since it's already filtered by
> h.id (that is filtered already on visit_type)

I did not see, that you have removed the visit_type check. In this case you are right, I get from 232ms down to 44ms. BTW, removing the visit_date index makes it much worse - it goes above 1100ms. So with the added index and using the simpler query (your query) it is now 40+ times faster than it was originally.

This looks like quite good result, so we could stay with this solution if we try to solve this isolated from other queries. Now the question is whether we really can leave the visit_type check out. I believe, that you may have a visit with type 0 or 4, which is more recent than a visit with type other than 0 and 4. So we would display a place higher, than it should actually be. But we would still display only correct history results on the menu.

> if you can share your profile i'd like to do some test with it

Please contact me via email or IRC (user: ondrej). 
"Show All History" does not benefit from the added index on moz_historyvisits. With last_visit_date added, the query runs 10 times faster (instead of 1664ms it runs 153ms). Adding index on last_visit_date cuts this down to 71ms. So this is another argument why to prefer last_visit_date over subselects.
> BTW, removing the visit_date index makes it much worse

yes, visit_date index should not be removed, we should remove pageindex (to change it with the new one)


> I believe, that you may have a visit
> with type 0 or 4, which is more recent than a visit with type other than 0 and
> 4. So we would display a place higher, than it should actually be. But we 
> would still display only correct history results on the menu.

that can be true, visit_type = 4 is transition embed, and visit_type = 0 is due to bug 375777. The same place_id could have some visit_type = 4 and some visit_type != 4... this could be a minor glitch, the subquery (SELECT MAX(visit_date) FROM moz_historyvisits WHERE place_id = h.id) is used in a lot of queries, they are all wrong from this point of view, and pointing to wrong dates... 
notice also that embedded visits are discarded after 24 hours


> "Show All History" does not benefit from the added index on moz_historyvisits.

that is because the query does not use a subselect there

"SELECT h.id, h.url, h.title, h.rev_host, h.visit_count, MAX(visit_date), "
"f.url, null, null "
"FROM moz_places h "
"LEFT OUTER JOIN moz_historyvisits v ON h.id = v.place_id "

it should be changed to the subselect, it has not been optimized yet, changing to the subselect will speed up this as all other queries


the only reason i see to move to trigger is the not correct max(visit_date) that counts visit_type = 4 as valid visit_date... 
sorry for bugspam, the problem of bad visit_date counting visit_type = 4 can be solved changing the subquery (so it's actually a sqlite optimizer bug, since this works fast)

SELECT h.id, h.url, h.title, h.rev_host, h.visit_count,
(SELECT visit_date FROM moz_historyvisits WHERE place_id = h.id 
 AND visit_type NOT IN(0,4) ORDER BY visit_date DESC LIMIT 1),
f.url, null, null
FROM moz_places h
LEFT OUTER JOIN moz_favicons f ON h.favicon_id = f.id
WHERE h.id IN
(SELECT DISTINCT p.id
        FROM moz_places p
        JOIN moz_historyvisits ON place_id = p.id
        WHERE hidden <> 1 AND visit_type <> 4 AND visit_type <> 0
        ORDER BY visit_date DESC
        LIMIT 10)
ORDER BY 6 DESC
;

this (with the new index) fixes all problem to me, it's fast like before, does not count bad visit_date... all queries can be changed to use the new subquery.

i still need to test out All history query timings, will do asap
Blocks: 385245
History in the left pane of the places organizer is actually: 
  place:sort=4&

that becomes 

        SELECT h.id, h.url, h.title, h.rev_host, h.visit_count,
        MAX(visit_date),
        h.url, null, null
        FROM moz_places h
        LEFT OUTER JOIN moz_historyvisits v ON h.id = v.place_id
        LEFT OUTER JOIN moz_favicons f ON h.favicon_id = f.id
        WHERE
        h.hidden <> 1 AND v.visit_type <> 4 AND v.visit_type <> 0 
        GROUP BY h.id
        ORDER BY 6 DESC

this query is taking 580ms returning 861 records (as always, this is pure query time) 
replacing MAX(visit_date) with null (best case) takes 550ms
using a subquery takes about same timing as using max(visit_date)
notice that we cannot remove the LEFT JOIN on moz_historyvisits since there is the WHERE clause using visit_type


in the sidebar we are returning all visits, without any grouping or max(values) so again that cannot get faster

so i cannot understand how having last_visit_date could speed up show all history query... 
No longer blocks: 385245
s/h.url, null, null/f.url, null, null/
Attached patch test patch (obsolete) — Splinter Review
this is a test patch implementing the index and queries changes... 
it's for testing purpose, to take some perf measure, or compare with trigger alternative. 
Note: make a backup of the profile before this, since it will upgrade db schema to change the index.
(In reply to comment #20)
> sorry for bugspam, the problem of bad visit_date counting visit_type = 4 can be
> solved changing the subquery (so it's actually a sqlite optimizer bug, since
> this works fast)

If there are really bugs in the SQLite optimizer, are these being filed in their bug database?
(In reply to comment #23)
> Created an attachment (id=293589) [details]
> test patch

This query takes 5ms on my profile. This is 400 times faster than before, there is no visible delay when entering the History menu. The trigger version takes 4ms on my profile, so we can drop this idea now, because the difference is not visible now.

Some comments to the patch:

1) This comment seems wrong: // Migrate anno tables up to V7

2) What about to rename the index from moz_historyvisits_pagedateindex to moz_historyvisits_placedateindex, for newcomers, this is cleaner.

3) Could we unify all "visit_type <> 4 AND visit_type <> 0" to visit_type NOT IN (0,4) in all queries?

4) Would not it be more readable, if we used something like this:

#define SQL_STR_FRAGMENT_MAX_VISIT_DATE( place_relation )  \
  "(SELECT visit_date FROM moz_historyvisits WHERE place_id = " place_relation \
  "AND visit_type NOT IN (0,4) ORDER BY visit_date DESC LIMIT 1)"

And then in all queries:

  "SELECT h.id, h.url, h.title, h.rev_host, h.visit_count, " 
   SQL_STR_FRAGMENT_MAX_VISIT_DATE( "h.id" ) ", f.url"

5) nsresult MigrateV7Up(mozIStorageConnection *aDBConn); is missing in the header.


I lost some time yesterday by running ANALYZE SQLite command, at that time I tested the same SQL which Marco proposed, but did not realize, that it was ANALYZE what completely screwed up the results. I highly recommend not using it and will check whether this is known bug in SQLite. But I do not think that there is any bug regarding the different plan chosen by SQLite when we move from MAX to ORDER BY DESC LIMIT 1 (see comment #24).

So now we have solved the History menu and we should concentrate on the History sidebar in the bug #385245. We will not need last_visit_date as Marco correctly says in comment #21, so in this sense this bug is not related to that one, but I'm  pretty sure, we will end up using the subquery, that we have just introduced, but we would have to add date range to it.
> 1) This comment seems wrong: // Migrate anno tables up to V7

fixed locally

> 2) What about to rename the index from moz_historyvisits_pagedateindex to
> moz_historyvisits_placedateindex, for newcomers, this is cleaner.

mh, yes, i think so, i'll also add a IF NOT EXISTS to the creation

> 3) Could we unify all "visit_type <> 4 AND visit_type <> 0" to visit_type NOT
> IN (0,4) in all queries?

sometimes the sqlite optimizer makes a worst work with IN than with AND, i think that this has been fixed in latest 3.5.4 (Better optimization of some IN operator expressions), so i could do that

> 4) Would not it be more readable, if we used something like this:
> 
> #define SQL_STR_FRAGMENT_MAX_VISIT_DATE( place_relation )  \
>   "(SELECT visit_date FROM moz_historyvisits WHERE place_id = " place_relation
> \
>   "AND visit_type NOT IN (0,4) ORDER BY visit_date DESC LIMIT 1)"
> 
> And then in all queries:
> 
>   "SELECT h.id, h.url, h.title, h.rev_host, h.visit_count, " 
>    SQL_STR_FRAGMENT_MAX_VISIT_DATE( "h.id" ) ", f.url"

i was thinking something similar, but i'll wait for Seth's ideas on that

> 5) nsresult MigrateV7Up(mozIStorageConnection *aDBConn); is missing in the
> header.

yes, fixed locally, i forgot to pack up the header in the patch (doh!)

> and will check whether this is known bug in SQLite. But I do not think that
> there is any bug regarding the different plan chosen by SQLite when we move
> from MAX to ORDER BY DESC LIMIT 1 (see comment #24).

yes this needs to be investigated on 3.5.4 and if it's still there we should fill a sqlite bug on the issue, MAX should work better, not worst, than a limited order by.
i've opened a ticked on Sqlite CVSTrac to upstream the issue with MAX(), i'll report when there will be news about that (the only think they said is that is confirmed that the index is not used with MAX). I think however that we could go with a patch based on Ondrej's suggestions in comment #25, also because we don't know when sqlite will fix the issue and when we will upgrade to that sqlite version...
this is the ticket: http://www.sqlite.org/cvstrac/tktview?tn=2853

"We will be enhancing the min()/max() optimization to be able to cover the case you describe above. But this will take some time to get right. If you are in a hurry, it seems like modifying your SQL is the easiest way to go."

they will make SELECT MAX(x) WHERE expr => SELECT x WHERE expr ORDER BY x DESC LIMIT 1, so my fragment is correct to use the new index, while MAX cannot use the index (at least until the sqlite bug is fixed).

Coming with updated patch. this needs a schema upgrade so it could be unified with other upgrades requiring an upgrade (like adaptive learning that will need a new table)
Attached patch Updated patch (obsolete) — Splinter Review
this is updated with comments from Ondrej that imho are good points.
i've put the max fragment define in nsNavHistory.h to be usable from nsNavBookmarks.

i also fixed a couple of visit_date <> 0 AND visit_date <>4 to visit_date NOT IN (0,4) in autocomplete code, to use the same syntax in all queries

this, like the previous patch, makes a schema upgrade, so as always do a backup of places.sqlite before running.

asking a first review of the code, this should probably be unified with frecency patch or another patch requiring a schema upgrade...
Attachment #293589 - Attachment is obsolete: true
Attachment #294393 - Flags: review?(sspitzer)
marco, heads up, see https://bugzilla.mozilla.org/show_bug.cgi?id=407296#c15

when swtiching between a build with your patch and an older build, the downgrade code will cause problems.
yes i know, it is when ForceMigrateBookmarksDB recreates the bookmarks table, but that is a problem of all downgrades where DBSchemaVersion > 2 and there is leftpanefolderid... imho it should be fixed in 407296 with a patch to downgrade...
Marco / Ondrej:  this is looking like some good stuff.  I plan to review and land this soon.

updating summary, as the solution doesn't involve a new column, but a better index.
Assignee: nobody → mak77
No longer blocks: 394038
Flags: blocking-firefox3?
Keywords: perf
Summary: add last_visit_date column to moz_places table (to improve history UI performance) → use multi-column index on moz_historyvisits to improve performance
Target Milestone: --- → Firefox 3 M11
Flags: blocking-firefox3? → blocking-firefox3+
I don't think we need to bump the schema to 7 just for the index changes.

In the past, we've added / removed indexes dynamically without bumping the schema.

http://bonsai.mozilla.org/cvslog.cgi?file=/mozilla/toolkit/components/places/src/nsNavHistory.cpp&rev=HEAD

This is fortunate, because we don't want to do that before we have a fix for bug #407296, see comment #30 and #31.

As for how to handle this for existing profiles, I think we should do what dietrich did for bug #381795, and on long idle, fix the index (as it might take a while to create.)
index drop and creation is quite fast... however that will allow to check-in this without waiting for other schema upgrades, i'll try to make the index change on idle
> index drop and creation is quite fast... however that will allow to check-in
> this without waiting for other schema upgrades, i'll try to make the index
> change on idle

Is it fast even if the moz_historyvisit table is very large?

If so, then we could put the drop if exists / create code somewhere else, not on idle?

either way we decide to do this, it should only affect previous firefox 3 users once (first run of fx 3b3 or on the next nightly).

and, no matter where we fix the index, we don't need to update the schema.
i'm putting that on long idle timer, the same used for urlindex in bug #381795, the index change is quite fast because it involves only int, not text, however on a very large db it could take some second (less than 5 seconds from my tests, but could be worse if the user has setup 9999 days of history...). 
Even if the user does not upgrade soon there should be no loss of performance since the fragment will use the old index, so i think that long idle time is a good choice.
the index is correctly replaced after about 15 minutes of idle time
this does not need anymore a schema upgrade so it can be checked-in when ready
Attachment #294393 - Attachment is obsolete: true
Attachment #295310 - Flags: review?(sspitzer)
Attachment #294393 - Flags: review?(sspitzer)
I'm having second thoughts about doing the index modifications on idle.

I think it would be better to pay the index creation time on first run or
migration from fx3b2, then to have people out there with various states (if
they haven't been idle they'd be without index.)

I'd rather guarantee everyone be in the same state.

I'm going to test out your fix, but move your changes out of onIdle() to a new
method, EnsureCurrentSchema() which will be called when PLACES_SCHEMA_VERSION
== DBSchemaVersion.

For bug #394038, I plan to use EnsureCurrentSchema() to add the frecency column
to places.sqlite, as I don;t want to bump the schema.

I'll attach the updated patch, and if all goes well with the rest of my review
and testing, land it.
Attachment #295310 - Attachment is obsolete: true
Attachment #295310 - Flags: review?(sspitzer)
Attachment #295461 - Attachment is obsolete: true
while testing marco's patch and the places he changed, I noticed that a few of queries were not used.  this patch removes them.
Attachment #295465 - Attachment is obsolete: true
Comment on attachment 295467 [details] [diff] [review]
updated patch, remove printf

r=sspitzer on Marco's patch.

for reference, his patch:

1)  drop the old moz_historyvisits_pageindex index and create a new multi-column moz_historyvisits_placedateindex index, for performance gains.
2)  fix the queries where we attempt to find the max visit date for a place to always exclude transitions of type 0 and 4.
3)  related to #2, for readability, add the SQL_STR_FRAGMENT_MAX_VISIT_DATE #define
4)  also related to #2, switch from "v.visit_type <> 4 AND v.visit_type <> 0" to "v.visit_type NOT IN(0,4)"
5)  remove unused code
Attachment #295467 - Flags: review+
fixed.

thanks to both Marco and Ondrej for their work on this one.

Checking in toolkit/components/places/src/nsNavBookmarks.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavBookmarks.cpp,v  <--  nsNavBookmarks.cpp
new revision: 1.138; previous revision: 1.137
done
Checking in toolkit/components/places/src/nsNavHistory.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistory.cpp,v  <--  nsNavHistory.cpp
new revision: 1.223; previous revision: 1.222
done
Checking in toolkit/components/places/src/nsNavHistory.h;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistory.h,v  <--  nsNavHistory.h
new revision: 1.119; previous revision: 1.118
done
Checking in toolkit/components/places/src/nsNavHistoryAutoComplete.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp,v  <--  nsNavHistoryAutoComplete.cpp
new revision: 1.29; previous revision: 1.28
done
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Attached patch supplimental fixSplinter Review
in the case of what we're doing in EnsureCurrentSchema() for Marco's fix, we are safe to call this when we call this for new profiles.

but, it's really called from the wrong spot, as I found out for bug #394038 (and for other potential future uses) .

It should be should happen after we create all the tables, otherwise, for a new profile, if EnsureCurrentSchema() tries to ensure that a column exists in a table, it will try to do that before a table exists and we'll fail.
Attachment #295475 - Flags: review?(dietrich)
A few comments for dietrich / marco:

1) my patch makes it so EnsureCurrentSchema() is called after all the other exisiting "schema ensuring code" in nsNavHistory::InitDB().  The existing code does things "create table".  Perhaps we should move that existing code into EnsureCurrentSchema() or move the code in EnsureCurrentSchema() out?

2)  Currently, EnsureCurrentSchema() has a transaction (pageindexTransaction), but the code that calls EnsureCurrentSchema() is already wrapped with a transaction, so the internal one can probably be removed.

These issues can be turned into spin off bugs, but for the patch in bug #394038, I still need to move the call to after table creation.

Let me know what you guys think.
(In reply to comment #21)
> History in the left pane of the places organizer is actually: 
>   place:sort=4&
> 
> that becomes 
> 
>         SELECT h.id, h.url, h.title, h.rev_host, h.visit_count,
>         MAX(visit_date),
>         h.url, null, null
>         FROM moz_places h
>         LEFT OUTER JOIN moz_historyvisits v ON h.id = v.place_id
>         LEFT OUTER JOIN moz_favicons f ON h.favicon_id = f.id
>         WHERE
>         h.hidden <> 1 AND v.visit_type <> 4 AND v.visit_type <> 0 
>         GROUP BY h.id
>         ORDER BY 6 DESC
> 
> notice that we cannot remove the LEFT JOIN on moz_historyvisits since there is
> the WHERE clause using visit_type
Why can't that LEFT OUTER JOIN be removed? The where clause is used to filter out potential visit_dates and no columns are selected from that table.
(Remove the JOIN and replace with the SQL_STR_FRAGMENT_MAX_VISIT_DATE subquery.)

Just curious why that wasn't done for mDBAutoCompleteQuery.
(In reply to comment #47)
> Why can't that LEFT OUTER JOIN be removed? The where clause is used to filter
> out potential visit_dates and no columns are selected from that table.

because you don't want to show embedded visits, so they should not be selected at all... the same should be valid for mDBAutoCompleteQuery

(In reply to comment #46)
> A few comments for dietrich / marco:
> 
> 1) my patch makes it so EnsureCurrentSchema() is called after all the other
> exisiting "schema ensuring code" in nsNavHistory::InitDB().  The existing code
> does things "create table".  Perhaps we should move that existing code into
> EnsureCurrentSchema() or move the code in EnsureCurrentSchema() out?

ths is a one-time change between betas, will that be removed or always there?
for clarity i'd put everything together, inside or outside of ensurecurrentschema is not really a difference, but i'd differentiate better between fixed things and one-time changes at startup
 
> 2)  Currently, EnsureCurrentSchema() has a transaction (pageindexTransaction),
> but the code that calls EnsureCurrentSchema() is already wrapped with a
> transaction, so the internal one can probably be removed.

that should not be a problem, the internal transaction will be discarded automatically by mozStorage if there is already one...
(In reply to comment #49)
> (In reply to comment #47)
> > Why can't that LEFT OUTER JOIN be removed? The where clause is used to filter
> > out potential visit_dates and no columns are selected from that table.
> 
> because you don't want to show embedded visits, so they should not be selected
> at all... the same should be valid for mDBAutoCompleteQuery

Would that come back with a null sub-query SELECT (SELECT MAX(visit_date) ...) max_visit_date WHERE NOT max_visit_date ISNULL ?

Would that be better than needing to have a big JOIN and filter results based on visit_type?
that will work if the query asks for all places with visits, that is different from having all places with visits != EMBEDDED. So it depends on the query, it should improve speed if applicable.
Attachment #295475 - Flags: review?(dietrich) → review+
(In reply to comment #50)
> (In reply to comment #46)
> > A few comments for dietrich / marco:
> > 
> > 1) my patch makes it so EnsureCurrentSchema() is called after all the other
> > exisiting "schema ensuring code" in nsNavHistory::InitDB().  The existing code
> > does things "create table".  Perhaps we should move that existing code into
> > EnsureCurrentSchema() or move the code in EnsureCurrentSchema() out?
> 
> ths is a one-time change between betas, will that be removed or always there?
> for clarity i'd put everything together, inside or outside of
> ensurecurrentschema is not really a difference, but i'd differentiate better
> between fixed things and one-time changes at startup
> 

that consolidation could be done in a followup. this fix is still required for the new profile scenario.

> > 2)  Currently, EnsureCurrentSchema() has a transaction (pageindexTransaction),
> > but the code that calls EnsureCurrentSchema() is already wrapped with a
> > transaction, so the internal one can probably be removed.
> 
> that should not be a problem, the internal transaction will be discarded
> automatically by mozStorage if there is already one...
> 

still, it's not necessary
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: