Closed Bug 450705 Opened 12 years ago Closed 11 years ago

Optimize the query changes from the temp table rewrite

Categories

(Toolkit :: Places, defect)

defect
Not set

Tracking

()

VERIFIED FIXED
mozilla1.9.1b2

People

(Reporter: sdwilsh, Assigned: mak)

References

Details

Attachments

(2 files, 8 obsolete files)

Many of the queries are pretty craptastic when it comes to perf.  We should fix that...
Blocks: 437244
i'd like to take this if you're comfortable with it
I thought you were taking this actually!  When do you think you can get a first pass for the more important queries?
i'm now looking at the latest patch revision from you on bug 449640, so i hope to get a first pass of all queries for tomorrow
Assignee: nobody → mak77
Cool.  I should get an updated patch to bug 449640, but no major changes, (and I'll try really hard to not do any query changes).  I strongly recommend running the unit tests after each query change.  Modifying a bunch and then running them doesn't work out so well.
Whiteboard: [swag: 1d]
i've rewritten most files but nsNavHistory.cpp that is quite huge, so i need some more time, mainly because some query causes hang (exec time > 30s and more) with a decent history (let's say more than 100 000 entries) and some query could potentially return a different set due to joins between temp and non temp tables. 

We should have more unit tests for every method, passing tests is pitifully not enough to be sure that a query is returning a consistent resultset :/

I hope to have a patch ready for tomorrow, i fear actual patches could only work with a few bookmarks and a compact history.
Whiteboard: [swag: 1d]
Attached patch wip (obsolete) — Splinter Review
really wip, not many rewrites in this patch but first step to look into. Will finish all other files before attacking nsNavHistory. needs some little edit to apply on sdwilsh latest patch version (was based on previous ones so i stil lhave to merge new versions in)

also contains some formatting for oldest queries.

final results will include this kind of table:

Test with 19000 moz_places, 145000 visits, 2700 bookmarks

query                             | orig      | part       | part_opt
==========================================================================
nsAnnotationService.cpp
-----------------------
mDBGetAnnotationFromURI           | 0,6ms     | 2,5ms      | 1,4ms
GetPagesWithAnnotationCOMArray    | 0,3ms     | 220ms      | 0,6ms
__________________________________________________________________________
nsFaviconService.cpp
--------------------
mDBGetURL                         | 0,3ms     | 2,6ms      | 1.2ms
__________________________________________________________________________
nsNavBookmarks.cpp
--------------------
mDBFindURIBookmarks               | 1,2ms     | 3,5ms      | 3,2ms
mDBGetChildren                    | 120ms     | >30s HANG  | 200ms
mDBGetItemProperties              | 0,3ms     | 2,5ms      | 0,6ms
mDBGetRedirectDestinations        | 1,4ms     | 600ms ERR  | 6ms LONG
mDBGetKeywordForURI               | 0,8ms     | 3ms        | 2.8ms
mDBGetURIForKeyword               | 3ms       | 230ms      | 5ms
FillBookmarksHash                 | 123ms     | >30s HANG  | 200ms LONG
there is a problematic issue in those queries that require an order by on the global result and that's especially for autocomplete.

For example:
SELECT h.url, h.title, f.url,null, h.visit_count
        FROM (
          SELECT * FROM moz_places_temp
          WHERE url IN(list_of_urls)
          UNION ALL
          SELECT * FROM moz_places
          WHERE id NOT IN (SELECT id FROM moz_places_temp)
          AND url IN(list_of_urls)
        ) AS h
        LEFT JOIN moz_favicons f ON f.id = h.favicon_id
        ORDER BY h.frecency DESC
;

the order by is applied to the compound table that does not have an index on frecency... and we can't order on single tables since results will most likely be overlapped... in this case the timing for the query is about 6 times slower, and that will happen for all ordered by queries, not good since we are using them often in ui...
(In reply to comment #8)
> the order by is applied to the compound table that does not have an index on
> frecency... and we can't order on single tables since results will most likely
> be overlapped... in this case the timing for the query is about 6 times slower,
> and that will happen for all ordered by queries, not good since we are using
> them often in ui...
I was afraid of this, and this is why you'll find a comment in that file saying that it could be a problem :(
i don't now how indexes are phisically implemented in sqlite, but since a UNION ALL is simply putting a table after the other and both tables have indices, maybe instead of asking Drh a complex add index to views we could ask him if it would be easier merging index in a UNION ALL... Do you see what i mean?
There appears to be lots of opportunity to optimize the query above.  Can you tell me (in general terms) (1) how many entries in moz_places (6471 in the copy of FF I use every day, but I get the idea that I am a "light" user) (2) how many entries in moz_places_temp, and (3) how many entries in list_of_urls?  (3) is perhaps the most important.

The query above would really benefit if the moz_places_url_uniqueindex where redefined to include frecency:

   CREATE UNIQUE INDEX moz_places_url_uniqueindex ON moz_places(url, frecency);

When you have a multi-term index like this, the first term can be used in the WHERE clause to speed search while the second term is used by the ORDER BY clause to obviate the need to sort the results.
(In reply to comment #11)
> (1) how many entries in moz_places (6471 in the copy of FF I use every day, > but I get the idea that I am a "light" user)

actually we have a max limit at about 40000 (hwv user *could* increase that)

> (2) how many entries in moz_places_temp

We don't have numbers on this since it will be dependant on the activity of the user between a synch happens on the tables. I suppose we should be under 1000 entries

> (3) how many entries in list_of_urls?

are urls returned by the previous awesomebar query, so should be 12, can be incremented by the user

> The query above would really benefit if the moz_places_url_uniqueindex where
> redefined to include frecency:

i've tried, but i don't see a big improvement, however this query is only one case (and probably not the worst), but awesomebar queries are various so we could query on title, on url, on tags and order on other fields. i'll try to create a list of problematic queries to analyze further.

thank you for directions
I tried the following query as an alternative:

SELECT url, title,
      (SELECT url FROM moz_favicons WHERE id=favicon_id) AS furl,
       null, visit_count, frecency
  FROM moz_places_temp
 WHERE url IN list_of_urls
UNION ALL
SELECT url, title,
      (SELECT url FROM moz_favicons WHERE id=favicon_id) AS furl,
       null, visit_count, frecency
  FROM moz_places
 WHERE url IN list_of_urls AND id NOT IN (SELECT id FROM moz_places_temp)
ORDER BY frecency DESC;

I didn't measure any noticeable speed difference between the two.  But on the other hand, using my dataset, both queries complete within the limits of the ability of gettimeofday() to measure their runtime.  Perhaps with your dataset, this new formulation of the query will be faster.
no, i don't see improvements too, but since that query returns limited results (due to IN clause) probably a better problematic example could be:

SELECT h.url, h.title, f.url, null, h.visit_count
    FROM (
      SELECT * FROM moz_places_temp
      WHERE frecency <> 0
      AND visit_count > 0
      UNION ALL
      SELECT * FROM moz_places
      WHERE id NOT IN (SELECT id FROM moz_places_temp)
      AND frecency <> 0
      AND visit_count > 0
    ) AS h
    LEFT OUTER JOIN moz_favicons f ON f.id = h.favicon_id
ORDER BY h.frecency DESC LIMIT ?1 OFFSET ?2
Attached patch wip (obsolete) — Splinter Review
- Updated to latest Shawn's patches, so applicable upon bug 449086 and bug 449640
- test pass
TODO:
- optimize nsNavHistory.cpp
- find a solution for mDBAutoComplete queries slowdown (see comment above)
Attachment #334494 - Attachment is obsolete: true
Marco - did you want me to address the queries that are wrong in the previous bug, or just have them taken care of here?
those are fixed here(In reply to comment #16)
> Marco - did you want me to address the queries that are wrong in the previous
> bug, or just have them taken care of here?

i have taken care of them here.

well about comment 14, the way i found to speed up the query is limiting moz_places results... since we will not take a lot of results from there i think a first step could be limiting moz_places on OFFSET+LIMIT, that will make first chunks faster, last chunks slower (but probably chunks with offset 10000 or more will never be used in the autocomplete).
Attached patch wip (obsolete) — Splinter Review
todo:
- finish nsNavHistory.cpp
- functionality test in the browser(i expect some query being wrong in the first complete version, since it's a quite big rewrite)
Attachment #335172 - Attachment is obsolete: true
Attached patch wip (obsolete) — Splinter Review
i'm only missing selectAs functions in nsNavHistory.
Attachment #335460 - Attachment is obsolete: true
Attached patch patch v1 (obsolete) — Splinter Review
Finally, the rewrite has come to an end, needs some testing though.
When synch will be ready we could use this to generate a tryserver build and better test functionality over there.
Attachment #335694 - Attachment is obsolete: true
Attached file query analysis
Actually, sync is ready.  I just posted a tryserver build over in bug 442967 (without your patch though).
(In reply to comment #23)
> Actually, sync is ready.  I just posted a tryserver build over in bug 442967
> (without your patch though).

adding this patch to a build would help also functionality test, so could you generate an updated build?

Attached patch patch v1.1 (obsolete) — Splinter Review
minor cleanup. Well the review isn't so straightforward fast and interesting, hwv someone will have to do that sometimes :)
Attachment #335804 - Attachment is obsolete: true
Attachment #335879 - Flags: review?(sdwilsh)
Attachment #335879 - Flags: review?(dietrich)
Comment on attachment 335879 [details] [diff] [review]
patch v1.1

>+++ b/toolkit/components/places/src/nsAnnotationService.cpp
>   // mDBGetAnnotationFromURI
>+  // bug 450705: We are not checking for duplicated ids into the unified
>+  // temporary table for perf reasons, LIMIT 1 will discard duplicates faster
>+  // since we can have only one anno with a certain name for every place_id.
I don't think the bug number is important here.  We usually do that when we have some open bug on something

>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>       "INSERT INTO moz_annos "
>-      "(place_id, anno_attribute_id, mime_type, content, flags, expiration, type, dateAdded) "
>+        "(place_id, anno_attribute_id, mime_type, content, flags, expiration, "
>+          "type, dateAdded) "
Can you make type line up with place_id please (one less space)

>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>       "INSERT INTO moz_items_annos "
>-      "(item_id, anno_attribute_id, mime_type, content, flags, expiration, type, dateAdded) "
>+        "(item_id, anno_attribute_id, mime_type, content, flags, expiration, "
>+        "type, dateAdded) "
Can you make type line up with item_id please (one more space needed)

>   // mDBGetItemsWithAnnotation
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>-    "SELECT a.item_id FROM moz_anno_attributes n "
>-    "INNER JOIN moz_items_annos a ON n.id = a.anno_attribute_id "
>+    "SELECT a.item_id FROM moz_items_annos a "
>+    "JOIN moz_anno_attributes n ON n.id = a.anno_attribute_id "
>     "WHERE n.name = ?1"),
Hmm, why the table change?


>+++ b/toolkit/components/places/src/nsNavBookmarks.cpp

>@@ -141,17 +141,18 @@ nsNavBookmarks::Init()
>   // NOTE: Do not modify the ORDER BY segment of the query, as certain
>   // features depend on it. See bug 398914 for an example.
>   rv = dbConn->CreateStatement(NS_LITERAL_CSTRING(
>+      "SELECT b.id "
>+      "FROM moz_bookmarks b "
>+      "JOIN ( "
>+        "SELECT id FROM moz_places_temp "
>         "WHERE url = ?1 "
>         "UNION ALL "
>+        "SELECT id FROM moz_places "
>+        "WHERE url = ?1 "
>+        "AND id NOT IN (SELECT id FROM moz_places_temp) "
>+      ") AS h ON b.fk = h.id "
>+      "WHERE b.type = ?2 "
>+      "ORDER BY MAX(IFNULL(b.lastModified, 0), b.dateAdded) DESC, b.id DESC"),
Should probably do +id to make sure sqlite does not use the id index when eliminating duplicates from moz_places.
(http://www.sqlite.org/optoverview.html#multi_index)

>   // mDBGetChildren: select all children of a given folder, sorted by position
>   // This is a LEFT OUTER JOIN with moz_places since folders does not have
>   // a reference into that table.
>   rv = dbConn->CreateStatement(NS_LITERAL_CSTRING(
>+      "SELECT * FROM( "
nit: space after FROM please

>+        "SELECT h.id, h.url, COALESCE(b.title, h.title), "
>+        "h.rev_host, h.visit_count, "
>+          SQL_STR_FRAGMENT_MAX_VISIT_DATE( "h.id" )
>+          ", f.url, null, b.id, b.dateAdded, b.lastModified, "
>+          "b.position, b.type, b.fk "
>+        "FROM moz_bookmarks b "
>+        "JOIN moz_places_temp h ON b.fk = h.id "
>+        "LEFT JOIN moz_favicons f ON h.favicon_id = f.id "
>+        "WHERE b.parent = ?1 "
>+        "UNION ALL "
>+        "SELECT h.id, h.url, COALESCE(b.title, h.title), "
>+        "h.rev_host, h.visit_count, "
>+          SQL_STR_FRAGMENT_MAX_VISIT_DATE( "h.id" )
>+          ", f.url, null, b.id, b.dateAdded, b.lastModified, "
>+          "b.position, b.type, b.fk "
>+        "FROM moz_bookmarks b "
>+        "LEFT JOIN moz_places h ON b.fk = h.id "
>+        "LEFT JOIN moz_favicons f ON h.favicon_id = f.id "
>+        "WHERE b.parent = ?1 "
>+        "AND (b.fk ISNULL OR b.fk NOT IN (select id FROM moz_places_temp)) "
>+      ") "
>+      "ORDER BY 12 ASC"),
Could you specify a column name here by chance?  Or at least a comment saying what it is we are ordering on please.

>   // get keyword text for URI (must be a bookmarked URI)
>   rv = dbConn->CreateStatement(NS_LITERAL_CSTRING(
>+      "SELECT k.keyword "
>       "FROM ( "
>+        "SELECT id FROM moz_places_temp "
>         "WHERE url = ?1 "
>         "UNION ALL "
>+        "SELECT id FROM moz_places "
>         "WHERE id NOT IN (SELECT id FROM moz_places_temp) "
>+          "AND url = ?1 "
>+      ") AS h "
>+      "JOIN moz_bookmarks b ON b.fk = h.id "
>       "JOIN moz_keywords k ON k.id = b.keyword_id"),
>     getter_AddRefs(mDBGetKeywordForURI));
Again, +id for moz_places since sorting on url will likely take longer

More to come later
Comment on attachment 335879 [details] [diff] [review]
patch v1.1

This is so much fun!

>+++ b/toolkit/components/places/src/nsNavHistory.cpp
>   // mDBGetURLPageInfo
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>+    "SELECT id, url, title, rev_host, visit_count "
>+      "FROM moz_places_temp "
>+      "WHERE url = ?1 "
>+    "UNION ALL "
>+    "SELECT id, url, title, rev_host, visit_count "
>+      "FROM moz_places "
>+      "WHERE url = ?1 "
>+    "LIMIT 1"),
>     getter_AddRefs(mDBGetURLPageInfo));
Please add a comment as to why we don't need to sort out the duplicates

>   // mDBGetIdPageInfo
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>+      "SELECT id, url, title, rev_host, visit_count "
>+        "FROM moz_places_temp "
>         "WHERE id = ?1 "
>+      "UNION ALL "
>+      "SELECT id, url, title, rev_host, visit_count "
>+        "FROM moz_places "
>+        "WHERE id = ?1 "
>+      "LIMIT 1"),
>     getter_AddRefs(mDBGetIdPageInfo));
ditto

>   // mDBGetPageVisitStats (see InternalAdd)
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>       "SELECT id, visit_count, typed, hidden "
>+        "FROM moz_places_temp "
>         "WHERE url = ?1 "
>+      "UNION ALL "
>+      "SELECT id, visit_count, typed, hidden "
>+        "FROM moz_places "
>+        "WHERE url = ?1 "
>+      "LIMIT 1"),
>     getter_AddRefs(mDBGetPageVisitStats));
ditto

In fact, ditto that in general, and I'm going to stop commenting on it.

>+  // Since we don't need real random results and ORDER BY RANDOM() is slow
>+  // we will jump at a random rowid in the table and we will get random results
>+  // only from moz_places since temp will be synched there sometimes.  
>   // Notice that frecency is invalidated as frecency = -visit_count
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>+      "SELECT * FROM ( "
>+        "SELECT id, visit_count, hidden, typed, frecency, url "
>+        "FROM ( "
>+          "SELECT * FROM moz_places_temp "
>+          "WHERE frecency < 0 "
>+          "UNION ALL "
>+          "SELECT * FROM ( "
>+            "SELECT * FROM moz_places "
>+            "WHERE id NOT IN (SELECT id FROM moz_places_temp) "
>+            "AND frecency < 0 "
>+            "ORDER BY frecency ASC LIMIT ROUND(?1 / 2) "
>+          ") "
>+        ") ORDER BY frecency ASC LIMIT ROUND(?1 / 2)) "
>+      "UNION "
>+      "SELECT * FROM ( "
>+        "SELECT id, visit_count, hidden, typed, frecency, url "
>+        "FROM moz_places "
>         "WHERE frecency < 0 "
>+        "AND ROWID >= ABS(RANDOM() % (SELECT MAX(ROWID) FROM moz_places)) "
>+        "LIMIT ROUND(?1 / 2))"),
>     getter_AddRefs(mDBInvalidFrecencies));
Specifically:
>+            "WHERE id NOT IN (SELECT id FROM moz_places_temp) "
do +id so sqlite won't use that index, and will go based on frecency

>   // when calculating frecency, we want the visit count to be 
>   // all the visits.
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>+      "SELECT "
>+        "(SELECT COUNT(*) FROM moz_historyvisits WHERE place_id = ?1) + "
>+        "(SELECT COUNT(*) FROM moz_historyvisits_temp WHERE place_id = ?1 "
>+            "AND id NOT IN (SELECT id FROM moz_historyvisits))"),
>     getter_AddRefs(mDBFullVisitCount));
So, with the password manager, I think zpao found that doing COUNT(1) was a bit faster than COUNT(*).  Not sure if it really matters though.

>   // set frecency to 0 so that it is excluded from url bar autocomplete.
>   nsCOMPtr<mozIStorageStatement> dbUpdateStatement;
>   nsresult rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>+      "UPDATE moz_places_view "
>+      "SET frecency = 0 WHERE id IN ("
>+        "SELECT h.id FROM moz_places h "
>+        "WHERE h.url >= 'place:' AND h.url <'place;' "
nit: space after <

>+      if (!mIncludeHidden) {
>+        mQueryString = NS_LITERAL_CSTRING(
>+          "SELECT id, url, title, rev_host, visit_count, MAX(visit_date), "
>+            "favicon_url, session, empty "
>+          "FROM( "
nit: space after FROM.  You've got this in a few places, I'm going to stop commenting on it now.

> nsNavHistory::GetLastPageVisited(nsACString & aLastPageVisited)
> {
>   nsCOMPtr<mozIStorageStatement> statement;
>   nsresult rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>+      "SELECT * FROM ( "
>+        "SELECT url, visit_date FROM moz_historyvisits_temp v "
>+        "JOIN moz_places_temp h ON v.place_id = h.id "
>+        "WHERE h.hidden <> 1 "
>+        "ORDER BY visit_date DESC LIMIT 1 "
>+      ") "
>+      "UNION ALL "
>+      "SELECT * FROM ( "
>+        "SELECT url, visit_date FROM moz_historyvisits_temp v "
>+        "JOIN moz_places h ON v.place_id = h.id "
>+        "WHERE h.hidden <> 1 "
>+        "ORDER BY visit_date DESC LIMIT 1 "
>+      ") "
>+      "UNION ALL "
>+      "SELECT * FROM ( "
>+        "SELECT url, visit_date FROM moz_historyvisits v "
>+        "JOIN moz_places h ON v.place_id = h.id "
>+        "WHERE h.hidden <> 1 "
>+        "ORDER BY visit_date DESC LIMIT 1 "
>+      ") "
>+      "UNION ALL "
>+      "SELECT * FROM ( "
>+        "SELECT url, visit_date FROM moz_historyvisits v "
>+        "JOIN moz_places_temp h ON v.place_id = h.id "
>+        "WHERE h.hidden <> 1 "
>+        "ORDER BY visit_date DESC LIMIT 1 "
>+      ") "
>+      "ORDER BY 2 DESC LIMIT 1"),
nit: comment as to what 2 is please?

>@@ -4221,11 +4580,22 @@ nsNavHistory::RemovePagesByTimeframe(PRT
>   // we only need to know if a place has a visit into the given timeframe
>   // this query is faster than actually selecting in moz_historyvisits
>   nsCOMPtr<mozIStorageStatement> selectByTime;
>+  rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>+      "SELECT h.id FROM moz_places h WHERE "
>+        "EXISTS "
>+          "(SELECT id FROM moz_historyvisits v WHERE v.place_id = h.id "
>+            "AND v.visit_date >= ?1 AND v.visit_date <= ?2 LIMIT 1)"
>+        "OR EXISTS "
>+          "(SELECT id FROM moz_historyvisits_temp v WHERE v.place_id = h.id "
>+            "AND v.visit_date >= ?1 AND v.visit_date <= ?2 LIMIT 1) "
>+      "UNION "
>+      "SELECT h.id FROM moz_places_temp h WHERE "
>+        "EXISTS "
>+          "(SELECT id FROM moz_historyvisits v WHERE v.place_id = h.id "
>+            "AND v.visit_date >= ?1 AND v.visit_date <= ?2 LIMIT 1)"
>+        "OR EXISTS "
>+          "(SELECT id FROM moz_historyvisits_temp v WHERE v.place_id = h.id "
>+            "AND v.visit_date >= ?1 AND v.visit_date <= ?2 LIMIT 1)"),
Wouldn't you want the first table queried from to be the temp table, and let the UNION remove duplicates from the permanent one?

>@@ -5302,7 +5675,7 @@ nsNavHistory::QueryToSelectClause(nsNavH
>       "EXISTS "
>         "(SELECT h.id "
>          "FROM moz_annos anno "
>-              "JOIN moz_anno_attributes annoname "
>+          "JOIN moz_anno_attributes annoname "
weird indentation now

>@@ -6257,7 +6630,8 @@ nsNavHistory::RemoveDuplicateURIs()
>   nsCOMPtr<mozIStorageStatement> selectStatement;
>   nsresult rv = mDBConn->CreateStatement(
>       NS_LITERAL_CSTRING("SELECT "
>+        "(SELECT h.id FROM moz_places_view h WHERE h.url=url "
>+          "ORDER BY h.visit_count DESC LIMIT 1), "
nit: space around = between h.url and url
nit: weird indentation


r=sdwilsh with those comments addressed
Attachment #335879 - Flags: review?(sdwilsh) → review+
Comment on attachment 335879 [details] [diff] [review]
patch v1.1

first pass. still need to do query builder and some of expiration code.

>   // mDBRecentVisitOfURL
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>-      "SELECT v.id, v.session "
>-      "FROM ( "
>-        "SELECT * FROM moz_places_temp "
>-        "WHERE url = ?1 "
>-        "UNION ALL "
>-        "SELECT * FROM moz_places "
>-        "WHERE id NOT IN (SELECT id FROM moz_places_temp) "
>-        "AND url = ?1 "
>-      ") AS h JOIN moz_historyvisits_view v ON h.id = v.place_id "
>-      "ORDER BY v.visit_date DESC "
>+      "SELECT * FROM ( "
>+        "SELECT v.id, v.session "
>+        "FROM moz_historyvisits_temp v "
>+        "WHERE v.place_id = IFNULL((SELECT id FROM moz_places_temp WHERE url = ?1), "
>+                                  "(SELECT id FROM moz_places WHERE url = ?1)) "
>+        "ORDER BY v.visit_date DESC LIMIT 1 "
>+      ") "
>+      "UNION ALL "
>+      "SELECT * FROM ( "
>+        "SELECT v.id, v.session "
>+        "FROM moz_historyvisits v "
>+        "WHERE v.place_id = IFNULL((SELECT id FROM moz_places_temp WHERE url = ?1), "
>+                                  "(SELECT id FROM moz_places WHERE url = ?1)) "
>+        "ORDER BY v.visit_date DESC LIMIT 1 "
>+      ") "
>       "LIMIT 1"),
>     getter_AddRefs(mDBRecentVisitOfURL));
>   NS_ENSURE_SUCCESS(rv, rv);

do the place_id subselects need LIMIT 1?

>@@ -1132,36 +1131,34 @@ nsNavHistory::InitStatements()
>   // mDBGetPageVisitStats (see InternalAdd)
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>       "SELECT id, visit_count, typed, hidden "
>-      "FROM ( "
>-        "SELECT * FROM moz_places_temp "
>+        "FROM moz_places_temp "

i'd prefer consistent indentation. sometimes you indent lines after a SELECT and sometimes you don't. pick one please.

>   // mDBVisitToVisitResult, should match kGetInfoIndex_* (see GetQueryResults)
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>       "SELECT h.id, h.url, h.title, h.rev_host, h.visit_count, "
>-             "v.visit_date, f.url, v.session, null "
>-      "FROM moz_places_view h "
>-      "JOIN ( "
>-        "SELECT * FROM moz_historyvisits_temp "
>-        "WHERE id = ?1 "
>-        "UNION ALL "
>-        "SELECT * FROM moz_historyvisits "
>-        "WHERE id NOT IN (SELECT id FROM moz_historyvisits_temp) "
>-        "AND id = ?1 "
>-      ") AS v ON h.id = v.place_id "
>-      "LEFT OUTER JOIN moz_favicons f ON h.favicon_id = f.id "),
>+          "v.visit_date, f.url, v.session, null "
>+        "FROM moz_places_temp h "
>+        "LEFT JOIN moz_historyvisits_temp v_t ON h.id = v_t.place_id "
>+        "LEFT JOIN moz_historyvisits v ON h.id = v.place_id "
>+        "LEFT JOIN moz_favicons f ON h.favicon_id = f.id "
>+        "WHERE v.id = ?1 OR v_t.id = ?1 "
>+      "UNION ALL "
>+      "SELECT h.id, h.url, h.title, h.rev_host, h.visit_count, "
>+          "v.visit_date, f.url, v.session, null "
>+        "FROM moz_places h "
>+        "LEFT JOIN moz_historyvisits_temp v_t ON h.id = v_t.place_id "
>+        "LEFT JOIN moz_historyvisits v ON h.id = v.place_id "
>+        "LEFT JOIN moz_favicons f ON h.favicon_id = f.id "
>+        "WHERE v.id = ?1 OR v_t.id = ?1 "
>+      "LIMIT 1"),
>     getter_AddRefs(mDBVisitToVisitResult));
>   NS_ENSURE_SUCCESS(rv, rv);

maybe could share SQL between this and the previous query?

>@@ -1322,40 +1340,45 @@ nsNavHistory::InitStatements()
>   // We get two sets of places that are 1) most visited and 2) random so that
>   // we don't get stuck recalculating frecencies that end up being -1 every
>   // time
>+  // Since we don't need real random results and ORDER BY RANDOM() is slow
>+  // we will jump at a random rowid in the table and we will get random results
>+  // only from moz_places since temp will be synched there sometimes.  
>   // Notice that frecency is invalidated as frecency = -visit_count
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>-    "SELECT * FROM "
>-      "(SELECT id, visit_count, hidden, typed, frecency, url "
>-      "FROM ( "
>-        "SELECT * FROM moz_places_temp "
>+      "SELECT * FROM ( "
>+        "SELECT id, visit_count, hidden, typed, frecency, url "
>+        "FROM ( "
>+          "SELECT * FROM moz_places_temp "
>+          "WHERE frecency < 0 "
>+          "UNION ALL "
>+          "SELECT * FROM ( "
>+            "SELECT * FROM moz_places "
>+            "WHERE id NOT IN (SELECT id FROM moz_places_temp) "
>+            "AND frecency < 0 "
>+            "ORDER BY frecency ASC LIMIT ROUND(?1 / 2) "
>+          ") "

why the nested SELECT * FROM ?

>@@ -2560,27 +2561,25 @@ nsNavHistory::FixInvalidFrecenciesForExc
>   // set frecency to 0 so that it is excluded from url bar autocomplete.
>   nsCOMPtr<mozIStorageStatement> dbUpdateStatement;
>   nsresult rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>-    "UPDATE moz_places_view "
>-    "SET frecency = 0 WHERE id IN ("
>-      "SELECT h.id FROM ( "
>-        "SELECT * FROM moz_places_temp "
>-        "WHERE frecency < 0 "
>-        "OR SUBSTR(url, 0, 6) = 'place:' "
>-        "UNION ALL "
>-        "SELECT * FROM moz_places "
>-        "WHERE id NOT IN (SELECT id FROM moz_places_temp) "
>-        "AND (frecency < 0 "
>-         "OR SUBSTR(url, 0, 6) = 'place:') "
>-      ") AS h "
>-      "LEFT OUTER JOIN moz_bookmarks b ON h.id = b.fk "
>-      "WHERE "
>-        // place is an unvisited child of a livemark feed
>-        "(b.parent IN ("
>-            "SELECT annos.item_id FROM moz_anno_attributes attrs "
>-            "JOIN moz_items_annos annos ON attrs.id = annos.anno_attribute_id "
>-            "WHERE attrs.name = ?1) "
>-          "AND visit_count = 0) "
>-        ")"),
>+      "UPDATE moz_places_view "
>+      "SET frecency = 0 WHERE id IN ("
>+        "SELECT h.id FROM moz_places h "
>+        "WHERE h.url >= 'place:' AND h.url <'place;' "
>+        "UNION "
>+        "SELECT h.id FROM moz_places_temp h "
>+        "WHERE  h.url >= 'place:' AND h.url < 'place;' "
>+        "UNION "
>+        // Unvisited child of a livemark        
>+        "SELECT b.fk FROM moz_bookmarks b "
>+        "JOIN moz_items_annos a ON a.item_id = b.id "
>+        "JOIN moz_anno_attributes n ON n.id = a.anno_attribute_id "
>+        "WHERE n.name = ?1 "
>+        "AND fk IN( "
>+          "SELECT id FROM moz_places WHERE visit_count = 0 AND frecency < 0 "
>+          "UNION ALL "
>+          "SELECT id FROM moz_places_temp WHERE visit_count = 0 AND frecency < 0 "
>+        ") "
>+      ")"),
>     getter_AddRefs(dbUpdateStatement));
>   NS_ENSURE_SUCCESS(rv, rv);

the original query matches all entries w/ frecency <0, but that's not covered in the optimized query.

>@@ -653,17 +697,30 @@ nsNavHistoryExpire::EraseHistory(mozISto
>   if (deletedPlaceIds.IsEmpty())
>     return NS_OK;
> 
...
>+  return aConnection->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
>+      "DELETE FROM moz_places_view WHERE id IN( "
>+        "SELECT h.id "

while you're here, can you switch this to check rv so that errors don't get swallowed?

>   NS_ENSURE_SUCCESS(rv, rv);
>   rv = expireEmbeddedLinksStatement->BindInt64Parameter(0, maxEmbeddedAge);
>   NS_ENSURE_SUCCESS(rv, rv);
>-  rv = expireEmbeddedLinksStatement->BindInt32Parameter(1, nsINavHistoryService::TRANSITION_EMBED);
>-  NS_ENSURE_SUCCESS(rv, rv);
>-  rv = expireEmbeddedLinksStatement->BindInt32Parameter(2, EXPIRATION_CAP_EMBEDDED);
>+  rv = expireEmbeddedLinksStatement->BindInt32Parameter(1, EXPIRATION_CAP_EMBEDDED);
>   NS_ENSURE_SUCCESS(rv, rv);
>   rv = expireEmbeddedLinksStatement->Execute();
>   NS_ENSURE_SUCCESS(rv, rv);

why change from binding the embed type to inlining it in the query string?
Attachment #335879 - Flags: review?(dietrich) → review+
Comment on attachment 335879 [details] [diff] [review]
patch v1.1

>@@ -3178,19 +3177,160 @@ PlacesSQLQueryBuilder::SelectAsURI()
>   switch (mQueryType)
>   {
>     case nsINavHistoryQueryOptions::QUERY_TYPE_HISTORY:
>-      mQueryString = NS_LITERAL_CSTRING(
>-        "SELECT h.id, h.url, h.title, h.rev_host, h.visit_count, "
>-          "MAX(visit_date), f.url, null, null "
>-        "FROM moz_places_view AS h "
>-             "LEFT OUTER JOIN moz_historyvisits_view v ON h.id = v.place_id "
>-             "LEFT OUTER JOIN moz_favicons f ON h.favicon_id = f.id ");
>-
>-      if (!mIncludeHidden)
>-        mQueryString += NS_LITERAL_CSTRING(
>-          " WHERE h.hidden <> 1 AND v.visit_type NOT IN (0,4)"
>-            " {ADDITIONAL_CONDITIONS} ");
>-
>-      mGroupBy = NS_LITERAL_CSTRING(" GROUP BY h.id");
>+      if (!mIncludeHidden) {
>+        mQueryString = NS_LITERAL_CSTRING(
>+          "SELECT id, url, title, rev_host, visit_count, MAX(visit_date), "
>+            "favicon_url, session, empty "
>+          "FROM( "
>+            "SELECT id, url, title, rev_host, visit_count, visit_date, "
>+              "favicon_url, session, empty "
>+            "FROM ("
>+              "SELECT h.id AS id, h.url AS url, h.title AS title, "
>+                "h.rev_host AS rev_host, h.visit_count AS visit_count, "
>+                "MAX(v.visit_date) AS visit_date, f.url AS favicon_url, "
>+                "v.session AS session, null AS empty "
>+              "FROM moz_places h "
>+              "JOIN moz_historyvisits v ON h.id = v.place_id "
>+              "LEFT JOIN moz_favicons f ON h.favicon_id = f.id "
>+              "WHERE h.hidden <> 1 AND v.visit_type NOT IN ") +
>+                nsPrintfCString("(0,%d,%d) ",
>+                                nsINavHistoryService::TRANSITION_EMBED,
>+                                nsINavHistoryService::TRANSITION_DOWNLOAD) +
>+                NS_LITERAL_CSTRING("AND h.visit_count > 0 "
>+                "{ADDITIONAL_CONDITIONS} "
>+              "GROUP BY h.id "
>+            ") "
>+            "UNION ALL "
>+            "SELECT id, url, title, rev_host, visit_count, visit_date, "
>+              "favicon_url, session, empty "
>+            "FROM ( "
>+              "SELECT h.id AS id, h.url AS url, h.title AS title, "
>+                "h.rev_host AS rev_host, h.visit_count AS visit_count, "
>+                "MAX(v.visit_date) AS visit_date, f.url AS favicon_url, "
>+                "v.session AS session, null AS empty "
>+              "FROM moz_places_temp h "
>+              "JOIN moz_historyvisits v ON h.id = v.place_id "
>+              "LEFT JOIN moz_favicons f ON h.favicon_id = f.id "
>+              "WHERE h.hidden <> 1 AND v.visit_type NOT IN ") +
>+                nsPrintfCString("(0,%d,%d) ",
>+                                nsINavHistoryService::TRANSITION_EMBED,
>+                                nsINavHistoryService::TRANSITION_DOWNLOAD) +
>+                NS_LITERAL_CSTRING("AND h.visit_count > 0 "
>+                "AND h.id NOT IN (SELECT id FROM moz_places_temp) "
>+                "{ADDITIONAL_CONDITIONS} "
>+              "GROUP BY h.id "
>+            ") "
>+            "UNION ALL "
>+            "SELECT id, url, title, rev_host, visit_count, visit_date, "
>+              "favicon_url, session, empty "
>+            "FROM( "
>+              "SELECT h.id AS id, h.url AS url, h.title AS title, "
>+                "h.rev_host AS rev_host, h.visit_count AS visit_count, "
>+                "MAX(v.visit_date) AS visit_date, f.url AS favicon_url, "
>+                "v.session AS session, null AS empty "
>+              "FROM moz_places h "
>+              "JOIN moz_historyvisits v ON h.id = v.place_id "

should be moz_historyvisits_temp right?

>+      else {
>+        mQueryString = NS_LITERAL_CSTRING(
>+          "SELECT id, url, title, rev_host, visit_count, MAX(visit_date), "
>+            "favicon_url, session, empty "
>+          "FROM( "
>+            "SELECT id, url, title, rev_host, visit_count, visit_date, "
>+              "favicon_url, session, empty "
>+            "FROM ("
>+              "SELECT h.id AS id, h.url AS url, h.title AS title, "
>+                "h.rev_host AS rev_host, h.visit_count AS visit_count, "
>+                "MAX(v.visit_date) AS visit_date, f.url AS favicon_url, "
>+                "v.session AS session, null AS empty "
>+              "FROM moz_places h "
>+              "JOIN moz_historyvisits v ON h.id = v.place_id "
>+              "LEFT JOIN moz_favicons f ON h.favicon_id = f.id "
>+              // no-op since {ADDITIONAL_CONDITIONS} will start with AND
>+              "WHERE 1=1 "
>+                "{ADDITIONAL_CONDITIONS} "
>+              "GROUP BY h.id "
>+            ") "
>+            "UNION ALL "
>+            "SELECT id, url, title, rev_host, visit_count, visit_date, "
>+              "favicon_url, session, empty "
>+            "FROM ( "
>+              "SELECT h.id AS id, h.url AS url, h.title AS title, "
>+                "h.rev_host AS rev_host, h.visit_count AS visit_count, "
>+                "MAX(v.visit_date) AS visit_date, f.url AS favicon_url, "
>+                "v.session AS session, null AS empty "
>+              "FROM moz_places_temp h "
>+              "JOIN moz_historyvisits v ON h.id = v.place_id "
>+              "LEFT JOIN moz_favicons f ON h.favicon_id = f.id "
>+              "WHERE h.id NOT IN (SELECT id FROM moz_places_temp) "
>+                "{ADDITIONAL_CONDITIONS} "
>+              "GROUP BY h.id "
>+            ") "
>+            "UNION ALL "
>+            "SELECT id, url, title, rev_host, visit_count, visit_date, "
>+              "favicon_url, session, empty "
>+            "FROM( "
>+              "SELECT h.id AS id, h.url AS url, h.title AS title, "
>+                "h.rev_host AS rev_host, h.visit_count AS visit_count, "
>+                "MAX(v.visit_date) AS visit_date, f.url AS favicon_url, "
>+                "v.session AS session, null AS empty "
>+              "FROM moz_places h "
>+              "JOIN moz_historyvisits v ON h.id = v.place_id "

ditto

>-  nsresult rv = mDBConn->CreateStatement(sqlHead + (mAutoCompleteOnlyTyped ?
>-      NS_LITERAL_CSTRING("WHERE h.typed = 1 ") : EmptyCString()) + sqlTail,
>-    getter_AddRefs(mDBAutoCompleteQuery));
>+  // Try to reduce size of compund table since with partitioning this became

s/compund/compound/

r=me

please get additional review on the autocomplete queries from mardak.
(In reply to comment #27)
> 
> >   // when calculating frecency, we want the visit count to be 
> >   // all the visits.
> >   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> >+      "SELECT "
> >+        "(SELECT COUNT(*) FROM moz_historyvisits WHERE place_id = ?1) + "
> >+        "(SELECT COUNT(*) FROM moz_historyvisits_temp WHERE place_id = ?1 "
> >+            "AND id NOT IN (SELECT id FROM moz_historyvisits))"),
> >     getter_AddRefs(mDBFullVisitCount));
> So, with the password manager, I think zpao found that doing COUNT(1) was a bit
> faster than COUNT(*).  Not sure if it really matters though.

same timings for me... i don't see why it should be faster, i've done a test on the biggest table with
SELECT count(*) FROM moz_historyvisits;
SELECT count(1) FROM moz_historyvisits;
SELECT count(id) FROM moz_historyvisits;
SELECT count(ROWID) FROM moz_historyvisits;
often count(*) is the fastest, results have variations clearly, so i should do that in a loop, discard max and min values and calculate an average, hwv imho it doesn't matter and the advantage would probably be less that timer resolution. if i don't see advantages in big tables it will not probably exists in smaller tables.

Addressed all other Shawn's comments locally, movin on with Dietrich's review.
(In reply to comment #28)
> >   // mDBRecentVisitOfURL
> >   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> >-      "SELECT v.id, v.session "
> >-      "FROM ( "
> >-        "SELECT * FROM moz_places_temp "
> >-        "WHERE url = ?1 "
> >-        "UNION ALL "
> >-        "SELECT * FROM moz_places "
> >-        "WHERE id NOT IN (SELECT id FROM moz_places_temp) "
> >-        "AND url = ?1 "
> >-      ") AS h JOIN moz_historyvisits_view v ON h.id = v.place_id "
> >-      "ORDER BY v.visit_date DESC "
> >+      "SELECT * FROM ( "
> >+        "SELECT v.id, v.session "
> >+        "FROM moz_historyvisits_temp v "
> >+        "WHERE v.place_id = IFNULL((SELECT id FROM moz_places_temp WHERE url = ?1), "
> >+                                  "(SELECT id FROM moz_places WHERE url = ?1)) "
> >+        "ORDER BY v.visit_date DESC LIMIT 1 "
> >+      ") "
> >+      "UNION ALL "
> >+      "SELECT * FROM ( "
> >+        "SELECT v.id, v.session "
> >+        "FROM moz_historyvisits v "
> >+        "WHERE v.place_id = IFNULL((SELECT id FROM moz_places_temp WHERE url = ?1), "
> >+                                  "(SELECT id FROM moz_places WHERE url = ?1)) "
> >+        "ORDER BY v.visit_date DESC LIMIT 1 "
> >+      ") "
> >       "LIMIT 1"),
> >     getter_AddRefs(mDBRecentVisitOfURL));
> >   NS_ENSURE_SUCCESS(rv, rv);
> 
> do the place_id subselects need LIMIT 1?

no, because we have unique uris

> >   // mDBVisitToVisitResult, should match kGetInfoIndex_* (see GetQueryResults)
> >   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> >       "SELECT h.id, h.url, h.title, h.rev_host, h.visit_count, "
> >-             "v.visit_date, f.url, v.session, null "
> >-      "FROM moz_places_view h "
> >-      "JOIN ( "
> >-        "SELECT * FROM moz_historyvisits_temp "
> >-        "WHERE id = ?1 "
> >-        "UNION ALL "
> >-        "SELECT * FROM moz_historyvisits "
> >-        "WHERE id NOT IN (SELECT id FROM moz_historyvisits_temp) "
> >-        "AND id = ?1 "
> >-      ") AS v ON h.id = v.place_id "
> >-      "LEFT OUTER JOIN moz_favicons f ON h.favicon_id = f.id "),
> >+          "v.visit_date, f.url, v.session, null "
> >+        "FROM moz_places_temp h "
> >+        "LEFT JOIN moz_historyvisits_temp v_t ON h.id = v_t.place_id "
> >+        "LEFT JOIN moz_historyvisits v ON h.id = v.place_id "
> >+        "LEFT JOIN moz_favicons f ON h.favicon_id = f.id "
> >+        "WHERE v.id = ?1 OR v_t.id = ?1 "
> >+      "UNION ALL "
> >+      "SELECT h.id, h.url, h.title, h.rev_host, h.visit_count, "
> >+          "v.visit_date, f.url, v.session, null "
> >+        "FROM moz_places h "
> >+        "LEFT JOIN moz_historyvisits_temp v_t ON h.id = v_t.place_id "
> >+        "LEFT JOIN moz_historyvisits v ON h.id = v.place_id "
> >+        "LEFT JOIN moz_favicons f ON h.favicon_id = f.id "
> >+        "WHERE v.id = ?1 OR v_t.id = ?1 "
> >+      "LIMIT 1"),
> >     getter_AddRefs(mDBVisitToVisitResult));
> >   NS_ENSURE_SUCCESS(rv, rv);
> 
> maybe could share SQL between this and the previous query?

i've tried to share but i really don't like the result, it's making queries much less readable

> >@@ -1322,40 +1340,45 @@ nsNavHistory::InitStatements()
> >   // We get two sets of places that are 1) most visited and 2) random so that
> >   // we don't get stuck recalculating frecencies that end up being -1 every
> >   // time
> >+  // Since we don't need real random results and ORDER BY RANDOM() is slow
> >+  // we will jump at a random rowid in the table and we will get random results
> >+  // only from moz_places since temp will be synched there sometimes.  
> >   // Notice that frecency is invalidated as frecency = -visit_count
> >   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> >-    "SELECT * FROM "
> >-      "(SELECT id, visit_count, hidden, typed, frecency, url "
> >-      "FROM ( "
> >-        "SELECT * FROM moz_places_temp "
> >+      "SELECT * FROM ( "
> >+        "SELECT id, visit_count, hidden, typed, frecency, url "
> >+        "FROM ( "
> >+          "SELECT * FROM moz_places_temp "
> >+          "WHERE frecency < 0 "
> >+          "UNION ALL "
> >+          "SELECT * FROM ( "
> >+            "SELECT * FROM moz_places "
> >+            "WHERE id NOT IN (SELECT id FROM moz_places_temp) "
> >+            "AND frecency < 0 "
> >+            "ORDER BY frecency ASC LIMIT ROUND(?1 / 2) "
> >+          ") "
> 
> why the nested SELECT * FROM ?

you can't directly use LIMIT or ORDER BY with union if you want it into one of the two parts. the only ways is nesting a select *, in this case i wanted to limit results from moz_places. I've tried to avoid the first SELECT id, visit_count, hidden, typed, frecency, url FROM ( but the resulting query is slower by a couple ms.

> >@@ -2560,27 +2561,25 @@ nsNavHistory::FixInvalidFrecenciesForExc
> >   // set frecency to 0 so that it is excluded from url bar autocomplete.
> >   nsCOMPtr<mozIStorageStatement> dbUpdateStatement;
> >   nsresult rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> >-    "UPDATE moz_places_view "
> >-    "SET frecency = 0 WHERE id IN ("
> >-      "SELECT h.id FROM ( "
> >-        "SELECT * FROM moz_places_temp "
> >-        "WHERE frecency < 0 "
> >-        "OR SUBSTR(url, 0, 6) = 'place:' "
> >-        "UNION ALL "
> >-        "SELECT * FROM moz_places "
> >-        "WHERE id NOT IN (SELECT id FROM moz_places_temp) "
> >-        "AND (frecency < 0 "
> >-         "OR SUBSTR(url, 0, 6) = 'place:') "
> >-      ") AS h "
> >-      "LEFT OUTER JOIN moz_bookmarks b ON h.id = b.fk "
> >-      "WHERE "
> >-        // place is an unvisited child of a livemark feed
> >-        "(b.parent IN ("
> >-            "SELECT annos.item_id FROM moz_anno_attributes attrs "
> >-            "JOIN moz_items_annos annos ON attrs.id = annos.anno_attribute_id "
> >-            "WHERE attrs.name = ?1) "
> >-          "AND visit_count = 0) "
> >-        ")"),
> >+      "UPDATE moz_places_view "
> >+      "SET frecency = 0 WHERE id IN ("
> >+        "SELECT h.id FROM moz_places h "
> >+        "WHERE h.url >= 'place:' AND h.url <'place;' "
> >+        "UNION "
> >+        "SELECT h.id FROM moz_places_temp h "
> >+        "WHERE  h.url >= 'place:' AND h.url < 'place;' "
> >+        "UNION "
> >+        // Unvisited child of a livemark        
> >+        "SELECT b.fk FROM moz_bookmarks b "
> >+        "JOIN moz_items_annos a ON a.item_id = b.id "
> >+        "JOIN moz_anno_attributes n ON n.id = a.anno_attribute_id "
> >+        "WHERE n.name = ?1 "
> >+        "AND fk IN( "
> >+          "SELECT id FROM moz_places WHERE visit_count = 0 AND frecency < 0 "
> >+          "UNION ALL "
> >+          "SELECT id FROM moz_places_temp WHERE visit_count = 0 AND frecency < 0 "
> >+        ") "
> >+      ")"),
> >     getter_AddRefs(dbUpdateStatement));
> >   NS_ENSURE_SUCCESS(rv, rv);
> 
> the original query matches all entries w/ frecency <0, but that's not covered
> in the optimized query.

frecency < 0 was a trick to avoid setting it into all entries if you have a lot of entries, so to avoid some useless work. i'm doing this in the second part of the query, while the first part is only setting for place: uris, i don't expect having thousands of place: uris, so the select clause would probably take the same time we spend to set to all, plus adding an additive protection agains showing place: uris in the locationbar.

> >   NS_ENSURE_SUCCESS(rv, rv);
> >   rv = expireEmbeddedLinksStatement->BindInt64Parameter(0, maxEmbeddedAge);
> >   NS_ENSURE_SUCCESS(rv, rv);
> >-  rv = expireEmbeddedLinksStatement->BindInt32Parameter(1, nsINavHistoryService::TRANSITION_EMBED);
> >-  NS_ENSURE_SUCCESS(rv, rv);
> >-  rv = expireEmbeddedLinksStatement->BindInt32Parameter(2, EXPIRATION_CAP_EMBEDDED);
> >+  rv = expireEmbeddedLinksStatement->BindInt32Parameter(1, EXPIRATION_CAP_EMBEDDED);
> >   NS_ENSURE_SUCCESS(rv, rv);
> >   rv = expireEmbeddedLinksStatement->Execute();
> >   NS_ENSURE_SUCCESS(rv, rv);
> 
> why change from binding the embed type to inlining it in the query string?

to avoid a useless bind, we should not bind constants if we don't need to change them, i find it more readable, we are not binding other transtition type, so why should this differ?

fixed all other Dietrich's review comments
- fixed all review comments
- cleaned up queries formatting a bit to be consistent

NOTE:
THIS PATCH IS A MERGE between preious optimize patch and the patch from bug 449640, for maintainance reasons having a merge makes unbitrotting easier, so we will go on with this and Shawn will do final maintainance and pushing.

Forwarding review request to Mardak for the autocomplete part of the patch.
Attachment #335879 - Attachment is obsolete: true
Attachment #337045 - Flags: review?(edilee)
Status: NEW → ASSIGNED
Whiteboard: [merged with patch from bug 449640]
Comment on attachment 337045 [details] [diff] [review]
merged patch (bug 449640  + bug 450705)

>+  // Try to reduce size of compound table since with partitioning this became
>+  // slower. Limiting moz_places with OFFSET+LIMIT will mostly help speed
>+  // of first chunks, that are usually most wanted.
Any idea what's the difference in query time for the first chunk with this new query as well as compared to the last chunk? The default behavior is to match on boundaries then match anywhere, but match anywhere doesn't start until all places have been looked at.

>+  nsCString sqlBase = NS_LITERAL_CSTRING(
>+    "SELECT h.url, h.title, f.url") + BOOK_TAG_SQL + NS_LITERAL_CSTRING(", "
Does BOOK_TAG_SQL need to be changed because I believe it's coded to look for "h". I remember running into issue where the same table name is used and I needed to do something like `h`. But maybe it's not an issue when doing a UNION.

>+      "h.visit_count, h.frecency "
Might be useful to have a note that h.frecency isn't actually a column that we read out later. (The kAutoCompleteIndex_ columns)

>+    "FROM moz_places_temp h "
>+    "LEFT OUTER JOIN moz_favicons f ON f.id = h.favicon_id "
>+    "WHERE h.frecency <> 0 "
>+    "{ADDITIONAL_CONDITIONS} "
>+    "UNION ALL "
>+    "SELECT * FROM ( "
>+      "SELECT h.url, h.title, f.url") + BOOK_TAG_SQL + NS_LITERAL_CSTRING(", "
>+        "h.visit_count, h.frecency "
>+      "FROM moz_places h "
>+      "LEFT OUTER JOIN moz_favicons f ON f.id = h.favicon_id "
>+      "WHERE h.id NOT IN (SELECT id FROM moz_places_temp) "
>+      "AND h.frecency <> 0 "
>+      "{ADDITIONAL_CONDITIONS} "
>+      "ORDER BY h.frecency DESC LIMIT (?2 + ?3) "
Can we do any better here by offsetting some known amount on later chunks? Might get hairy though trying to calculate the number of entries in moz_places_temp and estimating which results came from which tables..

>+    ") "
>+    "ORDER BY 8 DESC LIMIT ?2 OFFSET ?3");
On a slightly separate note, what if we changed the frecency index to to "frecency,typed,visit_count" which would let us do frecency DESC, typed DESC, visit_count DESC.. ?

>+  nsCString AutoCompleteQuery = sqlBase;
>+  AutoCompleteQuery.ReplaceSubstring("{ADDITIONAL_CONDITIONS}",
>+                                     (mAutoCompleteOnlyTyped ?
>+                                        "AND typed = 1" : ""));
Neat ;) Is there a perf impact if we explicitly put which table to read from even if the column doesn't need disambiguation? I noticed the new queries use just typed, visit_count, etc instead of h.typed.

>+    "SELECT IFNULL(h_t.url, h.url), IFNULL(h_t.title, h.title), f.url ") +
>+      BOOK_TAG_SQL + NS_LITERAL_CSTRING(", "
BOOK_TAG_SQL most likely will get confused here if we grab h_t over h.

>+      "IFNULL(h_t.visit_count, h.visit_count), rank "
>+    "FROM ( "
>+      "SELECT ROUND(MAX(((i.input = ?2) + (SUBSTR(i.input, 1, LENGTH(?2)) = ?2)) * "
>+        "i.use_count), 1) AS rank, place_id "
>+      "FROM moz_inputhistory i "
>+      "GROUP BY i.place_id HAVING rank > 0 "
>+      ") AS i "
Do we have any particular convention about when to use AS versus omitting it?

>+    "LEFT JOIN moz_places h ON h.id = i.place_id "
>+    "LEFT JOIN moz_places_temp h_t ON h_t.id = i.place_id "
(Actually, second thought perhaps nevermind about BOOK_TAG_SQL because it looks up both tables.. but then why use temp here?)

>+    "LEFT JOIN moz_favicons f ON f.id = IFNULL(h_t.favicon_id, h.favicon_id) "
>+    "WHERE IFNULL(h_t.url, h.url) NOTNULL "
>+    "ORDER BY rank DESC, IFNULL(h_t.frecency,h.frecency) DESC");
nit space after comma

>+    "SELECT IFNULL( "
>+        "(SELECT REPLACE(url, '%s', ?2) FROM moz_places_temp WHERE id = b.fk), "
>+        "(SELECT REPLACE(url, '%s', ?2) FROM moz_places WHERE id = b.fk) "
>+      ") AS search_url, IFNULL(h_t.title, h.title) , "
>+      "COALESCE(f.url, "
>+        "(SELECT f.url "
>+          "FROM moz_places_temp "
>+          "JOIN moz_favicons f ON f.id = favicon_id "
>+          "WHERE rev_host = IFNULL( "
>+            "(SELECT rev_host FROM moz_places_temp WHERE id = b.fk), "
>+            "(SELECT rev_host FROM moz_places WHERE id = b.fk)) "
>+          "ORDER BY frecency DESC LIMIT 1), "
>+        "(SELECT f.url "
>+        "FROM moz_places "
>+        "JOIN moz_favicons f ON f.id = favicon_id "
>+          "WHERE rev_host = IFNULL( "
>+            "(SELECT rev_host FROM moz_places_temp WHERE id = b.fk), "
>+            "(SELECT rev_host FROM moz_places WHERE id = b.fk)) "
>+        "ORDER BY frecency DESC LIMIT 1) "
I wonder if there's a cleaner way than to just copy/paste the repeated queries for moz_places_temp and moz_places for both the search_url and favicon incase we need to change the query later on. Just seems prone to coding errors with things right now.
(In reply to comment #33)
> (From update of attachment 337045 [details] [diff] [review])
> >+  // Try to reduce size of compound table since with partitioning this became
> >+  // slower. Limiting moz_places with OFFSET+LIMIT will mostly help speed
> >+  // of first chunks, that are usually most wanted.
> Any idea what's the difference in query time for the first chunk with this new
> query as well as compared to the last chunk? The default behavior is to match
> on boundaries then match anywhere, but match anywhere doesn't start until all
> places have been looked at.

on the first chunk we move from 1ms to about 6ms, on later chunks that increases, i don't have numbers for all the chunks but the first 100 should be fast enough. latest chunk with my db (Test with 19000 moz_places, 145000 visits, 2700 bookmarks) was taking more than a second though

> >+    "FROM moz_places_temp h "
> >+    "LEFT OUTER JOIN moz_favicons f ON f.id = h.favicon_id "
> >+    "WHERE h.frecency <> 0 "
> >+    "{ADDITIONAL_CONDITIONS} "
> >+    "UNION ALL "
> >+    "SELECT * FROM ( "
> >+      "SELECT h.url, h.title, f.url") + BOOK_TAG_SQL + NS_LITERAL_CSTRING(", "
> >+        "h.visit_count, h.frecency "
> >+      "FROM moz_places h "
> >+      "LEFT OUTER JOIN moz_favicons f ON f.id = h.favicon_id "
> >+      "WHERE h.id NOT IN (SELECT id FROM moz_places_temp) "
> >+      "AND h.frecency <> 0 "
> >+      "{ADDITIONAL_CONDITIONS} "
> >+      "ORDER BY h.frecency DESC LIMIT (?2 + ?3) "
> Can we do any better here by offsetting some known amount on later chunks?
> Might get hairy though trying to calculate the number of entries in
> moz_places_temp and estimating which results came from which tables..

well i tried that, was my first idea, but since after we offset again and we don't know which items from temp tables will be in our interval and which not, it will be quite complex and intensive (i didn't finished with a working example, i was always missing something, and for sure will require 1 additional query to could the offset). If you can come with something it would be appreciated :)

> >+    ") "
> >+    "ORDER BY 8 DESC LIMIT ?2 OFFSET ?3");
> On a slightly separate note, what if we changed the frecency index to to
> "frecency,typed,visit_count" which would let us do frecency DESC, typed DESC,
> visit_count DESC.. ?

not much useful if we split tables since on the compound table we don't have indexes (that's what makes these queries slower).

> 
> >+    "SELECT IFNULL(h_t.url, h.url), IFNULL(h_t.title, h.title), f.url ") +
> >+      BOOK_TAG_SQL + NS_LITERAL_CSTRING(", "
> BOOK_TAG_SQL most likely will get confused here if we grab h_t over h.
> 
> >+      "IFNULL(h_t.visit_count, h.visit_count), rank "
> >+    "FROM ( "
> >+      "SELECT ROUND(MAX(((i.input = ?2) + (SUBSTR(i.input, 1, LENGTH(?2)) = ?2)) * "
> >+        "i.use_count), 1) AS rank, place_id "
> >+      "FROM moz_inputhistory i "
> >+      "GROUP BY i.place_id HAVING rank > 0 "
> >+      ") AS i "
> Do we have any particular convention about when to use AS versus omitting it?

No it's not consistent, should be cleaned up, i think we should omit it when referring to real tables (moz_places, moz_historyvisits, etc.) while put it when referring to calculated fields or subqueries... i find more readable that way, but personal choice.

> >+    "LEFT JOIN moz_favicons f ON f.id = IFNULL(h_t.favicon_id, h.favicon_id) "
> >+    "WHERE IFNULL(h_t.url, h.url) NOTNULL "
> >+    "ORDER BY rank DESC, IFNULL(h_t.frecency,h.frecency) DESC");
> nit space after comma
> 
> >+    "SELECT IFNULL( "
> >+        "(SELECT REPLACE(url, '%s', ?2) FROM moz_places_temp WHERE id = b.fk), "
> >+        "(SELECT REPLACE(url, '%s', ?2) FROM moz_places WHERE id = b.fk) "
> >+      ") AS search_url, IFNULL(h_t.title, h.title) , "
> >+      "COALESCE(f.url, "
> >+        "(SELECT f.url "
> >+          "FROM moz_places_temp "
> >+          "JOIN moz_favicons f ON f.id = favicon_id "
> >+          "WHERE rev_host = IFNULL( "
> >+            "(SELECT rev_host FROM moz_places_temp WHERE id = b.fk), "
> >+            "(SELECT rev_host FROM moz_places WHERE id = b.fk)) "
> >+          "ORDER BY frecency DESC LIMIT 1), "
> >+        "(SELECT f.url "
> >+        "FROM moz_places "
> >+        "JOIN moz_favicons f ON f.id = favicon_id "
> >+          "WHERE rev_host = IFNULL( "
> >+            "(SELECT rev_host FROM moz_places_temp WHERE id = b.fk), "
> >+            "(SELECT rev_host FROM moz_places WHERE id = b.fk)) "
> >+        "ORDER BY frecency DESC LIMIT 1) "
> I wonder if there's a cleaner way than to just copy/paste the repeated queries
> for moz_places_temp and moz_places for both the search_url and favicon incase
> we need to change the query later on. Just seems prone to coding errors with
> things right now.

it would be more error prone if we start putting sql fragments here and there imho, testing or modifying the query could become a pain... so if you got an idea...

i'll fix other comments as soon as i can, so feel free to continue your review if you still miss some part or have suggestions.
So, it might be worthwhile for us to investigate the use of the async storage API for results here as well.  That's clearly a follow-up bug though.
Attached patch patch (obsolete) — Splinter Review
unbitrot, addressed Mardak comments, some cleanup
Attachment #337045 - Attachment is obsolete: true
Attachment #337045 - Flags: review?(edilee)
Attachment #338301 - Flags: review?(edilee)
Comment on attachment 338301 [details] [diff] [review]
patch

r=Mardak

(In reply to comment #34)
> > >+    "ORDER BY 8 DESC LIMIT ?2 OFFSET ?3");
> > On a slightly separate note, what if we changed the frecency index to to
> > "frecency,typed,visit_count" which would let us do frecency DESC, typed DESC,
> > visit_count DESC.. ?
> not much useful if we split tables since on the compound table we don't have
> indexes (that's what makes these queries slower).
I was talking more in terms of bug 412736. Things are slow if we sort by frececny, typed, visit_count because we only have an index on frecency. If we changed the index to have all 3 in that same order, it should be fast again. But now that we have moz_places and moz_places_temp, things would potentially slow down again because we have to combine the 2 results.
Attachment #338301 - Flags: review?(edilee) → review+
Attached patch updated patchSplinter Review
Had a mochitest failure on tinderbox when I landed this today.  This patch contains that fix (easy; trivial).

However, landing all this stuff caused bug 455923 and bug 455904 as far as I can tell, so it was all backed out.

The good news is that I didn't see any performance regressions on tinderbox yet (some talos machines are still running).
Attachment #338301 - Attachment is obsolete: true
i have not been able to reproduce the tests error on my Windows machine for now
Well i got something, what i noticed was that "clear private data" function was moving from seconds to minutes. So i looked at our triggers, the problem appear related to updating visit_count, our view triggers are too slow when inserting or removing a visit, probably because are calling another trigger (the update one).

so i tried changing boh delete and insert triggers to update visit_count on both tables instead of on view. so like

#define CREATE_HISTORYVISITS_VIEW_DELETE_TRIGGER NS_LITERAL_CSTRING( \
  "CREATE TEMPORARY TRIGGER moz_historyvisits_view_delete_trigger " \
  "INSTEAD OF DELETE " \
  "ON moz_historyvisits_view " \
  "BEGIN " \
    "DELETE FROM moz_historyvisits_temp " \
    "WHERE id = OLD.id; " \
    "DELETE FROM moz_historyvisits " \
    "WHERE id = OLD.id; " \
    "UPDATE moz_places " \
    "SET visit_count = visit_count - 1 " \
    "WHERE moz_places.id = OLD.place_id " \
    "AND OLD.visit_type NOT IN (0, 4, 7); " /* invalid, EMBED, DOWNLOAD */ \
    "UPDATE moz_places_temp " \
    "SET visit_count = visit_count - 1 " \
    "WHERE id = OLD.place_id " \
    "AND OLD.visit_type NOT IN (0, 4, 7); " /* invalid, EMBED, DOWNLOAD */ \
  "END" \

And this makes things really better, tests pass, and clear private data takes only some more second but not minutes.

So i don't know if this could help solving our problems, but for sure it will reduce timings.
trigger changes moved to bug 456029
There is a bug in FindVisits, countStatements is subtracting duplicate entries even if then don't have visit_count > 0, instead it should do

    rv = aConnection->CreateStatement(NS_LITERAL_CSTRING(
        "SELECT "
          "(SELECT count(*) FROM moz_places_temp WHERE visit_count > 0) + "
          "(SELECT count(*) FROM moz_places WHERE visit_count > 0) - "
          "(SELECT count(*) FROM moz_places WHERE id IN "
            "(SELECT id FROM moz_places_temp) AND visit_count > 0)"),
      getter_AddRefs(countStatement));

the change is adding " AND visit_count > 0"
Depends on: 460947
comment 42 has been splitted to bug 460947
Comment on attachment 339307 [details] [diff] [review]
updated patch


Current patch may be subject to random crashes of the nature described in bug 460635.
(In reply to comment #44)
> Current patch may be subject to random crashes of the nature described in bug
> 460635.
No, it is very much likely bug 458998.
http://hg.mozilla.org/mozilla-central/rev/730be68af1e4
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [merged with patch from bug 449640]
Target Milestone: --- → mozilla1.9.1b2
Blocks: 452207
Depends on: 463483
craptastic fixed per checkins
Status: RESOLVED → VERIFIED
Depends on: 466564
You need to log in before you can comment on or make changes to this bug.