Closed Bug 500391 Opened 15 years ago Closed 15 years ago

When filtering results on search avoid querying tags and parent for each result

Categories

(Toolkit :: Places, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1
Tracking Status
status1.9.2 --- beta5-fixed

People

(Reporter: mak, Assigned: mak)

References

Details

(Keywords: perf, Whiteboard: [TSnappiness])

Attachments

(2 files, 4 obsolete files)

Actually when filtering results on a query with searchTerms, for every node we examine we get the tags with a query and then we filter.
This means that if a query has to pass through 1000 nodes, we will execute 1000 queries...
This is obviously crazy and slow, you can see the result trying to search in Library or Sidebar with NSPR_logging enabled on storage.

In this specific case we should instead try to query for tags in the original query, so that we will get them directly from the node. see bug 486603 for further informations on why we can't do that always.

Fixing this should provide a largely better experience when using Places search fields.
Whiteboard: [TSnappiness]
And we also get the parent folder for each bookmark returned by a query...
Summary: When filtering results on search avoid querying tags for each result → When filtering results on search avoid querying tags and parent for each result
Attached patch wip 0.1 (obsolete) — Splinter Review
passes tests, and should be faster, i can search in a debug build, so i suppose optimized builds will be good. I'll create some trybuild to share testing. I'll also need to measure perf when executing queries in the sidebar.
searching seem to be faster, in debug builds it was taking about 20s, now it takes about 10s (as usual on a large db with lot of history). So this is improving.
builds will appear here:
https://build.mozilla.org/tryserver-builds/mak77@bonardo.net-try-24803e3c1131/
Attached patch wip 0.2 (obsolete) — Splinter Review
Includes an additional experimental change to the treeview.

I've taken a look around, our trees are damn slow due to the fact we hold a visibleElements list, and we populate and update that list on invalidateContainer. that means we have to getChild() each node from the resultNode, and xpcom is expensive when you getChild thousands of nodes. So i've practically moved most of the buildVisibleSection code to a new method of the container. This is a bit hackish, but i want to see how it performs, from my measurements it takes about less than half of the time.
Attachment #385887 - Attachment is obsolete: true
To put in some number, on a debug build with large history i can open the history sidebar by last visited in 6 seconds, vs more than 20s it was taking before. wip 0.2 is actually building on tryserver.
these have a small bug that won't allow to use uriIsPrefix queries, but should be good for testing https://build.mozilla.org/tryserver-builds/mak77@bonardo.net-try-10b834f019d3/
So with latest patch i measured time for a search in history sidebar on a pretty large DB, without patch it took about 144s, with patch 43s... still a lot but less than a half is a good first attack. notice without any search the sidebar takes about 41s, so the search term overhead is highly reduced.
Attached patch patch v1.0 (obsolete) — Splinter Review
i forgot to tell that previous times are taken with a debug build, an optimized one will most likely take a really shorter time, really likely less than 10s.
Btw we can measure this as about a 70% improvement on search times.

if we could rewrite the query builder this code would really benefit for maintainability
Attachment #386330 - Attachment is obsolete: true
Attachment #403389 - Flags: review?(mano)
Comment on attachment 403389 [details] [diff] [review]
patch v1.0

Nice work, even more nice gains.

>   // This is a LEFT OUTER JOIN with moz_places since folders does not have
>   // a reference into that table.
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>       "SELECT h.id, h.url, COALESCE(b.title, h.title), "
>         "h.rev_host, h.visit_count, h.last_visit_date, f.url, null, b.id, "
>-        "b.dateAdded, b.lastModified, b.position, b.type, b.fk, b.folder_type "
>+        "b.dateAdded, b.lastModified, b.parent, null, b.position, b.type, "
>+        "b.fk, b.folder_type "
>       "FROM moz_bookmarks b "
>       "JOIN moz_places_temp h ON b.fk = h.id "
>       "LEFT JOIN moz_favicons f ON h.favicon_id = f.id "
>       "WHERE b.parent = ?1 "
>       "UNION ALL "
>       "SELECT h.id, h.url, COALESCE(b.title, h.title), "
>         "h.rev_host, h.visit_count, h.last_visit_date, f.url, null, b.id, "
>-        "b.dateAdded, b.lastModified, b.position, b.type, b.fk, b.folder_type "
>+        "b.dateAdded, b.lastModified, b.parent, null, b.position, b.type, "
>+        "b.fk, b.folder_type "

We cannot add mFolderId just for some nodes, please update all queries.

Remember to special case the places root as you do that.

>+    nsAutoString tags;
>+    rv = aRow->GetString(kGetInfoIndex_ItemTags, tags);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+    if (tags.Length())
>+      (*aResult)->mTags.Assign(tags);
>+

We cannot do that, mTags needs to be comma-separated.


>   PRInt64 session = aRow->AsInt64(kGetInfoIndex_SessionId);
> 
>   if (aOptions->ResultType() == nsNavHistoryQueryOptions::RESULTS_AS_VISIT) {
>     *aResult = new nsNavHistoryVisitResultNode(url, title, accessCount, time,
>                                                favicon, session);
>     if (! *aResult)
>       return NS_ERROR_OUT_OF_MEMORY;
>+
>+    nsAutoString tags;
>+    rv = aRow->GetString(kGetInfoIndex_ItemTags, tags);
>+    if (tags.Length())
>+      (*aResult)->mTags.Assign(tags);
>+

ditto
Attachment #403389 - Flags: review?(mano) → review-
Attached patch patch v1.1 (obsolete) — Splinter Review
Good catches, i recall i had found some problem with those commas, but having passed some time i forgot to check :( i should have annotated that in the bug at that time.
So i investigated why we were using comma somewhere and space somewhere else, and the answer was "no reason", i also asked dietrich to be sure since he introduced that binded param, and he thinks is fine to use a comma everywhere.

- fixed comma/space issue
- avoid setting mFolderId if parent value is <= 0
- updated mDBBookmarkToUrlResult to set parent value

i've not found other queries in need of an update, btw i'll take a second look while you review this updated patch.
Attachment #403389 - Attachment is obsolete: true
Attachment #403805 - Flags: review?(mano)
Comment on attachment 403805 [details] [diff] [review]
patch v1.1


> PlacesSQLQueryBuilder::SelectAsURI()
> {
>+  nsNavHistory* history;
>+  history = nsNavHistory::GetHistoryService();

nit: assign on decalre:


> nsresult
> PlacesSQLQueryBuilder::SelectAsVisit()
> {
>+  nsNavHistory* history;
>+  history = nsNavHistory::GetHistoryService();

ditto.


>   // URI
>-  if (NS_SUCCEEDED(aQuery->GetHasUri(&hasIt)) && hasIt)
>+  if (NS_SUCCEEDED(aQuery->GetHasUri(&hasIt)) && hasIt) {
>     BindStatementURI(statement, index.For("uri"), aQuery->Uri());
>+    if (aQuery->UriIsPrefix()) {
>+      nsCAutoString uriString;
>+      aQuery->Uri()->GetSpec(uriString);

At first, I though it would crash if  if uriisprefix is set, but uri itself isn't, but then I saw the hasIt check. I think we should rather check aQuery->Uri() directly (actually, please file a bug to remove these hasIt* methods altogether).
 

>   nsTArray<nsTArray<nsString>*> terms;
>   ParseSearchTermsFromQueries(aQueries, &terms);
> 
>-  PRInt32 queryIndex;
>   PRUint16 resultType = aOptions->ResultType();

Please move |resultType| closer to its usage.

> 
>   // The includeFolders array for each query is initialized with its
>   // query's folders array. We add sub-folders as we check items.
>   nsTArray< nsTArray<PRInt64>* > includeFolders;
>   nsTArray< nsTArray<PRInt64>* > excludeFolders;
>-  for (queryIndex = 0;
>+  for (PRInt32 queryIndex = 0;

nit: PRUin32


> 
>-  for (PRInt32 nodeIndex = 0; nodeIndex < aSet.Count(); nodeIndex ++) 
...
>+  for (PRInt32 nodeIndex = 0; nodeIndex < aSet.Count(); nodeIndex++)
...
>-  // de-allocate the matrixes
>+  // De-allocate the temporary matrixes.
>   for (PRInt32 i = 0; i < aQueries.Count(); i++) {

ditto, while your'e at it.

>+    // Find all the folders with the annotation we are excluding and save their
>+    // item ids.  When doing filtering, if a result's parent item id matches one
>+    // of the saved item ids, the result will be excluded.

nit: result's parent's or  " ... of ..."


>-  PRBool isNull;
>-  rv = aRow->GetIsNull(kGetInfoIndex_ItemId, &isNull);
>-  NS_ENSURE_SUCCESS(rv, rv);
>-  if (!isNull) {
>-    itemId = aRow->AsInt64(kGetInfoIndex_ItemId);
>-  }
>+  PRInt64 itemId = aRow->AsInt64(kGetInfoIndex_ItemId);
>+  PRInt64 parentId = -1;
>+  if (itemId > 0)
>+    parentId = aRow->AsInt64(kGetInfoIndex_ItemParentId);
>+  else
>+    itemId = -1;
> 

itemId could actually be 0, i think (places root?). 
Also, this code path is a little confusing.  Please add some comments.



>       (*aResult)->mItemId = itemId;
>+      // Don't set mFolderId for Places root node since it would be 0, and we
>+      // could rely on a not existant parent.
>+      if (parentId > 0)

Sigh, CreateRoot(..placesroot..) should've passed "null" rather 0 :(


>+
>+    nsAutoString tags;
>+    rv = aRow->GetString(kGetInfoIndex_ItemTags, tags);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+    if (tags.Length())
>+      (*aResult)->mTags.Assign(tags);
>+

Shouldn't you use IsVoid? most URIs don't have tags set and thus 0 length is valid.

>+    nsAutoString tags;
>+    rv = aRow->GetString(kGetInfoIndex_ItemTags, tags);
>+    if (tags.Length())
>+      (*aResult)->mTags.Assign(tags);
>+

ditto.

r=mano otherwise.
Attachment #403805 - Flags: review?(mano) → review+
(In reply to comment #11)
> >   // URI
> >-  if (NS_SUCCEEDED(aQuery->GetHasUri(&hasIt)) && hasIt)
> >+  if (NS_SUCCEEDED(aQuery->GetHasUri(&hasIt)) && hasIt) {
> >     BindStatementURI(statement, index.For("uri"), aQuery->Uri());
> >+    if (aQuery->UriIsPrefix()) {
> >+      nsCAutoString uriString;
> >+      aQuery->Uri()->GetSpec(uriString);
> 
> At first, I though it would crash if  if uriisprefix is set, but uri itself
> isn't, but then I saw the hasIt check. I think we should rather check
> aQuery->Uri() directly (actually, please file a bug to remove these hasIt*
> methods altogether).

for the follow-up, do you mean to remove all GetHas methods, or just hasIt checks?


> >   // The includeFolders array for each query is initialized with its
> >   // query's folders array. We add sub-folders as we check items.
> >   nsTArray< nsTArray<PRInt64>* > includeFolders;
> >   nsTArray< nsTArray<PRInt64>* > excludeFolders;
> >-  for (queryIndex = 0;
> >+  for (PRInt32 queryIndex = 0;
> 
> nit: PRUin32

That would then require to cast nsCOMArray.Count() everywhere(PRInt32 Count() const) or it would pollute output with bunch of compiler warnings.
Not sure why arrays count has been defined signed, but i won't change that here since does not make a difference for us.

> >+  PRInt64 itemId = aRow->AsInt64(kGetInfoIndex_ItemId);
> >+  PRInt64 parentId = -1;
> >+  if (itemId > 0)
> >+    parentId = aRow->AsInt64(kGetInfoIndex_ItemParentId);
> >+  else
> >+    itemId = -1;
> > 
> 
> itemId could actually be 0, i think (places root?). 
> Also, this code path is a little confusing.  Please add some comments.

table ids start from 1, so no itemId can ever be 0, and that's enough to check. i modified this code path to make it more clear and added comments though.

> >+      // could rely on a not existant parent.
> >+      if (parentId > 0)
> 
> Sigh, CreateRoot(..placesroot..) should've passed "null" rather 0 :(

doesn't change a lot, when you call asInt64 null would be converted to 0, you have to explicitly use GetIsNull... btw i've removed this check since i just set parentId = -1 above.

> >+    nsAutoString tags;
> >+    rv = aRow->GetString(kGetInfoIndex_ItemTags, tags);
> >+    NS_ENSURE_SUCCESS(rv, rv);
> >+    if (tags.Length())
> >+      (*aResult)->mTags.Assign(tags);
> >+
> 
> Shouldn't you use IsVoid? most URIs don't have tags set and thus 0 length is
> valid.

Not sure what that would bring us, mTags is initialized to a void string in the result code, how can 0 length be valid to overwrite that initialization? If the db string has 0 length we don't have tags, then there is no reason to change the void initialized value in the node.
Attached patch patch v1.2Splinter Review
fixed all comments based on IRC discussion with Mano.
Attachment #403805 - Attachment is obsolete: true
http://hg.mozilla.org/mozilla-central/rev/98a42087c07b

the functionality is tested by previous existing unit tests. This is mostly performances in searching the Library or the sidebar (really minor improvements could appear in other points though).
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a1
filed Bug 520608 about getting rid of getHasXXX methods.
Comment on attachment 404577 [details] [diff] [review]
patch v1.2

This brings a 70% speed improvement in searches in sidebar and library. Would be pretty cool to have on 1.9.2

there are no compatibility changes and queries are already tested by our existing tests.
Attachment #404577 - Flags: approval1.9.2?
Attached patch patch for 1.9.2Splinter Review
this is the patch unbitrotted and ready to be pushed on 1.9.2 (with author and comment)
a192=beltzner, be ready to pull it out if it causes problems, though!
Depends on: 536081
Blocks: 491746
Blocks: 490512
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: