Closed Bug 415460 Opened 13 years ago Closed 13 years ago

searching in places queries does not decode urls

Categories

(Firefox :: Bookmarks & History, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Firefox 3 beta4

People

(Reporter: dietrich, Assigned: Mardak)

References

Details

(Whiteboard: [fixes bug 419366, bug 419766])

Attachments

(1 file, 8 obsolete files)

Places Library version of bug 407974.
Flags: blocking-firefox3?
morphing to describe the actual fix, and because it'll fix all searches using the places query system, not just the Library.
Priority: -- → P1
Summary: search in places library does not decode urls → searching in places queries does not decode urls
If we do that at the query level, let's make sure that in the Library "copy" will still copy an encoded URL. Just like the URL bar (show decoded but copy encoded).

Decoded URLs are not properly handled in any software that involve analyzing text to detect URLs (IRC Clients, Bugzilla...)
Flags: blocking-firefox3? → blocking-firefox3+
Assignee: nobody → sdwilsh
Whiteboard: [needs patch]
Attached patch v1 (obsolete) — Splinter Review
This fixes P1 bug 415460 and P3 bug 415397 for new URLs that the user visits or bookmarks.

With this patch, you can search for "ぽ" in the Library and find urls that match. The locations listed in the Library are also correctly unencoded. (However, the editBookmarks dialog at the bottom still shows encoded.)

URLs can still be searched and are decoded in the location bar autocomplete.

For new URLs that are visited, they can be consistently deleted (for bug 415397). Before, the user would visit say.. http://site/%40 and later select the url from the autocomplete and visit http://site/@ causing 2 entries. With the patch, the DB will always keep track of http://site/@.
Assignee: sdwilsh → edilee
Status: NEW → ASSIGNED
Attachment #305229 - Flags: review?(dietrich)
FYI, still passes Erwan's unit test that landed for bug 407973 (originally for bug 413784).
ug - I was working on this bug right now.  I'll attach a test I wrote...
Attached patch test (obsolete) — Splinter Review
Attachment #305233 - Flags: review?(dietrich)
Comment on attachment 305229 [details] [diff] [review]
v1

shawn, can you do first-r please?
Attachment #305229 - Flags: review?(dietrich) → review?(sdwilsh)
Comment on attachment 305229 [details] [diff] [review]
v1

>+  nsCAutoString charset;
>+  rv = aURI->GetOriginCharset(charset);
>+  NS_ENSURE_SUCCESS(rv, rv);
This could be empty - in which case it's assumed to be UTF-8

>+  nsAutoString uriSpec;
>+  (void)textToSubURI->UnEscapeURIForUI(charset, utf8URISpec, uriSpec);
So, I don't particularly like this because of
* escaping back the result (unescaped string) is not guaranteed to give the original escaped string
I don't think that that is how we want to store the information at least.
I think that calling unEscapeNonAsciiURI and maybe UnEscapeAndConvert (not necessarily in that order) would be better here.

>+  rv = statement->BindStringParameter(index,
>+    StringHead(uriSpec, HISTORY_URI_LENGTH_MAX));
I know you didn't change this, but I don't understand why we need StringHead...

I do believe that if we are changing how we store information in the database that we also need to migrate existing data.  This is an interesting use case because technically we aren't changing any of the schema...

Also a test showing that this works with a database that already has this stuff in it would be good.
Attachment #305229 - Flags: review?(sdwilsh) → review-
(In reply to comment #8)
> (From update of attachment 305229 [details] [diff] [review])
> >+  nsCAutoString charset;
> >+  rv = aURI->GetOriginCharset(charset);
> >+  NS_ENSURE_SUCCESS(rv, rv);
> This could be empty - in which case it's assumed to be UTF-8
> 
> >+  nsAutoString uriSpec;
> >+  (void)textToSubURI->UnEscapeURIForUI(charset, utf8URISpec, uriSpec);
> So, I don't particularly like this because of
> * escaping back the result (unescaped string) is not guaranteed to give the
> original escaped string
> I don't think that that is how we want to store the information at least.
> I think that calling unEscapeNonAsciiURI and maybe UnEscapeAndConvert (not
> necessarily in that order) would be better here.
> 
> >+  rv = statement->BindStringParameter(index,
> >+    StringHead(uriSpec, HISTORY_URI_LENGTH_MAX));
> I know you didn't change this, but I don't understand why we need StringHead...
> 
> I do believe that if we are changing how we store information in the database
> that we also need to migrate existing data.  This is an interesting use case
> because technically we aren't changing any of the schema...
> 
> Also a test showing that this works with a database that already has this stuff
> in it would be good.
> 

Assignee: edilee → sdwilsh
Status: ASSIGNED → NEW
You can continue working on it Shawn. I'll just keep unbitrotting stuff.
Feel free to take it - the scope is larger than I originally intended, and I don't know if I have the cycles for it before b4.
Attached patch v1.1 (obsolete) — Splinter Review
(In reply to comment #8)
> >+  rv = aURI->GetOriginCharset(charset);
> This could be empty - in which case it's assumed to be UTF-8
Ok. Defaulted to "UTF-8" if empty.

> >+  (void)textToSubURI->UnEscapeURIForUI(charset, utf8URISpec, uriSpec);
> * escaping back the result (unescaped string) is not guaranteed to give the
> original escaped string
E.g., escape(unescape("hi there")) -> hi%20there
But that's fine because we're showing this to the user which s/he could have typed in without us unescaping.

> >+    StringHead(uriSpec, HISTORY_URI_LENGTH_MAX));
See bug 319004.

> I do believe that if we are changing how we store information in the database
> that we also need to migrate existing data.
Added migration code and migration testcase. This should also fix bug 391691.
Assignee: sdwilsh → edilee
Attachment #305229 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #305310 - Flags: review?(sdwilsh)
Blocks: 419324
Priority: P1 → P2
Whiteboard: [needs patch] → [needs review sdwilsh/dietrich]
Comment on attachment 305310 [details] [diff] [review]
v1.1

>+  for (PRInt32 i = urisToUnescape.Count(); --i >= 0; ) {
ug that's a confusing loop

>+    rv = fixURI->BindUTF8StringParameter(0, *urisToUnescape[i]);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+
>+    (void)textToSubURI->UnEscapeURIForUI(charset, *urisToUnescape[i], uriSpec);
>+
>+    rv = fixURI->BindStringParameter(1, uriSpec);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+
>+    rv = fixURI->Execute();
>+    NS_ENSURE_SUCCESS(rv, rv);
I think it'd probably be best here to do
if (NS_FAILED(rv)) continue;

r=sdwilsh for what it's worth
Attachment #305310 - Flags: review?(sdwilsh)
Attachment #305310 - Flags: review?(dietrich)
Attachment #305310 - Flags: review+
Attachment #305233 - Attachment is obsolete: true
Attachment #305233 - Flags: review?(dietrich)
Whiteboard: [needs review sdwilsh/dietrich] → [has patch][needs review dietrich]
Attached patch v1.2 (obsolete) — Splinter Review
(In reply to comment #13)
> r=sdwilsh for what it's worth
> 
> >+    rv = fixURI->BindUTF8StringParameter(0, *urisToUnescape[i]);
> >+    NS_ENSURE_SUCCESS(rv, rv);
> >+
> >+    (void)textToSubURI->UnEscapeURIForUI(charset, *urisToUnescape[i], uriSpec);
> >+
> >+    rv = fixURI->BindStringParameter(1, uriSpec);
> >+    NS_ENSURE_SUCCESS(rv, rv);
> >+
> >+    rv = fixURI->Execute();
> >+    NS_ENSURE_SUCCESS(rv, rv);
> I think it'd probably be best here to do
> if (NS_FAILED(rv)) continue;
Switched both loops to use NS_SUCCEEDED then append string or NS_SUCCEEDED for both binding and then execute.

Additionally, changed the BindStatementURI to if (NS_FAILED() || empty) use UTF8.
Attachment #305310 - Attachment is obsolete: true
Attachment #305418 - Flags: review?(dietrich)
Attachment #305310 - Flags: review?(dietrich)
Comment on attachment 305418 [details] [diff] [review]
v1.2

>+  while (NS_SUCCEEDED(getURIs->ExecuteStep(&hasMore)) && hasMore) {
>+    if (NS_SUCCEEDED(getURIs->GetUTF8String(0, utf8URISpec)))
>+      urisToUnescape.AppendCString(utf8URISpec);
>+  }
Oh, I got rid of the braces for the while {}.
Blocks: 391691, 415397
Whiteboard: [has patch][needs review dietrich] → [has patch][needs review dietrich][fixes bug 415397, bug 391691]
Attached patch v1.3 (obsolete) — Splinter Review
Added unescaping for live/incremental bookmark updates and updated the testcase to check for it. See bug 419324 commment 3.
Attachment #305418 - Attachment is obsolete: true
Attachment #305440 - Flags: review?(dietrich)
Attachment #305418 - Flags: review?(dietrich)
Comment on attachment 305440 [details] [diff] [review]
v1.3

Mmm.. I'm not quite liking the patch anymore. There's several consumers that are expecting strings to be passed around as nsIURI.spec strings or just nsIURIs. E.g., nsNavHistory::IsVisited(nsIURI *aURI, PRBool *_retval).

We should try to keep that invariant. Strings are passed around as nsIURI.spec nsACString UTF8 Strings. Only when they're displayed show them with UnEscapeURIForUI.
Attachment #305440 - Flags: review?(dietrich) → review-
Any reason we shouldn't just do it in the view?
Attached patch v2 (obsolete) — Splinter Review
This also bug 419366. Basically, I do the same thing I did for autocomplete multi-word out-of-order search.
Attachment #305440 - Attachment is obsolete: true
Attachment #305699 - Flags: review?(dietrich)
No longer blocks: 391691, 415397, 419324
Whiteboard: [has patch][needs review dietrich][fixes bug 415397, bug 391691] → [has patch][needs review dietrich]
Blocks: 419366
Whiteboard: [has patch][needs review dietrich] → [has patch][needs review dietrich][fixes bug 419366]
Attached patch v2.1 (obsolete) — Splinter Review
Readded shawn's testcase.
Attachment #305699 - Attachment is obsolete: true
Attachment #305702 - Flags: review?(dietrich)
Attachment #305699 - Flags: review?(dietrich)
Comment on attachment 305702 [details] [diff] [review]
v2.1


>-
>-      if (!allTermsFound && !URIHasAnyTagFromTerms(aSet[nodeIndex]->mURI, tagTerms))
>-          continue;
>+      // Load up the title, url, tags for the current node as UTF16 strings
>+      NS_ConvertUTF8toUTF16 nodeTitle(aSet[nodeIndex]->mTitle);
>+      // Unescape the URL for search term matching
>+      nsCAutoString cNodeURL(aSet[nodeIndex]->mURI);
>+      NS_ConvertUTF8toUTF16 nodeURL(NS_UnescapeURL(cNodeURL));
>+      // Fetch the tags
>+      nsAutoString nodeTags;
>+      rv = aSet[nodeIndex]->GetTags(nodeTags);

no conversion needed?

also, iirc the direct db access in URIHasTag was originally due to poor performance of the tag service (not to mention level xpconnect traversals), and i believe GetTags() does exactly that.

please test, and if this is still a problem, maybe we should instead have GetTags() use URIHasTag()?

>+      NS_ENSURE_SUCCESS(rv, rv);
>+
>+      // Determine if every search term matches anywhere in the title, url, tag
>+      PRBool matchAll = PR_TRUE;
>+      for (PRInt32 termIndex = terms[queryIndex]->Count(); --termIndex >= 0 &&
>+        matchAll; ) {

nit: weird indent, please line up with parens contents
Attachment #305702 - Flags: review?(dietrich)
Attached patch v2.2 (obsolete) — Splinter Review
(In reply to comment #21)
> also, iirc the direct db access in URIHasTag was originally due to poor
> performance of the tag service (not to mention level xpconnect traversals), and
> i believe GetTags() does exactly that.
Yeah, turns out it was pretty slow.. 100+ms instead of the 10-15ms from before this patch. (I timed the start to end time of FilterResultSet.)

So instead, I use a separate query to get the tags strings back to do out-of-order multi-word matching. Now it's going at 9-12ms for queries.

> >+      for (PRInt32 termIndex = terms[queryIndex]->Count(); --termIndex >= 0 &&
> >+        matchAll; ) {
> nit: weird indent, please line up with parens contents
Shifted to under PRInt32.
Attachment #305702 - Attachment is obsolete: true
Attachment #305830 - Flags: review?(dietrich)
Blocks: 419766
Attached patch v2.3Splinter Review
Unbitrot changes from bug 385245.
Attachment #305830 - Attachment is obsolete: true
Attachment #305934 - Flags: review?(dietrich)
Attachment #305830 - Flags: review?(dietrich)
Comment on attachment 305934 [details] [diff] [review]
v2.3

thanks, r=me. can you please file a followup to fix nsNavHistoryResultNode.tags to use this?
Attachment #305934 - Flags: review?(dietrich) → review+
Checking in toolkit/components/places/src/nsNavHistory.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistory.cpp,v  <--  nsNavHistory.cpp
new revision: 1.264; previous revision: 1.263
done
Checking in toolkit/components/places/src/nsNavHistory.h;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistory.h,v  <--  nsNavHistory.h
new revision: 1.145; previous revision: 1.144
done
RCS file: /cvsroot/mozilla/toolkit/components/places/tests/unit/test_415460.js,v
done
Checking in toolkit/components/places/tests/unit/test_415460.js;
/cvsroot/mozilla/toolkit/components/places/tests/unit/test_415460.js,v  <--  test_415460.js
initial revision: 1.1
done
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [has patch][needs review dietrich][fixes bug 419366] → [fixes bug 419366]
Blocks: 419792
Whiteboard: [fixes bug 419366] → [fixes bug 419366, bug 419766]
Depends on: 425993
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks
You need to log in before you can comment on or make changes to this bug.