searching in places queries does not decode urls

RESOLVED FIXED in Firefox 3 beta4

Status

()

Firefox
Bookmarks & History
P2
normal
RESOLVED FIXED
10 years ago
7 years ago

People

(Reporter: dietrich, Assigned: Mardak)

Tracking

Trunk
Firefox 3 beta4
Points:
---
Dependency tree / graph
Bug Flags:
blocking-firefox3 +
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [fixes bug 419366, bug 419766])

Attachments

(1 attachment, 8 obsolete attachments)

(Reporter)

Description

10 years ago
Places Library version of bug 407974.
Flags: blocking-firefox3?
(Reporter)

Comment 1

10 years ago
morphing to describe the actual fix, and because it'll fix all searches using the places query system, not just the Library.
Priority: -- → P1
Summary: search in places library does not decode urls → searching in places queries does not decode urls

Comment 2

10 years ago
If we do that at the query level, let's make sure that in the Library "copy" will still copy an encoded URL. Just like the URL bar (show decoded but copy encoded).

Decoded URLs are not properly handled in any software that involve analyzing text to detect URLs (IRC Clients, Bugzilla...)
Flags: blocking-firefox3? → blocking-firefox3+

Updated

10 years ago
Assignee: nobody → sdwilsh
Whiteboard: [needs patch]
(Assignee)

Comment 3

10 years ago
Created attachment 305229 [details] [diff] [review]
v1

This fixes P1 bug 415460 and P3 bug 415397 for new URLs that the user visits or bookmarks.

With this patch, you can search for "ぽ" in the Library and find urls that match. The locations listed in the Library are also correctly unencoded. (However, the editBookmarks dialog at the bottom still shows encoded.)

URLs can still be searched and are decoded in the location bar autocomplete.

For new URLs that are visited, they can be consistently deleted (for bug 415397). Before, the user would visit say.. http://site/%40 and later select the url from the autocomplete and visit http://site/@ causing 2 entries. With the patch, the DB will always keep track of http://site/@.
Assignee: sdwilsh → edilee
Status: NEW → ASSIGNED
Attachment #305229 - Flags: review?(dietrich)
(Assignee)

Comment 4

10 years ago
FYI, still passes Erwan's unit test that landed for bug 407973 (originally for bug 413784).
ug - I was working on this bug right now.  I'll attach a test I wrote...

Updated

10 years ago
Attachment #305233 - Flags: review?(dietrich)
(Reporter)

Comment 7

10 years ago
Comment on attachment 305229 [details] [diff] [review]
v1

shawn, can you do first-r please?
Attachment #305229 - Flags: review?(dietrich) → review?(sdwilsh)
Comment on attachment 305229 [details] [diff] [review]
v1

>+  nsCAutoString charset;
>+  rv = aURI->GetOriginCharset(charset);
>+  NS_ENSURE_SUCCESS(rv, rv);
This could be empty - in which case it's assumed to be UTF-8

>+  nsAutoString uriSpec;
>+  (void)textToSubURI->UnEscapeURIForUI(charset, utf8URISpec, uriSpec);
So, I don't particularly like this because of
* escaping back the result (unescaped string) is not guaranteed to give the original escaped string
I don't think that that is how we want to store the information at least.
I think that calling unEscapeNonAsciiURI and maybe UnEscapeAndConvert (not necessarily in that order) would be better here.

>+  rv = statement->BindStringParameter(index,
>+    StringHead(uriSpec, HISTORY_URI_LENGTH_MAX));
I know you didn't change this, but I don't understand why we need StringHead...

I do believe that if we are changing how we store information in the database that we also need to migrate existing data.  This is an interesting use case because technically we aren't changing any of the schema...

Also a test showing that this works with a database that already has this stuff in it would be good.
Attachment #305229 - Flags: review?(sdwilsh) → review-
(Assignee)

Comment 9

10 years ago
(In reply to comment #8)
> (From update of attachment 305229 [details] [diff] [review])
> >+  nsCAutoString charset;
> >+  rv = aURI->GetOriginCharset(charset);
> >+  NS_ENSURE_SUCCESS(rv, rv);
> This could be empty - in which case it's assumed to be UTF-8
> 
> >+  nsAutoString uriSpec;
> >+  (void)textToSubURI->UnEscapeURIForUI(charset, utf8URISpec, uriSpec);
> So, I don't particularly like this because of
> * escaping back the result (unescaped string) is not guaranteed to give the
> original escaped string
> I don't think that that is how we want to store the information at least.
> I think that calling unEscapeNonAsciiURI and maybe UnEscapeAndConvert (not
> necessarily in that order) would be better here.
> 
> >+  rv = statement->BindStringParameter(index,
> >+    StringHead(uriSpec, HISTORY_URI_LENGTH_MAX));
> I know you didn't change this, but I don't understand why we need StringHead...
> 
> I do believe that if we are changing how we store information in the database
> that we also need to migrate existing data.  This is an interesting use case
> because technically we aren't changing any of the schema...
> 
> Also a test showing that this works with a database that already has this stuff
> in it would be good.
> 

Assignee: edilee → sdwilsh
Status: ASSIGNED → NEW
(Assignee)

Comment 10

10 years ago
You can continue working on it Shawn. I'll just keep unbitrotting stuff.
Feel free to take it - the scope is larger than I originally intended, and I don't know if I have the cycles for it before b4.
(Assignee)

Comment 12

10 years ago
Created attachment 305310 [details] [diff] [review]
v1.1

(In reply to comment #8)
> >+  rv = aURI->GetOriginCharset(charset);
> This could be empty - in which case it's assumed to be UTF-8
Ok. Defaulted to "UTF-8" if empty.

> >+  (void)textToSubURI->UnEscapeURIForUI(charset, utf8URISpec, uriSpec);
> * escaping back the result (unescaped string) is not guaranteed to give the
> original escaped string
E.g., escape(unescape("hi there")) -> hi%20there
But that's fine because we're showing this to the user which s/he could have typed in without us unescaping.

> >+    StringHead(uriSpec, HISTORY_URI_LENGTH_MAX));
See bug 319004.

> I do believe that if we are changing how we store information in the database
> that we also need to migrate existing data.
Added migration code and migration testcase. This should also fix bug 391691.
Assignee: sdwilsh → edilee
Attachment #305229 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #305310 - Flags: review?(sdwilsh)
(Assignee)

Updated

10 years ago
Blocks: 419324

Updated

10 years ago
Priority: P1 → P2
Whiteboard: [needs patch] → [needs review sdwilsh/dietrich]
Comment on attachment 305310 [details] [diff] [review]
v1.1

>+  for (PRInt32 i = urisToUnescape.Count(); --i >= 0; ) {
ug that's a confusing loop

>+    rv = fixURI->BindUTF8StringParameter(0, *urisToUnescape[i]);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+
>+    (void)textToSubURI->UnEscapeURIForUI(charset, *urisToUnescape[i], uriSpec);
>+
>+    rv = fixURI->BindStringParameter(1, uriSpec);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+
>+    rv = fixURI->Execute();
>+    NS_ENSURE_SUCCESS(rv, rv);
I think it'd probably be best here to do
if (NS_FAILED(rv)) continue;

r=sdwilsh for what it's worth
Attachment #305310 - Flags: review?(sdwilsh)
Attachment #305310 - Flags: review?(dietrich)
Attachment #305310 - Flags: review+

Updated

10 years ago
Attachment #305233 - Attachment is obsolete: true
Attachment #305233 - Flags: review?(dietrich)

Updated

10 years ago
Whiteboard: [needs review sdwilsh/dietrich] → [has patch][needs review dietrich]
(Assignee)

Comment 14

10 years ago
Created attachment 305418 [details] [diff] [review]
v1.2

(In reply to comment #13)
> r=sdwilsh for what it's worth
> 
> >+    rv = fixURI->BindUTF8StringParameter(0, *urisToUnescape[i]);
> >+    NS_ENSURE_SUCCESS(rv, rv);
> >+
> >+    (void)textToSubURI->UnEscapeURIForUI(charset, *urisToUnescape[i], uriSpec);
> >+
> >+    rv = fixURI->BindStringParameter(1, uriSpec);
> >+    NS_ENSURE_SUCCESS(rv, rv);
> >+
> >+    rv = fixURI->Execute();
> >+    NS_ENSURE_SUCCESS(rv, rv);
> I think it'd probably be best here to do
> if (NS_FAILED(rv)) continue;
Switched both loops to use NS_SUCCEEDED then append string or NS_SUCCEEDED for both binding and then execute.

Additionally, changed the BindStatementURI to if (NS_FAILED() || empty) use UTF8.
Attachment #305310 - Attachment is obsolete: true
Attachment #305418 - Flags: review?(dietrich)
Attachment #305310 - Flags: review?(dietrich)
(Assignee)

Comment 15

10 years ago
Comment on attachment 305418 [details] [diff] [review]
v1.2

>+  while (NS_SUCCEEDED(getURIs->ExecuteStep(&hasMore)) && hasMore) {
>+    if (NS_SUCCEEDED(getURIs->GetUTF8String(0, utf8URISpec)))
>+      urisToUnescape.AppendCString(utf8URISpec);
>+  }
Oh, I got rid of the braces for the while {}.
(Assignee)

Updated

10 years ago
Blocks: 391691, 415397
Whiteboard: [has patch][needs review dietrich] → [has patch][needs review dietrich][fixes bug 415397, bug 391691]
(Assignee)

Comment 16

10 years ago
Created attachment 305440 [details] [diff] [review]
v1.3

Added unescaping for live/incremental bookmark updates and updated the testcase to check for it. See bug 419324 commment 3.
Attachment #305418 - Attachment is obsolete: true
Attachment #305440 - Flags: review?(dietrich)
Attachment #305418 - Flags: review?(dietrich)
(Assignee)

Comment 17

10 years ago
Comment on attachment 305440 [details] [diff] [review]
v1.3

Mmm.. I'm not quite liking the patch anymore. There's several consumers that are expecting strings to be passed around as nsIURI.spec strings or just nsIURIs. E.g., nsNavHistory::IsVisited(nsIURI *aURI, PRBool *_retval).

We should try to keep that invariant. Strings are passed around as nsIURI.spec nsACString UTF8 Strings. Only when they're displayed show them with UnEscapeURIForUI.
Attachment #305440 - Flags: review?(dietrich) → review-
Any reason we shouldn't just do it in the view?
(Assignee)

Comment 19

10 years ago
Created attachment 305699 [details] [diff] [review]
v2

This also bug 419366. Basically, I do the same thing I did for autocomplete multi-word out-of-order search.
Attachment #305440 - Attachment is obsolete: true
Attachment #305699 - Flags: review?(dietrich)
(Assignee)

Updated

10 years ago
No longer blocks: 391691, 415397, 419324
Whiteboard: [has patch][needs review dietrich][fixes bug 415397, bug 391691] → [has patch][needs review dietrich]
(Assignee)

Updated

10 years ago
Blocks: 419366
Whiteboard: [has patch][needs review dietrich] → [has patch][needs review dietrich][fixes bug 419366]
(Assignee)

Comment 20

10 years ago
Created attachment 305702 [details] [diff] [review]
v2.1

Readded shawn's testcase.
Attachment #305699 - Attachment is obsolete: true
Attachment #305702 - Flags: review?(dietrich)
Attachment #305699 - Flags: review?(dietrich)
(Reporter)

Comment 21

10 years ago
Comment on attachment 305702 [details] [diff] [review]
v2.1


>-
>-      if (!allTermsFound && !URIHasAnyTagFromTerms(aSet[nodeIndex]->mURI, tagTerms))
>-          continue;
>+      // Load up the title, url, tags for the current node as UTF16 strings
>+      NS_ConvertUTF8toUTF16 nodeTitle(aSet[nodeIndex]->mTitle);
>+      // Unescape the URL for search term matching
>+      nsCAutoString cNodeURL(aSet[nodeIndex]->mURI);
>+      NS_ConvertUTF8toUTF16 nodeURL(NS_UnescapeURL(cNodeURL));
>+      // Fetch the tags
>+      nsAutoString nodeTags;
>+      rv = aSet[nodeIndex]->GetTags(nodeTags);

no conversion needed?

also, iirc the direct db access in URIHasTag was originally due to poor performance of the tag service (not to mention level xpconnect traversals), and i believe GetTags() does exactly that.

please test, and if this is still a problem, maybe we should instead have GetTags() use URIHasTag()?

>+      NS_ENSURE_SUCCESS(rv, rv);
>+
>+      // Determine if every search term matches anywhere in the title, url, tag
>+      PRBool matchAll = PR_TRUE;
>+      for (PRInt32 termIndex = terms[queryIndex]->Count(); --termIndex >= 0 &&
>+        matchAll; ) {

nit: weird indent, please line up with parens contents
Attachment #305702 - Flags: review?(dietrich)
(Assignee)

Comment 22

10 years ago
Created attachment 305830 [details] [diff] [review]
v2.2

(In reply to comment #21)
> also, iirc the direct db access in URIHasTag was originally due to poor
> performance of the tag service (not to mention level xpconnect traversals), and
> i believe GetTags() does exactly that.
Yeah, turns out it was pretty slow.. 100+ms instead of the 10-15ms from before this patch. (I timed the start to end time of FilterResultSet.)

So instead, I use a separate query to get the tags strings back to do out-of-order multi-word matching. Now it's going at 9-12ms for queries.

> >+      for (PRInt32 termIndex = terms[queryIndex]->Count(); --termIndex >= 0 &&
> >+        matchAll; ) {
> nit: weird indent, please line up with parens contents
Shifted to under PRInt32.
Attachment #305702 - Attachment is obsolete: true
Attachment #305830 - Flags: review?(dietrich)
(Assignee)

Updated

10 years ago
Blocks: 419766
(Assignee)

Comment 23

10 years ago
Created attachment 305934 [details] [diff] [review]
v2.3

Unbitrot changes from bug 385245.
Attachment #305830 - Attachment is obsolete: true
Attachment #305934 - Flags: review?(dietrich)
Attachment #305830 - Flags: review?(dietrich)
(Reporter)

Comment 24

10 years ago
Comment on attachment 305934 [details] [diff] [review]
v2.3

thanks, r=me. can you please file a followup to fix nsNavHistoryResultNode.tags to use this?
Attachment #305934 - Flags: review?(dietrich) → review+
(Assignee)

Comment 25

10 years ago
Checking in toolkit/components/places/src/nsNavHistory.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistory.cpp,v  <--  nsNavHistory.cpp
new revision: 1.264; previous revision: 1.263
done
Checking in toolkit/components/places/src/nsNavHistory.h;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistory.h,v  <--  nsNavHistory.h
new revision: 1.145; previous revision: 1.144
done
RCS file: /cvsroot/mozilla/toolkit/components/places/tests/unit/test_415460.js,v
done
Checking in toolkit/components/places/tests/unit/test_415460.js;
/cvsroot/mozilla/toolkit/components/places/tests/unit/test_415460.js,v  <--  test_415460.js
initial revision: 1.1
done
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [has patch][needs review dietrich][fixes bug 419366] → [fixes bug 419366]
(Assignee)

Updated

10 years ago
Blocks: 419792
Whiteboard: [fixes bug 419366] → [fixes bug 419366, bug 419766]

Updated

10 years ago
Depends on: 425993
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks
You need to log in before you can comment on or make changes to this bug.