Closed Bug 464501 Opened 16 years ago Closed 1 month ago

Bookmark Info window and Bookmark Manager/History need to unescape/decode UTF-8 URIs

Categories

(Camino Graveyard :: Bookmarks, defect, P3)

All
macOS
defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: bugzilla-graveyard, Assigned: chris)

References

Details

Attachments

(1 file, 11 obsolete files)

Followup to bug 387312.

When bookmarking a UTF-8 link either via "Bookmark this link…" in the CM, drag and drop of the link onto the bookmark bar, or via the "Add Bookmark…" menu item, we need to unescape the UTF-8 URI, since it's displayed in the Bookmark Info window. Likewise, the Location column in the manager (for either Bookmarks or History) should display a "pretty" URI.

Stuart said in bug 387312 comment 14 that he was leaning toward only unescaping when we display, and keeping the internal representation as-is.

Hendy, if you're not going to fix this soon, feel free to assign to nobody.
One interesting thing I've discovered is that if you have unescaped URLs in history from before this patch, and you visit them again (or paste in the same URL, unescaped?), you can end up with the same URL in history (and autocomplete) twice, once escaped and once unescaped(!).  I can't seem to reproduce this with a fresh profile, though, which is good.
Summary: Bookmark Info window and Bookmark Manager need to unescape/decode UTF-8 URIs → Bookmark Info window and Bookmark Manager/History need to unescape/decode UTF-8 URIs
Blocks: 468258
We should make sure that tooltips pick up the unescaping once it's done, particularly on 10.4 where we don't get Cocoa's tooltips overriding ours.
Attached patch Fix v1.0 (obsolete) — Splinter Review
This should do it. Adds an |escapedURI| method to NSString that undoes what |unescapedURI| does. Makes sure that when a bookmark URL is changed, the escaped version makes it into the underlying storage, and the unescaped version is what the user sees.

This patch also enables filtering in bookmarks and history with non-ASCII characters. The extra |unescapedURI| calls will slow the filtering down a bit, but optimizing the filtering really belongs in another bug.
Attachment #369188 - Flags: review?
Attachment #369188 - Flags: review? → review?(nick.kreeger)
Comment on attachment 369188 [details] [diff] [review]
Fix v1.0

looks fine to me.
Attachment #369188 - Flags: review?(nick.kreeger) → review+
Attachment #369188 - Flags: superreview?(stuart.morgan+bugzilla)
Comment on attachment 369188 [details] [diff] [review]
Fix v1.0

>+NSString* const kExcludedCharacters = @" \"\';/?:@&=+$,#"; // List comes from RFC2396 and by examining Safari's behaviour

The comment should be on its own line above (and end with a period).

> - (id)outlineView:(NSOutlineView *)outlineView objectValueForTableColumn:(NSTableColumn *)tableColumn byItem:(id)item
...
>+  else if ([[tableColumn identifier] isEqualToString:@"url"])
>+    retValue = [(NSString*)[item valueForKey:[tableColumn identifier]] unescapedURI];

I'm not seeing a corresponding change to the setter, so it seems like editing the URL inline would change the escaping.

> - (id)outlineView:(NSOutlineView*)outlineView objectValueForTableColumn:(NSTableColumn*)aTableColumn byItem:(id)item
...
>+      return [[item url] unescapedURI];

Same here.


Couldn't we just do this with formatters instead? That seems like it would be less messy than making all the data sources handle what is fundamentally a display issue.
Attachment #369188 - Flags: superreview?(stuart.morgan+bugzilla) → superreview-
Attached patch Fix v2.0 (obsolete) — Splinter Review
New version of the patch using a new UnescapedURIFormatter. Right now the formatter is used for the views that Fix v1.0 affected; I can look in another bug at using it on other views.

BM/History filtering no longer has any performance hit.

Sending back for review, as this is substantially different from v1.0.
Attachment #369188 - Attachment is obsolete: true
Attachment #370821 - Flags: review?(nick.kreeger)
Comment on attachment 370821 [details] [diff] [review]
Fix v2.0

Canceling review, as I have found a few issues to do with URL edits not being escaped when put into the backend.
Attachment #370821 - Attachment is obsolete: true
Attachment #370821 - Flags: review?(nick.kreeger)
Attached patch Fix v2.1 (obsolete) — Splinter Review
OK, this is better. Most of the patch involves setting an UnescapedURIFormatter on various bookmark controls and changing some |stringValue|s to |objectValue|s so the backend is always given the escaped versions. There are still some calls to |unescapedURI|, for the cell tooltips.
Attachment #370998 - Flags: review?(nick.kreeger)
Comment on attachment 370998 [details] [diff] [review]
Fix v2.1

First, sorry for letting this slip...


>+@implementation UnescapedURIFormatter
>+
>+- (NSString*)stringForObjectValue:(id)anObject
>+{
>+  if (anObject && [anObject isKindOfClass:[NSString class]]) {
>+    return [anObject unescapedURI];
>+  } else {
>+    return @"";
>+  }
>+}
>+

Not to start a style war, but I think the Camino style guide doesn't follow the older Mozilla style:
http://wiki.caminobrowser.org/Development:Coding

  ...
  if {
  }
  else {
  }
  ...


Looks fine other than that.
r=me
Attachment #370998 - Flags: review?(nick.kreeger) → review+
Attached patch Fix v2.2 (obsolete) — Splinter Review
Thanks for the review, Nick. The bracing style has been changed.
Attachment #370998 - Attachment is obsolete: true
Attachment #374623 - Flags: superreview?(stuart.morgan+bugzilla)
Attachment #374623 - Flags: review+
Comment on attachment 374623 [details] [diff] [review]
Fix v2.2

One thing that this patch breaks is the resolving of the shortcut field. When committing edits to the location field in the Bookmark info dialog, the %s gets stored as %25s. When we then try to resolve it (from the location bar), it fails.

I'll spin a new patch with this fixed. Most probably, it will be done in BookmarkInfoController's |commitChanges:| method.
Attachment #374623 - Flags: superreview?(stuart.morgan+bugzilla)
Attachment #374623 - Flags: superreview-
Attachment #374623 - Flags: review-
Attachment #374623 - Flags: review+
Attached patch Fix v2.3 (obsolete) — Splinter Review
Right, this should do it. Augments the |setUrl| method of Bookmark to search for occurrences of %25s and restore them to %s. We're dealing with internal format stuff, so it makes sense to do this in backend code. Plus, direct editing in the bookmark manager table columns calls |setUrl| (good old KVC) directly.

Also removes a couple of superfluous casts. Apart from that the patch is the same as v2.2.
Attachment #374623 - Attachment is obsolete: true
Attachment #376032 - Flags: review?(nick.kreeger)
Comment on attachment 376032 [details] [diff] [review]
Fix v2.3

>Index: camino/src/bookmarks/Bookmark.mm
>===================================================================

> - (void)setUrl:(NSString *)aURL
> {
>   if (!aURL)
>     return;
> 
>+  // We get an escaped version of the URL to store, but that means that the
>+  // special keyword placeholder %s gets escaped too. We need to search for
>+  // any instances of %25s in the string and restore them to %s.
>+  NSMutableString* restoredURL;
>+  if ([aURL rangeOfString:@"%25s"].location != NSNotFound) {
>+    restoredURL = [[aURL mutableCopy] autorelease];
>+    [restoredURL replaceOccurrencesOfString:@"%25s"
>+                                 withString:@"%s"
>+                                    options:NSLiteralSearch
>+                                      range:NSMakeRange(0, [restoredURL length])];
>+    aURL = [NSString stringWithString:restoredURL];
>+  }
>+

One thing, does it make sense to move this escaping chunk to NSString+Utils?
r+
Attachment #376032 - Flags: review?(nick.kreeger) → review+
Attached patch Fix v2.4 (obsolete) — Splinter Review
Thanks for the review, Nick. You're right, the %s search does belong in the formatter - that way, a bookmark search will successfully find keyword bookmarks.
Attachment #376032 - Attachment is obsolete: true
Attachment #377139 - Flags: superreview?(stuart.morgan+bugzilla)
Attachment #377139 - Flags: review+
Comment on attachment 377139 [details] [diff] [review]
Fix v2.4

So... we can't do this. I hadn't thought this through all the way before, but I'm not sure we can actually handle anything that requires escaping user input like this.

As an example, paste the following into the location bar and hit enter:
http://hi.wikipedia.org/wiki/%E0%A4%AE%E0%A5%81%E0%A4%96%E0%A4%AA%E0%A5%83%E0%A4%B7%E0%A5%8D%E0%A4%A0
Now edit a bookmark (in a build with this patch) and paste that into the URL field. Visit the bookmarks. Be sad (although humorously, if you then click in the location bar and then hit enter, it will work, because we are unescaping the double-escaped URL when we display it there).

And of course we can't unescape it on the way out but not escape it on the way in, because then just editing it without making changes will change the URL, which is equally bad.

URL escaping and unescaping is unpleasant stuff :P
Attachment #377139 - Flags: superreview?(stuart.morgan+bugzilla) → superreview-
Attached patch Fix v2.5 (obsolete) — Splinter Review
OK, this should make it better. Now before setting the object value, the formatter checks if the URL is already escaped. We also need to work around the (un)escaping not playing well with the %s keyword. Shouldn't be a performance hit, since the getObjectValue method isn't called all that often, and even less often with a 'keyworded' URI.

Also, Stuart's test URL above revealed another buggy bit of behaviour, so the same method now strips control characters and whitespace before escaping.
Attachment #377139 - Attachment is obsolete: true
Attachment #381991 - Flags: superreview?(stuart.morgan+bugzilla)
Attachment #381991 - Attachment is patch: true
Attachment #381991 - Attachment mime type: application/octet-stream → text/plain
Attached patch Fix 2.6 (obsolete) — Splinter Review
Turns out being able to put spaces in the bookmarks search field is a good thing.
Attachment #381991 - Attachment is obsolete: true
Attachment #382003 - Flags: superreview?(stuart.morgan+bugzilla)
Attachment #381991 - Flags: superreview?(stuart.morgan+bugzilla)
Attachment #382003 - Flags: superreview?(stuart.morgan+bugzilla) → superreview-
Comment on attachment 382003 [details] [diff] [review]
Fix 2.6

>+// Returns true if the string represents an escaped URI
>+- (BOOL)isEscapedURI;

I'm really not sure it's possible to write such a function (like I said, it's unpleasant stuff).

>+  return ![self isEqualToString:[self unescapedURI]];

So http://foo.bar.com/q=%20 is escaped. Are you sure? If I searched for ' ' then it is, but how do you know I didn't search for '%20', with %2520 as the escaped URL?

Yes, that's a contrived example, but any (unescaped) URL containing %[A-F0-9]{2}, which is perfectly valid, will give the wrong answer.
Attached patch Fix v2.7 (obsolete) — Splinter Review
The test in |isEscapedURI| says that http://foo.bar.com/q=%20 is unescaped. This test is only utilised by the |getObjectValue| method, which itself is only utilised when taking string values of user-edited URI fields and putting them in bookmarks.plist. In this context, spaces would get escaped as %20, and %20 would get escaped as %2520. The user will typically edit the URI to make custom search queries; the URLs would get escaped properly in this context. While the test is suitable for this method, it is not appropriate everywhere; in this patch I have merged it into |getObjectValue|. This patch now plays well with URLs like that in Comment 15.

I have also simplified how %s is dealt with, by adding the |stringByReplacingOccurancesOfString:withString:| category method; this method is available in 10.5, but we still need the category for our 10.4 target.

The UnescapedURIFormatter has been removed from the bookmarks search field, because it prevented non-ASCII characters from being found in the non-URI fields (eg: title). The easiest way to search through all fields is to pass the search term from the field and only escape it when comparing it to the item's url property.
Attachment #382003 - Attachment is obsolete: true
Attachment #384320 - Flags: review?(stuart.morgan+bugzilla)
Comment on attachment 384320 [details] [diff] [review]
Fix v2.7

>+- (NSString *)stringByReplacingOccurancesOfString:(NSString *)aTarget withString:(NSString *)aReplacement;

This is spelled incorrectly, but even if it weren't shadowing a standard method with our own is a very dangerous game. This needs a different name (maybe stringByReplacingSubstring:withString:?) and a comment saying that it should be removed when we stop supporting 10.4.

>+// List comes from RFC2396 and by examining Safari's behaviour.
>+NSString* const kExcludedCharacters = @" \"\';/?:@&=+$,#";

Should be static, and needs a more descriptive name since it's now file-scoped.

>+    NSMutableString* mutableCopy = [self mutableCopy];
...
>+    return [mutableCopy autorelease];

Camino style is to prefer autoreleasing at creation time, since it's less error-prone.

>+  // Our special keyword placeholder %s disrupts the unescaping process. We need
>+  // to replace any instances of %s in the string before unescaping.

Since this is magic bookmark-specific handling, perhaps these methods should be named escapedBookmarkURI and unescapedBookmarkURI. If you need the generic versions too, you can leave them (but without the %s handling) and have the bookmark versions call through to them.

>+  if (anObject && [anObject isKindOfClass:[NSString class]]) {
>+    return [anObject unescapedURI];
>+  }
>+  else {
>+    return @"";
>+  }

No else after a conditional return.

>+  // Only escape the URI if it is unescaped.
>+  *anObject = [aString isEqualToString:[aString unescapedURI]] ? [aString escapedURI] : aString;

I still don't understand how this is safe, even for this usage. I apologize for the bad example I gave before; this stuff makes my head hurt. Here's a better one:
1) I paste in an escaped URL: ...foo%2525bar...
2) This logic says "that seems to be escaped already" and leaves it alone
3) When it's displayed in a text field later, the formater the other way unescapes it, showing me: ...foo%25bar...
4) When that text field is committed, the logic still believes it's already escaped, and leaves it alone.
5) The internally stored bookmark is now ...foo%25bar..., which is *not* the same URL. It's destroyed my bookmark.

It's not just an imaginary concern, since I've seen a number of URLs that are intended to do intercept the user to do something (e.g., log in) and then redirect them to the page they were trying to go to--which is a doubly-encoded parameter in the URL.

> - (BOOL)matchesString:(NSString*)searchString inFieldWithTag:(int)tag
> {
>   switch (tag) {
>     case eBookmarksSearchFieldAll:
>-      return (([[self url] rangeOfString:searchString options:NSCaseInsensitiveSearch].location != NSNotFound) ||
>+      return (([[self url] rangeOfString:[searchString escapedURI] options:NSCaseInsensitiveSearch].location != NSNotFound) ||
>               [super matchesString:searchString inFieldWithTag:tag]);
> 
>     case eBookmarksSearchFieldURL:
>-      return ([[self url] rangeOfString:searchString options:NSCaseInsensitiveSearch].location != NSNotFound);
>+      return ([[self url] rangeOfString:[searchString escapedURI] options:NSCaseInsensitiveSearch].location != NSNotFound);

This step should be done at the boundary where it is coming from the user (when the search is kicked of from the search field), not internally (where we might call it with a string that didn't come from the user) and once per bookmark.
Attachment #384320 - Flags: review?(stuart.morgan+bugzilla) → review-
Thanks for the comments, Stuart. I think we're getting closer all the time.

Maybe the best way forward at this time is to remove the conditional unescaping and have escaped URLs like that in comment 15 stay that way. This is Safari's current behaviour, and if the user doesn't like it he can always cut and paste again from the location bar. There shouldn't be too many sources of escaped URLs anyway.
Attached patch Fix v2.8 (obsolete) — Splinter Review
>This is spelled incorrectly, but even if it weren't shadowing a standard method
>with our own is a very dangerous game. This needs a different name (maybe
>stringByReplacingSubstring:withString:?) and a comment saying that it should be
>removed when we stop supporting 10.4.

Done.

>Should be static, and needs a more descriptive name since it's now file-scoped.

Done.

>Camino style is to prefer autoreleasing at creation time, since it's less
>error-prone.

Done in |stringByReplacingSubstring:withString:|.

>Since this is magic bookmark-specific handling, perhaps these methods should be
>named escapedBookmarkURI and unescapedBookmarkURI. If you need the generic
>versions too, you can leave them (but without the %s handling) and have the
>bookmark versions call through to them.

Done.

>No else after a conditional return.

Done.

>I still don't understand how this is safe, even for this usage.

I think I've finally found a foolproof solution, by using NSURL. NSURL will not allow any unescaped strings to init it. I'd say this is what Safari uses, too.

>This step should be done at the boundary where it is coming from the user (when
>the search is kicked of from the search field), not internally (where we might
>call it with a string that didn't come from the user) and once per bookmark.

I have modified the relevant methods in HistoryDataSource and BookmarkViewController to perform an extra search for the URL field (if necessary).
Attachment #384320 - Attachment is obsolete: true
Attachment #387603 - Flags: review?(stuart.morgan+bugzilla)
Attached patch Fix v2.9 (obsolete) — Splinter Review
As was quickly pointed out on IRC, NSURL doesn't play well with non-standard schemes. Since we don't want to escape javascript: and data: URIs, an extra check needs to be made in UnescapedURIFormatter's |getObjectValue:forString:errorDescription:| method. The formatter should now be able to handle any URL we throw at it.
Attachment #387603 - Attachment is obsolete: true
Attachment #387606 - Flags: review?(stuart.morgan+bugzilla)
Attachment #387603 - Flags: review?(stuart.morgan+bugzilla)
Attached patch Fix v2.10Splinter Review
Sorry about this. Smokey informed me about URLs like http://日本語.jp/. Just needed to tweak the bookmarks search code a bit (we now do both searches when the tag is All or URL).
Attachment #387606 - Attachment is obsolete: true
Attachment #387764 - Flags: review?(stuart.morgan+bugzilla)
Attachment #387606 - Flags: review?(stuart.morgan+bugzilla)
Attachment #387764 - Flags: review?(stuart.morgan+bugzilla)
Comment on attachment 387764 [details] [diff] [review]
Fix v2.10

I keep running into issues, so I'm holding off a review for now. In a few days I should have most stuff ironed out.
This either needs to make b4 or fall to 2.1, due to the complexity of code changes.

(I believe I've been told other unescaping bugs beyond those on the blocks list are going to use some of the methods added here, but I can't recall.  It would be useful if we set them.)
Flags: camino2.0b4?
Per conversation with hendy, we're going to put this on hold for 2.0.
Flags: camino2.0b4? → camino2.0b4-
These are P3 on the 2.1 list, but I think they're at the very bottom of hendy's to-do list.
Priority: -- → P3
Target Milestone: --- → Camino2.1
Pushing all the UTF-8 stuff out, since it's somewhat risky for this stage in the process.
Target Milestone: Camino2.1 → ---
Status: NEW → RESOLVED
Closed: 1 month ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: