Closed Bug 387312 Opened 17 years ago Closed 16 years ago

Decode/unescape UTF-8 URL fragments in the location bar

Categories

(Camino Graveyard :: Location Bar & Autocomplete, enhancement)

All
macOS
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Camino2.0

People

(Reporter: alqahira, Assigned: chris)

References

()

Details

(Keywords: polish, Whiteboard: p-safari)

Attachments

(1 file, 6 obsolete files)

STR:

1. Visit URL in Safari
2. Visit URL in Camino
3. Compare and weep

Safari already does this, and Firefox now does following its wacky location bar rewrite (bug 366797).  Leaving aside any security benefits, it looks a whole heck of a lot better and makes such URLs understandable for people who know those languages.
Do we know why this doesn't just magically work in the first place? Is it just a matter of fixing up what Gecko hands us? This seems like it might not be too difficult if that's all that's involved.

General consensus on IRC seems to be fairly neutral, but no objections were raised. Personally, I think this is a pretty critical bug for our Devanagari users (who are already pretty much screwed by any 1.8 branch build or earlier due to font handling issues), and its security benefits are a very nice bonus.
Status: UNCONFIRMED → NEW
Ever confirmed: true
[2:44pm] Wevah: should just be able to make an NSURL with the (domain non-punycoded) url string and run stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding on it
[2:44pm] Wevah: er, on the path part
[2:44pm] Wevah: and reassemble

(This bug also affects our Arabic and CJK users, who are not being screwed [as much] on the branch; it just happened that I had a Devanagari url handiest)
Well, it's not quite that simple, since there are cases like a Google search for:
"a & b" ? /
where if we just decode everything, the URL as copied from the location bar wouldn't work if pasted. We need to make sure any reserved characters stay encoded (probably be decoding everything then doing a targeted re-encoding).
I'd like to see if we can do this for 2.0.
Flags: camino2.0?
Assignee: nobody → trendyhendy2000
Attached patch Initial attempt (obsolete) — Splinter Review
Initial attempt using a Core Foundation method that allows you to replace percent escapes and exclude a few from that replacement. I put in  "';/?:@&=+$, and space, which seem to be what's needed by RFC 2396 and looking at what Safari does.

I don't know if this is anywhere close to what needs to be done. I'm putting it out for comments, because it's late and URL encoding makes my head hurt.
Attachment #337278 - Flags: review?
A couple of comments, then:

1) The flash of the %-encoded URL in the location bar on pageload before it's replaced by the decoded version looks odd.  This seems to happen only on the load of the first %-encoded URL in a window?

2) Once I was able to get in a situation where pressing back or forward would always display the %-encoded version of those URLs instead of the decoded one, but I can't reproduce it so far.

3) In a very quick check, it doesn't seem to break shortcuts with : or / or regress bug 341846 or bug 326532
This patch doesn't look like it will do the right thing if a setURI: comes in while the location bar is being edited.

I also don't think it'll handle the location sheet case.

I wonder what would happen if we moved the unescaping to BrowserWindowController's updateLocationFields: method instead (which is the only thing that calls into AutoCompleteTextField's setURI: method anyway).

cl
Hardware: Macintosh → All
Attached patch Fix v1.1 (obsolete) — Splinter Review
Made the encoding stuff into an NSString method. Modified BrowserWrapper and BrowserWindowController to provide nice URLs for location fields and window titles, including local files and view-source.

What will need to be done in another bug is put nice URLs on the pasteboard, for example from the Copy Link Location CM item.
Attachment #337278 - Attachment is obsolete: true
Attachment #337400 - Flags: review?
Attachment #337278 - Flags: review?
Attached patch Fix v1.2 (obsolete) — Splinter Review
Moves the URL prettifying deeper, into CHBrowserView. We get nice URLs in location fields, window titles, page info, tabs, view-source, and local URLs.

Places to fix up in followup bugs are history and page link locations (copy link location, dnd).
Attachment #337400 - Attachment is obsolete: true
Attachment #337601 - Flags: review?
Attachment #337400 - Flags: review?
Blocks: 370331
Attached patch Fix v1.3 (obsolete) — Splinter Review
Updated version of Fix v1.2 to take into account a recent cvs checkin of another patch.
Attachment #337601 - Attachment is obsolete: true
Attachment #339179 - Flags: review?(murph)
Attachment #337601 - Flags: review?
Attachment #339179 - Flags: review?(murph) → review+
Comment on attachment 339179 [details] [diff] [review]
Fix v1.3

> -(NSString*)focusedURLString
> {
>   if (!_webBrowser)
>     return @"";
> 
>   nsCOMPtr<nsIDOMWindow> focussedWindow = [self focussedDOMWindow];
>-  NSString* locationString = [self locationFromDOMWindow:focussedWindow];
>+  NSString* locationString = [[self locationFromDOMWindow:focussedWindow] exposableURI];
>   return locationString ? locationString : @"";
> }

Based on where |focusedURLString| is used, mainly in supplying a referrer URL, should we leave this method return a fully escaped one?

>+- (NSString *)exposableURI;

>+// Create a URI that looks good in a location field
>+// Excluded character list comes from RFC2396 and by examining Safari's behaviour
>+- (NSString*)exposableURI
>+{

Since the other method in the CaminoURLStringUtils category is commented in the header, I think it'd be best we do that for |exposableURI| as well.  I'd only include the first comment line in the header, and instead place the information about where the excluded character list comes from in the implementation, since it's that kind of a detail.  Also, maybe avoid using the word "create" in the header's comment, as that would imply an owning reference.  Just use, "Returns a URI that..."

>+  return (NSString*)CFURLCreateStringByReplacingPercentEscapesUsingEncoding(kCFAllocatorDefault,
>+                                                                            (CFStringRef)self,
>+                                                                            CFSTR(" \"\';/?:@&=+$,#"),
>+                                                                            kCFStringEncodingUTF8);
>+}

The CFString object returned by CFURLCreateStringByReplacingPercentEscapesUsingEncoding() follows CF's create rule, so we'll need to release it.  You can |autorelease| a toll-free bridged object, so doing the following will work:

NSString* exposableURI = (NSString*)CFURLCreateStringByReplacingPercentEscapesUsingEncoding(kCFAllocatorDefault,
                                                                                            (CFStringRef)self,
                                                                                            CFSTR(" \"\';/?:@&=+$,#"),
                                                                                            kCFStringEncodingUTF8);
return [exposableURI autorelease];

Since the behavior is what we want, with those few changes r=murph.  Again, nice work hendy.
Attached patch Fix v1.4 (obsolete) — Splinter Review
Patch addressing murph's comments.
Attachment #339179 - Attachment is obsolete: true
Attachment #340092 - Flags: superreview?(stuart.morgan+bugzilla)
Comment on attachment 340092 [details] [diff] [review]
Fix v1.4

>+- (NSString *)exposableURI;

Escaped URLs are exposable, and we already use that term in the context of fixup of internal Gecko cache URLs. Call this unescapedURI, since that's really what it's doing.
 

This change seems pretty wide-reaching; do we really want to change the internal format that URLs are passed around in, rather than just the display layers? For example Safari doesn't store bookmark URLs unescaped, and they appear to provide the escaped version to the clipboard.

If we do want to take this approach, then every URL coming out of the CH layer should be unescaped, so the loading callbacks at the BW layer should be getting unescaped URLs, and we should audit everywhere URLs go out to make sure they should in fact be going out escaped. But I'm leaning toward keeping the internal representation as valid URLs, and only doing unescaping when for display.
Attachment #340092 - Flags: superreview?(stuart.morgan+bugzilla) → superreview-
Ironically, Safari does populate 'url '/public.url with the unescaped UTF-8 version of an unescaped URL (or an escaped bookmark URL)!
Attached patch Fix v1.5 (obsolete) — Splinter Review
Moves the unescaping higher up the hierarchy, like in v1.1, but with the changes introduced in v1.4 and comment 14. Also includes an unescape call for the Page Info window.

The Copy Link Location et al functions talk to Gecko directly, so they will be harder to fix. They are beyond the scope of this bug anyway.
Attachment #340092 - Attachment is obsolete: true
Attachment #345059 - Flags: superreview?(stuart.morgan+bugzilla)
Comment on attachment 345059 [details] [diff] [review]
Fix v1.5

Sorry for the delay; sr=smorgan. One less thing to make Smokey weep!
Attachment #345059 - Flags: superreview?(stuart.morgan+bugzilla) → superreview+
I suspect this patch is causing crashes with non-ascii characters in query strings.

example:
http://www.elpais.com/vineta/?d_date=20081111&autor=Ram%F3n&anchor=elpporopivin&xref=20081111elpepuvin_1&type=Tes&k=Ramon

or https://www.zurich.co.jp/, use the search box at the top (copy a Japanese word under the 'my zurich' column and paste)
 or try this
http://search.zurich.co.jp/cgi-bin/search?query=SmartID%93o%98%5E+%26%23187%3B
(the query string is set by script, is not decoded by Fx/Safari)

A 12 hour old build doesn't crash, a build with this patch goes boom.
Attachment #345059 - Flags: superreview+ → superreview-
Comment on attachment 345059 [details] [diff] [review]
Fix v1.5

Doh! Thanks, I should have thought of that.

This:

>+  return [unescapedURI autorelease];

Needs to be
  return unescapedURI ? [unescapedURI autorelease] : self;


Also, while you are respinning, I realized that this section:

>-  [mURLBar setURI:url];
>-  [mLocationSheetURLField setStringValue:url];
>+  [mURLBar setURI:[url unescapedURI]];
>+  [mLocationSheetURLField setStringValue:[url unescapedURI]];

Should be getting [url unescapedURI] once into a local variable and using that, not doing the unescape twice.
Attached patch Fix v1.6Splinter Review
Fix incorporating Stuart's comments. Now exception free!
Attachment #345059 - Attachment is obsolete: true
Attachment #347713 - Flags: superreview?(stuart.morgan+bugzilla)
Comment on attachment 347713 [details] [diff] [review]
Fix v1.6

sr=smorgan. philippe, thanks once again for being our early warning system :)
Attachment #347713 - Flags: superreview?(stuart.morgan+bugzilla) → superreview+
(In reply to comment #21)
> (From update of attachment 347713 [details] [diff] [review])
> sr=smorgan. 

Yum, yum, the smell of kanjis in the location bar :-)
And the kid is happy, no need to annoy me to get nice looking URL's in his homework (from wikipedia-japan)
Checked in on cvs trunk.  الف شكر!

We should file some follow-ups on stuff this didn't cover (Copy Link Location, Camino pasteboard code, Bookmarks/History).
Status: NEW → RESOLVED
Closed: 16 years ago
Flags: camino2.0? → camino2.0+
Resolution: --- → FIXED
Target Milestone: --- → Camino2.0
(In reply to comment #23)
> Checked in on cvs trunk.  الف شكر!
> 
> We should file some follow-ups on stuff this didn't cover (Copy Link Location

Filed as bug 464496.

> Camino pasteboard code

Smokey's working on figuring out what needs to be filed here.

> Bookmarks/History).

Filed as bug 464501.
The drag-n-drop bug is bug 464602.
Just for the record, I also filed bug 466117 on the autocomplete window.
...And bug 475210 on toolbar and download tooltips.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: