Open Bug 586082 Opened 14 years ago Updated 2 years ago

Internationalized URLs are stored incorrectly

Categories

(Firefox :: Sync, defect)

defect

Tracking

()

Tracking Status
blocking2.0 --- -

People

(Reporter: st3fan, Unassigned)

References

Details

(Whiteboard: [sync:bookmarks][sync:history][sync:tabs][sync:rigor])

Internationalized URLs are pushed in an incorrect format to the back-end.

The problem is that the host part is unicode while it should be punycode-encoded.
What defines this as incorrect?  I assume we're using whatever Firefox stores...
Well, Firefox gives us an nsIURI object and we use uri.spec which may include unicode characters in the hostname and path elements. IIRC the problem with that is that the high bytes are lost in crypto. We could UTF-8 encode the URI or we could just do what the spec says:

* use punycode for the hostname
* URL encode the path

uri.asciiSpec should do this for us for free.
The change is rather trivial, but it's kind of a storage bump for the engines involved (bookmarks, history, tabs). It's not technically a backwards incompatibility, but we might as well bump the storage version for those engines since we're bumping the global storage version in bug 603489 anyway, thereby introducing backwards incompatibility.

Nominating for blocking2.0.
blocking2.0: --- → ?
(In reply to comment #3)
> The change is rather trivial,

FWIW, I'm talking about using uri.asciiSpec everywhere we put a URI in a WBO.
We kinda missed the boat on this for the big storage rev.  So I'm going to minus, without a better explanation of why we must do this.
blocking2.0: ? → -
Is this still an issue, or did we fix the underlying problem with crypto chopping the high bytes?
Whiteboard: [needs verification]
(In reply to Mike Connor [:mconnor] from comment #6)
> Is this still an issue, or did we fix the underlying problem with crypto
> chopping the high bytes?

That's not really the problem. The problem is that we don't normalize URLs correctly. We store all of it as UTF-8 but should really do the hostname as punycode where appropriate. (Note that Places has the same problem, I think.)
Adding this to my radar; I mentioned it in sec review. We'll be handling these IRIs, and trying to produce correct entries, from Android Sync.
rnewman: what's the story on Android Sync? New engine version?
(In reply to Gregory Szorc [:gps] from comment #9)
> rnewman: what's the story on Android Sync? New engine version?

We fetch the string from the database directly. Whether we do the right thing or not depends on what Fennec (and the profile migrator) stuffs in there; I'd guess it's a Unicode (UTF-16) Java String.
Fennec does this, too.

13:08:19 <@mfinkle> rnewman, i hope the ascii spec, but this is all java now
13:08:33 < rnewman> yeah, there's nothing on the Java side that translates it, as I can see
13:08:33 <@mfinkle> rnewman, we send over the browser.currentURI.spec

And I'm sure favicon URIs etc. in other records are wrong, too.
13:11:04 <@mfinkle> rnewman, looks like we send over the currentURI.spec here:
13:11:07 <@mfinkle> http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/browser.js#2012
13:11:16 <@mfinkle> and here
13:11:18 <@mfinkle> http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/browser.js#1976
Blocks: 745408
Whiteboard: [needs verification] → [sync:bookmarks][sync:history][sync:tabs]
Whiteboard: [sync:bookmarks][sync:history][sync:tabs] → [sync:bookmarks][sync:history][sync:tabs][sync:scale]
Whiteboard: [sync:bookmarks][sync:history][sync:tabs][sync:scale] → [sync:bookmarks][sync:history][sync:tabs][sync:rigor]
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.