Top sites does not properly display unicode domains

NEW
Unassigned

Status

()

Firefox for Android
Activity Stream
P3
normal
8 months ago
7 months ago

People

(Reporter: mcomella, Unassigned, Mentored)

Tracking

unspecified
All
Android
Points:
---

Firefox Tracking Flags

(fennec+)

Details

(Whiteboard: [mobileAS] 1.28)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment)

STR:
0) With photon fennec
1) Get http://faß.de in your top sites (clean profile, visit & bookmark)

Expected: top site displays ^
Actual: url bar displays "xn--fa-hia"

In Firefox Beta, the old top sites works as expected.

We should add some tests to getFormattedDomain to make sure this backend works as expected and should consider adding UI tests to ensure the UI does not regress either.

There is a similar issue with the photon url bar: bug 1391421.
Priority: P3 → --
Flod, how important are unicode domain names in practice?

Note to self: add same comment to bug 1391421.
Flags: needinfo?(francesco.lodolo)
Unfortunately, I don't have that information, and I don't know who could have it :-\

I think it's a matter of understanding how frequent they are in the wild. Is there any chance that we can get that information from telemetry?
Flags: needinfo?(francesco.lodolo)
Looks like desktop has the same issue: bug 1391667.
See Also: → bug 1391667
Spoke with some data folks in SF: sounds like no one would really have that data because we don't collect urls and presumably don't collect boolean unicode domain data.

Sounds like it'd be more work to get the data we need to know if it's worth fixing than to just fix it. :)
I assume we're going to want this because we don't have data about how important it is so I'm going to work on it.
Assignee: nobody → michael.l.comella
Iteration: --- → 1.28
Whiteboard: [mobileAS] → [mobileAS] 1.28
Comment hidden (mozreview-request)
This patch fixes the problems with my formatting code but it looks like we're getting punycode directly out of the DB so we'll have to figure out how to solve that: should we be storing unicode into the DB or converting it from punycode after pulling it out (if the latter, how do we know it's punycode?)? How did this work before? Perhaps it's related to how this regressed in the photon url bar (bug 1391421).
GlobalHistory.markUriVisited writes our sites to the DB and is being called with punycode. It gets called by Gecko so:
1) Java passes unicode to gecko and gecko returns punycode
2) Java turns unicode into punycode and passes it to gecko
If you type unicode into BrowserToolbar (e.g. http://faß.de), we pass unicode as the url in the Tab:Load event we send to gecko and Tab:Load in JS receives it as unicode. Eventually, aBrowser.loadURIWithFlags is called (iirc, calls C++ gecko) so wrt comment 8:

Java passes unicode to gecko and gecko returns punycode.

Furthering that point, in BrowserToolbar.onTabChanged, we eventually receive the LOCATION_CHANGE message, which has punycode (which is how we update the title to punycode, rather than unicode, in bug 1391421).

Snorp, do you know if our handling of IDNs has changed from 56 to 57? If so, is it expected and how? It seems that unicode urls we pass to gecko come back as punycode. In Beta, the BrowserToolbar shows unicode as does top sites so I think we might have changed how this is handled on the Gecko side.
Flags: needinfo?(snorp)

Comment 10

8 months ago
mozreview-review
Comment on attachment 8900033 [details]
Bug 1391422: Replace java.net.URI with android.net.Uri in getFormattedDomain.

https://reviewboard.mozilla.org/r/171358/#review176864
Attachment #8900033 - Flags: review?(s.kaspari) → review+
(In reply to Michael Comella (:mcomella) from comment #9)
> 
> Furthering that point, in BrowserToolbar.onTabChanged, we eventually receive
> the LOCATION_CHANGE message, which has punycode (which is how we update the
> title to punycode, rather than unicode, in bug 1391421).
> 
> Snorp, do you know if our handling of IDNs has changed from 56 to 57? If so,
> is it expected and how? It seems that unicode urls we pass to gecko come
> back as punycode. In Beta, the BrowserToolbar shows unicode as does top
> sites so I think we might have changed how this is handled on the Gecko side.

I'm not aware of any change, but it does sound like there was one. I think this would have happened in docshell or a related place. Maybe smaug knows?

I do recall there being some security issues with displaying unicode URLs, where phishing sites could use a unicode URL that looks the same or similar to the ASCII one. Maybe you can just convert the punycode to unicode in AS?
Flags: needinfo?(snorp) → needinfo?(bugs)

Updated

8 months ago
Iteration: 1.28 → 1.29

Comment 12

8 months ago
I don't know. Finding the regression range is the first thing to do in regression bugs.
Flags: needinfo?(bugs)

Updated

8 months ago
Iteration: 1.29 → ---
Priority: -- → P2

Updated

8 months ago
tracking-fennec: ? → +

Updated

8 months ago
Rank: 0 → 3
Low priority: unassigning for now. I hope the photon team continues work in bug 1391421 (and we can find a regression window there).
Assignee: michael.l.comella → nobody
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from bug 1391421 comment #8)
> Looks like this change was made intentionally as part of bug 1380617. I guess
> Firefox displays unicode by accessing uri.displaySpec instead of .spec. Jing
> Wei, you should be able to make a similar change in browser.js if you want
> similar behavior.

For this bug, I wonder if our DB will have a mismatch of unicode (old format) and punycode (new format) urls. Also, we don't have direct access to `uri.displaySpec` so we'll have to work around that.
(In reply to Michael Comella (:mcomella) from bug 1391421 comment #9)
> FYI: I'm not sure what the proper way to handle this is because bug 1397842
> was filed and describes the case where:
> - www.аррӏе.com/ could be a scam site
> - www.apple.com/ is the real Apple
> 
> but they look identical here and in the url bar on desktop. How do we do
> this on desktop and why? Is it the right approach?
[eng triage recommendation] P3: there are more important things to tackle.
[triage] A bigger issue considering the security risks.
Rank: 3
Priority: P2 → P3
All open Activity Stream bugs are moving from the whiteboard tag, "[mobileAS]", to the Firefox for Android component, "Activity Stream", so that I can keep better track of these bugs as the new triage owner; I will send out an email shortly with additional details, caveats, etc.
Component: General → Activity Stream
You need to log in before you can comment on or make changes to this bug.