Closed Bug 1324716 Opened 7 years ago Closed 7 years ago

Investigate handling of characters disallowed in URLs by IDNA2008

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla53
Tracking Status
firefox-esr45 51+ fixed
firefox50 --- wontfix
firefox51 --- verified
firefox52 --- fixed
firefox53 --- fixed

People

(Reporter: smontagu, Assigned: smontagu)

References

Details

(Keywords: csectype-spoof, sec-moderate, Whiteboard: [necko-active][post-critsmash-triage][adv-main51+][adv-esr45.7+])

Attachments

(2 files)

Bug 1323338 added U+2010 HYPHEN to the IDN blacklist. See what I said there in bug 1323338 comment 20:

> > Have we implemented IDNA2008, or just some tweaks on top of IDN2003?
> > according to rfc5892 \u2010 is "DISALLOWED" along with the rest of that
> > punctuation block (except ZWNJ and ZWJ which are CONTEXTJ).
> 
> In http://www.unicode.org/Public/idna/latest/IdnaMappingTable.txt U+2010 is
> marked as "valid" with the annotation "NV8". According to
> http://www.unicode.org/reports/tr46/#Table_Data_File_Fields this means "the
> status is valid but the character is excluded by IDNA2008 from all domain
> names for all versions of Unicode".
> 
> I think further investigation is needed to be sure how ICU's implementation
> of IDNA2008 treats such characters, i.e. whether they just accept them as
> valid, or whether they are setting some flag which we aren't testing for.
> This covers other characters as well, as long as they don't trigger punicode
> by being marked as "restricted" in xidmodifications.txt/, for example U+058A
> ARMENIAN HYPHEN
So the path forward seems to be:

1) Establish whether ICU marks these characters as illegal in IDNs or not.

2) If yes:

2.1) Change our code to detect this and trigger punycode.

3) If no:

3.1) Add any other affected characters to the IDN blacklist as in bug 1323338

3.2) File a bug on ICU
Group: core-security → network-core-security
1) I've run all valid NV8 characters (total of 7629) against the current ICU4J (58.2). Settings of the lib are the same as FF uses (NONTRANSITIONAL, CHECK_CONTEXTJ, CHECK_BIDI). Results in the attachment, but summing it up:

- Only 107 conversions fail, out of which
-- 74 because of leading combining mark. If the char is not leading the string, the conversion does not fail.
-- 33 because of bidi checks. I am not an expert in bidi, so I cannot talk about this.

All in all it looks to me as if ICU does not treat NV8 chars in a special way at all. Thus, I skip to step 3)

3.1) All affected chars are in the attachment in the format (Codepoint->A-String, or Codepoint->ICU-Error).
3.2) I will and am linking it here.
Results of applying ICU UTS46 processing on all codepoints marked as NV8
I  think it's not that alarming. 
Firefox only allows [:IdentifierStatus=Allowed:] ( + Aspirational scripts), doesn't it? 

The following set has only 9 code points. (see https://goo.gl/mPD0Ys ). A few more need to be blocked, I guess. 

[[:IdentifierStatus=Allowed:] [:Identifier_Type=Aspirational:]] & [:idna2008=DISALLOWED:] - [:isUppercase=Yes:]




 ' 	U+0027	APOSTROPHE
 . 	U+002E	FULL STOP
 : 	U+003A	COLON
 _ 	U+005F	LOW LINE
 ֊ 	U+058A	ARMENIAN HYPHEN
 ‐ 	U+2010	HYPHEN
 ’ 	U+2019	RIGHT SINGLE QUOTATION MARK
 ‧ 	U+2027	HYPHENATION POINT
 ゠ 	U+30A0	KATAKANA-HIRAGANA DOUBLE HYPHEN



I'm assuming that all NV8 code points belong to [:idna2008=DISALLOWED:]. Just in case, I'll take an intersection of an explicitly enumerated NV8 set and  [[Id=Allowed] [Id=Aspir]].
[[:IdentifierStatus=Allowed:] [:Identifier_Type=Aspirational:]] & NV8 
has 5 characters.  U+2010 was blocked in bug 1323338 so that there are 4 characters to review. 

058a ARMENIAN HYPHEN
2010 HYPHEN
2019 RIGHT SINGLE QUOTATION MARK
2027 HYPHENATION POINT
30a0 KATAKANA-HIRAGANA DOUBLE HYPHEN
Of these, I think we should block 2019, 2027 and 30a0. The Armenian hyphen 058a looks less worrying, IMO.
Simon, tentatively assigning to you.  Feel free to find a different assignee.
Assignee: nobody → smontagu
Whiteboard: [necko-active]
(In reply to Jonathan Kew (:jfkthame) from comment #7)
> Of these, I think we should block 2019, 2027 and 30a0. The Armenian hyphen
> 058a looks less worrying, IMO.

If we made this list algorithmically, why would we then choose characters to block or not block manually?

Why is the Armenian hyphen less dangerous than the U+2010 hyphen?

Gerv
(In reply to Gervase Markham [:gerv] from comment #9)
> (In reply to Jonathan Kew (:jfkthame) from comment #7)
> > Of these, I think we should block 2019, 2027 and 30a0. The Armenian hyphen
> > 058a looks less worrying, IMO.
> 
> If we made this list algorithmically, why would we then choose characters to
> block or not block manually?
> 
> Why is the Armenian hyphen less dangerous than the U+2010 hyphen?

That comment was prompted by "there are 4 characters to review"; IMO it looks less visually confusable. But I think you're right, what we should do is simply block all of these, given their status in https://tools.ietf.org/html/rfc5892.
This adds the characters mentioned to our blacklist (except for U+2027, which was already present).
Attachment #8825592 - Flags: review?(smontagu)
Comment on attachment 8825592 [details] [diff] [review]
Update IDN blacklist

Review of attachment 8825592 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great otherwise.

::: intl/uconv/nsTextToSubURI.cpp
@@ +21,5 @@
>  static const char16_t sNetworkIDNBlacklistChars[] =
>  {
>  /*0x0020,*/
>            0x00A0, 0x00BC, 0x00BD, 0x00BE, 0x01C3, 0x02D0, 0x0337,
> +  0x0338, 0x0589, 0x058A, 0x05C3, 0x05F4, 0x0609, 0x060A, 0x066A, 0x06D4,

Making the line endings ragged like this gives me a bad feeling, but I suppose you will say that reformatting will make blame for the change unclear.
Attachment #8825592 - Flags: review?(smontagu) → review+
(In reply to Simon Montagu :smontagu from comment #12)
> Making the line endings ragged like this gives me a bad feeling, but I
> suppose you will say that reformatting will make blame for the change
> unclear.

Yes, given that we're just inserting three individual items into the list, but fixing the line lengths would make it appear completely changed, I felt the ragged line endings were the lesser of two evils in this case.
https://hg.mozilla.org/mozilla-central/rev/8742aaf20f07
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment on attachment 8825592 [details] [diff] [review]
Update IDN blacklist

Exactly like bug 1323338, which we uplifted to all current branches (see bug 1323338 comment 13 and following). This just adds 3 more characters to the blacklist.

While it is only classified as sec-moderate, it's a potential URL-spoofing issue, and the fix is simple and very safe.
Attachment #8825592 - Flags: approval-mozilla-esr45?
Attachment #8825592 - Flags: approval-mozilla-beta?
Attachment #8825592 - Flags: approval-mozilla-aurora?
Comment on attachment 8825592 [details] [diff] [review]
Update IDN blacklist

Fix a sec-moderate. Beta51+ & Aurora52+. Should be in 51 beta 14.
Attachment #8825592 - Flags: approval-mozilla-beta?
Attachment #8825592 - Flags: approval-mozilla-beta+
Attachment #8825592 - Flags: approval-mozilla-aurora?
Attachment #8825592 - Flags: approval-mozilla-aurora+
Group: network-core-security → core-security-release
Flags: qe-verify+
QA Contact: mwobensmith
Whiteboard: [necko-active] → [necko-active][post-critsmash-triage]
Comment on attachment 8825592 [details] [diff] [review]
Update IDN blacklist

Given that the fix is safe, let's include it in ESR45.7
Attachment #8825592 - Flags: approval-mozilla-esr45? → approval-mozilla-esr45+
I'm hitting conflicts uplifting this to esr45. Can we get a rebased patch for uplift?
Flags: needinfo?(smontagu)
Whiteboard: [necko-active][post-critsmash-triage] → [necko-active][post-critsmash-triage][adv-main51+][adv-esr45.7+]
Alias: CVE-2017-5394
Alias: CVE-2017-5394
Confirmed bug in Fx50.1.0.
Verified fixed in Fx51.0b14.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: