Closed Bug 922530 Opened 11 years ago Closed 11 years ago

U+061C ARABIC LETTER MARK displays as tofu (i.e. is not invisible)

Categories

(Core :: Layout: Text and Fonts, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla27

People

(Reporter: aharon, Assigned: smontagu)

References

Details

Attachments

(1 file, 3 obsolete files)

What steps will reproduce the problem?
1. Navigate to data:text/html,a%26%23x061c;1+2

What is the expected output?
It should look like this:
a2+1

What do you see instead?
Same as above, but with tofu (a rectangle) to the right of the 1.

ARABIC LETTER MARK (ALM) was introduced in Unicode 6.3, launched today. It is supposed to be invisible, just like the veteran U+200F RLM, but with bidi class AL (ARABIC LETTER) instead of R (RIGHT TO LEFT).

Currently (released version 24 as well as Nightly 27.0a1), it behaves as bidi class AL, but is *not* invisible. That's the bug.
Depends on: 922550
OS: Windows 7 → All
Hardware: x86_64 → All
I am not so sure that "Depends on: 922550" makes sense. This is a trivial, independent fix that should be made separately (and long before) the much more difficult task of supporting isolates and paired brackets (the other bidi innovations in Unicode 6.3). I would say that 922550 depends on this.
"Support Unicode 6.3's version of the Bidi Algorithm" is a third bug that indeed depends on this (though I'm optimistically hoping that we'll get it more or less for free from upstream ICU).

It's also probably true that the patch for bug 922550, if it parallels similar patches in the past, won't fix this and it will need to be patched separately. The dependency is a little unusual: until bug 922550 is fixed, we only support Unicode version 6.2, U+061C is an undefined codepoint, and the current behaviour in this bug is as expected. Fixing bug 922550 is required to make this bug not INVALID.

Also, it may not be quite as trivial as expected: there are IIRC more than one place in the code where we special-case bidi control characters, and also optimizations depending on all the control characters being close together in the same Unicode block, which no longer holds.
> "Support Unicode 6.3's version of the Bidi Algorithm" is a third bug
> that indeed depends on this

BTW, does it have a bug number? Do you want me to file one?

> (though I'm optimistically hoping that we'll get it more or less for free from upstream ICU).

Mati has indeed implemented all the new 6.3 features in ICU's UBA. I believe it is about to be released. Do you in fact plan to switch to ICU for the UBA?
Blocks: 922963
This adds all five new Bidi control characters everywhere I could find that defines Bidi control characters or default ignorable characters (it's depressing how many hard coded lists of ignorable characters there were).

Also Aharon's testcase that U+061C functions as an invisble Arabic letter, and a test that it still triggers Punycode when it appears in a URL so can't be used for spoofing.
Assignee: nobody → smontagu
Attachment #820225 - Flags: review?(jfkthame)
Fixed a couple of typos in comment
Attachment #820225 - Attachment is obsolete: true
Attachment #820225 - Flags: review?(jfkthame)
Attachment #820226 - Flags: review?(jfkthame)
Attachment #820226 - Attachment is obsolete: true
Attachment #820226 - Flags: review?(jfkthame)
Attachment #820236 - Flags: review?(jfkthame)
Comment on attachment 820236 [details] [diff] [review]
Add the new Bidi control characters v.2 for real

Review of attachment 820236 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/unicharutil/util/nsBidiUtils.cpp
@@ +99,1 @@
>            ((aChar)&0xfffffe)==LRM_CHAR);

I'd love to see this unified with the IS_BIDI_CONTROL_CHAR macro in nsBidiUtils.h -- either by replacing the body of this function with that macro, or replacing the macro (and its users) with an inline version of this function. We shouldn't be listing the same set of chars separately in both places.

::: uriloader/exthandler/nsExternalHelperAppService.cpp
@@ +1134,5 @@
> +    PRUnichar(0x202e), // Right-to-Left Override
> +    PRUnichar(0x2066), // Left-to-Right Isolate
> +    PRUnichar(0x2067), // Right-to-Left Isolate
> +    PRUnichar(0x2068), // First Strong Isolate
> +    PRUnichar(0x2069)  // Pop Directional Isolate

Shouldn't we remove LRM/RLM/ALM here, too? Sprinkling those among otherwise-neutral chars in the name could provide spoofing opportunities, I would think.
Addressed review comments
Attachment #820236 - Attachment is obsolete: true
Attachment #820236 - Flags: review?(jfkthame)
Attachment #820848 - Flags: review?(jfkthame)
Comment on attachment 820848 [details] [diff] [review]
Add the new Bidi control characters v.3

Review of attachment 820848 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks!

::: intl/unicharutil/util/nsBidiUtils.h
@@ +95,5 @@
> +   inline bool IsBidiControl(uint32_t aChar) {
> +     return ((LRE_CHAR <= aChar && aChar <= RLO_CHAR) ||
> +             (LRI_CHAR <= aChar && aChar <= PDI_CHAR) ||
> +             (aChar == ALM_CHAR) ||
> +             ((aChar)&0xfffffe)==LRM_CHAR);

nit: spaces around & and == operators, please; also, (aChar) doesn't need parens here.
Attachment #820848 - Flags: review?(jfkthame) → review+
https://hg.mozilla.org/mozilla-central/rev/30a7b37f0892
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: