Bidi: Use the "lang" attribute to determine whether to use Arabic number ordering in ambiguous context (UAX #9 section 4.3, HL2)

Status: NEW
Assignee: Unassigned
Product: Core
Component: Layout: Text
Severity: enhancement
Version: Trunk
Opened: 16 years ago
Updated: 7 years ago
Reporter: Kyae-Young Kim
Whiteboard: [oracle-nls]
Attachments: 2

(Reporter)

Description

16 years ago
Environment: Mozilla 1.0, Windows 2000, locale is Arabic [Egypt]

Description: The date format for RTL direction is not correct.

Steps to reproduce:
    1) Load the test case.
    2) The LTR case is displayed correctly, but the RTL case
       should be displayed as "2002/06/28".

The test case is:
 
<HTML DIR=RTL LANG=AR>
<HEAD>
<meta charset="UTF-8">
</HEAD>
<BODY>
<B>Date Format for RTL</B><BR>
<INPUT TYPE="text" VALUE="28/06/2002"><BR>
<BR>
<B>Date Format for LTR</B><BR>
<INPUT TYPE="text" DIR=LTR VALUE="28/06/2002">
</BODY>
</HTML>
I am really unsure about this case. First of all, I want to note that it's not
100% true that the date should be displayed as 2002/06/28 in RTL direction. This
is only the case in Arabic, not in Hebrew, and this distinction is part of the
rules for weak types in the Unicode Bidi Algorithm, starting at
http://www.unicode.org/unicode/reports/tr9/#W1 

Mozilla does indeed display the date as 2002/06/28 in an unambiguous Arabic
context (try entering an Arabic character before the date in the input control),
but should a text input control with no alphabetical characters of any kind,
inside a document with LANG=AR, be considered as in an Arabic context by default?

The HTML standard says (http://www.w3.org/TR/html401/struct/dirlang.html#h-8.2):
"User agents must not use the lang attribute to determine text directionality"
but I'm not sure that that covers this situation.

The Unicode Bidi Algorithm says
(http://www.unicode.org/unicode/reports/tr9/#Bidirectional_Conformance): "The
following are permissible ways for systems to apply higher-level protocols to
the ordering of bidirectional text ...  Override the number handling to use
information provided by a broader context.
For example, information from other paragraphs in a document could be used to
conclude that the document was fundamentally Arabic, and that EN should
generally be converted to AN." 

That sounds to me as if our current behaviour complies with the body of the
standard, and this bug report is an RFE that we implement that particular
"permissible" behaviour, using the lang attribute as the "information from other
paragraphs in the document". Kyae-Young, are you happy with that assessment?

cc-ing Hixie and Roozbeh
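
The difference between the two contexts can be made concrete with a small sketch. The Python below is a deliberately simplified model, not Mozilla's implementation. It assumes an RTL paragraph containing only ASCII digits and a single kind of separator, and it covers only rules W2, W4, W6, N1 and L2 of UAX #9. The names `resolve_types`, `visual_order`, `arabic_context` and `sep_class` are hypothetical. `arabic_context=True` stands for either a preceding Arabic letter (W2) or a higher-level override of the kind discussed above, and `sep_class` defaults to ES (European Separator).

```python
def resolve_types(text, arabic_context, sep_class="ES"):
    """Resolve bidi classes for a string of ASCII digits and separators."""
    # Assumption: digits start as EN, every other character is sep_class.
    types = ["EN" if ch in "0123456789" else sep_class for ch in text]
    # W2, or a higher-level override: in an Arabic context EN acts as AN.
    if arabic_context:
        types = ["AN" if t == "EN" else t for t in types]
    resolved = list(types)
    for i in range(1, len(types) - 1):
        before, sep, after = resolved[i - 1], types[i], types[i + 1]
        if sep == "ES" and before == after == "EN":
            # W4: a single ES between two EN becomes EN.
            resolved[i] = "EN"
        elif sep == "CS" and before == after and before in ("EN", "AN"):
            # W4: a single CS between two numbers of one type takes that type.
            resolved[i] = before
    # W6: separators that were not absorbed become neutrals (ON).
    return ["ON" if t in ("ES", "CS") else t for t in resolved]


def visual_order(text, arabic_context, sep_class="ES"):
    """Return the display order of text inside an RTL paragraph."""
    types = resolve_types(text, arabic_context, sep_class)
    # In an RTL paragraph numbers sit at level 2; neutrals between
    # numbers resolve to R (N1) and sit at level 1.
    levels = [2 if t in ("EN", "AN") else 1 for t in types]
    chars = list(text)
    i = 0
    while i < len(chars):
        if levels[i] == 2:
            j = i
            while j < len(chars) and levels[j] == 2:
                j += 1
            chars[i:j] = reversed(chars[i:j])  # L2: reverse level-2 runs
            i = j
        else:
            i += 1
    chars.reverse()  # L2: reverse everything at level 1 and above
    return "".join(chars)


print(visual_order("28/06/2002", arabic_context=True))   # 2002/06/28
print(visual_order("28/06/2002", arabic_context=False))  # 28/06/2002
```

With European numbers the separators are absorbed into a single number run, so the date keeps its logical order. With Arabic numbers an ES separator stays neutral, so the three number runs are laid out right to left.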
Using xml:lang="" and <html:* lang=""> attributes to determine the language
context of an element seems reasonable to me. Make sure you use the language
context of the element in which you find the numbers, though, not the element on
which the direction is set or the root element or anything arbitrary like that.
Simon: thanks for cc'ing me, btw
>Make sure you use the language
>context of the element in which you find the numbers, though, not the element on
>which the direction is set or the root element or anything arbitrary like that.

Wait a minute -- the lang attribute inherits, right? So in the test case here
doesn't that mean that *both* dates should appear as 2002/06/28?
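
As a rough illustration of the inheritance question, the sketch below walks up from each input control to the nearest ancestor carrying a `lang` attribute. It is only a sketch under stated assumptions: the markup is a well-formed, lower-cased rewrite of the test case so that Python's `xml.etree.ElementTree` can parse it, the `id` attributes are added purely for illustration, and `effective_lang` is a hypothetical helper, not anything from the Mozilla code base.

```python
import xml.etree.ElementTree as ET


def effective_lang(element, parents):
    """Return the nearest lang value on element or its ancestors, else None."""
    node = element
    while node is not None:
        lang = node.get("lang")
        if lang is not None:
            return lang
        node = parents.get(node)  # the root has no parent, so this ends the loop
    return None


# Well-formed stand-in for the test case; the ids are illustrative only.
doc = ET.fromstring(
    '<html dir="rtl" lang="ar"><body>'
    '<input id="rtl" type="text" value="28/06/2002"/>'
    '<input id="ltr" type="text" dir="ltr" value="28/06/2002"/>'
    '</body></html>'
)
# ElementTree has no parent pointers, so build a child-to-parent map.
parents = {child: parent for parent in doc.iter() for child in parent}

langs = {el.get("id"): effective_lang(el, parents) for el in doc.iter("input")}
print(langs)  # {'rtl': 'ar', 'ltr': 'ar'}
```

Both controls resolve to `ar`, because `dir=ltr` on the second one changes its direction but not its language.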
Created attachment 87753 [details]
Expanded test case
I don't really understand bidi rules, so I don't know really, sorry!
(Reporter)

Updated

16 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true

Updated

14 years ago
Whiteboard: [oracle-nls]
Unicode 4.0.1 changed the classification of the SOLIDUS (slash) character from
"European Separator" (ES) to "Common Separator" (CS). Mozilla apparently now
follows the newer standard, so all dates in the attached testcase appear as
28/06/2002.
However, everything said above is still relevant if you replace the slash with a
hyphen (-), which is still an ES in the newer standard. I'll attach a modified
testcase soon.
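
The reclassification matters because of rule W4 in UAX #9: a single CS between two Arabic numbers becomes part of the number, while an ES is absorbed only between two European numbers. The current classes can be checked with Python's standard `unicodedata` module, whose bundled Unicode database in any Python 3 release is newer than 4.0.1. `bidi_classes` is just a hypothetical helper name for this sketch.

```python
import unicodedata


def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


print(bidi_classes("8/-"))  # [('8', 'EN'), ('/', 'CS'), ('-', 'ES')]
```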
Created attachment 168476 [details]
modified testcase

This is the same as the original testcase, except:

1. The slashes are replaced by hyphens, so that Mozilla versions supporting
Unicode >=4.0.1 show the same behaviour that the original testcase showed with
Mozilla versions implementing older versions of the standard.

2. The dates are given as regular text, in addition to being the contents of
input controls, to demonstrate that the issue is not specific to input
controls.
*** Bug 270914 has been marked as a duplicate of this bug. ***
=> All/All
OS: Windows 2000 → All
Hardware: PC → All
More descriptive summary, and change severity to RFE, per comment #1.
Severity: normal → enhancement
Summary: Bidi : Date Format for RTL direction is not correct. → Bidi: Use the "lang" attribute to determine whether to use Arabic number ordering in ambiguous context (UAX #9 section 4.3, HL2)

Updated

10 years ago
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text

Updated

7 years ago
Assignee: mozilla → nobody