Open
Bug 284394
Opened 19 years ago
Updated 2 years ago
XSLT <xsl:number format=""> does not apply the decimal numbering scheme for Unicode characters with a decimal value of 1, except 0x31
Categories
(Core :: XSLT, defect)
NEW
People
(Reporter: FrankTang, Assigned: peterv)
References
(Depends on 1 open bug)
Details
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

The XSLT spec says:

"Any token where the last character has a decimal digit value of 1 (as specified in the Unicode character property database), and the Unicode value of preceding characters is one less than the Unicode value of the last character generates a decimal representation of the number where each number is at least as long as the format token. Thus, a format token 1 generates the sequence 1 2 ... 10 11 12 ..., and a format token 01 generates the sequence 01 02 ... 09 10 11 12 ... 99 100 101."

Also, according to the Unicode 4.0 database at http://www.unicode.org/Public/4.0-Update1/extracted/DerivedNumericValues-4.0.1.txt, the following 51 Unicode code points have the value 1:

0031 ; 1.0 # Nd DIGIT ONE
00B9 ; 1.0 # No SUPERSCRIPT ONE
0661 ; 1.0 # Nd ARABIC-INDIC DIGIT ONE
06F1 ; 1.0 # Nd EXTENDED ARABIC-INDIC DIGIT ONE
0967 ; 1.0 # Nd DEVANAGARI DIGIT ONE
09E7 ; 1.0 # Nd BENGALI DIGIT ONE
09F4 ; 1.0 # No BENGALI CURRENCY NUMERATOR ONE
0A67 ; 1.0 # Nd GURMUKHI DIGIT ONE
0AE7 ; 1.0 # Nd GUJARATI DIGIT ONE
0B67 ; 1.0 # Nd ORIYA DIGIT ONE
0BE7 ; 1.0 # Nd TAMIL DIGIT ONE
0C67 ; 1.0 # Nd TELUGU DIGIT ONE
0CE7 ; 1.0 # Nd KANNADA DIGIT ONE
0D67 ; 1.0 # Nd MALAYALAM DIGIT ONE
0E51 ; 1.0 # Nd THAI DIGIT ONE
0ED1 ; 1.0 # Nd LAO DIGIT ONE
0F21 ; 1.0 # Nd TIBETAN DIGIT ONE
1041 ; 1.0 # Nd MYANMAR DIGIT ONE
1369 ; 1.0 # Nd ETHIOPIC DIGIT ONE
17E1 ; 1.0 # Nd KHMER DIGIT ONE
17F1 ; 1.0 # No KHMER SYMBOL LEK ATTAK MUOY
1811 ; 1.0 # Nd MONGOLIAN DIGIT ONE
1947 ; 1.0 # Nd LIMBU DIGIT ONE
2081 ; 1.0 # No SUBSCRIPT ONE
215F ; 1.0 # No FRACTION NUMERATOR ONE
2160 ; 1.0 # Nl ROMAN NUMERAL ONE
2170 ; 1.0 # Nl SMALL ROMAN NUMERAL ONE
2460 ; 1.0 # No CIRCLED DIGIT ONE
2474 ; 1.0 # No PARENTHESIZED DIGIT ONE
2488 ; 1.0 # No DIGIT ONE FULL STOP
24F5 ; 1.0 # No DOUBLE CIRCLED DIGIT ONE
2776 ; 1.0 # No DINGBAT NEGATIVE CIRCLED DIGIT ONE
2780 ; 1.0 # No DINGBAT CIRCLED SANS-SERIF DIGIT ONE
278A ; 1.0 # No DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE
3021 ; 1.0 # Nl HANGZHOU NUMERAL ONE
3192 ; 1.0 # No IDEOGRAPHIC ANNOTATION ONE MARK
3220 ; 1.0 # No PARENTHESIZED IDEOGRAPH ONE
3280 ; 1.0 # No CIRCLED IDEOGRAPH ONE
4E00 ; 1.0 # Lo CJK UNIFIED IDEOGRAPH-4E00
58F1 ; 1.0 # Lo CJK UNIFIED IDEOGRAPH-58F1
58F9 ; 1.0 # Lo CJK UNIFIED IDEOGRAPH-58F9
5F0C ; 1.0 # Lo CJK UNIFIED IDEOGRAPH-5F0C
FF11 ; 1.0 # Nd FULLWIDTH DIGIT ONE
10107 ; 1.0 # No AEGEAN NUMBER ONE
10320 ; 1.0 # No OLD ITALIC NUMERAL ONE
104A1 ; 1.0 # Nd OSMANYA DIGIT ONE
1D7CF ; 1.0 # Nd MATHEMATICAL BOLD DIGIT ONE
1D7D9 ; 1.0 # Nd MATHEMATICAL DOUBLE-STRUCK DIGIT ONE
1D7E3 ; 1.0 # Nd MATHEMATICAL SANS-SERIF DIGIT ONE
1D7ED ; 1.0 # Nd MATHEMATICAL SANS-SERIF BOLD DIGIT ONE
1D7F7 ; 1.0 # Nd MATHEMATICAL MONOSPACE DIGIT ONE
# Total code points: 51

But currently, XSLT only knows how to handle the value 0x31; see http://lxr.mozilla.org/seamonkey/source/extensions/transformiix/source/xslt/txXSLTNumberCounters.cpp

To fix this:

1. Make the txDecimalCounter constructor take a Unicode code point as the base character for '1'.
2. Change the implementation of void txDecimalCounter::appendNumber(PRInt32 aNumber, nsAString& aDest) to convert the number into decimal digits based on the digit-one Unicode code point that was passed in.
3. Change the switch statement in txFormattedCounter::getCounterFor to treat the other 50 characters as decimal as well.

Reproducible: Always
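A minimal sketch of what step 2 could look like, written against the C++ standard library rather than the real transformiix classes. The function name FormatDecimal, its parameters, and the use of std::u32string are assumptions made for illustration; they are not the actual txDecimalCounter API. The sketch also assumes the script's digits 0 through 9 occupy ten consecutive code points, so that zero is the digit-one code point minus 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for txDecimalCounter::appendNumber.
// aOneDigit is the code point of the script's digit ONE (e.g. 0x0E51 for Thai).
// aMinLength is the length of the format token (e.g. 2 for "01").
// Assumes digits 0..9 are contiguous, with zero at aOneDigit - 1.
std::u32string FormatDecimal(int32_t aNumber, char32_t aOneDigit,
                             std::size_t aMinLength) {
    assert(aNumber >= 0);
    const char32_t zero = aOneDigit - 1;
    std::u32string digits;
    uint32_t n = static_cast<uint32_t>(aNumber);
    // Emit digits least-significant first, then reverse.
    do {
        digits.push_back(static_cast<char32_t>(zero + n % 10));
        n /= 10;
    } while (n > 0);
    // Pad with the script's zero up to the format token length.
    while (digits.size() < aMinLength) {
        digits.push_back(zero);
    }
    std::reverse(digits.begin(), digits.end());
    return digits;
}
```

With this shape, FormatDecimal(12, 0x0E51, 1) would produce the two Thai digits U+0E51 U+0E52, and the existing ASCII behaviour is the special case aOneDigit = 0x31.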
Comment 1•19 years ago (Reporter)
In particular, the example given in the XSLT spec, format="๑" ("specifies numbering with Thai digits"), does not work.
Comment 2•19 years ago (Reporter)
I don't recommend using the decimal rule for characters outside the Nd category. To test it, take your normal <xsl:number> test case and change format="1" to format="๑" or format="١" or format="১", etc.
Comment 3•19 years ago (Reporter)
cc smontagu@smontagu.org
Comment 4•19 years ago (Reporter)
Also see bug 284395.
Comment 5•19 years ago (Reporter)
Also see http://www.w3.org/TR/2002/WD-css3-lists-20021107/ for more info
Depends on: 284420
Comment 6
Looking at the link mentioned in comment 0, this seems non-trivial. The problem is that apparently not all of these have their digits in order in the Unicode table. For example, SUPERSCRIPT ONE is 00B9, but SUPERSCRIPT ZERO is 2070 and SUPERSCRIPT TWO is 00B2. Also, are all of these base 10?

Anyhow, my point is that we shouldn't just hack this into transformiix. This needs support from intl. I filed bug 284420 on that.

Also, I think the XSLT spec is wrong here:

# and the Unicode value of preceding characters is one less than the Unicode
# value of the last character

This doesn't seem to work for the superscript numbers mentioned above. Or am I misunderstanding the term "Unicode value"?
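To make the objection concrete, here is a small sketch of the quoted rule read literally: every character before the last must have a code point exactly one less than the last character. The function name PrecedingAreOneLess is made up for illustration and is not part of transformiix. It deliberately checks only the "one less" half of the rule, not the "decimal digit value of 1" half.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: literal reading of the spec's "preceding characters
// are one less than the last character" condition on a format token.
bool PrecedingAreOneLess(const std::u32string& aToken) {
    if (aToken.empty()) {
        return false;
    }
    const char32_t last = aToken.back();
    for (std::size_t i = 0; i + 1 < aToken.size(); ++i) {
        if (aToken[i] != last - 1) {
            return false;
        }
    }
    return true;
}
```

Under this reading the token "01" (U+0030 U+0031) passes, while SUPERSCRIPT ZERO followed by SUPERSCRIPT ONE (U+2070 U+00B9) fails, because 2070 is not 00B9 minus 1.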
Depends on: 284395
Comment 7•19 years ago
I understand "a decimal digit value of 1" to mean that the character should have the General Category "Nd" and "1" in field 6. That includes 26 characters, if my grepping is correct:

0031;DIGIT ONE
0661;ARABIC-INDIC DIGIT ONE
06F1;EXTENDED ARABIC-INDIC DIGIT ONE
0967;DEVANAGARI DIGIT ONE
09E7;BENGALI DIGIT ONE
0A67;GURMUKHI DIGIT ONE
0AE7;GUJARATI DIGIT ONE
0B67;ORIYA DIGIT ONE
0BE7;TAMIL DIGIT ONE
0C67;TELUGU DIGIT ONE
0CE7;KANNADA DIGIT ONE
0D67;MALAYALAM DIGIT ONE
0E51;THAI DIGIT ONE
0ED1;LAO DIGIT ONE
0F21;TIBETAN DIGIT ONE
1041;MYANMAR DIGIT ONE
17E1;KHMER DIGIT ONE
1811;MONGOLIAN DIGIT ONE
1947;LIMBU DIGIT ONE
FF11;FULLWIDTH DIGIT ONE
104A1;OSMANYA DIGIT ONE
1D7CF;MATHEMATICAL BOLD DIGIT ONE
1D7D9;MATHEMATICAL DOUBLE-STRUCK DIGIT ONE
1D7E3;MATHEMATICAL SANS-SERIF DIGIT ONE
1D7ED;MATHEMATICAL SANS-SERIF BOLD DIGIT ONE
1D7F7;MATHEMATICAL MONOSPACE DIGIT ONE
Comment 8•19 years ago
(In reply to comment #7)
Simon: I think you are right, except for

0BE7 ; 1.0 # Nd TAMIL DIGIT ONE

TAMIL DIGIT ZERO does not exist in Unicode. I think it is a good idea to refactor the number generation part into intl.
Comment 9
Well, if it's just those chars, and they are all consecutive unichar values, and they are all base 10, then it should be fine to do in transformiix. Though it would be nice to have a function like:

PRBool IsZeroDigit(PRUnichar c)

Though I'm not sure how feasible it is to handle non-BMP characters. I don't know how much support Mozilla has for that in general. Are there string iterators that can iterate UTF-16 strings and expose decoded unichar values? Do we even have a datatype like PRUnichar that is 32-bit?
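A rough sketch of the proposed IsZeroDigit helper, assuming a 32-bit code point argument so that non-BMP digits can be covered. It uses standard C++ types (bool, uint32_t) instead of PRBool and PRUnichar, and the table is simply the Nd "digit one" list from comment 7 with 1 subtracted from each entry. Tamil is left out because, per comment 8, TAMIL DIGIT ZERO does not exist in Unicode 4.0. None of this is existing Mozilla code.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Hypothetical helper: true if aCodePoint is the zero of a decimal digit
// range whose ONE has General Category Nd (Unicode 4.0, Tamil excluded).
bool IsZeroDigit(uint32_t aCodePoint) {
    // Must stay sorted for std::binary_search.
    static const uint32_t kZeroDigits[] = {
        0x0030,  0x0660,  0x06F0,  0x0966,  0x09E6,  0x0A66,  0x0AE6,
        0x0B66,  0x0C66,  0x0CE6,  0x0D66,  0x0E50,  0x0ED0,  0x0F20,
        0x1040,  0x17E0,  0x1810,  0x1946,  0xFF10,  0x104A0, 0x1D7CE,
        0x1D7D8, 0x1D7E2, 0x1D7EC, 0x1D7F6,
    };
    return std::binary_search(std::begin(kZeroDigits), std::end(kZeroDigits),
                              aCodePoint);
}
```

A caller could then accept a format token whose last character is c when IsZeroDigit(c - 1) holds, and pass c on as the base digit.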
Comment 10
Btw, how do you write 10 or 101 in Tamil numbers if you don't have a zero digit?
Comment 11•19 years ago
(In reply to comment #10)
> Btw, how do you write 10 or 101 in tamil numbers if you don't have a zero digit?

They do have

TAMIL NUMBER TEN U+0BF0
TAMIL NUMBER ONE HUNDRED U+0BF1
TAMIL NUMBER ONE THOUSAND U+0BF2

How they are used is not clear. I suggest you stay away from it for now.
Comment 12•19 years ago
(In reply to comment #11)
> (In reply to comment #10)
> > Btw, how do you write 10 or 101 in tamil numbers if you don't have a zero digit?
>
> They do have
> TAMIL NUMBER TEN U+0BF0
> TAMIL NUMBER ONE HUNDRED U+0BF1
> TAMIL NUMBER ONE THOUSAND U+0BF2
>
> How they been used is not clear. I suggest you stay away from it for now.

If you REALLY REALLY care about the Tamil number system, read http://weblogs.asp.net/michkap/archive/2005/01/24/359347.aspx

"It is an additive and positional system (unlike Roman numerals, there is no subtraction involved) that has no zero but includes characters for 10, 100, and 1000. In the traditional system the number 3,782 would be represented as ௩௲௭௱௮௰௨ (literally Three-Thousand(s)-Seven-Hundread(s)-Eight-Ten(s)-Two). At least since the early 1800s, however, usage of the Tamil numerals as digits has been more and more common. Thus the number 3,782 would often be represented as ௩௭௮௨ (literally 3782)."
Comment 13•19 years ago
(In reply to comment #8)
> Simon: I think you are are right except
> 0BE7 ; 1.0 # Nd TAMIL DIGIT ONE
> TAMIL DIGIT ZERO does not exist in Unicode.

It's being added (at 0BE6) in Unicode 4.1, due to be released this month.
Comment 14•19 years ago
(In reply to comment #9)
> PRBool IsZeroDigit(PRUnichar c)
>
> Though i'm not sure how feasable it is to do non-bmp-0 characters.

|PRBool IsZeroDigit(PRUint32 c)| would be better.

> I don't know how much support mozilla has for that in general.
> Are there stringiterators that
> can iterate UTF-16 strings and expose decoded unichar values?

Currently, it's done 'manually' (checking whether the current 'char' is a surrogate or not, etc.) as necessary in a few places. It might be a good idea to add such an iterator to 'nsAString'.

> Do we even have a datatype like PRUnichar that is 32bit?

PRUint32 :-) We don't have UCS4 string classes (and perhaps we never will), except that we have ns(Value)Array.

Adding the editor of the XSLT to Cc
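The 'manual' surrogate check described above might look roughly like the following. This is a sketch over std::u16string, not nsAString, and NextCodePoint is a hypothetical name; it assumes the caller guarantees aIndex is in range. An unpaired surrogate is returned unchanged rather than treated as an error.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode the code point starting at aIndex in a UTF-16
// string and advance aIndex past it (by 1 or 2 code units).
// Precondition: aIndex < aStr.size().
uint32_t NextCodePoint(const std::u16string& aStr, std::size_t& aIndex) {
    const char16_t lead = aStr[aIndex++];
    const bool isLead = lead >= 0xD800 && lead <= 0xDBFF;
    if (isLead && aIndex < aStr.size()) {
        const char16_t trail = aStr[aIndex];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            ++aIndex;
            // Combine the two 10-bit halves and add the non-BMP offset.
            return 0x10000 + ((static_cast<uint32_t>(lead) - 0xD800) << 10) +
                   (static_cast<uint32_t>(trail) - 0xDC00);
        }
    }
    // BMP character, or an unpaired surrogate passed through as-is.
    return lead;
}
```

Decoding to a 32-bit value this way is what would let a format token such as MATHEMATICAL BOLD DIGIT ONE (U+1D7CF, stored as the pair D835 DFCF) be compared against a table of 32-bit code points.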
Updated•15 years ago
QA Contact: keith → xslt
Updated•2 years ago
Severity: normal → S3