Closed Bug 168944 Opened 22 years ago Closed 16 years ago

hebrew numbers 5 and 5000 written the same

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking


RESOLVED FIXED

People

(Reporter: ian, Assigned: mkaply)


Details

(Keywords: testcase)

Attachments

(1 file)

I don't know if this is a bug.

Our list-style-type:hebrew numbering system gives the same output for the
numbers five (5) and five thousand (5000), as shown in:

   http://www.hixie.ch/tests/adhoc/css/box/list/list-style-type/001.xml

Is this the right behaviour? From a western point of view it seems wrong, but I
can't find anyone who says that the numbers 5 and 5000 are different in
Hebrew. How do you write "look at page 5000, then look at page 5, and..."?
Keywords: testcase
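A minimal Python sketch of how such a collision can arise, assuming a converter that spells the thousands group with the same letters as the units and adds no marker of any kind. This is an illustration only, not the nsBulletFrame.cpp code; `below_1000` and `naive_hebrew` are hypothetical names, and the letter tables are the standard additive values (1..9 are the code points 0x5cf + n).

```python
# Standard additive letter values, written as code points to avoid confusing
# the non-final letter forms with their final (sofit) forms.
ONES = "".join(chr(0x5CF + d) for d in range(1, 10))             # Alef..Tet = 1..9
TENS = "\u05D9\u05DB\u05DC\u05DE\u05E0\u05E1\u05E2\u05E4\u05E6"  # Yod..Tsadi = 10..90
HUNDREDS = "\u05E7\u05E8\u05E9\u05EA"                            # Qof..Tav = 100..400

def below_1000(n: int) -> str:
    """Additive spelling of 0..999 in logical order (largest value first)."""
    letters = []
    while n >= 400:                  # 500..999 start with one or two Tavs
        letters.append(HUNDREDS[3])
        n -= 400
    if n >= 100:
        letters.append(HUNDREDS[n // 100 - 1])
        n %= 100
    if n in (15, 16):                # traditionally written 9+6 and 9+7
        letters.append(ONES[8] + ONES[n - 10])
        return "".join(letters)
    if n >= 10:
        letters.append(TENS[n // 10 - 1])
        n %= 10
    if n:
        letters.append(ONES[n - 1])
    return "".join(letters)

def naive_hebrew(n: int) -> str:
    """Deliberately flawed: the thousands group carries no marker at all."""
    thousands, rest = divmod(n, 1000)
    return below_1000(thousands) + below_1000(rest)

print(naive_hebrew(5) == naive_hebrew(5000))  # True: both are the single letter He
```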
The only reference I could find on Google is
http://www.qsm.co.il/Hebrew/Gimatria.htm but it doesn't have much of an explanation.
That's the document ftang used to implement the algorithm.
The well-structured and well-commented code to generate the list text is here, 
I believe:

http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsBulletFrame.cpp#788
This weakness is inherent in the Hebrew numbering system. I was aware of it when
working on the IBMBIDI parts of the Hebrew numbering code, but thought that we
could live with it because in an actual numbered list, the distinction will
normally be clear from the context of the previous and/or subsequent numbers.

Alternatively, maybe we should bail out to decimal at some upper limit such as
999,999 or even 999.

Based on your comments on IRC, we could go up to at least 999,999 (using the
special "thousands" word). Can you chain thousands to go up to a million? (As
in, one thousand thousand?)

We should definitely fall back to decimal for numbers outside the range of the
numbering system.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I think we may have been taking each other out of context on IRC. Using the
"thousands" word works for an isolated number, but I'm not sure about it as part
of a numbered list.

Firstly, the transition is unnatural, as you can see from the English equivalent:

4,998
4,999
5 thousand
5,001
5,002

Secondly, there are special cases that need to be handled. "Beit
Alef-Lamed-Pe-Yud-Mem Sofit" is a solecism for 2,000 (it should be
"Alef-Lamed-Pe-Yud-Yud-Mem Sofit") and "Alef Alef-Lamed-Pe-Yud-Mem Sofit" for
1,000 is impossible (it should be "Alef-Lamed-Pe Sofit").

With those caveats, chaining is certainly an option. n * 10^6 could be expressed
as "[0x5cf+n] Alef-Lamed-Pe-Yud Alef-Lamed-Pe-Yud-Mem Sofit"; n * 10^9 as
"[0x5cf+n] Alef-Lamed-Pe-Yud Alef-Lamed-Pe-Yud Alef-Lamed-Pe-Yud-Mem Sofit", and
so on.
Sorry, I didn't use the correct Unicode terms. For Mem Sofit and Pe Sofit, read
Final Mem and Final Pe.
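The chaining scheme above can be sketched in Python as follows. `scale_word` is a hypothetical helper: it builds the digit letter with the same 0x5cf+n formula, adds the construct form Alef-Lamed-Pe-Yud once for every extra power of a thousand, and hard-codes only the two special cases named for 1,000 and 2,000. Whether similar exceptions are needed at 10^6 and above is not settled in this thread, so none are applied there.

```python
# Words spelled as code points; Unicode letter names are in the comments.
THOUSAND = "\u05D0\u05DC\u05E3"                        # Alef, Lamed, Final Pe
TWO_THOUSAND = "\u05D0\u05DC\u05E4\u05D9\u05D9\u05DD"  # Alef, Lamed, Pe, Yod, Yod, Final Mem
THOUSANDS = "\u05D0\u05DC\u05E4\u05D9\u05DD"           # Alef, Lamed, Pe, Yod, Final Mem
THOUSANDS_OF = "\u05D0\u05DC\u05E4\u05D9"              # Alef, Lamed, Pe, Yod

def scale_word(n: int, k: int) -> str:
    """Spell n * 10**(3*k) for a single digit n (1..9) and k >= 1."""
    if not (1 <= n <= 9 and k >= 1):
        raise ValueError("n must be 1..9 and k must be >= 1")
    if k == 1 and n == 1:
        return THOUSAND        # "Alef" + THOUSANDS would be impossible
    if k == 1 and n == 2:
        return TWO_THOUSAND    # "Bet" + THOUSANDS would be a solecism
    digit = chr(0x5CF + n)     # 1 -> Alef (U+05D0) ... 9 -> Tet (U+05D8)
    return " ".join([digit] + [THOUSANDS_OF] * (k - 1) + [THOUSANDS])

print(scale_word(1, 2))  # one million: Alef, then the two thousand-words
```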
Sigh. The hebrew numbering system is so ridiculously complicated.

Anyway. Because the list-style numbering systems are going to be used in other
contexts too in CSS3, we need to find an algorithm that does the Right Thing
whether the number is in a list or standing on its own. Is that possible?

Given that lists are not likely to reach 4999, 5000, 5001, I think it is safe to
ignore the problem with 5000 looking out of place there. If the number 5000 is
involved, it is more likely to be reached in large steps (4000, 5000, 6000) or
stand on its own (through the conversion of an attribute value to a number, for
instance, if certain CSS3 proposals happen).
Could the "other contexts" you mention include numbers embedded in text? The
Right Thing in that case is different, I'm afraid. I'll try to work up a mini
white paper which lays out the issues and possible solutions better than I can
do in bugzilla comments.
That would be excellent, thanks.
 
> Our list-style-type:hebrew numbering system gives the same output for the
> numbers five (5) and five thousand (5000), as shown in:
> 
>    http://www.hixie.ch/tests/adhoc/css/box/list/list-style-type/001.xml

In any case, the heh should be separated from the rest of the number with a
geresh; isn't that correct, Simon?

>Sigh. The hebrew numbering system is so ridiculously complicated.

That has always been my opinion of the Roman numbering system (I, II, III, IV, V, etc.).
(In reply to comment #12)
> At any case, the heh should be separated with a geresh from the rest of the
> number, isn't that currect Simon?

Not in my opinion; see the document linked in comment 11:
" When numbers appear in isolation, e.g. as page numbers or as list indexes,
they should be written with the letters alone. If they appear embedded in other
text, punctuation marks are added to clarify that they are numbers and not words."
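One way to express the isolated/embedded distinction quoted above, as a Python sketch. The embedded rule used here (a geresh after a single letter, a gershayim before the last letter of a longer number) is a common convention assumed for illustration rather than something specified in this bug, and `punctuate` is a hypothetical name. It takes letters that have already been converted.

```python
GERESH = "\u05F3"     # HEBREW PUNCTUATION GERESH
GERSHAYIM = "\u05F4"  # HEBREW PUNCTUATION GERSHAYIM

def punctuate(letters: str, embedded: bool) -> str:
    """Add number punctuation only when the letters are embedded in running text."""
    if not embedded or not letters:
        return letters                               # isolated: letters alone
    if len(letters) == 1:
        return letters + GERESH                      # single letter: trailing geresh
    return letters[:-1] + GERSHAYIM + letters[-1]    # gershayim before the last letter
```

A list marker would call `punctuate(letters, embedded=False)` and get the bare letters back; a number inside a sentence would pass `embedded=True`.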
Testcase from http://www.w3.org/TR/css3-lists/#hebrew0.

This testcase shows that currently, Hebrew numbering over 1000 is massively
flawed: not only are 2 and 2000 indistinguishable, numbers like 2001 are
backwards.

Patch coming up.
If you're able to make a patch, does that mean you understand what the algorithm
should be? Because the algorithm in that spec is known to be wrong, although
nobody can tell me exactly how to fix it.
Never mind about posting a patch; this needs more discussion.

What exactly was the complaint with the CSS3 draft's version?  I might be able
to figure out something that takes that into account.  Excepting punctuation and
the fact that numbering over 1000 isn't well-defined, I'm pretty sure there
isn't any better way than what CSS3 Lists says.

Another option is to cap Hebrew numbers at 999.  I think this is viable, since
numbers over 999 are rarely used.

Another option is to use repeated tavs (400 each), which would be unwieldy but correct.

By the way, this bug shouldn't be marked Windows 2000.
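The two options above can be compared with a short Python sketch. Both use the same additive letter values; `hebrew_repeated_tav` keeps adding Tav (400) with no upper bound, while `hebrew_capped` falls back to decimal above a limit. Both function names and the default limit of 999 are hypothetical, and non-positive values fall back to decimal in both.

```python
ONES = "".join(chr(0x5CF + d) for d in range(1, 10))             # Alef..Tet = 1..9
TENS = "\u05D9\u05DB\u05DC\u05DE\u05E0\u05E1\u05E2\u05E4\u05E6"  # Yod..Tsadi = 10..90
HUNDREDS = "\u05E7\u05E8\u05E9\u05EA"                            # Qof..Tav = 100..400

def additive(n: int) -> str:
    """Sum-of-letters spelling; the Tav loop lets it cover any n >= 1."""
    letters = []
    while n >= 400:
        letters.append(HUNDREDS[3])
        n -= 400
    if n >= 100:
        letters.append(HUNDREDS[n // 100 - 1])
        n %= 100
    if n in (15, 16):                # traditionally written 9+6 and 9+7
        letters.append(ONES[8] + ONES[n - 10])
        return "".join(letters)
    if n >= 10:
        letters.append(TENS[n // 10 - 1])
        n %= 10
    if n:
        letters.append(ONES[n - 1])
    return "".join(letters)

def hebrew_repeated_tav(n: int) -> str:
    """Option 1: no cap; large values simply accumulate Tavs."""
    return additive(n) if n >= 1 else str(n)

def hebrew_capped(n: int, limit: int = 999) -> str:
    """Option 2: letters up to `limit`, decimal digits beyond it."""
    return additive(n) if 1 <= n <= limit else str(n)
```

With repeated tavs, 1000 becomes Tav-Tav-Resh (400 + 400 + 200) and 5000 needs twelve Tavs before its Resh, which is distinct from 5 but shows why the option was called unwieldy; with the cap, 1000 is simply printed as "1000".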
We don't want to cap at 999, counters can start at any arbitrary number and
hebrew numbering is possible above 999.

I don't know what the error in the algorithm is. I was just told that the
current text was not correct in all cases.
=> All/All and => Internationalization, since this isn't really a layout issue
(and certainly not a bidi issue). 
Component: Layout: BiDi Hebrew & Arabic → Internationalization
OS: Windows 2000 → All
Hardware: PC → All
Summary: hebrew 5 and 5000 written the same → hebrew numbers 5 and 5000 written the same
Depends on: 413928
Fixed by bug 413928. We now display 5 as ה, ‎5000 as ה׳, and 5000000 as 5000000, all of which is the same as Safari.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
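A Python sketch that reproduces the three outputs listed in the fix comment above. It assumes the thousands group is written with the same letters as 1..999 followed by a geresh, and that anything outside 1..999,999 falls back to decimal; the 999,999 cut-off is taken from the earlier suggestion in this bug, not from bug 413928, whose actual code is not shown here. `hebrew_counter` and `below_1000` are hypothetical names. The string is built in logical order (thousands group first) and right-to-left display is left to the bidi algorithm.

```python
ONES = "".join(chr(0x5CF + d) for d in range(1, 10))             # Alef..Tet = 1..9
TENS = "\u05D9\u05DB\u05DC\u05DE\u05E0\u05E1\u05E2\u05E4\u05E6"  # Yod..Tsadi = 10..90
HUNDREDS = "\u05E7\u05E8\u05E9\u05EA"                            # Qof..Tav = 100..400
GERESH = "\u05F3"
LIMIT = 999_999  # assumed cut-off; beyond it we print decimal digits

def below_1000(n: int) -> str:
    """Additive spelling of 0..999; returns "" for 0."""
    letters = []
    while n >= 400:
        letters.append(HUNDREDS[3])
        n -= 400
    if n >= 100:
        letters.append(HUNDREDS[n // 100 - 1])
        n %= 100
    if n in (15, 16):                # traditionally written 9+6 and 9+7
        letters.append(ONES[8] + ONES[n - 10])
        return "".join(letters)
    if n >= 10:
        letters.append(TENS[n // 10 - 1])
        n %= 10
    if n:
        letters.append(ONES[n - 1])
    return "".join(letters)

def hebrew_counter(n: int) -> str:
    if not 1 <= n <= LIMIT:
        return str(n)                # outside the range: fall back to decimal
    thousands, rest = divmod(n, 1000)
    # The geresh marks the thousands group, so 5000 no longer looks like 5.
    prefix = (below_1000(thousands) + GERESH) if thousands else ""
    return prefix + below_1000(rest)

for value in (5, 5000, 5000000):
    print(value, hebrew_counter(value))
```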