Closed Bug 467347 Opened 16 years ago Closed 16 years ago

Fix Plural Rule #12, Arabic, to have 6 forms

Categories

(Core :: Internationalization: Localization, defect, P2)

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: linostar, Assigned: Mardak)

Details

(Keywords: fixed1.9.1)

Attachments

(1 file, 1 obsolete file)

The current plural rule is incorrect: it does not handle 0 or numbers greater than 100 correctly.

Since Arabic is the only language that uses this rule, please change it to the following:

nplurals=6; plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 && n%100<=99 ? 4 : 5;


Thanks
Edward, do you remember what happens if the strings in a localization don't match the pluralization rule? In particular, if the number of plurals doesn't match?
If there aren't enough plural forms in the localized string, it'll default to the first item.
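
The fallback described above can be sketched as follows. This is only an illustration, not the actual PluralForm code: `pickPluralString` is a hypothetical helper, and it assumes the localized string is a semicolon-separated list like the `1;2;3-10;others` layout used later in this bug.

```javascript
// Hypothetical helper for illustration; not the real PluralForm implementation.
// `index` is the plural form index returned by the rule function,
// `localized` is the semicolon-separated string from the localization.
function pickPluralString(index, localized) {
  const forms = localized.split(";");
  // Too few forms in the localized string: fall back to the first item.
  return index < forms.length ? forms[index] : forms[0];
}
```

So a 4-form string used with a 6-form rule would still return something for indexes 4 and 5, just not the right form.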

Is backwards compatibility much of an issue when changing the plural form function? Ideally it should never happen, so we might as well change the function and update all the strings?
The current Arabic rule #12 came from bug 413670 comment 8 with link:
http://www.eglug.org/arabize#comment-361

Perhaps just to be safe, we can create a new number just like we did for Irish Gaelic vs Scottish Gaelic.
The current form is 

[4, function(n) n==1?0:n==2?1:n<=10?2:3]

If we pick something compatible (which the version suggested in comment 0 wouldn't be), I guess that fixing the existing function and keeping it as rule 12 should be fine, without too many backwards-compatibility hassles. That is, provided we can write the function so that numbers less than 60 keep the same plural form.

And then we need a follow-up patch for the actual localization work.
Re comment 3, I looked at that link, but the actual functions didn't match. The Arabic po file linked in the bug is a 404 now, so I couldn't cross-check against that.

I think the plural form should be compatible enough, and it isn't shared with anything out there, so I'd rather not add a new number. We already have a few unused plural forms ;-)
(In reply to comment #0)
> nplurals=6; plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 :
> n%100>=11 && n%100<=99 ? 4 : 5;

(In reply to comment #4)
> [4, function(n) n==1?0:n==2?1:n<=10?2:3]
So you would prefer something that is more backwards compatible:

[6, function(n) n==0?4:n==1?0:n==2?1:n%100>=3&&n%100<=10?2:n%100>=11&&n%100<=99?3:5]

Where the current localized string would be..
1;2;3-10;others

now becomes..
1;2;x03-x10;x11-x99;0;others
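
To see how far the backwards compatibility goes, here is a rough sketch comparing the current rule with the one proposed in this comment. The names `oldRule12`, `v1Rule12` and `differingCounts` are made up for the example, and the functions are written in standard JavaScript rather than the expression-closure syntax quoted above.

```javascript
// Current rule #12 (4 forms), as quoted from comment 4.
const oldRule12 = (n) => (n == 1 ? 0 : n == 2 ? 1 : n <= 10 ? 2 : 3);

// Proposed 6-form rule from this comment (0 moved to index 4).
const v1Rule12 = (n) =>
  n == 0 ? 4
  : n == 1 ? 0
  : n == 2 ? 1
  : n % 100 >= 3 && n % 100 <= 10 ? 2
  : n % 100 >= 11 && n % 100 <= 99 ? 3
  : 5;

// Collect every count below `limit` where the two rules pick different indexes.
function differingCounts(limit) {
  const out = [];
  for (let n = 0; n < limit; n++) {
    if (oldRule12(n) !== v1Rule12(n)) out.push(n);
  }
  return out;
}
```

With these definitions, `differingCounts(100)` contains only 0, so existing strings would keep working for 1 through 99. The rules start to diverge again at 100, where the old rule always picks "others".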
Summary: Request for changing Plural Rule #12 → Fix Plural Rule #12, Arabic, to have 6 forms
Attached patch v1 (obsolete)
Make somewhat backwards compatible with existing 1;2;3-10;others
Assignee: nobody → edilee
Status: NEW → ASSIGNED
Attachment #350814 - Flags: review?(l10n)
Comment on attachment 350814
v1

r?Anas just to double check that only x00,x01,x02 use the last plural form.

I wonder if we should make 11 <= val <=99 case the "others" case and specially check 0 <= val <= 2 after checking for 0 first.
Attachment #350814 - Flags: review?(linux.anas)
(In reply to comment #8)
> (From update of attachment 350814)
> r?Anas just to double check that only x00,x01,x02 use the last plural form.
> 
> I wonder if we should make 11 <= val <=99 case the "others" case and specially
> check 0 <= val <= 2 after checking for 0 first.

No, the last rule is used by all numbers that are greater or equal to 100.

In fact, there are two methods of reading Arabic numbers. The first is from left to right (as in English), in which case numbers of the form xyy have the same plural form as yy (that's what the current formula should do, but it doesn't). The second method is reading numbers from right to left (units, then tens, then hundreds, etc.), so all numbers of the form xyy have the same plural form as x00.

As for backward compatibility, I think it won't be a problem if I fix all the plural form messages at once.
Wait, so the formula that you gave in the original post isn't correct? So it should then be..

n==0 ? A : n==1 ? B : n==2 ? C : n<=10 ? D : n<= 99 ? E : F

The matching testcase would look like..

// 0-9, 10-19, 20-29, 30-39, ...
A,B,C,D,D,D,D,D,D,D,
D,E,E,E,E,E,E,E,E,E,
E,E,E,E,E,E,E,E,E,E,
E,E,E,E,E,E,E,E,E,E,
...
// 100+
F,F,F,F,F,F,F,F,F,F,
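
For reference, the rows above can be generated from the formula in this comment with a small sketch like the following. `comment10Form` and `comment10Row` are names invented for the example; the letters A to F are the same placeholders as in the table.

```javascript
// The non-modulo formula from this comment, returning the placeholder letters.
function comment10Form(n) {
  return n == 0 ? "A" : n == 1 ? "B" : n == 2 ? "C" : n <= 10 ? "D" : n <= 99 ? "E" : "F";
}

// Build one row of ten consecutive counts starting at `start`.
function comment10Row(start) {
  return Array.from({ length: 10 }, (_, i) => comment10Form(start + i)).join(",");
}
```

`comment10Row(0)` gives `A,B,C,D,D,D,D,D,D,D` and `comment10Row(100)` gives `F,F,F,F,F,F,F,F,F,F`, matching the first and last rows of the testcase.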
Sorry, I should clarify myself better than that (never post a comment when you're hungry :p).

The new formula (mentioned in the first post) is a *fix* for the old formula which is currently used. As I said, there are two methods to read Arabic numbers, but the first one is the most common. Thus, fixing the first is better than implementing the second. Based on that, I will re-answer your question in comment 8:

I think that changing the order may confuse the localizers (since the new formula is widely used now, e.g. in Gnome, KDE, OpenOffice). However, if it's only a temporary change until all plural messages get fixed (for backward compatibility only), it may not hurt much.
I'm still confused.

(In reply to comment #11)
> The new formula is a *fix* for the old formula which is currently used.
(In reply to comment #9)
> No, the last rule is used by all numbers that are greater or equal to 100.

Those two statements contradict because n%100>=3 && n%100<=10 matches 3-10 AND 103-110 AND 203-210, etc.
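
A quick sketch of that point, with `inFewRange` and `fewMatches` being names made up for the example:

```javascript
// The "3-10" condition from the comment 0 formula, taken on its own.
const inFewRange = (n) => n % 100 >= 3 && n % 100 <= 10;

// All of these pass the check, including the ones above 100.
const fewMatches = [3, 10, 103, 110, 203, 210].filter(inFewRange);
```

Every value in that list passes the filter, so with the modulo formula the last plural form cannot be the one used by all numbers >= 100.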

The revised formula in comment 10 would implement what you said in comment 9.
Forget about comment #9. There I explained the second method of reading Arabic numbers, and I answered as if the new formula represented the second method, which is not the case. It is true that the formula in comment 10 correctly represents what I said in comment 9, but this method is not used in l10n (it is less common in Arabic). So go back to the formula in the original post, which I described in comment 11.

It's my mistake. Sorry for confusing you.
Let's be clear here. Reading or writing order of arabic numbers is, I guess, totally irrelevant. The file in which this formula is encoded is in ASCII, thus numbers are encoded in LTR.

Unless the plural form doesn't actually depend on the physical amount of items, but on the, say, last digit in reading order. I'm not excluding being surprised, but I'd be heck surprised.
Edward, the proper order of the formula elements should be like the following:

[6, function(n) n==0?0:n==1?1:n==2?2:n%100>=3&&n%100<=10?3:n%100>=11&&n%100<=99?4:5]
"Proper order" is totally arbitrary, but some orders might lead to less confusion. The order I put in the patch is to maintain a little bit of backwards compatibility with the current order per Axel's suggestion. But yes, we could twiddle the numbers around and get it in some "right order" if you think it should be that way.
To preserve backward compatibility and reduce confusion for localizers, a friend suggested moving the n==0 case to the end of the rule, making it the 6th form. That way, all numbers will be in natural order except for 0.

The formula will become:
n==0?5:n==1?0:n==2?1:n%100>=3&&n%100<=10?2:n%100>=11&&n%100<=99?3:4

I hope this does not conflict with backwards compatibility.
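
As a sanity check, here is a sketch of that formula together with the string layout it implies. `arabicRule12Final`, `finalSlotLabels` and `describeFinal` are names invented for the example, and the slot labels are just placeholders derived from the formula, following the `x03-x10` notation used earlier in this bug.

```javascript
// The formula from this comment: 0 moves to the last slot (index 5).
function arabicRule12Final(n) {
  return n == 0 ? 5
    : n == 1 ? 0
    : n == 2 ? 1
    : n % 100 >= 3 && n % 100 <= 10 ? 2
    : n % 100 >= 11 && n % 100 <= 99 ? 3
    : 4;
}

// Placeholder labels in the slot order a localized string would follow.
const finalSlotLabels = "1;2;x03-x10;x11-x99;others;0".split(";");

// Map a count to the label of the slot it selects.
const describeFinal = (n) => finalSlotLabels[arabicRule12Final(n)];
```

For example, `describeFinal(105)` returns `x03-x10`, `describeFinal(100)` returns `others`, and `describeFinal(0)` returns `0`. The first four slots line up with the existing `1;2;3-10;others` order for counts 1 through 99.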
Comment on attachment 350814
v1

The formula should be as follows:
n==0?5:n==1?0:n==2?1:n%100>=3&&n%100<=10?2:n%100>=11&&n%100<=99?3:4

so that the case where n==0 becomes the last case.
Attachment #350814 - Flags: review?(linux.anas) → review-
Attachment #350814 - Flags: review?(l10n) → review+
Comment on attachment 350814
v1

Now let's get funky, I'll give you an r+ with comments. I think that Anas' plural function makes sense, let's go with that. r=me with that change (and the corresponding changes to the test, I assume).
Attached patch v1.1
Update the Arabic form and add tests to check each plural form of each plural rule.
Attachment #350814 - Attachment is obsolete: true
Attachment #353442 - Flags: review?(smontagu)
Attachment #353442 - Flags: review?(smontagu) → review+
Comment on attachment 353442
v1.1

(In reply to comment #14)
> Let's be clear here. Reading or writing order of arabic numbers is, I guess,
> totally irrelevant. The file in which this formula is encoded is in ASCII, thus
> numbers are encoded in LTR.
> 
> Unless the plural form doesn't actually depend on the physical amount of items,
> but on the, say, last digit in reading order. I'm not excluding being
> surprised, but I'd be heck surprised.

Just FYI: "reading order" here doesn't mean what it usually means when talking about Bidi issues. In Arabic a number like 1959 is written and read from left to right as *digits*, but as *words* it's read (in Modern Arabic) "one thousand and nine hundred and nine and fifty", and the last number read is what determines the plural.
http://hg.mozilla.org/mozilla-central/rev/af64ac164e44

Updated the wiki:
https://developer.mozilla.org/en/Localization_and_Plurals#Plural_rule_.2312_(6_forms)
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.2a1
Attachment #353442 - Flags: approval1.9.1?
Comment on attachment 353442
v1.1

Pike, we'll probably want this for 3.1?
Yes, definitely.
Requesting blocking1.9.1, to get on that triage list.
Flags: blocking1.9.1?
Comment on attachment 353442
v1.1

a191=beltzner
Attachment #353442 - Flags: approval1.9.1? → approval1.9.1+
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P2