Closed Bug 467672 Opened 11 years ago Closed 11 years ago

need additional bidi.numerals setting for Persian (eastern Arabic-Indic) numerals

Categories

(Core :: Layout: Text and Fonts, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: jfkthame, Assigned: ehsan)

References

Details

(Keywords: fixed1.9.1, user-doc-needed)

Attachments

(2 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20081202 Minefield/3.1b3pre
Build Identifier: 

In addition to the option of displaying numerals as the "Arabic-Indic" characters U+0660..0669, we should have an option that maps them to the "Eastern Arabic-Indic" characters U+06F0..06F9. These are the proper codepoints for use in Persian, as well as other languages such as Urdu.


Reproducible: Always
Yes, we definitely should do that.  In fact, I was going to file this bug!
Assignee: nobody → ehsan.akhgari
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Version: unspecified → Trunk
I think perhaps the way to do this is to have a separate pref for the Unicode value of zero in local numerals, set to 1632 (0x660) by default.
Version: Trunk → unspecified
(In reply to comment #2)
> I think perhaps the way to do this is to have a separate pref for the Unicode
> value of zero in local numerals, set to 1632 (0x660) by default.

So, setting this pref to a random value could produce random glyphs instead of digits?

I was thinking about having bidi.numeral.type, its default value being 1 mapping to U+0660, and its 2 value mapping to U+06F0.  Or we could have a bool pref bidi.numeral.persian to be false by default for Arabic-Indic and true for Eastern Arabic-Indic digits, in case we don't expect to add support for another digits range in future (is there even another digits range in Unicode?)
Version: unspecified → Trunk
(In reply to comment #3)
> So, setting this pref to a random value could produce random glyphs instead of
> digits?

There are two things I don't like about my idea. This is one of them, and the other is having to use decimal for Unicode values. 

> 
> I was thinking about having bidi.numeral.type, its default value being 1
> mapping to U+0660, and its 2 value mapping to U+06F0.

The downside of this is that it's much less obvious what the values mean, but all the other bidi prefs have that disadvantage already anyway, so I guess that's the way to go.

>  Or we could have a bool
> pref bidi.numeral.persian to be false by default for Arabic-Indic and true for
> Eastern Arabic-Indic digits, in case we don't expect to add support for another
> digits range in future (is there even another digits range in Unicode?)

There are 37 ranges of decimal digits in Unicode. That includes things like MATHEMATICAL SANS-SERIF BOLD, and I'm not saying we should support all of them with this pref, but we should probably support at least the Indian (i.e. DEVANAGARI, BENGALI ,GURMUKHI etc, not Arabic-Indic) and South-East Asian ones that are supported by the regional options in Windows Control Panel.
(In reply to comment #4)
> The downside of this is that it's much less obvious what the values mean, but
> all the other bidi prefs have that disadvantage already anyway, so I guess
> that's the way to go.

OK.

> There are 37 ranges of decimal digits in Unicode. That includes things like
> MATHEMATICAL SANS-SERIF BOLD, and I'm not saying we should support all of them
> with this pref, but we should probably support at least the Indian (i.e.
> DEVANAGARI, BENGALI ,GURMUKHI etc, not Arabic-Indic) and South-East Asian ones
> that are supported by the regional options in Windows Control Panel.

Hmm, are these relevant with bidi.numeral?  In bidi.numeral we do context detection based on Arabic letters, and I guess these would need other forms of context detection, hence other prefs.  

Let me rephrase my question: are there other Unicode digits ranges which are relevant to the bidi.numeral pref values?
A followup question that occurs to me: Should "bidi.numerals = IBMBIDI_NUMERAL_PERSIAN" (or whatever value that ends up being called) mean that not only ASCII digits 0030..0039 get mapped to 06Fx, but also map the standard Arabic-Indic digits from 066x to their Persian counterparts at 06Fx? Similarly, should the 06Fx digits be mapped back to 066x when "bidi.numerals = IBMBIDI_NUMERAL_HINDI" is in effect? Or should we not attempt to map between the two sets of digits from the 06xx Arabic block, only between ASCII and the one selected column of 06xx digits?

If we do extend bidi.numerals with new values for Persian, do we want to maintain the existing settings unchanged, and append IBMBIDI_NUMERAL_PERSIANCONTEXT=5 and IBMBIDI_NUMERAL_PERSIAN=6 or something like that? This gives a rather peculiar sequence of settings, but is better for backward compatibility.

Or should we say that as this feature hasn't been working anyway, backward compatibility with existing profile settings doesn't matter much and we can reorganize the prefs in a more logical sequence, or perhaps split into two settings: "gfx.text.numerals.mode" (3 settings: no change/contextual/force) and "gfx.text.numerals.script" (an integer representing a particular script/language-specific digit set, or "....locale" if we want to make it a little less cryptic, and go to the trouble of mapping locales to digit sets.) This would allow us to easily extend this feature to the non-bidi scripts as well.

Come to think of it, we should have the locale-digits information somewhere already if we do locale-specific number formatting....
(In reply to comment #6)
> A followup question that occurs to me: Should "bidi.numerals =
> IBMBIDI_NUMERAL_PERSIAN" (or whatever value that ends up being called) mean
> that not only ASCII digits 0030..0039 get mapped to 06Fx, but also map the
> standard Arabic-Indic digits from 066x to their Persian counterparts at 06Fx?
> Similarly, should the 06Fx digits be mapped back to 066x when "bidi.numerals =
> IBMBIDI_NUMERAL_HINDI" is in effect? Or should we not attempt to map between
> the two sets of digits from the 06xx Arabic block, only between ASCII and the
> one selected column of 06xx digits?

I gave this a lot of thought.  Initially I was thinking that we only want conversion between ASCII and either one of 066x or 06Fx.  But given more thought, the main purpose of these prefs are for users to see numbers in their native language shapes.  So it's worthwhile to try to show the correct glyphs.

It is very common for Persian speaking users to encounter Persian web pages which mistakenly use Arabic digit forms, so this will be especially useful to Persian users of Mozilla.

> If we do extend bidi.numerals with new values for Persian, do we want to
> maintain the existing settings unchanged, and append
> IBMBIDI_NUMERAL_PERSIANCONTEXT=5 and IBMBIDI_NUMERAL_PERSIAN=6 or something
> like that? This gives a rather peculiar sequence of settings, but is better for
> backward compatibility.

I think for the purpose of this bug, I'll go with a boolean bidi.numeral.persian pref, which defaults to false.  This shouldn't happen the default behavior (because bidi.numeral is 0 by default, so bidi.numeral.persian is ignored) and should be toggled to true for Persian users, and left false for Arabic users.

If we decide to support other Unicode digits ranges in the future, they'd probably need their own prefs anyway, since they don't use the context detection in place for bidi.numeral.

> Or should we say that as this feature hasn't been working anyway, backward
> compatibility with existing profile settings doesn't matter much and we can
> reorganize the prefs in a more logical sequence, or perhaps split into two
> settings: "gfx.text.numerals.mode" (3 settings: no change/contextual/force) and
> "gfx.text.numerals.script" (an integer representing a particular
> script/language-specific digit set, or "....locale" if we want to make it a
> little less cryptic, and go to the trouble of mapping locales to digit sets.)
> This would allow us to easily extend this feature to the non-bidi scripts as
> well.

Organizing the prefs this way would make things much more sane, but we can't really ignore backward compatibility, since these prefs were working on 1.8, and broke in 1.9, so a lot of users have them set in their configs since 1.8 days.
Depends on: 441782
Attached patch WIP Patch (obsolete) — Splinter Review
I was thinking this should be fairly straightforward, but I don't know what's wrong, as this fails in strange ways...

I thought I'd upload a small WIP patch to see if another pair of eyes can see what's wrong in the code.
Attached patch Patch (v1) (obsolete) — Splinter Review
As it turns out, the problem I was observing was caused by a compilation mistake on my part.  This patch is cleaned up and ready for review.  All unit tests pass (I made sure they also pass on Windows with cleartype disabled).
Attachment #353989 - Attachment is obsolete: true
Attachment #357561 - Flags: superreview?(roc)
Attachment #357561 - Flags: review?(roc)
Attachment #357561 - Flags: superreview?(roc)
Attachment #357561 - Flags: superreview+
Attachment #357561 - Flags: review?(roc)
Attachment #357561 - Flags: review?(jfkthame)
Attachment #357561 - Flags: review+
Comment on attachment 357561 [details] [diff] [review]
Patch (v1)

+                                        !!(i>0 ? IS_ARABIC_CHAR(aBuffer[i-1]) : PR_FALSE),

Make this i > 0 && IS_ARABIC_CHAR(aBuffer[i-1])

Jonathan should also take a look at this.
A couple of thoughts on this:

(1) I wonder if it would be preferable to avoid creating the extra mBidiNumeralPersian instance variable, by instead assigning additional values to mBidiNumeral. (This could still be exposed as a separate boolean preference, if that's the best way to present it to users.)

ISTM this would avoid passing an additional parameter to HandleNumberInChar all over the place, and also avoid adding the extra if/then/else test in there; all the testing would be done directly by the switch instead.

In nsBidiPresUtils::FormatUnicodeText, you'd then use the persianNumerals flag directly to decide whether to pass IBMBIDI_NUMERAL_HINDI or IBMBIDI_NUMERAL_PERSIAN, instead of passing it on down to HandleNumberInChar; this also means you wouldn't be passing it at all in the IBMBIDI_NUMERAL_ARABIC case, where it is irrelevant anyway.

IMO, this would make for a tidier (and marginally more efficient, though I expect that's insignificant) implementation.

(2) The naming of constants in nsBidiUtils.h seems a bit unfortunate: we have

 #define IBMBIDI_NUMERAL_NOMINAL       0 //  0 = nominalnumeralBidi *
 #define IBMBIDI_NUMERAL_REGULAR       1 //  1 = regularcontextnumeralBidi
 #define IBMBIDI_NUMERAL_HINDICONTEXT  2 //  2 = hindicontextnumeralBidi
 #define IBMBIDI_NUMERAL_ARABIC        3 //  3 = arabicnumeralBidi
 #define IBMBIDI_NUMERAL_HINDI         4 //  4 = hindinumeralBidi

which are values that are used for mBidiNumeral, to control the behavior, but now there is the new

+#define IBMBIDI_NUMERAL_PERSIAN     7

which sounds like another value in the same set, but is actually unrelated, it's some kind of index of the preference, along with

 #define IBMBIDI_TEXTDIRECTION       1
 #define IBMBIDI_TEXTTYPE            2
 #define IBMBIDI_CONTROLSTEXTMODE    3
 #define IBMBIDI_NUMERAL             4
 #define IBMBIDI_SUPPORTMODE         5
 #define IBMBIDI_CHARSET             6

If we do (1) above, then we'd want IBMBIDI_NUMERAL_PERSIAN and presumably IBMBIDI_NUMERAL_PERSIANCONTEXT as mBidiNumeral values, but even if we don't do that, I'd like to find a less confusing approach to naming these constants. Maybe add _PREFERENCE, or something? I don't see an obvious solution that stays consistent with the existing names, yet avoids the current ambiguity; one possibility would be to add a prefix or suffix to the whole set, but that involves more code churn, which might not be desirable.

WDYT?
Regarding the first suggestion, in that case we would want to probably add two more values to bidi.numeral: one for "Persian digits all the time" and one for "Persian digits for Persian text", namely, IBMBIDI_NUMERAL_PERSIAN and IBMBIDI_NUMERAL_PERSIANCONTEXT values.  Then, it won't make much sense to expose this as a separate pref.  Of course I believe that a bidi.numeral.persian would be much more understandable, but anyway this shouldn't be a pref which users want to change all the time, so we'll probably be OK by expanding the values of bidi.numeral itself.

Regarding 2 above, if we don't do the first suggestion, then I think IBMBIDI_PERSIAN_NUMERAL would be a less confusing name, wouldn't it?

Roc: what do you think?
(In reply to comment #12)

> Of course I believe that a
> bidi.numeral.persian would be much more understandable

To some extent I can agree with this; but on the other hand, in order to make proper use of this the user already needs to understand the (arbitrary numeric values of) bidi.numeral. So I'm not sure there's much benefit to separating the "persian" element out as a distinct boolean.

(For the typical Persian user, not interested in understanding and adjusting these options, the localized build should have the "right" behavior by default anyway.)

> Regarding 2 above, if we don't do the first suggestion, then I think
> IBMBIDI_PERSIAN_NUMERAL would be a less confusing name, wouldn't it?

Yes, that would help as it would no longer look so much like part of the other set of #defines. I wish these constants all had names that included a "tag" identifying their purpose, or better still, were done as distinct enumerations, but that's not really part of this bug.
(In reply to comment #13)
> To some extent I can agree with this; but on the other hand, in order to make
> proper use of this the user already needs to understand the (arbitrary numeric
> values of) bidi.numeral. So I'm not sure there's much benefit to separating the
> "persian" element out as a distinct boolean.

Yes, I agree.

> (For the typical Persian user, not interested in understanding and adjusting
> these options, the localized build should have the "right" behavior by default
> anyway.)

Definitely.  That's the main motive for me to work on this bug.  :-)
Maybe we could have a single pref value, but use bitfields to structure it. I admit I can't remember what all the values mean without reading code and documentation.
(In reply to comment #15)
> Maybe we could have a single pref value, but use bitfields to structure it. I
> admit I can't remember what all the values mean without reading code and
> documentation.

Would it be alright, considering the fact that we'd be breaking backwards compatibility?

A little background on the usage of bidi.numeral may help here.  I've seen users who have found out about this pref using a forum post or a friend's recommendation, and have it set.  They don't really know what a pref is (and they don't care, as long as it works!).  Several of them have switched back to Firefox 2 once they figured out that Firefox 3 breaks this pref.  I don't think they'd much appreciate a change in the semantics of this pref (or replacing it with a new pref) in 3.1 or a future release.

Here's what I'm proposing:

a) if the performance gain/code cleanup that Jonathan mentioned in comment 11 is worth, let's add a couple of values to bidi.numeral and drop bidi.numeral.persian.

b) otherwise, leave bidi.numeral and add bidi.numeral.persian for this bug.

What do you think?
Another option is to use two pref values but map them to an internal flags word that uses bitfields. How about that?
That's kind of what I had in mind in the first para of comment #11, though I don't see any real benefit to bitfields at this point, rather than staying with a simple enumeration of the available options. My main point is that internally we don't need a separate member variable, and by adding it we only clutter the code at every call site where we end up passing it separately.

On balance, I think I'd lean towards staying with a single pref, rather than splitting out the "persian" element of it into a separate boolean, as I don't see any significant benefit from that. If we ever extend this whole concept to support additional numeral sets (e.g. to map ASCII to native digits for indic languages), that would look increasingly odd. Though I suppose if we do that, we wouldn't really want the pref to be named "bidi.*" at all, so that would be a logical time to reorganize things further.
Attached patch Patch (v2)Splinter Review
(In reply to comment #19)
> Alright, let's go with comment #16 option a).

I implemented this approach in the new version of the patch.
Attachment #357561 - Attachment is obsolete: true
Attachment #358570 - Flags: superreview?(roc)
Attachment #358570 - Flags: review?(roc)
Attachment #358570 - Flags: review?(jfkthame)
Attachment #357561 - Flags: review?(jfkthame)
Attachment #358570 - Flags: superreview?(roc)
Attachment #358570 - Flags: superreview+
Attachment #358570 - Flags: review?(roc)
Attachment #358570 - Flags: review+
Attachment #358570 - Flags: review?(jfkthame) → review+
http://hg.mozilla.org/mozilla-central/rev/3470b0b3a98d
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.2a1
Attachment #358570 - Flags: approval1.9.1?
Comment on attachment 358570 [details] [diff] [review]
Patch (v2)

This is a nice fix for Persian users and includes full test coverage.  Nominating to land on branch.
FWIW, one of this bug's mochitests seems to have sporadically failed today:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1233892181.1233896582.10312.gz
WINNT 5.2 mozilla-central unit test on 2009/02/05 19:49:41

*** 41905 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug441782.html | Rendering of reftest bug467672-5 is different with bidi.numeral == 1
(In reply to comment #23)
> FWIW, one of this bug's mochitests seems to have sporadically failed today:
> 
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1233892181.1233896582.10312.gz
> WINNT 5.2 mozilla-central unit test on 2009/02/05 19:49:41
> 
> *** 41905 ERROR TEST-UNEXPECTED-FAIL |
> /tests/layout/base/tests/test_bug441782.html | Rendering of reftest bug467672-5
> is different with bidi.numeral == 1

This puzzles me.  The tests for bug 441782 and this bug should not be run on Windows at all <http://hg.mozilla.org/mozilla-central/file/764d6458f6d4/layout/base/tests/Makefile.in#l80>.  Is my check in the Makefile not correct?
(In reply to comment #23)
> FWIW, one of this bug's mochitests seems to have sporadically failed today:

This same test failed in the same machine's cycle 2.5 hrs later, too:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1233892181.1233896582.10312.gz
(In reply to comment #24)
>  Is my check in the Makefile not correct?

 79 # Tests for bug 441782 don't pass reliably on Windows, because of bug 469208
 80 ifneq (,$(filter windows,$(MOZ_WIDGET_TOOLKIT)))

My Makefile-Fu is not strong, but I think that's the inverse of the test you want.  I think that condition says:
    - Grep for 'windows' in the toolkit string
    - if that DOESN'T return the empty string, then do these tests.
     (i.e. if you DO find Windows in the toolkit string, add the tests)

I think you just need a simple s/ifneq/ifeq/.

As supporting evidence for this, I refer to the following three full tinderbox logs from today (for Windows, mac, & linux, respectively).  The strings "467672" and "441782" only appear in the Windows tinderbox log, and not at all in the other two.

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1233942887.1233950527.27893.gz&fulltext=1
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1233951780.1233957443.10957.gz&fulltext=1
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1233958981.1233962365.20525.gz&fulltext=1
Oh, you're absolutely right.  :(  This patch fixes this mistake.
Attachment #361035 - Flags: superreview?(roc)
Attachment #361035 - Flags: review?(roc)
Attachment #361035 - Flags: superreview?(roc)
Attachment #361035 - Flags: superreview+
Attachment #361035 - Flags: review?(roc)
Attachment #361035 - Flags: review+
Attachment #358570 - Flags: approval1.9.1? → approval1.9.1+
Depends on: 478414
Keywords: user-doc-needed
Test for this bug appears to have some randomness to it.  I filed bug 488553 about that.
Blocks: 488553
Going through user-doc-needed bugs. I don't understand the purpose of a user-doc for this. Could someone explain it to me?
(In reply to comment #31)
> Going through user-doc-needed bugs. I don't understand the purpose of a
> user-doc for this. Could someone explain it to me?

I'm not sure where bidi.numeral is documented (if it is at all).  If it's documented, this bug needs to add documentation for values 5 and 6 of this pref.  Otherwise, this whole pref needs some user documentation on what it does.  See <http://kb.mozillazine.org/About:config_entries#Bidi.> for a sample of the docs needed.
Depends on: 524014
No longer depends on: 534931
You need to log in before you can comment on or make changes to this bug.