Closed
Bug 467672
Opened 16 years ago
Closed 16 years ago
need additional bidi.numerals setting for Persian (eastern Arabic-Indic) numerals
Categories
(Core :: Layout: Text and Fonts, defect)
Core
Layout: Text and Fonts
Tracking
()
RESOLVED
FIXED
mozilla1.9.2a1
People
(Reporter: jfkthame, Assigned: ehsan.akhgari)
References
Details
(Keywords: fixed1.9.1, user-doc-needed)
Attachments
(2 files, 2 obsolete files)
20.74 KB,
patch
|
roc
:
review+
jfkthame
:
review+
roc
:
superreview+
roc
:
approval1.9.1+
|
Details | Diff | Splinter Review |
840 bytes,
patch
|
roc
:
review+
roc
:
superreview+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20081202 Minefield/3.1b3pre
Build Identifier:
In addition to the option of displaying numerals as the "Arabic-Indic" characters U+0660..0669, we should have an option that maps them to the "Eastern Arabic-Indic" characters U+06F0..06F9. These are the proper codepoints for use in Persian, as well as other languages such as Urdu.
Reproducible: Always
Assignee | ||
Comment 1•16 years ago
|
||
Yes, we definitely should do that. In fact, I was going to file this bug!
Assignee: nobody → ehsan.akhgari
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Version: unspecified → Trunk
Comment 2•16 years ago
|
||
I think perhaps the way to do this is to have a separate pref for the Unicode value of zero in local numerals, set to 1632 (0x660) by default.
Version: Trunk → unspecified
Assignee | ||
Comment 3•16 years ago
|
||
(In reply to comment #2)
> I think perhaps the way to do this is to have a separate pref for the Unicode
> value of zero in local numerals, set to 1632 (0x660) by default.
So, setting this pref to a random value could produce random glyphs instead of digits?
I was thinking about having bidi.numeral.type, its default value being 1 mapping to U+0660, and its 2 value mapping to U+06F0. Or we could have a bool pref bidi.numeral.persian to be false by default for Arabic-Indic and true for Eastern Arabic-Indic digits, in case we don't expect to add support for another digits range in future (is there even another digits range in Unicode?)
Version: unspecified → Trunk
Comment 4•16 years ago
|
||
(In reply to comment #3)
> So, setting this pref to a random value could produce random glyphs instead of
> digits?
There are two things I don't like about my idea. This is one of them, and the other is having to use decimal for Unicode values.
>
> I was thinking about having bidi.numeral.type, its default value being 1
> mapping to U+0660, and its 2 value mapping to U+06F0.
The downside of this is that it's much less obvious what the values mean, but all the other bidi prefs have that disadvantage already anyway, so I guess that's the way to go.
> Or we could have a bool
> pref bidi.numeral.persian to be false by default for Arabic-Indic and true for
> Eastern Arabic-Indic digits, in case we don't expect to add support for another
> digits range in future (is there even another digits range in Unicode?)
There are 37 ranges of decimal digits in Unicode. That includes things like MATHEMATICAL SANS-SERIF BOLD, and I'm not saying we should support all of them with this pref, but we should probably support at least the Indian (i.e. DEVANAGARI, BENGALI ,GURMUKHI etc, not Arabic-Indic) and South-East Asian ones that are supported by the regional options in Windows Control Panel.
Assignee | ||
Comment 5•16 years ago
|
||
(In reply to comment #4)
> The downside of this is that it's much less obvious what the values mean, but
> all the other bidi prefs have that disadvantage already anyway, so I guess
> that's the way to go.
OK.
> There are 37 ranges of decimal digits in Unicode. That includes things like
> MATHEMATICAL SANS-SERIF BOLD, and I'm not saying we should support all of them
> with this pref, but we should probably support at least the Indian (i.e.
> DEVANAGARI, BENGALI ,GURMUKHI etc, not Arabic-Indic) and South-East Asian ones
> that are supported by the regional options in Windows Control Panel.
Hmm, are these relevant with bidi.numeral? In bidi.numeral we do context detection based on Arabic letters, and I guess these would need other forms of context detection, hence other prefs.
Let me rephrase my question: are there other Unicode digits ranges which are relevant to the bidi.numeral pref values?
Reporter | ||
Comment 6•16 years ago
|
||
A followup question that occurs to me: Should "bidi.numerals = IBMBIDI_NUMERAL_PERSIAN" (or whatever value that ends up being called) mean that not only ASCII digits 0030..0039 get mapped to 06Fx, but also map the standard Arabic-Indic digits from 066x to their Persian counterparts at 06Fx? Similarly, should the 06Fx digits be mapped back to 066x when "bidi.numerals = IBMBIDI_NUMERAL_HINDI" is in effect? Or should we not attempt to map between the two sets of digits from the 06xx Arabic block, only between ASCII and the one selected column of 06xx digits?
If we do extend bidi.numerals with new values for Persian, do we want to maintain the existing settings unchanged, and append IBMBIDI_NUMERAL_PERSIANCONTEXT=5 and IBMBIDI_NUMERAL_PERSIAN=6 or something like that? This gives a rather peculiar sequence of settings, but is better for backward compatibility.
Or should we say that as this feature hasn't been working anyway, backward compatibility with existing profile settings doesn't matter much and we can reorganize the prefs in a more logical sequence, or perhaps split into two settings: "gfx.text.numerals.mode" (3 settings: no change/contextual/force) and "gfx.text.numerals.script" (an integer representing a particular script/language-specific digit set, or "....locale" if we want to make it a little less cryptic, and go to the trouble of mapping locales to digit sets.) This would allow us to easily extend this feature to the non-bidi scripts as well.
Come to think of it, we should have the locale-digits information somewhere already if we do locale-specific number formatting....
Assignee | ||
Comment 7•16 years ago
|
||
(In reply to comment #6)
> A followup question that occurs to me: Should "bidi.numerals =
> IBMBIDI_NUMERAL_PERSIAN" (or whatever value that ends up being called) mean
> that not only ASCII digits 0030..0039 get mapped to 06Fx, but also map the
> standard Arabic-Indic digits from 066x to their Persian counterparts at 06Fx?
> Similarly, should the 06Fx digits be mapped back to 066x when "bidi.numerals =
> IBMBIDI_NUMERAL_HINDI" is in effect? Or should we not attempt to map between
> the two sets of digits from the 06xx Arabic block, only between ASCII and the
> one selected column of 06xx digits?
I gave this a lot of thought. Initially I was thinking that we only want conversion between ASCII and either one of 066x or 06Fx. But given more thought, the main purpose of these prefs are for users to see numbers in their native language shapes. So it's worthwhile to try to show the correct glyphs.
It is very common for Persian speaking users to encounter Persian web pages which mistakenly use Arabic digit forms, so this will be especially useful to Persian users of Mozilla.
> If we do extend bidi.numerals with new values for Persian, do we want to
> maintain the existing settings unchanged, and append
> IBMBIDI_NUMERAL_PERSIANCONTEXT=5 and IBMBIDI_NUMERAL_PERSIAN=6 or something
> like that? This gives a rather peculiar sequence of settings, but is better for
> backward compatibility.
I think for the purpose of this bug, I'll go with a boolean bidi.numeral.persian pref, which defaults to false. This shouldn't happen the default behavior (because bidi.numeral is 0 by default, so bidi.numeral.persian is ignored) and should be toggled to true for Persian users, and left false for Arabic users.
If we decide to support other Unicode digits ranges in the future, they'd probably need their own prefs anyway, since they don't use the context detection in place for bidi.numeral.
> Or should we say that as this feature hasn't been working anyway, backward
> compatibility with existing profile settings doesn't matter much and we can
> reorganize the prefs in a more logical sequence, or perhaps split into two
> settings: "gfx.text.numerals.mode" (3 settings: no change/contextual/force) and
> "gfx.text.numerals.script" (an integer representing a particular
> script/language-specific digit set, or "....locale" if we want to make it a
> little less cryptic, and go to the trouble of mapping locales to digit sets.)
> This would allow us to easily extend this feature to the non-bidi scripts as
> well.
Organizing the prefs this way would make things much more sane, but we can't really ignore backward compatibility, since these prefs were working on 1.8, and broke in 1.9, so a lot of users have them set in their configs since 1.8 days.
Assignee | ||
Comment 8•16 years ago
|
||
I was thinking this should be fairly straightforward, but I don't know what's wrong, as this fails in strange ways...
I thought I'd upload a small WIP patch to see if another pair of eyes can see what's wrong in the code.
Assignee | ||
Comment 9•16 years ago
|
||
As it turns out, the problem I was observing was caused by a compilation mistake on my part. This patch is cleaned up and ready for review. All unit tests pass (I made sure they also pass on Windows with cleartype disabled).
Attachment #353989 -
Attachment is obsolete: true
Attachment #357561 -
Flags: superreview?(roc)
Attachment #357561 -
Flags: review?(roc)
Attachment #357561 -
Flags: superreview?(roc)
Attachment #357561 -
Flags: superreview+
Attachment #357561 -
Flags: review?(roc)
Attachment #357561 -
Flags: review?(jfkthame)
Attachment #357561 -
Flags: review+
Comment on attachment 357561 [details] [diff] [review]
Patch (v1)
+ !!(i>0 ? IS_ARABIC_CHAR(aBuffer[i-1]) : PR_FALSE),
Make this i > 0 && IS_ARABIC_CHAR(aBuffer[i-1])
Jonathan should also take a look at this.
Reporter | ||
Comment 11•16 years ago
|
||
A couple of thoughts on this:
(1) I wonder if it would be preferable to avoid creating the extra mBidiNumeralPersian instance variable, by instead assigning additional values to mBidiNumeral. (This could still be exposed as a separate boolean preference, if that's the best way to present it to users.)
ISTM this would avoid passing an additional parameter to HandleNumberInChar all over the place, and also avoid adding the extra if/then/else test in there; all the testing would be done directly by the switch instead.
In nsBidiPresUtils::FormatUnicodeText, you'd then use the persianNumerals flag directly to decide whether to pass IBMBIDI_NUMERAL_HINDI or IBMBIDI_NUMERAL_PERSIAN, instead of passing it on down to HandleNumberInChar; this also means you wouldn't be passing it at all in the IBMBIDI_NUMERAL_ARABIC case, where it is irrelevant anyway.
IMO, this would make for a tidier (and marginally more efficient, though I expect that's insignificant) implementation.
(2) The naming of constants in nsBidiUtils.h seems a bit unfortunate: we have
#define IBMBIDI_NUMERAL_NOMINAL 0 // 0 = nominalnumeralBidi *
#define IBMBIDI_NUMERAL_REGULAR 1 // 1 = regularcontextnumeralBidi
#define IBMBIDI_NUMERAL_HINDICONTEXT 2 // 2 = hindicontextnumeralBidi
#define IBMBIDI_NUMERAL_ARABIC 3 // 3 = arabicnumeralBidi
#define IBMBIDI_NUMERAL_HINDI 4 // 4 = hindinumeralBidi
which are values that are used for mBidiNumeral, to control the behavior, but now there is the new
+#define IBMBIDI_NUMERAL_PERSIAN 7
which sounds like another value in the same set, but is actually unrelated, it's some kind of index of the preference, along with
#define IBMBIDI_TEXTDIRECTION 1
#define IBMBIDI_TEXTTYPE 2
#define IBMBIDI_CONTROLSTEXTMODE 3
#define IBMBIDI_NUMERAL 4
#define IBMBIDI_SUPPORTMODE 5
#define IBMBIDI_CHARSET 6
If we do (1) above, then we'd want IBMBIDI_NUMERAL_PERSIAN and presumably IBMBIDI_NUMERAL_PERSIANCONTEXT as mBidiNumeral values, but even if we don't do that, I'd like to find a less confusing approach to naming these constants. Maybe add _PREFERENCE, or something? I don't see an obvious solution that stays consistent with the existing names, yet avoids the current ambiguity; one possibility would be to add a prefix or suffix to the whole set, but that involves more code churn, which might not be desirable.
WDYT?
Assignee | ||
Comment 12•16 years ago
|
||
Regarding the first suggestion, in that case we would want to probably add two more values to bidi.numeral: one for "Persian digits all the time" and one for "Persian digits for Persian text", namely, IBMBIDI_NUMERAL_PERSIAN and IBMBIDI_NUMERAL_PERSIANCONTEXT values. Then, it won't make much sense to expose this as a separate pref. Of course I believe that a bidi.numeral.persian would be much more understandable, but anyway this shouldn't be a pref which users want to change all the time, so we'll probably be OK by expanding the values of bidi.numeral itself.
Regarding 2 above, if we don't do the first suggestion, then I think IBMBIDI_PERSIAN_NUMERAL would be a less confusing name, wouldn't it?
Roc: what do you think?
Reporter | ||
Comment 13•16 years ago
|
||
(In reply to comment #12)
> Of course I believe that a
> bidi.numeral.persian would be much more understandable
To some extent I can agree with this; but on the other hand, in order to make proper use of this the user already needs to understand the (arbitrary numeric values of) bidi.numeral. So I'm not sure there's much benefit to separating the "persian" element out as a distinct boolean.
(For the typical Persian user, not interested in understanding and adjusting these options, the localized build should have the "right" behavior by default anyway.)
> Regarding 2 above, if we don't do the first suggestion, then I think
> IBMBIDI_PERSIAN_NUMERAL would be a less confusing name, wouldn't it?
Yes, that would help as it would no longer look so much like part of the other set of #defines. I wish these constants all had names that included a "tag" identifying their purpose, or better still, were done as distinct enumerations, but that's not really part of this bug.
Assignee | ||
Comment 14•16 years ago
|
||
(In reply to comment #13)
> To some extent I can agree with this; but on the other hand, in order to make
> proper use of this the user already needs to understand the (arbitrary numeric
> values of) bidi.numeral. So I'm not sure there's much benefit to separating the
> "persian" element out as a distinct boolean.
Yes, I agree.
> (For the typical Persian user, not interested in understanding and adjusting
> these options, the localized build should have the "right" behavior by default
> anyway.)
Definitely. That's the main motive for me to work on this bug. :-)
Maybe we could have a single pref value, but use bitfields to structure it. I admit I can't remember what all the values mean without reading code and documentation.
Assignee | ||
Comment 16•16 years ago
|
||
(In reply to comment #15)
> Maybe we could have a single pref value, but use bitfields to structure it. I
> admit I can't remember what all the values mean without reading code and
> documentation.
Would it be alright, considering the fact that we'd be breaking backwards compatibility?
A little background on the usage of bidi.numeral may help here. I've seen users who have found out about this pref using a forum post or a friend's recommendation, and have it set. They don't really know what a pref is (and they don't care, as long as it works!). Several of them have switched back to Firefox 2 once they figured out that Firefox 3 breaks this pref. I don't think they'd much appreciate a change in the semantics of this pref (or replacing it with a new pref) in 3.1 or a future release.
Here's what I'm proposing:
a) if the performance gain/code cleanup that Jonathan mentioned in comment 11 is worth, let's add a couple of values to bidi.numeral and drop bidi.numeral.persian.
b) otherwise, leave bidi.numeral and add bidi.numeral.persian for this bug.
What do you think?
Another option is to use two pref values but map them to an internal flags word that uses bitfields. How about that?
Reporter | ||
Comment 18•16 years ago
|
||
That's kind of what I had in mind in the first para of comment #11, though I don't see any real benefit to bitfields at this point, rather than staying with a simple enumeration of the available options. My main point is that internally we don't need a separate member variable, and by adding it we only clutter the code at every call site where we end up passing it separately.
On balance, I think I'd lean towards staying with a single pref, rather than splitting out the "persian" element of it into a separate boolean, as I don't see any significant benefit from that. If we ever extend this whole concept to support additional numeral sets (e.g. to map ASCII to native digits for indic languages), that would look increasingly odd. Though I suppose if we do that, we wouldn't really want the pref to be named "bidi.*" at all, so that would be a logical time to reorganize things further.
Alright, let's go with comment #16 option a).
Assignee | ||
Comment 20•16 years ago
|
||
(In reply to comment #19)
> Alright, let's go with comment #16 option a).
I implemented this approach in the new version of the patch.
Attachment #357561 -
Attachment is obsolete: true
Attachment #358570 -
Flags: superreview?(roc)
Attachment #358570 -
Flags: review?(roc)
Attachment #358570 -
Flags: review?(jfkthame)
Attachment #357561 -
Flags: review?(jfkthame)
Attachment #358570 -
Flags: superreview?(roc)
Attachment #358570 -
Flags: superreview+
Attachment #358570 -
Flags: review?(roc)
Attachment #358570 -
Flags: review+
Reporter | ||
Updated•16 years ago
|
Attachment #358570 -
Flags: review?(jfkthame) → review+
Assignee | ||
Comment 21•16 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.2a1
Assignee | ||
Updated•16 years ago
|
Attachment #358570 -
Flags: approval1.9.1?
Assignee | ||
Comment 22•16 years ago
|
||
Comment on attachment 358570 [details] [diff] [review]
Patch (v2)
This is a nice fix for Persian users and includes full test coverage. Nominating to land on branch.
Comment 23•16 years ago
|
||
FWIW, one of this bug's mochitests seems to have sporadically failed today:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1233892181.1233896582.10312.gz
WINNT 5.2 mozilla-central unit test on 2009/02/05 19:49:41
*** 41905 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug441782.html | Rendering of reftest bug467672-5 is different with bidi.numeral == 1
Assignee | ||
Comment 24•16 years ago
|
||
(In reply to comment #23)
> FWIW, one of this bug's mochitests seems to have sporadically failed today:
>
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1233892181.1233896582.10312.gz
> WINNT 5.2 mozilla-central unit test on 2009/02/05 19:49:41
>
> *** 41905 ERROR TEST-UNEXPECTED-FAIL |
> /tests/layout/base/tests/test_bug441782.html | Rendering of reftest bug467672-5
> is different with bidi.numeral == 1
This puzzles me. The tests for bug 441782 and this bug should not be run on Windows at all <http://hg.mozilla.org/mozilla-central/file/764d6458f6d4/layout/base/tests/Makefile.in#l80>. Is my check in the Makefile not correct?
Comment 25•16 years ago
|
||
(In reply to comment #23)
> FWIW, one of this bug's mochitests seems to have sporadically failed today:
This same test failed in the same machine's cycle 2.5 hrs later, too:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1233892181.1233896582.10312.gz
Comment 26•16 years ago
|
||
(In reply to comment #24)
> Is my check in the Makefile not correct?
79 # Tests for bug 441782 don't pass reliably on Windows, because of bug 469208
80 ifneq (,$(filter windows,$(MOZ_WIDGET_TOOLKIT)))
My Makefile-Fu is not strong, but I think that's the inverse of the test you want. I think that condition says:
- Grep for 'windows' in the toolkit string
- if that DOESN'T return the empty string, then do these tests.
(i.e. if you DO find Windows in the toolkit string, add the tests)
I think you just need a simple s/ifneq/ifeq/.
As supporting evidence for this, I refer to the following three full tinderbox logs from today (for Windows, mac, & linux, respectively). The strings "467672" and "441782" only appear in the Windows tinderbox log, and not at all in the other two.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1233942887.1233950527.27893.gz&fulltext=1
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1233951780.1233957443.10957.gz&fulltext=1
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1233958981.1233962365.20525.gz&fulltext=1
Assignee | ||
Comment 27•16 years ago
|
||
Oh, you're absolutely right. :( This patch fixes this mistake.
Attachment #361035 -
Flags: superreview?(roc)
Attachment #361035 -
Flags: review?(roc)
Attachment #361035 -
Flags: superreview?(roc)
Attachment #361035 -
Flags: superreview+
Attachment #361035 -
Flags: review?(roc)
Attachment #361035 -
Flags: review+
Attachment #358570 -
Flags: approval1.9.1? → approval1.9.1+
Assignee | ||
Comment 29•16 years ago
|
||
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/5d5b8c4c7ae2
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/c952d1f2fe09
Keywords: fixed1.9.1
Assignee | ||
Updated•16 years ago
|
Keywords: user-doc-needed
Comment 30•16 years ago
|
||
Test for this bug appears to have some randomness to it. I filed bug 488553 about that.
Blocks: 488553
Comment 31•15 years ago
|
||
Going through user-doc-needed bugs. I don't understand the purpose of a user-doc for this. Could someone explain it to me?
Assignee | ||
Comment 32•15 years ago
|
||
(In reply to comment #31)
> Going through user-doc-needed bugs. I don't understand the purpose of a
> user-doc for this. Could someone explain it to me?
I'm not sure where bidi.numeral is documented (if it is at all). If it's documented, this bug needs to add documentation for values 5 and 6 of this pref. Otherwise, this whole pref needs some user documentation on what it does. See <http://kb.mozillazine.org/About:config_entries#Bidi.> for a sample of the docs needed.
Updated•15 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•