bidi.numerals == 4 does not switch numeric shapes in content

RESOLVED FIXED in mozilla1.9.2a1

Status

()

Core
Layout: Text
RESOLVED FIXED
10 years ago
9 years ago

People

(Reporter: Ehsan, Assigned: Ehsan)

Tracking

(Depends on: 1 bug, {fixed1.9.1, regression, user-doc-needed})

Trunk
mozilla1.9.2a1
fixed1.9.1, regression, user-doc-needed
Points:
---
Dependency tree / graph
Bug Flags:
wanted1.9.1 +
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 10 obsolete attachments)

(Assignee)

Description

10 years ago
Setting bidi.numerals to 4 is supposed to switch the numeric shapes to the Hindi form in content.  This used to work in Firefox 2, but has regressed in Firefox 3 and trunk.  This still works as expected in some XUL content (such as about:config), in the status bar, in XUL trees, and possibly in other places as well, but doesn't seem to work for any HTML content.

Steps to reproduce:
1. Set bidi.numerals to 4 in about:config.
2. See the about:config numeric values change to hindi shapes.
3. Load www.mozilla.com.
4. See the numeric values (for example, the download box for Firefox 3.0) have not changed.

A large number of Iranian users used this feature in Firefox 2, and they noticed this was broken in Firefox 3 (example: <http://support.mozilla.com/tiki-view_forum_thread.php?forumId=1&comments_parentId=22573>)

Here's the result of some investigations I did in an email communication with roc:

The dirty work is performed by HandleNumbers in intl/unicharutil/util/nsBidiUtils.cpp.  HandleNumbers is called by nsBidiPresUtils::FormatUnicodeText().   FormatUnicodeText is called by nsBidiPresUtils::RenderText() which is in turn called by nsLayoutUtils::DrawString().

Check the usage of nsBidiPresUtils::RenderText() in 1.8 and 1.9:
<http://mxr.mozilla.org/mozilla1.8/ident?i=RenderText>
<http://mxr.mozilla.org/firefox/ident?i=RenderText>

I'm not sure what is causing the problem.  Simon: do you have any ideas?
Flags: wanted1.9.1?
(Assignee)

Comment 1

10 years ago
(In reply to comment #0)
> Setting bidi.numerals to 4 is supposed to switch the numeric shapes to the
> Hindi form in content.  This used to work in Firefox 2, but has regressed in
> Firefox 3 and trunk.  This still works as expected in some XUL content (such as
> about:config), in the status bar, in XUL trees, and possibly in other places as
> well, but doesn't seem to work for any HTML content.

I just noted that bidi.numerals == 4 works correctly for image alternative texts as well.
It works in print headers and footers too, basically anywhere which isn't an nsTextFrame.

I should have thought of this issue before. We need to either do Arabic digit substitution when creating text runs, or else let the platforms do it along with Arabic shaping, assuming that all the platforms can do that.
(Assignee)

Comment 3

10 years ago
Why not do the substitution ourselves like we're already doing it in other places which actually do respect bidi.numerals on trunk?

I'm willing to work on a patch, but I don't know where to start.  Can you give me some pointers?  Thanks!
So this feature is supposed to just substitute other characters/glyphs for the characters '0'-'9'?

We should modify the input string as it gets passed down to textrun construction. There's various places we can do this. gfxTextRunWordCache is the only place that all textrun construction goes through that's cross-platform, and it's already examining every character of the string so adding extra checks for digits isn't going to slow us down, so it's probably the best place, although it is a little bit complex.

I would modify MakeTextRun (both 8-bit and 16-bit versions) to track whether we've encountered a digit in the current word and whether we've added a word with a digit to tempString. At the end, when we create a textrun for the new words in tempString, if there's a digit in tempString and bidi.numeral is nonzero then create a new string with the digits replaced with the appropriate replacement characters, and pass that to MakeTextRun instead. In the 8-bit case I guess the new string is going to have to be 16-bit, but that should be OK.

This code is performance sensitive so you probably want to cache the value of bidi.numeral. I believe there's already some gfx code that caches other pref values, so you can copy that.

I'm assuming that this digit substitution can't turn a single-direction string into a mixed-direction string. If it can, then we have much bigger problems.
If this is a bug and a regression from Firefox 2 shouldn't it be targeted at a point release rather than leaving it broken until 3.1? Or is it too big a change to fix any earlier? (Assigned to "nobody" is not a hopeful sign either way.)
Flags: wanted1.9.0.x?
(Assignee)

Comment 6

10 years ago
I'm going to get to this in this week or the next, hopefully.  If anyone else would like to step up and provide a patch sooner, feel free to reassign this.  :-)
Assignee: nobody → ehsan.akhgari
(Assignee)

Comment 7

10 years ago
Created attachment 328104 [details] [diff] [review]
WIP Patch

I followed roc's suggestions, and I have a patch here which mostly works, except two problems.

The first problem is that bidi.numeral is only handled for strings entering the cache, and changing bidi.numeral at runtime would not affect the strings already in cache.  The obvious solution here should be just walking over the cached entries and change any already-cached strings containing digits, but I thought there may be more to it than this simplistic approach, especially because of the second problem below.

Because the width of Arabic and Hindi digits generally differ, the dimensions of the text boxes need to be recalculated, which (correct me if I'm wrong) may require a reflow.  This can cause mis-positioning texts when changing bidi.numeral at runtime.  I will attach a screenshot to show the problem.  I'm not sure how to address this issue, and also if the same issue could affect the cached textruns.
(Assignee)

Comment 8

10 years ago
Created attachment 328106 [details]
Screenshot of problem 2

The top image shows the footer of www.mozilla.com loaded when bidi.numeral was set to 4.  The bottom image shows the same footer after changing bidi.numeral to 0.

The width of "2005-2008" has increased, and it should have caused a reflow by shifting "All rights reserved." to right, but it doesn't happen.  This problem didn't happen in 1.8.
I think you can solve that problem by adding a numeral-type flag to CacheHashKey in gfxTextRunWordCache.cpp
(Assignee)

Comment 10

10 years ago
(In reply to comment #9)
> I think you can solve that problem by adding a numeral-type flag to
> CacheHashKey in gfxTextRunWordCache.cpp

Thanks, I tried it, but unfortunately it didn't solve the problem.  It seems that the cache doesn't get looked up after toggling bidi.numeral, so this solution is not effective.
Status: NEW → ASSIGNED
(Assignee)

Comment 11

10 years ago
I tried clearing the cache completely after bidi.numeral is changed, and it kind of works.  The only glitch is that if a number is already being displayed on the screen, it doesn't get affected unless that area of the window gets invalidated. (Easy way to reproduce: leave a Bugzilla tab containing the bug # open when changing bidi.numeral in about:config, and hover the tab after changing this value.)

I'm not sure if clearing the cache completely is too drastic a measure here.  I can enumerate the entries and remove only those which actually contain a digit if that would be better.  Anyway, toggling bidi.numeral shouldn't happen that often I guess.

The second problem in comment 7 still persists though.

Updated

10 years ago
Depends on: 443629
The second problem is apparently an unreported bug: any change to bidi options should force a reflow. I just tried the testcases in bug 80352 comment 45, and they both failed, i.e. the expected changes didn't take effect until page reload.

I've filed that as bug 443629.
(Assignee)

Comment 13

10 years ago
Thanks for following this up, Simon.

Roc: do you think clearing the cache completely is a good thing to do here (see comment 11)?  If so, I have a patch ready for review.
I don't think runtime changes to bidi.numeral working 100% correctly should be a high priority.

However, clearing the cache and forcing a reflow sounds very reasonable. You can do that in nsPresContext::PreferenceChanged the same way we handle changes to 'font' (but you'll need to add a cache-clear).
(Assignee)

Comment 15

10 years ago
Created attachment 328168 [details] [diff] [review]
Patch (v1)

This patch handles the reflow and text run cache flushing on bidi.numeral changes.  (In fact the reflow is performed upon changing of all bidi.* prefs, which effectively solves bug 443629 as well).
Attachment #328104 - Attachment is obsolete: true
Attachment #328168 - Flags: superreview?(roc)
Attachment #328168 - Flags: review?(roc)
(Assignee)

Comment 16

10 years ago
I was going to write a number of reftests for this, but I figured out that reftests are executed as non-privileged HTML files, and netscape.security.PrivilegeManager.enablePrivilege() results in a popup dialog requesting permission from the user...  Is there any otherway for me to write tests for this bug?
(Assignee)

Updated

10 years ago
Blocks: 443629
No longer depends on: 443629
(Assignee)

Updated

10 years ago
OS: Windows Vista → All
Hardware: PC → All
you could use mochitests, use canvas.drawWindow and check that toDataURL returns the same thing. Sort of a reftest-in-mochitest.
(Assignee)

Comment 18

10 years ago
(In reply to comment #17)
> you could use mochitests, use canvas.drawWindow and check that toDataURL
> returns the same thing. Sort of a reftest-in-mochitest.

Thanks for the tip!  I wrote a test and noticed a problem.  bidi.numeral == 2 is supposed to use Hindi digits when in "Arabic context".  The "Arabic context" is defined like this: any digit which occurs right after an Arabic character.  This means that the following:

متن123

should be rendered as:

متن۱۲۳

when bidi.numeral == 2.  This will be handled correctly if the text gets passed to the 16-bit MakeTextRun(), but it seems to me that the number part ("123") is being passed to the 8-bit function instead, which makes it impossible to detect the "Arabic context" in the 8-bit version of MakeTextRun().

Do you have any idea what can be done in this regard?  I'm not sure how MakeTextRun gets called, and how it's decided which version of it would be called for what kind of text...
The Hindi digits are right-to-left, correct? That means that the string will be broken into two textruns because of the direction change, so there won't be enough context in just a single textrun. Even without the direction change, there are still textrun changes at font boundaries so this kind of context isn't available.

Another approach, somewhat more challenging, would be to make the change in nsTextFrameUtils::TransformText. That's where we do white-space collapsing. You would need a flag similar to aIncomingWhitespace to track whether the preceding character is an Arabic digit. In fact maybe we should change that from a boolean to an enum "whitespae, Arabic digit, other". That information gets propagated around to various places, including flags in textruns, so it's some work.
(In reply to comment #19)
> The Hindi digits are right-to-left, correct?

Sorry, of course the Hindi (and "Arabic") digits are left to right but the Arabic letters are RTL so we do have a direction-change boundary and my comment still applies.
(Assignee)

Comment 21

10 years ago
I see the problem now.  IINM we can store all the context information necessary in nsTextFrameUtils::TransformText and then utilize it in MakeTextRun, right?  Of course I have not looked deeply into it yet, so I may be well misguided here.
I think that's the right thing to do, yes. You can't easily do the transformation in TransformText itself because of the need to convert some 8-bit strings to 16-bit.
(Assignee)

Comment 23

10 years ago
Comment on attachment 328168 [details] [diff] [review]
Patch (v1)

OK, so I will prepare a new patch.
Attachment #328168 - Attachment is obsolete: true
Attachment #328168 - Flags: superreview?(roc)
Attachment #328168 - Flags: review?(roc)
(Assignee)

Comment 24

10 years ago
I did a bit of more investigation on this (specifically looked into BuildTextRunsScanner::BuildTextRunForFrames which seems to be the place where nsTextFrameUtils::TransformText is called and textruns are later created), and now I'm not sure that it's possible to extract the context data in nsTextFrameUtils::TransformText and utilize it in MakeTextRun.

The "context" info I'm talking about is whether the preceding character for any character has been an Arabic character or not (and also, an Arabic digit, since once we find a number containing multiple digits which is preceded by Arabic text, we want to make each digit display in Hindi format).  Storing this context information means allocating an array of |PRBool|s (or |PRPackedBool|s) and passing it along to MakeTextRun.  I think the overhead of this array is too much to implement such a feature.

How about we break it up like this?

IBMBIDI_NUMERAL_NOMINAL: for this value of bidi.numeral, we don't need to do anything.

IBMBIDI_NUMERAL_ARABIC: this doesn't need any context info, and simply converts all Hindi digits to Arabic ones.  This will be handled in MakeTextRun.

IBMBIDI_NUMERAL_HINDI: this doesn't need any context info either, and can be handled in MakeTextRun.

IBMBIDI_NUMERAL_REGULAR and IBMBIDI_NUMERAL_HINDICONTEXT both seem to do one thing, that is, what I described in comment 18.  These need context info, and I think they can both be handled in the 16-bit version of nsTextFrameUtils::TransformText.  The reasoning is like this.  There are two cases which we need conversion here:

 * When an Arabic digit is being preceded by Arabic characters.  The presence of Arabic characters mean that the text is going to be 16-bit.

 * When a Hindi digit is being preceded by non-Arabic characters.  The presense of Hindi digits mean that the text is going to be 16-bit again.

I have not tried this idea in code yet, but it seems logical and I think it would work.  I thought I'd share the idea first to make sure I'm not missing something.
The problem is stuff like

<b>123</b>متن

In that case the first part would be 8-bit, the second 16-bit.

(In reply to comment #24)
> The "context" info I'm talking about is whether the preceding character for any
> character has been an Arabic character or not (and also, an Arabic digit, since
> once we find a number containing multiple digits which is preceded by Arabic
> text, we want to make each digit display in Hindi format).  Storing this
> context information means allocating an array of |PRBool|s (or |PRPackedBool|s)
> and passing it along to MakeTextRun.  I think the overhead of this array is too
> much to implement such a feature.

You don't have to dynamically allocate an array or even a struct. There's a finite number of states that matter, three actually:
-- preceding character is an Arabic letter or Arabic digit converted to Hindi
-- preceding character is whitespace
-- preceding character is something else

So use a PRUint8 and three #defines or enum values to represent the possibilities. No more space than for a PRPackedBool.
(Assignee)

Comment 26

10 years ago
Oh now I understand what you meant.  And that's right.  We only need to track the last character in the previous run to make sure the leading character in the current run needs to be handled in a special way according to the bidi.numeral value.

Updated patch forthcoming.
(In reply to comment #24)
> IBMBIDI_NUMERAL_REGULAR and IBMBIDI_NUMERAL_HINDICONTEXT both seem to do one
> thing, that is, what I described in comment 18.

The difference between them should be whether Arabic or Indic numerals are used by default if a numeral is found before either an Arabic letter or a non-Arabic letter. I suspect that this has never worked properly, though.
(Assignee)

Comment 28

10 years ago
Created attachment 328962 [details] [diff] [review]
WIP Patch

OK, I tried the new approach.  I've got a patch which I think implements what roc suggested in comment 19, but it doesn't work.  I'm not sure how to track the problem here.

This patch only handles arabic chars as the context and it doesn't handle digits yet.  A simple test with a URL such as below can be used to test it (with bidi.numeral == 1 for example):

data:text/plain,متن1

I thought I'd post the patch here to see if someone has any ideas.
+    if (!nsCRT::strcmp(aTopic, NS_PREFBRANCH_PREFCHANGE_TOPIC_ID) &&
+        !nsCRT::strcmp(aData, NS_LITERAL_STRING("numeral").get())) {

Probably should flush the cache when "font." prefs change too.

+                                   i>0 ? tempString[i-1] : 0,

This should be the same as for the 16-bit code I guess

+                if (!seenDigit) {
+                    PRUint32 j;
+                    // Optimization: only handle values of bidi.numeral which
+                    // actually need special attention
+                    switch (mBidiNumeral) {
+                    case IBMBIDI_NUMERAL_HINDI:
+                    case IBMBIDI_NUMERAL_ARABIC:
+                    case IBMBIDI_NUMERAL_REGULAR:
+                    case IBMBIDI_NUMERAL_HINDICONTEXT:

I would avoid this switch by creating a boolean local variable needsNumeralProcessing and setting it outside the loop.

+          lastCharArabic = IS_ARABIC_CHAR(ch);

In the 8-bit code this can't be true because none of the Arabic chars are 8-bit.

I don't see any obvious bugs though. This is definitely what I had in mind and looks great to me. It shouldn't be too hard to debug, I would hope.
(Assignee)

Comment 30

10 years ago
I think I found out the source of the problem.  I've been setting a TEXT_TRAILING_ARABICCHAR flag, and it does exactly what its name says: it's set when the trailing character for a piece of text is an Arabic char.  So, for example when MakeTextRun is called for متن, then this flag is set.  When it's called for 123, this flag is not set.  I'm not sure how I can use the flag in the next MakeTextRun call, not the current one.

I tried saving mNextRunContextInfo's value before the |for (i = 0; i < mMappedFlows.Length(); ++i)| loop in BuildTextRunsScanner::BuildTextRunForFrames and use the saved value in order to detect the INCOMING_ARABICCHAR state, but the result was the same (TEXT_TRAILING_ARABICCHAR being set for متن).  Any idea what I'm doing wrong?
You should be checking TEXT_TRAILING_ARABICCHAR everywhere we currently test TEXT_TRAILING_WHITESPACE. BuildTextRunsScanner::FlushFrames for example:
-    mTrimNextRunLeadingWhitespace =
-      (textRun->GetFlags() & nsTextFrameUtils::TEXT_TRAILING_WHITESPACE) != 0;
+    if ((textRun->GetFlags() & nsTextFrameUtils::TEXT_TRAILING_WHITESPACE) != 0)
+      mNextRunContextInfo |= nsTextFrameUtils::INCOMING_WHITESPACE;

This change actually seems wrong, you need to reset mNextRunContextInfo not just set the bit. And it should be set according to TEXT_TRAILING_ARABICCHAR as well.

+      mCurrentRunTrimLeadingWhitespace = mNextRunContextInfo & nsTextFrameUtils::INCOMING_WHITESPACE;

This is going to have to become mCurrentRunContextInfo.

Hope you get the idea. Basically all these 'whitespace' variables become 3-value state variables.
(Assignee)

Comment 32

10 years ago
(In reply to comment #27)
> (In reply to comment #24)
> > IBMBIDI_NUMERAL_REGULAR and IBMBIDI_NUMERAL_HINDICONTEXT both seem to do one
> > thing, that is, what I described in comment 18.
> 
> The difference between them should be whether Arabic or Indic numerals are used
> by default if a numeral is found before either an Arabic letter or a non-Arabic
> letter. I suspect that this has never worked properly, though.

Can you elaborate this, please?  Perhaps with an example?

I'm currently not changing the behavior for IBMBIDI_NUMERAL_REGULAR and IBMBIDI_NUMERAL_HINDICONTEXT but I'd like to address this in a followup bug if needed.
(Assignee)

Comment 33

10 years ago
Created attachment 329241 [details] [diff] [review]
WIP Patch (nearly complete)

OK, I found the problem.  What we really needed was a TEXT_INCOMING_ARABICCHAR flag, not a TEXT_TRAILING_ARABICCHAR flag, because we're interested in those cases where the previous text contained an Arabic char at its end.

This patch includes the test case I've written.  The reftests 1-3 are built based on the assumption that we get exactly the same behavior as Gecko 1.8 showed.  The Arabic context detection algorithm in 1.8 was pretty basic: for each digit, look at the character before it, and see if it's Arabic or not.  I've improved this logic a bit to ignore whitespace characters.  This is tested by reftests 4 and 5.

There is still a problem with the patch.  I know it exactly but I'm not sure how to solve it.  Gecko 1.8, with bidi.numeral set to 1 or 2, completely ignored the first character of a string and started working from chatacter index 1 onwards.  Now, with textruns entering the picture, the "first character of the string" is not easily detected.  My code doesn't ignore this first character, and this causes some of the reftests to fail.  I need a way of detecting if the beginning of a string being passed to MakeTextRun is really the beginning of the string, or somewhere inside it.

Once this problem is solved, I guess this is ready for review.
Attachment #328962 - Attachment is obsolete: true
(In reply to comment #32)

> > The difference between them should be whether Arabic or Indic numerals are used
> > by default if a numeral is found before either an Arabic letter or a non-Arabic
> > letter. I suspect that this has never worked properly, though.
> 
> Can you elaborate this, please?  Perhaps with an example?

See attachment 97925 [details] from bug 166803. In Gecko 1.8, 1 3 and 4 (which are at the beginning of the content of their respective block elements) appear as Arabic numerals with bidi.numeral set to either 1 or 2; but when it is set to 2 all the numerals should be Indic.
(In reply to comment #33)
> There is still a problem with the patch.  I know it exactly but I'm not sure
> how to solve it.  Gecko 1.8, with bidi.numeral set to 1 or 2, completely
> ignored the first character of a string and started working from chatacter
> index 1 onwards.  Now, with textruns entering the picture, the "first character
> of the string" is not easily detected.  My code doesn't ignore this first
> character, and this causes some of the reftests to fail.  I need a way of
> detecting if the beginning of a string being passed to MakeTextRun is really
> the beginning of the string, or somewhere inside it.

I'm not sure what you mean by "the beginning of the string". You mean the beginning of a word? What exactly is failing and what behaviour do you need?
(Assignee)

Comment 36

10 years ago
(In reply to comment #35)
> I'm not sure what you mean by "the beginning of the string". You mean the
> beginning of a word? What exactly is failing and what behaviour do you need?

I mean the first character of the string contents of any block element...
I'd like to understand what testcase is failing and what the required behaviour is.

nsLineBreaker could tell us when the textrun is the first textrun for the nsLineBreaker text, but that doesn't seem like the right solution.
(Assignee)

Comment 38

10 years ago
I'm not using my development box right now, so I can't tell for sure which testcase is failing, but the problem occurs with bidi.numeral set to 1 or 2.

For example, when rendering this with bidi.numeral==1:

<p>۱۲۳</p>

Gecko 1.8 stars with inspecting the second digit, and because the previous one is a Hindi digit, it leaves this string alone.  My patch (the HandleNumberInChar function) uses aPrevCharArabic, which is false, so it converts the first character to the Arabic "1" digit.  The next two characters are incorrectly converted to Arabic ones as well, so the final rendering of my patch would be equivalent to:

<p>123</p>

I hope I've been clear enough.  If not, please let me know.
Looks like in 1.8, nsTextTransform appplied HandleNumbers to words:
http://mxr.mozilla.org/mozilla1.8/source/layout/generic/nsTextTransformer.cpp#1066
So basically it avoided transforming the first character of a whitespace-delimited word. But that means that in 1.8 "abc 123" is treated differently from "abc123" and "abc&nbsp;123" ... right?

(In reply to comment #38)
> Gecko 1.8 stars with inspecting the second digit, and because the previous one
> is a Hindi digit, it leaves this string alone.

I don't understand that.
571       for (i=1;i<aSize;i++) {
572         if (IS_ARABIC_CHAR(aBuffer[i-1])) 
573           aBuffer[i] = NUM_TO_HINDI(aBuffer[i]);
574         else 
575           aBuffer[i] = NUM_TO_ARABIC(aBuffer[i]);
576       }
So when i=1, IS_ARABIC_CHAR(aBuffer[0]) is false and we set aBuffer[1] to NUM_TO_ARABIC(aBuffer[1]), which I presumse is "2". So why don't we end with "1۲۳"?

BTW every time I read this code and this bug I get very confused because "Arabic" sometimes means characters in the range 0x600-0x6FF and sometimes means characters in the range 0x30-0x39 :-(. Would it be OK to use the term "Latin" instead of "Arabic" for 0x30-0x39?
(In reply to comment #39)
> I don't understand that.
> 571       for (i=1;i<aSize;i++) {
> 572         if (IS_ARABIC_CHAR(aBuffer[i-1])) 
> 573           aBuffer[i] = NUM_TO_HINDI(aBuffer[i]);
> 574         else 
> 575           aBuffer[i] = NUM_TO_ARABIC(aBuffer[i]);
> 576       }
> So when i=1, IS_ARABIC_CHAR(aBuffer[0]) is false and we set aBuffer[1] to
> NUM_TO_ARABIC(aBuffer[1]), which I presumse is "2". So why don't we end with
> "1۲۳"?

Sorry I meant, why don't we end with "۱23"?

Updated

10 years ago
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: layout.bidi → layout.fonts-and-text
(Assignee)

Comment 41

10 years ago
(In reply to comment #39)
> Looks like in 1.8, nsTextTransform appplied HandleNumbers to words:
> http://mxr.mozilla.org/mozilla1.8/source/layout/generic/nsTextTransformer.cpp#1066
> So basically it avoided transforming the first character of a
> whitespace-delimited word. But that means that in 1.8 "abc 123" is treated
> differently from "abc123" and "abc&nbsp;123" ... right?

Actually, looking at nsTextTransformer::DoNumericShaping() <http://mxr.mozilla.org/mozilla1.8/source/layout/generic/nsTextTransformer.cpp#1499> I see that HandleNumbers is never even called with IBMBIDI_NUMERAL_REGULAR or IBMBIDI_NUMERAL_HINDICONTEXT...  The algorithm for these two cases are different to what happens in HandleNumbers for these two values, so I need to change the match to match that behavior.

mCharType seems to have a special role here, and it's set in <http://mxr.mozilla.org/mozilla1.8/source/layout/generic/nsTextTransformer.cpp#227>.  In 1.9(.1), MakeTextRun does not have access to the text frame pointer, or the presentation context, so I guess nsTextFrameThebes should retrieve the chartype property and pass it along MakeTextRun, so that it can implement the same algorithm as we had in 1.8...

This could make some of my previous work in detecting Arabic letters at the end of previous textrunes useless, but I think we should keep those as further heuristics as well.

Am I thinking correctly?

> (In reply to comment #38)
> > Gecko 1.8 stars with inspecting the second digit, and because the previous one
> > is a Hindi digit, it leaves this string alone.
> 
> I don't understand that.
> 571       for (i=1;i<aSize;i++) {
> 572         if (IS_ARABIC_CHAR(aBuffer[i-1])) 
> 573           aBuffer[i] = NUM_TO_HINDI(aBuffer[i]);
> 574         else 
> 575           aBuffer[i] = NUM_TO_ARABIC(aBuffer[i]);
> 576       }
> So when i=1, IS_ARABIC_CHAR(aBuffer[0]) is false and we set aBuffer[1] to
> NUM_TO_ARABIC(aBuffer[1]), which I presumse is "2". So why don't we end with
> "1۲۳"?

Like I said above, HandleNumbers in 1.8 is never called with IBMBIDI_NUMERAL_REGULAR or IBMBIDI_NUMERAL_HINDICONTEXT, hence this behavior.

> BTW every time I read this code and this bug I get very confused because
> "Arabic" sometimes means characters in the range 0x600-0x6FF and sometimes
> means characters in the range 0x30-0x39 :-(. Would it be OK to use the term
> "Latin" instead of "Arabic" for 0x30-0x39?

Yeah, it confuses me too.  I'm fine with using the term Latin, so let's do that from now on!

BTW, I was always puzzled why 0-9 are called Arabic numerals, and there seems to be a historical reason behind it! <http://en.wikipedia.org/wiki/Arabic_numerals>  Not that the historical background matters to me; I prefer the less-confusing terms, but it was an interesting read.  :-)
(In reply to comment #41)
> mCharType seems to have a special role here, and it's set in
> <http://mxr.mozilla.org/mozilla1.8/source/layout/generic/nsTextTransformer.cpp#227>.
>  In 1.9(.1), MakeTextRun does not have access to the text frame pointer, or the
> presentation context, so I guess nsTextFrameThebes should retrieve the chartype
> property and pass it along MakeTextRun, so that it can implement the same
> algorithm as we had in 1.8...

I was planning to remove the chartype frame property if possible (bug 399850). Would it be sufficient to leave a flag for Arabic numbers, which we could then pass into MakeTextRun, or would we need a flag for Latin numbers as well?

> This could make some of my previous work in detecting Arabic letters at the end
> of previous textrunes useless, but I think we should keep those as further
> heuristics as well.

They may well solve some cases which didn't work in 1.8, e.g. bug 166803
(Assignee)

Comment 43

10 years ago
(In reply to comment #42)
> I was planning to remove the chartype frame property if possible (bug 399850).
> Would it be sufficient to leave a flag for Arabic numbers, which we could then
> pass into MakeTextRun, or would we need a flag for Latin numbers as well?

We would need to retain eCharType_EuropeanNumber and eCharType_ArabicNumber at least for the purposes of this bug; so that means two flags, one for Arabic numbers and the other for Latin ones.

Is there any progress on bug 399850?  If not, can I assume the chartype property to exist in my patch here, and you touch this code whenever that bug is going to be fixed?

> They may well solve some cases which didn't work in 1.8, e.g. bug 166803

Yeah, I think they would.

I was also playing with an idea in my head.  There are methods of specifying the language for an HTML page (using the lang attribute, Content-Language hearder, etc.).  It may also be a good idea to include the language (if available) in the heuristics as well, so that for example if a page specifies its language to be Arabic or Persian, we would convert all of the Latin digits to Arabic ones.  What do you think?
> Is there any progress on bug 399850?  If not, can I assume the chartype
> property to exist in my patch here, and you touch this code whenever that bug
> is going to be fixed?

I was planning to work on it right after bug 449577 :)

> I was also playing with an idea in my head.  There are methods of specifying
> the language for an HTML page (using the lang attribute, Content-Language
> hearder, etc.).  It may also be a good idea to include the language (if
> available) in the heuristics as well, so that for example if a page specifies
> its language to be Arabic or Persian, we would convert all of the Latin digits
> to Arabic ones.  What do you think?

Yes, this is definitely a good idea, see bug 151395.

Bug 151374 is another long-standing bug that you may be interested in.
(In reply to comment #43)
> Is there any progress on bug 399850?  If not, can I assume the chartype
> property to exist in my patch here, and you touch this code whenever that bug
> is going to be fixed?

Sorry, I didn't answer the second question: you can assume that the chartype property will still exist.
(Assignee)

Comment 46

10 years ago
Roc: given the broken-ness of the bidi.numeral pref with value 1 or 2 on 1.8 (see comment 34), do we want to go back to that behavior, or handle it logically based on the characters' context (in which case, with a change to the test cases, I think my patch is ready for review here)?
Flags: wanted1.9.1? → wanted1.9.1+
That's Simon's call really. Simon?
(Assignee)

Comment 48

10 years ago
Simon: ping?
I think we want the behaviour from your patch, and not to go back to the behaviour in 1.8.
Is this patch ready for (re)review?
(Assignee)

Comment 51

10 years ago
Created attachment 350062 [details] [diff] [review]
Patch (v2)

Patch, updated to tip.

This is mostly ready for review now (has no major changes from previous versions).  There is only a single problem left.  Running the test results in:

not ok - Rendering of reftest 3 is different with bidi.numeral == 3
not ok - Rendering of reftest 3 is different with bidi.numeral == 4
not ok - Rendering of reftest 5 is different with bidi.numeral == 3
not ok - Rendering of reftest 5 is different with bidi.numeral == 4 

Now, comparing the two images for each case visually makes it clear that the images look the same, which is what we expect.  But inspecting them more nearly shows that there are a few pixels different between them, causing the tests to fail.

roc: do you have any idea what I may be doing wrong which should cause this?
Attachment #329241 - Attachment is obsolete: true
Attachment #350062 - Flags: superreview?(roc)
Attachment #350062 - Flags: review?(roc)
If you do two cairo_show_glyph calls, e.g. because of two frames rendering the text instead of just one, you can get different results than if you do just one cairo_show_glyphs call. Bug 465140 is about that.

I probably won't be able to review this until next week.
(Assignee)

Comment 53

10 years ago
(In reply to comment #52)
> If you do two cairo_show_glyph calls, e.g. because of two frames rendering the
> text instead of just one, you can get different results than if you do just one
> cairo_show_glyphs call. Bug 465140 is about that.

So can I leave the tests as they are for now and wait until bug 465140 gets fixed (which should be soon considering the fact that it has an r+'ed patch)?

> I probably won't be able to review this until next week.

Next week is great!  :-)  I just really would like to have this in time to land for Firefox 3.1, because we're planning on shipping a Persian localization, and this would be even more annoying if it misses 3.1 like it did on 3.0.
Do your tests pass now that 465140 has landed?
(Assignee)

Comment 55

10 years ago
(In reply to comment #54)
> Do your tests pass now that 465140 has landed?

No, I still see the same failures.  Although this time I can't see _anything_ different visually, not like before landing of that bug where some pixels looked a bit different if you looked closely.
Try loading the log in layout/tools/reftest/reftest-analyzer.xhtml ?
(Assignee)

Comment 57

10 years ago
(In reply to comment #56)
> Try loading the log in layout/tools/reftest/reftest-analyzer.xhtml ?

Actually this is not a real reftest; it's a Mochitest using canvas to compare images.  What's in that log?  Any chance I generate the log of what you're looking for in my test manually?
The main thing that's in the log are the data URLs for the two images that aren't equal.  There's a bit of other syntax around them.  See http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/reftest.js , the DocumentLoaded function, case 2 in the switch(gState) for what gets printed (or intentionally break a reftest and look at the lines containing REFTEST in the log).  (The "number of differing pixels" line is not needed, though, but you probably need the TEST-UNEXPECTED-FAIL line.)
(Assignee)

Comment 59

10 years ago
Created attachment 350589 [details]
Test run results (data: URIs)

Here are the data URIs.  The lines 1 and 2 contain the images from test #3 with bidi.numeral==3, lines 4 and 5 contain the images from test #3 with bidi.numeral==4, lines 7 and 8 contain the images from test #5 with bidi.numeral==3, and lines 10 and 11 contain the images from test #5 with bidi.numeral==4.

I also ran the following PHP script on the results which produced this output, if it helps:

strlen(a) = 1051, strlen(b) = 1059
same: 54, different: 997
strlen(a) = 933, strlen(b) = 934
same: 66, different: 867
strlen(a) = 1058, strlen(b) = 1063
same: 112, different: 946
strlen(a) = 938, strlen(b) = 937
same: 149, different: 788

<?php

$lines = explode("\n",file_get_contents('441782-results.txt'));

check($lines[0],$lines[1]);
check($lines[3],$lines[4]);
check($lines[6],$lines[7]);
check($lines[9],$lines[10]);

function check($a, $b)
{
  $a = str_replace('data:image/png;base64,','',$a);
  $b = str_replace('data:image/png;base64,','',$b);
  
  $a = base64_decode($a);
  $b = base64_decode($b);
  
  print 'strlen(a) = ' . strlen($a) . ', strlen(b) = ' . strlen($b) . "\n";
  $same = $diff = 0;
  for ($i = 0; $i < min(strlen($a), strlen($b)); ++ $i)
    if ($a{$i} == $b{$i})
      $same ++;
    else
      $diff ++;
  print "same: $same, different: $diff\n";
}

?>
(In reply to comment #59)
> Here are the data URIs.  The lines 1 and 2 contain the images from test #3 with
> bidi.numeral==3,

In the 'line 1' image, there is a column of pixels on the left of the '0' which is missing in the 'line 2' image. (I saw this by loading the two URLs into two tabs in a window, then hitting ctrl-tab and staring hard. It's even easier to see if you can zoom in a lot (e.g. using ctrl-mousewheel on Mac).

> lines 4 and 5 contain the images from test #3 with
> bidi.numeral==4

Similar problem; the "dot" glyph after 'text' has a couple of pixels on its left in the 'line 4' image that go messing on line 5.

>, lines 7 and 8 contain the images from test #5 with
> bidi.numeral==3, and lines 10 and 11 contain the images from test #5 with
> bidi.numeral==4.

Same issues as above.

> I also ran the following PHP script on the results which produced this output,
> if it helps:

That doesn't help; PNGs are compressed so as soon as you hit one difference the rest is sure to be different.
I'm not sure what's causing those problems. You should check that the rendering from the test matches what you get when you just load those pages into a window. Are you using Linux/GTK?
(Assignee)

Comment 62

10 years ago
(In reply to comment #60)
> (In reply to comment #59)
> > Here are the data URIs.  The lines 1 and 2 contain the images from test #3 with
> > bidi.numeral==3,
> 
> In the 'line 1' image, there is a column of pixels on the left of the '0' which
> is missing in the 'line 2' image. (I saw this by loading the two URLs into two
> tabs in a window, then hitting ctrl-tab and staring hard. It's even easier to
> see if you can zoom in a lot (e.g. using ctrl-mousewheel on Mac).

Ah, yes, now I see.  Maybe I need to get a new pair of glasses...

> > lines 4 and 5 contain the images from test #3 with
> > bidi.numeral==4
> 
> Similar problem; the "dot" glyph after 'text' has a couple of pixels on its
> left in the 'line 4' image that go messing on line 5.

I still can't see the issue with this image, even now that I know where to look.

Anyway, what's more important is why this is happening.  Could it be caused because of something that I do with textruns?  Is it in scope for this bug at all?

(In reply to comment #61)
> I'm not sure what's causing those problems. You should check that the rendering
> from the test matches what you get when you just load those pages into a
> window. Are you using Linux/GTK?

I had performed those tests on Windows.  I don't have my build machines handy, but I'll check it tonight.  Are you running Mac?  Can you test to see if this happens on Mac as well?  This is 100% reproducible on Windows.
Can you come up with reftests that fail in the same way on trunk, without your patch?
(Assignee)

Comment 64

10 years ago
I'll give it a shot...
Testing this patch on current trunk, I'm seeing textRunCache-related problems. The issue is that with the context-dependent numeral settings, words get cached according to the context in which they happen to be found, and the cached form may then get re-used when the context is actually different.

We could solve this by including some information about the context in the cache keys, but the simplest solution is probably to disable caching for strings that contain digits, if one of the context settings is in effect. This shouldn't have a significant negative impact in real-world situations.
(Assignee)

Comment 66

10 years ago
I tried running the test on Linux, and it fails in different ways:

not ok - Rendering of reftest 2 is different with bidi.numeral == 1
not ok - Rendering of reftest 2 is different with bidi.numeral == 2
not ok - Rendering of reftest 3 is not different with bidi.numeral == 1
not ok - Rendering of reftest 3 is not different with bidi.numeral == 2
not ok - Rendering of reftest 4 is different with bidi.numeral == 1
not ok - Rendering of reftest 4 is different with bidi.numeral == 2
not ok - Rendering of reftest 5 is not different with bidi.numeral == 1
not ok - Rendering of reftest 5 is not different with bidi.numeral == 2 

These are real failures with quite different images.  It seems that on Linux, context detection works incorrectly:

When the preceding text is Arabic, setting bidi.numeral to 1 and 2 causes the type of digits to flip (Arabic -> Latin, Latin -> Arabic).  When the preceding text is not Arabic, setting bidi.numeral to 1 and 2 causes Arabic digits to convert to Latin digits.

I don't understand which part of my changes is platform-specific, and what causes this behavior...

By the way, the failures in comment 51 don't happen on Linux at all.
(Assignee)

Comment 67

10 years ago
Created attachment 351006 [details] [diff] [review]
Patch (v2.2)

I have not observed the problems mentioned in comment 65 myself, but I see the potential, so I updated the patch to make sure no text runs get cached when bidi.numeral is set.

I also updated the test to print a title for each reftest to make inspecting the failures easier.  That change can be removed upon final landing.
Attachment #350062 - Attachment is obsolete: true
Attachment #351006 - Flags: superreview?(roc)
Attachment #351006 - Flags: review?(roc)
Attachment #350062 - Flags: superreview?(roc)
Attachment #350062 - Flags: review?(roc)
I will look at this today; I have been experimenting with a version of the patch that bypasses the cache when there is an affected numeral in the word, so I will compare with what you have done.

To see the effects of caching (with the earlier patch), try a test file such as:

<!DOCTYPE html>
<html>
 <head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8">
 </head>
 <body>
   <p>test ١٢٣ test</p>
   <p>متن ١٢٣ متن</p>
   <p>متن ٤٥٦ متن</p>
   <p>test ٤٥٦ test</p>
   <p>test 987 test</p>
   <p>متن 987 متن</p>
   <p>متن 654 متن</p>
   <p>test 654 test</p>
 </body>
</html>

with bidi.numeral == 2. (Sorry, not set up as a proper reftest yet.)
(Assignee)

Comment 69

10 years ago
(In reply to comment #63)
> Can you come up with reftests that fail in the same way on trunk, without your
> patch?

Is this possible that this only happens when using canvas to capture a frame?

I noted that in the generated images, the right side of 9 is also chopped in a similar way to the left-side of 0.  I could not see this rendering problem when changing the value of bidi.numeral manually, and loading that frame's file in a new tab.  The on-screen rendering seems to be correct at all times.  One other point which makes me suspicious that this is a canvas issue is that on Linux (where these failures don't happen), the images generated by canvas do not have the right side of 9's chopped.
(Assignee)

Comment 70

10 years ago
(In reply to comment #68)
> To see the effects of caching (with the earlier patch), try a test file such
> as:
> 
> with bidi.numeral == 2. (Sorry, not set up as a proper reftest yet.)

So, for example the ١٢٣ on the second line should not be changed to 123 with bidi.numeral == 2, right?  It is with the latest patch I attached.  Is there any other place where caching is performed?

If your patch solves this, can you please attach it to this bug?
Yes, it solves this, but currently I think there may be a leak issue so I'm hoping to track that down. Will attach it shortly for you to test & comment.
Created attachment 351046 [details] [diff] [review]
alternate version of the patch, to address textrun caching issues

This version of the patch bypasses the textrun cache for any "word" that contains a numeral that is transformed by the current bidi.numerals setting; this should give us correct and consistent behavior in all cases, AFAICT. Not yet tested across all the platforms, though.
Attachment #351046 - Flags: review?(ehsan.akhgari)
Note: applying the patch to a recent trunk, I saw a rejected change due to whitespace cleanup in nsTextFrameThebes.cpp, so be aware that it may not apply cleanly depending on the version of your repo. Also gcc on Linux complains of an extra semicolon in gfxTextRunWordCache.cpp. These are both trivial fixes to do; I'll post an updated patch after some further testing.
OK, I've run the layout/base tests on both Mac OS X and Linux, and get the same results in both cases. There are 4 reported failures:

  not ok - Rendering of reftest 3 is not different with bidi.numeral == 1
  not ok - Rendering of reftest 3 is not different with bidi.numeral == 2
  not ok - Rendering of reftest 5 is not different with bidi.numeral == 1
  not ok - Rendering of reftest 5 is not different with bidi.numeral == 2 

However, I'm unclear whether it is the actual behavior or the test that is wrong. In these tests, Arabic-Indic numerals are being converted to Western forms because they occur after Latin text; isn't that the correct "contextual" behavior? Ehsan, please confirm what's supposed to be happening in these cases.
I'll hold off re-reviewing here until you guys decide which patch wins :-)
Created attachment 351073 [details] [diff] [review]
updated patch to properly release TextRunWordCache on shutdown

Minor update to the previous version of the patch; now that the TextRunWordCache is a ref-counted object, we have to Release() it on shutdown, not simply call delete, otherwise it will be reported as a leak.

(Also fixed the issues mentioned in comment #73.)
Attachment #351046 - Attachment is obsolete: true
Attachment #351073 - Flags: review?(ehsan.akhgari)
Attachment #351046 - Flags: review?(ehsan.akhgari)
(Assignee)

Comment 77

10 years ago
Created attachment 351753 [details] [diff] [review]
Patch (v2.3)

OK, Jonathan's changes were good in my opinion, and I included them in my patch.  Also, in reply to comment 74, yes, the tests were wrong.  The behavior of bidi.numeral == 1 or 2 should be context sensitive (and these two values mean the same thing).

So, I corrected the unit test, and the tests now fully pass on Linux.  On Windows, however, the same problem with the images still persists, resulting in the below failures:

not ok - Rendering of reftest 3 is different with bidi.numeral == 1
not ok - Rendering of reftest 3 is different with bidi.numeral == 2
not ok - Rendering of reftest 3 is different with bidi.numeral == 3
not ok - Rendering of reftest 3 is different with bidi.numeral == 4
not ok - Rendering of reftest 5 is different with bidi.numeral == 4 

Like I explained in comment 69, this seems to be a canvas problem on Windows, so unless someone can suggest how to fix/work around these problems, I think we should disable the test on Win32 for now, because it's really passing, if it wasn't for this problem.

Anyway, the patch is ready for review.  Johnathan, thanks for your help with this.
Attachment #351006 - Attachment is obsolete: true
Attachment #351073 - Attachment is obsolete: true
Attachment #351753 - Flags: superreview?(roc)
Attachment #351753 - Flags: review?(roc)
Attachment #351073 - Flags: review?(ehsan.akhgari)
Attachment #351006 - Flags: superreview?(roc)
Attachment #351006 - Flags: review?(roc)
(Assignee)

Comment 78

10 years ago
Created attachment 351754 [details]
Demo of the canvas issue on Windows

Here is a screenshot which is produced on trunk when dragging a tab on Windows.  It also shows the canvas related problem (the right side of the "6" is chopped off).
+// XXXehsan:
+// We need this symbol from nsTextFrameUtils.h, but we can't include it!
+#define TEXT_INCOMING_ARABICCHAR 0x10000000

Can't you make this a cache flag? See gfxTextRunFactory::CACHE_TEXT_FLAGS.

+    // ensure there is a reference before the AddObserver calls;
+    // this will be Release()'d in gfxTextRunWordCache::Shutdown()
+    AddRef();

Instead of adding the ref here, it would be better to let gfxTextRunWordCache::Init add the ref after setting gTextRunWordCache. That way it's clear that gTextRunWordCache is holding the ref. (Unfortunately we can't make gTextRunWordCache an nsRefPtr.)

+    PRBool needsNumeralProcessing =
+        (mBidiNumeral == IBMBIDI_NUMERAL_HINDI) ||
+        (mBidiNumeral == IBMBIDI_NUMERAL_ARABIC) ||
+        (mBidiNumeral == IBMBIDI_NUMERAL_REGULAR) ||
+        (mBidiNumeral == IBMBIDI_NUMERAL_HINDICONTEXT);

Can't this just be != IBMBIDI_NUMERAL_NOMINAL?

Is HandleNumbers actually used? It seems like nsBidiPresUtils::FormatUnicodeText calls it, but only if nsDocumentViewer::SetBidiNumeral has been called and I can't find any setters of bidiNumeral in the tree. Is this used in your Persian build or anywhere else? Seems to me that probably HandleNumbers *should* be called with FormatUnicodeText using the pref setting instead of this prescontext bidi option, and that part of the bidi options should be removed. Separate bug.

+    if ((textRun->GetFlags() & nsTextFrameUtils::TEXT_TRAILING_WHITESPACE) != 0)
+      mNextRunContextInfo |= nsTextFrameUtils::INCOMING_WHITESPACE;
+    if ((textRun->GetFlags() & nsTextFrameUtils::TEXT_TRAILING_ARABICCHAR) != 0)
+      mNextRunContextInfo |= nsTextFrameUtils::INCOMING_ARABICCHAR;

You don't need != 0 here. You should have {} around the statements though.

+    PRBool inWhitespace = *aIncomingFlags & INCOMING_WHITESPACE;

Need a != 0 here since you're assigning to a PRBool which should be only 0 or 1.

Otherwise looks good.
(In reply to comment #79)

> +    // ensure there is a reference before the AddObserver calls;
> +    // this will be Release()'d in gfxTextRunWordCache::Shutdown()
> +    AddRef();
> 
> Instead of adding the ref here, it would be better to let
> gfxTextRunWordCache::Init add the ref after setting gTextRunWordCache. That way
> it's clear that gTextRunWordCache is holding the ref. (Unfortunately we can't
> make gTextRunWordCache an nsRefPtr.)

We can't do this, because Init() is called from the TextRunWordCache constructor; this in turn calls mPrefBranch->AddObserver, and if there's no reference at that time, the cache object will be destroyed during an assignment while it's being added to the list of observers. This is why the AddRef() was placed in the constructor, before Init() gets called.

There might be a solution to this involving better use of the various smart/ref'd pointer types, but I don't have a good enough grasp of all those yet to know the best approach.
OK, then the call to Init() should be moved out of the constructor to be called after "new gfxTextRunWordCache".
(Assignee)

Comment 82

10 years ago
(In reply to comment #79)
> +    PRBool needsNumeralProcessing =
> +        (mBidiNumeral == IBMBIDI_NUMERAL_HINDI) ||
> +        (mBidiNumeral == IBMBIDI_NUMERAL_ARABIC) ||
> +        (mBidiNumeral == IBMBIDI_NUMERAL_REGULAR) ||
> +        (mBidiNumeral == IBMBIDI_NUMERAL_HINDICONTEXT);
> 
> Can't this just be != IBMBIDI_NUMERAL_NOMINAL?

Hmmm, considering the fact that HandleNumberInChar can tolerate invalid pref values, I think this can safely be done as you suggest.  Will do so in the next iteration.

> Is HandleNumbers actually used? It seems like
> nsBidiPresUtils::FormatUnicodeText calls it, but only if
> nsDocumentViewer::SetBidiNumeral has been called and I can't find any setters
> of bidiNumeral in the tree. Is this used in your Persian build or anywhere
> else? Seems to me that probably HandleNumbers *should* be called with
> FormatUnicodeText using the pref setting instead of this prescontext bidi
> option, and that part of the bidi options should be removed. Separate bug.

As far as I know, the only usage of HandleNumbers is nsBidiPresUtils::FormatUnicodeText, but I'm not sure what additional gains can handling this in nsBidiPresUtils::FormatUnicodeText provide (in addition to what I am doing in this patch).  Maybe that should be removed altogether, considering the fact that it doesn't seem to get used at all?

> +    PRBool inWhitespace = *aIncomingFlags & INCOMING_WHITESPACE;
> 
> Need a != 0 here since you're assigning to a PRBool which should be only 0 or
> 1.

Oops :/

> Otherwise looks good.

So, should I disable the test on Windows?  Or disable only those tests which actually fail (don't know how reliable that would be)?

I think I also need to file a bug on the canvas issue...
FormatUnicodeText is called from various places via nsBidiPresUtils::ProcessText and its callers. This gets used for <canvas> text drawing, XUL text drawing (for trees, label="...">), page headers and footers, image "alt" text, CSS list-item numbering, and perhaps a few other places.

I'm not sure what to do about this drawing issue. I'd really like to see a testcase that fails without your patch.
(Assignee)

Comment 84

10 years ago
Created attachment 352597 [details] [diff] [review]
Patch (v2.4)

Review comments addressed.
Attachment #351753 - Attachment is obsolete: true
Attachment #352597 - Flags: superreview?(roc)
Attachment #352597 - Flags: review?(roc)
Attachment #351753 - Flags: superreview?(roc)
Attachment #351753 - Flags: review?(roc)
(Assignee)

Updated

10 years ago
Depends on: 469208
(Assignee)

Comment 85

10 years ago
(In reply to comment #83)
> I'm not sure what to do about this drawing issue. I'd really like to see a
> testcase that fails without your patch.

I finally managed to produce a test case which fails without my patch.  I filed that as bug 469208.
+        TEXT_OPTIMIZE_SPEED          = 0x0800,
+        /**
+         * When set, the previous character for this textrun was an Arabic
+         * character.  This is used for the context detection necessary for
+         * bidi.numeral implementation.
+         */
+        TEXT_INCOMING_ARABICCHAR     = 0x10000000

This should be in gfxTextRunWordCache.h.

+        gTextRunWordCache->AddRef();

Better NS_ADDREF(gTextRunWordCache);

+    gTextRunWordCache->Release();
     gTextRunWordCache = nsnull;

Better NS_IF_RELEASE(gTextRunWordCache). The possibility that gTextRunWordCache is already null must definitely be handled.

+    PRBool needsNumeralProcessing =
+        (mBidiNumeral != IBMBIDI_NUMERAL_NOMINAL);

Parentheses not needed.

+    PRBool needsNumeralProcessing =
+        (mBidiNumeral == IBMBIDI_NUMERAL_HINDI) ||
+        (mBidiNumeral == IBMBIDI_NUMERAL_REGULAR) ||
+        (mBidiNumeral == IBMBIDI_NUMERAL_HINDICONTEXT);

Can't this be changed to != NOMINAL?

(In reply to comment #85)
> (In reply to comment #83)
> > I'm not sure what to do about this drawing issue. I'd really like to see a
> > testcase that fails without your patch.
> 
> I finally managed to produce a test case which fails without my patch.  I filed
> that as bug 469208.

Thanks!
Flags: wanted1.9.0.x?
(Assignee)

Comment 87

10 years ago
Created attachment 353522 [details] [diff] [review]
Patch (v2.5)

All review comments addressed.  Disabled tests on Windows for now.
Attachment #352597 - Attachment is obsolete: true
Attachment #353522 - Flags: superreview?(roc)
Attachment #353522 - Flags: review?(roc)
Attachment #352597 - Flags: superreview?(roc)
Attachment #352597 - Flags: review?(roc)
Attachment #353522 - Flags: superreview?(roc)
Attachment #353522 - Flags: superreview+
Attachment #353522 - Flags: review?(roc)
Attachment #353522 - Flags: review+
So, revisiting comment 42 and 43: you are not using the frame chartype property at all in the final patch?
(Assignee)

Comment 90

10 years ago
(In reply to comment #89)
> So, revisiting comment 42 and 43: you are not using the frame chartype property
> at all in the final patch?

No, in comment 49 we decided to take the improved behavior in this bug, so I didn't need to use the char type property.
(In reply to comment #79)

> Is HandleNumbers actually used? It seems like
> nsBidiPresUtils::FormatUnicodeText calls it, but only if
> nsDocumentViewer::SetBidiNumeral has been called and I can't find any setters
> of bidiNumeral in the tree. Is this used in your Persian build or anywhere
> else? Seems to me that probably HandleNumbers *should* be called with
> FormatUnicodeText using the pref setting instead of this prescontext bidi
> option, and that part of the bidi options should be removed. Separate bug.

The prescontext bidi options are initialized from prefs in nsPresContext::GetUserPreferences and updated when prefs change (note that setting bidiOptions sets all the bidi* properties in nsDocumentViewer). The bidi numeral pref *is* working without Ehsan's patch in the codepaths that go through FormatUnicodeText.
(Assignee)

Comment 92

10 years ago
(In reply to comment #91)
> The prescontext bidi options are initialized from prefs in
> nsPresContext::GetUserPreferences and updated when prefs change (note that
> setting bidiOptions sets all the bidi* properties in nsDocumentViewer). The
> bidi numeral pref *is* working without Ehsan's patch in the codepaths that go
> through FormatUnicodeText.

Ah, yes.  This is what we were observing in comment 0 and 1.
(Assignee)

Comment 93

10 years ago
Landed on mozilla-central: <http://hg.mozilla.org/mozilla-central/rev/54d08b93e78b>
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [landed on m-c; needs to bake for 1.9.1]
Target Milestone: --- → mozilla1.9.2a1

Comment 94

10 years ago
NEW WARNING:
  intl/unicharutil/util/nsBidiUtils.cpp:580: unused variable 'prev' - Blamed on Ehsan Akhgari <ehsan.akhgari@gmail.com> - http://hg.mozilla.org/mozilla-central/annotate/54d08b93e78bc06e8731bcc1e2eebe215af9e7d7/intl/unicharutil/util/nsBidiUtils.cpp#l580

Comment 95

10 years ago
also NEW WARNING:
  layout/generic/nsTextFrameThebes.cpp:757: 'BuildTextRunsScanner::mCurrentRunContextInfo' will be initialized after - Blamed on Ehsan Akhgari <ehsan.akhgari@gmail.com> - http://hg.mozilla.org/mozilla-central/annotate/54d08b93e78bc06e8731bcc1e2eebe215af9e7d7/layout/generic/nsTextFrameThebes.cpp#l757
  layout/generic/nsTextFrameThebes.cpp:754: 'PRPackedBool BuildTextRunsScanner::mSkipIncompleteTextRuns' - Blamed on roc+@cs.cmu.edu - http://hg.mozilla.org/mozilla-central/annotate/22339d7aca17716d42ffbe181a2a11cc7e157ed9/layout/generic/nsTextFrameThebes.cpp#l939
  layout/generic/nsTextFrameThebes.cpp:608: when initialized here - Blamed on roc+@cs.cmu.edu - http://hg.mozilla.org/mozilla-central/annotate/22339d7aca17716d42ffbe181a2a11cc7e157ed9/layout/generic/nsTextFrameThebes.cpp#l821
(Assignee)

Updated

10 years ago
Depends on: 470376
(Assignee)

Updated

10 years ago
Blocks: 467672
This seems to have caused leak bug 470418.
(Assignee)

Comment 97

9 years ago
Comment on attachment 353522 [details] [diff] [review]
Patch (v2.5)

Asking approval to land on 1.9.1.

This is a regression from Gecko 1.8, which is present in 1.9 and 1.9.1.  It matter a lot to Arabic and Persian users, so it would be great to have in 1.9.1.  It has an automated unit test and has been baking for quite some time on trunk.
Attachment #353522 - Flags: approval1.9.1?
Can't grant approval without a better handle on that memory leak - that's what baking is for, though! :)
Comment on attachment 353522 [details] [diff] [review]
Patch (v2.5)

(please renom once the memory leak issue has been resolved)
Attachment #353522 - Flags: approval1.9.1? → approval1.9.1-
(Assignee)

Updated

9 years ago
Attachment #353522 - Flags: approval1.9.1- → approval1.9.1?
(Assignee)

Comment 100

9 years ago
Comment on attachment 353522 [details] [diff] [review]
Patch (v2.5)

Re-noming now that bug 470418 has landed.
Attachment #353522 - Flags: approval1.9.1? → approval1.9.1+
Comment on attachment 353522 [details] [diff] [review]
Patch (v2.5)

a191=beltzner; make sure it lands along with the associated fix
(Assignee)

Comment 102

9 years ago
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/6e3ca7d0f85b
Keywords: fixed1.9.1
Whiteboard: [landed on m-c; needs to bake for 1.9.1]
(Assignee)

Updated

9 years ago
Depends on: 473530

Updated

9 years ago
Depends on: 477498
So was there a reason the checkin for this didn't use WindowSnapshot for the canvas ops?  That lets you do _much_ faster comparisons...

I'm assuming that there isn't and that it's OK if I blanket update all the tests that use this code to WindowSnapshot.
Also, is there a reason for the 500ms wait after each load?  Again, I'm assuming there really isn't and that it's ok to just do the snapshot onload.
(In reply to comment #103)
> So was there a reason the checkin for this didn't use WindowSnapshot for the
> canvas ops?  That lets you do _much_ faster comparisons...
> 
> I'm assuming that there isn't and that it's OK if I blanket update all the
> tests that use this code to WindowSnapshot.

Sounds good to me.
Pushed http://hg.mozilla.org/mozilla-central/rev/17a57ccd55c8

I didn't touch the windows newlines or anything like that.
(Assignee)

Updated

9 years ago
Duplicate of this bug: 482469

Updated

9 years ago
Duplicate of this bug: 482469
(Assignee)

Updated

9 years ago
Keywords: user-doc-needed
You need to log in before you can comment on or make changes to this bug.