Closed Bug 83958 Opened 19 years ago Closed 19 years ago

Bidi ordering regression

Categories

(Core :: Layout: Text and Fonts, defect)

x86
Windows NT
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: smontagu, Assigned: mkaply)

References

Details

(Keywords: css2, Whiteboard: [PDT+](py8ieh: check fix))

Attachments

(7 files)

Inline html tags should be ignored when performing bidi reordering. As the
attached example shows, they are currently being treated like block tags and
causing an additional embedding level.

Results: dir=ltr or dir=rtl This paragraph is all in English, so the words
should be in the same order whether 

Expected results: This paragraph is all in English, so the words should be in
the same order whether dir=rtl or dir=ltr

hyatt: is this caused by the new CSS rule matching checkin? In
nsBidiPresUtils::InitLogicalArray, you replaced 

 if (NS_STYLE_DIRECTION_RTL == display->mExplicitDirection) {

by 

 if (bits & NS_STYLE_INHERIT_VISIBILITY) {
   if (NS_STYLE_DIRECTION_RTL == vis->mDirection) {

Since ExplicitDirection means "not inherited", I would expect the equivalent
condition to be when NS_STYLE_INHERIT_VISIBILITY is not set, no?
Attached file test case
CC'ing gilad -- Mozilla Hebrew QA contact beginning this week. 
Per CSS we really should not be doing anything different if the user explicitly
sets the property or if the user lets it inherit. The cases are identical in CSS
because all that matters if the computed value and 'inherit' is never a computed
value. Could you explain this some more for me please?
Hixie: the difference between inherited and explicit directionality is 
important for the ordering of the elements within a paragraph. See 
http://www.w3.org/TR/html401/struct/dirlang.html#h-8.2.2 and the following 
sections.

For a simple example,       

 <p dir="rtl">abc <span dir="rtl">def</span> ghi</p>
 <p dir="rtl">abc <b>def</b> ghi</p>

should be rendered (capitals standing in for bold)

ghi def abc
abc DEF ghi

Can someone give me a sanity check here?

That last example in IE looks like this (still using capitals to represent 
bold):

IHG def CBA
Can someone give me a sanity check here?

That last example in IE looks like this (still using capitals to represent 
bold):

IHG def CBA
IHG FED CBA

Am I missing something or is IE?
Simon: Yes, but that's only because the relevant style rules for the 'dir'
attributes would be:

   [dir=rtl] { direction: rtl; unicode-bidi: embed; }
   [dir=ltr] { direction: ltr; unicode-bidi: embed; }
   bdo { unicode-bidi: bidi-override; }

To quote the spec:
# For the 'direction' property to have any effect on inline-level elements, 
# the 'unicode-bidi' property's value must be 'embed' or 'override'. 
  -- http://www.w3.org/TR/REC-CSS2/visuren.html#propdef-direction

It's the 'unicode-bidi' property that causes the effect, not the 'direction'
property. The following three examples should all look EXACTLY IDENTICAL:

   <span style="direction: right-to-left"> 
     abc
     <span style="direction: right-to-left"> def </span>
     ghi
   </span>

   <span style="direction: right-to-left"> 
     abc
     <span style="direction: inherit"> def </span>
     ghi
   </span>

   <span style="direction: right-to-left"> 
     abc
     <span> def </span>
     ghi
   </span>

Under no circumstances should those cases look different. If they do, then there
is a major bidi style bug.
Keywords: css2
Whiteboard: (py8ieh: check fix)
On IE 6, it looks like this:

ghi def abc

abc DEF ghi

And the whole page is right aligned.
hixie: I agree with you that the 'unicode-bidi' property causes the effect, but
note that the spec also says:

"The HTML 4.0 specification ([HTML40], section 8.2) defines bidirectionality
behavior for HTML elements. Conforming HTML user agents may therefore ignore the
'direction' and 'unicode-bidi' properties in author and user style sheets. The
style sheet rules that would achieve the bidi behavior specified in [HTML40] are
given in the sample style sheet."

Here is the relevant part of the sample style sheet,
http://www.w3.org/TR/REC-CSS2/sample.html#bidi

BDO[DIR="ltr"]  { direction: ltr; unicode-bidi: bidi-override }
BDO[DIR="rtl"]  { direction: rtl; unicode-bidi: bidi-override }

*[DIR="ltr"]    { direction: ltr; unicode-bidi: embed }
*[DIR="rtl"]    { direction: rtl; unicode-bidi: embed }

/* Elements that are block-level in HTML4 */
ADDRESS, BLOCKQUOTE, BODY, DD, DIV, DL, DT, FIELDSET, 
FORM, FRAME, FRAMESET, H1, H2, H3, H4, H5, H6, IFRAME,
NOSCRIPT, NOFRAMES, OBJECT, OL, P, UL, APPLET, CENTER, 
DIR, HR, MENU, PRE, LI, TABLE, TR, THEAD, TBODY, TFOOT, 
COL, COLGROUP, TD, TH, CAPTION 
                { unicode-bidi: embed }

The cases in your example should only be identical if the value of
'unicode-bidi' is 'normal', and although this is the default value, it does not
apply to any HTML element. 
Now, it seems we need to make sure that 'direction' is resolved *after*:

1) 'unicode-bidi' resolution,
2) 'display' resolution

since 'direction' depends on both of them:

  - on block-level elements it's set unconditionally (in the scope of mentioned 
conditions, of course);
  - to inline elements - is not applied, or applied differently, depending 
on 'unicode-bidi' value.

(David Baron: 'unicode-bidi' has already been implemented.)
Simon, Hixie: Although 'unicode-bidi: normal' is not recommended by 
http://www.w3.org/TR/REC-CSS2/sample.html#bidi, it works in IE5.0, and also in 
mozilla (tested on 20010508 build):

BDO { unicode-bidi: normal }

    ....................

<BDO dir=rtl>abc</BDO>

(style sheet rule wins, and bidi override specified in 'BDO' tag has no effect.)
My changes to nsBidiPresUtils.cpp were probably wrong.  They don't even seem to
make a difference (but the other changes seem to fix this bug, but probably
break other things).

Does anybody familiar with the bidi code know what those changes would break and
how to fix them, or if they're the right thing to do at all?
David: not sure this is related to the problem, however..

I also tried adding

static const PRUnichar kLRO              = 0x202D;
static const PRUnichar kRLO              = 0x202E;

but those do harm -- please see the comment in InitLogicalArray:

// Example: <bdo dir=ltr>latin HEBREW</bdo>
// should be displayed as is: latin HEBREW
// The entire text frame receives even EL and whouldn't "normally" be split.
// Then, if the platform is Bidi, it reverses hebrew characters, which results 
// in incorrect text order: latin WERBEH
// To avoid that, we always insert LRE/RLE. That forces splitting the frame into
// uni-directional (from the platform point of view) pieces - with even EL for
// "latin" and odd EL for "HEBREW". Unicode override, if any, will be taken into
// account in AdjustEmbeddingLevel().




 
Yes, I got the check backwards.  It should be

if (!(NS_STYLE_INHERIT_VISIBILITY & bits)) {

}

Does that fix the bug?
Simon: Why do you say unicode-bidi: normal "does not apply to any HTML element"?
I don't see where you got that from. Also, the test you quoted does not affect
my example at all, since "span" elements are not mentioned in the text you
quoted and "dir" attributes are not present in the markup I pasted.

It appears Lina and dbaron have this in hand, though. :-)
Can we fission this discussion?

The original reordering bug I reported is fixed by hyatt's change of the if
statement.

That should be reviewed and checked in, since the bug breaks real-world Bidi
sites, e.g. the weather forecast at www.msn.co.il

hixie, do you agree that the discussion of css bidi attributes is irrelevant to
both the bug and the fix? I would be happy to carry on that discussion somewhere
else, but here I think it's obscuring the issue.
While I understand that hyatt's fix returns us to the status quo, I believe it
is the wrong thing for us to do. If that fix is checked in, it should only be
on a branch, and not on the Mozilla trunk, as it is only a band-aid and merely
serves to obscure and deprioritise the real issue.

I agree, however, that it would be appropriate to continue discussing this
elsewhere. The netscape.public.mozilla.style newsgroup is probably the best
fit for this issue. Please cc me on your reply if you do respond to my comments
above in the newsgroups.
hyatt's fix is not a bandaid, it's a correction to his earlier checkin on bug 
78965, which caused a regression in Bidi reordering on sites without any style 
markup.

If there is a bidi style bug, that isn't "the real issue", it's a different 
issue and can be prioritised or deprioritised on its own merits. There's no 
reason to hijack this bug and leave the trunk rendering sites incorrectly just 
in order to make a point.
Hixie, I don't see the harm in returning the code to the way it was prior to 
the 78695 checkin.
Is NS_STYLE_INHERIT_VISIBILITY specific to the 'direction' property?  If not,
then will that even fully return things to the state they were before?

(My feeling is that the old code is wrong, and we should try to fix it, since
the specs already have a way to handle this, but we're just not using it.  It's
also IMO a rather odd (i.e., wrong) use of the style system.)
I agree with dbaron on this.
I agree that there is work needed on the bidi CSS properties. Meanwhile, does
anybody have an objection to checking hyatt's fix into the 0.9.1 branch? 
Hyatt's changes aren't on the 0.9.1 branch.  This still works there, right?
I understood that hyatt's changes were in the branch, but maybe I just don't 
know how to use Bonsai. If they're not, there should be no problem.
My changes are not on the branch.
So, what's the status of this bug?  Are we going to do something about it?  It
would be nice to fix it before we close for 0.9.2.
After putting the fix from hyatt in, I get consistent behavior between IE6 and
Mozilla in all cases mentioned in this bug.

Why shouldn't this fix go in?
Does hyatt's patch break if the 'visibility' or '-moz-opacity' property is
specified on a node where 'direction' is not?
dbaron:

Could you give me some sample HTML pretty please?

Thanks
QA contact changed to: giladehven@hotmail.com
QA Contact: momoi → giladehven
Does anyone mind if I move 'mUnicodeBidi' from 'nsStyleText' 
to 'nsStyleVisibility'?
I have a patch in my tree (modified from the one above) that has it moved to
nsStyleTextReset, except I'm having trouble getting it working.  (It should be
in a reset struct since it's reset, not inherited.)

I need to talk to hyatt today...
I also have my own version of dbaron's patch above, but there are some things 
that I still can't fix and/or don't understand. Maybe we can set up some 
conference call or IRC meeting to sort this issue out?
I'm wondering if there's been another regression in bidi reordering recently. 
The style stuff seems to be right now... it's basically what I was doing before
except I moved unicode-bidi into a different struct (which is important for
sharing) and I fixed the attribute mapping to be correct.

However, I'm seeing that the reordering isn't quite right anymore.  Putting in
LRO/RLO for override rather than LRE/RLE helps a bit (and when I was working on
the previous patch it didn't change anything).  And I'm seeing the same results
on Windows 2000 and Linux.

(I think what might be happening with the patch is that overrides are working
when they're not at the beginning of the line, and if I don't make the LRO/RLO
change it's the reverse, although I'd have to go through the test case more
closely.)
Well, the patch has the same problems if I apply it to a tree from June 5, so
the style data still isn't quite right, I think...
The above patch passes:

http://www.people.fas.harvard.edu/~dbaron/css/test/bidi2
http://www.people.fas.harvard.edu/~dbaron/css/test/bidi2_html
http://www.people.fas.harvard.edu/~dbaron/css/test/bidi2_charcode


I'm looking for reviews for this patch and the one on bug 73251.

The summary of the changes in this patch is the following:
 * Change 'unicode-bidi' from an inherited to non-inherited property (and make
it use the normal inheritance style values), and thus move it from the Text
struct to the TextReset struct.
 * Use 'unicode-bidi' instead of knowledge of "explicit 'direction'" to
determine when there should be a new embedding level
 * Since 'unicode-bidi' is not inherited anymore, modify
nsBidiPresUtils::Resolve to check the value on the block frame and add an
override to InitLogicalArray if needed.
"dbaron" wrote: 
> * Since 'unicode-bidi' is not inherited anymore, modify
> nsBidiPresUtils::Resolve to check the value on the block frame and add an
> override to InitLogicalArray if needed."

1. <http://www.w3.org/TR/REC-CSS2/visuren.html#direction>
'unicode-bidi' property applies also to inline elements. (BTW, by default 'bdo' 
html element opens an inline frame.)

2. We can't use LRO/RLO on BIDI enabled system.
According to CSS2 spec, 'unicode-bidi' with value 'embed' is an equivalent to 
LRE/RLE (U+202A/U+202B), and 'unicode-bidi' with value 'bidi-override' - 
corresponds to adding LRO/RLO (U+202D/U+202E). Thus, if I write

<P><B class=rlo>&#1491;&#1492;&#1493; <i>DEF</i></B></P>

- all the elements enclosed in <B>...</B> (including the child element <I>) 
should receive RTL override.

On the other hand, the spec says that 'unicode-bidi' isn't inherited.

Can anyone explain this contradiction?
I still have some issues with the last patch:

1) In |nsBidiPresUtils::Resolve| the second block of added code is between the
call to |InitLogicalArray| and the test of the return value. Unless there is a
particular reason for putting it there, move it to after the test.

2) What would be the pros and cons of using the bidi stuff from the CSS sample
style sheet http://www.w3.org/TR/REC-CSS2/sample.html#bidi instead of handling
BDO and DIR in nsGenericHTMLElement.cpp?

3) Have you tested on a platform with its own bidi support (e.g. W2K)? Cases 11,
12, 13, 15 and 16 in the group "Each pair of lines should look the same" in your
test pages are broken for me on WinNT with Hebrew support. If they work for you,
perhaps there is a conflict with the patch in bug 81664
In nsRuleNode::Compute<Style>Data we evaluate 'PRBool inherited', which is 
common for a whole set of rules. What will happen, if inheritance rules are 
contradictory for different styles? E.g. in ComputeTextResetData 'text-
decoration' is 'enum', but 'unicode-bidi' is 'inherit'.
>"dbaron" wrote: 
>> * Since 'unicode-bidi' is not inherited anymore, modify
>> nsBidiPresUtils::Resolve to check the value on the block frame and add an
>> override to InitLogicalArray if needed."
>
>1. <http://www.w3.org/TR/REC-CSS2/visuren.html#direction>
>'unicode-bidi' property applies also to inline elements. (BTW, by default 'bdo' 
>html element opens an inline frame.)

I apply it to inline elements - InitLogicalArray is recursive.

>2. We can't use LRO/RLO on BIDI enabled system.

That sounds bad.  So if I stick in the character entities for LRO/RLO on
the Hebrew version of Win2K, it won't work?  This sounds like a bug in
itself - we ought to be able to pass:
http://www.people.fas.harvard.edu/~dbaron/css/test/bidi2_charcode
and if we did I would think this wouldn't be a problem.  This seems almost
like it should be handled at the gfx level.

>------- Additional Comments From Lina Kemmel 2001-06-24 06:42 -------
>
>According to CSS2 spec, 'unicode-bidi' with value 'embed' is an equivalent to 
>LRE/RLE (U+202A/U+202B), and 'unicode-bidi' with value 'bidi-override' - 
>corresponds to adding LRO/RLO (U+202D/U+202E). Thus, if I write
>
><P><B class=rlo>&#1491;&#1492;&#1493; <i>DEF</i></B></P>
>
>- all the elements enclosed in <B>...</B> (including the child element <I>) 
>should receive RTL override.
>
>On the other hand, the spec says that 'unicode-bidi' isn't inherited.
>
>Can anyone explain this contradiction?

There's no contradiction - it's equivalent to putting an LRE, RLE, LRO, or
RLO at the beginning of the element and a PDF at the end.  Why does it seem
like there is a contradiction?  One doesn't need an RLO/LRO for each inline
element to create an override -- a single pair of formatting characters
should be enough.

>------- Additional Comments From Simon Montagu 2001-06-24 07:22 -------
>
>I still have some issues with the last patch:
>
>1) In |nsBidiPresUtils::Resolve| the second block of added code is between the
>call to |InitLogicalArray| and the test of the return value. Unless there is a
>particular reason for putting it there, move it to after the test.

I was just thinking that it would be better not to push one of the formatting
characters without pushing the other (the PDF).

>2) What would be the pros and cons of using the bidi stuff from the CSS sample
>style sheet http://www.w3.org/TR/REC-CSS2/sample.html#bidi instead of handling
>BDO and DIR in nsGenericHTMLElement.cpp?

A rule like [dir=rtl] has significant performance problems since we have to get
the dir attribute more in the CSS selector matching process than using attribute
mapping.  (We have to do it twice rather than once and we have to get the
attribute as a string which involves string copying.  I think it might actually
be worse than this, but I can't think why right now.)

>3) Have you tested on a platform with its own bidi support (e.g. W2K)? Cases 11,
>12, 13, 15 and 16 in the group "Each pair of lines should look the same" in your
>test pages are broken for me on WinNT with Hebrew support. If they work for you,
>perhaps there is a conflict with the patch in bug 81664

I've tested on vanilla Win2K and on Linux (although I'm not sure whether I
tested the working version on Linux - but I was seeing the same results on
the two before the final version).  Does standard US Win2K have BiDi support?


>------- Additional Comments From Lina Kemmel 2001-06-24 08:38 -------
>
>In nsRuleNode::Compute<Style>Data we evaluate 'PRBool inherited', which is 
>common for a whole set of rules. What will happen, if inheritance rules are 
>contradictory for different styles? E.g. in ComputeTextResetData 'text-
>decoration' is 'enum', but 'unicode-bidi' is 'inherit'.

This inherited bit doesn't really mean inherited - it means (I think) that for
some reason (which is generally inheritance) the style struct can't be shared.

The reason you just brought up is exactly the reason that hyatt's patch
above won't always work.
> ------- Additional Comments From David Baron 2001-06-25 09:24 -------

> I apply it [override] to inline elements - InitLogicalArray is recursive.

Sorry, I got it wrongly (I thought you said "check the value on the block frame 
and add an override to *LogicalArray* if needed."

-------

>> 2. We can't use LRO/RLO on BIDI enabled system.

> That sounds bad. This sounds like a bug in itself...

Using LRE/RLE instead, and calling AdjustEmbeddingLevel() after BIDI resolution 
were supposed to prevent this bug.

-------

> There's no contradiction - it's ['unicode-bidi'] equivalent to putting an
> LRE, RLE, LRO, or RLO at the beginning of the element and a PDF at the end.
> Why does it seem like there is a contradiction?  One doesn't need an RLO/LRO 
> for each inline element to create an override -- a single pair of formatting 
> characters should be enough.

I understand and agree with that. But what does the spec mean by "non-
inheritance" of 'unicode-bidi'?

-------

> Does standard US Win2K have BiDi support?

No, it doesn't have BiDi support (or provides it partly), unless you have BiDi 
(hebrew or arabic) locale installed.

> > > 2. We can't use LRO/RLO on BIDI enabled system.

> > That sounds bad. This sounds like a bug in itself...

>Using LRE/RLE instead, and calling AdjustEmbeddingLevel() after BIDI resolution 
>were supposed to prevent this bug.

I had to remove AdjustEmbeddingLevel because it was written around the
assumption that 'unicode-bidi' was inherited.  Since we're supposed to support
RLO and LRO characters in the document, I had hoped it wouldn't be needed.  Is
it true that we don't correctly support those on a bidi-enabled Win2K system?

> > There's no contradiction - it's ['unicode-bidi'] equivalent to putting an
> > LRE, RLE, LRO, or RLO at the beginning of the element and a PDF at the end.
> > Why does it seem like there is a contradiction?  One doesn't need an RLO/LRO 
> > for each inline element to create an override -- a single pair of formatting 
> > characters should be enough.
>
>I understand and agree with that. But what does the spec mean by "non-
>inheritance" of 'unicode-bidi'?

If 'unicode-bidi' had been inherited, it meant that given the markup:
 <p style="unicode-bidi: embed">This <span>is <span>some</span> text</span>.</p>
each span would open a new embedding level since it inherited the 'embed' value.
 Since it is not inherited, the spans have the default value of 'normal' and do
not open a new embedding level.  (Things like 'color' are inherited since you
want the text of the spans to be in the same color as the paragraph rather than
in the default color of the document.)
The problem isn't with the LRO/RLO characters on a Bidi system as such, it's 
with constructs like 

&#x202d;ABC &#x05D3;&#x05D4;&#x05D5;&#x202c;

That produces a single frame containing characters with different bidi 
chartypes but the same embedding level, which we have problems handling. On a 
non-bidi platform we just send the frame as is to gfx and it comes out OK, but 
on a bidi platform the Hebrew gets reversed when displayed.

Including the LRO and RLO characters in the displayed string doesn't work on 
all platforms, and not all platforms have an option to override their own bidi 
reordering. 
How should we proceed from here?  Would stripping the LRM, RLM, LRE, RLE, PDF,
LRO, and RLO characters before sending the text to the OS work?
> ------- Additional Comments From David Baron 2001-06-25 10:46 -------

> If 'unicode-bidi' had been inherited, it meant that given the markup:
> <p style="unicode-bidi: embed">This <span>is <span>some</span>
> text</span>.</p> each span would open a new embedding level since it
> inherited the 'embed' value.

You and the spec :-) are definitely right. My mistake was that I thought 
that "'unicode-bidi' isn't inherited" meant not only that a child element would 
not open a new embedding level, but also that it would *reset* the level to 
what it was before the last embedding/override..

> I had to remove AdjustEmbeddingLevel because it was written around the
> assumption that 'unicode-bidi' was inherited.

The significance of that inheritance was different, meaning, in fact, 
inheriting of the current embedding level (without increasing it). However, all 
this isn't important now. Hopefully, I found solution for the LRO/RLO issue 
(based on your patch).
Summary of the changes :
1. Text frame with mixed content (containing both strong RTL and LTR 
characters) are split, so that they are uni-directional from system's point of 
view.
2. Most of BiDi control characters are stripped before displaying.

Please review the patch, and, if you find it acceptable to us, I'll submit 
fixes for the:

Open issues:
1. Need to take off also Arabic specific characters - zero width non-joiner and 
zero width joiner (U+200C and U+200D).
2. Provide some cosmetic changes.
On WinNT with Hebrew support, the combined dbaron and lkemmel patch passes all
the testcases mentioned in the comments above. dbaron, can you confirm for other
platforms?
It works great for me on Win2K (default US-English install).  (It removes the
stray characters that previously showed up while rendering
http://www.people.fas.harvard.edu/~dbaron/css/test/bidi2_charcode .)
r=simon@softel.co.il (for attachment 40276 [details] [diff] [review])
Who can sr this? I would very much like this to make it into 0.9.2
sr=hyatt
Fix checked into trunk 2001-06-28 20:15 PDT.  Adding nsBranch keyword.
Keywords: nsBranch
Marking PDT+ per conversation with chofmann.
Whiteboard: (py8ieh: check fix) → [PDT+](py8ieh: check fix)
Fix checked into branch 2001-06-30 11:59 PDT.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
*** Bug 91105 has been marked as a duplicate of this bug. ***
simon@softel.co.il or giladehven - can you pls verify this bug?
QA contact to Zach, who is the default QA now.
QA Contact: giladehven → zach
VERIFIED FIXED after rerunning all the attached test cases.
Status: RESOLVED → VERIFIED
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in before you can comment on or make changes to this bug.