Closed Bug 83958 Opened 23 years ago Closed 23 years ago

Bidi ordering regression

Categories

(Core :: Layout: Text and Fonts, defect)

x86
Windows NT
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: smontagu, Assigned: mkaply)

References

Details

(Keywords: css2, Whiteboard: [PDT+](py8ieh: check fix))

Attachments

(7 files)

Inline html tags should be ignored when performing bidi reordering. As the attached example shows, they are currently being treated like block tags and causing an additional embedding level. Results: dir=ltr or dir=rtl This paragraph is all in English, so the words should be in the same order whether Expected results: This paragraph is all in English, so the words should be in the same order whether dir=rtl or dir=ltr hyatt: is this caused by the new CSS rule matching checkin? In nsBidiPresUtils::InitLogicalArray, you replaced if (NS_STYLE_DIRECTION_RTL == display->mExplicitDirection) { by if (bits & NS_STYLE_INHERIT_VISIBILITY) { if (NS_STYLE_DIRECTION_RTL == vis->mDirection) { Since ExplicitDirection means "not inherited", I would expect the equivalent condition to be when NS_STYLE_INHERIT_VISIBILITY is not set, no?
Attached file test case
CC'ing gilad -- Mozilla Hebrew QA contact beginning this week.
Per CSS we really should not be doing anything different if the user explicitly sets the property or if the user lets it inherit. The cases are identical in CSS because all that matters if the computed value and 'inherit' is never a computed value. Could you explain this some more for me please?
Hixie: the difference between inherited and explicit directionality is important for the ordering of the elements within a paragraph. See http://www.w3.org/TR/html401/struct/dirlang.html#h-8.2.2 and the following sections. For a simple example, <p dir="rtl">abc <span dir="rtl">def</span> ghi</p> <p dir="rtl">abc <b>def</b> ghi</p> should be rendered (capitals standing in for bold) ghi def abc abc DEF ghi
Can someone give me a sanity check here? That last example in IE looks like this (still using capitals to represent bold): IHG def CBA
Can someone give me a sanity check here? That last example in IE looks like this (still using capitals to represent bold): IHG def CBA IHG FED CBA Am I missing something or is IE?
Simon: Yes, but that's only because the relevant style rules for the 'dir' attributes would be: [dir=rtl] { direction: rtl; unicode-bidi: embed; } [dir=ltr] { direction: ltr; unicode-bidi: embed; } bdo { unicode-bidi: bidi-override; } To quote the spec: # For the 'direction' property to have any effect on inline-level elements, # the 'unicode-bidi' property's value must be 'embed' or 'override'. -- http://www.w3.org/TR/REC-CSS2/visuren.html#propdef-direction It's the 'unicode-bidi' property that causes the effect, not the 'direction' property. The following three examples should all look EXACTLY IDENTICAL: <span style="direction: right-to-left"> abc <span style="direction: right-to-left"> def </span> ghi </span> <span style="direction: right-to-left"> abc <span style="direction: inherit"> def </span> ghi </span> <span style="direction: right-to-left"> abc <span> def </span> ghi </span> Under no circumstances should those cases look different. If they do, then there is a major bidi style bug.
Keywords: css2
Whiteboard: (py8ieh: check fix)
On IE 6, it looks like this: ghi def abc abc DEF ghi And the whole page is right aligned.
hixie: I agree with you that the 'unicode-bidi' property causes the effect, but note that the spec also says: "The HTML 4.0 specification ([HTML40], section 8.2) defines bidirectionality behavior for HTML elements. Conforming HTML user agents may therefore ignore the 'direction' and 'unicode-bidi' properties in author and user style sheets. The style sheet rules that would achieve the bidi behavior specified in [HTML40] are given in the sample style sheet." Here is the relevant part of the sample style sheet, http://www.w3.org/TR/REC-CSS2/sample.html#bidi BDO[DIR="ltr"] { direction: ltr; unicode-bidi: bidi-override } BDO[DIR="rtl"] { direction: rtl; unicode-bidi: bidi-override } *[DIR="ltr"] { direction: ltr; unicode-bidi: embed } *[DIR="rtl"] { direction: rtl; unicode-bidi: embed } /* Elements that are block-level in HTML4 */ ADDRESS, BLOCKQUOTE, BODY, DD, DIV, DL, DT, FIELDSET, FORM, FRAME, FRAMESET, H1, H2, H3, H4, H5, H6, IFRAME, NOSCRIPT, NOFRAMES, OBJECT, OL, P, UL, APPLET, CENTER, DIR, HR, MENU, PRE, LI, TABLE, TR, THEAD, TBODY, TFOOT, COL, COLGROUP, TD, TH, CAPTION { unicode-bidi: embed } The cases in your example should only be identical if the value of 'unicode-bidi' is 'normal', and although this is the default value, it does not apply to any HTML element.
Now, it seems we need to make sure that 'direction' is resolved *after*: 1) 'unicode-bidi' resolution, 2) 'display' resolution since 'direction' depends on both of them: - on block-level elements it's set unconditionally (in the scope of mentioned conditions, of course); - to inline elements - is not applied, or applied differently, depending on 'unicode-bidi' value. (David Baron: 'unicode-bidi' has already been implemented.)
Simon, Hixie: Although 'unicode-bidi: normal' is not recommended by http://www.w3.org/TR/REC-CSS2/sample.html#bidi, it works in IE5.0, and also in mozilla (tested on 20010508 build): BDO { unicode-bidi: normal } .................... <BDO dir=rtl>abc</BDO> (style sheet rule wins, and bidi override specified in 'BDO' tag has no effect.)
My changes to nsBidiPresUtils.cpp were probably wrong. They don't even seem to make a difference (but the other changes seem to fix this bug, but probably break other things). Does anybody familiar with the bidi code know what those changes would break and how to fix them, or if they're the right thing to do at all?
David: not sure this is related to the problem, however.. I also tried adding static const PRUnichar kLRO = 0x202D; static const PRUnichar kRLO = 0x202E; but those do harm -- please see the comment in InitLogicalArray: // Example: <bdo dir=ltr>latin HEBREW</bdo> // should be displayed as is: latin HEBREW // The entire text frame receives even EL and whouldn't "normally" be split. // Then, if the platform is Bidi, it reverses hebrew characters, which results // in incorrect text order: latin WERBEH // To avoid that, we always insert LRE/RLE. That forces splitting the frame into // uni-directional (from the platform point of view) pieces - with even EL for // "latin" and odd EL for "HEBREW". Unicode override, if any, will be taken into // account in AdjustEmbeddingLevel().
Yes, I got the check backwards. It should be if (!(NS_STYLE_INHERIT_VISIBILITY & bits)) { } Does that fix the bug?
Simon: Why do you say unicode-bidi: normal "does not apply to any HTML element"? I don't see where you got that from. Also, the test you quoted does not affect my example at all, since "span" elements are not mentioned in the text you quoted and "dir" attributes are not present in the markup I pasted. It appears Lina and dbaron have this in hand, though. :-)
Can we fission this discussion? The original reordering bug I reported is fixed by hyatt's change of the if statement. That should be reviewed and checked in, since the bug breaks real-world Bidi sites, e.g. the weather forecast at www.msn.co.il hixie, do you agree that the discussion of css bidi attributes is irrelevant to both the bug and the fix? I would be happy to carry on that discussion somewhere else, but here I think it's obscuring the issue.
While I understand that hyatt's fix returns us to the status quo, I believe it is the wrong thing for us to do. If that fix is checked in, it should only be on a branch, and not on the Mozilla trunk, as it is only a band-aid and merely serves to obscure and deprioritise the real issue. I agree, however, that it would be appropriate to continue discussing this elsewhere. The netscape.public.mozilla.style newsgroup is probably the best fit for this issue. Please cc me on your reply if you do respond to my comments above in the newsgroups.
hyatt's fix is not a bandaid, it's a correction to his earlier checkin on bug 78965, which caused a regression in Bidi reordering on sites without any style markup. If there is a bidi style bug, that isn't "the real issue", it's a different issue and can be prioritised or deprioritised on its own merits. There's no reason to hijack this bug and leave the trunk rendering sites incorrectly just in order to make a point.
Hixie, I don't see the harm in returning the code to the way it was prior to the 78695 checkin.
Is NS_STYLE_INHERIT_VISIBILITY specific to the 'direction' property? If not, then will that even fully return things to the state they were before? (My feeling is that the old code is wrong, and we should try to fix it, since the specs already have a way to handle this, but we're just not using it. It's also IMO a rather odd (i.e., wrong) use of the style system.)
I agree with dbaron on this.
I agree that there is work needed on the bidi CSS properties. Meanwhile, does anybody have an objection to checking hyatt's fix into the 0.9.1 branch?
Hyatt's changes aren't on the 0.9.1 branch. This still works there, right?
I understood that hyatt's changes were in the branch, but maybe I just don't know how to use Bonsai. If they're not, there should be no problem.
My changes are not on the branch.
So, what's the status of this bug? Are we going to do something about it? It would be nice to fix it before we close for 0.9.2.
After putting the fix from hyatt in, I get consistent behavior between IE6 and Mozilla in all cases mentioned in this bug. Why shouldn't this fix go in?
Does hyatt's patch break if the 'visibility' or '-moz-opacity' property is specified on a node where 'direction' is not?
dbaron: Could you give me some sample HTML pretty please? Thanks
QA contact changed to: giladehven@hotmail.com
QA Contact: momoi → giladehven
Does anyone mind if I move 'mUnicodeBidi' from 'nsStyleText' to 'nsStyleVisibility'?
I have a patch in my tree (modified from the one above) that has it moved to nsStyleTextReset, except I'm having trouble getting it working. (It should be in a reset struct since it's reset, not inherited.) I need to talk to hyatt today...
I also have my own version of dbaron's patch above, but there are some things that I still can't fix and/or don't understand. Maybe we can set up some conference call or IRC meeting to sort this issue out?
I'm wondering if there's been another regression in bidi reordering recently. The style stuff seems to be right now... it's basically what I was doing before except I moved unicode-bidi into a different struct (which is important for sharing) and I fixed the attribute mapping to be correct. However, I'm seeing that the reordering isn't quite right anymore. Putting in LRO/RLO for override rather than LRE/RLE helps a bit (and when I was working on the previous patch it didn't change anything). And I'm seeing the same results on Windows 2000 and Linux. (I think what might be happening with the patch is that overrides are working when they're not at the beginning of the line, and if I don't make the LRO/RLO change it's the reverse, although I'd have to go through the test case more closely.)
Well, the patch has the same problems if I apply it to a tree from June 5, so the style data still isn't quite right, I think...
The above patch passes: http://www.people.fas.harvard.edu/~dbaron/css/test/bidi2 http://www.people.fas.harvard.edu/~dbaron/css/test/bidi2_html http://www.people.fas.harvard.edu/~dbaron/css/test/bidi2_charcode I'm looking for reviews for this patch and the one on bug 73251. The summary of the changes in this patch is the following: * Change 'unicode-bidi' from an inherited to non-inherited property (and make it use the normal inheritance style values), and thus move it from the Text struct to the TextReset struct. * Use 'unicode-bidi' instead of knowledge of "explicit 'direction'" to determine when there should be a new embedding level * Since 'unicode-bidi' is not inherited anymore, modify nsBidiPresUtils::Resolve to check the value on the block frame and add an override to InitLogicalArray if needed.
"dbaron" wrote: > * Since 'unicode-bidi' is not inherited anymore, modify > nsBidiPresUtils::Resolve to check the value on the block frame and add an > override to InitLogicalArray if needed." 1. <http://www.w3.org/TR/REC-CSS2/visuren.html#direction> 'unicode-bidi' property applies also to inline elements. (BTW, by default 'bdo' html element opens an inline frame.) 2. We can't use LRO/RLO on BIDI enabled system.
According to CSS2 spec, 'unicode-bidi' with value 'embed' is an equivalent to LRE/RLE (U+202A/U+202B), and 'unicode-bidi' with value 'bidi-override' - corresponds to adding LRO/RLO (U+202D/U+202E). Thus, if I write <P><B class=rlo>&#1491;&#1492;&#1493; <i>DEF</i></B></P> - all the elements enclosed in <B>...</B> (including the child element <I>) should receive RTL override. On the other hand, the spec says that 'unicode-bidi' isn't inherited. Can anyone explain this contradiction?
I still have some issues with the last patch: 1) In |nsBidiPresUtils::Resolve| the second block of added code is between the call to |InitLogicalArray| and the test of the return value. Unless there is a particular reason for putting it there, move it to after the test. 2) What would be the pros and cons of using the bidi stuff from the CSS sample style sheet http://www.w3.org/TR/REC-CSS2/sample.html#bidi instead of handling BDO and DIR in nsGenericHTMLElement.cpp? 3) Have you tested on a platform with its own bidi support (e.g. W2K)? Cases 11, 12, 13, 15 and 16 in the group "Each pair of lines should look the same" in your test pages are broken for me on WinNT with Hebrew support. If they work for you, perhaps there is a conflict with the patch in bug 81664
In nsRuleNode::Compute<Style>Data we evaluate 'PRBool inherited', which is common for a whole set of rules. What will happen, if inheritance rules are contradictory for different styles? E.g. in ComputeTextResetData 'text- decoration' is 'enum', but 'unicode-bidi' is 'inherit'.
>"dbaron" wrote: >> * Since 'unicode-bidi' is not inherited anymore, modify >> nsBidiPresUtils::Resolve to check the value on the block frame and add an >> override to InitLogicalArray if needed." > >1. <http://www.w3.org/TR/REC-CSS2/visuren.html#direction> >'unicode-bidi' property applies also to inline elements. (BTW, by default 'bdo' >html element opens an inline frame.) I apply it to inline elements - InitLogicalArray is recursive. >2. We can't use LRO/RLO on BIDI enabled system. That sounds bad. So if I stick in the character entities for LRO/RLO on the Hebrew version of Win2K, it won't work? This sounds like a bug in itself - we ought to be able to pass: http://www.people.fas.harvard.edu/~dbaron/css/test/bidi2_charcode and if we did I would think this wouldn't be a problem. This seems almost like it should be handled at the gfx level. >------- Additional Comments From Lina Kemmel 2001-06-24 06:42 ------- > >According to CSS2 spec, 'unicode-bidi' with value 'embed' is an equivalent to >LRE/RLE (U+202A/U+202B), and 'unicode-bidi' with value 'bidi-override' - >corresponds to adding LRO/RLO (U+202D/U+202E). Thus, if I write > ><P><B class=rlo>&#1491;&#1492;&#1493; <i>DEF</i></B></P> > >- all the elements enclosed in <B>...</B> (including the child element <I>) >should receive RTL override. > >On the other hand, the spec says that 'unicode-bidi' isn't inherited. > >Can anyone explain this contradiction? There's no contradiction - it's equivalent to putting an LRE, RLE, LRO, or RLO at the beginning of the element and a PDF at the end. Why does it seem like there is a contradiction? One doesn't need an RLO/LRO for each inline element to create an override -- a single pair of formatting characters should be enough. >------- Additional Comments From Simon Montagu 2001-06-24 07:22 ------- > >I still have some issues with the last patch: > >1) In |nsBidiPresUtils::Resolve| the second block of added code is between the >call to |InitLogicalArray| and the test of the return value. Unless there is a >particular reason for putting it there, move it to after the test. I was just thinking that it would be better not to push one of the formatting characters without pushing the other (the PDF). >2) What would be the pros and cons of using the bidi stuff from the CSS sample >style sheet http://www.w3.org/TR/REC-CSS2/sample.html#bidi instead of handling >BDO and DIR in nsGenericHTMLElement.cpp? A rule like [dir=rtl] has significant performance problems since we have to get the dir attribute more in the CSS selector matching process than using attribute mapping. (We have to do it twice rather than once and we have to get the attribute as a string which involves string copying. I think it might actually be worse than this, but I can't think why right now.) >3) Have you tested on a platform with its own bidi support (e.g. W2K)? Cases 11, >12, 13, 15 and 16 in the group "Each pair of lines should look the same" in your >test pages are broken for me on WinNT with Hebrew support. If they work for you, >perhaps there is a conflict with the patch in bug 81664 I've tested on vanilla Win2K and on Linux (although I'm not sure whether I tested the working version on Linux - but I was seeing the same results on the two before the final version). Does standard US Win2K have BiDi support? >------- Additional Comments From Lina Kemmel 2001-06-24 08:38 ------- > >In nsRuleNode::Compute<Style>Data we evaluate 'PRBool inherited', which is >common for a whole set of rules. What will happen, if inheritance rules are >contradictory for different styles? E.g. in ComputeTextResetData 'text- >decoration' is 'enum', but 'unicode-bidi' is 'inherit'. This inherited bit doesn't really mean inherited - it means (I think) that for some reason (which is generally inheritance) the style struct can't be shared. The reason you just brought up is exactly the reason that hyatt's patch above won't always work.
> ------- Additional Comments From David Baron 2001-06-25 09:24 ------- > I apply it [override] to inline elements - InitLogicalArray is recursive. Sorry, I got it wrongly (I thought you said "check the value on the block frame and add an override to *LogicalArray* if needed." ------- >> 2. We can't use LRO/RLO on BIDI enabled system. > That sounds bad. This sounds like a bug in itself... Using LRE/RLE instead, and calling AdjustEmbeddingLevel() after BIDI resolution were supposed to prevent this bug. ------- > There's no contradiction - it's ['unicode-bidi'] equivalent to putting an > LRE, RLE, LRO, or RLO at the beginning of the element and a PDF at the end. > Why does it seem like there is a contradiction? One doesn't need an RLO/LRO > for each inline element to create an override -- a single pair of formatting > characters should be enough. I understand and agree with that. But what does the spec mean by "non- inheritance" of 'unicode-bidi'? ------- > Does standard US Win2K have BiDi support? No, it doesn't have BiDi support (or provides it partly), unless you have BiDi (hebrew or arabic) locale installed.
> > > 2. We can't use LRO/RLO on BIDI enabled system. > > That sounds bad. This sounds like a bug in itself... >Using LRE/RLE instead, and calling AdjustEmbeddingLevel() after BIDI resolution >were supposed to prevent this bug. I had to remove AdjustEmbeddingLevel because it was written around the assumption that 'unicode-bidi' was inherited. Since we're supposed to support RLO and LRO characters in the document, I had hoped it wouldn't be needed. Is it true that we don't correctly support those on a bidi-enabled Win2K system? > > There's no contradiction - it's ['unicode-bidi'] equivalent to putting an > > LRE, RLE, LRO, or RLO at the beginning of the element and a PDF at the end. > > Why does it seem like there is a contradiction? One doesn't need an RLO/LRO > > for each inline element to create an override -- a single pair of formatting > > characters should be enough. > >I understand and agree with that. But what does the spec mean by "non- >inheritance" of 'unicode-bidi'? If 'unicode-bidi' had been inherited, it meant that given the markup: <p style="unicode-bidi: embed">This <span>is <span>some</span> text</span>.</p> each span would open a new embedding level since it inherited the 'embed' value. Since it is not inherited, the spans have the default value of 'normal' and do not open a new embedding level. (Things like 'color' are inherited since you want the text of the spans to be in the same color as the paragraph rather than in the default color of the document.)
The problem isn't with the LRO/RLO characters on a Bidi system as such, it's with constructs like &#x202d;ABC &#x05D3;&#x05D4;&#x05D5;&#x202c; That produces a single frame containing characters with different bidi chartypes but the same embedding level, which we have problems handling. On a non-bidi platform we just send the frame as is to gfx and it comes out OK, but on a bidi platform the Hebrew gets reversed when displayed. Including the LRO and RLO characters in the displayed string doesn't work on all platforms, and not all platforms have an option to override their own bidi reordering.
How should we proceed from here? Would stripping the LRM, RLM, LRE, RLE, PDF, LRO, and RLO characters before sending the text to the OS work?
> ------- Additional Comments From David Baron 2001-06-25 10:46 ------- > If 'unicode-bidi' had been inherited, it meant that given the markup: > <p style="unicode-bidi: embed">This <span>is <span>some</span> > text</span>.</p> each span would open a new embedding level since it > inherited the 'embed' value. You and the spec :-) are definitely right. My mistake was that I thought that "'unicode-bidi' isn't inherited" meant not only that a child element would not open a new embedding level, but also that it would *reset* the level to what it was before the last embedding/override.. > I had to remove AdjustEmbeddingLevel because it was written around the > assumption that 'unicode-bidi' was inherited. The significance of that inheritance was different, meaning, in fact, inheriting of the current embedding level (without increasing it). However, all this isn't important now. Hopefully, I found solution for the LRO/RLO issue (based on your patch).
Summary of the changes : 1. Text frame with mixed content (containing both strong RTL and LTR characters) are split, so that they are uni-directional from system's point of view. 2. Most of BiDi control characters are stripped before displaying. Please review the patch, and, if you find it acceptable to us, I'll submit fixes for the: Open issues: 1. Need to take off also Arabic specific characters - zero width non-joiner and zero width joiner (U+200C and U+200D). 2. Provide some cosmetic changes.
On WinNT with Hebrew support, the combined dbaron and lkemmel patch passes all the testcases mentioned in the comments above. dbaron, can you confirm for other platforms?
It works great for me on Win2K (default US-English install). (It removes the stray characters that previously showed up while rendering http://www.people.fas.harvard.edu/~dbaron/css/test/bidi2_charcode .)
r=simon@softel.co.il (for attachment 40276 [details] [diff] [review]) Who can sr this? I would very much like this to make it into 0.9.2
sr=hyatt
Fix checked into trunk 2001-06-28 20:15 PDT. Adding nsBranch keyword.
Keywords: nsBranch
Marking PDT+ per conversation with chofmann.
Whiteboard: (py8ieh: check fix) → [PDT+](py8ieh: check fix)
Fix checked into branch 2001-06-30 11:59 PDT.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
*** Bug 91105 has been marked as a duplicate of this bug. ***
simon@softel.co.il or giladehven - can you pls verify this bug?
QA contact to Zach, who is the default QA now.
QA Contact: giladehven → zach
VERIFIED FIXED after rerunning all the attached test cases.
Status: RESOLVED → VERIFIED
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: