Closed Bug 404142 Opened 17 years ago Closed 16 years ago

nested <span dir="rtl"> display in wrong order, unless there is a nested text like &nbsp; for e.g.

Categories

(Core :: Layout: Text and Fonts, defect)

RESOLVED INVALID

People

(Reporter: ofir_144, Unassigned)

Details

(Keywords: rtl)

Attachments

(3 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9

nested <span dir="rtl"> elements display in the wrong order unless there is nested text, e.g. &nbsp;

By nested I mean:
<span dir="rtl" id="outer"><span dir="rtl">Nested Text1</span><span dir="rtl">Nested Text2</span></span>


Reproducible: Always

Steps to Reproduce:

<span dir="rtl"><span dir="rtl"> rtl rtl one two three four</span><span dir="rtl"> five six seven eight </span></span><br>

<span dir="rtl">&nbsp;<span dir="rtl"> rtl nbsp rtl one two three four</span>&nbsp;<span dir="rtl"> five six seven eight </span>&nbsp;</span><br>

<span dir="rtl"><span dir="rtl"> rtl sp rtl one two three four </span> <span dir="rtl"><span dir="ltr"> five six seven eight</span> </span></span><br>

<span dir="rtl"> &lrm;&lt;&lrm;one&lrm;&gt;&lrm; <span dir="rtl"> two </span> &lrm;&lt;&lrm;three&lrm;&gt;&lrm; </span> fourTWOLRMplusSPACE <br>
<span dir="rtl">&lrm;&lt;&lrm;one&lrm;&gt;&lrm;<span dir="rtl"> two </span>&lrm;&lt;&lrm;three&lrm;&gt;&lrm;</span> fourTWOLRMS <br>
Actual Results:  
rtl rtl one two three four five six seven eight
five six seven eight rtl nbsp rtl one two three four
five six seven eight rtl sp rtl one two three four 
<‎three‎> two ‎<‎one‎>fourTWOLRMplusSPACE
‎<‎one‎>‎ two ‎<‎three‎>‎ fourTWOLRMS 

Expected Results:  
five six seven eight rtl rtl one two three four      // (note first 5-8 then 1-4)
five six seven eight rtl nbsp rtl one two three four
five six seven eight rtl sp rtl one two three four  
<‎three‎> two ‎<‎one‎> fourTWOLRMplusSPACE               //(Note a correct space is after <‎one‎>)
<‎three‎>‎ two ‎‎<‎one‎>‎ fourTWOLRMS                       //(Note three is first one last)

I think a test page with lots of bidi control characters should be built and checked to make sure it renders correctly. The page could include number sequences as words, like one two three, or inside words, like he_1_llo ho2w ar3e yo4u, in the order in which they should be presented.
Summary: nested <span dir=""rtl"> display in wrong order, unless there is a nested text like &nbsp; for e.g. → nested <span dir="rtl"> display in wrong order, unless there is a nested text like &nbsp; for e.g.
Mass-assigning the new rtl keyword to RTL-related (see bug 349193).
Keywords: rtl
Off to bidi. Simon - do you have an opinion regarding the validity of this?
Severity: major → normal
Component: General → Layout: BiDi Hebrew & Arabic
OS: Windows XP → All
Product: Firefox → Core
QA Contact: general → layout.bidi
Hardware: PC → All
Version: unspecified → Trunk
I worked through this minimized version of the first case manually with the Unicode Bidi Algorithm [1] and our rendering seems to be correct.

I don't want to fill up the bug with the detailed steps, but the key is that after setting embedding levels by rules X1-X9, we have a single run of characters at embedding level 3. Then by rule N1 the neutrals (spaces) between the left-to-right characters are resolved as left-to-right.

In the case with an &nbsp; between the spans, on the other hand, the &nbsp; is in a separate run at embedding level 1 between two runs at embedding level 3. sor and eor are both R, so the &nbsp; is resolved as right-to-left.

What makes me a bit less sure of myself is that Opera, Safari and IE all render both cases as "b a" (visually). I don't like taking a position that "everybody is out of step except our Johnny", so maybe I am missing something in my interpretation of the algorithm.

[1] http://www.unicode.org/reports/tr9/
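
For readers who want to check the level assignment described above, here is a minimal sketch of the explicit-embedding part of rules X1-X9. It is not Gecko's code. It assumes each <span dir="rtl"> behaves like an RLE ... PDF pair, that the paragraph level is 0, and it ignores overrides and the depth limit. The strings "a " and "b " are hypothetical stand-ins for the contents of the two inner spans.

```python
RLE, LRE, PDF = "\u202b", "\u202a", "\u202c"
NBSP = "\u00a0"

def embedding_levels(text, paragraph_level=0):
    """Simplified X1-X9: return (char, level) pairs with RLE/LRE/PDF removed."""
    stack = []
    level = paragraph_level
    pairs = []
    for ch in text:
        if ch == RLE:
            stack.append(level)
            # RLE moves to the next odd level above the current one.
            level = level + 1 if level % 2 == 0 else level + 2
        elif ch == LRE:
            stack.append(level)
            # LRE moves to the next even level above the current one.
            level = level + 2 if level % 2 == 0 else level + 1
        elif ch == PDF:
            if stack:
                level = stack.pop()
        else:
            pairs.append((ch, level))
    return pairs

def level_runs(pairs):
    """Group adjacent characters that share a level into (level, text) runs."""
    runs = []
    for ch, level in pairs:
        if runs and runs[-1][0] == level:
            runs[-1] = (level, runs[-1][1] + ch)
        else:
            runs.append((level, ch))
    return runs

# Two adjacent inner spans inside an outer span: one run at level 3.
adjacent = RLE + RLE + "a " + PDF + RLE + "b " + PDF + PDF
print(level_runs(embedding_levels(adjacent)))

# With an &nbsp; between the inner spans: runs at levels 3, 1, 3.
separated = RLE + RLE + "a" + PDF + NBSP + RLE + "b" + PDF + PDF
print(level_runs(embedding_levels(separated)))
```

Once the formatting codes are dropped, nothing at level 1 is left between the two inner spans in the first case, which is why they merge into one run.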
The penultimate case
<span dir="rtl"> &lrm;&lt;&lrm;one&lrm;&gt;&lrm; <span dir="rtl"> two </span>
&lrm;&lt;&lrm;three&lrm;&gt;&lrm; </span> fourTWOLRMplusSPACE <br>
is more or less equivalent to the example in http://www.w3.org/TR/CSS21/text.html#egbidiwscollapse, and there is no space after "one" here for the same reason that there is no space after "B" there.
(In reply to comment #4)
> Created an attachment (id=322489) [details]
> Minimized version of the first testcase
> 
> I worked through this minimized version of the first case manually with the
> Unicode Bidi Algorithm [1] and our rendering seems to be correct.
> 
> I don't want to fill up the bug with the detailed steps, but the key is that
> after setting embedding levels by rules X1-X9, we have a single run of
> characters at embedding level 3. Then by rule N1 the neutrals (spaces) between
> the left-to-right characters are resolved as left-to-right.
> 

I did the same research independently, and reached the same conclusion.

> 
> What makes me a bit less sure of myself is that Opera, Safari and IE all render
> both cases as "b a" (visually). I don't like taking a position that "everybody
> is out of step except our Johnny", so maybe I am missing something in my
> interpretation of the algorithm.
> 

I chatted with the WebKit developer in charge, and he acknowledged that this indeed seems like a bug in WebKit.
In the last case the rendering also seems correct, and here other browsers that I tested do the same thing. Accordingly I'm marking this invalid.
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID
Hello Simon Montagu,

Could you explain to me in more detail how you came to this conclusion?

More specifically, why shouldn't this:

<span dir="rtl"><span dir="rtl"> rtl rtl one two three four</span><span
dir="rtl"> five six seven eight </span></span><br>

look like this:

five six seven eight rtl rtl one two three four


 ???

Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
The <span dir="rtl"> opens an additional level of embedding with respect to the bidi algorithm, but in the case of <span dir="rtl"> rtl rtl one two three four</span><span dir="rtl"> five six seven eight </span> the whole run of text "rtl rtl one two three four five six seven eight" is at the same embedding level and contains no right-to-left characters, so there is no reason for any reordering to take place.

Why do you think it should look like "five six seven eight rtl rtl one two three four"? If Uri, the Webkit developer referenced in comment 6, and I all agree that it shouldn't, I think the burden of proof is on you.
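
To illustrate why no reordering takes place, here is a small sketch of the reversal step (rule L2 in UAX #9). It assumes the resolved levels have already been worked out by hand: left-to-right characters in a level-3 embedding end up at level 4 (rule I2), and an &nbsp; resolved as right-to-left stays at level 1. The function name and the sample strings are only illustrative.

```python
def reorder_l2(text, levels):
    """Rule L2: from the highest level down to the lowest odd level,
    reverse every contiguous run of characters at that level or higher."""
    chars = list(text)
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    if not odd_levels:
        return text
    for level in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= level:
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                chars[i:j] = chars[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(chars)

# Single run, everything at level 4: the only odd threshold is skipped
# because no character has an odd level, so the order is unchanged.
print(reorder_l2("one two", [4] * 7))

# Two level-4 characters separated by a level-1 character: the pass at
# level 1 reverses the whole sequence, so "b" comes out first visually.
print(reorder_l2("a\u00a0b", [4, 1, 4]))
```

With a single run at one level there is nothing for the algorithm to swap, whereas the separated case has a lower odd level that reverses the order of the two pieces.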
OK, here goes my point of view over the matter:

Suppose someone writes Hebrew in an RTL span and wants to quote two English tokens (a token being a word, a sentence, or a number), and wants the two English tokens ordered from right to left: the first token on the right and the second on the left, i.e. TOKEN2, TOKEN1.

If my point of view is correct, he would wrap each English token in an RTL span, and TOKEN1 would appear before TOKEN2 in right-to-left order.

<span dir="rtl">
<span dir="rtl">TOKEN1</span><span dir="rtl">TOKEN2</span>
</span>

On the other hand if my point of view is wrong, then he would have to write the English tokens as LTR in a fashion that would make them appear in the right order: TOKEN2, TOKEN1. 

or 

<span dir="rtl">
TOKEN2,TOKEN1
</span>

Here TOKEN2 is written before TOKEN1. 

And here comes the problem: _Line Breaks_

When the line breaks between TOKEN2 and TOKEN1, you would see TOKEN2 next to the start of the Hebrew sentence, and TOKEN1 would wrap to the next line because it comes after TOKEN2.

So it would look something like this:

TOKEN2, {Hebrew start of the sentence}
{Hebrew end of sentence} TOKEN1

while the intention was:

{Hebrew end of sentence} TOKEN2 ,TOKEN1 {hebrew start of the sentence}

Or in broken form:
TOKEN1, {hebrew start of the sentence}
{hebrew end of sentence} TOKEN2


Is this enough? (Because I can probably find more reasons)


FWIW, to get the effect you want, it's enough to include an RLM between the two tokens wrapped inside RTL spans. Something like:

<span dir="rtl">begin <span dir="rtl">TOKEN1</span>&rlm;<span dir="rtl">TOKEN2</span> end</span>

Here is how it's rendered when the parent element has enough width for the whole content to fit on one line:

end TOKEN2TOKEN1 begin

And here is how it's rendered when you decrease the width of the container so that line breaks are enforced:

begin
TOKEN2TOKEN1
end

This is what you expect, IINM.
Attached file Sample with RLM
I was going to say the same as Ehsan. Here's a sample.
And for a simple description of what RLM and LRM actually do in general, check out: <http://en.wikipedia.org/wiki/Bi-directional_text>
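
As a quick way to see how these characters are classified, the sketch below uses Python's standard unicodedata module to print the bidi class of the marks and of a few characters that came up in this bug. The SAMPLES table and the helper name are just for illustration.

```python
import unicodedata

# Characters discussed in this bug, keyed by a readable name.
SAMPLES = {
    "LRM": "\u200e",
    "RLM": "\u200f",
    "RLE": "\u202b",
    "PDF": "\u202c",
    "NBSP": "\u00a0",
    "SPACE": " ",
    "LESS-THAN": "<",
    "LATIN a": "a",
    "HEBREW ALEF": "\u05d0",
}

def bidi_classes(samples):
    """Map each sample name to its Unicode bidi class (e.g. 'L', 'R', 'WS')."""
    return {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}

for name, bidi_class in bidi_classes(SAMPLES).items():
    print(f"{name}: {bidi_class}")
```

RLM and LRM are invisible characters with strong types (R and L), while the space, the &nbsp; and "<" are not strong, so their direction depends on their neighbours.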
Reading comment #4, I think that treating the spans as a single run of level 3 is an error. Each span may indeed be at level 3, but each is nested by itself in a lower level, and the spans should be ordered among themselves according to their own directionality at the upper level, regardless of how their content begins or ends.

Is there not a "Pop Directional Format" (PDF) at the </span> ? Does it not mean that the level should be lowered from 3? Isn't "sor and eor are both R" according to comment #4 ? Isn't sor&eor=R defined by that <span dir="RTL">?

I will need the answers to these pointers in order to continue. In any case, having the order of these spans defined by an inclusion or exclusion of a neutral between the two spans is logically not right. 

PS. I don't think it is right to piggyback the ordering of spans between themselves on LRM,RLMs.

RLM and LRM are meant to define the directionality of neutral symbols like < > . , and so on. So if you have a mix of RTL and LTR languages with a neutral symbol in the middle, you can tell it where you want it to be. For example, in <RTL1>,<LTR2>,<RTL3> you may want it to look like <RTL3>,<LTR2>,<RTL1>, and it may actually look like <RTL3><LTR2>,,<RTL1>. Then you add an RLM after the first comma, so it knows it is between two RTL characters and acts RTLishly. Otherwise it is confused, with an RTL character on one side and an LTR character on the other. That is what LRM and RLM are good for.




Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: layout.bidi → layout.fonts-and-text
(In reply to comment #14)
> Reading comment #4 I think that treating the spans as a single run of level 3
> is an error. Each span may be indeed in level 3, but they are nested in a lower
> level each by itself, and should be ordered between themselves according to
> their own span directionality at the upper level, this regardless of the way
> that their content begins or ends.
> 
> Is there not a "Pop Directional Format" (PDF) at the </span> ?

Yes, there is.

> Does it not mean
> that the level should be lowered from 3?

Yes, the level is set back to 1, until the next character (an RLE) sets it back to 3. However, note that in X9, all RLE, LRE, RLO, LRO, PDF, and BN codes are removed, so no character with a level of 1 remains between the characters with level 3. So at this point, we have just one run of level-3 characters.

> Isn't "sor and eor are both R"
> according to comment #4 ? Isn't sor&eor=R defined by that <span dir="RTL">?

Given the description above, sor and eor are irrelevant in this case, as there is no run starting or ending between the "a" and the following space - they're both at the same run of level 3.
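
A small sketch of how sor and eor are derived may help here. It follows the definition in UAX #9 (the direction of the higher of the two levels on either side of a run boundary, with the paragraph level used at the ends), and assumes a paragraph level of 0. The function name is made up for the example.

```python
def sor_eor(run_levels, paragraph_level=0):
    """Return (sor, eor) for each level run, given the runs' levels in order."""
    def direction(level):
        return "R" if level % 2 == 1 else "L"

    result = []
    for i, level in enumerate(run_levels):
        # At the paragraph edges, compare against the paragraph level.
        before = run_levels[i - 1] if i > 0 else paragraph_level
        after = run_levels[i + 1] if i + 1 < len(run_levels) else paragraph_level
        result.append((direction(max(before, level)), direction(max(level, after))))
    return result

# One run at level 3: the only boundaries are the ends of the paragraph.
print(sor_eor([3]))

# Runs at levels 3, 1, 3 (the &nbsp; case): the middle run has sor = eor = R.
print(sor_eor([3, 1, 3]))
```

In the single-run case there is no boundary between the "a" and the following space, so sor and eor never get a chance to influence how that space is resolved.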

> 
> I will need the answers to these pointers in order to continue. In any case,
> having the order of these spans defined by an inclusion or exclusion of a
> neutral between the two spans is logically not right. 
> 

I can relate to your intuition here. However, it does not match what UAX #9 specifies, and as long as UAX #9 isn't changed, I think we should keep doing what it tells us to do.

> PS. I don't think it is right to piggyback the ordering of spans between
> themselves on LRM,RLMs.
> 
> RLM and LRM are meant to define the directionality of neutral symbols like < >
> . , and stuff.

No, they're not meant for this, although they are sometimes used for this purpose.

> So if you have a mix of RTL and LTR languages and in the middle
> you have this neutral symbol you can tell it where you want it to be. Like in
> <RTL1>,<LTR2>,<RTL3> you may want it to look like: <RTL3>,<LTR2>,<RTL1> and it
> may actually look like: <RTL3><LTR2>,,<RTL1> for example. Then you add an RLM
> after the first comma, so it will know it is in the middle of two RTL
> characters and act RTLishly. Otherwise it is confused with one RTL character at
> one side and one LTR character at the other side. That is what LRM, and RLM are
> good for. 

That, and other cases, such as the one we're dealing with here.

Note also that the WebKit developers (WebKit is used by Safari and Chrome) are actively working on changing their behavior in this case to match ours (and UAX #9): see https://bugs.webkit.org/show_bug.cgi?id=19839

Based on the above, I'm marking this as INVALID. Please don't re-open this unless you can point to a specific place where we're not following UAX #9 as it currently stands.
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID