Closed Bug 177147 Opened 17 years ago Closed 13 years ago

BiDi run breaks should not be line break opportunities

Categories

(Core :: Layout: Text and Fonts, defect)

defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: ian, Assigned: ilya.konstantinov+future)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Inexplicably, Right-to-Left Embedding (U+202B) and Right-To-Left Override
(U+202E) characters are considered valid line-breaking opportunities.

http://www.hixie.ch/tests/adhoc/css/text/white-space/normal/line-breaking/bidi/002.html
reassigning per smontagu
Assignee: mkaply → shanjian
It seems as if the only reason that LRE and LRO are not also causing
line breaks in the testcase is that we are optimizing them away. Adding ‏ to
the beginning of each line makes every line go red.
Summary: RLE and RLO characters allow line breaks → LRE, RLE, LRO and RLO characters allow line breaks
shanjian is no longer working on mozilla for 2 years and these bugs are still
here. Mark them won't fix. If you want to reopen it, find a good owner first. 
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
Mass Reassign Please excuse the spam
Assignee: shanjian → nobody
Mass Re-opening Bugs Frank Tang Closed on Wensday March 02 for no reason, all
the spam is his fault feel free to tar and feather him
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Simon should know what to do with this
Assignee: nobody → smontagu
Status: REOPENED → NEW
Seeing on Mac => All/All
OS: Windows 2000 → All
Hardware: PC → All
Actually, we seem to allow line-breaking anywhere between RTL and LTR text
runs, as this simple testcase demonstrates.

Simon - this is not by design, is it? Is there a separate bug for that? (I
couldn't find one).
It's not explicitly by design, but it's probably an artefact of the way that we
split text frames at points where the text direction changes.
But when one word is split into frames for other reasons, we don't allow
breaking (try adding <b>aaaaaaaaaaaa</b><i>aaaaaaaaaaaaa</i> to my testcase and
notice it won't be line-broken).

So I think it's not an unavoidable consequence of splitting bidi text into frames.
Yeah, I made that test too. I meant to imply "and in that case it's probably not
too hard to fix if we know where to look", not "and in that case there's nothing
we can do about it" :D
I see this quite often in real-world situations, where a number in Hebrew text
is followed by a punctuation mark (period or comma). A line break is allowed
between the number and the following punctuation mark, and the result looks
really bad.

I tried looking at the code for fixing it, but the line-breaking code is *very*
complicated, and contais some bidi-specific code, which I do not understand. If
I'm given some pointers/hints, I might be able to do something.
For the reference, the function taking care of line-breaking is nsTextFrame::MeasureText (in nsTextFrame.cpp). I'm still looking into it.
More info:
As I've already said, nsTextFrame::MeasureText(..) is responsible for line-wrapping. Uri rightfully said that this function must look further than its own frame, since otherwise any formatting change (<b>foo</b>bar) would be a break opportunity.

MeasureText achieves this by using an interator function GetNextWord. Unfortunatelly, in the buggy case, it passes worldLen to GetNextWord which limits it up the the nearest run break.
worldLen = start
and earlier in the code, start = start of the next run

If we have:
foo bar (BAZ)
        ^
        8 = wordLen

GetNextWord, if IBMBIDI, will limit itself up the wordLen, instead of slurping the entire fragment's length.

Actually, it makes a whole lot of sense for MeasureText to limit itself to a single LTR/RTL run -- it makes visual measurements and it needs visually-consecutive text for that.

From here, I need to check:
- how is the line-breaking verdict applied ('when the fact is set in stone?'),
- is it possible to take that verdict out of MeasureText' hands and merely make MeasureText pass context info from one invocation to its next?

Will look into it more tomorrow.
Answers to myself:
Q: How is the line-breaking verdict applied ('when the fact is set in stone') ?
A: It's applied after the reflow, by nsBlockFrame::SplitLine, following decissions made by nsLineLayout::CanPlaceFrame.

Q: Is it possible to take that verdict out of MeasureText' hands and merely make
MeasureText pass context info from one invocation to its next?
A: MeasureText doesn't make the verdict. It just finds the best line break within a text frame, given the metrics of the containing-box. The result of MeasureText is *NOT* a measurement of the entire text, but a measurement of as much text as can fit in the width. If nsLineLayout::CanPlaceFrame determines MeasureText was too optimistic, the reflow engine will re-run it (the "eReflowReason_Resize" reflow) with smaller containing-box metrics (<-- my guess).

---

The line-breaking engine is very simplistic. For example:

 HTML: "foo bar<span>baz</span>"

 Frames:
  +- nsTextFrame("foo bar")
  +- nsInlineFrame("span") <-- or whatever type it is
     +- nsTextFrame("baz")
 
 The containing-box width can fit "foo bar" but can't fit "baz".

1. On initial reflow, MeasureText will see "foo bar" and not knowing any better (it doesn't go as far as seeing "baz"), it will measure the whole "foo bar".
2. CanPlaceFrame will determine there's no place for the "baz" frame, and since "foo bar" cannot be broken after, it'll re-run MeasureText with smaller available width (i.e. eReflowReason_Resize).
3. On the re-run, MeasureText will measure only "foo" cause "bar" wouldn't fit.

The same thing should happen with BiDi nsContinuingTextFrames (= the text frame after the BiDi run break).

Why doesn't it happen? One reason was that the condition for the final check in MeasureText, which results in the possible rollback of the last word, was wrong:

 (aTextData.mOffset == contentLength)

It should've also had the condition (aTextData.mOffset == startOfTheNextBidiRun).

This helps resolve the "a bC" (HEBREW = UPPERCASE) issue (making the linebreak between "a" and "b").
However, this still doesn't solve my pet peeve: "i like (CHEESECAKE)". It can still line-break between the "(" and the "CHEESECAKE".

I'll be researching this further...
Assignee: smontagu → mozilla-bugzilla
Summary: LRE, RLE, LRO and RLO characters allow line breaks → BiDi run breaks should not be line break opportunities
Robert, I'm CCing you, as you were rumored to rework MeasureText for your Thebes project ( http://wiki.mozilla.org/Gecko2:NewTextAPI ).
Following a conversation with dbaron, I realized new things:
- MeasureText must be able to run over more than one text frame, since one of its other implicit jobs is to reunite nsContinuingTextFrames created by line breaking done in a previous reflow. (Uri's work in bug 299065 has the potential to make distinction between nsContinuingTextFrames that need to be unified and those that that don't easier.)
- contentLength != this->mContentLength ... To hell with whoever named the variables in this function!

Empowered by that new knowledge, I'll try to iron it out tomorrow. Probably one of my patches will have to be one which simply renames variables and puts comments all over.
MeasureText does look ahead to the rest of the word in the following frame, starting here:
http://lxr.mozilla.org/mozilla/source/layout/generic/nsTextFrame.cpp#5706

Check whether nsLineLayout::FindNextText is correctly finding the next text across a Bidi run break:
http://lxr.mozilla.org/mozilla/source/layout/generic/nsTextFrame.cpp#5727
Fixed by bug 343445
Status: NEW → RESOLVED
Closed: 15 years ago13 years ago
Resolution: --- → FIXED
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in before you can comment on or make changes to this bug.