Closed Bug 1160847 Opened 10 years ago Closed 9 years ago

Add glue to nsBidiPresUtils to use the support for bidi isolate characters in nsBidi

Categories

(Core :: Layout: Text and Fonts, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox40 --- affected
firefox50 --- fixed

People

(Reporter: smontagu, Assigned: xidorn)

References

Details

(Keywords: rtl)

Attachments

(6 files)

Now that bug 1157726 has added support for the unicode bidi isolation control characters, we can handle the equivalent css and html properties in nsBidiPresUtils in the same way as we now do bidi overrides and embeds, by adding the control characters to the text that we pass to the bidi engine. This will supersede the rather hacky and buggy implementation I added in bug 613149.
Attached patch w-i-p patchSplinter Review
This is working well in most cases that already work: there are a couple of regressions in dynamic reftests which still need investigation, and there is still a hacky workaround in there (mIsolateCount), which I think will become unnecessary after bug 1163923. However, it does NOT fix the cases in bug 717811, and we will need some non-trivial reworking of how we implement bidi resolution and reordering to solve that. In earlier versions of the Bidi Algorithm, all formatting characters were removed from the text at a relatively early stage of bidi resolution, and play no role in reordering. Our implementation reflects this: we create dummy frame pointers before and after <bdo>s etc. which are only used during resolution. In reordering (which has to be a separate step, after line breaking) these pointers no longer exist and we don't need them. In the latest versions of the Algorithm, things have changed and the isolate formatting characters affect reordering as well, and currently we don't take them into account there. Having written all this, I now believe the solution may be less complicated than I originally thought: it may be as simple as calling GetBidiControl while creating the mLogicalFrames array in BidiLineData. I've replaced my original needinfo? by cc-ing, but if I've missed anything, please do comment.
(In reply to Simon Montagu :smontagu from comment #1) > Having written all this, I now believe the solution may be less complicated > than I originally thought: it may be as simple as calling GetBidiControl > while creating the mLogicalFrames array in BidiLineData. No, it's not as simple as that: we need to know the resolved embedding levels of the isolate controls during reordering, which currently aren't saved anywhere, and recalculating them while creating the mLogicalFrames array is probably not practical. I don't have a solution for this right now.
I don't think this idea has a future, but attaching it for reference anyway. Inserting the control characters via :before and :after in html.css works for bdi and bdo, and fixes the tests from bug 717811, but doesn't help for cases which explicitly specify unicode-bidi: isolate. I don't feel comfortable saying "CSS discourages using unicode-bidi in author stylesheets so it's not the end of the world to leave it a little bit buggy".
Attachment #8608113 - Flags: feedback?(jfkthame)
Blocks: 922963
Blocks: 1185987
Blocks: 1221034
It looks like the wip patch is almost done. I did find some minor issues in that patch, which shouldn't be too hard to fix. I'll try to work on this and see if there is any remaining issue.
Assignee: smontagu → bugzilla
(In reply to Simon Montagu :smontagu from comment #1) > In the latest versions of the Algorithm, things have changed and the isolate > formatting characters affect reordering as well, and currently we don't take > them into account there. It took me a while to understand how isolate formatting characters could affect reordering, so I'd like to leave a note to make it clear. Basically the only affected case is when two isolate runs at the same embeding level are adjacent, e.g.: <p dir="rtl"><bdi dir="ltr">123</bdi><bdi dir="ltr">abc</bdi></p> If the isolate formatting characters are not removed, and they are resolved to a lower embedding level than the text it wraps, the visual result of the code above should be "321abc". However, if we don't know where the isolate formatting characters are, we would reorder the two pieces together as if they are in the same level run, which produces "cba321". Having written this, I guess we can probably solve this via storing whether there is a virtual isolate formatting character right before a given frame to a frame property, so that we know the break point of adjacent level runs.
Depends on: 1281099
Hmmm, so after removing subparagraph code for isolate, I met bug 712600 again. Guessing it is related to reordering code.
This patch is mainly based on smontagu's wip patch. Some non-trivial differences: * BidiParagraphData.mIsolateCount and related code are not added in this patch. I investigated uses of this field in the wip patch, and it seems to me none of them makes sense: 1. in the fast path of nsBidiPresUtils::ResolveParagraph, if there would be any isolate character in the surrounding text, there must exist more than one runs, which indicates the isolate count condition is redundant. 2. in handle of br frame in nsBidiPresUtils::TraverseFrames, based on my understanding of "CSS Writing Modes Level 3" section "2.4.4. Paragraph Breaks Within Embeddings and Isolates", the resolving should happen unconditionally. * {control,override}Char in nsBidiPresUtils::TraverseFrames are assigned unconditionally when in a bidi inline container, so that we can properly handle it when there are continuations. I suspect this was the reason of regressions in dynamic reftests from the wip patch mentioned in comment 1. Review commit: https://reviewboard.mozilla.org/r/60198/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/60198/
There will be some more patches later for implementing what I described at the end of comment 5. But given the existing patch is non-trivial, I think review can start without having the next patch. Optimistically, I could finish the next patch tomorrow :)
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #6) > Hmmm, so after removing subparagraph code for isolate, I met bug 712600 > again. Guessing it is related to reordering code. FWIW, this wasn't related to reordering, but to splitting. Frames were not split properly if a run ends in a virtual bidi control frame.
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #5) > Having written this, I guess we can probably solve this via storing whether > there is a virtual isolate formatting character right before a given frame > to a frame property, so that we know the break point of adjacent level runs. Yes, that should work. I think we need to store two things: the resolved level of an isolate initiator before the frame if any (set by rule X5a-c[1]), and the resolved level of an isolate terminator after the frame, if any (set by rule X6a[2]). (we could define a pseudo-level value outside the legal range of 0 - NSBIDI_MAX_EXPLICIT_LEVEL+1 for "none"). Then we could check these properties in the BidiLineData ctor and add elements to the arrays to correspond to them, like PushBidiControl and PopBidiControl do in resolution. This assumes that anywhere there are isolate formatting characters in a frame, the frame will be split during bidi resolution. I haven't checked whether that's true now, but I think it is, and it's simple enough to make it so if necessary. [1] http://unicode.org/reports/tr9/#X5a [2] http://unicode.org/reports/tr9/#X6a
Comment on attachment 8764149 [details] Bug 1160847 part 2 - Add glue to nsBidiPresUtils to use support for bidi isolate in nsBidi. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/60198/diff/1-2/
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #8) > * BidiParagraphData.mIsolateCount and related code are not added in this > patch. > I investigated uses of this field in the wip patch, and it seems to me > none of > them makes sense: > 1. in the fast path of nsBidiPresUtils::ResolveParagraph, if there would be > any isolate character in the surrounding text, there must exist more > than > one runs, which indicates the isolate count condition is redundant. Couldn't there be a situation where this frame is a single LTR run isolated inside a RTL paragraph so it needs to get an even embedding level higher than 0? > 2. in handle of br frame in nsBidiPresUtils::TraverseFrames, based on my > understanding of "CSS Writing Modes Level 3" section "2.4.4. Paragraph > Breaks Within Embeddings and Isolates", the resolving should happen > unconditionally. I don't remember all the background now, but this is the "hacky workaround" that I refer to at the beginning of comment 1. I suspect I put it in to handle one of the test cases in testing/web-platform/tests/html/semantics/text-level-semantics/the-bdi-element/
(In reply to Simon Montagu :smontagu from comment #11) > (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #5) > > > Having written this, I guess we can probably solve this via storing whether > > there is a virtual isolate formatting character right before a given frame > > to a frame property, so that we know the break point of adjacent level runs. > > Yes, that should work. I think we need to store two things: the resolved > level of an isolate initiator before the frame if any (set by rule > X5a-c[1]), and the resolved level of an isolate terminator after the frame, > if any (set by rule X6a[2]). (we could define a pseudo-level value outside > the legal range of 0 - NSBIDI_MAX_EXPLICIT_LEVEL+1 for "none"). Then we > could check these properties in the BidiLineData ctor and add elements to > the arrays to correspond to them, like PushBidiControl and PopBidiControl do > in resolution. I actually was thinking about just storing whether there are isolate controls, and just make them use embedingLevel-1 as their level. But later I realized that it doesn't work. So yes, we need to store the embedding level of isolate controls, and we may actually need to store them in inline container frames rather than text frames. My thought is that we can store it in frame properties like what we do for embedding level etc. And I guess we probably don't even need a value for "none". Querying frame property is somehow expensive, so we may want to check unicode-bidi value before querying properties.
(In reply to Simon Montagu :smontagu from comment #13) > (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #8) > > * BidiParagraphData.mIsolateCount and related code are not added in this > > patch. > > I investigated uses of this field in the wip patch, and it seems to me > > none of > > them makes sense: > > 1. in the fast path of nsBidiPresUtils::ResolveParagraph, if there would be > > any isolate character in the surrounding text, there must exist more > > than > > one runs, which indicates the isolate count condition is redundant. > > Couldn't there be a situation where this frame is a single LTR run isolated > inside a RTL paragraph so it needs to get an even embedding level higher > than 0? Basically, for any unclosed isolate level, we would always append a PDI before resolving, and the PDI would be resolved to a different embedding level, so there should always be two levels. Based on that, I think we don't need to know whether it is currently isolated. > > 2. in handle of br frame in nsBidiPresUtils::TraverseFrames, based on my > > understanding of "CSS Writing Modes Level 3" section "2.4.4. Paragraph > > Breaks Within Embeddings and Isolates", the resolving should happen > > unconditionally. > > I don't remember all the background now, but this is the "hacky workaround" > that I refer to at the beginning of comment 1. I suspect I put it in to > handle one of the test cases in > testing/web-platform/tests/html/semantics/text-level-semantics/the-bdi- > element/ Hmmm, it doesn't seem to me my patch breaks any test in that directory which wasn't already broken, so I suppose that hack is not needed anymore. In addition, this patch fixes bug 1230455, which may actually be related. And it seems to me the tests we are currently failing there would be fixed when we correctly restore isolate controls for reordering.
Blocks: 1230455
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #14) > I actually was thinking about just storing whether there are isolate > controls, and just make them use embedingLevel-1 as their level. But later I > realized that it doesn't work. So yes, we need to store the embedding level > of isolate controls, and we may actually need to store them in inline > container frames rather than text frames. After thinking a bit, I guess it could probably be further simplified. We can probably collapse a run of continuous virtual isolate formatting characters into one, with the lowest embedding level in them, and then record it in the next real frame, in the same way as the embedding level. Since we only use isolate formatting characters as separator in the text, those at the end shouldn't affect the reordering result, and also it isn't necessary to distinguish between initiators and terminators. In addition, since they are just used as zero-width separators, collapsing continuous separators into one with the lowest embedding level should have the identical visual behavior as well. This way, we would not need to change the code much, and also it should have better performance than trying to record all those virtual characters. smontagu, do you have any concern about this approach?
Flags: needinfo?(smontagu)
This patch mainly consists of two parts, one for resolving and the other for reordering. In the resolving part, the added code stores the lowest embedding level of all bidi formatting characters precede a frame to the bidi data of that frame when necessary. In the reordering part, virtual frame is restored from the information stored above before asking the bidi engine to reorder frames Collapsing a run of continuous virtual formatting characters into one virtual character with the lowest embedding level among them should work because a character with a higher embedding level than either of its neighbors should not affect the reordering result of any other part of the sequence. (No formal proof of this theorem, though) Review commit: https://reviewboard.mozilla.org/r/60342/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/60342/
Attachment #8764467 - Flags: review?(jfkthame)
Attachment #8764468 - Flags: review?(jfkthame)
Comment on attachment 8764148 [details] Bug 1160847 part 1 - Add some debug functions. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/60196/diff/1-2/
Comment on attachment 8764149 [details] Bug 1160847 part 2 - Add glue to nsBidiPresUtils to use support for bidi isolate in nsBidi. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/60198/diff/2-3/
Comment on attachment 8764467 [details] Bug 1160847 part 3 - Restore virtual bidi control characters for reordering. Also see comment 17.
Flags: needinfo?(smontagu)
Attachment #8764467 - Flags: feedback?(smontagu)
Some unexpected pass on web-platform tests :)
Comment on attachment 8764467 [details] Bug 1160847 part 3 - Restore virtual bidi control characters for reordering. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/60342/diff/1-2/
Attachment #8764467 - Flags: feedback?(smontagu)
Comment on attachment 8764468 [details] Bug 1160847 part 4 - Remove useless lineOffset variable in nsBidiPresUtils::ResolveParagraph. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/60344/diff/1-2/
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #17) > smontagu, do you have any concern about this approach? The optimization of only considering the lowest embedding level in the sequence sounds reasonable in theory, though I haven't worked through any examples or looked at the code.
Comment on attachment 8764467 [details] Bug 1160847 part 3 - Restore virtual bidi control characters for reordering. Ah, I didn't realize MozReview cancelled the feedback automatically...
Attachment #8764467 - Flags: feedback?(smontagu)
Better to use 0xfd for kBidiLevelNone: 0xff and 0xfe are already defined in nsBidi.h as special values of nsBidiLevel. The usage probably doesn't overlap, but it seems safer to use something else. (Sorry, I don't know how to reset feedback flags in modern-day Bugzilla)
(In reply to Simon Montagu :smontagu from comment #28) > Better to use 0xfd for kBidiLevelNone: 0xff and 0xfe are already defined in > nsBidi.h as special values of nsBidiLevel. The usage probably doesn't > overlap, but it seems safer to use something else. Oh, I see, NSBIDI_DEFAULT_LTR and NSBIDI_DEFAULT_RTL. I suspect the NSBIDI_LEVEL_OVERRIDE bit flag could also be an issue. In that case, there is basically no safe special value available anymore. Thus I'd argue that we do not really need to worry about those values, but can probably just leave a comment which explains this situation instead. > (Sorry, I don't know how to reset feedback flags in modern-day Bugzilla) You can click the "Details" link on the right of the attachment, and change the flag there.
Attachment #8764148 - Flags: review?(jfkthame) → review+
The patches here look good to me, but I think Simon knows this code considerably better; so before we r+ and land these, I'd like to check if he has time to give any further feedback. Simon, if you won't have more time to give to this (I appreciate you've already spent time on it), that's totally understandable; just let us know.
Flags: needinfo?(smontagu)
Comment on attachment 8764467 [details] Bug 1160847 part 3 - Restore virtual bidi control characters for reordering. Sorry, but I think I've been out of the loop too long for it to make sense for me to comment or review patches on the code level. In general, the approach seems good to me.
Flags: needinfo?(smontagu)
Attachment #8764467 - Flags: feedback?(smontagu)
Comment on attachment 8764149 [details] Bug 1160847 part 2 - Add glue to nsBidiPresUtils to use support for bidi isolate in nsBidi. https://reviewboard.mozilla.org/r/60198/#review58064 LGTM. Getting rid of the "subparagraph" stuff here is awesome - thanks for digging into this.
Attachment #8764149 - Flags: review?(jfkthame) → review+
Attachment #8764467 - Flags: review?(jfkthame) → review+
Comment on attachment 8764467 [details] Bug 1160847 part 3 - Restore virtual bidi control characters for reordering. https://reviewboard.mozilla.org/r/60342/#review58068 ::: layout/base/nsBidi.cpp:14 (Diff revision 2) > #include "nsCRTGlue.h" > > using namespace mozilla::unicode; > > +static_assert(mozilla::kBidiLevelNone > NSBIDI_MAX_EXPLICIT_LEVEL + 1, > + "The psuedo embedding level should be out-of-range"); s/psuedo/pseudo/
Comment on attachment 8764468 [details] Bug 1160847 part 4 - Remove useless lineOffset variable in nsBidiPresUtils::ResolveParagraph. https://reviewboard.mozilla.org/r/60344/#review58070 Yay!
Attachment #8764468 - Flags: review?(jfkthame) → review+
Pushed by xquan@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/386c0b99ebcb part 1 - Add some debug functions. r=jfkthame https://hg.mozilla.org/integration/mozilla-inbound/rev/a6f8f79a0d0e part 2 - Add glue to nsBidiPresUtils to use support for bidi isolate in nsBidi. r=jfkthame https://hg.mozilla.org/integration/mozilla-inbound/rev/ba3a66f75ddf part 3 - Restore virtual bidi control characters for reordering. r=jfkthame https://hg.mozilla.org/integration/mozilla-inbound/rev/f58a6621ea9e part 4 - Remove useless lineOffset variable in nsBidiPresUtils::ResolveParagraph. r=jfkthame
No longer blocks: 1283106
Depends on: 1283106
Blocks: 996627
Depends on: 1370053
Attachment #8608113 - Flags: feedback?(jfkthame)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: