Closed Bug 263359 Opened 20 years ago Closed 13 years ago

bidi algorithm implementation inconsistent with msie (newlines in <pre>)

Categories

(Core :: Layout: Text and Fonts, defect)

defect
Not set
minor

Tracking

()

RESOLVED FIXED
mozilla5

People

(Reporter: noamtm, Assigned: smontagu)

References

(Blocks 2 open bugs)

Details

(Keywords: perf)

Attachments

(17 files, 9 obsolete files)

164 bytes, text/html
Details
342 bytes, image/png
Details
344 bytes, image/png
Details
169 bytes, text/html
Details
11.53 KB, patch
roc
: review+
Details | Diff | Splinter Review
2.84 KB, patch
roc
: review+
Details | Diff | Splinter Review
20.95 KB, patch
roc
: review+
Details | Diff | Splinter Review
11.35 KB, patch
smontagu
: review+
Details | Diff | Splinter Review
16.99 KB, patch
roc
: review+
Details | Diff | Splinter Review
10.26 KB, patch
Details | Diff | Splinter Review
10.21 KB, patch
Details | Diff | Splinter Review
4.15 KB, patch
roc
: review+
Details | Diff | Splinter Review
12.03 KB, patch
roc
: review+
Details | Diff | Splinter Review
19.68 KB, image/png
Details
1.49 KB, patch
roc
: review+
Details | Diff | Splinter Review
2.39 KB, patch
roc
: review+
Details | Diff | Splinter Review
11.72 KB, patch
roc
: review+
Details | Diff | Splinter Review
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20040913 Firefox/0.10 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20040913 Firefox/0.10 In mozilla, cross-line bidi text is rendered differently that in MSIE (or other Microsoft applications, for that matter). A testcase is worth a thounsand words, so I will not write too much, and my next comment will have a testcase. Anyway, I don't know if this is a mozilla bug or an MSIE bug. I don't have enough knowledge of the bidi algorithm. So - if any of you do have that knowledge, please - either confirm this as a bug, or (more likely) drop it as a WONTFIX. I don't mean mozilla to be bug-compatible with IE. More info in the next comment. Can't I attach testcases to bug reports? Reproducible: Always Steps to Reproduce:
Attached file part 1 (html)
This is the html source. Look at the following two images: msie.png: how the file looks in MSIE. mozilla.png: how the file looks in Mozilla. Look at the position of the point on the first line.
Status: UNCONFIRMED → NEW
Component: Browser-General → Layout: BiDi Hebrew & Arabic
Ever confirmed: true
Summary: bidi algorithm implementation inconsistent with msie → bidi algorithm implementation inconsistent with msie (newlines in <pre>)
Attached file a no-<pre> testcase
The bug (if it is really a bug; again, I don't know the bidi algorithm well enough) is not special just to <pre> sections. It occurs EVERYWHERE - in normal html (as in the newly attahced testcase) as well as <textarea>s and Thunderbird/MailNews messages. The question is really this: should a \n, or a <br>, break the flow of the text, or should it be treated like a regular whitespace/punctuation char?
<br> in non-preformatted text is discussed in bug 229367. The two cases are not necessarily equivalent (so this is not a dupe).
Assignee: general → smontagu
Confirming on OS X => All/All
OS: Windows XP → All
Hardware: PC → All
Blocks: 330350
Depends on: 332655
Blocks: 330461
No longer blocks: 330461
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: general → layout.fonts-and-text
Blocks: 477495
According to comments in bug 477495, this can lead to severe performance issues on some pages.
Keywords: perf
Blocks: html5bidi
Copied from bug 607541 comment 0: In elements where line breaks are not collapsed, e.g. <textarea> and elements with white-space:pre|pre-line|pre-wrap, line breaks should constitute UBA paragraph breaks. When a line break introduces a UBA paragraph break, the base direction of the new UBA paragraph will be determined by the computed direction of the nearest ancestor element whose bidi properties require its contents to be in a separate UBA paragraph (or sequence of paragraphs), e.g. a block element or an element directionally isolated by the ubi attribute (which is being proposed in a separate bug). Furthermore, for every element on the path in between that results in the creation of an embedding or override level, e.g. a <bdo> element or any element with a dir attribute or a value other than "normal" for the unicode-bidi CSS property, the correspondeng embedding or override level is re-introduced at the start of the new UBA paragraph (to be closed at the end of the element or the UBA paragraph, whichever comes first). For more information, see: * <http://www.w3.org/International/docs/html-bidi-requirements/#newline-as-separator> * <http://www.w3.org/Bugs/Public/show_bug.cgi?id=10812>
This is a simple test based on http://www.w3.org/Bugs/Public/attachment.cgi?id=925. I'll add more testcases from bug 229367 and its many dupes.
Attachment #493939 - Attachment description: Patch part 3: Call ResolveParagraph on encountering <br> or embedding blocks → Patch part 3: Call ResolveParagraph on encountering <br> or embedded blocks
(In reply to comment #9) > http://test.csswg.org/suites/css2.1/20101001/xhtml1/bidi-breaking-002.xht > (I think tests 3 and 4 are invalid.) Tests 3 and 4 are for the behavior of PARAGRAPH SEPARATOR (U+2029) and LINE SEPARATOR (U+2028), respectively, in <pre>. HTML4 used to say something rather ambiguous about them (http://www.w3.org/TR/REC-html40/struct/text.html#whitespace), but HTML5 no longer says anything at all about them, thus leaving them in the purview of CSS, and since that does not say anything about them either, they should be treated according to their Unicode definition, i.e. to start a new line (whether in pre-formatted text or not), with a new bidi paragraph for U+2029 and no new bidi paragraph for U+2028. Thus, I think that test 3 *is* valid. Unfortunately, test 4 does have a bug. While its comment says "line separator does not break bidi paragraph", it does not actually test this because its second line both starts and ends with an RTL character. Test 4 should thus be something like this: <div class="set"> <div class="test"> <div class="pre"> &rlm;&nbsp; + - &times; &divide; &#x05D0;&#x2028;&nbsp; + - &times; &divide; &#x05EA;</div> </div> <div class="control"> <div><bdo dir="ltr">&#x05D0; &divide; &times; - + &nbsp;</bdo></div> <div><bdo dir="ltr">&#x05EA; &divide; &times; - + &nbsp;</bdo></div> </div> </div>
Summary of where we are with this: The patches so far attached pass on tryserver and fix the related bugs bug 229367 and bug 613157 but don't address newlines in preformatted text, the actual issue that this bug is about. I made a naïve patch to fix preformatted text like this: if (frame->GetStyleContext()->GetStyleText()->NewlineIsSignificant() && frame->HasTerminalNewline()) { ResolveParagraph(aBlockFrame, PR_TRUE); but it doesn't work at all well, I think for two main reasons: * We do bidi resolution before line breaking, so the test for frame->HasTerminalNewLine() isn't right. * Even assuming a solution to that, splitting the paragraph inside a content node breaks various assumptions in ResolveParagraph().
Blocks: 229367, 613157
Presumably instead of using frame->HasTerminalNewline, you need to test NewlineIsSignificant then actually scan the text looking for all \ns.
Will it be OK to actually split the frames after the \ns, or will that confuse the line breaking code in nsTextFrame (or wherever)? If not, bidi resolution will have to pick up again for the next paragraph in mid-frame, which will be a significant complication.
It will be OK to break the frames after the \ns.
Attachment #493938 - Attachment is obsolete: true
Attachment #495868 - Flags: review?(roc)
Attachment #493939 - Attachment is obsolete: true
Attachment #495869 - Flags: review?(roc)
Also removed some dead code. This is a very significant optimization for huge text files with little bits of bidi, such as tinderbox logs.
Attachment #495873 - Flags: review?(roc)
Attached patch Tests for this bug (obsolete) — Splinter Review
Again, this is a very basic test and I will add more.
Comment on attachment 495868 [details] [diff] [review] Patch part 2 v2: more refactoring -- spin ResolveParagraph out of Resolve + nsBlockInFlowLineIterator* mLineIter; nsAutoPtr, so you can remove the manual delete
Attachment #495868 - Flags: review?(roc) → review+
+ if (frame->GetStyleContext()->GetStyleDisplay()->mDisplay != + NS_STYLE_DISPLAY_INLINE) { Is this correct? Should a display:inline-block really end the paragraph? I wouldn't have thought so. Seems to me you probably want IsInlineOutside(). I think mEmbeddingStack would be slightly clearer if it was an nsTArray<PRUnichar>. It shouldn't be interpreted as a string. The flow of control in part 3 here seems a little weird to me. Before we did InitLogicalArray followed by ResolveParagraph which resolves the whole block. Now, during InitLogicalArray we call ResolveParagraph for all paragraphs except the last one, which has ResolveParagraph called by Resolve. I think this would make more sense if you renamed InitLogicalArray to TraverseFrames and added a comment to the call to ResolveParagraph from Resolve to say "// Resolve final paragraph".
+ nsIFrame* lastPreformattedFrame = frame; This initializer is redundant. + /* + * If there is no newline in the frame, just save the text and + * do bidi resolution later + */ + mBuffer.Append(Substring(text, start)); + break; Don't we need to set lastPreformattedFrame here? I guess I'm not really sure what lastPreformattedFrame is supposed to mean. Is it supposed to be firstPreformattedFrame->GetLastContinuation()? + PRUint32 lineLength = endLine; lineLength is a misnomer. Why not just keep using endLine? + if (!next) { + // If the frame has no next in flow, create one + CreateContinuation(frame, &next, PR_TRUE); + } Why not move this up inside the previous 'if'?
After part 1 stopped storing the bidi control codes in temporary frames, we can do away with the temporary frames altogether and not waste time creating and destroying them.
Attachment #498344 - Flags: review?(roc)
(In reply to comment #29) > Is this correct? Should a display:inline-block really end the paragraph? I > wouldn't have thought so. Seems to me you probably want IsInlineOutside(). You're right. Fixed and added a test for that case. Also handled the logic for getting the right next sibling in TraverseFrames here instead of the lastPreformattedFrame stuff in [the previous version of] the next patch.
Attachment #495869 - Attachment is obsolete: true
Attachment #498346 - Flags: review?(roc)
Attachment #495869 - Flags: review?(roc)
Attachment #493940 - Attachment is obsolete: true
Attachment #495874 - Attachment is obsolete: true
I've had this in my local tree for years, and I'd like to check it in to save mq churn.
Attachment #498356 - Flags: review?(roc)
Comment on attachment 498346 [details] [diff] [review] Patch part 3 v3: Call ResolveParagraph on encountering <br> or embedding blocks; updated to comments Sorry, there's a mistake in this patch
Attachment #498346 - Flags: review?(roc)
Comment on attachment 495873 [details] [diff] [review] Patch part 5: optimize by not doing unnecessary resolution Asking approval for this set of patches. It's not risk-free: I've tested extensively and fixed a number of regressions, but there is some chance that there will be other regressions. I think it's worth taking the patches both for standards conformance and for performance: a tinderbox log with 85000+ lines loads for me in 7 seconds with these patches applied, as opposed to 80 seconds in an unpatched build.
Attachment #495873 - Flags: approval2.0?
Comment on attachment 495873 [details] [diff] [review] Patch part 5: optimize by not doing unnecessary resolution Tempting as it is, I think this isn't a good time to take it. Let's land it for post-FF4.
Attachment #495873 - Flags: approval2.0? → approval2.0-
Depends on: post2.0
Blocks: 566066
Simon, please land the patches which are meant to be landed here on cedar.
Depends on: 645119
I backed this out because it caused bug 645119. Sorry. :( http://hg.mozilla.org/mozilla-central/rev/0dc92bb949bf Another thing to watch for is a bug which a Persian user reported to me. The bidi resolution breaks when pressing a newline and waiting for a repaint. I'll attach the screenshot that I received from this user next.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 645193
Whiteboard: [not-ready-for-cedar]
Attachment #522788 - Flags: review?(roc)
Bug 645119 is caused by attachment 495873 [details] [diff] [review], which was a ride-along optimization and not really part of the fix for this bug. I'll open a new bug for it and fix the regression there.
Whiteboard: [not-ready-for-cedar]
Depends on: 650189
Depends on: 650475
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: nobody → smontagu
OK, I tested bug 650189 with all of these builds, here are the results: > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smontagu@mozilla.com-9072c5d3dfcc No perf regression out of 10 runs. > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smontagu@mozilla.com-e794a0c595ef No perf regression (1 run out of 10 runs performed horribly, but I'm not sure what the cause was). > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smontagu@mozilla.com-ceebcc71f7a8 Horrible perf when loading from network, good perf when loading from cache. The only way that I can explain this is that this build might have the regression, but the regression _might_ be dependent on the way the data is fed into the reflow process. By the same token, e794a0c595ef _might_ contain the regression as well (but I can't verify this claim). I tested e794a0c595ef a lot of times after testing this builds, and couldn't reproduce the regression, so it might have been something else. > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smontagu@mozilla.com-681666d3bb11 Same as ceebcc71f7a8. > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smontagu@mozilla.com-639aa83e113f Same as ceebcc71f7a8.
(In reply to comment #56) > How is > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smontagu@mozilla.com-4d67e3300692 > ? At first it seemed to be better, but I managed to reproduce the bug after a couple of tries.
Blocks: 582181
Does this bug cover the bidi paragraph separation for <br>, as specified in <http://www.whatwg.org/specs/web-apps/current-work/multipage/text-level-semantics.html#the-br-element>?
That depends what you mean by "this bug". It's really bug 229367, but it's fixed by the patches in this bug, specifically attachment 498455 [details] [diff] [review]
This patch kills two birds with one stone: it fixes the performance regression (bug 650189) and also prevents the regression in bug 650489.
Attachment #537515 - Flags: review?(roc)
Comment on attachment 537515 [details] [diff] [review] Patch part 2.5: cache the lines for each frame while building mLogicalArray Review of attachment 537515 [details] [diff] [review]: -----------------------------------------------------------------
Attachment #537515 - Flags: review?(roc) → review+
Blocks: 650489
Status: ASSIGNED → RESOLVED
Closed: 14 years ago13 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Depends on: 663295
Depends on: 663662
Depends on: 664087
Blocks: 638758
Depends on: 668941
Blocks: 695640
Blocks: 557520
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: