Closed Bug 1288761 Opened 3 years ago Closed 2 years ago
bidi resolution becomes slow for extremely long paragraphs of text with mixed directionality
3.16 KB, text/html
59 bytes, text/x-review-board-request
59 bytes, text/x-review-board-request
I've put a testcase at http://people.mozilla.org/~jkew/tests/textperf1.html that allows us to experiment with pretty "raw" text layout performance in a few ways. The testcase generates a number of chunks of text (by default, 500), each containing 200 random "words", and dumps all this, separated by newlines, into a single <div> element. It then accesses the div's offsetHeight, to force layout to happen, and times how long this takes, using one of three different white-space values: 'pre', which means no line wrapping happens; 'pre-line', so that each 200-word paragraph is separately wrapped; and 'normal', where the entire block of text becomes a *single* wrapped paragraph. The most striking observation from this example is that bidi processing for a long paragraph with a mixture of LTR/RTL text is really bad. With LTR-only text, there's little difference between the timings for the three wrapping options; they're typically in the region of 450ms on my MacBook. However, when occasional Arabic characters are added, forcing bidi resolution to happen, the time for white-space:pre increases a bit (to 800ms or so); for pre-line, it jumps to 1500ms; but for white-space:normal, where the entire block of mixed-direction text becomes a single paragraph, the time goes to around 4800ms. Profiling shows that the vast majority of the time is spent in/under nsBidiPresUtils::ResolveParagraph. From the way the time increases as the length of text increases, it looks like we're running into an O(n^2) performance issue. (Try doubling the amount of text generated, but be prepared to wait...) Note that Chrome does not appear to suffer from this; it maintains a pretty consistent layout time across all three white-space settings, even when the Arabic characters are present in the text.
I expect ResolveParagraph to be O(n)... unless the bidi engine takes another O(n) inside the O(n) loop... And yes, GetLogicalRun takes another O(n)...
So the bidi engine stores bidi runs in visual order, however in ResolveParagraph, we try to get it in logical order, and consequently the bidi engine needs to loop all runs every time to provide the run we want, which is the source of the O(n) there. We would need to find a different algorithm for ResolveParagraph to efficiently use the API from the bidi engine.
Looking at the API provided by ICU, I guess the best thing we can try is to use GetLevels and count the run length ourselves rather than querying logical run. GetLevels is currently not implemented in nsBidi, but it isn't too complicated, so we can port it. Alternatively, we can wait until we remove nsBidi and use ICU directly.
I'll come to this again when we start using ICU's bidi engine. (Since otherwise we would need to migrate another function to our own bidi engine impl, which is being removed in forseeable future.)
Depends on: 924851
In bug 1387332 comment 2, we say we're going to start removing ENABLE_INTL_API=no code in mozilla-58, so at that point we can drop our old version of the bidi code (bug 1308359) and look again at this. Xidorn, would you have a chance to come back to this one sometime soon?
I guess I can come back to this once the old bidi port is removed. The dependency should be enough to remind me when everything else is ready, so clear the ni? for now.
The test page seems to be unavailable now (people.mozilla.org has gone, IIRC). Could you attach that page as an attachment in this bug?
Had a patch which should work... In the testcase above, with Number being 500 and RTL checked, before my patch, it takes ~7s in my local debug build, and profiler shows about half of the time spends in ubidi_getLogicalRun, which matches my theorem above. After my patch, the same setting takes ~3s, and the replacement of ubidi_getLogicalRun doesn't show up anymore. The number above is get with > NS_ASSERTION(aEnd->LastChild() == curFrame, "Unexpected curFrame"); in nsLineBox::RFindLineContaining removed. This assertion is very expensive, and having this makes the total time bump to >30s because LastChild is very inefficient. The testcase in my local beta version shows ~5s, and Gecko Profiler shows that 74% of the time is in ubidi_getLogicalRun. I would expect that this 74% disappears completely with the patch. We would need an opt build to confirm, though. I'm also thinking about adding a pref-reftest for this case.
Assignee: nobody → xidorn+moz
With the opt build, I can confirm that this patch at least makes the three white-space settings on par.
There is another performance issue related to this testcase, which can be reproduced even with pre-line. See bug 1405989.
Attachment #8915791 - Flags: review?(bobbyholley) → review?(rwood)
Comment on attachment 8915790 [details] Bug 1288761 part 1 - Use level-based algorithm for nsBidi::GetLogicalRun. https://reviewboard.mozilla.org/r/187014/#review192206 I wonder if there'd be scope for a further micro-optimization to make the loop in nsBidi::GetLogicalRun check 4 or 8 bytes at once so as to skip more rapidly over long runs of the same level? My example page isn't a good testcase for this because it scatters lots of one- or two-character RTL runs around, so GetLogicalRun will usually do only a short scan, but I suspect real-world pages will tend to have mostly longer runs of a given level. But let's go ahead and take the simple approach here for now, which is clearly a great improvement. ::: layout/base/nsBidi.h:101 (Diff revision 1) > /** > * Get a logical run. > * This function returns information about a run and is used > * to retrieve runs in logical order.<p> > * This is especially useful for line-breaking on a paragraph. > * Please add a comment that `CountRuns` must have been called before calling this function. (I see there's an assertion to check this, but it'd be good to document here as well.)
Attachment #8915790 - Flags: review?(jfkthame) → review+
Comment on attachment 8915790 [details] Bug 1288761 part 1 - Use level-based algorithm for nsBidi::GetLogicalRun. https://reviewboard.mozilla.org/r/187014/#review192206 I hope compilers can do auto-vectorization for this... Don't really want to write that kind of code by hand :/ It's probably worth checking whether compilers are doing that as expected.
Comment on attachment 8915791 [details] Bug 1288761 part 2 - Add perf reftest for bidi resolution. https://reviewboard.mozilla.org/r/187016/#review192232 LGTM, though I'm not very familiar with the perf reftest setup.
Attachment #8915791 - Flags: review?(jfkthame) → review+
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #16) > Comment on attachment 8915790 [details] > Bug 1288761 part 1 - Use level-based algorithm for nsBidi::GetLogicalRun. > > https://reviewboard.mozilla.org/r/187014/#review192206 > > I hope compilers can do auto-vectorization for this... Don't really want to > write that kind of code by hand :/ > > It's probably worth checking whether compilers are doing that as expected. Sadly, no compiler we use can do auto-vectorization for this code... https://godbolt.org/g/N45YQ7
Comment on attachment 8915791 [details] Bug 1288761 part 2 - Add perf reftest for bidi resolution. https://reviewboard.mozilla.org/r/187016/#review192290 Thanks for adding this! Looks great and green on try, awesome.
Attachment #8915791 - Flags: review?(rwood) → review+
Updated the pref reftest to have it run faster (avoid reflowing twice for scrollbar change). Seems to work fine: https://treeherder.mozilla.org/#/jobs?repo=try&revision=607ffe3774478af3ddbc769e6d0fbc7c7a541c06
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/2a1e0bb93351 part 1 - Use level-based algorithm for nsBidi::GetLogicalRun. r=jfkthame https://hg.mozilla.org/integration/autoland/rev/3e85f0761fc9 part 2 - Add perf reftest for bidi resolution. r=jfkthame,rwood
You need to log in before you can comment on or make changes to this bug.