Closed Bug 319930 Opened 14 years ago Closed 14 years ago

Performance issue on large file (nsBidiPresUtils::Resolve/nsBidiPresUtils::RemoveBidiContinuation)

Categories

(Core :: Layout: Text and Fonts, defect)

defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: jonsmirl, Assigned: uriber)

References

Details

Attachments

(4 files, 1 obsolete file)

Can something be done in the layout thread to lower it's priority if a large file is opened by accident? For example opening this file on Linux will lock Mozilla for over ten minutes: file:///usr/share/mime-info/gnome-vfs.keys

It's a 2MB unicode file, but opening it shouldn't make the browser unresponsive for ten minutes.

vi can open it in under one second and jump to the end. vi knows how many lines are in the file so it must have scanned it. vi displays the unicode characters correctly (using Gnome Terminal).
Good news, CVS head renders this file in 2 minutes compared to over
10+ minutes with Firefox 1.5. I believe this is because dbaron's
rendering rewrite is checked into the trunk now.

I attached the full profile but here's the highlights, 80% of the two
minutes is spent in
nsBlockFrame::DoRemoveFrame().

Total hit count: 13595
Count %Total  Function Name
10617   78.1     nsBlockFrame::DoRemoveFrame(nsIFrame*, int)

It all comes from this call stack

             11344 nsBlockFrame::RemoveFrame(nsIAtom*, nsIFrame*)
 28975 10617    11344 nsBlockFrame::DoRemoveFrame(nsIFrame*, int)
               323 nsContinuingTextFrame::Destroy(nsPresContext*)
               290 nsBlockFrame::TryAllLines(nsLineList_iterator*,
nsLineList_iterator*, int*)
               100 nsIFrame::Invalidate(nsRect const&, int) const
                 4 nsViewManager::UpdateView(nsIView*, nsRect const&,
unsigned int)
                 2 nsFrame::Destroy(nsPresContext*)
                 2 nsFrame::GetOffsetFromView(nsPoint&, nsIView**) const
                 2 nsBlockFrame::ClearLineCursor()
                 1 nsTableOuterFrame::Destroy(nsPresContext*)
                 1 nsLineBox::Destroy(nsIPresShell*)
                 1 nsLineBox::~nsLineBox()
                 1 PresShell::IsPaintingSuppressed(int*)
(In reply to comment #1)
> Good news, CVS head renders this file in 2 minutes compared to over
> 10+ minutes with Firefox 1.5. I believe this is because dbaron's
> rendering rewrite is checked into the trunk now.

It isn't, though :-).

This looks like something to do with bidi. nsBidiPresUtils::Resolve calls nsBidiPresUtils::RemoveBidiContinuation which tears down lots of frames. This is probably some sort of O(N^2) thing.
Source file is 2MB unziped, 50K lines of UTF8
It is part of gnome localization so it includes lines in most common languages.

[af][ar][az][be][bg][bn][bs][ca][cs][cy][da][de][el][en_CA][en_GB][eo][es][et][eu]
[fi][fr][ga][gl][gu][he][hi][hr][hu][id][is][it][ja][ko][li][lt][mk][ml][mn][ms]
[nb][ne][nl][nn][no][pa][pl][pt][pt_ecido][ro][ru][sk][sl][sq][sr][sr@Latn][sr@ije]
[sv][ta][th][tk][tr][uk][vi][wa][zh_CN][zh_TW]
Jon, thanks for profiling this!

I think we should use this as the bug for making nsBidiPresUtils::Resolve and in particular nsBidiPresUtils::RemoveBidiContinuation faster.  That also came up as a major issue in bug 317334.  See bug 317334 comment 9 and bug 317334 comment 10.  (where the O(N^2) nature of the current code is discussed).  Also note bug 317334 comment 17 and 18 and bug 317334 comment 35 (discussion of possible solutions).

We should probably finish up the work in bug 299065 and then look into this...
Assignee: roc → mozilla
Component: Layout: View Rendering → Layout: BiDi Hebrew & Arabic
Depends on: 299065
QA Contact: ian → zach
Summary: Performance issue on large file → Performance issue on large file (nsBidiPresUtils::Resolve/nsBidiPresUtils::RemoveBidiContinuation)
Version: 1.8 Branch → Trunk
FWIW, RemoveBidiContinuation() itself is not O(N^2), it's linear (I couldn't find anything actually O(N^2) in the code, and crude stop-watch measurements verified the linearity).
The overall O(N^2) here comes from the fact that nsBidiPresUtils::Resolve() (and therefore RemoveBidiContinuation) is called repeatedly with incremental reflows.

So the basic problem here, IMO, is that bidi resolution removes all of the next-in-flows whenever it's called, and then the inline reflow code has to re-create them. I was wondering about this while working on bug 299065 (because I'm having lot's of difficulties expanding the current logic to deal with the parent inlines, and given the huge performance problems here, I'm wondering if expanding the current logic to deal with inlines is even a good idea).
> FWIW, RemoveBidiContinuation() itself is not O(N^2), it's linear

I'm not sure how it can be, given that it's removing nodes from a singly linked list starting at the end.

> is called repeatedly with incremental reflows.

Yes, that's an issue too (and makes the whole thing O(N^3), I bet).  If we can fix things to not remove anything, so much the better!
(In reply to comment #7)
> > FWIW, RemoveBidiContinuation() itself is not O(N^2), it's linear
> 
> I'm not sure how it can be, given that it's removing nodes from a singly linked
> list starting at the end.

It's not iterating over the linked list in orser to get to the last node. It uses an index into an array (mLogicalFrames) to arrive directly at the last frame.
Yes, but removing a node from a singly linked list requires finding the previous entry in the list.  Which is O(position in list of the thing being removed); for removals starting at the end the total time finding these previous entries will be O((last position involved)^2).
(In reply to comment #9)
> Yes, but removing a node from a singly linked list requires finding the
> previous entry in the list.  Which is O(position in list of the thing being
> removed); for removals starting at the end the total time finding these
> previous entries will be O((last position involved)^2).
> 

You are correct, of course.

However, at least from what I'm seeing with my measurements, the loop for finding the frame to be deleted is relatively fast, compared to the other actions performed in the outer loop. In other words, the actual execution time of RemoveBidiContinuation() is only slightly more than linear.
... but looking at the attached jprof log, I'm probably wrong again, as it indicates that most of the time is spent in nsBlockFrame::DoRemoveFrame() (the inner loop).

Sorry for all the spam.
Uri, the frame finding is indeed fast, relatively speaking.  Problems start when the linked list involved has tens of thousands of items in it...
OS: Linux → All
Hardware: PC → All
Attached patch patch (obsolete) — Splinter Review
In RemoveBidiContinuation(), don't actually remove the extra continuations. Instead, just mark them as fluid continuations, and allow the normal reflow code to either re-use them or remove them as necessary.
Basically this is taking the approach which I already was using for inline container continuations, and applying it to the leaf continuations as well.
Assignee: mozilla → uriber
Status: NEW → ASSIGNED
Attachment #214289 - Flags: superreview?(bzbarsky)
Attachment #214289 - Flags: review?(smontagu)
Comment on attachment 214289 [details] [diff] [review]
patch

sr=bzbarsky, but can we assert that prev->GetNextInFlow() is null or already equal to frame or isn't getting lost off the flow chain in general?  Same for frame->GetPrevInFlow()?  I guess it'll be null or equal to prev, right?  Should assert that.
Attachment #214289 - Flags: superreview?(bzbarsky) → superreview+
(In reply to comment #14)
> (From update of attachment 214289 [details] [diff] [review] [edit])
> sr=bzbarsky, but can we assert that prev->GetNextInFlow() is null or already
> equal to frame or isn't getting lost off the flow chain in general?  Same for
> frame->GetPrevInFlow()?  I guess it'll be null or equal to prev, right?  Should
> assert that.
> 

I'm getting |prev| by doing frame->GetPrevContinuation(), so I don't see a point in asserting that |prev| is indeed |frame|'s prev continuation two lines later.
As for the other way around (asserting that |frame| is |prev|'s next continuation), I can do that, but all it will do is to prove that the integrity of the double-linked list is intact - something that we're not asserting anywhere else (which I understood was OK with you, according to bug 299065 comment 52).
> so I don't see a point in asserting that |prev| is indeed |frame|'s prev
> continuation 

in-flow.  Not quite the same.  At the moment any in-flow is a continuation, of course.  That's why the assert is ok to add.

As in, what I want us to assert is that we're not losing frames from the flow chain here...  Just to make it clear why this is safe, if nothing else.
Attachment #214289 - Attachment is obsolete: true
Attachment #214334 - Flags: superreview?(bzbarsky)
Attachment #214334 - Flags: review?(smontagu)
Attachment #214289 - Flags: review?(smontagu)
Attachment #214334 - Flags: superreview?(bzbarsky) → superreview+
Comment on attachment 214334 [details] [diff] [review]
patch, with asserts [checked in]

r=me, but you might want to edit the comments at http://lxr.mozilla.org/seamonkey/source/layout/base/nsBidiPresUtils.h#227 and http://lxr.mozilla.org/seamonkey/source/layout/base/nsBidiPresUtils.cpp#207 which still say the frames are deleted.
Attachment #214334 - Flags: review?(smontagu) → review+
Checked in, with comment adjustments:
Checking in layout/base/nsBidiPresUtils.h;
/cvsroot/mozilla/layout/base/nsBidiPresUtils.h,v  <--  nsBidiPresUtils.h
new revision: 1.18; previous revision: 1.17
done
Checking in layout/base/nsBidiPresUtils.cpp;
/cvsroot/mozilla/layout/base/nsBidiPresUtils.cpp,v  <--  nsBidiPresUtils.cpp
new revision: 1.65; previous revision: 1.64
done

I'm leaving the bug open for now, until we check the actual performance improvement.
(In reply to comment #19)
> 
> I'm leaving the bug open for now, until we check the actual performance
> improvement.
> 

Uri: did you rtry to open the testcase and see what times we got?
(In reply to comment #20)
> 
> Uri: did you rtry to open the testcase and see what times we got?
> 

I only had a debug build, on which performance testing is pretty useless. I'd  like to compare to nightlies to get meaningful results.
Comparing an hourly build containing the fix to yesterday's nightly, I get the following times for the attached file:

Before: 6:10 min. (370 sec.)
After:  1:42 min. (102 sec.)
The same file in Latin-1 encoding (i.e., no bidi): 5.5 sec.

So while the improvement is significant, it seems like there's still a lot of bidi-related overhead. I even think I know where it is (in-flows being destroyed and re-created much more than necessary), but if someone (Jon?) can repeat the profiling with the fix, that would greatly help.
Attachment #214334 - Attachment description: patch, with asserts → patch, with asserts [checked in]
> The same file in Latin-1 encoding (i.e., no bidi): 5.5 sec.

Not just no bidi, but also many fewer fonts, etc.  I'll profile for sure locally when I get home tonight, but a preliminary profile over remote X doesn't show much bidi code...  Granted, in-flows being created/destroyed is not very bidi-specific code.  ;)
OK, I ran some tests.  The data that follows is using jprof, so there are 8192 hits per second (that is, the time to load the document in seconds is about the number of hits over 8192; it's only accurate to within about 10% though, given how I was testing):

Without this patch:  Total hit count: 4155857
With this patch, using different charsets to check on comment 23:
  As UTF8: Total hit count: 1548641
  As ISO-8859-1: Total hit count: 61951
  As ISO-8859-8: Total hit count: 1820885
  AS EUC-JP: Total hit count: 65619

More details for the "with patch as UTF8 profile":

Total hit count: 1548641
Count %Total  Function Name
321107   20.7     nsBlockFrame::DoRemoveFrame(nsIFrame*, int, int)
66105   4.3     FcCharSetDestroy
39644   2.6     XftCharIndex
39302   2.5     SearchTable
37028   2.4     nsFontMetricsXft::FindFont(unsigned int)

That time in nsBlockFrame::DoRemoveFrame is more like 70% of total time in my "without this patch" profile.

With this patch, all the calls to DoRemoveFrame come from nsBlockFrame::DeleteNextInFlowChild.  Between DeleteNextInFlowChild and nsTextFrame::Reflow we account for about 60% of the total time spent on the testcase.

Conclusions: Comment 22 is right on the money on all counts.

Perhaps we could mark as fluid only the "dirty" bidi continuations?  Or something?  It'd help if we didn't have to reflow the world every time here...

It's probably worth resolving this bug for tracking purposes and filing followup bugs (still blocking bug 317334 as needed).
OK, I filed bug 329878 as a follow-up. Resolving this one FIXED.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 330269
Depends on: 330373
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in before you can comment on or make changes to this bug.