Closed Bug 359555 Opened 18 years ago Closed 3 years ago

2X slow rendering of big page with different content on slow PC.

Categories

(Core :: Layout, defect)

x86
Linux
defect
Not set
major

RESOLVED WORKSFORME

People

(Reporter: romaxa, Unassigned)

Details

(Keywords: perf)

Attachments

(3 files, 2 obsolete files)

This page takes 2X longer to load than in Opera.

Reflow branch does not help in this situation...
Keywords: perf
Reflow branch 20061031 build.
This profile was created on a slow ARM PC (~250 MHz, 64 MB RAM).
Page loading time: ~37 sec vs. ~18 sec in Opera.
(In reply to comment #1)
> Created an attachment (id=244722)
> Gnu profile data for intro-linux page
> 
> Reflow branch 20061031 build.
> This profile was created on a slow ARM PC (~250 MHz, 64 MB RAM).
> Page loading time: ~37 sec vs. ~18 sec in Opera.
> 
After the following change in nsTextFrame.cpp (::MeasureText), the load time changes by ~3 sec:
---   aReflowState.rendContext->GetTextDimensions(bp1[2], wordLen, dimensions);
+++   dimensions.ascent = 200;
+++   dimensions.descent = 60;
+++   dimensions.width = wordLen * 10;

Disabling nsTextFrame::Reflow(...) entirely still makes a difference of just ~3 sec...




Blocks: 71668
Version: Other Branch → Trunk
....

1.15 780.00  78.00      166     0.47     0.96  nsTextFrame::MeasureText
1.02 849.00  69.00      116     0.59     0.94  nsTextTransformer::GetNextWord
...
For this we probably need some optimizations in the nsTextFrame class, because in the current implementation we call ::Measure twice for the same text frames in most cases.
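The idea of not measuring the same text twice can be illustrated with a small memoizing wrapper. This is only a sketch under stated assumptions: `Dimensions`, `MeasureCache`, and `FakeMeasure` are hypothetical names that do not exist in nsTextFrame, and the stand-in measurer simply reuses the constants from the experiment above (ascent 200, descent 60, width = length * 10).

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the dimensions a text measurer returns.
struct Dimensions {
    int ascent;
    int descent;
    int width;
};

// Hypothetical cache: each distinct word is measured once, then reused.
class MeasureCache {
public:
    using Measurer = std::function<Dimensions(const std::string&)>;

    explicit MeasureCache(Measurer measurer) : measurer_(std::move(measurer)) {}

    Dimensions Measure(const std::string& word) {
        auto it = cache_.find(word);
        if (it != cache_.end()) {
            return it->second;  // repeated request: no second measurement
        }
        ++measureCalls_;        // count only real calls to the measurer
        Dimensions d = measurer_(word);
        cache_.emplace(word, d);
        return d;
    }

    std::size_t MeasureCalls() const { return measureCalls_; }

private:
    Measurer measurer_;
    std::unordered_map<std::string, Dimensions> cache_;
    std::size_t measureCalls_ = 0;
};

// Stand-in measurer (assumption): the constants from the experiment,
// not a real font measurement.
Dimensions FakeMeasure(const std::string& word) {
    return Dimensions{200, 60, static_cast<int>(word.size()) * 10};
}
```

Measuring "hello" twice through `MeasureCache` calls the measurer once. A real cache would presumably also have to key on font and style, which this sketch ignores.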


..........
2.55 173.00   173.00      244     0.71     0.96  SelectorMatches
2.30 329.00   156.00      342     0.46     0.46  nsStyleContext::GetStyleData
2.24 481.00   152.00      206     0.74     0.76  nsRuleNode::GetStyleData
1.69 596.00   115.00      312     0.37     0.79  PL_DHashTableOperate
..........
?

Please mark this dependent on the textframe rewrite and reflow branch landing.  Until those happen, there's little point in profiling the code in question.
Attached file Zipped testpage
Attached file BZipped JPROFile for URL (obsolete)
Attached file BZipped JPROFile for URL (obsolete)
Results for current trunk:
Opera 9.10: ~17 sec
Mozilla trunk: ~30 sec

AMD 500 MHz, 128 MB RAM, 300 MB swap.
Is this profile from an optimized build? nsCOMPtr<nsIAtom>::get() should get inlined normally.
Oh yes, that was a profile of an -O0 build...

This is the profile for an -O2 build.
Attachment #258946 - Attachment is obsolete: true
Attachment #258947 - Attachment is obsolete: true
Hmm, this is really -O2?  nsCOMPtr<nsIAtom>::get() is still showing up in the profile... if it really isn't getting inlined in normal opt builds, we need to figure out why, because that's a significant overall slowdown.  (I think this also has something to do with SelectorMatches showing up in the profile.)

Total hit count: 114355

74319 ViewportFrame::Reflow
We're spending most of our time in reflow, which is no surprise.

22474 nsCSSFrameConstructor::ContentAppended(nsIContent*, int)
Frame construction takes a significant chunk of time; no obvious hotspots.

Another 10000 hits are spent in parsing, excluding nsCSSFrameConstructor::ContentAppended.

The textrun code seems to need a lot of attention:
2249   2.0     gettimeofday
1849   1.6     HashString(nsAString_internal const&)
Together that's 3.6% of the execution time; almost all the calls are from the textrun cache.  (There's a total of 17815 hits in gfxTextRunCache::GetOrMakeTextRun.)

11061 nsTableFrame::GetMinWidth(nsIRenderingContext*)
Do we really need to be computing the minimum width for every single table?  Is there some way to get around that?  (All of the tables in this document end up with the CSS computed width anyway.)

It also seems like textframe reflow could be sped up, but there's no point in examining that until the new textframe code lands.  Nothing else seems to stand out.
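The percentages in this analysis follow directly from the hit counts. A minimal sketch of that arithmetic, using a hypothetical helper `PercentOfHits` (not part of jprof or any Mozilla code):

```cpp
#include <cmath>

// Hypothetical helper: share of the total hit count, in percent,
// rounded to one decimal place.
double PercentOfHits(long hits, long totalHits) {
    if (totalHits <= 0) {
        return 0.0;  // avoid dividing by zero on an empty profile
    }
    return std::round(1000.0 * hits / totalHits) / 10.0;
}
```

With the total of 114355 hits, `PercentOfHits(2249 + 1849, 114355)` gives the 3.6% quoted for gettimeofday plus HashString. By the same calculation, ViewportFrame::Reflow (74319 hits) is about 65.0% of the profile and nsTableFrame::GetMinWidth (11061 hits) about 9.7%.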
>  Do we really need to be computing the minimum width for every single table?

Yes, because auto-layout tables never shrink below their min width.  Which means if a CSS width is specified we need to check whether the min width is greater.
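The rule described here amounts to clamping the specified width from below by the minimum width. A simplified sketch, assuming a hypothetical helper `UsedTableWidth` and ignoring everything else real table layout takes into account (available width, borders, percentages):

```cpp
#include <algorithm>

// Hypothetical helper: an auto-layout table never shrinks below its
// minimum width, so the used width is the larger of the two values.
int UsedTableWidth(int specifiedCssWidth, int minWidth) {
    return std::max(specifiedCssWidth, minWidth);
}
```

For example, a specified width of 300 with a minimum width of 450 yields 450, which is why the minimum width has to be computed even when a CSS width is given.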

gettimeofday is generally expensive, but new textframe will also change how the textrun cache works.

And in fact, profiling text-dominated layout until after new textframe lands is pretty pointless.

As for the nsCOMPtr thing... does it matter that this is ARM?
> does it matter that this is ARM?
AMD 500 MHz, 128 MB RAM, 300 MB swap.
In that case Eli is right -- that function signature should not be appearing in your profile.
romaxa: please indicate your version of gcc, ld, and the configure flags you're using.
Should this bug still be opened? Were the patches applied?
There are no patches here. Testcase should be rerun, and if slow, reprofiled.
romaxa is this still a problem?

I tested this on Ubuntu 20 with the latest Firefox release, version 92.0.1, and the test site loaded really fast.
I will set this as resolved for now; please feel free to change it if the issue still occurs.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME