Performance problems when rendering a page with hundreds of iframes

NEW
Unassigned

Status

()

Core
Layout: HTML Frames
P4
normal
16 years ago
4 years ago

People

(Reporter: Damian Yerrick, Unassigned)

Tracking

({perf, testcase})

Trunk
Future
x86
Windows 98
perf, testcase
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(6 attachments, 1 obsolete attachment)

(Reporter)

Description

16 years ago
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
BuildID:    2001100503

When I open a page with hundreds of <iframe> elements, Mozilla illegal ops and 
brings up Talkback.

Reproducible: Always
Steps to Reproduce:
1. Set Mozilla as your default Internet browser.
2. Open outer.html from the test case in a forthcoming attachment.


Actual Results:  MOZILLA caused an invalid page fault in
module MSVCRT.DLL at 015f:7800d16a.
(Continued in another forthcoming attachment.)

Expected Results:  Page is rendered, with each iframe usable.

This computer has 64 MB of RAM.
IE 6 has no problem with this page.

Talkback incident ids include
TB36308553G, TB36417833Q, and TB36418544X.
(Reporter)

Comment 1

16 years ago
Created attachment 52518 [details]
MOZILLA caused an invalid page fault...
(Reporter)

Comment 2

16 years ago
Created attachment 52519 [details]
Initial test case with 500 iframes
worksforme, linux build 2001-10-07-06 linux build.

I _do_ use about 90M ram to view that page, though.  Could you be just running
out of memory?  what are your virtual memory settings?
->evaughan, who took pollmann's bugs.
Assignee: pollmann → evaughan
Keywords: crash, testcase

Comment 5

16 years ago
Wow. that was fun. 25 minutes of solid disk-thrashing, but no crash. Build
2001110803 Windows 95 (on a machine with 64 mb ram)

Comment 6

16 years ago
wfm using build 2001100803 on Win2k. Albeit slow though.

Comment 7

16 years ago
Slow is the word. I just tried a version with 1000 iframes (based on the test
case, but twice as long). WinNT4sp6a, PIII-600, 320 meg memory.

This was a wonderful stress test to watch in action. It took about 20 minutes of
CPU time. The VM size started at 190 meg, peaked at about 250 meg, slooooooowly
dropped to 210 meg, then climbed again to 290 meg. Stayed steady there for a
while and took several minutes before it was ready to display the page.

It would be interesting to know what was happening at what time - what was using
all that memory, what was being freed, etc.

The page with hundreds of iframes scrolls very slowly.
Closing the 1000-iframe page is even more fun: the VM size drops, drip by drip,
slowly, torturously, a few hundred K at a time, from 290 meg to 224 meg, over
the course of 17 minutes CPU time.

Note that opening or closing the page is mostly CPU-intensive - it uses a lot
of memory, but doesn't get the system thrashing; it's the CPU that's getting the
workout.

This test case is a wonderful Bad Example for performance testing under
pathological conditions, and will doubtless be rich in pointers to places that
could do with tightening up. In the meantime, I'm not surprised it died
horribly under Win98 ...

This one is a WFM too; perhaps it should be a different bug, a pathological
test case for anyone wishing to torture their machine.

Comment 8

16 years ago
Created attachment 52692 [details]
Test case zip file, this time with 1000 iframes
Even if this isn't a crasher, it's a perf bug.  Confirming on that basis.
Blocks: 71668
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: perf
Created attachment 52742 [details]
Profile of attachment 52519 [details]
Is it me, or are we spending lots of time locking and unlocking?  Not to mention
allocating memory for 500 docshells, and the like, of course....
NS_NewHTMLContentSink ->
....
XPCJSRuntime::SyncXPCContextList ->
JS_DHashTableEnumerate ->
KillDeadContextsCB ->
JS_ContextIterator ->
js_UnlockRuntime/js_LockRuntime

is close to 1/2 the runtime, most of it in Lock/Unlock.


Much of the rest seems to be creating (root) frames and handling ContentInserted.
hmm.. To be precise, we have:

HTMLContentSink::Init ->
IsScriptEnabled ->
nsDocShell::GetScriptGlobalObject ->
nsDocShell::EnsureScriptEnvironment

Most of the time seems to be spent under there.  Over to XPConnect, ccing jst
and jband
Assignee: evaughan → dbradley
Component: HTMLFrames → XPConnect
QA Contact: amar → pschwartau

Comment 14

16 years ago
Created attachment 52811 [details]
A plot of memory usage on a Athlon 900/ 384Mb ram. Linux kernel 2.4.9. 300 sek to open; another 1200 to close

Comment 15

16 years ago
I've come up with a patch that will reduce the number of calls to
JS_ContextIterator from about 5 mill to a little under 300,000. I need to do
some more testing and timings to make sure the numbers are correct and that
things behave normally.
Status: NEW → ASSIGNED

Comment 16

16 years ago
I'm happy to look at the code even if it is not for sure. 

Ultimately it *might* be best to just add hooks in the JS engine so xpconnect 
can track JSContext creation and destruction when that happens and not some time 
later.

There were subtle problems recycled JSContext pointers in the past when we we 
not calling SyncXPCContextList as often. 

On the other hand, we are pretty close to being able to not be surprized at all 
by JSContext construction and destruction. One really can't use xpconnect on a 
JSContext for which initClasses or initClassesWithNewWrappedGlobal has not been 
called. And, now that we have nsIXPConnect::releaseJSContext we could make it 
so that *all* destroyers of JSContext (that use xpconnect) call xpconnect to do 
their JSContext destruction. Once we can *trust* that xpconnect is being 
correctly informed about these events then we really don't need to do the 
aggressive work done by SyncXPCContextList (except maybe as a DEBUG only 
verification). I think that may be the best way to go.

Comment 17

16 years ago
The following patch seems to reduce the number of context iterations. I think we
still have issues on the cleanup side. This patch basically puts in place a
marking scheme for the context map. Rather than iterating all the context in the
map and then for each one of those iterating the JS's contexts I used a mark and
sweep scheme. This consolidates the iteration of JS context to one loop and the
iteration of the context map to one loop. I got widely different numbers the two
tests runs I made with the original code. They ranged from 5 million to 50
million calls to JS_ContextIterator. In both runs with the patch I came in under
300k of calls. In any case, even going from 5 million to 300k is a decent
improvement.

This patch retains the same logic so is fairly low risk, assuming I coded it
correctly. jband's suggestion is probably the better ultimate solution. So I
think it's worth using this patch until we can validate we won't be surprised by
JSContext construction/destruction.

Comment 18

16 years ago
Created attachment 52882 [details] [diff] [review]
Consolidates context iterations in XPCJSRuntime::SyncXPCContextList

Updated

16 years ago
Priority: -- → P1
Target Milestone: --- → mozilla0.9.6

Comment 19

16 years ago
Just to make performance testers cry: I tried my 1000-iframe example in
IE 5.5. Same machine as before (PIII-600, 320 meg, WinNT4sp6a).

Time to render file: 9 seconds (Mozilla: 20 minutes), not taking 100%
CPU either.

Time to close window: straight away (Mozilla: 17 minutes at 100% CPU).

Ouch.
Anyone got IE 6 to hand?

(I can see it now: Microsoft HTML how-to: "Yes, hundreds of iframes on
a page are the new standard ...")

Comment 20

16 years ago
Ok, a little follow up for today's findings.

I ran another series of tests. Basically the patch reduces the number calls to
js_ContextIterator by about 50 million. Quantify says the reduction in overall
time is about 75%. It's hard to get an identical run with Win2k. So many things
can affect performance. I'm a bit skeptical about that number I think it's less.

Leaving the page is supposedly faster, Quantify reports a 66% decrease in time.
Like above this is suspect.

Other than WaitForSingleObject blocking, SetWindowsPos, ReleaseSemaphore, new,
malloc, TlsGetValue are next on the list.

Not much I can do about SetWindowPos. 

The ReleaseSemaphore is used  via, 

NsThreadPool::DispatchRequest -\
                                \
PR_ExitMonitor --------------------> PR_Unlock -> PR_MD_UNLOCK
                                /
Js_UnlockRuntime =============-/

PR_ExitMonitor is called mainly by the GetNewOrUsed and a few other functions in
XPConnect and nsComponentManagerImpl::GetService

I know for XPConnect this locking just needs to be simple mutex/critical
section. I'm wondering why we're using such a heavy locking function to guard
access to maps and such. (I suspect there's something I'm missing)

Some other observations. RuleHash::AppendToTable gets called 329.658 times. This
seems to emanate from EnumRulesMatching. EnumRulesMatching is called 30,000
times, and everything else propigates from there.

The other thing that probably needs to be looked at is the memory consumption.
It takes about 90 megs to display this page. I don't know if that's acceptable
or not.

Comment 21

16 years ago
dbradley: The patch looks good to me. Seems like a fine place for mark'n'sweep 
to me. Had I been in a mark'n'sweep state of mind when I wrote that code I may 
have thought of it myself. Thanks. I have only nits to complain about...

- might as well rename "KillDeadContextsCB" to "SweepContexts"
- please don't infect xpconnect with "if (" :)
- If it were me, I'd just do "MARK_FLAG = 4" with a comment about avoiding 
overlap if anything changes. There's some (slight) chance some compiler might 
want to use a smaller data type for the enum - no reason to expand unnecessarly.

On locking... xpconnect is using a monitor rather than a lock because there are 
a couple of places where the lock needs to be reentrant. The extra cost (over 
just using the underlying PRLock) is just an extra call to get the current 
thread (maps down to TlsGetValue). Locking and unlocking a PRLock (on Win32) 
maps down to Enter/LeaveCritivalSection. NSPR is structured to use funtion calls 
where we might prefer macros. But this is not really all that heavyweight. It is 
part of the cost we pay for cross platform

On comparing to IE... I speculate that IE is lazier about setting up a JS 
execution environment in iframes. I'd bet that if one ran some JS code in each 
iframe then the difference would be less dramatic (after we fix the glaring 
bits).

This, of course, leads to the question: could we get away with being lazier 
about initializing JSContexts etc. ?

Comment 22

16 years ago
Created attachment 53569 [details] [diff] [review]
Updated with jband's comments

Comment 23

16 years ago
Regarding the locking. The only thing that peaked my interest was the 511 calls
to ReleaseSemaphore, which seemed to cost a fair bit of time. This is only done
if asked for, so not a problem for the majority of locks/unlocks.

Updated

16 years ago
Attachment #52882 - Attachment is obsolete: true

Comment 24

16 years ago
Comment on attachment 53569 [details] [diff] [review]
Updated with jband's comments

r/sr=jband
Attachment #53569 - Flags: review+
Comment on attachment 53569 [details] [diff] [review]
Updated with jband's comments

r/sr=jst
Attachment #53569 - Flags: superreview+

Comment 26

16 years ago
Patch checked in (53569)

There's still more to be addressed so leaving this open.

Comment 27

16 years ago
I did a little more digging using Purify. We are now to the point where we need
to look at reducing the complexity of data structures in some fashion to
conserve memory as well as increase speed. In any case this is outside the realm
of xpconnect and needs to be reassigned. One of the larger users of memory is
RuleHash::AppendRuleToTable. This allocates around 400k for 50 iFrames. I cut
the original sample of 500 iFrames down to 50 so I could run things in a
reasonable amount of time. The following are some other large consumers of memory:

[I] MIU: Memory use of 209865 bytes from 51 blocks allocated in PR_Malloc
        Distribution of blocks in use
        Allocation location
            malloc         [msvcrt.DLL]
            PR_Malloc      [prmem.c:50]
            PresShell::AllocateFrame(UINT,void * *) [nsPresShell.cpp:1789]
            nsStyleContext::SetStyle(nsStyleStructID,nsStyleStruct const&)
[nsStyleContext.cpp:465]
            nsRuleNode::WalkRuleTree(nsStyleStructID,nsIStyleContext
*,nsRuleData *,nsCSSStruct *) [nsRuleNode.cpp:1399]
            nsRuleNode::GetVisibilityData(nsIStyleContext *) [nsRuleNode.cpp:1077]
            nsRuleNode::GetStyleData(nsStyleStructID,nsIStyleContext *)
[nsRuleNode.cpp:4249]
            nsStyleContext::GetStyleData(nsStyleStructID) [nsStyleContext.cpp:384]
            nsBoxFrameInner::CacheAttributes(void) [nsBoxFrame.cpp:420]
            nsBoxFrame::Init(nsIPresContext *,nsIContent *,nsIFrame
*,nsIStyleContext *,nsIFrame *) [nsBoxFrame.cpp:377]

[I] MIU: Memory use of 209865 bytes from 51 blocks allocated in PR_Malloc
        Distribution of blocks in use
        Allocation location
            malloc         [msvcrt.DLL]
            PR_Malloc      [prmem.c:50]
            PresShell::AllocateFrame(UINT,void * *) [nsPresShell.cpp:1789]
            nsRuleNode::CreateRootNode(nsIPresContext *,nsIRuleNode * *)
[nsRuleNode.cpp:370]
            StyleSetImpl::ResolvePseudoStyleFor(nsIPresContext *,nsIContent
*,nsIAtom *,nsIStyleContext *,int,nsICSSPseudoComparator *) [nsStyleSet.cpp:942]
            nsPresContext::ResolvePseudoStyleWithComparator(nsIContent *,nsIAtom
*,nsIStyleContext *,int,nsICSSPseudoComparator *,nsIStyleContext * *)
[nsPresContext.obj:906]
            nsPresContext::ResolvePseudoStyleContextFor(nsIContent *,nsIAtom
*,nsIStyleContext *,int,nsIStyleContext * *) [nsPresContext.obj:884]
            nsCSSFrameConstructor::ConstructRootFrame(nsIPresShell
*,nsIPresContext *,nsIContent *,nsIFrame *&) [nsCSSFrameConstructor.obj:3500]
            StyleSetImpl::ConstructRootFrame(nsIPresContext *,nsIContent
*,nsIFrame *&) [nsStyleSet.cpp:1183]
            PresShell::InitialReflow(int,int) [nsPresShell.cpp:2619]

[I] MIU: Memory use of 208896 bytes from 51 blocks allocated in
nsSupportsArray::GrowArrayBy(int)
        Distribution of blocks in use
        Allocation location
            new(UINT)      [msvcrt.DLL]
            nsSupportsArray::GrowArrayBy(int) [nsSupportsArray.obj:194]
            CSSRuleProcessor::GetRuleCascade(nsIPresContext *,nsIAtom *)
[nsCSSStyleSheet.cpp:4492]
            CSSRuleProcessor::RulesMatching(nsIPresContext *,nsIAtom
*,nsIContent *,nsIAtom *,nsIStyleContext *,nsICSSPseudoComparator
*,nsIRuleWalker *) [nsCSSStyleSheet.cpp:4063]
            EnumPseudoRulesMatching [nsStyleSet.cpp:912]
            StyleSetImpl::ResolvePseudoStyleFor(nsIPresContext *,nsIContent
*,nsIAtom *,nsIStyleContext *,int,nsICSSPseudoComparator *) [nsStyleSet.cpp:945]
            nsPresContext::ResolvePseudoStyleWithComparator(nsIContent *,nsIAtom
*,nsIStyleContext *,int,nsICSSPseudoComparator *,nsIStyleContext * *)
[nsPresContext.obj:906]
            nsPresContext::ResolvePseudoStyleContextFor(nsIContent *,nsIAtom
*,nsIStyleContext *,int,nsIStyleContext * *) [nsPresContext.obj:884]
            nsCSSFrameConstructor::ConstructRootFrame(nsIPresShell
*,nsIPresContext *,nsIContent *,nsIFrame *&) [nsCSSFrameConstructor.obj:3500]
            StyleSetImpl::ConstructRootFrame(nsIPresContext *,nsIContent
*,nsIFrame *&) [nsStyleSet.cpp:1183]
[I] MIU: Memory use of 209865 bytes from 51 blocks allocated in PR_Malloc
        Distribution of blocks in use
        Allocation location
            malloc         [msvcrt.DLL]
            PR_Malloc      [prmem.c:50]
            PresShell::AllocateFrame(UINT,void * *) [nsPresShell.cpp:1789]
            nsRuleNode::Transition(nsIStyleRule *,nsIRuleNode * *)
[nsRuleNode.cpp:414]
            nsRuleWalker::Forward(nsIStyleRule *) [nsRuleWalker.cpp:95]
            ContentEnumFunc [nsCSSStyleSheet.obj:3917]
            RuleHash::EnumerateAllRules(int,nsIAtom *,nsIAtom *,nsVoidArray
const&,(*)(nsICSSStyleRule *,void *),void *) [nsCSSStyleSheet.cpp:634]
            CSSRuleProcessor::RulesMatching(nsIPresContext *,nsIAtom
*,nsIContent *,nsIStyleContext *,nsIRuleWalker *) [nsCSSStyleSheet.cpp:3965]
            EnumRulesMatching [nsStyleSet.cpp:766]
            StyleSetImpl::ResolveStyleFor(nsIPresContext *,nsIContent
*,nsIStyleContext *,int) [nsStyleSet.cpp:866]

[I] MIU: Memory use of 205750 bytes from 50 blocks allocated in PR_Malloc
        Distribution of blocks in use
        Allocation location
            malloc         [msvcrt.DLL]
            PR_Malloc      [prmem.c:50]
            PresShell::AllocateFrame(UINT,void * *) [nsPresShell.cpp:1789]
            nsRuleNode::ComputeBorderData(nsStyleStruct *,nsCSSStruct
const&,nsIStyleContext *,nsRuleNode *,RuleDetail::nsRuleNode const&,int)
[nsRuleNode.cpp:3083]
            nsRuleNode::WalkRuleTree(nsStyleStructID,nsIStyleContext
*,nsRuleData *,nsCSSStruct *) [nsRuleNode.cpp:1411]
            nsRuleNode::GetBorderData(nsIStyleContext *) [nsRuleNode.cpp:1186]
            nsRuleNode::GetStyleData(nsStyleStructID,nsIStyleContext *)
[nsRuleNode.cpp:4249]
            nsStyleContext::GetStyleData(nsStyleStructID) [nsStyleContext.cpp:384]
            nsStyleContext::CalcStyleDifference(nsIStyleContext *,int&,int)
[nsStyleContext.cpp:670]
            CaptureChange  [nsFrameManager.cpp:1551]

[I] MIU: Memory use of 131088 bytes from 2 blocks allocated in
orkinHeap::Alloc(nsIMdbEnv *,UINT,void * *)
        Distribution of blocks in use
        Allocation location
            new(UINT)      [msvcrt.DLL]
            orkinHeap::Alloc(nsIMdbEnv *,UINT,void * *) [orkinHeap.cpp:91]
            morkZone::zone_grow_at(morkEnv *,UINT) [morkZone.cpp:242]
            morkAtomSpace::MakeBookAtomCopyWithAid(morkEnv *,morkFarBookAtom
const&,UINT) [morkAtomSpace.cpp:201]
            morkStore::AddAlias(morkEnv *,morkMid const&,UINT) [morkStore.cpp:999]
            morkBuilder::OnAlias(morkEnv *,morkSpan const&,morkMid const&)
[morkBuilder.cpp:649]
            morkParser::ReadAlias(morkEnv *) [morkParser.cpp:1056]
            morkParser::ReadDict(morkEnv *) [morkParser.cpp:1336]
            ???            [ip=0x08e7e16c]
            morkParser::ReadContent(morkEnv *,BYTE) [morkParser.cpp:1428]

Comment 28

16 years ago
Changing subject to better reflect the focus of this bug.
Reassigning, since the XPConnect issue has been addressed.
Purify/Quantify evidence points to CSS/Rules
Assignee: dbradley → kmcclusk
Status: ASSIGNED → NEW
Summary: Crash when rendering a page with hundreds of iframes → Performance problems when rendering a page with hundreds of iframes

Comment 29

16 years ago
Setting component to Compositor to reflect the new summary and new owner, 
kmcclusk@netscape.com
Component: XPConnect → Compositor
QA Contact: pschwartau → petersen

Updated

16 years ago
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.6 → mozilla1.0

Comment 30

16 years ago
I think bug 104336 may be of interest here.

Updated

16 years ago
Target Milestone: mozilla1.0 → mozilla1.0.1
Not sure why this was set to Compositor?

Changing to HTMLFrames. Downgrading severity from critical and removing crash. 
This is an extreme edge case and it doesn't crash if you have enough memory.

It took 70 seconds to load on AMD 750MHZ running WINXP. It rendered properly
afterward.
Assignee: kmcclusk → evaughan
Severity: critical → normal
Status: ASSIGNED → NEW
Component: Compositor → HTMLFrames
Keywords: crash
Priority: P1 → P3
Bulk re-assigning all of Eric's HTMLFrame bugs to John.
Assignee: eric → jkeiser
Performance problems with hundreds of iframes is not a priority right now.
Assignee: jkeiser → frame
Priority: P3 → P4
QA Contact: petersen → amar
Target Milestone: mozilla1.0.1 → Future
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6b) Gecko/20031124
Firebird/0.7+ (Nova: SVG,MNG,DOMi,Venkman)

The problem starts at around 40-45 iframes, which may look a lot, but I have
some pages where I need more than that .

same problem in Mozilla-Firebird and K-Meleon
(In reply to comment #33)
> Performance problems with hundreds of iframes is not a priority right now.
Would it be now , more than a year later ?
It is not that the iframes won't render, but the iframe content won't, unless
you resize the window they are in

testcase: 
Open
http://home.planet.nl/~pa1six/mag_eur/mag_eur.html
push on the 1st tab (left column)
scroll down the tabs and select on of the last tabs.
You should see "nothing" happening.
Rightclick inside the displayed frame and notice the weird popup behaviour (side
effect)
Now marginally resize the window, there is the content!
Rge rightframe does scoll properly, but somehow Gecko is tolatlly unable to
render the content w/o extra tricks.

Comment 36

13 years ago
*** Bug 249823 has been marked as a duplicate of this bug. ***
Assignee: core.layout.html-frames → nobody
QA Contact: amar → core.layout.html-frames
You need to log in before you can comment on or make changes to this bug.