499 bytes, text/plain
707 bytes, application/zip
923 bytes, application/octet-stream
894.48 KB, text/html
A plot of memory usage on a Athlon 900/ 384Mb ram. Linux kernel 2.4.9. 300 sek to open; another 1200 to close
6.35 KB, image/png
4.75 KB, patch
John Bandhauer: review+
|Details | Diff | Splinter Review|
From Bugzilla Helper: User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98) BuildID: 2001100503 When I open a page with hundreds of <iframe> elements, Mozilla illegal ops and brings up Talkback. Reproducible: Always Steps to Reproduce: 1. Set Mozilla as your default Internet browser. 2. Open outer.html from the test case in a forthcoming attachment. Actual Results: MOZILLA caused an invalid page fault in module MSVCRT.DLL at 015f:7800d16a. (Continued in another forthcoming attachment.) Expected Results: Page is rendered, with each iframe usable. This computer has 64 MB of RAM. IE 6 has no problem with this page. Talkback incident ids include TB36308553G, TB36417833Q, and TB36418544X.
worksforme, linux build 2001-10-07-06 linux build. I _do_ use about 90M ram to view that page, though. Could you be just running out of memory? what are your virtual memory settings?
->evaughan, who took pollmann's bugs.
Wow. that was fun. 25 minutes of solid disk-thrashing, but no crash. Build 2001110803 Windows 95 (on a machine with 64 mb ram)
wfm using build 2001100803 on Win2k. Albeit slow though.
Slow is the word. I just tried a version with 1000 iframes (based on the test case, but twice as long). WinNT4sp6a, PIII-600, 320 meg memory. This was a wonderful stress test to watch in action. It took about 20 minutes of CPU time. The VM size started at 190 meg, peaked at about 250 meg, slooooooowly dropped to 210 meg, then climbed again to 290 meg. Stayed steady there for a while and took several minutes before it was ready to display the page. It would be interesting to know what was happening at what time - what was using all that memory, what was being freed, etc. The page with hundreds of iframes scrolls very slowly. Closing the 1000-iframe page is even more fun: the VM size drops, drip by drip, slowly, torturously, a few hundred K at a time, from 290 meg to 224 meg, over the course of 17 minutes CPU time. Note that opening or closing the page is mostly CPU-intensive - it uses a lot of memory, but doesn't get the system thrashing; it's the CPU that's getting the workout. This test case is a wonderful Bad Example for performance testing under pathological conditions, and will doubtless be rich in pointers to places that could do with tightening up. In the meantime, I'm not surprised it died horribly under Win98 ... This one is a WFM too; perhaps it should be a different bug, a pathological test case for anyone wishing to torture their machine.
Even if this isn't a crasher, it's a perf bug. Confirming on that basis.
Is it me, or are we spending lots of time locking and unlocking? Not to mention allocating memory for 500 docshells, and the like, of course....
NS_NewHTMLContentSink -> .... XPCJSRuntime::SyncXPCContextList -> JS_DHashTableEnumerate -> KillDeadContextsCB -> JS_ContextIterator -> js_UnlockRuntime/js_LockRuntime is close to 1/2 the runtime, most of it in Lock/Unlock. Much of the rest seems to be creating (root) frames and handling ContentInserted.
hmm.. To be precise, we have: HTMLContentSink::Init -> IsScriptEnabled -> nsDocShell::GetScriptGlobalObject -> nsDocShell::EnsureScriptEnvironment Most of the time seems to be spent under there. Over to XPConnect, ccing jst and jband
Created attachment 52811 [details] A plot of memory usage on a Athlon 900/ 384Mb ram. Linux kernel 2.4.9. 300 sek to open; another 1200 to close
I've come up with a patch that will reduce the number of calls to JS_ContextIterator from about 5 mill to a little under 300,000. I need to do some more testing and timings to make sure the numbers are correct and that things behave normally.
I'm happy to look at the code even if it is not for sure. Ultimately it *might* be best to just add hooks in the JS engine so xpconnect can track JSContext creation and destruction when that happens and not some time later. There were subtle problems recycled JSContext pointers in the past when we we not calling SyncXPCContextList as often. On the other hand, we are pretty close to being able to not be surprized at all by JSContext construction and destruction. One really can't use xpconnect on a JSContext for which initClasses or initClassesWithNewWrappedGlobal has not been called. And, now that we have nsIXPConnect::releaseJSContext we could make it so that *all* destroyers of JSContext (that use xpconnect) call xpconnect to do their JSContext destruction. Once we can *trust* that xpconnect is being correctly informed about these events then we really don't need to do the aggressive work done by SyncXPCContextList (except maybe as a DEBUG only verification). I think that may be the best way to go.
The following patch seems to reduce the number of context iterations. I think we still have issues on the cleanup side. This patch basically puts in place a marking scheme for the context map. Rather than iterating all the context in the map and then for each one of those iterating the JS's contexts I used a mark and sweep scheme. This consolidates the iteration of JS context to one loop and the iteration of the context map to one loop. I got widely different numbers the two tests runs I made with the original code. They ranged from 5 million to 50 million calls to JS_ContextIterator. In both runs with the patch I came in under 300k of calls. In any case, even going from 5 million to 300k is a decent improvement. This patch retains the same logic so is fairly low risk, assuming I coded it correctly. jband's suggestion is probably the better ultimate solution. So I think it's worth using this patch until we can validate we won't be surprised by JSContext construction/destruction.
Created attachment 52882 [details] [diff] [review] Consolidates context iterations in XPCJSRuntime::SyncXPCContextList
Just to make performance testers cry: I tried my 1000-iframe example in IE 5.5. Same machine as before (PIII-600, 320 meg, WinNT4sp6a). Time to render file: 9 seconds (Mozilla: 20 minutes), not taking 100% CPU either. Time to close window: straight away (Mozilla: 17 minutes at 100% CPU). Ouch. Anyone got IE 6 to hand? (I can see it now: Microsoft HTML how-to: "Yes, hundreds of iframes on a page are the new standard ...")
Ok, a little follow up for today's findings. I ran another series of tests. Basically the patch reduces the number calls to js_ContextIterator by about 50 million. Quantify says the reduction in overall time is about 75%. It's hard to get an identical run with Win2k. So many things can affect performance. I'm a bit skeptical about that number I think it's less. Leaving the page is supposedly faster, Quantify reports a 66% decrease in time. Like above this is suspect. Other than WaitForSingleObject blocking, SetWindowsPos, ReleaseSemaphore, new, malloc, TlsGetValue are next on the list. Not much I can do about SetWindowPos. The ReleaseSemaphore is used via, NsThreadPool::DispatchRequest -\ \ PR_ExitMonitor --------------------> PR_Unlock -> PR_MD_UNLOCK / Js_UnlockRuntime =============-/ PR_ExitMonitor is called mainly by the GetNewOrUsed and a few other functions in XPConnect and nsComponentManagerImpl::GetService I know for XPConnect this locking just needs to be simple mutex/critical section. I'm wondering why we're using such a heavy locking function to guard access to maps and such. (I suspect there's something I'm missing) Some other observations. RuleHash::AppendToTable gets called 329.658 times. This seems to emanate from EnumRulesMatching. EnumRulesMatching is called 30,000 times, and everything else propigates from there. The other thing that probably needs to be looked at is the memory consumption. It takes about 90 megs to display this page. I don't know if that's acceptable or not.
dbradley: The patch looks good to me. Seems like a fine place for mark'n'sweep to me. Had I been in a mark'n'sweep state of mind when I wrote that code I may have thought of it myself. Thanks. I have only nits to complain about... - might as well rename "KillDeadContextsCB" to "SweepContexts" - please don't infect xpconnect with "if (" :) - If it were me, I'd just do "MARK_FLAG = 4" with a comment about avoiding overlap if anything changes. There's some (slight) chance some compiler might want to use a smaller data type for the enum - no reason to expand unnecessarly. On locking... xpconnect is using a monitor rather than a lock because there are a couple of places where the lock needs to be reentrant. The extra cost (over just using the underlying PRLock) is just an extra call to get the current thread (maps down to TlsGetValue). Locking and unlocking a PRLock (on Win32) maps down to Enter/LeaveCritivalSection. NSPR is structured to use funtion calls where we might prefer macros. But this is not really all that heavyweight. It is part of the cost we pay for cross platform On comparing to IE... I speculate that IE is lazier about setting up a JS execution environment in iframes. I'd bet that if one ran some JS code in each iframe then the difference would be less dramatic (after we fix the glaring bits). This, of course, leads to the question: could we get away with being lazier about initializing JSContexts etc. ?
Regarding the locking. The only thing that peaked my interest was the 511 calls to ReleaseSemaphore, which seemed to cost a fair bit of time. This is only done if asked for, so not a problem for the majority of locks/unlocks.
Comment on attachment 53569 [details] [diff] [review] Updated with jband's comments r/sr=jband
Comment on attachment 53569 [details] [diff] [review] Updated with jband's comments r/sr=jst
Patch checked in (53569) There's still more to be addressed so leaving this open.
I did a little more digging using Purify. We are now to the point where we need to look at reducing the complexity of data structures in some fashion to conserve memory as well as increase speed. In any case this is outside the realm of xpconnect and needs to be reassigned. One of the larger users of memory is RuleHash::AppendRuleToTable. This allocates around 400k for 50 iFrames. I cut the original sample of 500 iFrames down to 50 so I could run things in a reasonable amount of time. The following are some other large consumers of memory: [I] MIU: Memory use of 209865 bytes from 51 blocks allocated in PR_Malloc Distribution of blocks in use Allocation location malloc [msvcrt.DLL] PR_Malloc [prmem.c:50] PresShell::AllocateFrame(UINT,void * *) [nsPresShell.cpp:1789] nsStyleContext::SetStyle(nsStyleStructID,nsStyleStruct const&) [nsStyleContext.cpp:465] nsRuleNode::WalkRuleTree(nsStyleStructID,nsIStyleContext *,nsRuleData *,nsCSSStruct *) [nsRuleNode.cpp:1399] nsRuleNode::GetVisibilityData(nsIStyleContext *) [nsRuleNode.cpp:1077] nsRuleNode::GetStyleData(nsStyleStructID,nsIStyleContext *) [nsRuleNode.cpp:4249] nsStyleContext::GetStyleData(nsStyleStructID) [nsStyleContext.cpp:384] nsBoxFrameInner::CacheAttributes(void) [nsBoxFrame.cpp:420] nsBoxFrame::Init(nsIPresContext *,nsIContent *,nsIFrame *,nsIStyleContext *,nsIFrame *) [nsBoxFrame.cpp:377] [I] MIU: Memory use of 209865 bytes from 51 blocks allocated in PR_Malloc Distribution of blocks in use Allocation location malloc [msvcrt.DLL] PR_Malloc [prmem.c:50] PresShell::AllocateFrame(UINT,void * *) [nsPresShell.cpp:1789] nsRuleNode::CreateRootNode(nsIPresContext *,nsIRuleNode * *) [nsRuleNode.cpp:370] StyleSetImpl::ResolvePseudoStyleFor(nsIPresContext *,nsIContent *,nsIAtom *,nsIStyleContext *,int,nsICSSPseudoComparator *) [nsStyleSet.cpp:942] nsPresContext::ResolvePseudoStyleWithComparator(nsIContent *,nsIAtom *,nsIStyleContext *,int,nsICSSPseudoComparator *,nsIStyleContext * *) [nsPresContext.obj:906] nsPresContext::ResolvePseudoStyleContextFor(nsIContent *,nsIAtom *,nsIStyleContext *,int,nsIStyleContext * *) [nsPresContext.obj:884] nsCSSFrameConstructor::ConstructRootFrame(nsIPresShell *,nsIPresContext *,nsIContent *,nsIFrame *&) [nsCSSFrameConstructor.obj:3500] StyleSetImpl::ConstructRootFrame(nsIPresContext *,nsIContent *,nsIFrame *&) [nsStyleSet.cpp:1183] PresShell::InitialReflow(int,int) [nsPresShell.cpp:2619] [I] MIU: Memory use of 208896 bytes from 51 blocks allocated in nsSupportsArray::GrowArrayBy(int) Distribution of blocks in use Allocation location new(UINT) [msvcrt.DLL] nsSupportsArray::GrowArrayBy(int) [nsSupportsArray.obj:194] CSSRuleProcessor::GetRuleCascade(nsIPresContext *,nsIAtom *) [nsCSSStyleSheet.cpp:4492] CSSRuleProcessor::RulesMatching(nsIPresContext *,nsIAtom *,nsIContent *,nsIAtom *,nsIStyleContext *,nsICSSPseudoComparator *,nsIRuleWalker *) [nsCSSStyleSheet.cpp:4063] EnumPseudoRulesMatching [nsStyleSet.cpp:912] StyleSetImpl::ResolvePseudoStyleFor(nsIPresContext *,nsIContent *,nsIAtom *,nsIStyleContext *,int,nsICSSPseudoComparator *) [nsStyleSet.cpp:945] nsPresContext::ResolvePseudoStyleWithComparator(nsIContent *,nsIAtom *,nsIStyleContext *,int,nsICSSPseudoComparator *,nsIStyleContext * *) [nsPresContext.obj:906] nsPresContext::ResolvePseudoStyleContextFor(nsIContent *,nsIAtom *,nsIStyleContext *,int,nsIStyleContext * *) [nsPresContext.obj:884] nsCSSFrameConstructor::ConstructRootFrame(nsIPresShell *,nsIPresContext *,nsIContent *,nsIFrame *&) [nsCSSFrameConstructor.obj:3500] StyleSetImpl::ConstructRootFrame(nsIPresContext *,nsIContent *,nsIFrame *&) [nsStyleSet.cpp:1183] [I] MIU: Memory use of 209865 bytes from 51 blocks allocated in PR_Malloc Distribution of blocks in use Allocation location malloc [msvcrt.DLL] PR_Malloc [prmem.c:50] PresShell::AllocateFrame(UINT,void * *) [nsPresShell.cpp:1789] nsRuleNode::Transition(nsIStyleRule *,nsIRuleNode * *) [nsRuleNode.cpp:414] nsRuleWalker::Forward(nsIStyleRule *) [nsRuleWalker.cpp:95] ContentEnumFunc [nsCSSStyleSheet.obj:3917] RuleHash::EnumerateAllRules(int,nsIAtom *,nsIAtom *,nsVoidArray const&,(*)(nsICSSStyleRule *,void *),void *) [nsCSSStyleSheet.cpp:634] CSSRuleProcessor::RulesMatching(nsIPresContext *,nsIAtom *,nsIContent *,nsIStyleContext *,nsIRuleWalker *) [nsCSSStyleSheet.cpp:3965] EnumRulesMatching [nsStyleSet.cpp:766] StyleSetImpl::ResolveStyleFor(nsIPresContext *,nsIContent *,nsIStyleContext *,int) [nsStyleSet.cpp:866] [I] MIU: Memory use of 205750 bytes from 50 blocks allocated in PR_Malloc Distribution of blocks in use Allocation location malloc [msvcrt.DLL] PR_Malloc [prmem.c:50] PresShell::AllocateFrame(UINT,void * *) [nsPresShell.cpp:1789] nsRuleNode::ComputeBorderData(nsStyleStruct *,nsCSSStruct const&,nsIStyleContext *,nsRuleNode *,RuleDetail::nsRuleNode const&,int) [nsRuleNode.cpp:3083] nsRuleNode::WalkRuleTree(nsStyleStructID,nsIStyleContext *,nsRuleData *,nsCSSStruct *) [nsRuleNode.cpp:1411] nsRuleNode::GetBorderData(nsIStyleContext *) [nsRuleNode.cpp:1186] nsRuleNode::GetStyleData(nsStyleStructID,nsIStyleContext *) [nsRuleNode.cpp:4249] nsStyleContext::GetStyleData(nsStyleStructID) [nsStyleContext.cpp:384] nsStyleContext::CalcStyleDifference(nsIStyleContext *,int&,int) [nsStyleContext.cpp:670] CaptureChange [nsFrameManager.cpp:1551] [I] MIU: Memory use of 131088 bytes from 2 blocks allocated in orkinHeap::Alloc(nsIMdbEnv *,UINT,void * *) Distribution of blocks in use Allocation location new(UINT) [msvcrt.DLL] orkinHeap::Alloc(nsIMdbEnv *,UINT,void * *) [orkinHeap.cpp:91] morkZone::zone_grow_at(morkEnv *,UINT) [morkZone.cpp:242] morkAtomSpace::MakeBookAtomCopyWithAid(morkEnv *,morkFarBookAtom const&,UINT) [morkAtomSpace.cpp:201] morkStore::AddAlias(morkEnv *,morkMid const&,UINT) [morkStore.cpp:999] morkBuilder::OnAlias(morkEnv *,morkSpan const&,morkMid const&) [morkBuilder.cpp:649] morkParser::ReadAlias(morkEnv *) [morkParser.cpp:1056] morkParser::ReadDict(morkEnv *) [morkParser.cpp:1336] ??? [ip=0x08e7e16c] morkParser::ReadContent(morkEnv *,BYTE) [morkParser.cpp:1428]
Changing subject to better reflect the focus of this bug. Reassigning, since the XPConnect issue has been addressed. Purify/Quantify evidence points to CSS/Rules
Setting component to Compositor to reflect the new summary and new owner, firstname.lastname@example.org
I think bug 104336 may be of interest here.
Not sure why this was set to Compositor? Changing to HTMLFrames. Downgrading severity from critical and removing crash. This is an extreme edge case and it doesn't crash if you have enough memory. It took 70 seconds to load on AMD 750MHZ running WINXP. It rendered properly afterward.
Bulk re-assigning all of Eric's HTMLFrame bugs to John.
Performance problems with hundreds of iframes is not a priority right now.
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6b) Gecko/20031124 Firebird/0.7+ (Nova: SVG,MNG,DOMi,Venkman) The problem starts at around 40-45 iframes, which may look a lot, but I have some pages where I need more than that . same problem in Mozilla-Firebird and K-Meleon
(In reply to comment #33) > Performance problems with hundreds of iframes is not a priority right now. Would it be now , more than a year later ? It is not that the iframes won't render, but the iframe content won't, unless you resize the window they are in testcase: Open http://home.planet.nl/~pa1six/mag_eur/mag_eur.html push on the 1st tab (left column) scroll down the tabs and select on of the last tabs. You should see "nothing" happening. Rightclick inside the displayed frame and notice the weird popup behaviour (side effect) Now marginally resize the window, there is the content! Rge rightframe does scoll properly, but somehow Gecko is tolatlly unable to render the content w/o extra tricks.
*** Bug 249823 has been marked as a duplicate of this bug. ***