Closed Bug 168157 Opened 18 years ago Closed 12 years ago
Mozilla frozen during rendering
Using trunk build 2002091014 on win-xp pro,1.1ghz,512ram the testcase freezes Mozilla - there is also no incremental reflow. MSIE has a nice incremental reflow and is fine responsive.
This page also almost freezes my IE 6, CPU usage goes to 100%
Here MSIE6 also has 100% cpu usage but it is still very responsive (tool bar etc.) and incremental reflow shows progress.
I took a quick look at what that page does. It uses this "silktext" function to write out some text. This "silktext" function basically builds up some text, pixel by pixel, out of tables. So each line is a separate table with a single row. Inside this, each character and space between characters is a separate td. Inside this td is another table that actually draws the char. For example, the letter 'I' is broken down as: *** * * *** and then drawn as a 4 by 3 table with cells all sized 1px by 1px and background colors set on the cells. I suspect that we spend all of our time reflowing all this nested auto-sizing table crap and/or parsing the document.write() calls that build the crap.
Component: Layout → HTMLTables
OS: Windows XP → All
Hardware: PC → All
Probably also related to bug 54542 - we are very slow at rendering tables with large numbers of rows/columns. We should get a jprof; it might be of more general use.
When displaying http://www.world-direct.com/mozilla/dhtml/silktext.html with a debug build I see messages about document.all and document.layers
I tried getting a jprof.... it looked bogus (I _know_ 10% of the time was not spent in mork on that testcase).
Probably also related to bug 94587.
The upcoming fix in bug 85755 might help here too.
Assignee: attinasi → other
Target Milestone: Future → ---
Using the 7.01 released build on Win XP and the 11/21/02 Trunk on Linux, the page does render, but very slowly. The browser never totally freezes.
More info: page takes 21 seconds to load in 1/08/03 Trunk Build on Win XP. During that period, it looks like Mozilla is frozen, but after a second, the browser starts up again.
Maybe bug 97229 might help here too?
based on original url: http://www.world-direct.com/mozilla/dhtml/silktext.html test script is same to comments #19 of bug 186465
We seem to be spending a lot of time in nsAString::do_AssignFromReadable(nsAString const&) .... Leon, if you can attach the profiles unzipped, please do that. If you cannot, don't tar up single files -- that's sort of pointless. Just gzip them. ;)
this jprof is too good to be true no table reflow visible ;-). I am very sceptical that this jprof is valid. Even the usual suspects (cellmap) dont show up.
I can't load the test page at all in a recent build. I see 20-30 seconds of high CPU load (many samples under nsParser::BuildModel(), mostly in nsCSSFrameConstructor table construction stuff). But then CPU load drops to zero, and it looks like I've deadlocked in ... js_GC js_FinalizeObject prop_iterator_finalize js_RemoveRoot PR_Lock
I see similar results, except the chunk of active CPU time is mostly showing up as doing JS GC-related locking (from GC and AllocateGCThing, IIRC).
If you see js_AllocGCThing calling js_GC, we're in the "last ditch" trying to collect garbage because malloc has failed. That's bad, but the JS GC-allocator can't do much more than give up after one try. If it isn't giving up (if control flow remains in a single invocation of js_AllocGCThing and whatever it calls), then there's a JS bug to file. But if js_AllocGCThing fails in turn, and the failure propagates back to JS_NewObject or whatever (need a full stack), then someone is just banging his head against a wall at a higher layer, in the DOM I guess. /be
This page still deadlocks for me, but reducing the testcase allows it to work.
On the testcase at http://www.world-direct.com/mozilla/dhtml/silktext.html MSIE 6 on winxp is very fine doing incremental loading. During the whole rendering cpu usage is 100% but MSIE is always very responsive - in constrast to Mozilla, which is frozen during that time.
This is zipped an tab deliminted. You'll probably want to import this to a database or spread sheet so you can sort it. (It's only part of the data, as the entire data zipped is more than the 300k limit)
Does that bring up any new findings?
The top offenders in the Quantify data appear to be PeekMessage, FlashWindow (I'm not sure if there are blocking issues with those functions) Below is a short excerpt from the file. QueryPerformanceCounter is called by PR_MD_GET_INTERVAL. Not sure what's called PR_MD_GET_INTERVAL, but I wonder if there's an faster alternative. BitBlt always seems like an expensive function, if there was a way to reduce a few calls, you'd get a lot of payoff. PeekMessageW 3544 346838570 FlashWindow 148 222454364 malloc 133297 146642286 new(UINT) 92325 141552657 TlsGetValue 633196 121516364 QueryPerformanceCounter 20045 81984141 memcpy 181142 80149330 free 100154 78542186 memmove 136360 76659665 EnterCriticalSection 213309 76258949 FindClose 2167 64743219 BitBlt 76 62534688
Bug 124695 has some timings for different (win32) time related functions. Are you sure that you've filtered out all blocked threads? In most cases you can focus the main thread and get relevant data. Other threads may block on PeekMessage but those doesn't affect the processing time unless it's a network related problem.
I reran this test. Now the biggest offender is PR_Lock, which is mainly called by nsJAR::GetInputStream, which is called 197 times. It's also called by js_AllocGCThing, but the time spent is less, even though it's called substantially more. Another big calls is jsd_Lock, no, didn't have the debugger up. Caller of PR_Lock % time Calls Propigated time nsJAR::GetInputStream 63.34 197 923950.83 js_AllocGCThing 17.70 352288 258283.69 nsZipReaderCache::ReleaseZip 7.56 111 110287.54 jsd_Lock 7.49 410418 109303.69 A lot of PR_Lock's time is spent in EnterCriticalSection QueryPerformanceCounter shows up here as well near the top. Malloc shows up, called mainly by JS_malloc, called by js_NewObject, called by js_StringToObject, called by js_ValueToObject, called by js_ValueToNonNullObject. Maybe Brendan's patch will help some here to reduce the mallocs for new objects. Malloc is also called by PL_ArenaAllocate, which is called 5,666 times, split between FrameArena::AllocateFrame(1397 times) and nsFixedSizeAllocator::Alloc (called by nsCParserStartNode::Create, nsTokenAllocator::CreateTokenOfType). I think to be effective, a person or group needs to be tasked with spending the time to really track down and understand the performance issues involved in this and other bugs. It's really difficult to relay the information I get from Purify meaningfully, and there's too much information to post the entire data in text form, even compressed. I'd much rather be suggesting changes to improve performance rather than posting stats. But I don't have the time to do that. JProf may be the next best thing, although from what I've seen there's no way to block off and just measure a certain period of time. For instance in Purify I can pause all recording until I get to the spot I want to measure and then turn on recording. So I don't get a lot of startup/shutdown noise. Maybe there's a way to do that programmatically with jprof.
In jprof, you can at least start profiling on when you start the operation you want. See the JP_DEFER option; you have to signal Mozilla with a 'SIGPROF' signal to get it to start profiling if that's used.
If nsJAR::GetInputStream managed to spend almost a second in just 197 calls to PR_Lock, it must have hang, waiting for the lock somehow. I wonder who had that lock... Why should a jar be read during this page load anyway? It is difficult to analyze data in Quantify, especially in Mozilla where thousands of different functions can be involved in a simple operation and more than that when a page load is done. My strategy have always been to focus down to a part of the call chain where I think more time than necessary are spent and then work on that part alone. It it turns out I'm wrong I'll back up and try to find another hotspot. In your measurements, are you using funtion times, line times or timed times? I've noticed that when using function or line times Quantify gets the timed parts wrong. It times calls to the OS and the C runtime and those are presented with more time than they actually used. The PR_Lock call itself could be much quicker but not until we sacrifice win95 compatibility. There's a bug about it somewhere with a patch that had to be backed out because of Windows 95.
Well I always found line/function to be more accurate than time, but maybe I was just misinterpreting the information. But then I've never been that impressed with Quantify, well in general Rational products especially WRT UI. Calls to jsd_Lock account for more than 50% of the calls to PR_Lock. Maybe I should just remove jsd for now to get rid of the noise.
With out the debugger, js_AllocGCThing and js_PurgeDeflatedStringCache appear to be the greatest users of PR_Lock. Another thing that stands out is that this test is creating thousands of frames. And from there, really bumps up a lot of the other functions call counts and times.
Sampling what functions are on the stack rarely points you to the algorithmic problems that are the most frequent causes of performance problems like this. The real cause of this bug is that processing single reflow events take too long.
Simon, regarding your statement "The real cause of this bug is that processing single reflow events take too long." - is there a bug that is currently trying to improve this?
None that explicitly address that issue, that I know of.
not in a final cycle. rearchitecting belongs in alphas.
Flags: blocking1.4? → blocking1.4-
Still Mozilla is totally frozen during rendering - trunk build 2003072704. some comparison data: mozilla: 24395ms msie6 : 17896ms
Depends on: 123668
We don't do any incremental reflow at all - is this intended here?
still. mozilla 2? Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a6pre) Gecko/2007062202 SeaMonkey/2.0a1pre
this is wfm: ff 3500 ie7 5700 chrom 3300
=> WFM also Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:126.96.36.199) Gecko/2008092417 Firefox/3.0.3
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Actually here it's frozen for about 7 seconds during the rendering of the page. After that it's not really responsive, for example if I - select all text and - press the right mouse button, after some seconds I get the window stating "A script on this page may be busy, or it may have stopped responding. You can stop the script now, or you can continue to see if the script will complete. Script: chrome://browser/content/browser.js:4303" Tested with latest firefox stable: Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:188.8.131.52) Gecko/2008092417 Firefox/3.0.3 And tested with current nightly (same frozen+unresponsive behaviour): (here the busy script message line number is Script: chrome://browser/content/browser.js:4056) Build identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b2pre) Gecko/20081013 Minefield/3.1b2pre BTW I'm using a P4+Hyperthreading with XP (I know there are some pages that freeze firefox only on multicore/HyperThreading processors with Windows)
You need to log in before you can comment on or make changes to this bug.