Closed Bug 168157 Opened 18 years ago Closed 12 years ago

Mozilla frozen during rendering

Categories

(Core :: Layout: Tables, defect)

defect
Not set

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: markushuebner, Unassigned)

References

()

Details

(Keywords: perf, testcase)

Attachments

(2 files)

45.06 KB, application/x-gzip
Details
217.54 KB, application/octet-stream
Details
Using trunk build 2002091014 on win-xp pro,1.1ghz,512ram the testcase freezes 
Mozilla - there is also no incremental reflow.
MSIE has a nice incremental reflow and is fine responsive.
Anyone willing to take this bug?
Keywords: hang, perf
This page also almost freezes my IE 6, CPU usage goes to 100%
Here MSIE6 also has 100% cpu usage but it is still very responsive (tool bar 
etc.) and incremental reflow shows progress.
I took a quick look at what that page does.  It uses this "silktext" function to
write out some text.  This "silktext" function basically builds up some text,
pixel by pixel, out of tables.  So each line is a separate table with a single
row.  Inside this, each character and space between characters is a separate td.
 Inside this td is another table that actually draws the char.  For example, the
letter 'I' is broken down as:

***
 *
 *
***

and then drawn as a 4 by 3 table with cells all sized 1px by 1px and background
colors set on the cells.

I suspect that we spend all of our time reflowing all this nested auto-sizing
table crap and/or parsing the document.write() calls that build the crap.

Component: Layout → HTMLTables
OS: Windows XP → All
Hardware: PC → All
Probably also related to bug 54542 - we are very slow at rendering tables with
large numbers of rows/columns.

We should get a jprof; it might be of more general use.
When displaying http://www.world-direct.com/mozilla/dhtml/silktext.html with a
debug build I see messages about document.all and document.layers
I tried getting a jprof.... it looked bogus (I _know_ 10% of the time was not
spent in mork on that testcase).
Probably also related to bug 94587.
Depends on: 91643
Priority: -- → P2
Target Milestone: --- → Future
Keywords: testcase
The upcoming fix in bug 85755 might help here too.
Assignee: attinasi → other
Target Milestone: Future → ---
Using the 7.01 released build on Win XP and the 11/21/02 Trunk on Linux, the
page does render, but very slowly. The browser never totally freezes.
More info: page takes 21 seconds to load in 1/08/03 Trunk Build on Win XP.
During that period, it looks like Mozilla is frozen, but after a second, the
browser starts up again.
Maybe bug 97229 might help here too?
Attached file a jprofile
based on original url: http://www.world-direct.com/mozilla/dhtml/silktext.html
test script is same to	comments #19 of bug 186465
We seem to be spending a lot of time in
nsAString::do_AssignFromReadable(nsAString const&) ....

Leon, if you can attach the profiles unzipped, please do that.  If you cannot,
don't tar up single files -- that's sort of pointless.  Just gzip them.  ;)
this jprof is too good to be true no table reflow visible ;-). I am very
sceptical that this jprof is valid. Even the usual suspects (cellmap) dont show up.
I can't load the test page at all in a recent build. I see 20-30 seconds of high
CPU load (many samples under nsParser::BuildModel(), mostly in
nsCSSFrameConstructor table construction stuff). But then CPU load drops to
zero, and it looks like I've deadlocked in

...
js_GC
js_FinalizeObject
prop_iterator_finalize
js_RemoveRoot
PR_Lock
I see similar results, except the chunk of active CPU time is mostly showing up
as doing JS GC-related locking (from GC and AllocateGCThing, IIRC).
If you see js_AllocGCThing calling js_GC, we're in the "last ditch" trying to
collect garbage because malloc has failed.  That's bad, but the JS GC-allocator
can't do much more than give up after one try.  If it isn't giving up (if
control flow remains in a single invocation of js_AllocGCThing and whatever it
calls), then there's a JS bug to file.

But if js_AllocGCThing fails in turn, and the failure propagates back to
JS_NewObject or whatever (need a full stack), then someone is just banging his
head against a wall at a higher layer, in the DOM I guess.

/be
Target Milestone: --- → Future
Keywords: mozilla1.3
Blocks: 91351
This page still deadlocks for me, but reducing the testcase allows it to work.
On the testcase at
http://www.world-direct.com/mozilla/dhtml/silktext.html
MSIE 6 on winxp is very fine doing incremental loading.
During the whole rendering cpu usage is 100% but MSIE is
always very responsive - in constrast to Mozilla, which is
frozen during that time.
Attached file Quantify data
This is zipped an tab deliminted. You'll probably want to import this to a
database or spread sheet so you can sort it. (It's only part of the data, as
the entire data zipped is more than the 300k limit)
Does that bring up any new findings? 
Keywords: mozilla1.3
Target Milestone: Future → ---
Target Milestone: --- → Future
Priority: P2 → --
Target Milestone: Future → ---
The top offenders in the Quantify data appear to be PeekMessage, FlashWindow
(I'm not sure if there are blocking issues with those functions) Below is a
short excerpt from the file. QueryPerformanceCounter is called by
PR_MD_GET_INTERVAL. Not sure what's called PR_MD_GET_INTERVAL, but I wonder if
there's an faster alternative. BitBlt always seems like an expensive function,
if there was a way to reduce a few calls, you'd get a lot of payoff.

PeekMessageW            3544    346838570
FlashWindow             148     222454364
malloc                  133297  146642286
new(UINT)               92325   141552657
TlsGetValue             633196  121516364
QueryPerformanceCounter 20045	81984141
memcpy                  181142  80149330
free                    100154  78542186
memmove                 136360  76659665
EnterCriticalSection    213309  76258949
FindClose               2167    64743219
BitBlt                  76      62534688
Bug 124695 has some timings for different (win32) time related functions.

Are you sure that you've filtered out all blocked threads? In most cases you can
focus the main thread and get relevant data. Other threads may block on
PeekMessage but those doesn't affect the processing time unless it's a network
related problem.
I reran this test. Now the biggest offender is PR_Lock, which is mainly called
by nsJAR::GetInputStream, which is called 197 times. It's also called by
js_AllocGCThing, but the time spent is less, even though it's called
substantially more. Another big calls is jsd_Lock, no, didn't have the debugger up.

Caller of PR_Lock               % time  Calls   Propigated time
nsJAR::GetInputStream           63.34   197     923950.83
js_AllocGCThing                 17.70   352288  258283.69
nsZipReaderCache::ReleaseZip    7.56    111     110287.54
jsd_Lock                        7.49    410418  109303.69

A lot of PR_Lock's time is spent in EnterCriticalSection

QueryPerformanceCounter shows up here as well near the top.

Malloc shows up, called mainly by JS_malloc, called by js_NewObject, called by
js_StringToObject, called by js_ValueToObject, called by
js_ValueToNonNullObject. Maybe Brendan's patch will help some here to reduce the
mallocs for new objects.

Malloc is also called by PL_ArenaAllocate, which is called 5,666 times, split
between FrameArena::AllocateFrame(1397 times) and nsFixedSizeAllocator::Alloc
(called by nsCParserStartNode::Create, nsTokenAllocator::CreateTokenOfType).

I think to be effective, a person or group needs to be tasked with spending the
time to really track down and understand the performance issues involved in this
and other bugs. It's really difficult to relay the information I get from Purify
meaningfully, and there's too much information to post the entire data in text
form, even compressed. I'd much rather be suggesting changes to improve
performance rather than posting stats. But I don't have the time to do that.

JProf may be the next best thing, although from what I've seen there's no way to
block off and just measure a certain period of time. For instance in Purify I
can pause all recording until I get to the spot I want to measure and then turn
on recording. So I don't get a lot of startup/shutdown noise. Maybe there's a
way to do that programmatically with jprof.
In jprof, you can at least start profiling on when you start the operation you
want.  See the JP_DEFER option; you have to signal Mozilla with a 'SIGPROF'
signal to get it to start profiling if that's used.
If nsJAR::GetInputStream managed to spend almost a second in just 197 calls to
PR_Lock, it must have hang, waiting for the lock somehow. I wonder who had that
lock... Why should a jar be read during this page load anyway?

It is difficult to analyze data in Quantify, especially in Mozilla where
thousands of different functions can be involved in a simple operation and more
than that when a page load is done. My strategy have always been to focus down
to a part of the call chain where I think more time than necessary are spent and
then work on that part alone. It it turns out I'm wrong I'll back up and try to
find another hotspot.

In your measurements, are you using funtion times, line times or timed times?
I've noticed that when using function or line times Quantify gets the timed
parts wrong. It times calls to the OS and the C runtime and those are presented
with more time than they actually used.

The PR_Lock call itself could be much quicker but not until we sacrifice win95
compatibility. There's a bug about it somewhere with a patch that had to be
backed out because of Windows 95.
Oh, and the javascript debugger hooks are active if you have installed the
javascript debugger regardless if it's used or not. rginda says that it
shouldn't use more CPU than necessary but I've seen it quite high in profiles
before. There should be a bug about it.
Well I always found line/function to be more accurate than time, but maybe I was
just misinterpreting the information. But then I've never been that impressed
with Quantify, well in general Rational products especially WRT UI.

Calls to jsd_Lock account for more than 50% of the calls to PR_Lock. Maybe I
should just remove jsd for now to get rid of the noise.
Not a hang.
Flags: blocking1.4?
Keywords: hangnsbeta1
With out the debugger, js_AllocGCThing and js_PurgeDeflatedStringCache appear to
be the greatest users of PR_Lock.

Another thing that stands out is that this test is creating thousands of frames.
And from there, really bumps up a lot of the other functions call counts and times.
Sampling what functions are on the stack rarely points you to the algorithmic
problems that are the most frequent causes of performance problems like this.
The real cause of this bug is that processing single reflow events take too long.
Depends on: ireflow
adt: nsbeta1-
Keywords: nsbeta1nsbeta1-
Blocks: 203448
Simon, regarding your statement "The real cause of this bug is that processing 
single reflow events take too long." - is there a bug that is currently trying 
to improve this?
None that explicitly address that issue, that I know of.
not in a final cycle. rearchitecting belongs in alphas.
Flags: blocking1.4? → blocking1.4-
Still Mozilla is totally frozen during rendering - trunk build 2003072704.
some comparison data:
mozilla: 24395ms
msie6  : 17896ms

Depends on: 123668
We don't do any incremental reflow at all - is this intended here?
still. mozilla 2?
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a6pre) Gecko/2007062202 SeaMonkey/2.0a1pre
this is wfm:

ff    3500
ie7   5700
chrom 3300
=> WFM also 
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Actually here it's frozen for about 7 seconds during the rendering of the page.

After that it's not really responsive, for example if I
- select all text and
- press the right mouse button,
after some seconds I get the window stating "A script on this page may be busy, or it may have stopped responding. You can stop the script now, or you can continue to see if the script will complete.
Script: chrome://browser/content/browser.js:4303"


Tested with latest firefox stable:
Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3

And tested with current nightly (same frozen+unresponsive behaviour):
(here the busy script message line number is
Script: chrome://browser/content/browser.js:4056)

Build identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b2pre) Gecko/20081013 Minefield/3.1b2pre

BTW I'm using a P4+Hyperthreading with XP (I know there are some pages that freeze firefox only on multicore/HyperThreading processors with Windows)
You need to log in before you can comment on or make changes to this bug.