168157 - Mozilla frozen during rendering

Reporter

Description

•

22 years ago

Using trunk build 2002091014 on win-xp pro,1.1ghz,512ram the testcase freezes 
Mozilla - there is also no incremental reflow.
MSIE has a nice incremental reflow and is fine responsive.

Markus Hübner

Reporter

Comment 1

•

22 years ago

Anyone willing to take this bug?

Keywords: hang, perf

José Jeria

Comment 2

•

22 years ago

This page also almost freezes my IE 6, CPU usage goes to 100%

Markus Hübner

Reporter

Comment 3

•

22 years ago

Here MSIE6 also has 100% cpu usage but it is still very responsive (tool bar 
etc.) and incremental reflow shows progress.

Boris Zbarsky [:bzbarsky]

Comment 4

•

22 years ago

I took a quick look at what that page does.  It uses this "silktext" function to
write out some text.  This "silktext" function basically builds up some text,
pixel by pixel, out of tables.  So each line is a separate table with a single
row.  Inside this, each character and space between characters is a separate td.
 Inside this td is another table that actually draws the char.  For example, the
letter 'I' is broken down as:

***
 *
 *
***

and then drawn as a 4 by 3 table with cells all sized 1px by 1px and background
colors set on the cells.

I suspect that we spend all of our time reflowing all this nested auto-sizing
table crap and/or parsing the document.write() calls that build the crap.

Component: Layout → HTMLTables

OS: Windows XP → All

Hardware: PC → All

Randell Jesup [:jesup] (needinfo me)

Comment 5

•

22 years ago

Probably also related to bug 54542 - we are very slow at rendering tables with
large numbers of rows/columns.

We should get a jprof; it might be of more general use.

kill this account

Comment 6

•

22 years ago

When displaying http://www.world-direct.com/mozilla/dhtml/silktext.html with a
debug build I see messages about document.all and document.layers

Boris Zbarsky [:bzbarsky]

Comment 7

•

22 years ago

I tried getting a jprof.... it looked bogus (I _know_ 10% of the time was not
spent in mork on that testcase).

Markus Hübner

Reporter

Comment 8

•

22 years ago

Probably also related to bug 94587.

Markus Hübner

Reporter

Updated

•

22 years ago

Depends on: 91643

Amarendra Hanumanula

Updated

•

22 years ago

Priority: -- → P2

Kevin McCluskey (gone)

Updated

•

22 years ago

Target Milestone: --- → Future

Carine Held

Updated

•

22 years ago

Keywords: testcase

Markus Hübner

Reporter

Comment 9

•

22 years ago

The upcoming fix in bug 85755 might help here too.

Assignee: attinasi → other

Target Milestone: Future → ---

Carine Held

Comment 10

•

22 years ago

Using the 7.01 released build on Win XP and the 11/21/02 Trunk on Linux, the
page does render, but very slowly. The browser never totally freezes.

Carine Held

Comment 11

•

22 years ago

More info: page takes 21 seconds to load in 1/08/03 Trunk Build on Win XP.
During that period, it looks like Mozilla is frozen, but after a second, the
browser starts up again.

Markus Hübner

Reporter

Comment 12

•

22 years ago

Maybe bug 97229 might help here too?

leon.zhang

Comment 13

•

22 years ago

Attached file a jprofile — Details

based on original url: http://www.world-direct.com/mozilla/dhtml/silktext.html
test script is same to	comments #19 of bug 186465

Boris Zbarsky [:bzbarsky]

Comment 14

•

22 years ago

We seem to be spending a lot of time in
nsAString::do_AssignFromReadable(nsAString const&) ....

Leon, if you can attach the profiles unzipped, please do that.  If you cannot,
don't tar up single files -- that's sort of pointless.  Just gzip them.  ;)

Bernd

Comment 15

•

22 years ago

this jprof is too good to be true no table reflow visible ;-). I am very
sceptical that this jprof is valid. Even the usual suspects (cellmap) dont show up.

Simon Fraser [no longer active]

Comment 16

•

22 years ago

I can't load the test page at all in a recent build. I see 20-30 seconds of high
CPU load (many samples under nsParser::BuildModel(), mostly in
nsCSSFrameConstructor table construction stuff). But then CPU load drops to
zero, and it looks like I've deadlocked in

...
js_GC
js_FinalizeObject
prop_iterator_finalize
js_RemoveRoot
PR_Lock

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 17

•

22 years ago

I see similar results, except the chunk of active CPU time is mostly showing up
as doing JS GC-related locking (from GC and AllocateGCThing, IIRC).

Brendan Eich [:brendan]

Comment 18

•

22 years ago

If you see js_AllocGCThing calling js_GC, we're in the "last ditch" trying to
collect garbage because malloc has failed.  That's bad, but the JS GC-allocator
can't do much more than give up after one try.  If it isn't giving up (if
control flow remains in a single invocation of js_AllocGCThing and whatever it
calls), then there's a JS bug to file.

But if js_AllocGCThing fails in turn, and the failure propagates back to
JS_NewObject or whatever (need a full stack), then someone is just banging his
head against a wall at a higher layer, in the DOM I guess.

/be

Kevin McCluskey (gone)

Updated

•

22 years ago

Target Milestone: --- → Future

Markus Hübner

Reporter

Updated

•

22 years ago

Keywords: mozilla1.3

Markus Hübner

Reporter

Updated

•

22 years ago

Blocks: 91351

Simon Fraser [no longer active]

Comment 19

•

22 years ago

This page still deadlocks for me, but reducing the testcase allows it to work.

Markus Hübner

Reporter

Comment 20

•

22 years ago

On the testcase at
http://www.world-direct.com/mozilla/dhtml/silktext.html
MSIE 6 on winxp is very fine doing incremental loading.
During the whole rendering cpu usage is 100% but MSIE is
always very responsive - in constrast to Mozilla, which is
frozen during that time.

David Bradley

Comment 21

•

22 years ago

Attached file Quantify data — Details

This is zipped an tab deliminted. You'll probably want to import this to a
database or spread sheet so you can sort it. (It's only part of the data, as
the entire data zipped is more than the 300k limit)

Markus Hübner

Reporter

Comment 22

•

22 years ago

Does that bring up any new findings?

Markus Hübner

Reporter

Updated

•

22 years ago

Keywords: mozilla1.3

Target Milestone: Future → ---

Kevin McCluskey (gone)

Updated

•

22 years ago

Target Milestone: --- → Future

Markus Hübner

Reporter

Updated

•

22 years ago

Priority: P2 → --

Target Milestone: Future → ---

David Bradley

Comment 23

•

22 years ago

The top offenders in the Quantify data appear to be PeekMessage, FlashWindow
(I'm not sure if there are blocking issues with those functions) Below is a
short excerpt from the file. QueryPerformanceCounter is called by
PR_MD_GET_INTERVAL. Not sure what's called PR_MD_GET_INTERVAL, but I wonder if
there's an faster alternative. BitBlt always seems like an expensive function,
if there was a way to reduce a few calls, you'd get a lot of payoff.

PeekMessageW            3544    346838570
FlashWindow             148     222454364
malloc                  133297  146642286
new(UINT)               92325   141552657
TlsGetValue             633196  121516364
QueryPerformanceCounter 20045	81984141
memcpy                  181142  80149330
free                    100154  78542186
memmove                 136360  76659665
EnterCriticalSection    213309  76258949
FindClose               2167    64743219
BitBlt                  76      62534688

Daniel Bratell

Comment 24

•

22 years ago

Bug 124695 has some timings for different (win32) time related functions.

Are you sure that you've filtered out all blocked threads? In most cases you can
focus the main thread and get relevant data. Other threads may block on
PeekMessage but those doesn't affect the processing time unless it's a network
related problem.

David Bradley

Comment 25

•

22 years ago

I reran this test. Now the biggest offender is PR_Lock, which is mainly called
by nsJAR::GetInputStream, which is called 197 times. It's also called by
js_AllocGCThing, but the time spent is less, even though it's called
substantially more. Another big calls is jsd_Lock, no, didn't have the debugger up.

Caller of PR_Lock               % time  Calls   Propigated time
nsJAR::GetInputStream           63.34   197     923950.83
js_AllocGCThing                 17.70   352288  258283.69
nsZipReaderCache::ReleaseZip    7.56    111     110287.54
jsd_Lock                        7.49    410418  109303.69

A lot of PR_Lock's time is spent in EnterCriticalSection

QueryPerformanceCounter shows up here as well near the top.

Malloc shows up, called mainly by JS_malloc, called by js_NewObject, called by
js_StringToObject, called by js_ValueToObject, called by
js_ValueToNonNullObject. Maybe Brendan's patch will help some here to reduce the
mallocs for new objects.

Malloc is also called by PL_ArenaAllocate, which is called 5,666 times, split
between FrameArena::AllocateFrame(1397 times) and nsFixedSizeAllocator::Alloc
(called by nsCParserStartNode::Create, nsTokenAllocator::CreateTokenOfType).

I think to be effective, a person or group needs to be tasked with spending the
time to really track down and understand the performance issues involved in this
and other bugs. It's really difficult to relay the information I get from Purify
meaningfully, and there's too much information to post the entire data in text
form, even compressed. I'd much rather be suggesting changes to improve
performance rather than posting stats. But I don't have the time to do that.

JProf may be the next best thing, although from what I've seen there's no way to
block off and just measure a certain period of time. For instance in Purify I
can pause all recording until I get to the spot I want to measure and then turn
on recording. So I don't get a lot of startup/shutdown noise. Maybe there's a
way to do that programmatically with jprof.

Boris Zbarsky [:bzbarsky]

Comment 26

•

22 years ago

In jprof, you can at least start profiling on when you start the operation you
want.  See the JP_DEFER option; you have to signal Mozilla with a 'SIGPROF'
signal to get it to start profiling if that's used.

Daniel Bratell

Comment 27

•

22 years ago

If nsJAR::GetInputStream managed to spend almost a second in just 197 calls to
PR_Lock, it must have hang, waiting for the lock somehow. I wonder who had that
lock... Why should a jar be read during this page load anyway?

It is difficult to analyze data in Quantify, especially in Mozilla where
thousands of different functions can be involved in a simple operation and more
than that when a page load is done. My strategy have always been to focus down
to a part of the call chain where I think more time than necessary are spent and
then work on that part alone. It it turns out I'm wrong I'll back up and try to
find another hotspot.

In your measurements, are you using funtion times, line times or timed times?
I've noticed that when using function or line times Quantify gets the timed
parts wrong. It times calls to the OS and the C runtime and those are presented
with more time than they actually used.

The PR_Lock call itself could be much quicker but not until we sacrifice win95
compatibility. There's a bug about it somewhere with a patch that had to be
backed out because of Windows 95.

Daniel Bratell

Comment 28

•

22 years ago

Oh, and the javascript debugger hooks are active if you have installed the
javascript debugger regardless if it's used or not. rginda says that it
shouldn't use more CPU than necessary but I've seen it quite high in profiles
before. There should be a bug about it.

David Bradley

Comment 29

•

22 years ago

Well I always found line/function to be more accurate than time, but maybe I was
just misinterpreting the information. But then I've never been that impressed
with Quantify, well in general Rational products especially WRT UI.

Calls to jsd_Lock account for more than 50% of the calls to PR_Lock. Maybe I
should just remove jsd for now to get rid of the noise.

Andrew Hagen

Comment 30

•

22 years ago

Not a hang.

Flags: blocking1.4?

Keywords: hang → nsbeta1

David Bradley

Comment 31

•

22 years ago

With out the debugger, js_AllocGCThing and js_PurgeDeflatedStringCache appear to
be the greatest users of PR_Lock.

Another thing that stands out is that this test is creating thousands of frames.
And from there, really bumps up a lot of the other functions call counts and times.

Simon Fraser [no longer active]

Comment 32

•

22 years ago

Sampling what functions are on the stack rarely points you to the algorithmic
problems that are the most frequent causes of performance problems like this.
The real cause of this bug is that processing single reflow events take too long.

Markus Hübner

Reporter

Updated

•

22 years ago

Depends on: ireflow

Samir Gehani

Comment 33

•

22 years ago

adt: nsbeta1-

Keywords: nsbeta1 → nsbeta1-

Markus Hübner

Reporter

Updated

•

22 years ago

Blocks: 203448

Markus Hübner

Reporter

Comment 34

•

22 years ago

Simon, regarding your statement "The real cause of this bug is that processing 
single reflow events take too long." - is there a bug that is currently trying 
to improve this?

Simon Fraser [no longer active]

Comment 35

•

22 years ago

None that explicitly address that issue, that I know of.

Asa Dotzler [:asa]

Comment 36

•

22 years ago

not in a final cycle. rearchitecting belongs in alphas.

Flags: blocking1.4? → blocking1.4-

Markus Hübner

Reporter

Comment 37

•

21 years ago

Still Mozilla is totally frozen during rendering - trunk build 2003072704.
some comparison data:
mozilla: 24395ms
msie6  : 17896ms

Depends on: 123668

Markus Hübner

Reporter

Comment 38

•

20 years ago

We don't do any incremental reflow at all - is this intended here?

Wayne Mery (:wsmwk)

Comment 39

•

17 years ago

still. mozilla 2?
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a6pre) Gecko/2007062202 SeaMonkey/2.0a1pre

dolphinling

Updated

•

17 years ago

URL: http://www.world-direct.com/mozilla/d... → https://wd-testnet.world-direct.at/mo...

Bernd

Comment 40

•

16 years ago

this is wfm:

ff    3500
ie7   5700
chrom 3300

Wayne Mery (:wsmwk)

Comment 41

•

16 years ago

=> WFM also 
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3

Status: NEW → RESOLVED

Closed: 16 years ago

Resolution: --- → WORKSFORME

Diego

Comment 42

•

16 years ago

Actually here it's frozen for about 7 seconds during the rendering of the page.

After that it's not really responsive, for example if I
- select all text and
- press the right mouse button,
after some seconds I get the window stating "A script on this page may be busy, or it may have stopped responding. You can stop the script now, or you can continue to see if the script will complete.
Script: chrome://browser/content/browser.js:4303"


Tested with latest firefox stable:
Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3

And tested with current nightly (same frozen+unresponsive behaviour):
(here the busy script message line number is
Script: chrome://browser/content/browser.js:4056)

Build identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b2pre) Gecko/20081013 Minefield/3.1b2pre

BTW I'm using a P4+Hyperthreading with XP (I know there are some pages that freeze firefox only on multicore/HyperThreading processors with Windows)

a jprofile 22 years ago leon.zhang 45.06 KB, application/x-gzip		Details
Quantify data 22 years ago David Bradley 217.54 KB, application/octet-stream		Details