Closed Bug 76722 Opened 23 years ago Closed 23 years ago

Gecko blocks the main thread for > 1 sec processing events

Categories

(Core :: Layout, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla0.9.3

People

(Reporter: phil, Assigned: harishd)

References

()

Details

(Keywords: perf, Whiteboard: [pdt+][nsBranch+][fixed and verified on the trunk])

Attachments

(16 files)

4.30 KB, patch
Details | Diff | Splinter Review
8.24 KB, patch
Details | Diff | Splinter Review
2.47 KB, text/plain
Details
8.06 KB, patch
Details | Diff | Splinter Review
27.50 KB, patch
Details | Diff | Splinter Review
36.78 KB, patch
Details | Diff | Splinter Review
37.21 KB, patch
Details | Diff | Splinter Review
34.43 KB, patch
Details | Diff | Splinter Review
34.10 KB, patch
Details | Diff | Splinter Review
38.52 KB, patch
Details | Diff | Splinter Review
39.62 KB, patch
Details | Diff | Splinter Review
9.59 KB, patch
Details | Diff | Splinter Review
10.63 KB, patch
Details | Diff | Splinter Review
10.78 KB, patch
Details | Diff | Splinter Review
2.05 KB, patch
Details | Diff | Splinter Review
3.34 KB, patch
Details | Diff | Splinter Review
One of our embedding customers tells us that they see Gecko blocking the main
thread while processing events. Sometimes one event takes more than 1 sec to
process. kmcclusk, can you look into this and see what you can find?
related to hyatt/darin work on flow control analysis????

do we have bugs filed on that analysis and possible fixes yet?
Is there a particular site where this is happening? 
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.1
A single content flush and a single reflow on www.cnn.com typically exceed 1 
second on a 433Mhz PC running WINNT. When a single content flush or reflow is 
processed we do not return to the message pump until they have completed.

There are two possible solutions:

1) Implement interruptible reflow and content creation (A big job)
2) Limit the amount of data that processed by a single content sink flush

I'll explore alternative 2. 
If you've cvs add'ed nsDelayAlarm.*, you can include them in the patch via cvs
diff -u -N ....

/be
In my profiles, it isn't content creation that takes a lot of time.  It's frame 
creation (which right now happens in the same event as content creation).  It 
is during frame creation that style resolution and image load kickoffs happen, 
and these are the expensive operations.

I don't see a problem with doing many more ContentAppended and ContentInserted 
operations than we do now as long as we fix the O(n^2) problems with appending 
content and get a little smarter about coalescing reflows that occur as a 
result of appending and inserting content.

I instrumented:

nsParser::OnDataAvailable
nsHTMLContentSink::FlushTags
nsPresShell:ProcessReflowCommand
nsCSSFrameConstructor.cpp   => frame construction.

On www.cnn.comI saw the sequence of

OnDataAvailable -> Chunk of Markup
FlushTags

OnDataAvailable -> Chunk of Markup
FlushTags

....

Until almost the entire page loads then I finally saw almost all of the frames
constructed  as the result of a single call to FlushTags, followed by a single
reflow. Almost all of the frames are created and reflowed in one shot which
leads to frame construction and reflow delay's for a single event which are
greater than one second

If I save www.cnn.com locally and remove the links to external style sheets.I
see the following sequence

OnDataAvailable==> Chunk of Markup
FlushTags
Frame Construction=> Frames corresponding to the Markup created
Process Reflow Command

...

OnDataAvailable==> Chunk of Markup
FlushTags
Frame Construction=> Frames corresponding to the Markup created
Process Reflow Command


I discussed this with rickg@netscape.com and attinasi@netscape.com
The inclusion of the external sheets forces the parser to block until all of the
external style sheets have been loaded. Once they have loaded, the parser is
unblocked and it sends all of the data it was caching while in the blocked state
over to the content sink. This is what causes the majority of the frame
construction and reflowing to be done in one shot.

We also seem to spent too much time away for the message pump when processing
long documents. I will try modifying  how much data necko passes to the parser's
stream listener by changing nsStorageTransport.cpp's  MAX_IO_CHUNK  from  8192
to 1024.
To prevent the parser from sending all of the data that has been cached while it
is blocked in one shot we would need to limit the amount of data that it could
pass to the content sink and schedule messages to complete the flushing of the
parsers cache.
"In my profiles, it isn't content creation that takes a lot of time".

I agree. The content sink flush which results in the creation of the frames is
what takes the majority of the time. Thats what I meant when I referred to
content flush, not the creation of the content itself, although I erroneously
referred to content creation in solution 1)  :-) 





Changing the buffer size is dangerous, since you don't want to stall Necko.  If 
Necko can keep loading that's a good thing.  IMO it's better to make the 
content sink only flush small amounts to the frame constructor and still allow 
the parser to queue up 8192 blocks.  I think your original idea of tuning the 
content sink is the way to go here.

Although I guess we should tune the parser a little bit to make sure it doesn't 
synchronously parse a whole huge amount of content. :)  I assume the parser can 
literally grab the entire page from Necko (e.g., 80K) while blocked, and that 
it does something like parse the entire 80K and send a bunch of synchronous 
notifications to the content sink?

If that's what it does, you're right that we should tune the parser to post 
events instead of parsing in one fell swoop.
As I told Kevin and Attinasi, it would be very easy to gate the parser on how
much content to toss to the content model in any one cycle. We will be
discussing this a bit more tomorrow if folks want to join in.
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Blocks: 71668
kevin, how are we doing with your patch?  Are we able to run through I-BENCH
test or jrgm's page-load test? 
"Are we able to run through I-BENCH test or jrgm's page-load test?"

I was able to run through I-BENCH and jrgm's page loader test on Linux with a
release build. The page load times with jrgm's test were within the range of
error for me of approx 4% so I couldn't see any impact from the patch. The
I-BENCH scores for the initial page load time were not consistent. I need to
tune the prefs which control how frequently the parser is interruptibted and how
often the system clock is sampled. After I have done some tuning I'll re-run
I-BENCH and jrgm's page loader test. 

Just to clarify. The solution that rickg and I have been working on is really
not the correct solution to this problem in the long term. The correct solution
would to be to make reflow and frame construction interruptible. Making them
interruptible would be a large task so we have been exploring using the parser's
pre-existing incremental/interruptible capability to limit the number of frames
that are created and reflowed in one shot. If both reflow and frame construction
were interruptible then there would no need to interrupt the parser's processing
of tokens. The patch I have attached interrupts the parser based on how much
time has been spent processing tokens. In addition the patch sets up a PL_Event
to continue calling the parser to continue processing if the document has been
completely loaded, but the parser has been interrupted. 
So you can do something like process some tokens, schedule an event, then 
during the processing of that event you parse some more tokens and possible 
schedule another event, and so on?  That sounds good to me.
Thats right. The continue parsing PL_event only needs to be scheduled if the
parser's OnDataAvailable has been called for the last time. If necko is still
loading a document the tokens that were left around when the parser was
interrupted will be processed the next time the parser's OnDataAvailable is
called (Provided it doesn't interrupt again).  

I also added a dummy parser request to the load group to prevent JavaScript
onload handlers and other document done processing from happening until the
final chunk of data has been tokenized and passed over to the content sink.  
Let me know if/when you're ready for a review.  I'd be happy to look at this.
Hyatt: you're close. As KM said, the parser is already incremental and 
interruptable. All we're really doing is using that in conjuction with a gating 
function that tells us to quit scanning/tokenizing/model-buildling once a given 
threshold has been exceeded. The trick now is to find the right threshold that: 
1) provides improved user responsiveness; 2) doesn't slow normal page 
construction; 3) performs the max possible work per iteration.
I am not sure if i miss some info, but i think that logic in
HTMLContentSink::DidProcessAToken is wrong. First that == should
be >=, but if you want to check time after every 20 token, it should be
more like this:

HTMLContentSink::DidProcessAToken(void) {
  mTokenCount++;
  if ((mInterruptParsing) && (mTokenCount >= mMaxTokenCount)) {
    mTokenCount = 0;
    if (mDelayTimer.HasExceeded(mMaxProcessingTime)) 
      return NS_ERROR_HTMLPARSER_INTERRUPTED;
  }
 
  return NS_OK;
}
Do not be discouraged if this patch causes a slight slowdown on page load 
times.  I would expect it to.  By not hogging the event queue, we're going to 
make paint suppression really unsuppress painting at the right time, and we're 
going to get a few more cycles of animation and progress.  The tradeoff is that 
pages with CSS should show some content earlier, so you may have a positive 
perception gain even though the overall time slows.

actually, from the work bbaetz has been doing, you'll most likely see a big win
on pages with many images.  (turns out that this is related to why we seem to do
better with fewer connections per server per page.)  because the ui thread is 
hogged for so long (and the ui thread's event queue essentially blocked), http
connections are not reclaimed by the http handler (since this happens on the ui
thread) as early as they really could be.  new connections are not started until
the old ones are reclaimed.

in fact, bbaetz has found that this contributes to about a 35% slowdown on pages
like www.netscape.com.  overall, it amounts to about a 5% slowdown on jrgm's
page-loader test.  dropping the number of http connections actually helped with
this problem because it essentially reduced the number of active consumers of
http, and hence kept the ui thread's event queue free'er... this is just an
educated guess, though.

we're working on a patch that will enable http to reclaim connections on the
socket transport thread to get around this problem.
Well, good.  Let's hope for the best! :)
My bug is bug 83772. Comments welcome...
I tried this patch, and I crash on the ibench tests. I can't seem to get a 
backtrace out of gdb though. I can run ibench without the patch applied.
The patch is ready for review/super-review

I will check it in disabled by setting mCanInterruptParsing to false.  This will
prevent the content sink from ever returning NS_ERROR_HTMLPARSER_INTERRUPTED  so
it will never schedule ParserContinue events and the Dummy parser requests  will
not get added to the load group.


I tuned the values for:
mMaxTokenProcessingTime - New throttle I added to control Maximumum time spent
processing tokens.
mNotificationInterval - Existing throttle for controlling how often the content
sink flushes.


I tuned the values by loading: 
http://lxr.mozilla.org/seamonkey/source/view/src/nsViewManager.cpp and
http://lxr.mozilla.org/seamonkey/find?string=nsCSSFrameConstructor.cpp

then I reloading them from the cache to get consistent load times.


I set the default values for mMaxTokenProcessingTime and mNotificationInterval
as low as possible without affecting page load time. I set
mMaxTokenProcessingTime to be a multiple of mNotificationInterval by default
because the two are linearly related. Both default values can be overridden
using prefs.

I did this on the following machines with release builds:
500Mhz Linux build.
750Mhz WINNT system
433Hmz WINNT system
200Mhz WIN95 system


I also ran I-BENCH and jrgm's page-load test.

The current settings do not affect page load times but they DO NOT make Mozilla
super-responsive during the load of large pages. 
They make Mozilla more responsive but its not as responsive as I would like. We
need to improve the performance of incremental reflows so we can reduce
the value for mNotificationInterval and mMaxTokenProcessingTime without
affecting page performance.


As a test to see that it is working you can set the user pref for
mNotificationInterval to an absurdly low value 

user_pref("content.notify.interval", 1);

and load: 

http://lxr.mozilla.org/seamonkey/find?string=nsCSSFrameConstructor.cpp

Mozilla will be extremly responsive during the load. During the page load, you
can click on menus and scroll as if the document was completed loaded. You will
see small chunks of content added and reflowed.

The default values are:
user_pref("content.max.tokenizing.time", 360000);
user_pref("content.notify.interval", 120000);


You can turn off the interrupting of the parser using the following pref:

user_pref("content.interrupt.parsing", true);
Whiteboard: waiting for review/super-review
"I tried this patch, and I crash on the ibench tests. I can't seem to get a 
backtrace out of gdb though. I can run ibench without the patch applied."

I just applied the latest patch to a pull from this morning on Linux.
I ran the I-Bench performance tests with a release build and it worked fine, it
didn't crash.

I ran both the specific HTML page load test

http://i-bench.zdnet.com/ibench/performance_tests/perf_loadspeed.php

as well as running the performance tests with all of the default settings on

http://i-bench.zdnet.com/ibench/testlist/run.php

If you think the parser interruption code is causing a problem you can turn off
the interruption of the code either through the pref

user_pref("content.interrupt.parsing", false);

or you can modify 

mozilla/content/html/document/src/nsHTMLContentSink.cpp at line 2492

and change

mCanInterruptParsing = PR_TRUE; 

to:

mCanInterruptParsing = PR_FALSE; 
I should have noted earlier: I had applied the patch 'attach_id=36855' after 
bbaetz comment, to a win2k build and could run the ibench stuff with no 
problem.
Drat. I thought I updated my comment here, but obviously not.

The crash was repeatable, going away when I reverted the patch, and then
returning when I reapplied it, but then (without updating from cvs) it just
disappeared. I had no luck in getting a backtrace out of gdb, so I have no idea
what caused it.

I was seeing slower ibench times from home -> ibench.zdnet.com (ie higher
latency than to dogspit interanlly) with the patch, though, but if noone else is
seeing those then I'll just blame that as pilot error.
With the latest patch 37679, I improved performance by replacing PR_Now calls
with PR_Interval_Now. I ran I-BENCH on Linux with a release build and I am not
seeing any change in page load times.
Keywords: perf
OS: Linux → All
Hardware: PC → All
I see one problem with this patch, steps to reproduce:
Go bugzilla query page, enter "perf" to keywords-field, and submit
query -> Page loads ok, but throbber keeps spinning forever.

When i set pref user_pref("content.interrupt.parsing", false);
problem goes away.  This is on linux.

I have some other changes on my tree, so not 100% sure is it this patch.
I don't even have the patch installed (and never have) and I have the problem of
the throbber never stopping, but only with the classic skin. CVS build from
06/11, RedHat 7.1
I tried Kevin's patch on my Mac OS 9.1 400 MHz w/ 384 MB of memory and I did not
see any side-effects. However, using the default setting, I could not tell any
change in responsiveness. Then again, it wasn't that bad without it. More
performance tuning needs to be done as cranking the setting up the max did show
more responsiveness. Let's get this checked in to get more eyes looking at it.
Comments on patch:
1) in nsParser::PostContinueEvent() you should check for (nsnull != mEventQueue) 
before creating the new event - if null, ev will leak (and it is a waste of 
time creating it).
2)new data members in nsParser.h / nsHTMLContentSink.cpp: should we use 
PRPackedBool instead of PRBool? Probably no big deal sinec there are very few 
parser or content sink instances.
3) Initialize the statics for DummyParserRequest?

These are all really minor comments. The code look good, and it looks like the 
testing has been thorough too. I did not see any problems with the use of the 
mCanInterruptParsing data member, but did you check to make sure that disabling 
interrupt-of-parsing does indeed disable it? If so, then sr=attinasi
The patch works fine for me too (but I'm biased). :)

Couple of small nits:

1. In nsParserContinueEvent::HandleEvent(), make sure the parser argument is 
non-null.
2. This whole process needs to be documented. The parser was always incremental, 
but this 
   is a new kind of incrementalism, and should be described.

To attinasi: packed bool won't help. If you need 1 bit, it still costs you 
eight. 

Rick
[Nit on a nit: PRPackedBool is eight bits (unsigned char). PRBool is 32 (int).]
This sounds very related to bug 15122.
Whiteboard: waiting for review/super-review → waiting for approval
patch id=36536 looks good. r=harishd
patch id=38536 looks good. r=harishd
sr=vidur@netscape.com per phone conversation with Vidur
Blocks: 83989
a=chofmann
Checked in patch 38536 with the default for content.interrupt.parsing set to false. 

To test interrupting the parser to improve user interactivity during long page
loads set the pref

user_pref("content.interrupt.parsing", true);

You can increase the responsiveness by changing content.notify.interval.

The default is
user_pref("content.notify.interval", 120000);
if you set it to
user_pref("content.notify.interval", 1000);

Mozilla will get very responsive during long page loads, but overall page load
times will increase.  The current defaults are tuned to not affect overall page
load times.
Whiteboard: waiting for approval → Checked in a fix, but it is disabled. To enable the fix add user_pref("content.interrupt.parsing", true); to pref.js
Whiteboard: Checked in a fix, but it is disabled. To enable the fix add user_pref("content.interrupt.parsing", true); to pref.js → Checked in a fix, but it is disabled. To enable the fix add user_pref("content.interrupt.parsing", true); to pref.js, critical for 0.9.2
I am still seeing problem with bugzilla queries. When
setting user_pref("content.interrupt.parsing", true);
after bugzilla query throbber just keep spinning. When setting
pref to false, throbber stops after query.

Tested with build 2001062121 in linux and new profile.
I think we have what we need for mozilla 0.9.2 with the addtion of the 
prefs to get us some help with testing.  moving this off the critical
for 0.9.2 list. adding the nsbranch keyword so we can track if we want
to take it for the netscape release later on down the road.
Keywords: nsBranch
Whiteboard: Checked in a fix, but it is disabled. To enable the fix add user_pref("content.interrupt.parsing", true); to pref.js, critical for 0.9.2 → Checked in a fix, but it is disabled. To enable the fix add user_pref("content.interrupt.parsing", true); to pref.js,
Bugzilla queries are likely suffering from another problem. See bug 81524.
No its not that ui freezes, mozilla works fine, just trobber keeps spinning,
AND this only happens when i enable content.interrupt.parsing.
Harish and I looked at the problem with the bugzilla query. The problem is that
necko loads the entire document then starts a new load of the same document for
some reason. Necko never passes the flag to indicate the document has completed
for the second load so the DummyParserRequest that is added when
content.interrupt.parsing is enabled never gets removed from the load request so
the throbber keeps going.  
The bugzilla query page is multipart/x-mixed-replace, so the document should be
loading twice, shouldn't it? Or something like that. 
Depends on: 87370
With Darin's help I traced into the multi-part converter code and the problem is 
that the nsMultiMixedConv implementation doesn't pass the OnStopRequest through 
to the parser for the second document.

I filed bug 87370 for this problem. 
Target Milestone: mozilla0.9.2 → mozilla0.9.3
The patch I attached will dynamically switch between two different settings for
the content.notify.interval and max.token.processing.interval based on the users
actions. The basic algorithm is if a user event (mouse mouse, keypress etc.) has
occurred in the last second then use values which maximizes interactive
responsiveness. If a user event has not occurred in the last second then return
to the values which maximize page-load performance. The idea is that if the user
is going to grab the thumb of the scrollbar or make a menu selection they would
prefer to emphasize the processing of the user-events over page-load. 
sr=attinasi for the patch ID 40220

The patch makes interaction really sweet, and actually makes the app look 
'smart' when a large page is loading and you reach for the scroll-thumb because 
the mouse-movement slows down the load and makes grabbing the thumb easier. 
Also, the menus are way more responseive when you want to access them, but page 
load is full-speed when the mouse is left alone. This is a great improvement 
over the static inverval!
Remove

+  printf("Enable interrupting parser %d\n", enableInterruptParsing);


with that r=harishd.
I did some runs with a 06/26 trunk build on linux and win98. (I didn't
get to mac, since I was chasing down a different issue [bug 88166] at
the same time).

What I did was to create a new profile and set it up with sidebar collapsed,
in modern skin, and then after a few restarts, run the page-loader test. 
The first run was with the default settings, which makes this turned off
at the moment. Between each test, I delete NewCache, history.dat, cookies.txt
and restarted a few times to get to the same initial state. I ran tests 
with the pref set to true and timing values of 120000, 10000, 40000, 80000,
and then back to 120000, and finally again as 'off' (disabled). 


Linux/500MHz/128MB/rh9.1
 times in msec.
              cached    uncached 
1)    off      1775       1949 
2) 120000      1795       1925 
3)  10000      1998       2107 
4)  40000      1795       1914 
5)  80000      1777       1896 
6) 120000      1776       1933 
7)   off       1765       1890 


win98/500MHz/128MB
 times in msec.
              cached    uncached 
1)    off      1404       1680
2) 120000      1442       1656
3)  10000      1471       1713
4)  40000      1456       1684
5)  80000      1431       1675
6) 120000      1454       1665
7)   off       1413       1640


So, it is effectively neutral (according to my testing) on Linux
[i.e., can't differentiate the changes from 1% noise]. However, I do
see a measurable difference of about ~3% for cached load times on
win98, for a setting of 120000 (the default). 

(I should note that I think this patch has much different dynamics on 
win98 than on win NT variants: when I ran on win2k with a setting of 
10000 usec, the DOM L2 spec shot up from ~4 seconds to ~30 seconds to 
finish loading. But on win98, I only went from ~4 to 5.5 seconds (I 
think).)

It would still be good to run the ibench test, and to check the Mac
(my bad).  Sorry not getting to this earlier. (p.s. I am away until
Monday afternoon).
I found a problem:

when loading http://i-bench.zdnet.com/ibench/i-bench.htm if I have the lastest
patch installed and I move the pointer around rapidly to cause the parser to be
interrupted very early on in the load process the page (sometimes) doesn't load
at  all.  

At the beginning of the i-bench document it does a document.write of a script
tag which has an href to external file that contains JavaScript to execute. I
believe this triggers the problem. I also get an assertion:

0[ba2c40]: ###!!! ASSERTION: nsHTMLDocument::Reset() - dummy doc write request s
till exists!: 'mDocWriteDummyRequest == nsnull', file S:\mozilla\content\html\do
cument\src\nsHTMLDocument.cpp, line 371

The following is the stack that triggers the assertion:


nsDebug::Assertion(const char * 0x01cf2a9c, const char * 0x01cf2a7c, const char
* 0x01cf2a44, int 371) line 290 + 13 bytes
nsHTMLDocument::Reset(nsHTMLDocument * const 0x028f7a40, nsIChannel *
0x02de5920, nsILoadGroup * 0x00ed4ee0) line 371 + 49 bytes
nsHTMLDocument::OpenCommon(nsIURI * 0x028e0760) line 2137 + 39 bytes
nsHTMLDocument::Open(nsHTMLDocument * const 0x028f7b80, nsIDOMDocument * *
0x0012e390) line 2244 + 18 bytes
nsHTMLDocument::Open(nsHTMLDocument * const 0x028f7b7c) line 2212 + 40 bytes
nsHTMLDocument::WriteCommon(const nsAString & {...}, int 0) line 2309 + 31 bytes
nsHTMLDocument::ScriptWriteCommon(int 0) line 2425 + 19 bytes
nsHTMLDocument::Write(nsHTMLDocument * const 0x028f7b80) line 2452
XPTC_InvokeByIndex(nsISupports * 0x028f7b80, unsigned int 19, unsigned int 0,
nsXPTCVariant * 0x0012e83c) line 139
XPCWrappedNative::CallMethod(XPCCallContext & {...}, XPCWrappedNative::CallMode
CALL_METHOD) line 1881 + 42 bytes
XPC_WN_CallMethod(JSContext * 0x00f6f420, JSObject * 0x00dab2b0, unsigned int 1,
long * 0x024636ec, long * 0x0012ea70) line 1252 + 11 bytes
js_Invoke(JSContext * 0x00f6f420, unsigned int 1, unsigned int 0) line 807 + 23
bytes
js_Interpret(JSContext * 0x00f6f420, long * 0x0012f888) line 2702 + 15 bytes
js_Execute(JSContext * 0x00f6f420, JSObject * 0x00dab138, JSScript * 0x02dffe40,
JSStackFrame * 0x00000000, unsigned int 0, long * 0x0012f888) line 986 + 13 bytes
JS_EvaluateUCScriptForPrincipals(JSContext * 0x00f6f420, JSObject * 0x00dab138,
JSPrincipals * 0x028f60b0, const unsigned short * 0x02420028, unsigned int 741,
const char * 0x02dfe630, unsigned int 1, long * 0x0012f888) line 3273 + 25 bytes
nsJSContext::EvaluateString(nsJSContext * const 0x00f6e920, const nsAString &
{...}, void * 0x00dab138, nsIPrincipal * 0x028f60ac, const char * 0x02dfe630,
unsigned int 1, const char * 0x00000000, nsAString & {...}, int * 0x0012f8f4)
line 608 + 85 bytes
nsScriptLoader::EvaluateScript(nsScriptLoadRequest * 0x029bde90, const
nsAFlatString & {...}) line 565
nsScriptLoader::ProcessRequest(nsScriptLoadRequest * 0x029bde90) line 477 + 22 bytes
nsScriptLoader::OnStreamComplete(nsScriptLoader * const 0x028f50c4,
nsIStreamLoader * 0x029bdbd0, nsISupports * 0x029bde90, unsigned int 0, unsigned
int 741, const char * 0x02dfe750) line 756
nsStreamLoader::OnStopRequest(nsStreamLoader * const 0x029bdbd4, nsIRequest *
0x029bd8d0, nsISupports * 0x00000000, unsigned int 0) line 120 + 81 bytes
nsStreamListenerTee::OnStopRequest(nsStreamListenerTee * const 0x02dfeac0,
nsIRequest * 0x029bd8d0, nsISupports * 0x00000000, unsigned int 0) line 25
nsHttpChannel::OnStopRequest(nsHttpChannel * const 0x029bd8d4, nsIRequest *
0x029b9bf0, nsISupports * 0x00000000, unsigned int 0) line 2111
nsOnStopRequestEvent::HandleEvent() line 161
nsARequestObserverEvent::HandlePLEvent(PLEvent * 0x029cc9e4) line 64
PL_HandleEvent(PLEvent * 0x029cc9e4) line 590 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x00f32f50) line 520 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x00050778, unsigned int 49363, unsigned int 0,
long 15937360) line 1071 + 9 bytes
USER32! 77e7124c()
00f32f50()


My guess is that we have some code that assumes that when the stream is complete
that all of the document has been tokenized and the content model is complete. 
The interrupting of the parser is disabled if we are in a sub-parser context so
it will not interrupt the parser when it's in the middle of the parsing the
result of a document.write, but there may be some other places where we need to
disallow the interrupting of the parser.
Kevin, does that page fail to load with the interval set to a very low number
_without_ the patch that makes it dynamic? I would think that it would, since
the dynamic stuff just sets the interval really low...
If I set initial value for the mNotificationInterval to be the same as the value
that it is lowered to when there is a user event it does not fail. The assertion
only appears when the mNotificationInterval is dynamically switched between two
values. 
Ok, I found the problem. I missed one of the places where I should have replaced
mNotificationInterval with GetNotificationInterval:

nsHTMLContentSink.cpp

PRBool
HTMLContentSink::IsTimeToNotify()
{
 ...

  LL_I2L(interval, mNotificationInterval);

Should be

  LL_I2L(interval, GetNotificationInterval());

Great, I'm glad you found it. [s]r=attinasi on the inline fix.
I checked patch 40583 into the trunk. Dynamic interrupting of the parser is now
turned on by default on the trunk. To test this, try loading a large document
such as http://lxr.mozilla.org/seamonkey/find?string=nsCSSFrameConstructor.cpp.
As you move the mouse to grab the scrollbar or make a menu selection Mozilla
should be extremely responsive. If you stop moving the mouse Mozilla will go
back to the defaults to enable fast page load.

 
Whiteboard: Checked in a fix, but it is disabled. To enable the fix add user_pref("content.interrupt.parsing", true); to pref.js, → Checked in a fix to the trunk
I ran I-BENCH with a release build on WINNT 750Mhz CableModem and the times were
unchanged with the dynamic switching. However, the times are slower with dynamic
switching turned on if the mouse pointer is inside the window when the I-BENCH
tests are run. This is because windows generates a mouse move event when a new
window is constructed under the mouse pointer, this mouse move event causes the
lower content.notification.interval and max.token.processing intervals to be
used for 3/4 of second when the page begins loading.  

I can improve the performance on I-BENCH when the pointer is in the window by
increasing the lower values returned for content.notification.interval and
max.token.processing intervals in the GetNotificationInterval() and
GetMaxTokenProcessingTime(). However, Doing this makes Mozilla less interactive
during long page loads.  Suppressing the mouse move events generated when the
mouse just happens to be over a window when it is created would be a better
solution.

cc: saari for the mouse event on page load issue.
3/4 second added on to the total i-bench time, or to the per-run time?
With the latest patch installed there isn't any impact on I-Bench load times.

The 3/4 of second is how long the dynamic switching waits after a user-event
before it switches to lower settings to improve user responsiveness. If another
user-event  occurs within 3/4 of second of the previous user event it will stick
with the lower settings. If 3/4 of second goes by without a user event it
switches back to the higher settings which maximimizes page load performance. 
The problem I was having with I-BENCH was that when each page was loaded and the
mouse pointer was in the window it was jumping to the lower settings because the
creation of the window was generating a mouse move event. It should be sticking
with higher settings which maximize page load I wasn't actually moving the mouse
or typing. The lastest patch waits 2 seconds after the beginning of the document
load before it looks at the user event time when determining if it should jump
to the lower setting. This eliminates the jumping to the lower value when
Mozilla cycles through pages. It also eliminates the jumping to the lower value
immediately after the user hits return in the URL bar, or clicks on a link if it
results in the loading of a new document.
I'd like to suggest that the comment be reworded a little (it is a confusing
sentance). From:
+    // should be used. but only consider using the lower
+    // value if the document has already been loading 
+    // for 2 seconds. 2 seconds was chosen because it is
+    // greater than the default 3/4 of second that is used
+    // to determine when to switch between the modes and it
+    // gives the document a little time to create windows
+    // result in user events because on some systems creating
+    // a window under the mouse pointer will generate a
+    // mouse move event.

to

+ // should be used. But only consider using the lower
+ // value if the document has already been loading
+ // for 2 seconds. 2 seconds was chosen because it is
+ // greater than the default 3/4 of second that is used
+ // to determine when to switch between the modes and it
+ // gives the document a little time to create windows.
+ // This is important because on some systems (Windows, 
+ // for example) when a window is created and the mouse
+ // is over it, a mouse move event is sent, which will kick
+ // us into interactive mode otherwise. It also supresses
+ // reaction to pressing the ENTER key in the URL bar...

Or something like that. Anyway, the information is there, I think you just had a
typo or something. 

Also, since the value 2 seconds is really related to the 3/4 second default
value (2X + fuzz-factor), why not make them consts or defines and put the
relationship into the code (in the odd event that the default interval is
changed, the wait-time would be changed automatically that way). Not critical,
but it seems cleaner to me.

sr=attinasi.
r=harishd
Checked in patch 40719 into the trunk. I-BENCH scores should be back to what 
they were previously.
If you want to bring the Mozilla0.9.2 branch up to date with the patches I've
installed on the trunk apply patches 40583 and 40719
Assignee: kmcclusk → harishd
Status: ASSIGNED → NEW
Reassigning this bug to Harish since I will be gone until Aug 29.
If you encounter problems and you want to turn off the interrupting of the
parser as the default: change line 2509 in nsHTMLContentSink.cpp from:

PRBool enableInterruptParsing = PR_TRUE;

to be

PRBool enableInterruptParsing = PR_FALSE;


adding vtrunk keyword to indicate the bug has been fixed on the trunk.  Bug left
open for tracking checkin to branch (nsbranch) when appropriate.

Once bug has been fixed on the branch also, pls remove vtrunk keyword.
Keywords: vtrunk
This is a high impact fix for user experience during long page loads.  It has
also been on the trunk for a week now.  Marking nsBranch+.
Whiteboard: Checked in a fix to the trunk → [nsBranch+] Checked in a fix to the trunk
Ran browser buster, with build ID 2001070604, and did not see anything abnormal.
Verified FIXED on today's trunk builds on Win2k and Linux. I checked this by
comparing the interactivity of the application while loading
nsCSSFrameConstructor.cpp via LXR. 
Whiteboard: [nsBranch+] Checked in a fix to the trunk → [nsBranch+][fixed and verified on the trunk]
Whiteboard: [nsBranch+][fixed and verified on the trunk] → [pdt+][nsBranch+][fixed and verified on the trunk]
Checked into branch on behalf of Harish.  Marking fixed.
Really marking fixed.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
chris - pls verify this on the branch builds. Thanks.
Keywords: vtrunkvbranch
Marking verified with the July 16th branch builds (20010716) with Linux Redhat,
Windows ME, and Mac OS 9.1.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: