Page-load slow down due to browser entered favor-native-event-mode wrongly

NEW
Assigned to

Status

()

Core
Document Navigation
8 years ago
4 months ago

People

(Reporter: Yong, Assigned: Yong)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 1 obsolete attachment)

(Assignee)

Description

8 years ago
User-Agent:       Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; chromeframe; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; CIBA; InfoPath.2; MS-RTC LM 8; OfficeLiveConnector.1.3; OfficeLivePatch.0.0)
Build Identifier: Mozilla/5.0 (X11: U:Linux i686:en-US)

There is an issue in nsDocShell.cpp:

gNumberOfDocumentsLoading can not keep balanced when a document location is
changed from a frame (in that case the gNumberOfDocumentsLoading will not increase). 
It will goes negative. Thus the favor-pending-native-event mode is always triggered. This has significant impact on page load performance. In our workload it has ~40% impact on page load performance.



Reproducible: Always

Steps to Reproduce:
This is a very simple test case to reproduce the issue:
== dummy.html ==
<html>
<title>dummy</title>
<body>
Hello! World!
<iframe id="ad" src="about:blank" width="100" height="100" ></iframe>
<script>
var cel = document.getElementById("ad");
cel.src = "about:blank"; 
</script>
</body>
</html>

Please add a trace from nsDocShell::EndPageLoad() to show the value of
gNumberOfDocumentsLoading, it will goes -1 after the first run and decrease 
by 1 on each page reflush.


Actual Results:  
The gNumberOfDocumentsLoading will decrease by 1 on each run.

Expected Results:  
gNumberOfDocumentsLoading is zero after page load finshed.
(Assignee)

Comment 1

8 years ago
This is the proposed fix:

diff -r 2557dd211e0c docshell/base/nsDocShell.cpp
--- a/docshell/base/nsDocShell.cpp	Wed Nov 25 12:45:17 2009 +0000
+++ b/docshell/base/nsDocShell.cpp	Sat Dec 12 14:09:56 2009 +0800
@@ -5748,12 +5748,23 @@
 
         mEODForCurrentDocument = PR_TRUE;
 
-        // If all documents have completed their loading
-        // favor native event dispatch priorities
-        // over performance
-        if (--gNumberOfDocumentsLoading == 0) {
-          // Hint to use normal native event dispatch priorities 
-          FavorPerformanceHint(PR_FALSE, NS_EVENT_STARVATION_DELAY_HINT);
+        nsLoadFlags loadFlags = 0;
+        aChannel->GetLoadFlags(&loadFlags);
+
+        // gNumberOfDocumentsLoading should keep balanced when a document location is
+        // changed from a frame (in that case the gNumberOfDocumentsLoading will not
+        // increase). If we still decrease the gNumberOfDocumentsLoading, it will goes
+        // negative. But the actual # of loading documents is greater than 0. Thus the
+        // favor-pending-native-event mode is always triggered. This has heavy impact on
+        // page load performance.
+        if (loadFlags & nsIChannel::LOAD_TARGETED) {
+            // If all documents have completed their loading
+            // favor native event dispatch priorities
+            // over performance
+            if (--gNumberOfDocumentsLoading == 0) {
+                // Hint to use normal native event dispatch priorities
+                FavorPerformanceHint(PR_FALSE, NS_EVENT_STARVATION_DELAY_HINT);
+            }
         }
     }
     /* Check if the httpChannel has any cache-control related response headers,

Updated

8 years ago
Component: DOM: Core & HTML → Document Navigation
QA Contact: general → docshell
Created attachment 417659 [details] [diff] [review]
proposed fix

copy & paste from comment 1
Attachment #417659 - Flags: review?(bzbarsky)

Updated

8 years ago
Assignee: nobody → yong.wu
I would say we should just rip this code out. As I recall it this code attempts to improve pageload speed by making us less responsive during pageload. However that's actually not something we want. Responsiveness is very important for perceived speed and so we're almost doing the opposite of what the desired behavior is here.
I agree.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Created attachment 417777 [details] [diff] [review]
Remove perfhints

This completely removes the perfhint code. The idea is that we basically always want to be responsive, and so favoring perf over responsiveness is the wrong thing always.
Attachment #417777 - Flags: review?(vladimir)
Created attachment 417783 [details] [diff] [review]
Remove perfhints v1.01

Tiny change to remove the unused |now| temporary variable in nsBaseAppShell::OnProcessNextEvent. No other changes compared to last patch
Attachment #417777 - Attachment is obsolete: true
Attachment #417783 - Flags: review?(vladimir)
Attachment #417777 - Flags: review?(vladimir)
Is the "responsive" mode perhaps too responsive, though?
I *think* not, but I'd definitely like for someone that knows the event loop stuff to review.

I pushed a tryserver build, but I suspect that won't be too useful since there shouldn't be native events there.
Comment on attachment 417783 [details] [diff] [review]
Remove perfhints v1.01

This looks fine to me, but I'm wondering whether we still need all the prefs.. I guess that's so that we can interrupt after either N ms or X tokens or something?
Attachment #417783 - Flags: review?(vladimir) → review+
Yeah, I'd rather change that separately so that we know which changes cause what effects.

All of that stuff is changing with the HTML5 parser anyway.
Comment on attachment 417659 [details] [diff] [review]
proposed fix

Removing review request, since we're removing the code.
Attachment #417659 - Flags: review?(bzbarsky)
(Assignee)

Comment 12

8 years ago
Thanks for the rapid responses from you guys. While I have a concerns with this change (remove perfhints and favor native events always). Most of plugins (let's say the flash plugin on gtk widget) will send a lot of native events in a much higher frequency than human interactions. Considering a windowless plugin which invalidates part of its window from time to time with 40 frames per seconds (like some ads which plays in a loop), and when a user opens multiple tabs browsing, the page loads of other tabs will be delayed easily.

If we considering running browser on a netbook which has less powerful cpu (mostly has only 1 core, poor concurrent threading support), things will even worse. Those native events busy-loop will lag slow the browser significantly.

The previous code is to avoid trap into the busy loop when some pages are under loading. It is a way to balance native events and users.
Note that we're *already* favoring native events most of the time. We're just not doing it during page load. This patch just makes us behave the same during pageload as we do normally
(Assignee)

Comment 14

8 years ago
(In reply to comment #13)
> Note that we're *already* favoring native events most of the time. We're just
> not doing it during page load. 

Yes. 

> This patch just makes us behave the same during
> pageload as we do normally

This is a the difference. If we disable the favor-perf-mode, when some plugins are displayed (which sends lots of native events to the glib main loop), the page load on some other tabs will be blocked due to the main thread is falling into the native events busy loop.
If these native events from plugins are causing problems, why should those problems only be affecting pageload? Why not things such as XMLHttpRequest loads, timers, javascript animations, layout reflows etc?

If what you're saying is true, then this is something we need to fix some other way than just adding a hack for pageload.

Also, ideally plugins shouldn't be using native events. What we really want to prioritize is UI events, such as mouse and keyboard, in order to improve user responsiveness. Do you know if the events that the plugins are sending is causing this code:

http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/nsWindow.cpp#2749

to return "true"? If so, pageload is likely getting hurt quite a bit already by these plugin events.
(Assignee)

Comment 16

8 years ago
(In reply to comment #15)
> If these native events from plugins are causing problems, why should those
> problems only be affecting pageload? Why not things such as XMLHttpRequest
> loads, timers, javascript animations, layout reflows etc?
> 
> If what you're saying is true, then this is something we need to fix some other
> way than just adding a hack for pageload.

That is probably. But I think most of http request and frame reflow are already covered in page loading (I mean, between CreateContentViewer() and EndPageLoad()).

> 
> Also, ideally plugins shouldn't be using native events. What we really want to
> prioritize is UI events, such as mouse and keyboard, in order to improve user
> responsiveness. Do you know if the events that the plugins are sending is
> causing this code:
> 
> http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/nsWindow.cpp#2749
> 
> to return "true"? If so, pageload is likely getting hurt quite a bit already by
> these plugin events.

Not that case. This is actually easy to observe - let's see an example:
== dummy2.html ==
<html>
  <head>
    <title> Wall-Clock </title>
  </head>
  <body bgcolor="green">
<!-- <<< -->
  <div>
    <object type="application/x-shockwave-flash" data="http://www.bbc.co.uk/home/object/clock/tiny.swf" height="400" width="400">
    <param name="movie" value="http://www.bbc.co.uk/home/object/clock/tiny.swf">
    <param name="wmode" value="transparent">
    </object>
  </div>
<!-- >>> -->
  </body>
</html>


Please put a trace from nsBaseAppShell.cpp and you will see a lot of native events processed triggered the native event loop. Please also remove the object block and try tracing again - I am sure you will find the difference.

Except the events triggered by plugin _invalidaterect (yes it is a windowless plugin), most of other are the dummy events just to let the flash to keep playing. You could also changed it to windowed mode, there are still lots of dummy native events.
(Assignee)

Comment 17

8 years ago
Btw, I am talking about the GTK widget backend on Linux. I am not sure whether the same thing happens to Windows.
Not sure if this is related, but in the last few days I've noticed that I can barely interact with the entire browser while playing a flash video on zeit.de, e.g. in the middle of this page: http://www.zeit.de/2009/52/Schachpartie
Pausing the video or scrolling it offscreen makes the issue go away. I'm on Windows, using Monday's trunk nightly.
(In reply to comment #18)
> Not sure if this is related, but in the last few days I've noticed that I can
> barely interact with the entire browser while playing a flash video on zeit.de,
> e.g. in the middle of this page: http://www.zeit.de/2009/52/Schachpartie
> Pausing the video or scrolling it offscreen makes the issue go away. I'm on
> Windows, using Monday's trunk nightly.

Not related since nothing has landed yet.
Nothing has landed here, but obviously there's some code in place to favor native events, and Yong says some plugins send lots of native events...
My point is that if plugins, or something else, causes bad behavior due to excessive amounts of native events, then the current fix to de-prioritize them during page load, is the wrong fix. Simply for the reason that it only changes things during page load.

Also note that when you can't interact with the page, then that is *because* native events are not receiving enough attention. Specifically the native events produced by your mouse/keyboard interactions.

The whole purpose of this patch is to improve responsiveness, at the cost of wall clock time for loading pages. What the current code does is to intentionally *reduce* responsiveness so that we can finish parsing and rendering the page quicker. The patch changes that so that responsiveness is always prioritized.
(In reply to comment #21)
> My point is that if plugins, or something else, causes bad behavior due to
> excessive amounts of native events, then the current fix to de-prioritize them
> during page load, is the wrong fix. Simply for the reason that it only changes
> things during page load.

I didn't mean to imply anything else. I was merely suggesting that native event spamming does indeed happen.
ccing roc and josh for comment 16.  Sounds like something we should fix somehow.
I also wonder whether OOPP makes comment 16 better by making the plugin just swamp the plugin process event loop, not our main one.
I'm not sure where native event spamming would come from on Linux. We don't fire idle events like we do for Mac plugins. If the plugin decides to fire lots of events, there's not much we can do, except use OOPP to protect the browser. The power issue remains, but it's a problem with the plugin.
(Assignee)

Comment 26

8 years ago
(In reply to comment #25)
> I'm not sure where native event spamming would come from on Linux. We don't
> fire idle events like we do for Mac plugins. If the plugin decides to fire lots
> of events, there's not much we can do, except use OOPP to protect the browser.
> The power issue remains, but it's a problem with the plugin.

At least 3 kinds of native events sent via plug-ins upon my observation (please see the attachment for the traces).
1) A windowless plugin will send GDK_EXPOSE events each time it invalidates the canvas. Browser will catch such events, provide a drawable and send back GraphicsExpose to plugin.

2) With Adobe flash player plugin, besides expose event, a windowless plugin will also send another native events continually. I do not actually know the detail about that event (I did not got the Adobe source). But I have tracing results (in the attachment). It shows the callback is added to the glib main loop when the flash file loading finished. More important, the frequency of this event is 3x of expose event.

For windowed plugin, no expose events sent. All the events are the 2nd one.

3) In NonXEmbed mode (plugin like swfdec still use XtBin), the plugin will send xt_events to main loop all the way.
(Assignee)

Comment 27

8 years ago
Created attachment 418547 [details]
Plugin native events mentioned in comment 26.

Add the traces mentioned in comment 26.
(In reply to comment #15)
> Also, ideally plugins shouldn't be using native events. What we really want to
> prioritize is UI events, such as mouse and keyboard, in order to improve user
> responsiveness.

Sounds good.  I don't see any documentation for NPN_ScheduleTimer.
Would using this actually improve the situation (with our current event
scheduling code), anyway?
(Would these events somehow have lower priority than input events?)
(In reply to comment #27)
>Here are more info on native events sent by Adobe flash player plugin:

Thanks.  This is useful data.

>1) A windowless plugin will send GDK_EXPOSE events each time it invalidates the canvas. Browser will catch such events, provide a drawable and send back GraphicsExpose to plugin.
>
>#0  expose_event_cb (widget=0xb3d0d600, event=0xbfb3fec4) at gwidget/src/gtk2/nsWindow.cpp:5404
>#1  0xb78d7d36 in _gtk_marshal_BOOLEAN__BOXED (closure=0xb3d3d500, return_value=0xbfb3fc70, n_param_values=2, param_values=0xb0a7f760, invocation_hint=0xbfb3fc5c,
>    marshal_data=0xb4d1bca4) at gtkmarshalers.c:84
>#2  0xb7478f5b in IA__g_closure_invoke (closure=0xb3d3d500, return_value=0xbfb3fc70, n_param_values=2, param_values=0xb0a7f760, invocation_hint=0xbfb3fc5c) at gclosure.c:767
>#3  0xb748f229 in signal_emit_unlocked_R (node=0xb6a7f5b0, detail=0, instance=0xb3d0d600, emission_return=0xbfb3fda8, instance_and_params=0xb0a7f760) at gsignal.c:3247
>#4  0xb749070f in IA__g_signal_emit_valist (instance=0xb3d0d600, signal_id=56, detail=0, var_args=0xbfb3fe00 "") at gsignal.c:2990
>#5  0xb7490cf6 in IA__g_signal_emit (instance=0xb3d0d600, signal_id=56, detail=0) at gsignal.c:3037
>#6  0xb79f68de in gtk_widget_event_internal (widget=0xb3d0d600, event=0xbfb3fec4) at gtkwidget.c:4767
>#7  0xb78d188b in IA__gtk_main_do_event (event=0xbfb3fec4) at gtkmain.c:1582
>#8  0xb7656608 in _gdk_window_process_updates_recurse (window=0xb6b3b660, expose_region=0xb0ac9960) at gdkwindow.c:5061
>#9  0xb767e674 in _gdk_windowing_window_process_updates_recurse (window=0xb6b3b660, region=0xb0ac9960) at gdkwindow-x11.c:5566
>#10 0xb7651db6 in gdk_window_process_updates_internal (window=0xb6b3b660) at gdkwindow.c:5220
>#11 0xb7653f2f in IA__gdk_window_process_all_updates () at gdkwindow.c:5328
>#12 0xb7653f8b in gdk_window_update_idle (data=0x0) at gdkwindow.c:4954

This should not actually be called for each invalidate but only when there are
no events waiting with higher priority than GDK_PRIORITY_REDRAW ==
(G_PRIORITY_HIGH_IDLE + 20).  (i.e. after X events have been processed.)

OOPP will not help here (without significant change in implementation).

IIUC, events scheduled from Mozilla have G_PRIORITY_DEFAULT and therefore
should be processed before gdk_window_update_idle is called, so I don't expect
the expose events to have a heavy impact on page load time.  If this is having
a big effect then I guess something is not behaving as intended.
http://hg.mozilla.org/mozilla-central/annotate/ec5b16168109/widget/src/gtk2/nsAppShell.cpp#l125

(Perhaps some Mozilla events should be processed after X input events, but
AFAIK Mozilla events do not have priorities, making it difficult to do the
right thing here.)

>2) Besides GDK_EXPOSE, a windowless plugin will also send another native events continually. I dont actually know what is the event. But it is added to the glib main loop when the flash file loading finished.
>
>#0  IA__g_timeout_add_full (priority=0, interval=10, function=0xac2e2a50, data=0xab2a9000, notify=0) at gmain.c:3511
>#1  0xb73cda1f in IA__g_timeout_add (interval=10, function=0xac2e2a50, data=0xab2a9000) at gmain.c:3560
>#2  0xac2e2a04 in ?? () from /home/yong/.mozilla/plugins/libflashplayer.so

That's 100 times per second with G_PRIORITY_DEFAULT, which is the same
priority as GDK picks up X events, so could well affect responsiveness (and
everything else).

Out-of-process plugins should help if this is the bottleneck.  Try a Firefox
nightly build with the dom.ipc.plugins.enabled pref set to true.

>3) In NonXEmbed mode (XtBin, plugin like swfdec still use it), the plugin will send xt_events to main loop all the way. 
>#0  xt_event_polling_timer_callback (user_data=0xb0a26800) at gwidget/src/gtkxtbin/gtk2xtbin.c:198
>#1  0xb744ba46 in g_timeout_dispatch (source=0xb1e3c6f0, callback=0xb6203b8f <xt_event_polling_timer_callback>, user_data=0xb0a26800) at gmain.c:3398
>#2  0xb744b368 in IA__g_main_context_dispatch (context=0xb6a22400) at gmain.c:1960
>#3  0xb744ec49 in g_main_context_iterate (context=0xb6a22400, block=0, dispatch=1, self=0xb6a05340) at gmain.c:2591
>#4  0xb744ef58 in IA__g_main_context_iteration (context=0xb6a22400, may_block=0) at gmain.c:2654
>#5  0xb4da01c1 in nsAppShell::ProcessNextNativeEvent (this=0xb6958920, mayWait=0) at gwidget/src/gtk2/nsAppShell.cpp:147
>#6  0xb4dca69c in nsBaseAppShell::DoProcessNextNativeEvent (this=0xb6958920, mayWait=0) at gwidget/src/xpwidgets/nsBaseAppShell.cpp:154
>#7  0xb4dcabcf in nsBaseAppShell::OnProcessNextEvent (this=0xb6958920, thr=0xb6a90380, mayWait=1, recursionDepth=0)
>    at gwidget/src/xpwidgets/nsBaseAppShell.cpp:292
>
>It is added from browser side when the plugin initialization:

This is actually added with g_timeout_add(25,...) so happens every 25 ms:
http://hg.mozilla.org/mozilla-central/annotate/ec5b16168109/widget/src/gtkxtbin/gtk2xtbin.c#l374

I don't know whether 40 times per second is a good or bad number.

If a significant amount of time is spent in these calls, then perhaps the
priority of xt_polling_timer_id can be lowered to G_PRIORITY_DEFAULT_IDLE.

Also, perhaps XtIMXEvent events should not be handled here, because they are
processed elsewhere.

>#0  IA__g_main_context_add_poll (context=0x0, fd=0xb630c16c, priority=300) at gmain.c:2982
>#1  0xb6309067 in gtk_xtbin_new (parent_window=0xb6ce47f0, f=0x0) at gwidget/src/gtkxtbin/gtk2xtbin.c:368

The source for xt_event_funcs actually has GDK_PRIORITY_EVENTS ==
G_PRIORITY_DEFAULT, so using G_PRIORITY_LOW is strange here.

Perhaps xt_event_polling_timer_callback is actually currently picking up the
XtIMXEvent events, and so GDK_PRIORITY_EVENTS should be used on the poll if
xt_event_polling_timer_callback no longer handles the XtIMXEvent events.

Whether the Xt events are actually the ones slowing page load is not clear
though.  There are a number of tweaking parameters for these, so I hesitate to
change any without data.

Really some profiling data is needed to give more clues as to what is slowing
things down.  Or just try OOPP first to see if that helps.
Depends on: 531142
Karl: So what is your opinion. Can we safely take this patch for now? Or do we need to fix how mozilla does event prioritization first?
(In reply to comment #30)
> Karl: So what is your opinion. Can we safely take this patch for now? Or do we
> need to fix how mozilla does event prioritization first?

I don't know for certain.  Theoretically it seems that if events required for
page load are being starved because of an excess number of native (GDK)
events, then user input would also be starved.  And it doesn't seem right to
process even fewer user input events to speed up page load.  So my gut feel is
that removing this use of FavorPerformanceHint would probably help more cases
than it hinders.

However, Yong believes that processing fewer native events helps his
situation, but I haven't used Yong's device or workloads to test or look at exactly what is happening.

I did some synthetic testing with multiple tabs and/or windows containing the
Wall-Clock plugin test from Comment 16, and it seems that OOPP (in its current
form at least) is not really going to solve the starvation problems.

In hidden tabs, the GDK_EXPOSE events (item 1 of comment 26) are not
delivered.  With many hidden tabs containing Flash, the plugin instances'
timers (item 2) are all that's involved, and OOPP plugins did (vastly) help
browser (parent) process responsiveness here.

However, with a few Flash instances visible (in separate windows), the
responsiveness with OOPP is no better than IPP, and both are affected by the
number of hidden tabs.  For visible plugins, the expose events are sent to the
plugin (child) process synchronously.  I don't know the chromium message loop
/ pump code well but it looks like these events from the parent may be
processed by the child with the same priority as other events (including the
plugin instances' timers).  If so, the parent process is ending up blocked by
the child process's event queue.

Some events sent from parent to child could be made asynchronous (with some
effort), but some should be synchronous.  Perhaps FavorPerformanceHint might
actually work well in the plugin process, were it would really mean favor
internal events, which include parent messages and not a lot else I assume.

(Note that bug 535687 needs to be fixed before nightly builds can be used to test OOPP.)
(In reply to comment #31)
> (In reply to comment #30)
> > Karl: So what is your opinion. Can we safely take this patch for now? Or do we
> > need to fix how mozilla does event prioritization first?
> 
> I don't know for certain.  Theoretically it seems that if events required for
> page load are being starved because of an excess number of native (GDK)
> events, then user input would also be starved.  And it doesn't seem right to
> process even fewer user input events to speed up page load.  So my gut feel is
> that removing this use of FavorPerformanceHint would probably help more cases
> than it hinders.

Based on this I'm going to attempt to land this patch on the m-c. It's early in the release cycle so perfect time for a change like this :)


> Some events sent from parent to child could be made asynchronous (with some
> effort), but some should be synchronous.  Perhaps FavorPerformanceHint might
> actually work well in the plugin process, were it would really mean favor
> internal events, which include parent messages and not a lot else I assume.

But then we'd want that to be happening all the time in the child process, right? Not just while we're loading a page?
(In reply to comment #32)
> But then we'd want that to be happening all the time in the child process,
> right? Not just while we're loading a page?

Yes, exactly.
You need to log in before you can comment on or make changes to this bug.