Closed Bug 132759 (100%CPU) Opened 18 years ago Closed 18 years ago

[HANG]high(100%) cpu usage with Flash: WM_USER+1 events given too high priority and starve UI

Categories

(Core :: Plug-ins, defect, P1, critical)

x86
Windows NT
defect

Tracking

()

VERIFIED FIXED
Future

People

(Reporter: andrea.aime, Assigned: serhunt)

References

()

Details

(Keywords: hang, perf, topembed-, Whiteboard: [PL2:Vendor]][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA])

Attachments

(5 files, 22 obsolete files)

8.00 KB, application/x-shockwave-flash
Details
1.29 KB, application/octet-stream
Details
25.19 KB, application/octet-stream
Details
2.03 KB, application/octet-stream
Details
55.04 KB, patch
serhunt
: review+
serhunt
: superreview+
Details | Diff | Splinter Review
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:0.9.9+) Gecko/20020314
BuildID:    2002031403

Just try to go to the cited URL and see... there are
lots of problems with this page, fixed background,
animated gifs and lots of flash animation, but in
IE 5 it works fine

Reproducible: Always
Steps to Reproduce:
1. go to www.informaniak.it
2.
3.

Actual Results:  The page should not hang mozilla with 100% CPU utilization

Expected Results:  Had to kill Mozilla...

Hardware: PII450 with 380 MB RAM
Also very slow on www.infomaniak.it, but CPU goes to 100% 
only when scrolling... seems that fixed background is 
playin a major part in the slowdown... maybe only a duplicate
of some other bugs?
It seems to be no problems with 21th march build under WinXP.

On my computer, of course ! ;-)
My 1.5 GHz machine loads this page but it seems to be chewing plenty CPU cycles.
 This particular page is triggering the Flash plugin but I've seen other pages
running other plugins like Acrobat Reader also cause the browser to become
unresponsive.  With this page loaded I can click in the URL bar (delayed
responsiveness though) but can't seem to close the button with the top right `x'
window close widget.  Over to plugins for a look.
Assignee: sgehani → beppe
Component: XP Apps → Plug-ins
QA Contact: paw → shrir
Could also be that it is unrelated to the page.
Today I saw Mozilla suddenly jump to 100%
CPU just by looking severely on it...
4.79 bows down too. same behaviour on this page as 6.x
Status: UNCONFIRMED → NEW
Ever confirmed: true
*** Bug 141400 has been marked as a duplicate of this bug. ***
*** Bug 144390 has been marked as a duplicate of this bug. ***
*** Bug 144713 has been marked as a duplicate of this bug. ***
*** Bug 143580 has been marked as a duplicate of this bug. ***
I heard that the latest release of flash 6.0r26 (probably) fixes these freeze 
issues. Can anyone check as well ? Thx!
Mozilla with Shockwave Flash 6.0 r29 still goes to 100%
Henrik: what sites did you visit? Did you test the ones listed here, or do you
have other places you test?
http://terraplus.terra.com.br/ is also making mozilla suck
Priority: -- → P2
Whiteboard: [Flash]
Target Milestone: --- → mozilla1.0.1
http://terraplus.terra.com.br/ 
I crashed my debug build on the last URL playing with back/forward. By the looks
of my stack and the source, I hope bug 66748 will fix the crash.

http://zooch.hn.org/mozcrash/mozcrash1.html
Something funky is going on here. I see the plugin get created and destroyed. A
simplified testcase would help.

http://www.ananova.com/assets/ticketsbanner.swf
Is more than normal (about 30-40%) but not 100% cpu using Flash 6.0r29
this site
http://radio2.dk/kobenhavn.htm
is also pretty high on CPU usage. It's Java.
using Sun Java JRE 1.4.0
ok..my findings:

This new version does definitely bring *down* the cpu usage on "shockwave.com" . 
Initially, it peaks up to 100%..but later on comes down withiina s econd ( this 
wasn't the case with 6.0r23)..however I did not see the two advertisement 
banners come up this time ( orbitz andd another one)..maybe that's the reason?

Also, the links Henrik mentioned are still showing a rise rise in cpu usage 
...playing in the mid 90's.  Especially, The terraplus site...it has some mambo 
jumbo music on it as well...it's really slowing things down. cc'ing Troy evans 
from Macromedia for his insight.
Quite likely that this depends on Windows flavor. ananova.com works fine for me
with 5.0 r41 on WinXP.
I tested on NT
http://www.ananova.com/assets/ticketsbanner.swf

definitely spikes up to 100% and stays there with flash6.0r29 on my NT.
av, are u using flash5.x? Pls try flash6 (r29). flash6.r23 introduced these 
issues..is what I learnt from people.
shockwave.com definitely does not chow on cpu with 6.0r29 .I just checked with 
the ad banners ON.
playing of  http://www.overclockmaniak.it/newsite/images/ventola.swf 
(I've saved the content of that file in this attachment)
is very cpu expensive and this is flash plugin problem not mozilla,
well mozilla problem is that it allows to run such kind of cpu hungry 
plugins on its main ui thread:(
BTW: with IE cpu usage jumps 100% too, but its ui still can respond.
Adding also topembed here. Equally important for embed-based products. Adding
Roger to the cc since terraplus.terra.com.br is major portal in Brazil. 
Keywords: topembed
I just repro'd this using a 1.0 mfcEmbed build from today.

As this is a problem in the flash plugin, the likely only way we can resolve
this is to start running plugins out in their own thread/process which isn't
going to happen anytime soon AFAICT.
Keywords: topembedtopembed-
http://www.idefense.com/XSS.html also seems to do some high cpu usage
*** Bug 146821 has been marked as a duplicate of this bug. ***
*** Bug 148341 has been marked as a duplicate of this bug. ***
Flash and high CPU usage also in bug 99361, bug 115666, bug 124643
a re-written Summary for this bug would help find it: high cpu usage with Flash?
life has not become any easier with this 6.0r29 drop from flash. just going to 
shockwave.com or the test url above still takes the cpu to around 100%..slows 
everything down on my box and stays there for a while. the browser is literally 
hung upside down...ugh !
Summary: 100% cpu utilization, I had to kill Mozilla → high(100%) cpu usage with Flash
Still works fine for me with 6.0 r29 on WinXP, CPU is about 30-40% in almost all
testcases. The test case in the attachment does show 100% but the browser is
still perfectly responsive. I am testing with NS7.
*** Bug 149849 has been marked as a duplicate of this bug. ***
shrir: what utility are you using to monitor your usage? I am not seeing
consistent peeking, in my testing I have seen small peeks in respect to mouse
events, which the plug-in instance may be coded to monitor. On other pages, I do
not see that same problem. I wonder if this is not due to event monitoring by
the plug-in
am using task manager...I do not move my mouse once I am on shockwave.com...and 
I see a lot of slowness...arun is shockwave looking into this? Thx!
*** Bug 151357 has been marked as a duplicate of this bug. ***
is anyone taking this bug seriously ? Oh pls.. look at the number of 
dup....trust me. Can someone form macromedia be contacted? Arun, before u leave 
on vacation, you gotta clear this one up :)...pls !
*** Bug 124643 has been marked as a duplicate of this bug. ***
shrir - I think people are taking this bug seriously, but based on Comment #27
From Judson Valeski. This sounds like a issue Macromedia needs to address to ME.
Whiteboard: [Flash] → [Flash] [adt2 RTM]
ok guys, heres the deal.  It's not a flash problem totally.  Flash eats a lot of
cpu cycles. It has to, to do what it does.  The problem is in mozilla, and it's
deep down inside it.  I have researched this issue, and some of the pages posted
here are getting cpu cycles eaten up because the flash is very large on the
screen.  Many times this takes up 100% or close cpu time in ie or mozilla, this
is not a problem.  The problem is that unlike ie, mozilla runs plugins in the
same thread as its ui.  This is why ie appears responsive while mozilla does
not.  One of the main problems i've had deals with multiple instances of flash.
 If there are multiple instances of flash on a page, they slow down the browser
considerably, this is because mozilla renders the flash even when it is not
visible in the window.  In IE, you can minimize the window, or scroll down the
page to where flash isnt visible, and the cpu cycles it eats go down.  IE doesnt
render flash that isnt in view! what a good idea!  I love mozilla, but i have
recently had to go to many websites with multiple flash instances(forums with
flash signatures).  This is killing my browser so i have to go to ie(which i
hate) so pleas fix this so i can use mozilla 100% all the time! 
This bug is not about a problem in Mozilla but rather something with Flash 6
because the same problem is present using 4.x with the testcases in this bug.

I don't believe the problem was present with Flash 5, but can someone confirm?

Does anyone know if Macromedia will fix this in a dot release?
Summary: high(100%) cpu usage with Flash → high(100%) cpu usage with Flash 6
Whiteboard: [Flash] [adt2 RTM] → [Flash] [adt2 RTM][NOT A MOZILLA ISSUE THIS IS A FLASH 6 ISSUE -- NOT MOZILLA]
Bug 124643 was marked a dupe of this, but Bug 124643 is linux-only.
Also, that bug clearly showed how flash used much more CPU time in Mozilla as
compared to Netscape 4.   (which would hint towards Mozilla problem, not flash).
Should that be re-opened??
i had the same problem with flash 5, its all in the way mozilla handles plugins.
 I dont think this will get fixed very quickly, due to the massive changes it
will take to do something like this.  Plugins need to be run in their own
thread, and plugins need to be turned on only when in the viewable area.  These
2 things will make flash in mozilla 100% better imho.  as i said before, flash
takes a lot of cpu cycles by nature, in all browsers, it's the fact that mozilla
handles this much less gracefully than IE.
Having a separate thread for a plugin may be a valid point, but I can hardly
imagine, say, some audio plugin which stops playing music when user scrolls it
out of view. Does Flash really stop do whatever it is doing? Maybe they have a
special communications with IE to allow the browser to stop plugin in such a case?
I do not think it is correct to shut down the plug-in if it is out of view, as
AV mentioned that is not appropriate for an audio instance. We also have to
contend with the tabbing option - if a user tabs to a different window -- should
we really stop playing the radio? We are looking at separating out the plug-ins
and determining how effective that will really be. As Peter has mentioned, we
have had discussions with Flash and they know that there is a issue and they are
actively working that issue.
As per comment 44 I'm not convinced that bug 124643 really is a dup of this. 
The complaint was that flash on Linux had become slower in newere versions of 
Mozilla (not that it chewed up 100% of the CPU - I don't think it does). Flash 
does not chew up 100% of the CPU for me and if it were a flash problem why 
don't older versions of Mozilla (or indeed Opera 6) run so slowly?

I would also ask that someone check that 100% CPU usage happen on not Windows 
OSes (I'm not sure it does but if someone can check and change the OS on this 
bug).
*** Bug 143759 has been marked as a duplicate of this bug. ***
*** Bug 152043 has been marked as a duplicate of this bug. ***
*** Bug 152186 has been marked as a duplicate of this bug. ***
check about Bug 152186 ... it's not a duplicate.. the cpu usage is only bout 
40 - 50%.. scrolling works okay.. the only problem is bout those 3 buttons
Blocks: 71668
Blocks: 152186
No longer blocks: 71668
Keywords: perf
this should have been marked as nsbeta1- already, this is out of the mozilla code
Keywords: nsbeta1nsbeta1-
*** Bug 104218 has been marked as a duplicate of this bug. ***
*** Bug 152731 has been marked as a duplicate of this bug. ***
*** Bug 152994 has been marked as a duplicate of this bug. ***
*** Bug 153408 has been marked as a duplicate of this bug. ***
http://www.macromedia.com/software/trial_download/

downloaded flash  development trial s/w from here. Then saved a flash ad from 
usatoday.com locally and launched it in just the standalone flash player (no 
browser was open , as u will see in the screenshot)...the CPU reaches 100% with 
just two instances of the flash file open. It's a single frame with fr/rate =15 
(shows the app debug info). 
http://jazz/users/shrir/publish/Clipboard.jpg

could not attach this big a file.
Doron and I just discovered if we set the flash plugin size to 1x1 pixel the CPU
usage goes way down but sound continues. Perhaps we could check if a plugin is
not visible and then artificially set its size to 1 by 1 pixel.
To update everybody, here's what I've found so far:

With the help of Spy++, as a test, I overrode Flash's default window procedure
with my own so I could filter windows messages. I put some printf's in and some
::GetTickCount()'s to measure how long between each message. I'm not sure if
this has anything to do with the problem, but here's what I found:

a) WM_TIMER messages seem to increase with complexity of swf and/or total
quantity of swfs open.

b) I get many WM_USER+1 (code = 0x0401) messages. Many, many more with two
macromedia.com windows opened.

c) Sometimes, I get a difference of zero in tick count between consecutive
0x0401 messages. Am I incorrectly sampling time?

d) So I tried consuming these 0x0401 messages when the diff was zero, and lo and
behold Flash would not do any animation however the background color seemed to
paint. It also would not send any more of these kind of events after that.
What's more interesting is that I could easily open many, many browser windows
with macromedia.com and still have a responsive GUI. 

I'm not sure if any of this info will help, but I wonder if anyone else thinks
there could be something up with the timing of these messages or perhaps in the
Flash painting? Ideas? Thoughts? Doron's observation that a 1x1 sized plugin
doesn't show this bug could possible mean it's related to painting.

Tomorrow, I may try to figure out a way to slightly delay the forwarding of
these 0x0401 messages. We already have a bug that we are at least 25% slower on
the Mac sending idle events and I hope the same bug won't be repeated on Windows. :(
Peter, don't rely much on GetTickCount, it has quite an error span. I think its
accurateness is somewhere around 50 ms.
|GetLocalTime| is probably better. However (from MSVC help):
"It is not recommended that you add and subtract values from the SYSTEMTIME
structure to obtain relative times. Instead, you should:
  o Convert the SYSTEMTIME structure to a FILETIME structure. 
  o Copy the resulting FILETIME structure to a LARGE_INTEGER structure. 
  o Use normal 64-bit arithmetic on the LARGE_INTEGER value."
*** Bug 118949 has been marked as a duplicate of this bug. ***
handing over to andrei to help resolve this issue
Assignee: beppe → av
Real fix shouldn't pass everything around in global vars, but this demonstrates
that we can throttle the WM_USER events that Flash is apparently using as a
callback timer without impacting the performance of the animation. In fact,
this hack gets better cpu usage than IE.

Cheers
Added Freeze to summary and bumped up to P1

The patch needs to be reviewed for risk, we also need to figure out how to
resolve this issue on mac and linux. Once the patch is considered ready for
primetime, this needs to be checked in on the trunk and baked for a few days. QA
will need to test it thoroughly. We will also need to get MM to test this and
let us know if it affects anything within their application.

This may be a bit too risky to check in at this point in time (on the branch).
Keywords: hang
Priority: P2 → P1
Summary: high(100%) cpu usage with Flash 6 → [HANG]high(100%) cpu usage with Flash 6
We need to consider several options in what to do with fast consequitive messages:
  1. just pretend they are consumed and return
  2. reschedule with a timer, and when it fires:
     2.1. do SendMessage (append to the MQ synchronously)
     2.2. do PostMessage (append to the MQ asynchronously)
     2.3. do CallWindowProc (call immediately)
  3. say it is not consumed and put it back to the message queue

I am a little concerned with the approach in the above patch. As far as I
understand it follows scenario 2.3. In this case we will accumulate things and
create new timers faster than old ones fire. We probably need some smart
compromise on dropping messages. 

Ideas?
The WM_USER throttling makes me very nervous. While it fixes the 100% bug, it 
will almost certainly introduce other bugs into the Flash Player, notably in 
regards to video and sound playback.

Also, there is some risk to Mozilla that WM_USER events become unreliable, 
which could cause Mozilla problems in the future.

I fear the throttling will lead to a "whack a mole" bug situation where you're 
always putting one down to find one more...

Peter's analysis is essentially correct: we post about 120 WM_USER messages per 
second, per movie, for a 12 fps movie. This is extravagent for the plugin 
architecture, and it is a bug we are reviewing. I think a solution on the Flash 
Player side makes more sense: we're causing the bug, and our code can make a 
better quality / responsiveness decision than the browser could make for us.

A related issue -- the plugin draws when off screen -- is compounding the 
problem. We're also working on a fix for this, put the "1x1" behavior is 
shutting down the paint loop, which works around the problem. Note that 
the "1x1" solution is inadvertantly taking advantage of an eccentric behavior 
of the Flash Player, and I would hesitate to rely on it for the future.

We're working on a fix, but have no timeline for release.

lee thomason
Flash Player Architect
Our system also gives higher priority to WM_USER events (which is how PLEvents
work) than other events like painting.
Summary: [HANG]high(100%) cpu usage with Flash 6 → [HANG]high(100%) cpu usage with Flash: WM_USER+1 events given too high priority and starve UI
Whiteboard: [Flash] [adt2 RTM][NOT A MOZILLA ISSUE THIS IS A FLASH 6 ISSUE -- NOT MOZILLA] → [Flash] [adt2 RTM][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA]
Lee: right now there are a variety of conditions where mozilla looks pretty
bad when running Flash. Any suggestions for things we can do in the short term
to address this?
For what it's worth, the throttling hasn't adversely affected any animations or
sound in my testing, and based on what Peter was seeing in Spy running IE,
you're getting the WM_USER events back from the queue at 70ms intervals in their
environment, so I didn't randomly choose that number.

That said, I'd much prefer the bug be fixed on the Flash side. As far as I can
tell, the message queue is being used as a heartbeat/timer. Are there any other
uses for those WM_USER events that I might want to know about?

Is Flash internally limited to a maximum frame rate or WM_USER callback rate? If
so we could match the timer to that maximum, and probably still solve the problem.
I was looking at Opera, not IE. IE looks like it's running Flash in some sort of
"windowless" mode so I don't think it's a good comparison. In looking at Opera,
I noticed different speeds, some as small as 10ms but they all seemed uniform
and most times were closer to ~50ms.
*** Bug 155815 has been marked as a duplicate of this bug. ***
Blocks: 154896
No longer blocks: 154896
Blocks: 154896
*** Bug 156268 has been marked as a duplicate of this bug. ***
Peter, we have a contact inside Opera if you think they are doing something
special  with flash.
"Any suggestions for things we can do in the short term to address this?"

Probably not; the essential problem is a bug in the Flash Player. However, even 
if you strive for the commendable goal of making the browser safe to ill-
behaved plugins, you tread a fine line between safety and introducing unforseen 
behavior to plugins (not just Flash). I'll be in on some brainstorming later 
this week on how that might be possible, but from what I know at this point 
it's not a quick fix. (Well, the fix may or may not, but the testing of both 
Mozilla and the relevant plugins will take some time.)

I don't mean to be overly conservative about this bug fix (Flash Player and / 
or Mozilla enhancement) but Mozilla and Flash needs a robust solution that we 
can be happy with for a long time.
An interesting behaviour I've noticed (I most probably missed something related
to this, so sorry if I'm repeating) in http://www.salo.cl is that every flash
piece seems to be highly responsive, while the rest of mozilla can't even manage
to redraw itself.

(Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.1a+) Gecko/20020709)
We noticed that IE takes no CPU when the window containing Flash is hidden,
while Gecko still takes up the cpu as if it was showing.  Is there a way to get
this benefit into our code?
IE looks to default to display flash in "windowless" mode so it has fine grain
control over the events. If the Flash Netscape plugin was windowless, I think we
could do the same. See bug 93959 for more information about
windowless/transparent Flash and also some sample code.
*** Bug 115666 has been marked as a duplicate of this bug. ***
Whiteboard: [Flash] [adt2 RTM][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA] → [Flash][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA]
Target Milestone: mozilla1.0.1 → mozilla1.0.2
This patch makes sure there is at least 33mS between each call to flash.

note: this (hack) only supports one flash per window
Attachment #90109 - Attachment is obsolete: true
Attached patch same hack moved to nsWindow (obsolete) — Splinter Review
Here's some changes I've been working on to the platform-specific widget
Windows code to do the same kind of throttle hack as the last few patches. It
doesn't quite work very good, but nsWindow is probably a more likely place to
implement some kind of subclassing as there is already code that supports it. I
just publicly exposed it and also added some code from bug 128127. Then I gave
plugin windows their own ProcessMessage routine. Some of the globals still need
to be moved into the class. The globals may be why this patch also does not
solve the problem of several windows open to the macromedia.com page. Kind
funny that this also has the side effect of indirectly fixing bug 38484. This
is a hack and not a real patch.
> Created an attachment (id=91206)
> same hack moved to nsWindow

Cool!

My current opinion is that we have 3 problems:

  1) Flash draws even when its drawing area is not visible.

  2) Mozilla has a single event loop which give priority to WM_USER events
  
  3) Moz should gracefully manage plugins that request service too often

Throttling plugins to some upper limit (say 30-50 per second) is a
good defensive strategy for moz. We should probably this add to moz. But I
want to point out that even with throttling I could still get the CPU 
load up to 100% by opening multiple windows with Flash running.

When the plugin load is 100% moz should still allow the user to use
the window close button (the "X" button) to get rid of offending windows.
Moz will probably also need to re-prioritize its event queue or have
multiple event queues.

However, It is my current opinion that the really critical problem is that
Flash continues to draw even when it drawing area is not visible (as Jon
pointed out in comment #42).

If this were addressed then I believe the chance of seeing the other 
problems (#2 and #3) diminish significantly.
Hm...I think we could do a better job of hiding the plugin too. This may help
bug 116766.

What we currently do is rely on the OS when a parent window is hidden. For
example, on a tab switch, nsWindow::Show(PR_FALSE) is called on the document's
widget. :ShowWindow(mWnd, SW_HIDE); is called on that window, but it doesn't
appear that that children are ever notified.

A possible hack could be going through the children when nsWindow::Show is
called and notifying (or possibly hacking the size) of windows that are plugins.
That last patch helps a little bit in that it allows for nsWindow to detect
plugins by eWindowType_plugin.

Of course, this does not apply to top level windows which is where the problem
appears to most manifests itself.
simple flash that has low drawing load
use this to see the load from the servicing the event loop
Removing minus from topembed to reconsider for embeddin
Keywords: topembed-topembed
My belief is that this was nsbeta1- and topembed- because the fix for this needs
to come from Macromedia in a new drop of Flash (i.e. 3rd party issue, not a
Mozilla issue). Returning the -, per my chat with Mike.
Keywords: topembedtopembed-
Whiteboard: [Flash][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA] → [Flash][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA][PL2:NA]
Alias: 100%CPU
The bug also occurs when you visit http://news.sina.com.cn , the biggest
chinese news site that has lots of flashes.
On my 500MHz PIII displaying http://news.sina.com.cn with today's branch and 
trunk builds with Flash6 r40 the CPU stays at 100%.

Using the experimental Flash6 r40 with the "non visible" fix the CPU is in the
60% range.

IE 6 seems to have problems displaying this page.
IE6 have no problem in displaying the http://news.sina.com.cn. I browse it
everyday with IE6 and lots of people in china visit the site using IE.
I don't know why, but the Flash IE uses (ActiveX) does not spend CPU rendering
Flash that is not visible. The current Flash that gecko uses (Netscape plugin)
does and thus pages like this use lots of CPU that is unnecessary. MacroMedia
has been contacted about this and we should see a fixed version in the near
future. Once this new version is out then we should retest to see what issues
exist then.
reassign to me since this is a vendor issue
Assignee: av → beppe
Whiteboard: [Flash][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA][PL2:NA] → [PL2:Vendor]][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA]
Target Milestone: mozilla1.0.2 → Future
Blocks: 139820
Blocks: 38484
I found that a small change to the event queue management made the buttons
responsive. There was a secondary problem and I ended up talking to danm.
He pointed me to a patch by hyatt that exactly takes out the code that I find
helps make the buttons responsive. 

  3.41 hyatt%netscape.com Oct 16 2001  Fix for 97805, also fixing event
                                       prioritization problems on Win32

http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&fi
le=nsAppShell.cpp&root=/cvsroot&subdir=mozilla/widget/src/windows&command=DIFF_F
RAMESET&rev1=3.40&rev2=3.41

Danm said hyatt did this to get a 10% performance improvment.
Yes, I remember this change. I had it measured at 7.5%, but 10% is the fish 
story version of that number :-). I'll give this another spin and see what 
the current number looks like.
So I just ran a couple of tests with and without that old change to remove the
PeekMessage for WM_USER-1. Whereas before, in Oct 2001, this made a big change
in pageload results, now, at least on win2k, I am not seeing a significant 
difference.

I probably need to try this out on win98 to see if there is a difference on 
that OS. I probably need to sanity check my tests too, since I'm puzzled. But
maybe this is no longer an issue (e.g., something else changed that made this
change moot?). [hyatt: I know you no longer remember anything about Win32, 
but maybe you have a comment].

[I should note that I'm running the current trunk build, but I also have 
pending changes from bug 144533 that substantially reduce the number of 
progress notifications that arrive in the front-end code.]
This patch prioritizes the minimize/maximize/close button so the respond even
when
flash has the CPU at 100%. It could be reduced to just test for the single
event 
WM_NCLBUTTONDOWN.
If the code includes WM_PAINT in the prioritization then I see the throbber 
blinking between its final or static background and the transitional images; 
eg: prioritizing this "::PeekMessage(&msg, NULL, 0, WM_USER-1, PM_REMOVE)", or 
any range that includes WM_PAINT seems to cause the throbber to blink.

the patch attachment 93671 [details] [diff] [review] helps but when flash is using 100% CPU and the user 
tries to move the window moz hangs in ::DispatchMessage(&msg); where it appears
to be reading the event queue in the natural (unsorted/unprioritized) order.

As long as Flash use PostMessage their events will have priority over keyboard
and mouse. This is the result of Microsoft's design choice to always handle
posted event before user/hardware events. The theory is that 2 hardware events,
eg: keydown followed by keyup, can generate multiple posted events; eg: keydown,
send char, keyup, which should be processed before the hardware events.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winui/winui/windowsuserinterface/windowing/messagesandmessagequeues/messagesandmessagequeuesreference/messagesandmessagequeuesfunctions/peekmessage.asp

As far as I can tell, as long as Flash uses PostMessage (or SendMessage) and not
a timer event, moz will always be subject to hangs when Flash is using 100%
CPU.
Note: bug 157144 is related to this. Bug 157144 covers the case where Mozilla
posts WM_APP events for notification of PLEvents which are used for reflow,
timers, network data available, etc. When Mozilla posts these events during the
load of long documents the UI will freeze if the user tries to move the top
level window or resize it.
 
Blocks: 151917
we can fix this by have a window proc that checks if user input is pending is if
it is then delays the plugin by putting it on a timer with a delay of 0.

PRBool nsWindow::ProcessPluginMessage(UINT msg, WPARAM wParam, LPARAM lParam,
LRESULT *aRetValue)
{
    *aRetValue = 0;
    // check if user input is being starved
    // input waiting so delay flash
    if (HIWORD(GetQueueStatus(QS_INPUT))) {
      if (msg == WM_USER+1) { // need to make this generic for all plugins
        plugin_wparam = wParam;
        plugin_lparam = lParam;
        //printf("delay Flash\n");
        return ::SetTimer(mWnd, NS_EVENT_THROTTLE_TIMER_ID, 0,
(TIMERPROC)EventTimerThrottleCallback);
      }
    }

    nsWindow *someWindow = GetNSWindowPtr(mWnd);
    if (someWindow)
       *aRetValue = ::CallWindowProc((WNDPROC)someWindow->GetPrevWindowProc(),
someWindow->mWnd,  msg, wParam, lParam);

    return true;
}
FYI: I'm working on a more complete solution over in bug 157144.
www.sina.com.cn fails to draw some of the flash areas if I delay these
messages: 1121, 1122, 1123
Attached patch WIP patch (obsolete) — Splinter Review
This still needs some work:

I still need to find out what Flash is doing with message IDs 1121, 1122, 1123.


This patch crashes if the Flash plugin is missing but this appears to be in the

code carried over from attachment 91206 [details] [diff] [review].
Attachment #91037 - Attachment is obsolete: true
Attachment #93671 - Attachment is obsolete: true
Attachment #96096 - Attachment is obsolete: true
I've opened bug 164084 to discuss whether we should remove the PeekMessage code.
Please cc: yourself if interested.
I find that with attachment 91206 [details] [diff] [review] and no Flash plugin moz crashes when I try to
go to http://www.sony.com.cn/home/index.htm
Attached file sample java applet
moz crashes displaying this applet
Just to mention another web site with slooooow performance on Mozilla (1.0.1
RC2), Shockwave Flash 6.0 r47 :
http://www.citroen.fr

(the site is horrible even on IE, but this is not the point !)
I think it is related to this bug too.
yes, the http://www.citroen.fr site can use 100%CPU and this WIP patch allows
gecko
to continue to respond to user input.
Attachment #91206 - Attachment is obsolete: true
Attachment #96259 - Attachment is obsolete: true
Should the owner for this bug be reassigned to Peter or bstell, as they are
actively working on the issue?
Here's a patch to only the plugins module that will put any WM_USER+1 messages
sent to the plugin's window on a normal Gecko PLEvent, like is used other
actions such as layout reflow and timers. On my 1.4 Ghz, I get slightly better
response from the UI on http://www.overclockmaniak.it/ with my patch than with
Brian's but I need to test on a slower system. Since this patch combines the
priority of both Flash's events with ours, it's a bit more difficult to lock us
up than before however it's still possible to have many Flash plugin instances
overwhelm us with events. Also this is quite touchy as when I rescheduled other
events besides WM_USER+1, Flash completely stopped working. I think the real
fix needs to come from Macromedia. This patch is also still a work in progress
as there are still remaining issues to be resolved but please try it out.
Peter, didn't we see the other day how a plugin steels our subclassing? Was it
with previous version of Flash?
Jamie: ultimately, this issue needs to be resolved by the plug-in vendor. Any
hack that we provide will truly only mask the issue, not resolve it. So,
regardless if someone is working on this or not, or if a patch is checked in or
not, this bug needs to remain open until such time as the vendor supplies the
resolution. Keeping this one assigned to me is fine, I am tracking vendor
related issues.
Another site using Flash which drive CPU to 100%: http://disney.com.cn
Attached patch revised patch using PLEvents (obsolete) — Splinter Review
Attachment #97862 - Attachment is obsolete: true
Attachment #98279 - Attachment is obsolete: true
Attachment #98535 - Attachment is obsolete: true
Attached patch patch v.11 (obsolete) — Splinter Review
This is slightly modified previous patch.

- I think using hash table to create window/instance associations is a bit
overkill here, we can do that using Win32 API
- I put winproc code in a separate file just to keep thing cleaner
- SetProp/GetProp API calls are not supported in Windows CE, is it going to be
a problem? From the other hand I found them elsewhere in the Mozilla code.
I worked with Rick Potts a bit today and he noticed that flash runs much faster
on Navigator compared to gecko.
Brian's last comment suggests that we have to be careful as we declare issues as
Macromedia issues (and NOT Mozilla issues) vs. issues we find are legitimately
Mozilla issues.  Brian stopped by and showed me how N7's Flash performance
varies distinctly from Netscape Communicator 4.x, which displayed Flash
animations working faster than on N7.
So perhaps this bug does indeed boil down to an issue which we can, at some
point, hand off to the plugin vendor. But a *separate bug* should be logged in
which we dig a bit deeper into our performance concerns.  I guess I'm looking to
Brian to log this bug, and to mention that bug number here for reference.
Is the following URL affectected by this precise bug?

Pay attention, I loose the control of Mozilla when I use it!

Patrick

---

http://www.speed.be/ADSL/resultat_adsl.asp
the http://disney.com.cn site has a 100% load from java and does not show a
UI lockup
Patrick, yes, running a build built with one of the last few patches does allow
UI responsivness for me on this URL:
http://www.speed.be/ADSL/resultat_adsl.asp
Without the patch, the page doesn't even complete loading for me before the hang.

----
I really like the approach Andrei took in attachment 98665 [details] [diff] [review] in that it creates a
new class for our platform specific plugin window code which is currently #ifdef
sprinkled throughout.
My concern with using the current PLEvent starvation code is that the paint 
starvation time of 750 mSec (3000 mSec for win9X) makes using menus very sluggish 
and cascading menus do not cascade at all since the PLEvent starvation code does 
not test for timer starvation.

I do not find that moz hangs when displaying 
http://www.speed.be/ADSL/resultat_adsl.asp but the startup time is very long.
s/I do not find that moz hangs/With these fixes I do not find that moz hangs/
*** Bug 151750 has been marked as a duplicate of this bug. ***
*** Bug 168197 has been marked as a duplicate of this bug. ***
We have kind of two separate tasks here: subclassing plugin window and message
handling. I think we should benefit from subclassing anyway and must take more
carefull approach here, the current patches limit us to 4x plugins only. After
talking to Peter who started to look at it, I will continue to try to find the
better way of implementing this. Meanwhile, message handling experimenting may
continue with the existing patch as it is more or less independent thing, and
whatever is  found the best eventually will be combined with whatever
subclassing approach Peter and I may come up with.
We should be able to solve this problem by rescheduling the WM_USER events as
PLEvents after I land the fix for bug 164931.  I am planning on making the
plevent native notification use timers on WIN32 when a document is not loading.
Using the patch in bug 164931 and rescheduling the WM_USER events as PL_Events
cut the test case in this bug from 100% to 90% in a Debug build and the UI
remained interactive while flash was running.

The patch in bug 163528 should also make Mozilla more interactive. It forces the
synchronous paint of any pending paint events before processing mouse or
keyboard input
I've also experienced this using the final 1.1 version.
You can "predictably" get the (almost) 100% CPU when visiting www.theinquirer.net

Also, sometimes "No reason" is clear (ie. no flash animations running or visible).

Usually "just waiting" brings back the UI, had to kill the app only on a couple
of ocasions
Painting with the latest patch occurs much more smoothly.
Attached patch patch v.13 (obsolete) — Splinter Review
This patch utilizes Peter's idea of subclassing the plugin window from object
frame rather than plugin instance code, because we don't have an access to the
plugin instance code in general case.

- code moved to layout
- new object |nsPluginNativeWindow| introduced, it is derived from
|nsPluginWindow| struct and has pure virtuals to manipulate native window stuff

- subclassing now happens for all plugins, although all the messages are just
relayed to the original winproc, except for WM_USER+1 for Flash when starvation
occurs
- event handling logic was just taken from the latest Brian's patch, Brian
please take a look to make sure I did not break anything.
Attachment #98665 - Attachment is obsolete: true
In general I do not care exactly how this solved. I'm okay with using the PLEvent
code for delaying the messages. There are several issues I do care about.

Avoiding unneeded malloc's
==========================
The patches I have seen so far put all Flash messages on the PLEvent queue. This 
causes us to use a malloc'd class to hold the event info. In general I prefer to 
avoid malloc'ing when not necessary. For Flash messages which are projected to 
come in at 120 messages per second per Flash area I really would strongly prefer 
not to malloc. For example, the www.sina.com.cn page today has 9 Flash animations
which if there were 120 messages per second would mean 1080 malloc's per second. 
I believe that to avoid malloc'ing for every message we  will need to test for 
starvation locally and only defer the message when starvation is happening.
Perhaps we can reuse the PLEvent starvation logic locally.


Relatively Prompt Painting
===========================
Bug 164931 may allow us to use a much shorter paint starvation time. Since page 
layout is a one time event we probably want a relatively long time so we do not 
paint too often. This optimizes layout speed. If menus are sluggish during page 
layout that seems okay since this will not last long. Flash animations are quite 
different in that they are a continuous activity so we need prompter painting so 
menus are usable. The current 3 second starvation for Win9x seems far too long 
for menus to be usable. 


Timer Starvation Detection
==========================
If we use PLEvents we will need to add timer starvation logic so cascading menus 
will work.
Blocks: 169071
"Bug 164931 may allow us to use a much shorter paint starvation time."

When the fix for bug 164931 is checked in *all* PL_Event notification will be
done with a timer except when a document is loading. The input and paint
starvation limits will only apply when a page is loading.  All painting will
complete before the next PL_Event notification is processed. There will not be
any paint starvation.  If I check in the fix for bug 164931 and we continue to
use posted WM_APP events for flash after the page is loaded flash will be able
to starve out Mozilla layout, paint and timer events.  
"Avoiding unneeded malloc's"

this is probably an issue for Mozilla timers as well since they are implemented
as a separate thread which posts a PL_Event when the timer expires. DHTML
animations often have at least one timer and some have many timers running
simultaneously. Maybe we need to create an arena for PL_Events?

"Relatively Prompt Painting"

The fix for bug 164931 should eliminate this issue for flash animations after
the page has loaded. Paints should not be starved for flash or Mozilla after the
page has loaded since the timers for PL_Event notification will not fire until
the paints have been processed

"Timer Starvation Detection"

The fix for bug 164931 should also eliminate this issue for flash animations
after the page has loaded. 


If we decide not to fix this issue by converting the WM_USER flash user messages
to PL_Events I would suggest always converting the WM_USER flash messages into
timer events instead of doing it only when timers, paint, and input are starved.
 This will significantly lower the CPU utilization on pages with flash animations.
Per Peter's suggestion I made a version that adds a PluginWindowEvent as a member 
of the ns4xPluginInstance window to avoid the malloc per message. The patch 
always puts Flash messages on the PLEvent queue. I also applied the patch from 
bug 164391 which makes the UI responsive even when running 
http://www.overclockmaniak.it/.

I do note that with the patch from bug 164391 that the CPU never goes over about 
80% on the 1.8 GHz system I am developing on and the Flash animations seem a bit 
slower.

I heard that Andrei is working on a revision. If desired I could merge this into
his new version.
the previous message should read: "bug 164931"
Taking it back since I am one of those who is working on it.

Brian, Peter, are we decided on what approach to take eventwise? Brian, you may
post your latest version using any of the last three patches since the event
code is pretty distinct and I can easily incorporate it to whatever subclassing
code I will come up with.
Assignee: beppe → av
I think the problem of always mallocing PLEvents is one beyond the scope of this
bug. I've opened bug 169247 to investigate using areans for PLEvents throughout
the code.
Andrei: here is a version that uses the PLEvent queue and only malloc's once.
The patch in bug 164931 is also required to actually prevent UI starvation.
Attachment #98957 - Attachment is obsolete: true
Depends on: 164931
Attached patch patch v.15 (obsolete) — Splinter Review
This patch combines the latest from event handling (the very previous patch)
and subclassing. I think it is complete enough and is in working condition. I
tested it with Flash, the default plugin, scriptable sample, Java and Real.
Please review.
Attachment #99307 - Attachment is obsolete: true
Attachment #99558 - Attachment is obsolete: true
*** Bug 160775 has been marked as a duplicate of this bug. ***
Attached patch patch v.16 (obsolete) — Splinter Review
Some Peter's comments from our conversation addressed:

- default native window stubs for other platforms are now common and reside in
one file
- full page mode is added to the mechanism
- |nsIPluginInstance->SetWindow| call is now encapsulated in
|nsPluginNativeWindow->CallSetWindow|
- undoing subclassing is now hidden in the dtor

Please review.
Attachment #99611 - Attachment is obsolete: true
Andrei: I think we should add code to remove any pending events when destroying
the window. I believe it should look like this:

 nsPluginNativeWindowWin::~nsPluginNativeWindowWin()
 {
+  UndoSubclassAndAssociateWindow();
 }

 PluginWindowEvent*
 nsPluginNativeWindowWin::GetPluginWindowEvent(HWND aWnd, UINT aMsg, WPARAM
aWParam, LPARAM aLParam)
 {
   if (!mPluginWindowEvent)
     mPluginWindowEvent = new PluginWindowEvent;
 
   if (mPluginWindowEvent) {
     mPluginWindowEvent->Init(aWnd, aMsg, aWParam, aLParam);
-    PL_InitEvent(mPluginWindowEvent, null, &PluginWindowEvent_Handle,
PluginWindowEvent_Destroy);
+    PL_InitEvent(mPluginWindowEvent, this, &PluginWindowEvent_Handle,
PluginWindowEvent_Destroy);
   }
   return mPluginWindowEvent;
 };




 nsresult nsPluginNativeWindowWin::UndoSubclassAndAssociateWindow()
 {
   HWND hWnd = (HWND)window;
   if (IsWindow(hWnd))
     ::RemoveProp(hWnd, NS_PLUGIN_WINDOW_PROPERTY_ASSOCIATION);
 
+  // clear any pending events to avoid dangling pointers
+  nsCOMPtr<nsIEventQueueService>
eventService(do_GetService(kEventQueueServiceCID));
+  if (eventService) {
+    nsCOMPtr<nsIEventQueue> eventQueue;  
+    eventService->GetThreadEventQueue(PR_GetCurrentThread(),
getter_AddRefs(eventQueue));
+    if (eventQueue) {
+      eventQueue->RevokeEvents(this);
+    }
+  }
+  if (mPluginWindowEvent) {
+    delete mPluginWindowEvent;
+    mPluginWindowEvent = NULL;
+  }
+
   return NS_OK;
 }
Kevin: does this look correct to you?
Attached patch patch v.17 (obsolete) — Splinter Review
- event leak fixed, my fault, thanks, Brian for helping with this one
- some minor perfections added to nsPluginNativeWindow base class and its
derivative for Windows

Please review.
Attachment #99734 - Attachment is obsolete: true
Except for the fix to the assertion this looks fine to me.

-  NS_ASSERTION(!win, "plugin window already has property");
+ NS_ASSERTION(!win || (win == this), "plugin window already has property and
this is not us");

I feel competent to review the event queue part, so
r=bstell@ix.netcom.com on the event queue part.

Perhaps Peter would r= the subclassing part.

Thanks for working on this!
Attached patch patch v.17.1 (obsolete) — Splinter Review
Very minor change to patch v.17

- fixed assertion in |nsPluginNativeWindowWin::SubclassAndAssociateWindow|
after ::GetProp
- made property name a bit more unique
Attachment #99749 - Attachment is obsolete: true
Comment on attachment 99762 [details] [diff] [review]
patch v.17.1

Carrying over r=bstell on the event part.
Comment on attachment 99762 [details] [diff] [review]
patch v.17.1

r=peterl
Attachment #99762 - Flags: review+
am testing av's special build: 
ran thru flash, qktime, real shockwave, viewpoint, acrobat, still testing.

found one crasher on cbsnews.com (video url on home page, click to view..and 
then while the video is playing, close the window  using 'x'..browser crashes). 
This is using real1 and real8 and happens only on this special build. 
same thing happens on cnet radio live(cnet.com ,click on 'Listen Live' on 
bottom right). Confirmed with Beppe.
 
will update once again after finishing my tests.
I'm seeing the same thing Shrir is seeing but only with the build found here:
http://warp.mcom.com/u/av/publish_html/mozilla-100CPU.zip
If I apply attachment 99762 [details] [diff] [review] to my debug trunk build, I do not crash. I'll double
check the optmized build I built with this patch at home later.
I am also testing av's build. Tested flash, qt, java, and real1. I had up to 10
different windows open each with at least one active plug-in (some had several
instances). I did not experience reduced performance, the app did not lock up
either. I ran through most of the URL's referenced in this bug as well as the
macromedia site, the quicktime site, bmwfilms, aim.com, izzy-sings.com. It's
looking great.

I was able to repro the crash that Shrir describes, but could not repro the
crash on the current trunk build.
and WMP too -- forgot to mention that one
Well, I don't build optimized frequently, so maybe I did not do it right way.
Could have I missed something? I specified MOZCONFIG with the following three
options:
  ac_add_options --enable-crypto
  ac_add_options --disable-debug
  ac_add_options --enable-optimize
Then updated the tree, clobbered and rebuild. Zip only contains /bin folder content.
We need to make sure that this is not like my local build is spoiled. I am
rebuilding with symbols now, would be nice if somebody can rebuild optimized
with the patch on the fresh tree.
Comment on attachment 99762 [details] [diff] [review]
patch v.17.1

- In nsObjectFrame.cpp:

+  nsCOMPtr<nsIPluginInstance> inst;
+
   // we need to finish with the plugin before native window is destroyed
   // doing this in the destructor is too late.
   if(mInstanceOwner != nsnull)
   {
-    nsIPluginInstance *inst;
-    if(NS_OK == mInstanceOwner->GetInstance(inst))

Why move the declaration of |inst|? Is it used in a broader scope now? Also:

-	   inst->SetWindow(nsnull);
+	   window ? window->CallSetWindow(nullinst) : inst->SetWindow(nsnull);

For readability's sake I'd suggest you rewrite the above with an if statement,
i.e.:

+	   if (window) {
+	     window->CallSetWindow(nullinst);
+	   } else {
+	     inst->SetWindow(nsnull);
+	   }

- In nsPluginInstanceOwner::GetWindow():

 {
-  aWindow = &mPluginWindow;
+  aWindow = mPluginWindow;
   return NS_OK;
 }

Should we assert that mPluginWindow is not null here? Before this change a
caller never got null back from this method, now they could possibly get null
back...

- In nsPluginHostImpl.cpp:

+    rv =aInstance->GetPeer(getter_AddRefs(peer));

Be consistent with space around '=' (I know you didn't introduce this
inconsistency here, but clean it up while you're at it).

- In nsPluginNativeWindow.h:

+#ifndef _nsPluginWindow_h_
+#define _nsPluginWindow_h_
...
+#endif //_nsPluginWindow_h_

Those should all be _nsPluginNativeWindow_h_

- In nsPluginNativeWindow.cpp:

+class nsPluginNativeWindowPLATFORM : public nsPluginNativeWindow {
+public: 
+  nsPluginNativeWindowPLATFORM();
+  ~nsPluginNativeWindowPLATFORM();
+};

Since nsPluginNativeWindow has virtual methods, you'll need to mark the
destructor in this class as virtual too.

+nsPluginNativeWindowPLATFORM::nsPluginNativeWindowPLATFORM() 
+{

Don't you want to call nsPluginNativeWindow's ctor here too (for clarity and
consistency, if nothing else)?

- In nsPluginNativeWindowWin.cpp:

+static NS_DEFINE_CID(kEventQueueServiceCID, NS_EVENTQUEUESERVICE_CID);
+static NS_DEFINE_IID(kCPluginManagerCID, NS_PLUGINMANAGER_CID); // needed for
NS_TRY_SAFE_CALL

Don't use NS_DEFINE_IID for defining CID's, use NS_DEFINE_CID in both those
places.

+class nsPluginNativeWindowWin : public nsPluginNativeWindow {
+public: 
+  nsPluginNativeWindowWin();
+  ~nsPluginNativeWindowWin();

Mark this dtor virtual too.

+private:
+  nsresult SubclassAndAssociateWindow();
+  nsresult UndoSubclassAndAssociateWindow();
+
+public:
+  nsresult CallSetWindow(nsCOMPtr<nsIPluginInstance> &aPluginInstance);

Mark this virtual too, for clarity if nothing else.

+
+public:
+  // locals
+  WNDPROC GetWindowProc();
+  PluginWindowEvent * GetPluginWindowEvent(HWND aWnd, UINT aMsg, WPARAM
aWParam, LPARAM aLParam);
+
+private:
+  WNDPROC mPluginWinProc;
+  PluginWindowEvent *mPluginWindowEvent;

What's up with all those public and private keywords? :-) Can't you reorder
things a bit so you only need one of each?

- In ProcessFlashMessageDelayed()

First of all, this method returns an nsresult and based on that result we
decide if the message was processed or not. That's IMO not clean, nsresults
have special meaning, and since we don't use the return value for anything
other than to know if the message was processed or not, don't use an nsresult.
How about using a PRBool here in stead? I.e. the method returns PR_TRUE if the
message was processed and PR_FALSE if it wasn't processed.

Second, we do a fair bit of work here if we're processing a WM_USER+1 message.
We should at the very least use a boolean to cache if the plugin is flash or
not (i.e. have a boolean in the native window or somesuch, or set a property on
hWnd) so that we don't need to get the plugin mime type for every message and
do a string compare.

Additionally, we should really also cache the nsIEventQueueService on the
plugin window so that we don't need to hit the service manager every single
time we go through this code (which can be really frequently).

Also, the code:

+    nsCOMPtr<nsIEventQueue> eventQueue;
+    eventService->GetThreadEventQueue(PR_GetCurrentThread(),
getter_AddRefs(eventQueue));

should be changed to:

+    nsCOMPtr<nsIEventQueue> eventQueue;
+   
eventService->GetSpecialEventQueue(nsIEventQueueService::UI_THREAD_EVENT_QUEUE,
getter_AddRefs(eventQueue));

Since that call is faster (AFAIK) than the GetThreadEventQueue. We're
quaranteed to always be on the UI thread here, right?

Then:

+      PluginWindowEvent *pwe = win->GetPluginWindowEvent(hWnd, msg, wParam,
lParam);
+      if (pwe) {
+	 eventQueue->PostEvent(pwe);
+      }

GetPluginWindowEvent() always returns the same event, are we guaranteed that
the event is not already in use? I don't see any code that guarantees that, and
I don't feel good about making the assumption that flash will never post a
WM_USER+1 message before the last one has been processed. Or did I miss
something here?

- In nsPluginNativeWindowWin::GetPluginWindowEvent():

+  if (!mPluginWindowEvent)
+    mPluginWindowEvent = new PluginWindowEvent;
+
+  if (mPluginWindowEvent) {
...
+  return mPluginWindowEvent;

You can avoid the double check on mPluginWindowEvent (in the case when there is
an mPluginWindowEvent) if you'd write the above as:

+  if (!mPluginWindowEvent) {
+    mPluginWindowEvent = new PluginWindowEvent;
+
+    if (!mPluginWindowEvent) {
+      return nsnull;
+    }
+  }
...
+  return mPluginWindowEvent;

And this code really should make sure that it doesn't return an event that's
already in use. Maybe the code that fires this event can null out
mPluginWindowEvent when it fires the event and then once the event is destroyed
mPluginWindowEvent could be set to point to the event again, unless it already
points to another event in which case you'd just free the memory used by the
handled event.

- In PLUG_NewPluginNativeWindow():

+  *aPluginNativeWindow = new nsPluginNativeWindowWin();
+
+  return *aPluginNativeWindow ? NS_OK : NS_ERROR_FAILURE;

Use NS_ERROR_OUT_OF_MEMORY to flag OOM, not NS_ERROR_FAILURE.

- In nsPluginViewer.cpp:

     }
   }
-
-	
	return NS_OK;
 }

Please clean up that weird indentation. Remove the tab(s).

- In pluginInstanceOwner :: GetWindow():

 {
-  aWindow = &mPluginWindow;
+  aWindow = mPluginWindow;
   return NS_OK;
 }

Again, should we assert if mPluginWindow is null?

Other than that, the changes look reasonable to me, but I heard rumors about a
crash, so I'll hold my sr for now, and the above needs some fixing too...
Attachment #99762 - Flags: needs-work+
I think I know why we crash. We don't want to subclass after we called
SetWindow(null). Patch is coming.
This fixes real crash on sbsnews. Please tell me if it looks reasonable.
Johnny, thanks for great comments. Brian and me will address all of them in the
incoming patch. I just have a couple of things on which I disagree:

> Since nsPluginNativeWindow has virtual methods, you'll need to mark the
> destructor in this class as virtual too.

Destructor of the base class is marked virtual. According to MSVC help "The
virtual keyword is needed only in the base class's declaration of the function;
any subsequent declarations in derived classes are virtual by default." Is not
this standard? Also, is not the destructor always virtual? I will change it but
just wanted to know for sure for the future.

>+nsCOMPtr<nsIEventQueue> eventQueue;
>+eventService->GetSpecialEventQueue(nsIEventQueueService::UI_THREAD_EVENT_QUEUE,
getter_AddRefs(eventQueue));
>
> Since that call is faster (AFAIK) than the GetThreadEventQueue. We're 
> quaranteed to always be on the UI thread here, right?

No. Java is not in the UI thread AFAIK, and being in our thread or not is up to
plugins themselves, this is not under our control. I would not change this
unless somebody convinces me.
Attached file WIP: rework the event code (obsolete) —
Attachment #99950 - Attachment is obsolete: true
Attached patch patch v.18 (obsolete) — Splinter Review
- crash on closing plugin window is fixed
- some clean up and effectiveness is done, thanks, Johnny for comments!
- event handling logic, when new event comes before old is handled, fixed,
thanks, Brian

Please review.
Attachment #99762 - Attachment is obsolete: true
Attachment #99923 - Attachment is obsolete: true
1) could we change:
-  PRBool InUse() { return !(mWnd==NULL && mMsg==0 && mWParam==0 && mLParam==0); };
+  PRBool InUse() { return (mWnd!=NULL || mMsg!=0); };

2) since we use "(mWnd!=NULL || mMsg!=0)" to tell that the event is in use
can we add an assertion

 void PluginWindowEvent::Init(HWND aWnd, UINT aMsg, WPARAM aWParam, LPARAM aLParam)
 {
+  NS_ASSERTION(aWnd!=NULL && mMsg!=0, "invalid plugin event value");
   NS_ASSERTION(mWnd==NULL && mMsg==0 && mWParam==0 && mLParam==0,"event already
in use");

 PluginWindowEvent*
 nsPluginNativeWindowWin::GetPluginWindowEvent(HWND aWnd, UINT aMsg, WPARAM
aWParam, LPARAM aLParam)
 {
   PluginWindowEvent *event;
   if (mPluginWindowEvent.InUse()) {
+    // We have the ability to alloc if needed in case in the future some plugin
+    // should post multiple PostMessages. However, this could lead to many
+    // alloc's per second which could become a performance issue. If/when this
+    // is asserting then this needs to be studied. See bug 169247
+    NS_ASSERTION(1, "possible plugin performance issue");
     event = new PluginWindowEvent;
> GetPluginWindowEvent() always returns the same event, are we guaranteed that
> the event is not already in use?

This is a good point. While it is true that currently a given Flash only posts
one at a time this could change in the future. Code to alloc was added in
attachment 99954 [details] [diff] [review]. We should have the assertion mentioned in my comment #167 to
notice if this code is being used so we can check for performance issues.
used the newer special build:
stuff seems great, the crasher is gone, other plugins work fine too.
However, another issue came up...

steps:
1. go here : 
http://www.mozilla.org/quality/browser/front-end/testcases/plugins/ra5.html 
2. observe that a dialog comes up saying 'real network is being contacted...'  
which ultimately comes back saying  'nothing was found to play this clip' (.ram 
which runs just fine
on 4.79) 
3. Click CANCEL on this dialog and click BACK to go back to the previous page... 

Result : Cpu goes to 100% and stays there...previous page loads after a big 
delay but cpu is still 100% and UI navigation is impossible. 
Av, yes, if a base class' method (including the destructor) is marked virtual
the virtual keyword is implied on any overridden methods, but for clarity it's
always good to put the keyword in there anyways. And in your case, the base
class' destructor is virtual so you're ok, at least kinda. You're out on thin
ice here though since nsPluginNativeWindow (which is the class that has the
destructor that's actually marked virtual) inherits from nsPluginWindow which is
a struct that doesn't have a destructor, let alone a virtual one. This causes
weird things to happen, depending on the compiler you're using, some cases are
worse than others.

Independent of the compiler, if someone was to delete one of these
nsPluginNativeWindows objects through an nsPluginWindow pointer w/o doing the
proper cast, the nsPluginNativeWindow's destructor wouldn't be called. Same goes
for the destructors on any other classes that inherits nsPluginNativeWindow, of
course.

Using MSVC, you'd even crash if you'd do the above. The reason for this is that
when you instanciate a new nsPluginNativeWindow the MSVC compiler will put the
vtable for the virtual methods in the first 4 bytes of the chunk of memory
allocated for this new object. This means that the nsPluginWindow is at a 4-byte
offset from the start of the memory that was allocated, and thus, if you cast
this object to a nsPluginWindow you'll get a pointer to the nsPluginNativeWindow
+ 4 bytes, and delete (and internally free) won't be able to free that memory
(due to the 4-byte offset). Using gcc, you'd be ok since it seems to put the
vtable at the end of the memory allocated for a class, but on windows you're
kinda screwed.

This is fragile, at the very least. I assume we can't change nsPluginWindow any
more, right? If we could, this whole mess could be solved by adding a virtual
destructor to the nsPluginWindow struct, but w/o that we're stuck. The other
option which is a bit more work and a bit trickier to start with, but much more
robust once done, is to not mix C-style structs with C++ classes that have
virtual methods. IOW, you'd need to rewrite the nsPluginNativeWindow so that
there are no more virtual methods, you'd need to use function pointers or
something like that to make that code work w/o virtual methods in
nsPluginNativeWindow. You'd need to manually do the cleanup of everything that a
nsPluginNativeWindow holds on to, and that would get kinda messy :-(.

I'm not sure what to suggest here, if you think the code is safe and you know
that a nsPluginNativeWindow will never be deleted through a pointer to
nsPluginWindow, then I guess we can live with that if we choose to accept this
fragility. I guess you could add a protected non-virtual destructor to
nsPluginWindow so that it's impossible to call delete on a pointer to a
nsPluginWindow directly (that would not change the layout of the struct), would
that make sense? (You'd need to wrap that in an |extern "C++" { }| block to
leave the struct usable from C.)

As far as the threading issues goes, you're right, Java and other plugins may
run on other threads, but flash doesn't AFAIK, and the code in question would
never be hit using other plugins. Either way, it's safer to do what you did, and
who knows, maybe flash does, or will some day, run on other threads as well...

Let me know what you think about the above virtual method stuff and if you think
that what you have is what you want then I'll go over this patch once more and
sr it for you.
Johnny, thanks for the clarification! I was not objecting to marking functions
virtual (in fact, I did it in the last patch, v.18), I just wanted to make sure
I understand your motivation. It's perfectly fine with me.

You are right we cannot change nsPluginWindow -- it stems from 2.0 days and
everybody is relying on the way it is now. The good news is nobody except us is
supposed to delete this object, so I think we are fine here as we will do proper
cast as necessary. Adding a protected non-virtual destructor to nsPluginWindow
seems reasonable to me, as it would catch the crash situation you described on
the compilation stage.
*** Bug 162007 has been marked as a duplicate of this bug. ***
I'd be cool with this if we'd add a protected destructor to nsPluginWindow, that
way we won't do the wrong thing in our code...
it appears to be in the real player plugin PNEN3260
Looks like I spoke too soon. We cannot add a destructor to |nsPluginWindow|.
This struct is used as a union member in other structs in nsplugindefs.h.

Johnny, I think we are still fine, because only we in our code will delete it.
If you think is not acceptable, I can rewrite the code so that this SetWindow
call is not encapsulated and we don't derive from |nsPluginWindow| at all. This
will be less clear and not as convenient though. Please let me know.
my last comment, #174, is regarding the infinite loop mentioned in comment 169
http://bugzilla.mozilla.org/show_bug.cgi?id=132759#c169
the real player infinite loop of comment 169 appears (as Andrei mentioned in a 
private email) to happen for RealPlayer 8 but not RealOne.
Attached patch patch v.19 (obsolete) — Splinter Review
- more accurate implementation of subclassing/unsubclassing. This fixed the
reported Real 100% CPU problem
- big comment about potential danger during destruction of the
|nsPluginNativeWindow| object added to nsPluginNativeWindow.h
- latest Brian's additions to event handling incorporated
Attachment #99954 - Attachment is obsolete: true
I have tested the latest test build with the above patch included. I confirm all 
the issues that had been reported have now got fixed(cpu usage issue for real). 
Did extensive functionality testing of flash/scriptable flash, shockwave 
director,quicktime,ipix,viewpoint,realplayer,acrobat (tier 1 & 2 plugins). 
Everything is working fine. 
Attachment #100352 - Flags: review+
Comment on attachment 100352 [details] [diff] [review]
patch v.19

r=peterl
Comment on attachment 100352 [details] [diff] [review]
patch v.19

- In nsPluginHostImpl::HandleBadPlugin():

-    rv =instance->GetPeer(getter_AddRefs(peer));
+    rv =aInstance->GetPeer(getter_AddRefs(peer));

Since you're already touching this code, add a space after '='.

- In nsPluginNativeWindowWin::GetPluginWindowEvent():

+    event = new PluginWindowEvent;

Add () after the type name. I guess it doesn't matter to the compiler, but the
parens are normally there, so for consistency's sake if nothing else...

+    if (event) {
+      event->Clear();

No point in calling Clear() here, the ctor of PluginWindowEvent call Clear()
for you.

+      event->SetIsAlloced(PR_TRUE);
+    }
+  }
+  else {
+    event = &mPluginWindowEvent;
+  }
+
+  if (!event) {
+    return nsnull;
+  }

This last if check should be moved into the if (mPluginWindowEvent.InUse())
check since &mPluginWindowEvent is never null.

sr=jst if you change that when you check in.
Attachment #100352 - Flags: superreview+
We also need to null out the |nsPluginNativeWindow| struct in generic platform
implementation, as Linux checks on some fields before accessing them (ws_infor).
Thanks Serge for pointing this out.
Attached patch patch v.20Splinter Review
- the above comments are addressed
- priming |nsPluginWindow| struct fields with null added to the default
|nsPluginNativeWindow| constructor platform implementations
- added a new method |nsPIPluginHost::DeletePluginNativeWindow| to pair with
|nsPIPluginHost::NewPluginNativeWindow|, this is done mainly because some
compilers don't like to delete objects using pointers to the base class, so we
need to cast it back to platform specific type before deleting.
Attachment #100352 - Attachment is obsolete: true
Comment on attachment 100631 [details] [diff] [review]
patch v.20

carrying over r= and sr=
Attachment #100631 - Flags: superreview+
Attachment #100631 - Flags: review+
Checked in to the trunk. Marking fixed. Please try it heavily.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Using today's trunk build (2002-09-26-08-trunk), I get crash consistently when
dragging around the flash pop-up on http://www.ccidnet.com
I'm on Simplified Chinese XP.
http://www.overlookmaniak.it and http://www.yesky.com works for me though with
the same build. The UI lockup problem is gone and CPU usage dropped down quite a
bit for these two URLs.
Using 2002092608 on Win2k
http://www.ccidnet.com hangs mozilla
(or hogs so much CPU that it cannot be used)
Problem with ccidnet.com seems to be yet another issue. In the console output I
see that while loading the page layout keeps creating webshells and gives up
when the number reaches 100. I remember filing a bug on this the other day. This
is likely because bad page coding causes some sort of recursion, e.g.
window.reload in JavaScript will cause this, and layout does not have a
mechanism to handle such a situation protectively.
Here is the similar bug -- bug 126466, it even has the same original url. It is
marked fixed in May. Regressed?
And the problem exists even without Flash. Should we reopen that bug?
I compared the 2002-09-24-08-trunk with 2002-09-26-08-trunk (2002-09-25-08-trunk
fails to install)

http://www.overclockmaniak.it 
  9-24 = hard freeze, no ui response including unable to close with "x"
  9-26 = a bit sluggish but responds to all UI input

http://www.yesky.com
  9-24 = hard freeze, no ui response including unable to close with "x"
  9-26 = a bit sluggish but responds to all UI input


http://www.ccidnet.com
  9-24 = hard freeze, no ui response including unable to close with "x"
  9-26 = this freezes but in an odd way; memory use goes up a down,
         app can be moved around, app appears not to paint completely,
         the "x" button visually responds but instead of closing the
         app a dialog appears saying the app is not responding and asking
         if the app should be ended.
On http://www.ccidnet.com I note that when I move the mouse over a link then the 
flash animations stop.
When I disable flash by removing it from the plugins dir http://www.ccidnet.com
comes up, the flash plugin areas show the missing plugin symbol, and the page has 
a low CPU load.  When I move the mouse over a link then the CPU load goes to
100% and the app freezes (marquee stops, jpgs/gifs stop) but I can still move the
app and the "x" button visually responds but brings up a dialog saying the app is
not responding.
With the 9-24 build if I cover the flash areas the CPU load is about 80% and the
"x" is visually responisive until I move the mouse over a link then the load goes
to 100% and the app stop responding.

I fairly positive that the http://www.ccidnet.com issue is not related to this
bug's checkin.
*** Bug 155001 has been marked as a duplicate of this bug. ***
*** Bug 171057 has been marked as a duplicate of this bug. ***
guess I can mark this bug verif now.
Status: RESOLVED → VERIFIED
*** Bug 154070 has been marked as a duplicate of this bug. ***
***IF*** we were to land this on the 1.0 branch, what other patches do we need?
Do we need the patches for bug 164931 and bug 163528?
Kevin's work in bug 164931 is a companion fix with this bug, so yes it will also
need to go in. I'm not sure of bug 163528, looking at the fix date, it does not
seem to be related, but I will leave that up to av
I think this description is accurate. Bug 163528 is not needed for the patch in
the present bug to work.
*** Bug 179032 has been marked as a duplicate of this bug. ***
*** Bug 179752 has been marked as a duplicate of this bug. ***
Blocks: 507216
(In reply to Henrik Gemal from comment #10)
> http://www.ananova.com/assets/ticketsbanner.swf is also a 100%

> http://www.dailyhosting.net/blog is also a 100%
You need to log in before you can comment on or make changes to this bug.