Closed Bug 271050 Opened 15 years ago Closed 15 years ago

Improve Camino's handling of Gecko Events

Categories

(Camino Graveyard :: General, defect)

PowerPC
macOS
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Camino0.9

People

(Reporter: me, Assigned: sfraser_bugs)

References

()

Details

(Keywords: fixed1.7.6)

Attachments

(5 files)

Camino currently handles gecko events by setting up an NSTimer which fires off every 0.005 seconds 
and checks the gecko event queue. Not only does this make me feel dirty, it's been theorized that this 
is the source of some of Camino's performance problems, particularly although not 
exclusively where flash is concerned.
This patch makes camino's gecko event handling behave like other UNIX
platforms; rather than polling at a timed interval, the strategy is:

1. Events are posted to a pipe.
2. An NSFileHandle is used to monitor the pipe for posted events.
3. When posted events are detected pending events are processed.

Additionally, this is rate-limited, so that events are processed at a maximum
frequency.

In my testing, browsing with this patch "feels" snappier; I notice this the
most on pages that make heavy use of flash.

Could others please look at this patch and test? Specifically, I'd like to
know:

1. Does this patch truly improve Camino's behavior?
2. Does this strategy seem better than the old one?

This patch probably isn't really ready for review yet; specifically, I'd say it
needs some documentation before it can go in. But please test and give your
opinions on whether this is a good idea or not.
I'd be glad to supply a build with this patch applied to anyone who doesn't want to build it. Either post 
to the bug or send me an email requesting a build and I'll put one online.
As an example, CPU usage on the test URL is dramatically less for me with the patch:

http://www.homestarrunner.com/sbemail118.html
Comment on attachment 166655 [details] [diff] [review]
event handling change

>+#define MIN_PLEVENT_INTERVAL_MILLISECONDS  50
>+

I thought consts were prefered over #defines

>-#define MAC_USE_WAKEUPPROCESS
>+#define MAC_USE_UNIX

Why the name change ?
(In reply to comment #2)
> I'd be glad to supply a build with this patch applied to anyone who doesn't
want to build it. Either post 
> to the bug or send me an email requesting a build and I'll put one online.

While I can't currently build, It would be a good idea to post two builds from
the same source - o,e with the patch and one without, then post to the forums
and mailing list about it and have people comment on the mailing list if they
see any difference.

re:comment#5
I build and put this location:
http://caminofreak.hp.infoseek.co.jp/subset/caminol10nJP.html

Flash animation is improved dramatically. Brilliant patch!
(In reply to comment #4)
> (From update of attachment 166655 [details] [diff] [review])
> >+#define MIN_PLEVENT_INTERVAL_MILLISECONDS  50
> >+
> 
> I thought consts were prefered over #defines
> 
You are quite correct. I used that from something similar that was there
before... that said, a define there may in fact be desirable...

> >-#define MAC_USE_WAKEUPPROCESS
> >+#define MAC_USE_UNIX
> 
> Why the name change ?

It's doing something entirely different than it was before. (Now it's using the
UNIX platform specific code instead of the mac code. Previously it was changing
a bit of the mac code based on that flag.)
I canalso confirm that there is an incredible difference in the handling of cpu
intensive sites with the build waverider posted.

Websites that previously would bring my G4 1,25 GHZ to a grinding hault now work
without even showing a beachball cursor on a G3 400mhz. That is what I'd like to
call a HUGE improvement for this browser.

Not that this patch might also fix the bug which caused camino to even use cpu
cycles when all windows wher closed.

I'm typing this message in one window while in the other a pretty heavy flash
movie is playing in another window on the G3 400 mhz iMac. And so I'd like to
nominate this bug to be put in 0.8.2 if we can. This is a major improvement.
(In reply to comment #5)
> While I can't currently build, It would be a good idea to post two builds from
> the same source - o,e with the patch and one without, then post to the forums
> and mailing list about it and have people comment on the mailing list if they
> see any difference.

Excellent idea. I've placed a disk image with the two builds here:
http://www.speakeasy.org/~gbeier/Camino-perftest.dmg.bz2

and am posting to the list and to the forum on mozillazine.org. If you have any other forums in mind, 
please post there :-)
Bug 225397 is a flash performance Meta bug which might be of interest further on
in the development of thi sbug.

I tested both bugs in the meta bug and looked at the url in those bugs that
where giving us trouble. With this patch those websites didn't skip frames
anymore on a G3 400 iMac, sure the mac wen't a little slow but the beachball
cusors didn't even show.
Ok so I really tested this patch and these are my findings:

- we are not much faster than before, in some cases we are even slower than we
where. In some cases I get the feeling flash is held back instead of given full
speed. 

This page for instance: http://yugop.com/ver3/stuff/12/index.html
Is much faster in the build without the patch, but in that build it will skip
frames and hang on slower machines. With the build with the patch the movie
even plays on slow machines but feels way slower. On a fast machine I can see
that camino has a lot of free cpu but doesn't use it to speed up the movie.

- however when a page is very cpu intensive we are MUCH more graceful when it
comes to showing the beach-ball. We just don't and we still keep to be
responsive on mouse click of the user.

- web-page containing multiple flash movies are responding much better. if one
movie is in full motion and you do something in the other movie they both
continue without a hitch. While before we would skip frames.

This page will show that: http://yugop.com/ver3/
Just choose any of the wonderful heavy flash animations and you will see that
while the top movie is playing and the using the menu movie at the bottom in
the patch-less build will result in frame skipping. In the build with patch no
frames are skipped and everything works as it should.

I encountered using a build with a lot of flash (as that is the only good way I
have found to test the patch) is some kind of memory crash for which I attached
the crash log. The crash happens when ShockWave content is being loaded.

I also had some trouble with input boxes f*cking up when I was typing in them
with you patched build.
It seems almost Flash movies becomes snappier, but transfer speed that Speed
Test Site shows becomes much slower.

For example;
  8 Mbps (with New event handling version)
20 Mbps (with Old event handling version)
As someone who's played in the event handling code before, let me take a moment
to describe how things were before, what they are with this patch, and whether
it can be improved on.

Everywhere:
    Events are posted to the main thread via the plevents code. They may
        be posted from other threads.
    The main thread needs to wake up and handle these events in a timely 
        fasion for good performance.

The old world (Carbon/Mozilla):
    Carbon Events are used to notify the main thread that there
        are events waiting to be processed; these get processed at
        WaitNextEvent time.
        
The Camino world (MOZ_WIDGET_COCOA)
    Carbon events were not used (because they don't get processed in some
        run loop modes perhaps), so the plevents code does nothing
        to notify the UI thread that there are events waiting to be
        processed.
    Instead, nsToolkit sets up a timer which polls the main thread's
    event queue to see if there are plevents to be processed.

With this patch:
    plevents used the Unix pipe code, and nsToolkit sets up a notification
    that will get called when there's data on the pipe. A timer is used to
    ensure that plevents are processed at most every 50ms (why?).

In an ideal world, nsToolkit wouldn't have to do anything to get plevents to be
processed (as is the case in the Carbon world). I think plevent.c should, for
Mac OS X, use CFMessagePorts to notify the UI thread that there's work to do.
I've written the code before, and could come up with a patch once my cvs access
gets resolved.
(In reply to comment #13)
> With this patch:
>     plevents used the Unix pipe code, and nsToolkit sets up a notification
>     that will get called when there's data on the pipe. A timer is used to
>     ensure that plevents are processed at most every 50ms (why?).
>
The timer is a misguided attempt to keep Camino's CPU usage down. While it does
that, the increased apparent CPU usage is OK as it only gets used if available
and doesn't adversely impact system performance. I'm taking the timer out.
 
> In an ideal world, nsToolkit wouldn't have to do anything to get plevents to be
> processed (as is the case in the Carbon world). I think plevent.c should, for
> Mac OS X, use CFMessagePorts to notify the UI thread that there's work to do.
> I've written the code before, and could come up with a patch once my cvs access
> gets resolved.

And actually, there's absolutely no need for me to do this in nsToolkit anymore;
the only reason i did is intertia. I should probably move it to MainController.

If you've got code for using CFMessagePorts (Is that better than the BSD pipe
for some reason? I thought since this was just a single byte of data going one
direction a pipe might be best...) I'd be more than willing to do any necessary
cleanup to get a landable patch. Feel free to send it to me or post it here if
you'd like.

Thanks for the lucid explanation!
I should be able to get a CFMessagePort patch out this week.

There are a couple of things that I wonder about.
First, should the plevent code just notify the UI thread that there are any
events in the queue (which means that the UI thread would then go and process
all, or N of them in one go), or should each and every plevent get proxied over
to the UI thread via CFMessagePort or similar?

The advantage of proxying each plevent separately is that their processing is
interleaved with UI events, which might ease some of the CPU hogging we see on
flash-heavy sites (more on that in a moment). However, that means more
inter-thread communication, which might be expensive (though I'm sure trivial
compared to the amount of work done for many of those events).

Previous attempts to reduce plevent-based CPU hogging have put a cap on the
number of events processed from the queue in one go (hence the "N" above).

Now the reason I believe that Flash-heavy sites cause these CPU hogging problems
that is that we set up a 50ms timer for each and every plugin, to feed the
plugin null events (so it can do idle processing). Some flash movies use
considerable CPU time each time they are sent an idle event.

Because timers are implemented cross-platform using a thread, they use plevents
to notify the calling thread when the timer fires. I think this is why this
event handling tweaking has such an impact on flash-heavy sites.
Note that nsITimers use PLEvents to fire, at least in some cases, so we
absolutely need to support PLEvents firing at least as often as DOM setTimeout
calls can fire (every 10ms, possibly more often).
> If you've got code for using CFMessagePorts (Is that better than the BSD pipe
> for some reason?

The advantage of CFMessagePort is that you can create a run loop source for one.
In other words, unlike a BSD pipe, when a request is sent to a CFMessagePort, it
wakes up your CFRunLoop.
Here's what I was thinking. It would be good if someone could do an opt build
with this for testing.
(In reply to comment #18)
> Created an attachment (id=166833)
> Patch to change plevents to use CFMessagePorts, and remove nsToolkit timer
> stuff.
> 
> Here's what I was thinking. It would be good if someone could do an opt build
> with this for testing.

Thank you!

Here is an opt build for everyone's testing enjoyment:
http://www.speakeasy.org/~gbeier/Camino-CFMessage.dmg.bz2
i did some pageload testing between the 11/22 nightly and the cfMessagePort
build that geoff posted and there is some difference:

cache cleared, ran 1 cycle 
11/22: 1842ms avg 
CFMP: 1887ms avg

MessagePort build is 2% slower in the totally uncached case.

after filling the cache, re-ran test, 3 cycles
11/22: 819ms avg
CFMP: 856ms avg

MP build is 5% slower in the cached case. this is the case where the network
makes the least amount of difference as all images are already cached.
i tested the first patch (geoff's) and got about a 12% slowdown (in the cached
case) from an equivalent build w/out his patch.

it was pointed out that the CFMP build is a dynamic opt build while the 11/22 is
a static opt build. that could account for some of the difference.
In reply to Comment #11 using the new build from Geoff.

With the patch from Simon Camino is no longer slower than before, perhaps even
better when it comes to normal flash animations such as:
http://yugop.com/ver3/stuff/12/index.html

Camino now does use all the cpu it can but will gear down when the user is doing
things in other movie son one page, you can actaully see it happen:
http://yugop.com/ver4

I also noticed that javascript animated objects are WAY smoother and faster then
they where before and even animated when the scrollbar is used:
http://gids.omroep.nl/?medium=tv
That second URL is definitely noticeably better using the experimental build.
Without wandering into the pointless realm of "it feels snappier" it would be
interesting if we could also quantifiably test things like Flash performance and
UI responsiveness.
Well shut my mouth, I'm guilty of exactly what I was describing, overzealous
performance-increase reporting. On subsequent testing performance is visually
indistinguishable between the experimental build and 0.8.1. There must have been
some other factor influencing performance initially.

Simple Flash FPS testing using <> is close, probably within the margin of error
(whatever that may be): 23--24 for 0.8.1, 24--25 for the experimental build.
(In reply to comment #24)
> Well shut my mouth, I'm guilty of exactly what I was describing, overzealous
> performance-increase reporting. On subsequent testing performance is visually
> indistinguishable between the experimental build and 0.8.1. There must have been
> some other factor influencing performance initially.
> 
> Simple Flash FPS testing using <> is close, probably within the margin of error
> (whatever that may be): 23--24 for 0.8.1, 24--25 for the experimental build.

Maybe, but I *can* say that on pages with lots of Flash embeds I can actually
scroll without waiting 10 seconds and getting constant beachballs, and that on
full Flash sites (e.g., http://www.metroid.com/), I don't need to hold the mouse
button down on the movie (old Camino-fu) to have it play back decently.
(In reply to comment #25)
> Maybe, but I *can* say that on pages with lots of Flash embeds I can actually
> scroll without waiting 10 seconds and getting constant beachballs, and that on
> full Flash sites (e.g., http://www.metroid.com/), I don't need to hold the mouse
> button down on the movie (old Camino-fu) to have it play back decently.

Exactly, more so people using old machines like G3 iMacs and such will be the
ones who will notice a BIG performance changes. I had a hands on experience
where the yugop pages would skip frames like hell and camino would be very
unresponsive while with the patches Camino had no troubles what so ever.

Most of us here in bugzulla own big beasts of machines so we won't really notice
the changes. Find an old one and you will see.

*** Bug 197889 has been marked as a duplicate of this bug. ***
*** Bug 182087 has been marked as a duplicate of this bug. ***
If anyone is interested I have too build up static optimized at
http://softkid.nerim.net Camino_clean.zip is patch free and Camino_patch.zip
contains smfr's patch.
It seems the CFMP static build makes Flash movies snappier and doesn't slow down
transfer speed.
BenchJS results <http://www.24fun.com/downloadcenter/benchjs/benchjs.html> using
test builds (opt, static) from http://softkid.nerim.net. All numbers are
averages of 3 runs.

Mac OS X 10.3
Camino_clean:   8.87sec
Camino_patched: 9.24sec (4.1% slower)

Mac OS X 10.2:
Camino_clean:   9.24sec
Camino_patched: 9.69sec (4.8% slower)

This shows that the build with the CFMessagePort patch is apparently slower.
However, I believe that this is because the build is doing additional drawing
during the tests that is missed by the vanilla build.

For example, on the second (popup window) test, the patched build shows the
contents of the popup windows to be red. On the vanilla build, they close before
the contents are drawn.

On test 6 (the animated letters), the patched build shows a smoother animation,
drawing every frame. The animation in the vanilla build is jerky, drawing fewer
frames.

So I belive that the CFMessagePort patch makes Camino behave more correctly in
terms of drawing.
(In reply to comment #31)
> So I belive that the CFMessagePort patch makes Camino behave more correctly in
> terms of drawing.

I think that this is definitely the correct thing to do. Taken to an extreme case, Camino could have 
superb rendering performance if it just skipped that darn drawing step altogether ;) Obviously if the 
patch can be improved to match the performance of the old system that's great, but I don't think that 
this really counts as a performance regression, since the only way the old behavior is so fast is because 
it's wrong.
Simon - are you working on the patch any more or would you like to start getting
reviews? I think the latest patch from you (using CFMessagePorts) works really
well. It makes us behave correctly in a few situations we don't now, positive
performance impacts are huge in some cases, negative perf impacts are minor and
acceptable (usually for the sake of correctness) in others.
I think the patch is ready to go.
Assignee: pinkerton → sfraser_bugs
Attachment #166833 - Flags: superreview?(bryner)
Attachment #166833 - Flags: review?(pinkerton)
my only concern here is that this is causing us to do too much work in some
cases (overpainting) unrelated to plugins. I totally agree that painting all
frames or an animation are important, but it sounds like from some of the
examples given that we're painting when we weren't before.

Anything we can do to tweak this? 5% isn't a trivial slowdown. That said, I'm
all for this patch, it's gonna make things a lot better, just trying to make it
the best it can be up front.
> we're painting when we weren't before

I think we were wrong to not paint before.

The pageload numbers are the real test -- were you able to run those for static
plain and patched builds?
Status: NEW → ASSIGNED
The pageload tests get 6.4% slower with this patch (mean 393ms to 419ms).
+    SInt32 err = CFMessagePortSendRequest(self->mRemoteMessagePort, 0, NULL,
1.0, 0.0, NULL, NULL);

what's the 1.0 and 0.0? either a comment or constants would help here

+    eventQueue->mLocalMessagePort =
CFMessagePortCreateLocal(kCFAllocatorDefault, portName,
+                              _md_EventReceiverProc, &portContext, NULL);

can this ever fail? i guess there can't ever be a duplicate (can't dupe the
process id) but could you ever be "out" of machine-wide ports? what would be the
behavior if it did fail? no events? crash?

looks good otherwise. damn, 6.4%. ouchies. :(
This patch made a whole world of a difference.. Im on an iMac 233, 10.3.7, and before it I could'nt 
enter a page which had a large flash-animation on it. It made Camino totally unresponsible and I had to 
use command-W, and wait ~20 seconds for it to respond. Now I can scroll up & down with the same 
animation running.

Just my 0.02¢
simon? cleaned up patch?
has anyone tested this on a 10.1 system (read: could someone test for us)? it
would be awesome if we could also take this patch on the branch for 0.8.3.
(In reply to comment #41)
> has anyone tested this on a 10.1 system (read: could someone test for us)? it
> would be awesome if we could also take this patch on the branch for 0.8.3.

Providing Branch build containing the patch would help. Is this needed ?
This is a complete patch for an alternative version of the CFMessagePort patch
taht just makes a plain CFRunLoopSource, which it notifies using
CFRunLoopSourceSignal(). This should be just as good, and is simpler.
Test branch builds with and without the last CFRunLoop patch are at:
<ftp://ftp.accesscom.com/pub/users/s/smfr/camino/>
Pageload results look pretty good:

vanilla build: mean    445ms
CFRunLoop build: mean  443ms.
(In reply to comment #46)
> Pageload results look pretty good:
> 
> vanilla build: mean    445ms
> CFRunLoop build: mean  443ms.

Testing the patched build with the Flash video here:

http://www.atomfilms.com/af/content/gangsta_rap_se

yields satisfactory results. All previous builds of not only Camino, but also Fx
and Mozilla drop frames and quickly lock up, requiring a Force Quit. This is
looking good.
I should add I'm testing on a 12" PB 1.25GHz with 1GB of RAM. OSX 10.3.7
bumping up to 0.9
Target Milestone: --- → Camino0.9
The checkin for this bug looks like it caused an 8% improvement in startup time
but a 19% regression in pageload time.
19% is much worse than my testing showed, alas. I'll try checking in the
CFMessagePort version.
With some basic test I can indeed see umbres dropping for pageload tests. Howe
ever more importantly, when rendering pages the UI always stays responsive and
progress in the rendering of very large pages is always updated. All in all ths
creates a smoother and more responsive feel when using Camino.

I do agree that 19% is a lot to loose, maybe to much?
Pageload results from 500MHz laptop:
Original branch: 803ms
CFRunLoop build: 955ms

When running the tests you can see more painting; on some pages, the background
draws first, followed by the content. On faster machines this probably won't
happen, explaining the divergence in results.
Attachment #166833 - Flags: superreview?(bryner)
Attachment #166833 - Flags: review?(pinkerton)
so what do we want to do with this? mark it done? see if we can't get the times
down more?
I need to swap the CFRunLoop patch back into the trunk.
CFRunLoop patch is back in:
Checking in plevent.c;
/cvsroot/mozilla/xpcom/threads/plevent.c,v  <--  plevent.c
new revision: 1.48; previous revision: 1.47

Should we just close this out?
guess so.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
This bug is marked fixed, but I never saw a check-in on bonsai.  Am I missing
something?
Look at the CVS history for plevents.c.
Blocks: 279168
Comment on attachment 169688 [details] [diff] [review]
Alternative simpler patch using CFRunLoopSource

asking for approval just to be safe. 

this has been on the trunk for a long time and is ifdef'd for cocoa. only
camino is affected by these changes.
Attachment #169688 - Flags: superreview+
Attachment #169688 - Flags: review+
Attachment #169688 - Flags: approval1.7.6?
Did the pageload performance issues (comment 50) ever get worked out?  It is not
clear from the remaining comments....
(In reply to comment #61)
> Did the pageload performance issues (comment 50) ever get worked out?  It is not
> clear from the remaining comments....

No unfortunatly not. See below for the numbres.
http://axolotl.mozilla.org/graph/query.cgi?tbox=pawn.office.mozilla.org&testname=pageload&autoscale=1&size=&days=0&units=&ltype=&points=&showpoint=2004:01:13:11:39:21,777&avg=1

the page load numbers on pawn are out of whack with our own testing, but we
decided to take the perf hit anyway for substantially improved flash performance
and to fix other bugs, such as chewing up the CPU when no windows were open that
we would occasionally get into.
Comment on attachment 169688 [details] [diff] [review]
Alternative simpler patch using CFRunLoopSource

a=caillon for 1.7.6
Attachment #169688 - Flags: approval1.7.6? → approval1.7.6+
landed on branch, will do an 083 test spin with this in it so we can get some
10.1 luvin'
Keywords: fixed1.7.6
> the page load numbers on pawn are out of whack with our own testing

To be more precise, I think what's happening is that on slower machines, the
patch has a pageload impact because some pages redraw more than once when
loading. On faster machines there are no extra redraws, so the patch has little
impact on performance (but does help fastly with responsiveness on Flash-heavy
pages).
Blocks: 269388
You need to log in before you can comment on or make changes to this bug.