Closed Bug 95952 Opened 23 years ago Closed 23 years ago

Waste of offscreen pixmaps

Categories

(Core :: Web Painting, defect)
x86 Linux, severity normal

Tracking

RESOLVED FIXED
mozilla0.9.7
People

(Reporter: otaylor, Assigned: kmcclusk)

References

Details

(Keywords: perf)

Attachments

(5 files, 1 obsolete file)

When running mozilla 0.9.2, I've observed that a very big offscreen
pixmap is permanently kept around that is bigger than the largest
window open. (It seems to be the size of the largest window rounded
up to a round size like 800x800, 800x1200, 1600x1200, etc.)

Because Mozilla keeps this around and occasionally reallocates it,
it almost always ends up occupying space in offscreen video ram.
In many cases, this prevents any other pixmaps from being allocated in
the offscreen video ram.

This badly hurts other clients running on the same display, and may
also hurt Mozilla, since its temporary offscreen pixmaps used for handling
expose events won't be put in video ram, and thus drawing to them
won't be accelerated.

I believe the part of Mozilla that keeps this pixmap around is
view/src/viewManager.cpp.

The best fix would be to only allocate the pixmap when needed
and free it immediately afterwards. Allocating offscreen pixmaps is going
to be very cheap compared to drawing operations on them.

If this isn't possible, other things that could be done include:

 - Release the pixmap when all toplevels are unmapped
 - Release the pixmap after a few seconds of inactivity
Possible perf improvement here.  Good stuff from owen.
Keywords: perf
Status: UNCONFIRMED → NEW
Ever confirmed: true
The offscreen pixmap in question is the scratch area used as the backbuffer for
rendering. The current algorithm was designed to minimize the frequency of
allocating and deallocating the backbuffer. All painting in Mozilla is done on
the backbuffer before being blitted to the onscreen window to prevent flicker.
Frequently, a full window paint is required. All toplevel windows reuse the same
offscreen, so if a full window paint is required the scratch buffer must be
large enough to accommodate the largest toplevel window. The current algorithm
sizes the backbuffer at discrete sizes to prevent window resizing operations
from continuously allocating and deallocating the offscreen, while allowing the
size of the offscreen to shrink when the windows are made small enough to
fit within one of the smaller discrete sizes.
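The discrete-size scheme described above can be sketched as follows. This is illustrative only: the bucket values, `RoundUpToBucket`, and `NeedsRealloc` are hypothetical names and constants, not Mozilla's actual implementation.

```cpp
// Round a requested backbuffer dimension up to the next discrete bucket,
// so that window resizes don't continuously reallocate the offscreen.
// The bucket values here are made up for illustration.
static const int kBuckets[] = {400, 800, 1200, 1600, 2000};

int RoundUpToBucket(int requested) {
  for (int b : kBuckets) {
    if (requested <= b) return b;
  }
  return requested;  // larger than any bucket: use the exact size
}

// Reallocation is needed only when the bucketed size changes; shrinking
// happens when a request falls into a smaller bucket.
bool NeedsRealloc(int curW, int curH, int reqW, int reqH) {
  return RoundUpToBucket(reqW) != curW || RoundUpToBucket(reqH) != curH;
}
```

A resize from 640x1024 to 700x1100 would stay within the 800x1200 bucket and reuse the existing buffer.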

If the backbuffer were to be created only "as needed" then it would need to be
allocated and deallocated with every paint operation.

Frequent invalidation of the window caused by animated GIFs and DHTML animations
would also result in large numbers of paint requests, with the associated
allocation and deallocation of the back buffer. Other operations, such as
resizing the window when "showing the window contents", would likewise result
in continuous allocation and deallocation of the offscreen.

 - Release the pixmap when all toplevels are unmapped

It should shrink the back buffer down to the smallest discrete size if all
of the top level windows are removed or minimized.

 - Release the pixmap after a few seconds of inactivity

I think this may be doable if we took into account whether all of Mozilla's
windows were inactive. If another application has an active window, then Mozilla
would dump its backbuffer.

You are making the assumption that allocating a pixmap is an expensive
operation. This is simply not the case, and pixmap allocation time will
most likely be completely swamped by other factors.

I've attached a rough benchmark for allocation speed on X. What it does
is time both:

 Allocating pixmap A, B and N * (clear B, copy B to A)
 Allocating pixmap A, and N * (allocate B, clear B, copy B to A, clear A)

A typical result is:

1000 iters at 512x512
One pixmap:  741.321 (0.741321/iter)
Many pixmaps: 718.056 (0.718056/iter)

(The reason, I'd guess, for the better result here with many pixmaps is that
there's no need to wait for one copy to finish before starting the next clear;
that is, pipelining on the video card.)

But it's a rough benchmark at best, and once you get up near screen size
you can get strange results:

1000 iters at 1100x1100
One pixmap:  2850.575 (2.850575/iter)
Many pixmaps: 3406.789 (3.406789/iter)

(I think the many-pixmap case might occasionally allocate a pixmap not
in video ram.) But once you get above screen size, the pixmap is no
longer in video ram, and performance gets _much_, _much_ worse.

10 iters at 1300x1300
One pixmap:  2085.611 (208.561100/iter)
Many pixmaps: 842.560 (84.256000/iter)

(I can't explain why the "many pixmaps" result is a lot faster here.) 
This is what mozilla is doing to other apps, and probably itself, by
gobbling video ram.

But my main point here is that on X, allocating and freeing pixmaps
is a very cheap operation compared to actually drawing on them.
(Windows may be different, but I would expect not too different;
allocating an offscreen drawing surface should basically involve just
twiddling some tables on any window system.)

Anyway, I'd encourage you to do some timing rather than assuming that
caching this huge pixmap is a win. (Perhaps you've already done that
timing before adding it, of course.)

You may dump the pixmap when all windows are minimized, but that doesn't
really do the right thing on X, since windows can be on another virtual
desktop without being "minimized", or (as on Windows) simply obscured
by another window. A much better thing to track on X is the VisibilityNotify
event, which gives notification of VisibilityUnobscured /
VisibilityPartiallyObscured / VisibilityFullyObscured. VisibilityFullyObscured
means that no part of the window is visible.
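The VisibilityNotify-based policy could reduce to a predicate like the one below. The enum values mirror the standard X11 visibility constants from <X11/X.h>, but `ShouldReleaseBackbuffer` itself is only a sketch of the rule Owen describes, not code from any patch here.

```cpp
#include <vector>

// Mirror of the X11 VisibilityNotify state constants.
enum Visibility {
  VisibilityUnobscured = 0,
  VisibilityPartiallyObscured = 1,
  VisibilityFullyObscured = 2
};

// Release the cached backbuffer only when every toplevel window is fully
// obscured, i.e. no part of any Mozilla window is visible on any desktop.
bool ShouldReleaseBackbuffer(const std::vector<Visibility>& toplevels) {
  for (Visibility v : toplevels) {
    if (v != VisibilityFullyObscured) return false;
  }
  return true;
}
```

This covers both the virtual-desktop and fully-obscured-by-another-window cases, which minimization tracking misses.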

By "active" I don't mean "having the keyboard focus", I mean "having
done some drawing recently". Mozilla could of course be getting exposes
or updating an anim gif when it didn't have the keyboard focus.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.6
Target Milestone: mozilla0.9.6 → mozilla0.9.7
If patch 58566 is installed and you add
user_pref("layout.transitory.backbuffer",true); to pref.js the backbuffer will
be created at the beginning of each paint request and destroyed after the paint
is complete.  

I will check this in so we can do performance testing on all three platforms to
see if the default should be set to true.
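The behavior the pref toggles can be sketched as follows. `Surface`, `BackbufferOwner`, and the allocation counter are illustrative stand-ins, not the real gfx types in the patch; the point is only the difference between caching the buffer across paints and freeing it after each one.

```cpp
#include <memory>

struct Surface { int w, h; };

static int gAllocs = 0;  // counts allocations, for illustration only

class BackbufferOwner {
 public:
  explicit BackbufferOwner(bool transitory) : mTransitory(transitory) {}

  // Hand out a surface at least w x h, allocating only when the cached
  // one is missing or too small.
  Surface* BeginPaint(int w, int h) {
    if (!mCached || mCached->w < w || mCached->h < h) {
      mCached.reset(new Surface{w, h});
      ++gAllocs;
    }
    return mCached.get();
  }

  void EndPaint() {
    if (mTransitory) mCached.reset();  // destroyed after each paint
  }

 private:
  bool mTransitory;
  std::unique_ptr<Surface> mCached;
};
```

With the transitory flag set, every paint pays an allocation; with it clear, repeated same-size paints reuse one buffer.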
Comment on attachment 58566 [details] [diff] [review]
patch which makes the backbuffer transitory when user pref ("layout.transitory.backbuffer", true) 

r= alexsavulov
Attachment #58566 - Flags: review+
Comment on attachment 58566 [details] [diff] [review]
patch which makes the backbuffer transitory when user pref ("layout.transitory.backbuffer", true) 

sr=attinasi
Attachment #58566 - Flags: superreview+
Checked in patch.

jrgm: could you run your page load tests on WIN32, Mac, and Linux with 

user_pref("layout.transitory.backbuffer",true); ?

I want to see if using a transitory backbuffer affects page load.

I'll try to get this done in the next couple of days (or find a willing victim 
to help me do it :-)
With this pref on, I see one of these per transition from page to page:

** CRITICAL **: file nsRenderingContextGTK.cpp: line 781 (nsresult
nsRenderingContextGTK::CreateDrawingSurface(nsRect *, unsigned int, void *&)):
assertion `(aBounds->width > 0) && (aBounds->height > 0)' failed.

I don't actually see any obvious problems though.
Just a quick update. (I want to look at this some more tomorrow, including 
win9x and slower machines, and I'm a little puzzled by the linux result so
far [or maybe just plain old confused]).

Here's what it looks like so far:

1) Linux doesn't show a significant change (don't know why)
2) win2k appears to be about 2.5% faster, and it looks "for real"
3) Mac OS 9 and OS X both get their butts kicked, big time -- they are both
   20%+ slower with this pref setting

#1 is not what I expected from the notes above. I do get the same assertion
as noted by Adam, although it is silent while the test is running [don't know
why that is the case, though]. I guess it's not so bad, then: we
can play nicer with the other kids without taking a big perf. hit on Linux.

By the way, would there be a substantial influence of hardware/video? (I 
suppose the answer to that is an obvious yes).
Can we move the offscreen pixmap caching into GFX?
It does not surprise me that Mac is so much slower. Allocating big offscreen
GWorlds (several megabytes) is slow.
The Linux performance is actually not really surprising - a neutral result
_for mozilla_ is about what I'd expect; the point of the bug is that 
allocating a large pixmap is extremely cheap, but keeping around a large
pixmap even when Mozilla isn't actively drawing will have a very negative
impact on _other applications_ running on the same display.

While it's not inconceivable that in some cases Mozilla itself will
benefit from allowing a more flexible use of video ram, I suspect
most uses of video ram in mozilla will be when the backbuffer is
allocated, even with the transitory back buffer. There may be other
benefits to using a transitory backbuffer, such as the ability to 
allocate a smaller pixmap when the currently rendering area is smaller,
which could improve performance, but with a quick look, the current
patch looks like it wouldn't reveal such improvements.

The results will, in fact, be dependent on the amount of video ram,
resolution, color depth, other uses of video ram (say, if 3D is enabled in X)
and so forth, as well as on the exact implementation of the X server;
a more sophisticated X server that migrates pixmaps in and out
of video ram can mitigate (or worsen) the effects of applications
doing dumb things. On an XFree86-3.3.x server, where rendering to
offscreen pixmaps wasn't done at all (IIRC), I'd expect no effect at
all from this patch, assuming sufficient system RAM.
 
But except for saying "the more video RAM, the less effect gobbling
up video RAM has", it would be hard to say exactly what the dependencies
are; what I do know is that there are _some_
X configurations where this makes a huge difference for non-mozilla
apps, so the only thing to watch out for is whether there are X configurations
where it hurts mozilla performance. (I don't expect this to be the case, and
the above results seem to bear this out.)
Note:
This bug covers the issue of making the backbuffer transitory.
see bug 114082 for the issue of requesting/using a damageRect size backbuffer.
Blocks: 91747
This patch moves the management of the backbuffer to the rendering context and
overrides GetBackbuffer and ReleaseBackbuffer on WIN32 and GTK to force the
backbuffer to be allocated only when needed. On all other platforms the default
is to cache the backbuffer's drawing surface.
We're going to need multiple backbuffers later. Can you change the
nsIRenderingContext API to support this? i.e.,

NS_IMETHOD GetBackbuffer(const nsRect &aRequestedSize, const nsRect &aMaxSize,
nsDrawingSurface &aBackbuffer) = 0;
NS_IMETHOD ReleaseBackbuffer(nsDrawingSurface &aBackbuffer) = 0;
NS_IMETHOD DestroyCachedBackbuffers(void) = 0;

If you don't want to implement this now, you can just throw an assertion if
someone tries to use more than one back buffer at once.

It's not clear what the aMaxSize parameter is for. From the point of view of the
consumer of the interface, it's not obvious why it's needed or what it should be
set to. Can't you just do something intelligent based on the requested size?

Also,
> if (PR_FALSE == RectFitsInside(aRect1, aWidth, aHeight)) {
reads more easily as
if (!RectFitsInside(...)) {
"We're going to need multiple backbuffers later."

I view the backbuffer as a single global buffer that is optimized for
pre-rendering the paint request and copying forward. The size of the backbuffer
is tied to the size of the paint requests. The only thing that's special about
the backbuffer is this relationship. Otherwise it's just like any other
offscreen drawing surface. If we need additional offscreens, can't they be
allocated using nsDrawingSurfaces?

"can't you just do something intelligent based on the requested size?"

The size of the cached backbuffer attempts to be large enough to minimize the re
-allocation of the backbuffer when a series of GetBackbuffer requests are made.
  If the backbuffer only grew in size we could always expand the backbuffer's
drawing surface to accommodate the requests and very quickly the calls to
GetBackbuffer would seldom cause the backbuffer to be re-allocated.  A problem
occurs when you want to efficiently shrink the backbuffer. If you know the
maximum size that will be requested you can efficiently shrink the backbuffer
when the maximum request size becomes smaller.  The sequence of GetBackbuffer
calls will not tell you if can really shrink the backbuffer because a large
request as the result of window expose event, window resize, or loading a new
url may occur after many small requests.  In Mozilla, we get a large number of
small requests punctuated by large full window requests.  
> I view the backbuffer as a single global buffer that is optimized for
> pre-rendering the paint request and copying forward.

This is never going to be optimal on Mac, nor will it deal with having multiple 
screens of different depths. For optimal back-buffering performance on Mac, we'd 
need a back buffer per window, which is maintained with the same global 
coordinates as the window it buffers.
"For optimal back-buffering performance on Mac, we'd need a back buffer per
window, which is maintained with the same global coordinates as the window it
buffers."

This should be implementable using the API in this patch. On the Mac,
GetBackBuffer could be implemented to return a drawing surface that is
associated with a per window backbuffer. 

The abstraction from the viewmanager's point of view is: "give me a drawing
surface that's at least size (n)"
> If we need additional offscreens, can't they be allocated using
> nsDrawingSurfaces?

Yes, I suppose so.

So if the backbuffer is just a cached nsDrawingSurface, why don't we get rid of
these APIs and have the GFX implementations just cache nsDrawingSurfaces
internally when desired?

> The sequence of GetBackbuffer calls will not tell you if you can really shrink
> the backbuffer because a large request as the result of a window expose event,
> window resize, or loading a new url may occur after many small requests.

It's not hard to design a policy that will limit the worst-case allocation
overhead. Here's a really simple policy: every N seconds, shrink the cached
buffer to the largest size you saw during the last N seconds.
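The "shrink every N seconds to the recent high-water mark" policy above can be sketched like this. The class, its names, and the explicit clock parameter are hypothetical, chosen to keep the sketch deterministic rather than to match any real gfx interface.

```cpp
#include <algorithm>

// Time-windowed shrink policy: the cache grows immediately when a larger
// request arrives, and every `windowSeconds` it shrinks down to the
// largest size seen during the elapsed window.
class BackbufferCachePolicy {
 public:
  explicit BackbufferCachePolicy(int windowSeconds)
      : mWindow(windowSeconds) {}

  // Called on each GetBackbuffer request; returns the size to keep allocated.
  int Request(int size, int nowSeconds) {
    if (nowSeconds - mEpoch >= mWindow) {
      mCached = mRecentMax;  // shrink to the last window's high-water mark
      mRecentMax = 0;
      mEpoch = nowSeconds;
    }
    mRecentMax = std::max(mRecentMax, size);
    mCached = std::max(mCached, size);  // grow immediately when needed
    return mCached;
  }

 private:
  int mWindow;
  int mEpoch = 0;
  int mRecentMax = 0;
  int mCached = 0;
};
```

Note this bounds the worst case: a large buffer survives at most one extra window of small requests before being released.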
"So if the backbuffer is just a cached nsDrawingSurface, why don't we get rid of
these APIs and have the GFX implementations just cache nsDrawingSurfaces
internally when desired?"

The backbuffer has a special semantic that various platforms will handle
differently. It has a special relationship to the widget it's going to be
drawing in, and that may determine whether it is cached at all, the caching
policy if it is cached, who owns the drawing surface, and the drawing surface's
life cycle.

For example, on the Mac, getting the backbuffer could return a drawing surface
that is being maintained as a backing store for each window. The caching of this
drawing surface may be handled in the widget module, and the viewmanager should
not have any special knowledge about the internal implementation. The
viewmanager simply requests a backbuffer to draw in; it doesn't know whether
the platform created a new drawing surface, returned a single cached drawing
surface that all widgets use, or returned a drawing surface that is always
associated with each top-level window. GetBackbuffer, ReleaseBackbuffer, and
DestroyCachedBackbuffers provide a layer of abstraction that platform-specific
implementations have room to maneuver under.

"Here's a really simple policy: every N seconds, shrink the cached
buffer to the largest size you saw during the last N seconds."

Time does not make a very good policy for shrinking the cache, because people
typically load a URL, then pause for a period of time while they read the page,
often scrolling the page, which forces a repaint of only a portion of the page.
The cache would tend to shrink on every page load unless the time was made very
large, to accommodate the typical reading time per URL. When the user loaded a
new page, they would pay the full performance hit for the new page. On the Mac
this was shown to be a 20% page-load penalty.

In the case of the backbuffer the viewmanager does know what the maximum size of
a paint request will be. It will never be larger than the union of the root
widgets, so it makes sense to pass this info along so if the backbuffer is
cached it can be shrunk to a size that is optimal rather than guessing.
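The "never larger than the union of the root widgets" bound is a plain rectangle union. The `Rect` struct below is a stripped-down stand-in for illustration; the real nsRect type and its union method differ.

```cpp
#include <algorithm>

// Minimal stand-in for nsRect.
struct Rect {
  int x, y, width, height;
};

// Union of the root widget rects gives an upper bound (the aMaxSize hint)
// on any future paint request, letting a cached backbuffer shrink safely.
Rect UnionRects(const Rect* rects, int count) {
  Rect u = rects[0];
  for (int i = 1; i < count; ++i) {
    int right = std::max(u.x + u.width, rects[i].x + rects[i].width);
    int bottom = std::max(u.y + u.height, rects[i].y + rects[i].height);
    u.x = std::min(u.x, rects[i].x);
    u.y = std::min(u.y, rects[i].y);
    u.width = right - u.x;
    u.height = bottom - u.y;
  }
  return u;
}
```

Passing this bound alongside the requested size is what lets the cache shrink without guessing, per the argument above.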
Comment on attachment 61201 [details] [diff] [review]
Fix for GTK Critical error caused by paint event dispatched for widget with 0 height or width

sr=attinasi
Attachment #61201 - Flags: superreview+
Comment on attachment 61155 [details] [diff] [review]
Move management of  the backbuffer to the rendering context

Looks fine to me - sr=attinasi
Attachment #61155 - Flags: superreview+
Comment on attachment 61155 [details] [diff] [review]
Move management of  the backbuffer to the rendering context

r=karnaze
Attachment #61155 - Flags: review+
Comment on attachment 61201 [details] [diff] [review]
Fix for GTK Critical error caused by paint event dispatched for widget with 0 height or width

r=karnaze
Attachment #61201 - Flags: review+
Fix checked in
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Attached patch Patch for Xlib gfx code (obsolete) — Splinter Review
Patch for Xlib gfx code (same changes as the patch for gfx/src/gtk/).

Requesting rs=, please ...
It seems that Xlib nsDrawingSurfaceImpl code leaks offscreen Pixmaps -
attachment 61743 [details] [diff] [review] caused the Xserver to consume huge amounts of memory ...
Attachment #61743 - Attachment is obsolete: true
Comment on attachment 61760 [details] [diff] [review]
New patch for Xlib gfx code

r=kmcclusk@netscape.com
Attachment #61760 - Flags: review+
Comment on attachment 61760 [details] [diff] [review]
New patch for Xlib gfx code

sr=attinasi
Attachment #61760 - Flags: superreview+
a=brendan@mozilla.org on attachment 61760 [details] [diff] [review] for 0.9.7 branch checkin.  Who checked
this into the trunk, and when?  Please, open a new bug for a similar problem in
a different port, next time.

You might want to use PRPackedBool in structs for adjacent booleans, in a future
patch.

/be
Blocks: 114455
Keywords: mozilla0.9.7+
I just <now> checked attachment 61760 [details] [diff] [review] into the branch and trunk. The earlier
"Fix checked in" comment appears to refer to some other attachment.
No longer blocks: 114455
Component: Layout: View Rendering → Layout: Web Painting