Closed Bug 95952 Opened 23 years ago Closed 23 years ago

Waste of offscreen pixmaps

Categories

(Core :: Web Painting, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla0.9.7

People

(Reporter: otaylor, Assigned: kmcclusk)

References

Details

(Keywords: perf)

Attachments

(5 files, 1 obsolete file)

When running Mozilla 0.9.2, I've observed that a very large offscreen pixmap is permanently kept around, bigger than the largest open window. (It seems to be the size of the largest window rounded up to an even size like 800x800, 800x1200, 1600x1200, etc.) Because Mozilla keeps this pixmap around and occasionally reallocates it, it almost always ends up occupying space in offscreen video RAM. In many cases, this prevents any other pixmaps from being allocated in offscreen video RAM. This badly hurts other clients running on the same display, and may also hurt Mozilla itself, since its temporary offscreen pixmaps used for handling expose events won't be put in video RAM, and thus drawing to them won't be accelerated. I believe the part of Mozilla keeping this pixmap around is view/src/viewManager.cpp. The best fix would be to allocate the pixmap only when needed and free it immediately afterwards; allocating offscreen pixmaps is very cheap compared to drawing operations on them. If this isn't possible, other things that could be done include:

- Release the pixmap when all toplevels are unmapped
- Release the pixmap after a few seconds of inactivity
Possible perf improvement here. Good stuff from owen.
Keywords: perf
Status: UNCONFIRMED → NEW
Ever confirmed: true
The offscreen pixmap in question is the scratch area used as the backbuffer for rendering. The current algorithm was designed to minimize the frequency of allocating and deallocating the backbuffer. All painting in Mozilla is done on the backbuffer before being blitted to the onscreen window to prevent flicker. Frequently, a full window paint is required. All toplevel windows reuse the same offscreen, so if a full window paint is required the scratch buffer must be large enough to accommodate the largest toplevel window. The current algorithm sizes the backbuffer at discrete sizes to prevent window resizing operations from continuously allocating and deallocating the offscreen, while still allowing the offscreen to shrink when the windows are made small enough to fit within one of the smaller discrete sizes.

If the backbuffer were created only "as needed", it would have to be allocated and deallocated with every paint operation. Frequent invalidation of the window caused by animated GIFs and DHTML animations would result in large numbers of paint requests, each with the associated allocation and deallocation of the backbuffer. Other operations, such as resizing the window when "showing the window contents" is enabled, would likewise result in continuous allocation and deallocation of the offscreen.

- Release the pixmap when all toplevels are unmapped

It should already be shrinking the backbuffer down to the smallest discrete size when all of the toplevel windows are removed or minimized.

- Release the pixmap after a few seconds of inactivity

I think this may be doable if we took into account whether all of Mozilla's windows were inactive. If another application has an active window, then Mozilla would dump its backbuffer.
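The discrete-size scheme described above can be sketched roughly as follows. The bucket values, names, and allocation counting here are illustrative assumptions for this sketch, not the actual view-manager code:

```cpp
#include <cassert>

// Illustrative discrete bucket edges (assumptions, not Mozilla's real values).
static const int kBuckets[] = {400, 800, 1200, 1600, 2000};
static const int kNumBuckets = sizeof(kBuckets) / sizeof(kBuckets[0]);

// Round a requested dimension up to the next discrete bucket.
static int RoundUpToBucket(int aSize) {
  for (int i = 0; i < kNumBuckets; ++i) {
    if (aSize <= kBuckets[i]) return kBuckets[i];
  }
  return aSize;  // larger than the biggest bucket: use the exact size
}

struct Backbuffer {
  int width = 0;
  int height = 0;
  int allocations = 0;  // counts (re)allocations, to show the policy's effect

  // Grow or shrink the buffer only when the bucketed size changes.
  void EnsureSize(int aWidth, int aHeight) {
    int w = RoundUpToBucket(aWidth);
    int h = RoundUpToBucket(aHeight);
    if (w != width || h != height) {
      width = w;
      height = h;
      ++allocations;  // a real implementation would free and recreate the pixmap here
    }
  }
};
```

With buckets every 400 pixels, resizing a window from 780x560 to 790x570 stays within the same 800x800 bucket, so the pixmap is not reallocated; only crossing a bucket boundary (in either direction) triggers a reallocation.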
You are making the assumption that allocating a pixmap is an expensive operation. This is simply not the case, and pixmap allocation time will most likely be completely swamped by other factors. I've attached a rough benchmark for allocation speed on X. It times both:

- Allocating pixmaps A and B, then N * (clear B, copy B to A)
- Allocating pixmap A, then N * (allocate B, clear B, copy B to A, clear A)

A typical result is:

  1000 iters at 512x512
  One pixmap:   741.321 (0.741321/iter)
  Many pixmaps: 718.056 (0.718056/iter)

(The better result here for many pixmaps, I'd guess, comes from not having to wait for one copy to finish before starting the next clear - that is, pipelining on the video card.)

But it's a rough benchmark at best, and once you get up near screen size you can get strange results:

  1000 iters at 1100x1100
  One pixmap:   2850.575 (2.850575/iter)
  Many pixmaps: 3406.789 (3.406789/iter)

(I think the many-pixmap case might occasionally allocate a pixmap not in video RAM.)

But once you get above screen size, the pixmap is no longer in video RAM, and performance gets _much_, _much_ worse:

  10 iters at 1300x1300
  One pixmap:   2085.611 (208.561100/iter)
  Many pixmaps: 842.560 (84.256000/iter)

(I can't explain why the "many pixmaps" result is a lot faster here.)

This is what Mozilla is doing to other apps, and probably itself, by gobbling video RAM. But my main point is that allocating and freeing pixmaps is a very cheap operation compared to actually drawing on them on X. (Windows may be different, but I would expect not too different ... allocating an offscreen drawing surface should basically involve just twiddling some tables on any window system.)

Anyway, I'd encourage you to do some timing rather than assuming that caching this huge pixmap is a win. (Perhaps you already did that timing before adding it, of course.)
You may dump the pixmap when all windows are minimized, but that doesn't really do the right thing on X, since windows can be on another virtual desktop without being "minimized" - or (as on Windows) simply obscured by another window. A much better thing to track on X is the VisibilityNotify event, which gives notification of VisibilityUnobscured / VisibilityPartiallyObscured / VisibilityFullyObscured. VisibilityFullyObscured means that no part of the window is visible. By "active" I don't mean "having the keyboard focus", I mean "having done some drawing recently". Mozilla could of course be getting exposes or updating an anim gif when it didn't have the keyboard focus.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.6
Target Milestone: mozilla0.9.6 → mozilla0.9.7
If patch 58566 is installed and you add user_pref("layout.transitory.backbuffer",true); to pref.js the backbuffer will be created at the beginning of each paint request and destroyed after the paint is complete. I will check this in so we can do performance testing on all three platforms to see if the default should be set to true.
Comment on attachment 58566 [details] [diff] [review] patch which makes the backbuffer transitory when user pref ("layout.transitory.backbuffer", true) r= alexsavulov
Attachment #58566 - Flags: review+
Comment on attachment 58566 [details] [diff] [review] patch which makes the backbuffer transitory when user pref ("layout.transitory.backbuffer", true) sr=attinasi
Attachment #58566 - Flags: superreview+
Checked in patch. jrgm: could you run your page load tests on WIN32, Mac, and Linux with user_pref("layout.transitory.backbuffer",true); ? I want to see if using a transitory backbuffer affects page load.
I'll try to get this done in the next couple of days (or find a willing victim to help me do it :-)
With this pref on, I see one of these per transition from page to page:

  ** CRITICAL **: file nsRenderingContextGTK.cpp: line 781 (nsresult nsRenderingContextGTK::CreateDrawingSurface(nsRect *, unsigned int, void *&)): assertion `(aBounds->width > 0) && (aBounds->height > 0)' failed.

I don't actually see any obvious problems though.
Just a quick update. (I want to look at this some more tomorrow, including win9x and slower machines, and I'm a little puzzled by the Linux result so far [or maybe just plain old confused].) Here's what it looks like so far:

1) Linux doesn't show a significant change (don't know why)
2) Win2k appears to be about 2.5% faster, and it looks "for real"
3) Mac OS 9 and OS X both get their butts kicked, big time -- they are both 20%+ slower with this pref setting

#1 is not what I expected from the notes above. I do get the same assertion as noted by Adam, although it is silent while the test is running [don't know why that is the case]. I guess it's not so bad that we can play nicer with the other kids without taking a big perf hit on Linux. By the way, would there be a substantial influence from hardware/video? (I suppose the answer to that is an obvious yes.)
Can we move the offscreen pixmap caching into GFX?
It does not surprise me that the Mac is so much slower. Allocating big offscreen GWorlds (several megabytes) is slow.
The Linux performance result is actually not really surprising - a neutral result _for Mozilla_ is about what I'd expect. The point of the bug is that allocating a large pixmap is extremely cheap, but keeping a large pixmap around even when Mozilla isn't actively drawing has a very negative impact on _other applications_ running on the same display. While it's not inconceivable that in some cases Mozilla itself will benefit from a more flexible use of video RAM, I suspect most of Mozilla's use of video RAM happens while the backbuffer is allocated, even with the transitory backbuffer.

There may be other benefits to using a transitory backbuffer, such as the ability to allocate a smaller pixmap when the currently rendering area is smaller, which could improve performance, but at a quick look the current patch wouldn't reveal such improvements.

The results will, in fact, depend on the amount of video RAM, resolution, color depth, other uses of video RAM (say, if 3D is enabled in X) and so forth, as well as on the exact implementation of the X server; a more sophisticated X server that migrates pixmaps in and out of video RAM can mitigate (or worsen) the effects of applications doing dumb things. On an XFree86 3.3.x server, where rendering to offscreen pixmaps wasn't done at all (IIRC), I'd expect no effect from this patch at all, assuming sufficient system RAM.

But beyond saying "the more video RAM, the less effect gobbling up video RAM has", it would be hard to say exactly what the dependencies are. What I do know is that there are _some_ X configurations where this makes a huge difference for non-Mozilla apps, so the only thing to watch out for is whether there are X configurations where it hurts Mozilla performance. (I don't expect that to be the case, and the above results seem to bear this out.)
Note: This bug covers the issue of making the backbuffer transitory. see bug 114082 for the issue of requesting/using a damageRect size backbuffer.
Blocks: 91747
This patch moves the management of the backbuffer to the rendering context and overrides GetBackbuffer and ReleaseBackbuffer on WIN32 and GTK to force the backbuffer to be allocated only when needed. On all other platforms the default is to cache the backbuffer's drawing surface.
We're going to need multiple backbuffers later. Can you change the nsIRenderingContext API to support this? i.e.,

  NS_IMETHOD GetBackbuffer(const nsRect &aRequestedSize, const nsRect &aMaxSize, nsDrawingSurface &aBackbuffer) = 0;
  NS_IMETHOD ReleaseBackbuffer(nsDrawingSurface &aBackbuffer) = 0;
  NS_IMETHOD DestroyCachedBackbuffers(void) = 0;

If you don't want to implement this now, you can just throw an assertion if someone tries to use more than one backbuffer at once.

It's not clear what the aMaxSize parameter is for. From the point of view of the consumer of the interface, it's not obvious why it's needed or what it should be set to. Can't you just do something intelligent based on the requested size?

Also,

  > if (PR_FALSE == RectFitsInside(aRect1, aWidth, aHeight)) {

reads more easily as

  if (!RectFitsInside(...)) {
"We're going to need multiple backbuffers later."

I view the backbuffer as a single global buffer that is optimized for pre-rendering the paint request and copying it forward. The size of the backbuffer is tied to the size of the paint requests; that relationship is the only thing that's special about the backbuffer. Otherwise it's just like any other offscreen drawing surface. If we need additional offscreens, can't they be allocated as nsDrawingSurfaces?

"Can't you just do something intelligent based on the requested size?"

The cached backbuffer tries to stay large enough to minimize re-allocation when a series of GetBackbuffer requests is made. If the backbuffer only ever grew in size, we could always expand its drawing surface to accommodate the requests, and very quickly the calls to GetBackbuffer would seldom cause the backbuffer to be re-allocated. The problem comes when you want to shrink the backbuffer efficiently. If you know the maximum size that will be requested, you can shrink the backbuffer as soon as that maximum becomes smaller. The sequence of GetBackbuffer calls alone cannot tell you whether you can really shrink the backbuffer, because a large request - the result of a window expose event, window resize, or loading a new URL - may come after many small requests. In Mozilla, we get a large number of small requests punctuated by large full-window requests.
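The role of the aMaxSize hint can be illustrated with a small cache sketch. The class and method shapes here are hypothetical simplifications, not the patch's actual implementation: the cache grows to satisfy each request, but shrinks as soon as the caller's stated maximum drops below the cached size - something a stream of small requests alone could never justify.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical cached-backbuffer sketch: grows to the requested size, and
// uses the caller-supplied maximum to decide when shrinking is safe.
class BackbufferCache {
 public:
  // aReqW/aReqH: size needed for this paint.
  // aMaxW/aMaxH: largest request the caller can ever make
  // (e.g. the union of the root widgets).
  void GetBackbuffer(int aReqW, int aReqH, int aMaxW, int aMaxH) {
    // Target size: keep what we have, clamped down to the stated maximum,
    // but at least as big as the current request.
    int w = std::max(aReqW, std::min(mWidth, aMaxW));
    int h = std::max(aReqH, std::min(mHeight, aMaxH));
    if (w != mWidth || h != mHeight) {
      mWidth = w;
      mHeight = h;
      ++mReallocs;  // a real implementation would recreate the surface here
    }
  }
  int Width() const { return mWidth; }
  int Height() const { return mHeight; }
  int Reallocs() const { return mReallocs; }

 private:
  int mWidth = 0, mHeight = 0, mReallocs = 0;
};
```

A full-window paint grows the cache once; the many small repaints that follow reuse it with no reallocation; and when the window itself shrinks, the smaller aMaxSize lets the cache shrink immediately instead of holding the large surface forever.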
> I view the backbuffer as a single global buffer that is optimized for pre > rendering the paint request and copying forward. This is never going to be optimal on Mac, nor will it deal with having multiple screens of different depths. For optimal back-buffering performance on Mac, we'd need a back buffer per window, which is maintained with the same global coordinates as the window it buffers.
"For optimal back-buffering performance on Mac, we'd need a back buffer per window, which is maintained with the same global coordinates as the window it buffers." This should be implementable using the API in this patch. On the Mac, GetBackBuffer could be implemented to return a drawing surface that is associated with a per window backbuffer. The abstraction from the viewmanager's point of view is: "give me a drawing surface that's at least size (n)"
> If we need additional offscreens, can't they be allocated as
> nsDrawingSurfaces?

Yes, I suppose so. So if the backbuffer is just a cached nsDrawingSurface, why don't we get rid of these APIs and have the GFX implementations just cache nsDrawingSurfaces internally when desired?

> The sequence of GetBackbuffer calls will not tell you if you can really
> shrink the backbuffer because a large request as the result of a window
> expose event, window resize, or loading a new url may occur after many
> small requests.

It's not hard to design a policy that limits the worst-case allocation overhead. Here's a really simple policy: every N seconds, shrink the cached buffer to the largest size you saw during the last N seconds.
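That time-window policy might be sketched like this (illustrative only; the interval, names, and bookkeeping are assumptions, and real code would use actual timestamps rather than caller-supplied seconds):

```cpp
#include <cassert>

// Shrink-policy sketch: every N seconds, shrink the cached buffer to the
// largest size requested during the window that just ended.
class TimedShrinkPolicy {
 public:
  explicit TimedShrinkPolicy(long aWindowSecs) : mWindowSecs(aWindowSecs) {}

  // Record a request at time aNow (seconds); returns the buffer size kept.
  int Request(long aNow, int aSize) {
    if (aNow - mWindowStart >= mWindowSecs) {
      // Window elapsed: shrink to the peak of the window just ended.
      mBufferSize = mWindowPeak;
      mWindowPeak = 0;
      mWindowStart = aNow;
    }
    if (aSize > mWindowPeak) mWindowPeak = aSize;
    if (aSize > mBufferSize) mBufferSize = aSize;  // must still satisfy the request
    return mBufferSize;
  }

 private:
  long mWindowSecs;
  long mWindowStart = 0;
  int mWindowPeak = 0;
  int mBufferSize = 0;
};
```

The worst case is bounded: a large buffer survives at most N seconds past the last large request, and one reallocation per window is the most the shrinking itself can cost.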
"So if the backbuffer is just a cached nsDrawingSurface, why don't we get rid of these APIs and have the GFX implementations just cache nsDrawingSurfaces internally when desired?"

The backbuffer has special semantics that various platforms will handle differently. It has a special relationship to the widget it's going to be drawing in, and that may determine whether it is cached at all, the caching policy if it is cached, who owns the drawing surface, and the drawing surface's life cycle. For example, on the Mac, GetBackbuffer could return a drawing surface that is maintained as a backing store for each window. The caching of that drawing surface might be handled in the widget module, and the viewmanager should not need any special knowledge of the internal implementation. The viewmanager simply requests a backbuffer to draw in; it doesn't know whether the platform created a new drawing surface, returned a single cached drawing surface that all widgets share, or returned a drawing surface that is always associated with each top-level window. GetBackbuffer, ReleaseBackbuffer, and DestroyCachedBackbuffers provide a layer of abstraction that platform-specific implementations have room to maneuver under.

"Here's a really simple policy: every N seconds, shrink the cached buffer to the largest size you saw during the last N seconds."

Time does not make a very good policy for shrinking the cache, because people typically load a URL and then pause for a while to read the page, often scrolling, which forces a repaint of only a portion of the page. The cache would tend to shrink on every page load unless the interval were made very large to accommodate the typical reading time per URL; when the user then loaded a new page, they would pay the full performance hit for the new page. On the Mac this was shown to be a 20% page-load penalty. In the case of the backbuffer, the viewmanager does know what the maximum size of a paint request will be.
It will never be larger than the union of the root widgets, so it makes sense to pass this info along so if the backbuffer is cached it can be shrunk to a size that is optimal rather than guessing.
Comment on attachment 61201 [details] [diff] [review] Fix for GTK Critical error caused by paint event dispatched for widget with 0 height or width sr=attinasi
Attachment #61201 - Flags: superreview+
Comment on attachment 61155 [details] [diff] [review] Move management of the backbuffer to the rendering context Looks fine to me - sr=attinasi
Attachment #61155 - Flags: superreview+
Comment on attachment 61155 [details] [diff] [review] Move management of the backbuffer to the rendering context r=karnaze
Attachment #61155 - Flags: review+
Comment on attachment 61201 [details] [diff] [review] Fix for GTK Critical error caused by paint event dispatched for widget with 0 height or width r=karnaze
Attachment #61201 - Flags: review+
Fix checked in
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Attached patch Patch for Xlib gfx code (obsolete) — Splinter Review
Patch for Xlib gfx code (same changes as the patch for gfx/src/gtk/). Requesting rs=, please ...
It seems that Xlib nsDrawingSurfaceImpl code leaks offscreen Pixmaps - attachment 61743 [details] [diff] [review] caused the Xserver to consume huge amounts of memory ...
Attachment #61743 - Attachment is obsolete: true
Comment on attachment 61760 [details] [diff] [review] New patch for Xlib gfx code r=kmcclusk@netscape.com
Attachment #61760 - Flags: review+
Comment on attachment 61760 [details] [diff] [review] New patch for Xlib gfx code sr=attinasi
Attachment #61760 - Flags: superreview+
a=brendan@mozilla.org on attachment 61760 [details] [diff] [review] for 0.9.7 branch checkin. Who checked this into the trunk, and when? Please, open a new bug for a similar problem in a different port, next time. You might want to use PRPackedBool in structs for adjacent booleans, in a future patch. /be
Blocks: 114455
Keywords: mozilla0.9.7+
I just <now> checked attachment 61760 [details] [diff] [review] into the branch and trunk. The fix appears to be for some other attachment.
No longer blocks: 114455
Component: Layout: View Rendering → Layout: Web Painting