Closed Bug 597336 Opened 14 years ago Closed 14 years ago

Fennec GTK is extremely slow on N900 after landing cedar and using remote layers

Categories

(Firefox for Android Graveyard :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(fennec2.0b1+)

VERIFIED FIXED
Tracking Status
fennec 2.0b1+ ---

People

(Reporter: romaxa, Assigned: cjones)

References

(Blocks 1 open bug, )

Details

Attachments

(7 files, 3 obsolete files)

2.13 KB, text/plain
Details
23.04 KB, text/plain
Details
36.03 KB, text/plain
Details
1.40 KB, patch
Details | Diff | Splinter Review
3.21 KB, patch
vlad
: review+
Details | Diff | Splinter Review
4.12 KB, patch
karlt
: review+
Details | Diff | Splinter Review
13.81 KB, patch
karlt
: review+
Details | Diff | Splinter Review
Attached file oprofile cut
Sounds like we are scaling a lot with billinear filtering and hitting slow pixman path. 2452 19.9674 libpixman-1.so.0.17.3 Xorg bits_image_fetch_transformed 1571 12.7932 libpixman-1.so.0.17.3 Xorg combine_out_reverse_u 1285 10.4642 no-vmlinux no-vmlinux /no-vmlinux 1031 8.3958 libpixman-1.so.0.17.3 Xorg fetch_pixel_r5g6b5 498 4.0554 no-vmlinux oprofiled /no-vmlinux 476 3.8762 libpixman-1.so.0.17.3 Xorg pixman_composite_scanline_add_mask_asm_neon 456 3.7134 libpixman-1.so.0.17.3 Xorg fetch_scanline_a8 422 3.4365 libpixman-1.so.0.17.3 Xorg pixman_composite_over_8888_8888_asm_neon
and yeah I was testing on file:///usr/share ;)
Here is the backtrace: 4 0x427bc8e0 in _XWaitForReadable (dpy=0x42e3a000) at ../../src/XlibInt.c:505 #5 0x427bcd24 in _XRead (dpy=0x42e3a000, data=0xbecfafb0 "\1\2\367\r", size=32) at ../../src/XlibInt.c:1104 #6 0x427bd694 in _XReply (dpy=0x42e3a000, rep=0xbecfafb0, extra=0, discard=1) at ../../src/XlibInt.c:1735 #7 0x427b7554 in XSync (dpy=0x42e3a000, discard=0) at ../../src/Sync.c:48 #8 0x427b76bc in _XSyncFunction (dpy=0xfffffffc) at ../../src/Synchro.c:37 #9 0x41dcfa64 in XRenderComposite (dpy=0x42e3a000, op=12, src=62914951, mask=62914956, dst=62914947, src_x=<value optimized out>, src_y=<value optimized out>, mask_x=<value optimized out>, mask_y=<value optimized out>, dst_x=<value optimized out>, dst_y=<value optimized out>, width=<value optimized out>, height=<value optimized out>) at ../../src/Composite.c:66 #10 0x415672e8 in _cairo_xlib_surface_composite (op=CAIRO_OPERATOR_ADD, src_pattern=0xbecfc5b8, mask_pattern=0xbecfb2c8, abstract_dst=0x48c36120, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=0, dst_y=0, width=800, height=480, clip_region=0x0) at gfx/cairo/cairo/src/cairo-xlib-surface.c:2294 #11 0x4154b1a8 in _cairo_surface_composite (op=CAIRO_OPERATOR_ADD, src=0xbecfc5b8, mask=0xbecfb2c8, dst=0xb, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=0, dst_y=0, width=800, height=480, clip_region=0x0) ---Type <return> to continue, or q <return> to quit--- at gfx/cairo/cairo/src/cairo-surface.c:1821 #12 0x4154d844 in _clip_and_composite (clip=0x0, op=<value optimized out>, src=0xbecfc5b8, draw_func=0x4154e6c4 <_composite_traps_draw_func>, draw_closure=0xbecfbc28, dst=0x48c36120, extents=0xbecfc520) at gfx/cairo/cairo/src/cairo-surface-fallback.c:365 #13 0x4154e250 in _clip_and_composite_trapezoids (src=0xbecfc5b8, op=CAIRO_OPERATOR_SOURCE, dst=0x48c36120, traps=0xbecfc074, antialias=CAIRO_ANTIALIAS_DEFAULT, clip=0x0, extents=0xbecfc520) at gfx/cairo/cairo/src/cairo-surface-fallback.c:851 #14 0x4154ed34 in _cairo_surface_fallback_fill (surface=0x48c36120, op=CAIRO_OPERATOR_SOURCE, source=0xbecfc5b8, path=0x4195abb4, fill_rule=CAIRO_FILL_RULE_WINDING, tolerance=0.10000000000000001, antialias=CAIRO_ANTIALIAS_DEFAULT, clip=0x0) at gfx/cairo/cairo/src/cairo-surface-fallback.c:1406 #15 0x4154be5c in _cairo_surface_fill (surface=0x48c36120, op=CAIRO_OPERATOR_SOURCE, source=0xbecfc5b8, path=0xb, fill_rule=CAIRO_FILL_RULE_WINDING, tolerance=0.10000000000000001, antialias=CAIRO_ANTIALIAS_DEFAULT, clip=0xbecfc6a0) at gfx/cairo/cairo/src/cairo-surface.c:2222 #16 0x4153422c in _cairo_gstate_fill (gstate=<value optimized out>, path=0x4195abb4) at gfx/cairo/cairo/src/cairo-gstate.c:1177 #17 0x4152d7b8 in *INT__moz_cairo_fill_preserve (cr=0x4195a910) at gfx/cairo/cairo/src/cairo.c:2338 #18 0x412ac438 in gfxContext::Fill (this=<value optimized out>) at gfx/thebes/gfxContext.cpp:151 ---Type <return> to continue, or q <return> to quit--- #19 0x412edf88 in mozilla::layers::ThebesLayerBuffer::DrawBufferQuadrant (this=<value optimized out>, aTarget=0x48cf5a90, aXSide=<value optimized out>, aYSide=<value optimized out>, aOpacity=1, aXRes=0.816299975, aYRes=0.816299975) at gfx/layers/ThebesLayerBuffer.cpp:123 #20 0x412ee028 in mozilla::layers::ThebesLayerBuffer::DrawBufferWithRotation (this=0x47f7e90c, aTarget=0x48cf5a90, aOpacity=1, aXRes=0.816299975, aYRes=0.816299975) at gfx/layers/ThebesLayerBuffer.cpp:136 #21 0x412e5bd0 in mozilla::layers::BasicThebesLayerBuffer::DrawTo (this=0x47f7e90c, aLayer=0x47f7e820, aIsOpaqueContent=1, aTarget=0x48cf5a90, aOpacity=1) at gfx/layers/basic/BasicLayers.cpp:507 #22 0x412e6a9c in mozilla::layers::BasicShadowThebesLayer::Paint (this=0x47f7e820, aContext=<value optimized out>, aCallback=<value optimized out>, aCallbackData=<value optimized out>, aOpacity= regcache.c:178: internal-error: register_size: Assertion `regnum >= 0 && regnum < (gdbarch_num_regs (gdbarch) + gdbarch_num_pseudo_regs (gdbarch))' failed. And I think I've seen this problem already see bug 568767 , 558087
blocking?
tracking-fennec: --- → ?
tracking-fennec: ? → 2.0b1+
(gdb) frame 10 #10 0x415672e8 in _cairo_xlib_surface_composite (op=CAIRO_OPERATOR_ADD, src_pattern=0xbecfc5b8, mask_pattern=0xbecfb2c8, abstract_dst=0x48c36120, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=0, dst_y=0, width=800, height=480, clip_region=0x0) at /home/romaxa/microbcomponent/cedar-mobile-dev/gfx/cairo/cairo/src/cairo-xlib-surface.c:2294 2294 XRenderComposite (dst->dpy, Current language: auto; currently c (gdb) p src_pattern $1 = (const cairo_pattern_t *) 0xbecfc5b8 (gdb) p *src_pattern $2 = {type = CAIRO_PATTERN_TYPE_SURFACE, ref_count = {ref_count = 0}, status = CAIRO_STATUS_SUCCESS, user_data = { size = 0, num_elements = 0, element_size = 12, elements = 0x0, is_snapshot = 0}, matrix = {xx = 0.99999999999999989, yx = 0, xy = 0, yy = 0.99999999999999989, x0 = 0, y0 = 428.99999999999994}, filter = CAIRO_FILTER_NEAREST, extend = CAIRO_EXTEND_NONE, has_component_alpha = 0} (gdb) p *mask_pattern $3 = {type = CAIRO_PATTERN_TYPE_SURFACE, ref_count = {ref_count = 0}, status = CAIRO_STATUS_SUCCESS, user_data = { size = 0, num_elements = 0, element_size = 12, elements = 0x0, is_snapshot = 0}, matrix = {xx = 1, yx = 0, xy = 0, yy = 1, x0 = 0, y0 = 0}, filter = CAIRO_FILTER_GOOD, extend = CAIRO_EXTEND_NONE, has_component_alpha = 0}
I've compiled pixman 18-2 (for Xorg), and that did not help much... maybe 1% faster...
Attached file Pixman params.
I've set NEAREST filter for aTarget I see small improvement but not much... diff --git a/gfx/layers/ThebesLayerBuffer.cpp b/gfx/layers/ThebesLayerBuffer.cpp --- a/gfx/layers/ThebesLayerBuffer.cpp +++ b/gfx/layers/ThebesLayerBuffer.cpp @@ -99,16 +99,20 @@ ThebesLayerBuffer::DrawBufferQuadrant(gf PR_TRUE); gfxPoint quadrantTranslation(quadrantRect.x, quadrantRect.y); nsRefPtr<gfxPattern> pattern = new gfxPattern(mBuffer); #ifdef MOZ_GFX_OPTIMIZE_MOBILE gfxPattern::GraphicsFilter filter = gfxPattern::FILTER_NEAREST; pattern->SetFilter(filter); + nsRefPtr<gfxPattern> ctxpattern = aTarget->GetPattern(); + gfxPattern::GraphicsFilter prevFilter = ctxpattern->Filter(); + ctxpattern->SetFilter(filter); + aTarget->SetPattern(ctxpattern); #endif // Transform from user -> buffer space. gfxMatrix transform; transform.Scale(aXRes, aYRes); transform.Translate(-quadrantTranslation); pattern->SetMatrix(transform); @@ -117,16 +121,19 @@ ThebesLayerBuffer::DrawBufferQuadrant(gf if (aOpacity != 1.0) { aTarget->Save(); aTarget->Clip(); aTarget->Paint(aOpacity); aTarget->Restore(); } else { aTarget->Fill(); } +#ifdef MOZ_GFX_OPTIMIZE_MOBILE + ctxpattern->SetFilter(prevFilter); +#endif } void ThebesLayerBuffer::DrawBufferWithRotation(gfxContext* aTarget, float aOpacity, float aXRes, float aYRes) { // Draw four quadrants. We could use REPEAT_, but it's probably better // not to, to be performance-safe. In attachment backtrace of Xorg Here is oprofile results with nearest 253 1.8387 oprofiled oprofiled /usr/bin/oprofiled 265 1.9259 libpixman-1.so.0.18.2 Xorg pixman_composite_src_8888_0565_asm_neon 296 2.1512 no-vmlinux busybox /no-vmlinux 439 3.1904 no-vmlinux oprofiled /no-vmlinux 473 3.4375 no-vmlinux Xorg /no-vmlinux 545 3.9608 libpixman-1.so.0.18.2 Xorg pixman_composite_over_8888_8888_asm_neon 574 4.1715 libpixman-1.so.0.18.2 Xorg pixman_composite_scanline_out_reverse_asm_neon 576 4.1860 libpixman-1.so.0.18.2 Xorg fetch_scanline_a8 625 4.5422 libpixman-1.so.0.18.2 Xorg pixman_composite_scanline_add_mask_asm_neon 1321 9.6003 libpixman-1.so.0.18.2 Xorg fetch_pixel_r5g6b5 1568 11.3953 no-vmlinux no-vmlinux /no-vmlinux 3257 23.6701 libpixman-1.so.0.18.2 Xorg bits_image_fetch_transformed
The key will be working out what has introduced the scaling. I'm also curious whether these graphics drivers accelerate RepeatPad. Render's RepeatNone is a bit weird, while RepeatPad corresponds to GL blending. If the drivers support GL, it would be trivial to accelerate scaling with RepeatPad.
I guess we are not talking about Fennec and HW acceleration right now... at least for b1... we need to figure out how to avoid unnecessary scaling and make sure that we always do pixel snapping. While scrolling we definitely must not do any scaling...
NONE and PAD repeat hopefully are going to be fast in pixman soon, at least for nearest scaling even when trying to sample pixels outside the source image: http://lists.freedesktop.org/archives/pixman/2010-September/000547.html Based on this "ThebesLayerBuffer::DrawBufferWithRotation" function name, it may be also 90/180/270 degrees rotation. This is also going to be improved. Anyway, right now it is way too easy to put pixman to knees using various transforms. Improvements in this area are really needed. Even though some of such performance issues are caused by improperly selected operations or transform parameters by the applications.
(In reply to comment #9) > Anyway, right now it is way too easy to put pixman to knees using various > transforms. Improvements in this area are really needed. Even though some of > such performance issues are caused by improperly selected operations or > transform parameters by the applications. Right, but don't these devices have some graphics processor? Pixman should not be getting used for simple composites with transforms. (In reply to comment #8) > I guess we are not talking about Fennec and HW acceleration right now... at > least for b1... HW accel is the responsibility of the X11 graphics driver, not Fennec. > we need to figure out how to avoid unnecessary scaling and make sure that we > always do pixel snapping. Agreed.
(In reply to comment #10) > (In reply to comment #9) > > Anyway, right now it is way too easy to put pixman to knees using various > > transforms. Improvements in this area are really needed. Even though some of > > such performance issues are caused by improperly selected operations or > > transform parameters by the applications. > > Right, but don't these devices have some graphics processor? > Pixman should not be getting used for simple composites with transforms. That's not so clear and largely depends on the quality of the drivers (and some of the drivers are proprietary/closed -> that means little hope for them). The reality is that currently hardware acceleration often has difficulties competing with software rendering when doing 2D graphics: http://blogs.gnome.org/otte/2010/06/26/fun-with-benchmarks/ Surely there is a huge potential for HW acceleration if it is used right. But software rendering also still has some room for improvement (especially scaling, rotation and radial/linear gradients). There is one more point to consider. When using integrated graphics, the same memory is shared between the graphics chip and the CPU. Many of 2D operations are memory bandwidth limited, which means that there is absolutely no chance for the GPU to do this job faster. Surely GPU can take some load off from CPU and let it do something else. But in some cases I myself would just prefer to just have another CPU core instead of GPU.
Ok, I've found our old friend: pref("browser.ui.zoom.pageFitGranularity", 9); which seems breaking UI/ShadowLayer resolution relation ship... seems like pageFitGranularity scaling Chrome context and forcing permanent scale to viewport "setViewportScale" pageFitGranularity - must change setContentResolution and scale layers content... not layers itself.
oleg, fennec uses that preference to adjust the zoom level: http://mxr.mozilla.org/mobile-browser/source/chrome/content/browser.js#2404
and that preference makes Set viewportScale browser.xml:|0.8122| in updateTo: function And that makes our layers permanently scaled down to 0.8122, and make pixman very unhappy.... I see here 2 options: 1) stop scaling top level layers (preferred option) 2) switch things to image rendering and use optimized pixman scaling
(In reply to comment #14) > and that preference makes > Set viewportScale browser.xml:|0.8122| in updateTo: function > > And that makes our layers permanently scaled down to 0.8122, and make pixman > very unhappy.... > That doesn't necessarily imply a permanent scale; only if there's not a corresponding setResolution() call. The layers log should show that (or a person who knows the frontend code could chime in).
setResolution call is made as soon as scale is set by _updateCacheViewport:http://mxr.mozilla.org/mobile-browser/source/chrome/content/bindings/browser.xml#427In fact if you scrolled several pixels setResolution will be called again and again (which apparently doesn't fix this problem). My theory is that the setResolution call is being dropped somewhere.We can't just change pageFitGranularity because that is just masking the problem. It might pop up somewhere else and I'm not comfortable changing functinoality because changing a pref fixes one known case.Romaxa or whoever can reproduce this: I suggest the following:1) Make sure pattern matrix and source matrix are in fact differenthttp://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.cpp#1092) Break in setResolution and see why this isn't getting reflected in layers tree
Sorry about line breaks, seems to be a problem with submitting a comment in Bugzilla after someone else has changed something in the bug.
Bug 598111 might be related to this, and has steps to reproduce.
Depends on: 598111
Romaxa: could you test again and see if this is a problem?
tested on latest m-c + m-b sometime fit to view make such scale that we are hitting this path: 1211 5.3065 libc-2.5.so Xorg memcpy 1289 5.6483 libc-2.5.so fennec memcpy 2678 11.7348 libxul.so fennec fast_composite_scaled_nearest_565_565_none_SRC 2711 11.8794 libxul.so fennec pixman_composite_over_8888_0565_asm_neon and this is looks more less fast, but we still do runtime scaling... sometime something happen, and we go by path in attachment
I'm still unclear about what comment 14 refers to. For layer surfaces and compositing, I think our options are (1) (current impl) use X to render into Xlib back surfaces in the content process, use X for compositing in the chrome process (2) render into gfxImageSurfaces in the content process, [mumble upload mumble] gfxImageSurfaces and use X for compositing in the chrome process (3) render into gfxImageSurfaces in the content process, composite those with non-X cairo to a gfxImageSurface in the chrome process, upload composited screen to X (3) is approximately what happens on android currently, and it's pretty fast. We can flip between (1) and (2) relatively easily (for (2), it might be better to upload from the content process, which is more work). We'd need a fair amount more work to implement (3), but it wouldn't be too terribly hard. The real question in my mind is where we want to be heading with GL compositing. If texture-from-pixmap is fast on device, I would imagine we'd want the setup of (2) or (1), except use GL directly for compositing in the chrome process. If it's not, we'll probably want a (2a) that uploads directly to textures instead of Pixmaps. In any case, we wouldn't want (3) unless compositing on the GPU is slower than in software, and in that case we've lost. Does everyone agrees with that assessment? If so, I think we should give (2a) a try (dougt has a patch?). If that's still too slow, we can try a (2b) that uses one gfxImageSurface and two Xlib surfaces for thebes layers, renders into the ImageSurface in chrome, uploads to the Xlib back surface from content, and then swaps with the chrome front surface. If all that is still too slow, we can try (3), but I want to save that for last because I think it will be a dead end in the long run.
(In reply to comment #20) > Created attachment 477300 [details] > m-c: 3ff2aa532d41, m-b: 1d6ea74da2c0, Qt Raster > > sometime something happen, and we go by path in attachment > > $7 = {matrix = {{65536, 0, 0}, {0, 65536, 73703511}, {0, 0, 65536}}} This looks like scale factor is set to 1.0, but pixman is still hitting transformed path because of having fractional translate (73703511 ~= 1124.6).
> For layer surfaces and compositing, I think our options are GTK use option 1. Qt raster (default on N900) use option 3 (same as android) Ok, I've made smal hack suggested by Ben: gfxMatrix ctxMatrix = aTarget->CurrentMatrix(); if (ctxMatrix.xx == aXRes && ctxMatrix.yy == aYRes) { ctxMatrix.xx = ctxMatrix.yy = 1.0; aTarget->SetMatrix(ctxMatrix); } else transform.Scale(aXRes, aYRes); And this removing .00000000000000000000001 scale value... and bits_image_fetch_transformed problem... But I still see a lot of store/fetch non-optimized calls 380 1.6618 libc-2.5.so Xorg memcpy 387 1.6924 libc-2.5.so fennec memcpy 496 2.1691 libxul.so fennec pixman_composite_scanline_add_mask_asm_neon 749 3.2755 vmlinux-2.6.28-omap1 vmlinux-2.6.28-omap1 omap3_enter_idle 852 3.7259 libxul.so fennec pixman_composite_over_8888_0565_asm_neon 1242 5.4314 libxul.so fennec fetch_scanline_a8 1686 7.3731 libxul.so fennec store_scanline_r5g6b5 3841 16.7971 libxul.so fennec combine_out_reverse_u 6000 26.2387 libxul.so fennec fetch_scanline_r5g6b5
And finally I did aTrget->NudgeCurrentMatrixToIntegers(); transform.Translate(-gfxPoint((int)quadrantTranslation.x, (int)quadrantTranslation.y)); ******************************* And that make scrolling very fast: 581 2.7208 libxul.so fennec pixman_composite_src_n_0565_asm_neon 606 2.8379 libQtGui.so.4.6.2 fennec void qt_memfill_template<unsigned int, unsigned int>(unsigned int*, unsigned int, int) 815 3.8166 libxul.so fennec js::Interpret(JSContext*, JSStackFrame*, unsigned int, unsigned int) 890 4.1678 libxul.so fennec pixman_composite_src_0565_0565_asm_neon 934 4.3739 libc-2.5.so Xorg memcpy 967 4.5284 libc-2.5.so fennec memcpy 1405 6.5796 vmlinux-2.6.28-omap1 vmlinux-2.6.28-omap1 omap3_enter_idle 2101 9.8389 libxul.so fennec pixman_composite_over_8888_0565_asm_neon 2768 12.9624 oprofiled oprofiled /usr/bin/oprofiled
Heh, I'm rebuilding with the nudge patch as we speak.
(In reply to comment #21) > If that's still > too slow, we can try a (2b) that uses one gfxImageSurface and two Xlib surfaces > for thebes layers, renders into the ImageSurface in chrome That was supposed to say, "renders into the ImageSurface in *content*", oops.
(In reply to comment #23) > I still see a lot of store/fetch non-optimized calls > 496 2.1691 libxul.so fennec > pixman_composite_scanline_add_mask_asm_neon > 749 3.2755 vmlinux-2.6.28-omap1 vmlinux-2.6.28-omap1 > omap3_enter_idle > 852 3.7259 libxul.so fennec > pixman_composite_over_8888_0565_asm_neon > 1242 5.4314 libxul.so fennec > fetch_scanline_a8 > 1686 7.3731 libxul.so fennec > store_scanline_r5g6b5 > 3841 16.7971 libxul.so fennec > combine_out_reverse_u > 6000 26.2387 libxul.so fennec > fetch_scanline_r5g6b5 The 'combine_out_reverse_u' alone is addressed by the following patch since pixman-0.19.2 (and it somewhat helps in scaled cases too): http://cgit.freedesktop.org/pixman/commit/?id=8a5d1be1dab799ed23239f3471b4a351d8356368 But looks like we may need some new neon fast path function(s?). I wonder if it is something like 'add_0565_8_0565' or 'out_reverse_0565_8_0565'? And the just released pixman-0.19.4 has some improvements in the scaling code. Basically if the transformation matrix is a bit wrong and the pixels outside of the source image border are accessed, it will also run fast path code. But the fast paths will have different function names: one with '_cover_' part in its name and another one with '_none_'.
After some discussion, we're going to go with option (3) to mitigate risk. Oleg, do you mind spinning off these ThebesLayerBuffer optimizations into a separate bug, since they apply to qt and gtk? That bug needs to block b1 also. Let's keep this one focused on the gtk rendering backend. Oleg also pointed out that bug 483409 is basically "implement option (3)". I'll see how the required patches here evolve to see whether a dup is appropriate.
With bug 598531 applied and a hacky patch to force image rendering everywhere applied, we're doing pretty well. Pretty close to perf with X surfaces and X compositing, maybe a little better. This is good because the hacky patch is taking the slow-boat socket-upload path for the final window image. Will see what that's like with XShm.
Assignee: nobody → jones.chris.g
Stole all the hard parts from bug 483409 (thanks Vlad!). This seems an improvement to me on device, but (i) we're still getting lags while scrolling, and (ii) we're still not up to MicroB perf. The lags might be from being dumb about XSync()ing; that can be optimized.
Attached patch hackier WIP 2, without XSync (obsolete) — Splinter Review
Maybe a little better without XSync()ing, but there are still scrolling lags. This might be chrome/content fighting over CPU, not sure. I did notice that scrolling perf is significantly worse on the nytimes homepage when fully zoomed out, compared to zoomed on an article. Zoomed-in, we're doing quite well, worse so zoomed out.
Attachment #477431 - Attachment is obsolete: true
These patches put our gtk eggs into the same basket as qt's, and hopefully get gtk to parity perf-wise modulo possible XSync inefficiency.
Comment on attachment 477454 [details] [diff] [review] part 4: If UseClientSideRendering(), render to a gfxImageSurface backed by memory shared with X and then XShmPutImage to copy it to our window's drawable works fine, no X in profile: only slow local pixman: 91 1.0583 libxul.so plugin-container fetch_scanline_a8 193 1.0694 libxul.so fennec pixman_composite_scanline_add_mask_asm_neon 204 1.1304 libc-2.5.so fennec getenv - do we call it often? 261 1.4462 libxul.so plugin-container pixman_composite_over_n_8_0565_asm_neon 312 1.7288 libxul.so fennec pixman_composite_over_8888_0565_asm_neon 453 2.5101 libxul.so fennec fetch_scanline_a8 699 3.8732 libxul.so fennec store_scanline_r5g6b5 14247.8905 libxul.so fennec combine_out_reverse_u 2139 11.8524 oprofiled oprofiled /usr/bin/oprofiled 2329 12.9052 libxul.so fennec fetch_scanline_r5g
Attachment #477454 - Flags: feedback+
Attachment #477451 - Attachment description: part 1: Add gfxPlatformGtk::UserClientSideRendering() to control whether to avoid X server compositing → part 1: Add gfxPlatformGtk::UseClientSideRendering() to control whether to avoid X server compositing
Attachment #477451 - Flags: superreview?(vladimir) → superreview+
Comment on attachment 477452 [details] [diff] [review] part 2: Create offscreen gfxImageSurfaces if UseClientSideRendering() sure, why not
Attachment #477452 - Flags: review?(vladimir) → review+
Comment on attachment 477453 [details] [diff] [review] part 3: Generalize the not-using-X-compositing check when creating shadow-layer backing surfaces >+ShadowLayerManager::PlatformSyncBeforeReplyUpdate() >+{ >+ // FIXME/bug 597336: with client-side Gtk rendering, we need to >+ // ensure that our widget's backing shm image isn't scribbled over >+ // before all its pending XShmPutImage()s complete. That said, >+ // XSync() is an unnecessarily heavyweight synchronization >+ // mechanism; other options are possible. If this XSync is shown to >+ // hurt responsiveness, we need to explore the other options. This doesn't seem to be the right place for that. Here we don't know whether the widget is using XShm. (The shm image involved here is not used with X.) >+ // child, even those will be read operations. Otherwise, the child "even though they will be"? r+ if you find another place for the XSync in part 4 and check UsingXCompositing() here.
Attachment #477453 - Flags: review?(karlt) → review+
Comment on attachment 477454 [details] [diff] [review] part 4: If UseClientSideRendering(), render to a gfxImageSurface backed by memory shared with X and then XShmPutImage to copy it to our window's drawable >+include $(topsrcdir)/config/config.mk >+include $(topsrcdir)/ipc/chromium/chromium-config.mk Why is this necessary? >+#include "mozilla/X11Util.h" Is this used? >+class nsShmImage { GdkImage would do most of the work here for us and so would be a simpler solution. Given we hope to replace this with GL, doing less work here sounds good. However, we may eventually want this (or a GDK-free version) available to other clients also, and this looks like a good first step. I also think the use of IPC_RMID here is better than in _gdk_image_new_for_depth. So I'm ok with either solution. >+ ScopedXFree<XVisualInfo> vinfos; >+ int nVinfos; >+ XVisualInfo vinfoTemplate; >+ vinfoTemplate.visual = aVisual; >+ vinfoTemplate.visualid = XVisualIDFromVisual(aVisual); >+ vinfoTemplate.screen = screen; >+ >+ vinfos = XGetVisualInfo(dpy, VisualIDMask|VisualScreenMask, >+ &vinfoTemplate, &nVinfos); >+ NS_ASSERTION(vinfos && nVinfos == 1, >+ "Could not locate unique matching XVisualInfo for Visual"); >+ XVisualInfo* vinfo = &vinfos[0]; gfxXlibSurface::DepthOfVisual() already exists for this (Maybe I should have put that in X11Util.h.), but it might be more efficient to get the depth of the drawable and pass it through EnsureShmImage. >+ shm->mImage->data = reinterpret_cast<char*>(shm->mSegment->memory()); I prefer static_cast here. (With void*, we know that the intention is that is should be cast to another type. I read reinterpret_cast as actually changing the interpretation of the data.) >+ if (!XShmAttach(dpy, &shm->mInfo)) { >+ return nsnull; >+ } If this fails (and as currently implemented, that would only be because the server doesn't support MIT-SHM), then the destructor should not XShmDetach. AFAICT, the point at which lack of functional XShm is detected is also here, but that is only through a BadAccess X error. This might happen due to access privileges or due to the server being on a different machine (which is AFAICT only detected through not finding a shm segment with the same id). This means that XSync and and checking for errors is required here. gdk_error_trap_push/pop would be useful for that. If it fails once then probably worth recording the failure to avoid trying again. If you want to avoid too many XSyncs, then you might be able to sneak in an optimization to skip subsequent gdk_error_trap_push/XSyncs if the first succeeds. >+ shm->mSize = aSize; >+ shm->mFormat = (shm->mImage->depth == 24) ? >+ gfxASurface::ImageFormatRGB24 : >+ gfxASurface::ImageFormatRGB16_565; Hmm. I would have thought we had code to do Visual -> ImageFormat properly, but I haven't found it. At least make Create fail for depth != 24 or 16 (probably only 8, 32). >+ GC gc = XCreateGC(dpy, d, 0, 0); (dpy, d, 0, NULL), for clarity. >+ // XXX should be OK to destroy the old image here even if >+ // there are outstanding Puts, because the X server should >+ // still have the memory mapped into its address space; the >+ // Detach is ordered after the Puts Only if we know the server has processed the Attach. There also needs to be an XSync before reusing the same shared image for a subsequent paint (or queue up paints until notified of PutImage completion, or ...). >- if (translucent) { >+ if (UseShm()) { >+ // We're using an xshm mapping as a back buffer. >+ layerBuffering = BasicLayerManager::BUFFER_NONE; >+ } else if (translucent) { "#ifdef MOZ_HAVE_SHMIMAGE" for consistency with the UseShm test below. The UseShm and translucent properties are orthogonal. If both are set then both need to be considered. >@@ -6547,22 +6701,33 @@ nsWindow::GetThebesSurface() Are there any consumers of nsThebesDeviceContext::CreateRenderingContext() or nsThebesRenderingContext::Init() that still want to use this to draw to the window? It might make more sense to leave mThebesSurfaces as is and use a fresh image surface for the gfxContext in OnExposeEvent. > nsRefPtr<gfxASurface> mThebesSurface; >+ // If we're using xshm rendering, mThebesSurface wraps mShmImage >+ nsRefPtr<nsShmImage> mShmImage; "#if defined(MOZ_X11)" I guess. I don't know if we care about "!defined(MOZ_HAVE_SHAREDMEMORYSYSV)" I assume destructors are run in the reverse order to constructors? Might be better to put mShmImage before mThebesSurface, if mThebesSurface is still going to wrap the shared memory segment.
Attachment #477454 - Flags: review?(karlt) → review-
(In reply to comment #39) > Comment on attachment 477453 [details] [diff] [review] > part 3: Generalize the not-using-X-compositing check when creating shadow-layer > backing surfaces > > >+ShadowLayerManager::PlatformSyncBeforeReplyUpdate() > >+{ > >+ // FIXME/bug 597336: with client-side Gtk rendering, we need to > >+ // ensure that our widget's backing shm image isn't scribbled over > >+ // before all its pending XShmPutImage()s complete. That said, > >+ // XSync() is an unnecessarily heavyweight synchronization > >+ // mechanism; other options are possible. If this XSync is shown to > >+ // hurt responsiveness, we need to explore the other options. > > This doesn't seem to be the right place for that. > Here we don't know whether the widget is using XShm. > (The shm image involved here is not used with X.) > OK, works for me. > >+ // child, even those will be read operations. Otherwise, the child > > "even though they will be"? > Yes, typo, thanks. > r+ if you find another place for the XSync in part 4 and check > UsingXCompositing() here. Will do. I'll just stick it after all the XShmPutImage()s for now.
(In reply to comment #40) > Comment on attachment 477454 [details] [diff] [review] > part 4: If UseClientSideRendering(), render to a gfxImageSurface backed by > memory shared with X and then XShmPutImage to copy it to our window's drawable > > >+include $(topsrcdir)/config/config.mk > >+include $(topsrcdir)/ipc/chromium/chromium-config.mk > > Why is this necessary? > To pull in SharedMemorySysV, which depends indirectly on some chromium code (unfortunately :S). > >+#include "mozilla/X11Util.h" > > Is this used? > It defines ScopedXFree, which I used in this version of the patch. But with the change you requested below, I won't need ScopedXFree. > >+class nsShmImage { > > GdkImage would do most of the work here for us and so would be a simpler > solution. Given we hope to replace this with GL, doing less work here sounds > good. > > However, we may eventually want this (or a GDK-free version) available > to other clients also, and this looks like a good first step. I also think > the use of IPC_RMID here is better than in _gdk_image_new_for_depth. > > So I'm ok with either solution. > Alright. I don't have much of an opinion either way, and this code works, and I'm not familiar with GdkImage, so I would prefer to use this implementation. > >+ ScopedXFree<XVisualInfo> vinfos; > >+ int nVinfos; > >+ XVisualInfo vinfoTemplate; > >+ vinfoTemplate.visual = aVisual; > >+ vinfoTemplate.visualid = XVisualIDFromVisual(aVisual); > >+ vinfoTemplate.screen = screen; > >+ > >+ vinfos = XGetVisualInfo(dpy, VisualIDMask|VisualScreenMask, > >+ &vinfoTemplate, &nVinfos); > >+ NS_ASSERTION(vinfos && nVinfos == 1, > >+ "Could not locate unique matching XVisualInfo for Visual"); > >+ XVisualInfo* vinfo = &vinfos[0]; > > gfxXlibSurface::DepthOfVisual() already exists for this (Maybe I should have > put that in X11Util.h.), but it might be more efficient to get the depth of > the drawable and pass it through EnsureShmImage. > OK. > >+ shm->mImage->data = reinterpret_cast<char*>(shm->mSegment->memory()); > > I prefer static_cast here. (With void*, we know that the intention is that is > should be cast to another type. I read reinterpret_cast as actually changing > the interpretation of the data.) > Oh, nice catch. Not sure what I was thinking here. > >+ if (!XShmAttach(dpy, &shm->mInfo)) { > >+ return nsnull; > >+ } > > If this fails (and as currently implemented, that would only be because the > server doesn't support MIT-SHM), then the destructor should not XShmDetach. > OK. I wasn't sure about that. > AFAICT, the point at which lack of functional XShm is detected is also here, > but that is only through a BadAccess X error. This might happen due to access > privileges or due to the server being on a different machine (which is AFAICT > only detected through not finding a shm segment with the same id). > Oh ... that sounds bad. Does that mean it's possible for the remote server to find a shm segment that incidentally has the same ID? I guess it's not worth worrying about too much. > This means that XSync and and checking for errors is required here. > gdk_error_trap_push/pop would be useful for that. > > If it fails once then probably worth recording the failure to avoid trying > again. > Sounds good. > If you want to avoid too many XSyncs, then you might be able to sneak in an > optimization to skip subsequent gdk_error_trap_push/XSyncs if the first > succeeds. > I'm not worried about that at all; for fennec, we'll have exactly one top-level native widget, and it will never be resized, so this code will only run once. > >+ shm->mSize = aSize; > >+ shm->mFormat = (shm->mImage->depth == 24) ? > >+ gfxASurface::ImageFormatRGB24 : > >+ gfxASurface::ImageFormatRGB16_565; > > Hmm. I would have thought we had code to do Visual -> ImageFormat properly, > but I haven't found it. > > At least make Create fail for depth != 24 or 16 (probably only 8, 32). > Will do. > >+ GC gc = XCreateGC(dpy, d, 0, 0); > > (dpy, d, 0, NULL), for clarity. > OK. > >+ // XXX should be OK to destroy the old image here even if > >+ // there are outstanding Puts, because the X server should > >+ // still have the memory mapped into its address space; the > >+ // Detach is ordered after the Puts > > Only if we know the server has processed the Attach. > Ah! Nice catch. Since we'll be syncing after Attach, I'll edit this comment to note that. > There also needs to be an XSync before reusing the same shared image for a > subsequent paint (or queue up paints until notified of PutImage completion, or > ...). > Yep, will add that in Put(). We can optimize later. > >- if (translucent) { > >+ if (UseShm()) { > >+ // We're using an xshm mapping as a back buffer. > >+ layerBuffering = BasicLayerManager::BUFFER_NONE; > >+ } else if (translucent) { > > "#ifdef MOZ_HAVE_SHMIMAGE" for consistency with the UseShm test below. > OK. > The UseShm and translucent properties are orthogonal. If both are set then > both need to be considered. > I thought about that when writing the patch, but only cared about fennec where we don't have translucent widgets. But I think I can just reorder things here so that the logic stays right for translucency. > >@@ -6547,22 +6701,33 @@ nsWindow::GetThebesSurface() > > Are there any consumers of nsThebesDeviceContext::CreateRenderingContext() or > nsThebesRenderingContext::Init() that still want to use this to draw to the > window? > BasicLayers uses this as a reference for CreateSimilarSurface. Not sure about other consumers. > It might make more sense to leave mThebesSurfaces as is and use a fresh image > surface for the gfxContext in OnExposeEvent. > Per above, that might lead to us creating an Xlib surface in BasicLayers in odd cases. Also not sure about other possible consumers. > > nsRefPtr<gfxASurface> mThebesSurface; > >+ // If we're using xshm rendering, mThebesSurface wraps mShmImage > >+ nsRefPtr<nsShmImage> mShmImage; > > "#if defined(MOZ_X11)" I guess. OK ... FWIW I tried to keep ifdefs to a minimum. > I don't know if we care about "!defined(MOZ_HAVE_SHAREDMEMORYSYSV)" > We do unfortunately, because of --disable-ipc builds where SharedMemorySysV isn't available. But in practice we don't care about X11-without-SysV-shm, right. > I assume destructors are run in the reverse order to constructors? > Might be better to put mShmImage before mThebesSurface, if mThebesSurface is > still going to wrap the shared memory segment. OK, sounds more robust.
Attachment #477454 - Attachment is obsolete: true
Comment on attachment 477862 [details] [diff] [review] part 4: If UseClientSideRendering(), render to a gfxImageSurface backed by memory shared with X and then XShmPutImage to copy it to our window's drawable, v2 >+ gdk_window_get_internal_paint_info(aWindow, &gd, &dx, &dy); >+ >+ Display* dpy = gdk_x11_get_default_xdisplay(); >+ Drawable d = GDK_DRAWABLE_XID(gd); >+ >+ GC gc = XCreateGC(dpy, d, 0, nsnull); >+ for (GdkRectangle* r = aRects; r < aEnd; r++) { >+ XShmPutImage(dpy, d, gc, mImage, >+ r->x, r->y, >+ r->x, r->y, I think you should subtract dx,dy from the destination position. >+ new gfxImageSurface(reinterpret_cast<unsigned char*>(mSegment->memory()), This can be a static_cast too >+# ifdef MOZ_HAVE_SHMIMAGE >+ if (UseShm()) >+ // EnsureShmImage() is a dangerous interface, but we guarantee >+ // that the thebes surface and the shmimage have the same >+ // lifetime >+ mThebesSurface = EnsureShmImage(size, >+ visual, gdk_drawable_get_depth(d), >+ mShmImage); >+ else >+# endif // MOZ_HAVE_SHMIMAGE > > mThebesSurface = new gfxXlibSurface Fall back to gfxXlibSurface if EnsureShmImage fails (or, maybe better, use XPutImage with an regular image surface).
Attachment #477862 - Flags: review?(karlt) → review+
(In reply to comment #44) > Comment on attachment 477862 [details] [diff] [review] > part 4: If UseClientSideRendering(), render to a gfxImageSurface backed by > memory shared with X and then XShmPutImage to copy it to our window's drawable, > v2 > > >+ gdk_window_get_internal_paint_info(aWindow, &gd, &dx, &dy); > >+ > >+ Display* dpy = gdk_x11_get_default_xdisplay(); > >+ Drawable d = GDK_DRAWABLE_XID(gd); > >+ > >+ GC gc = XCreateGC(dpy, d, 0, nsnull); > >+ for (GdkRectangle* r = aRects; r < aEnd; r++) { > >+ XShmPutImage(dpy, d, gc, mImage, > >+ r->x, r->y, > >+ r->x, r->y, > > I think you should subtract dx,dy from the destination position. > Done. > >+ new gfxImageSurface(reinterpret_cast<unsigned char*>(mSegment->memory()), > > This can be a static_cast too > Ooopsie again! Fixed. > >+# ifdef MOZ_HAVE_SHMIMAGE > >+ if (UseShm()) > >+ // EnsureShmImage() is a dangerous interface, but we guarantee > >+ // that the thebes surface and the shmimage have the same > >+ // lifetime > >+ mThebesSurface = EnsureShmImage(size, > >+ visual, gdk_drawable_get_depth(d), > >+ mShmImage); > >+ else > >+# endif // MOZ_HAVE_SHMIMAGE > > > > mThebesSurface = new gfxXlibSurface > > Fall back to gfxXlibSurface if EnsureShmImage fails (or, maybe better, use > XPutImage with an regular image surface). OK, I added a fallback to gfxXlibSurface.
Not sure if I read the patches right, but I don't see any tests associated to these set of patches. Verification of this bug is going to require a full smoketest.
Flags: in-testsuite?
It seems that this patch is causing a segfault during the talos Txul test: http://tinderbox.mozilla.org/showlog.cgi?log=Mobile/1285293533.1285293915.27556.gz It could be causing some of the content crashes we have been seeing on devices too.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(gdb) bt #0 0x0000000000000000 in ?? () #1 0x00007f66c6ec4937 in mozilla::layers::PLayersParent::DeallocShmem (this=0x7f66ac69b080, aMem=...) at PLayersParent.cpp:404 #2 0x00007f66c70161dd in DestroySharedShmemSurface<mozilla::layers::PLayersParent> (this=0x2909ce0, aSurface=0x29e6e28) at /home/cjones/mozilla/mozilla-central/gfx/layers/ipc/ShadowLayers.cpp:465 #3 mozilla::layers::ShadowLayerManager::DestroySharedSurface (this=0x2909ce0, aSurface=0x29e6e28) at /home/cjones/mozilla/mozilla-central/gfx/layers/ipc/ShadowLayers.cpp:505 #4 0x00007f66c7018641 in mozilla::layers::ShadowLayerParent::ActorDestroy (this=0x2393fd0, why=43937320) at /home/cjones/mozilla/mozilla-central/gfx/layers/ipc/ShadowLayerParent.cpp:79 #5 0x00007f66c6ec4719 in mozilla::layers::PLayerParent::OnMessageReceived (this=0x2393fd0, __msg=<value optimized out>) at PLayerParent.cpp:88 #6 0x00007f66c6ec038e in mozilla::dom::PContentParent::OnMessageReceived (this=0x284f980, __msg=...) at PContentParent.cpp:440 #7 0x00007f66c6ea5baf in mozilla::ipc::AsyncChannel::OnDispatchMessage (this=0x284f990, msg=...) at /home/cjones/mozilla/mozilla-central/ipc/glue/AsyncChannel.cpp:262 #8 0x00007f66c6ea9880 in mozilla::ipc::RPCChannel::OnMaybeDequeueOne (this=0x284f990) at /home/cjones/mozilla/mozilla-central/ipc/glue/RPCChannel.cpp:438 #9 0x00007f66c6fb0d56 in MessageLoop::RunTask (this=0x1d8e110, task=0x7f66a4005aa0) at /home/cjones/mozilla/mozilla-central/ipc/chromium/src/base/message_loop.cc:343 #10 0x00007f66c6fb1188 in MessageLoop::DeferOrRunPendingTask (this=0x7f66ac69b080, pending_task=<value optimized out>) at /home/cjones/mozilla/mozilla-central/ipc/chromium/src/base/message_loop.cc:351 #11 0x00007f66c6fb1425 in MessageLoop::DoWork (this=0x1d8e110) at /home/cjones/mozilla/mozilla-central/ipc/chromium/src/base/message_loop.cc:451 #12 0x00007f66c6ea807c in mozilla::ipc::DoWorkRunnable::Run (this=<value optimized out>) at /home/cjones/mozilla/mozilla-central/ipc/glue/MessagePump.cpp:70 #13 0x00007f66c6f8169f in nsThread::ProcessNextEvent (this=0x1d97fb0, mayWait=0, result=0x7ffffce2631c) at /home/cjones/mozilla/mozilla-central/xpcom/threads/nsThread.cpp:547 #14 0x00007f66c6f4f66d in NS_ProcessNextEvent_P (thread=0x7f66ac69b080, mayWait=43937320) at nsThreadUtils.cpp:250 #15 0x00007f66c6ea7eb2 in mozilla::ipc::MessagePump::Run (this=0x1d8df50, aDelegate=0x1d8e110) at /home/cjones/mozilla/mozilla-central/ipc/glue/MessagePump.cpp:110 #16 0x00007f66c6fb0f2e in MessageLoop::RunHandler (this=0x7f66ac69b080) at /home/cjones/mozilla/mozilla-central/ipc/chromium/src/base/message_loop.cc:202 #17 MessageLoop::Run (this=0x7f66ac69b080) at /home/cjones/mozilla/mozilla-central/ipc/chromium/src/base/message_loop.cc:176 #18 0x00007f66c6dfdc41 in nsBaseAppShell::Run (this=0x1daa680) at /home/cjones/mozilla/mozilla-central/widget/src/xpwidgets/nsBaseAppShell.cpp:180 #19 0x00007f66c6cce5aa in nsAppStartup::Run (this=0x20f28e0) at /home/cjones/mozilla/mozilla-central/toolkit/components/startup/src/nsAppStartup.cpp:191 #20 0x00007f66c654af29 in XRE_main (argc=<value optimized out>, argv=<value optimized out>, aAppData=<value optimized out>) at /home/cjones/mozilla/mozilla-central/toolkit/xre/nsAppRunner.cpp:3673 #21 0x0000000000400ef9 in main (argc=4, argv=0x7ffffce26d08) at /home/cjones/mozilla/mozilla-central/mobile/app/nsBrowserApp.cpp:155 This is bug 591555.
Confirmed that the patch there makes this go away on my desktop build. Will wait until that lands and cycles before re-closing this.
It was 591555.
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
verified FIXED on build: Mozilla/5.0 (Maemo; Linux armv71; rv:2.0b6pre) Gecko/20100927 Namoroka/4.0b7pre Fennec/2.0b1pre
Status: RESOLVED → VERIFIED
Depends on: 603724
Depends on: 646714
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: