Closed Bug 924679 Opened 6 years ago Closed 6 years ago

Frequent NULL pointer dereferences in gfxASurface::Wrap(_cairo_surface*)

Categories

(Core :: Graphics, defect)

x86_64
Linux
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla28
Tracking Status
firefox28 --- verified

People

(Reporter: billm, Assigned: roc)

References

Details

Attachments

(7 files, 2 obsolete files)

1.06 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
2.86 KB, patch
karlt
: review+
Details | Diff | Splinter Review
13.09 KB, patch
karlt
: review+
Details | Diff | Splinter Review
3.98 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
24.65 KB, patch
karlt
: review+
Details | Diff | Splinter Review
6.24 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
3.33 KB, patch
karlt
: review+
Details | Diff | Splinter Review
I'm running on Linux with layers acceleration enabled using an nvidia card. Recently I've started to get tons of crashes like this one:

https://crash-stats.mozilla.com/report/index/3f287251-dec2-4270-a054-482d22131008

Matt said he thinks it might be related to bug 740200.

This bug is very easy for me to reproduce (usually I get a crash within 30 seconds). I'm happy to look for information in gdb if you guys need it.
Hop on IRC #gfx and we should be able to work through this.
Actually, I get this myself on startup when I enable OMTC GL in my default profile.
Attached patch wallpaper-ish fix (obsolete) — Splinter Review
The bug is that we have a DrawTargetDual which reports its type as BACKEND_CAIRO since that's what its two delegate DrawTargets are. However, it always returns null from GetNativeSurface, so we crash trying to wrap null. It's simple enough to take the fallback path instead.

I'm dubious that DrawTargetDual should return BACKEND_CAIRO as its backend type. Seems to me it would make more sense to have its own backend type. But I'm not sure so I'm not doing that here. Alternatively, when it wraps two cairo surfaces, we could perhaps create a cairo_tee_surface for those two surfaces and return that from GetNativeSurface.
Assignee: nobody → roc
Attachment #817733 - Flags: review?(matt.woodrow)
Comment on attachment 817733 [details] [diff] [review]
wallpaper-ish fix

Review of attachment 817733 [details] [diff] [review]:
-----------------------------------------------------------------

Won't this just result in the themed object not being drawn at all? We just return from gfxXlibNativeRenderer::Draw without doing anything if GetThebesSurfaceForDrawTarget returns nullptr.

I'm ok with landing this so that people don't have to crash, but we really need to fix it properly.

It appears (from a quick check) that this thebes surface is only used for a few things: creating a similar temporary surface and copying the background up into the temporary.

The former of those two things wouldn't need (or want?) a tee surface, we could just use either of the two cairo surfaces. I don't see how the latter could work with a tee surface, so i guess we're not hitting that without moz2d somehow.
Maybe the thing to do here is to rewrite gfxXlibNativeRenderer to use cairo directly instead of Thebes gfxASurfaces. Everywhere that we want to use gfxXlibNativeRenderer will have cairo.
I am working on that approach.
Attachment #817733 - Attachment is obsolete: true
Attachment #817733 - Flags: review?(matt.woodrow)
Attachment #819615 - Flags: review?(matt.woodrow) → review+
There's still some usage of gfxASurfaces in gfxXlibNativeRenderer with this patch: namely, gfxImageSurfaces, since I want to get this stuff landed to fix the crash bug. But it pushes quite far in the right direction.
Attachment #819620 - Flags: review?(matt.woodrow) → review+
Attachment #819625 - Flags: review?(matt.woodrow) → review+
Attachment #819617 - Flags: review?(karlt) → review+
Comment on attachment 819619 [details] [diff] [review]
Part 3: Make gfxXlibNativeRenderer::DrawWithXlib callback take a cairo_surface_t instead of a gfxXlibSurface

>+     * @param surfaceSize the surface has this size.
>+     * @param offset  draw at this offset into the given drawable
>      * @param clipRects an array of rectangles; clip to the union.
>      *                  Any rectangles provided will be contained by the
>      *                  rectangle with size provided to Draw and by the
>      *                  surface extents.
>      * @param numClipRects the number of rects in the array, or zero if
>      *                     no clipping is required.
>      */
>-    virtual nsresult DrawWithXlib(gfxXlibSurface* surface,
>+    virtual nsresult DrawWithXlib(cairo_surface_t* surface,
>                                   nsIntPoint offset,
>                                   nsIntRect* clipRects, uint32_t numClipRects) = 0;

No need to document |surfaceSize|, which doesn't exist.
Attachment #819619 - Flags: review?(karlt) → review+
Comment on attachment 819621 [details] [diff] [review]
Part 5: Add gfxXlibSurface::CreateCairo

Need to do something to free the pixmap when the surface is destroyed.
Attachment #819621 - Flags: review?(karlt) → review-
When I first read "CreateCairo", I thought it might create a cairo_t with an xlib surface as the target, so I suggest "CreateCairoSurface".
Comment on attachment 819624 [details] [diff] [review]
Part 6: Remove gfxXlibNativeRender::DrawOutput and convert tmpXlibSurface from gfxXlibSurface to cairo_surface_t

Can gfxColor.h nsAutoPtr.h gfxXlibSurface.h includes be removed from
gfxXlibNativeRenderer.h now?

>-        cairo_xlib_surface_get_screen (target) : screen;

>+        cairo_xlib_surface_get_screen (cairoTarget ) : screen;

Unnecessary extra space added.

>     // When the background has an alpha channel, we need to draw with an alpha
>     // channel anyway, so there is no need to copy the background.  If
>     // doCopyBackground is set here, we'll also need to check below that the
>     // background can copied without any loss in format conversions.
>     bool doCopyBackground = !drawIsOpaque && canDrawOverBackground &&
>-        target_content == CAIRO_CONTENT_COLOR;
>+        cairoTarget && cairo_surface_get_content (cairoTarget) == CAIRO_CONTENT_COLOR;

It would be better to set doCopyBackground if using the drawTarget.
We could even do that when the cairoTarget has alpha because the only
difference that might result would be using a visual more closing matching the
target (if there is more than one visual with alpha and
supportsAlternateVisual is set).

>-    if (ctx->IsCairo()) {

>-    } else {
>-        DrawTarget* drawTarget = ctx->GetDrawTarget();
>-        if (!drawTarget) {
>-            return;

>+    if (ctx->IsCairo()) {
>+      cairoTarget = cairo_get_group_target(ctx->GetCairo());
>+    } else {
>+      drawTarget = ctx->GetDrawTarget();
>+      cairoTarget = static_cast<cairo_surface_t*>
>+        (drawTarget->GetNativeSurface(NATIVE_SURFACE_CAIRO_SURFACE));
>+    }
>+
>+    cairo_surface_t* tempXlibSurface =
>+        CreateTempXlibSurface(cairoTarget, drawTarget, size,

Indentation confusion.

Do you know that GetDrawTarget() will return non-null here?

>+            NS_ASSERTION(cairoTarget, "eCopyBackground only used when there's a cairoTarget");
>+            cairo_set_operator(tmpCtx, CAIRO_OPERATOR_SOURCE);
>+            gfxPoint pt = -(offset + ctx->CurrentMatrix().GetTranslation());
>+            cairo_set_source_surface(tmpCtx, cairoTarget, pt.x, pt.y);

Oh.

I wonder whether it is worth including a comment here to indicate that the
extra slow path is taken unnecessarily when the DrawTarget is a
DrawTargetSkia, for example.
Attachment #819624 - Flags: review?(karlt) → review+
>+    SurfaceFormat moz2DFormat =
>+        cairo_surface_get_content(tempXlibSurface) == CAIRO_CONTENT_COLOR ?
>+            FORMAT_B8G8R8A8 : FORMAT_B8G8R8X8;

>+            RefPtr<SourceSurface> sourceSurface =
>+                Factory::CreateSourceSurfaceForCairoSurface(tempXlibSurface,
>+                                                            moz2DFormat);

What happens when moz2DFormat doesn't match the format of the cairo_surface_t?
Flags: needinfo?(roc)
Attached patch Part 5 v2Splinter Review
Attachment #820246 - Flags: review?(karlt)
Flags: needinfo?(roc)
(In reply to Karl Tomlinson (:karlt) from comment #18)
> Comment on attachment 819624 [details] [diff] [review]
> >     // When the background has an alpha channel, we need to draw with an alpha
> >     // channel anyway, so there is no need to copy the background.  If
> >     // doCopyBackground is set here, we'll also need to check below that the
> >     // background can copied without any loss in format conversions.
> >     bool doCopyBackground = !drawIsOpaque && canDrawOverBackground &&
> >-        target_content == CAIRO_CONTENT_COLOR;
> >+        cairoTarget && cairo_surface_get_content (cairoTarget) == CAIRO_CONTENT_COLOR;
> 
> It would be better to set doCopyBackground if using the drawTarget.
> We could even do that when the cairoTarget has alpha because the only
> difference that might result would be using a visual more closing matching
> the
> target (if there is more than one visual with alpha and
> supportsAlternateVisual is set).

We could, but the case where the drawTarget wraps a single cairo surface (which should be the common case) is already handled because cairoTarget will be non-null in that case.

> Do you know that GetDrawTarget() will return non-null here?

Yes.

> >+            NS_ASSERTION(cairoTarget, "eCopyBackground only used when there's a cairoTarget");
> >+            cairo_set_operator(tmpCtx, CAIRO_OPERATOR_SOURCE);
> >+            gfxPoint pt = -(offset + ctx->CurrentMatrix().GetTranslation());
> >+            cairo_set_source_surface(tmpCtx, cairoTarget, pt.x, pt.y);
> 
> Oh.
> 
> I wonder whether it is worth including a comment here to indicate that the
> extra slow path is taken unnecessarily when the DrawTarget is a
> DrawTargetSkia, for example.

I'm a little confused. Why is this unnecessary for DrawTargetSkia?

(In reply to Karl Tomlinson (:karlt) from comment #19)
> >+    SurfaceFormat moz2DFormat =
> >+        cairo_surface_get_content(tempXlibSurface) == CAIRO_CONTENT_COLOR ?
> >+            FORMAT_B8G8R8A8 : FORMAT_B8G8R8X8;
> 
> >+            RefPtr<SourceSurface> sourceSurface =
> >+                Factory::CreateSourceSurfaceForCairoSurface(tempXlibSurface,
> >+                                                            moz2DFormat);
> 
> What happens when moz2DFormat doesn't match the format of the
> cairo_surface_t?

I don't think it matters all that much since we're just wrapping the cairo implementation here.
Comment on attachment 820246 [details] [diff] [review]
Part 5 v2

This looks correct, but I'm not sure the extra code is worthwhile.

CreateTempXlibSurface() could use the existing gfxXlibSurface Create() method and return the result of CairoSurface() on that.
Attachment #820246 - Flags: review?(karlt) → review+
Similarly, I'm not clear what moving from the C++ gfxXlibSurface objects to the C cairo_surface_t objects, with manual ref counting, really wins here.

Is there a reason why the cairo_surface_t from GetNativeSurface(NATIVE_SURFACE_CAIRO_SURFACE) can't be wrapped in a gfxXlibSurface()?
Flags: needinfo?(roc)
> > >+            NS_ASSERTION(cairoTarget, "eCopyBackground only used when there's a cairoTarget");
> > >+            cairo_set_operator(tmpCtx, CAIRO_OPERATOR_SOURCE);
> > >+            gfxPoint pt = -(offset + ctx->CurrentMatrix().GetTranslation());
> > >+            cairo_set_source_surface(tmpCtx, cairoTarget, pt.x, pt.y);
> > 
> > Oh.
> > 
> > I wonder whether it is worth including a comment here to indicate that the
> > extra slow path is taken unnecessarily when the DrawTarget is a
> > DrawTargetSkia, for example.
> 
> I'm a little confused. Why is this unnecessary for DrawTargetSkia?

Currently the patch only supports this path for DrawTargetCairo IIUC, and yes, that is all that is important now.

I assume Snapshot() could be used efficiently in copying the background if the native surface is a CPU memory surface in a DrawTargetSkia.  If we are planning to use Skia with CPU surfaces and compare perf one day, then a comment might help someone trying to improve perf with Skia.  If we are not, then don't worry.

Similarly, we may be able to do better with tee surfaces, but we'd need additional code to do some sort of optimization and I guess DrawTargetDual is the same kind of thing.  It's not important unless it is a common path.

> (In reply to Karl Tomlinson (:karlt) from comment #19)
> > >+    SurfaceFormat moz2DFormat =
> > >+        cairo_surface_get_content(tempXlibSurface) == CAIRO_CONTENT_COLOR ?
> > >+            FORMAT_B8G8R8A8 : FORMAT_B8G8R8X8;
> > 
> > >+            RefPtr<SourceSurface> sourceSurface =
> > >+                Factory::CreateSourceSurfaceForCairoSurface(tempXlibSurface,
> > >+                                                            moz2DFormat);
> > 
> > What happens when moz2DFormat doesn't match the format of the
> > cairo_surface_t?
> 
> I don't think it matters all that much since we're just wrapping the cairo
> implementation here.

Yes, I suspected it didn't matter, but maybe it's worth a comment explaining that it doesn't so the formats don't need to match exactly.
(In reply to Karl Tomlinson (:karlt) from comment #23)
> Similarly, I'm not clear what moving from the C++ gfxXlibSurface objects to
> the C cairo_surface_t objects, with manual ref counting, really wins here.

The goal is to move away from gfxASurface stuff to use Moz2D only. It seemed simpler here to use cairo_surface_t than to convert the code to use Moz2D DrawTarget and SourceSurface.

(In reply to Karl Tomlinson (:karlt) from comment #24)
> I assume Snapshot() could be used efficiently in copying the background if
> the native surface is a CPU memory surface in a DrawTargetSkia.  If we are
> planning to use Skia with CPU surfaces and compare perf one day, then a
> comment might help someone trying to improve perf with Skia.  If we are not,
> then don't worry.

Yeah I don't think we'll be trying to mix Skia drawing with X drawing and expecting it to perform well.

> Yes, I suspected it didn't matter, but maybe it's worth a comment explaining
> that it doesn't so the formats don't need to match exactly.

OK. Thanks!
Flags: needinfo?(roc)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #25)
> (In reply to Karl Tomlinson (:karlt) from comment #23)
> > Similarly, I'm not clear what moving from the C++ gfxXlibSurface objects to
> > the C cairo_surface_t objects, with manual ref counting, really wins here.
> 
> The goal is to move away from gfxASurface stuff to use Moz2D only. It seemed
> simpler here to use cairo_surface_t than to convert the code to use Moz2D
> DrawTarget and SourceSurface.

OK, thanks, and yes, I could imagine cairo_surface_t being simpler here than using Moz2D.
(In reply to Bill McCloskey (:billm) from comment #29)
> Thanks Rob!

Bill, I assume this is verified fixed based on your comment, can you please confirm?
Flags: needinfo?(wmccloskey)
Sorry, wrong status flag.
I think I did check that it worked. I stopped using hardware acceleration a while ago since it doesn't seem like we intend to support it in the future. So I'm not sure.
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #32)
> I think I did check that it worked. I stopped using hardware acceleration a
> while ago since it doesn't seem like we intend to support it in the future.
> So I'm not sure.

Okay, thanks Bill. I'm going to flag this to see if we can reproduce this and verify it's fixed on our own hardware. If not we'll have to assume this is fixed and mark it [qa-].
Keywords: verifyme
I tried to replicate this issue based on the available data from socorro (Comment 0) and other comments or dependencies (Bug 740200) associated to it, with no luck. I've used Ubuntu 13.10 64-bit [1] with the latest Beta (BuildID: 20140303165517) and a GeForce 210 graphics card.

Marking this issue as verified. If anyone has any additional information in terms of reproducibility, please let me know and I will follow up on this. 


[1] Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.