Closed Bug 612662 Opened 14 years ago Closed 14 years ago

Crash [@ _cairo_image_surface_assume_ownership_of_data]

Categories

(Core :: SVG, defect)

x86
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla2.0b8
Tracking Status
blocking2.0 --- final+

People

(Reporter: jruderman, Assigned: dholbert)

References

(Blocks 1 open bug)

Details

(Keywords: crash, testcase, Whiteboard: [sg:critical?])

Crash Data

Attachments

(9 files, 2 obsolete files)

110 bytes, image/svg+xml
Details
42.34 KB, text/plain
Details
15.44 KB, text/plain
Details
217 bytes, image/svg+xml
Details
1.30 KB, patch
roc
: review+
Details | Diff | Splinter Review
1.64 KB, patch
roc
: review+
Details | Diff | Splinter Review
19.40 KB, text/plain
Details
4.86 KB, patch
roc
: review+
Details | Diff | Splinter Review
3.85 KB, patch
roc
: review+
Details | Diff | Splinter Review
WARNING: Detected DOM modification during paint, bailing out!: file layout/base/FrameLayerBuilder.cpp, line 1704

Followed by a crash [@ _cairo_image_surface_assume_ownership_of_data]
Attached file log with stack trace
This is super bad. We should not be modifying the DOM in any way during painting.
Assignee: nobody → dholbert
blocking2.0: --- → final+
On my linux x86_64 machine, I don't get a crash or the warning that Jesse mentions in Comment 0. Instead, I get this output (repeating once for each paint, it appears):

> WARNING: Invalid type to RecordMemoryUsedForSurfaceType!: file gfxASurface.cpp, line 543
> ###!!! ASSERTION: gfxASurface::CairoSurface called with mSurface == nsnull!: 'mSurface != nsnull', file gfxASurface.h, line 117
> WARNING: Invalid type to RecordMemoryUsedForSurfaceType!: file gfxASurface.cpp, line 543
Confirmed that this crashes soon after loading on a Mac nightly build, though.  (No crashreporter dialog.)
Just kidding, there was a crash report after all -- the dialog just didn't appear right away.  Report is: bp-95fc8b76-e6c7-4f5c-bddf-dc97c2101116
fwiw, we have bug 335054 on adding assertions to ensure we don't modify the DOM during painting.
We already have that warning in comment #0. We can't make it an assertion because we know plugins can trigger it in "valid" situations without async plugin painting.
I don't get the warning every time. Maybe that's because it sometimes manages to crash first?

I'm looking forward to being free of sync plugin painting and being able to enable these assertions!  Should we enable the assertions per-platform as we get async plugin painting up?
I suppose we could do that, but we have to either support async plugin painting for in-process plugins, or run all plugins out of process before we can enable that for a particular platform.
Here's the backtrace at the point where we fire the warning.

In this case, we've gotten to the line that issues the warning because, in FrameLayerBuilder::CheckDOMModified, we find the following values:
  mInitialDOMGeneration == 121
  mRootPresContext->GetDOMGeneration() == 137

Those aren't equal (which apparently means we've done some DOM modifications), so we issue the NS_WARNING.
Right. The question is, what bumped the DOM-generation during painting? We might want to add some extra debug-only code to assert immediately when that happens, so you get a useful stack.
The initial testcase uses _the same SVG file_ as the background, which makes this trickier than it needs to be.

Here's an alternate testcase that reproduces this crash (and reproduces Comment 3's assertions on Linux) using a trivial SVG data-URI for the background (that trivial SVG being "<svg xmlns="http://www.w3.org/2000/svg"></svg>")
Attachment #490956 - Attachment description: testcase (crashes Firefox when loaded) → testcase (crashes Firefox when loaded on a Mac)
Attachment #491320 - Attachment is patch: false
Attachment #491320 - Attachment mime type: text/plain → image/svg+xml
So the linux assertion & warning are because we try to create a gfxImageSurface with
  width = 33475
  height = 288
and that fails in "_cairo_image_surface_create_with_pixman_format()" because "_cairo_image_surface_is_size_valid()" returns false, since our width is greater than MAX_IMAGE_SIZE (32767) as defined in cairo-image-surface.c

This means we end up having a gfxImageSurface with a null "mSurface" pointer (as indicated in the assertion in Comment 3) and mSurfaceValid == PR_FALSE.

The warning from Comment 3 is harmless and actually should be avoided -- I'll attach a patch to keep us from hitting the code that warns there.
So in the gfxImageSurface constructor, if our surface-setup fails (e.g. due to the size being too large), then our call to gfxASurface::Init will...
 (a) set mSurfaceValid = PR_FALSE.
 (b) destroy our (invalid) surface
http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxASurface.cpp?mark=219-221#214

The next thing the gfxImageSurface constructor does is to call "RecordMemoryUsed".  However, we shouldn't be making that call, because:
   Part (b) above means we're basically not using any memory
   Part (a) above means that the RecordMemoryUsed call ends up failing and spamming the NS_WARNING from comment 3 (specifically because gfxASurface::GetType() fails, returning -1, when mSurfaceValid == PR_FALSE -- and RecordMemoryUsed expects a valid GetType() value to be passed in)

So, we shouldn't actually be calling RecordMemoryUsed in this case. This first patch addresses that. (Doesn't actually fix the bug, though.)
Attachment #491407 - Flags: review?(roc)
(In reply to comment #13)
>   width = 33475
>   height = 288
> and that fails in "_cairo_image_surface_create_with_pixman_format()" because
> "_cairo_image_surface_is_size_valid()" returns false, since our width is
> greater than MAX_IMAGE_SIZE (32767) as defined in cairo-image-surface.c

Stepping through this on the mac right now... So far, we're ok with that large of a width, since _cairo_quartz_verify_surface_size in cairo-quartz-surface.c only mandates that the width be <= CG_MAX_WIDTH == USHRT_MAX.

My guess is that this giant (busted?) surface is the source of our trouble, even though _cairo_quartz_verify_surface_size thinks it's fine...
Ah, so at the point where we crash, we're calling
 _cairo_image_surface_assume_ownership_of_data (surface->imageSurfaceEquiv);
on a cairo_quartz_surface_t, and we have
  surface->base.status == CAIRO_STATUS_SUCCESS
but
  surface->imageSurfaceEquiv.status == CAIRO_STATUS_INVALID_SIZE

So maybe the "imageSurfaceEquiv" error-status just isn't getting recognized soon enough...
Aha!

So on Linux, we create only one surface, with cairo_image_surface_create_for_data(), which is called in the context of the attached patch. (and that fails in this case, so we effectively bail out.)

On Mac, we create two surfaces -- one with _cairo_quartz_surface_create_internal[1] and one with the same function used on Linux -- cairo_image_surface_create_for_data [2].  The former succeeds (for reasons described in Comment 15) and the latter fails (for the reason described in comment 13).  But we don't ever check the status of the latter one (at least, not at surface-creation-time.

The former surface is "surf->base", and the latter is "surf->imageSurfaceEquiv", described in Comment 16.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-quartz-surface.c#3150
[2] http://mxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-quartz-surface.c#3160
Attachment #491407 - Attachment description: Patch 1: Skip call to RecordMemoryUsed if surface-allocation failed. → Patch 1: Fix linux warning: skip call to RecordMemoryUsed if surface-allocation failed.
This patch fixes the Linux assertion from Comment 3. The assertion happens because we have an invalid gfxASurface, and we try to wrap it in a gfxContext, which calls gfxASurface::CairoSurface on it (and fails an assertion because the surface is invalid).
This patch fixes the mac crash on both testcases here, by checking "imageSurfaceEquiv" in our cairo_quartz_surface_t when it's created, and replacing it with a null pointer if it has issues.

I'm not sure what the implications of this are, but it works around this crash, at least.  (and we have checks like "if (surface->imageSurfaceEquiv)" sprinkled around our quartz cairo code, indicating that we can handle a world where imageSurfaceEquiv is null).

This patch does _not_ fix the "Detected DOM modification" warning on the first testcase, though.  I'll look into that next.
Attachment #491432 - Flags: review?(jmuizelaar)
Attachment #491428 - Flags: review?(roc)
Comment on attachment 491432 [details] [diff] [review]
Patch 3: Fix mac crash: set imageSurfaceEquiv to null if it fails

+    if (cairo_surface_status(tmpImageSurfaceEquiv)) {

space after cairo_surface_status and cairo_surface_destroy

Looks good to me, although you'll need to package this as a patch against cairo like the other patches in gfx/cairo.
(In reply to comment #2)
> This is super bad. We should not be modifying the DOM in any way during
> painting.

So these DOM modifications all seem to be tweaks to XUL elements in the helper-document -- in particular, <scrollbar> elements.  This happens during our SVGDocumentWrapper::UpdateViewportBounds call (which is needed to resize our helper SVG document, at paint-time, to potentially align its viewBox with whatever viewport it has been given.)

For example, see attached backtrace for one of the hits in IncrementDOMGeneration calls.  In the backtrace, we're inside of "nsGfxScrollFrameInner::FinishReflowForScrollbar", which has four calls to SetCoordAttribute (for scrollbars), each of which is effectively a call to nsGenericElement::SetAttrAndNotify.  And SetAttrAndNotify creates a mozAutoDocUpdate which calls IncrementDOMGeneration. (and ends up freaking out our FrameLayerBuilder later on)

Note that we don't actually want or need scrollbars in our helper document -- I'm avoiding drawing them by passing nsIPresShell::RENDER_IGNORE_VIEWPORT_SCROLLING to PresShell::RenderDocument (in VectorImage.cpp).  However, the RenderDocument call happens *after* we've hit this warning.  Maybe there's something I should be doing earlier disable scrollbars entirely?  (If so, that'd fix this or at least improve the situation here.)
Hmm, so the problem here is that the image document uses itself as an image at a different size, so while we're drawing it we try to reflow it to a different size. That is actually really bad.

I doubt there are any reasonable uses of this, so I suggest we simply prevent reentrant VectorImage::Draw calls on the same object. We'll need that to prevent infinite recursion in some cases, anyway. We should have a test for the obvious infinite-recursion case.

As a side issue, we should probably not construct scrollbars for image documents, for efficiency. See nsGfxScrollFrameInner::CreateAnonymousContent.
Ah, of course -- I thought I checked for that, but I guess not.  Yup -- disallowing reentrant "Draw()" calls sounds like a good plan.

I'll add a check for IsBeingUsedAsImage to nsGfxScrollFrameInner::CreateAnonymousContent, too -- thanks for the pointer on that.
This addresses roc's suggestions on the cairo patch, from Comment 20.
Attachment #491707 - Flags: review?(roc)
Attachment #491432 - Attachment is obsolete: true
Attachment #491432 - Flags: review?(jmuizelaar)
(Note that the bugzilla's shiny diff-viewer chokes on lines that start with "++", so it doesn't show most of the changed lines in new patch I've added in gfx/cairo.  Those lines show up fine in the raw patch-file view, though.)
You need to add a line to the README file as well
As discussed in Comment 22 & Comment 23, this patch adds a flag to VectorImage to prevent reentrant Draw calls.  It also adds a tweak to prevent scrollbar creation in svg-as-an-image documents.

(For the latter tweak, my patch collapses two formerly-nested "if" checks into a single "if" check, and then adds "presContext->Document()->IsBeingUsedAsImage()" as a new possible condition for triggering the contents of that clause.)
Attachment #491711 - Flags: review?(roc)
Added line to README per comment 26.
Attachment #491707 - Attachment is obsolete: true
Attachment #491714 - Flags: review?(roc)
Attachment #491707 - Flags: review?(roc)
Comment on attachment 491711 [details] [diff] [review]
Patch 4: Prevent reentrant VectorImage::Draw calls, & disable scrollbars in image docs

Need a testcase for recursive image drawing.
Attachment #491711 - Flags: review?(roc) → review+
Yup -- I'll include both testcases from this bug as crashtests.

Thanks for the help & reviews!
Flags: in-testsuite?
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Any objections to un-hiding this bug?

This only affected gecko2-era code (after SVG-as-an-image support was added), and it was fixed quickly, 5 months ago, well before Firefox 4 was released.
Blocks: 276431
Crash Signature: [@ _cairo_image_surface_assume_ownership_of_data]
Blocks: 335054
Group: core-security
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #7)
> We already have that warning in comment #0. We can't make it an assertion
> because we know plugins can trigger it in "valid" situations without async
> plugin painting.

For fuzzing, I'd be more than happy with an assertion that ignores the plugin case. Even if it's as an exclusion as wide as MOZ_ASSERT_IF(!everLoadedAnyPlugin, ...).
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: