As a security precaution, we have turned on the setting "Require API key authentication for API requests" for everyone. If this has broken something, please contact bugzilla-admin@mozilla.org
Last Comment Bug 612662 - Crash [@ _cairo_image_surface_assume_ownership_of_data]
: Crash [@ _cairo_image_surface_assume_ownership_of_data]
Status: RESOLVED FIXED
[sg:critical?]
: crash, testcase
Product: Core
Classification: Components
Component: SVG (show other bugs)
: Trunk
: x86 Mac OS X
: -- critical (vote)
: mozilla2.0b8
Assigned To: Daniel Holbert [:dholbert]
:
: Jet Villegas (:jet)
Mentors:
Depends on:
Blocks: randomstyles 335054 276431
  Show dependency treegraph
 
Reported: 2010-11-16 12:42 PST by Jesse Ruderman
Modified: 2014-02-14 13:09 PST (History)
7 users (show)
dholbert: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
final+


Attachments
testcase (crashes Firefox when loaded on a Mac) (110 bytes, image/svg+xml)
2010-11-16 12:42 PST, Jesse Ruderman
no flags Details
log with stack trace (42.34 KB, text/plain)
2010-11-16 12:43 PST, Jesse Ruderman
no flags Details
backtrace when we fire the warning (15.44 KB, text/plain)
2010-11-16 18:11 PST, Daniel Holbert [:dholbert]
no flags Details
testcase 2 (crashes Firefox when loaded on a Mac) (217 bytes, image/svg+xml)
2010-11-17 14:02 PST, Daniel Holbert [:dholbert]
no flags Details
Patch 1: Fix linux warning: skip call to RecordMemoryUsed if surface-allocation failed. (1.30 KB, patch)
2010-11-17 17:44 PST, Daniel Holbert [:dholbert]
roc: review+
Details | Diff | Splinter Review
Patch 2: Fix linux assertion (1.64 KB, patch)
2010-11-17 19:11 PST, Daniel Holbert [:dholbert]
roc: review+
Details | Diff | Splinter Review
Patch 3: Fix mac crash: set imageSurfaceEquiv to null if it fails (1.14 KB, patch)
2010-11-17 19:22 PST, Daniel Holbert [:dholbert]
no flags Details | Diff | Splinter Review
backtrace of nsRootPresContext::IncrementDOMGeneration call (19.40 KB, text/plain)
2010-11-18 13:32 PST, Daniel Holbert [:dholbert]
no flags Details
Patch 3 v2: Fix mac crash: set imageSurfaceEquiv to null if it fails (2.88 KB, patch)
2010-11-18 16:21 PST, Daniel Holbert [:dholbert]
no flags Details | Diff | Splinter Review
Patch 4: Prevent reentrant VectorImage::Draw calls, & disable scrollbars in image docs (4.86 KB, patch)
2010-11-18 16:29 PST, Daniel Holbert [:dholbert]
roc: review+
Details | Diff | Splinter Review
Patch 3 v3: Fix mac crash: set imageSurfaceEquiv to null if it fails (now with README note) (3.85 KB, patch)
2010-11-18 16:38 PST, Daniel Holbert [:dholbert]
roc: review+
Details | Diff | Splinter Review

Description User image Jesse Ruderman 2010-11-16 12:42:19 PST
Created attachment 490956 [details]
testcase (crashes Firefox when loaded on a Mac)

WARNING: Detected DOM modification during paint, bailing out!: file layout/base/FrameLayerBuilder.cpp, line 1704

Followed by a crash [@ _cairo_image_surface_assume_ownership_of_data]
Comment 1 User image Jesse Ruderman 2010-11-16 12:43:33 PST
Created attachment 490957 [details]
log with stack trace
Comment 2 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2010-11-16 12:45:00 PST
This is super bad. We should not be modifying the DOM in any way during painting.
Comment 3 User image Daniel Holbert [:dholbert] 2010-11-16 13:43:56 PST
On my linux x86_64 machine, I don't get a crash or the warning that Jesse mentions in Comment 0. Instead, I get this output (repeating once for each paint, it appears):

> WARNING: Invalid type to RecordMemoryUsedForSurfaceType!: file gfxASurface.cpp, line 543
> ###!!! ASSERTION: gfxASurface::CairoSurface called with mSurface == nsnull!: 'mSurface != nsnull', file gfxASurface.h, line 117
> WARNING: Invalid type to RecordMemoryUsedForSurfaceType!: file gfxASurface.cpp, line 543
Comment 4 User image Daniel Holbert [:dholbert] 2010-11-16 13:55:12 PST
Confirmed that this crashes soon after loading on a Mac nightly build, though.  (No crashreporter dialog.)
Comment 5 User image Daniel Holbert [:dholbert] 2010-11-16 14:38:49 PST
Just kidding, there was a crash report after all -- the dialog just didn't appear right away.  Report is: bp-95fc8b76-e6c7-4f5c-bddf-dc97c2101116
Comment 6 User image Jesse Ruderman 2010-11-16 14:42:09 PST
fwiw, we have bug 335054 on adding assertions to ensure we don't modify the DOM during painting.
Comment 7 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2010-11-16 14:53:45 PST
We already have that warning in comment #0. We can't make it an assertion because we know plugins can trigger it in "valid" situations without async plugin painting.
Comment 8 User image Jesse Ruderman 2010-11-16 15:04:54 PST
I don't get the warning every time. Maybe that's because it sometimes manages to crash first?

I'm looking forward to being free of sync plugin painting and being able to enable these assertions!  Should we enable the assertions per-platform as we get async plugin painting up?
Comment 9 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2010-11-16 15:07:49 PST
I suppose we could do that, but we have to either support async plugin painting for in-process plugins, or run all plugins out of process before we can enable that for a particular platform.
Comment 10 User image Daniel Holbert [:dholbert] 2010-11-16 18:11:10 PST
Created attachment 491066 [details]
backtrace when we fire the warning

Here's the backtrace at the point where we fire the warning.

In this case, we've gotten to the line that issues the warning because, in FrameLayerBuilder::CheckDOMModified, we find the following values:
  mInitialDOMGeneration == 121
  mRootPresContext->GetDOMGeneration() == 137

Those aren't equal (which apparently means we've done some DOM modifications), so we issue the NS_WARNING.
Comment 11 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2010-11-16 18:23:06 PST
Right. The question is, what bumped the DOM-generation during painting? We might want to add some extra debug-only code to assert immediately when that happens, so you get a useful stack.
Comment 12 User image Daniel Holbert [:dholbert] 2010-11-17 14:02:54 PST
Created attachment 491320 [details]
testcase 2 (crashes Firefox when loaded on a Mac)

The initial testcase uses _the same SVG file_ as the background, which makes this trickier than it needs to be.

Here's an alternate testcase that reproduces this crash (and reproduces Comment 3's assertions on Linux) using a trivial SVG data-URI for the background (that trivial SVG being "<svg xmlns="http://www.w3.org/2000/svg"></svg>")
Comment 13 User image Daniel Holbert [:dholbert] 2010-11-17 17:37:58 PST
So the linux assertion & warning are because we try to create a gfxImageSurface with
  width = 33475
  height = 288
and that fails in "_cairo_image_surface_create_with_pixman_format()" because "_cairo_image_surface_is_size_valid()" returns false, since our width is greater than MAX_IMAGE_SIZE (32767) as defined in cairo-image-surface.c

This means we end up having a gfxImageSurface with a null "mSurface" pointer (as indicated in the assertion in Comment 3) and mSurfaceValid == PR_FALSE.

The warning from Comment 3 is harmless and actually should be avoided -- I'll attach a patch to keep us from hitting the code that warns there.
Comment 14 User image Daniel Holbert [:dholbert] 2010-11-17 17:44:39 PST
Created attachment 491407 [details] [diff] [review]
Patch 1: Fix linux warning: skip call to RecordMemoryUsed if surface-allocation failed.

So in the gfxImageSurface constructor, if our surface-setup fails (e.g. due to the size being too large), then our call to gfxASurface::Init will...
 (a) set mSurfaceValid = PR_FALSE.
 (b) destroy our (invalid) surface
http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxASurface.cpp?mark=219-221#214

The next thing the gfxImageSurface constructor does is to call "RecordMemoryUsed".  However, we shouldn't be making that call, because:
   Part (b) above means we're basically not using any memory
   Part (a) above means that the RecordMemoryUsed call ends up failing and spamming the NS_WARNING from comment 3 (specifically because gfxASurface::GetType() fails, returning -1, when mSurfaceValid == PR_FALSE -- and RecordMemoryUsed expects a valid GetType() value to be passed in)

So, we shouldn't actually be calling RecordMemoryUsed in this case. This first patch addresses that. (Doesn't actually fix the bug, though.)
Comment 15 User image Daniel Holbert [:dholbert] 2010-11-17 17:59:16 PST
(In reply to comment #13)
>   width = 33475
>   height = 288
> and that fails in "_cairo_image_surface_create_with_pixman_format()" because
> "_cairo_image_surface_is_size_valid()" returns false, since our width is
> greater than MAX_IMAGE_SIZE (32767) as defined in cairo-image-surface.c

Stepping through this on the mac right now... So far, we're ok with that large of a width, since _cairo_quartz_verify_surface_size in cairo-quartz-surface.c only mandates that the width be <= CG_MAX_WIDTH == USHRT_MAX.

My guess is that this giant (busted?) surface is the source of our trouble, even though _cairo_quartz_verify_surface_size thinks it's fine...
Comment 16 User image Daniel Holbert [:dholbert] 2010-11-17 18:18:53 PST
Ah, so at the point where we crash, we're calling
 _cairo_image_surface_assume_ownership_of_data (surface->imageSurfaceEquiv);
on a cairo_quartz_surface_t, and we have
  surface->base.status == CAIRO_STATUS_SUCCESS
but
  surface->imageSurfaceEquiv.status == CAIRO_STATUS_INVALID_SIZE

So maybe the "imageSurfaceEquiv" error-status just isn't getting recognized soon enough...
Comment 17 User image Daniel Holbert [:dholbert] 2010-11-17 18:38:02 PST
Aha!

So on Linux, we create only one surface, with cairo_image_surface_create_for_data(), which is called in the context of the attached patch. (and that fails in this case, so we effectively bail out.)

On Mac, we create two surfaces -- one with _cairo_quartz_surface_create_internal[1] and one with the same function used on Linux -- cairo_image_surface_create_for_data [2].  The former succeeds (for reasons described in Comment 15) and the latter fails (for the reason described in comment 13).  But we don't ever check the status of the latter one (at least, not at surface-creation-time.

The former surface is "surf->base", and the latter is "surf->imageSurfaceEquiv", described in Comment 16.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-quartz-surface.c#3150
[2] http://mxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-quartz-surface.c#3160
Comment 18 User image Daniel Holbert [:dholbert] 2010-11-17 19:11:52 PST
Created attachment 491428 [details] [diff] [review]
Patch 2: Fix linux assertion

This patch fixes the Linux assertion from Comment 3. The assertion happens because we have an invalid gfxASurface, and we try to wrap it in a gfxContext, which calls gfxASurface::CairoSurface on it (and fails an assertion because the surface is invalid).
Comment 19 User image Daniel Holbert [:dholbert] 2010-11-17 19:22:53 PST
Created attachment 491432 [details] [diff] [review]
Patch 3: Fix mac crash: set imageSurfaceEquiv to null if it fails

This patch fixes the mac crash on both testcases here, by checking "imageSurfaceEquiv" in our cairo_quartz_surface_t when it's created, and replacing it with a null pointer if it has issues.

I'm not sure what the implications of this are, but it works around this crash, at least.  (and we have checks like "if (surface->imageSurfaceEquiv)" sprinkled around our quartz cairo code, indicating that we can handle a world where imageSurfaceEquiv is null).

This patch does _not_ fix the "Detected DOM modification" warning on the first testcase, though.  I'll look into that next.
Comment 20 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2010-11-17 19:54:33 PST
Comment on attachment 491432 [details] [diff] [review]
Patch 3: Fix mac crash: set imageSurfaceEquiv to null if it fails

+    if (cairo_surface_status(tmpImageSurfaceEquiv)) {

space after cairo_surface_status and cairo_surface_destroy

Looks good to me, although you'll need to package this as a patch against cairo like the other patches in gfx/cairo.
Comment 21 User image Daniel Holbert [:dholbert] 2010-11-18 13:32:19 PST
Created attachment 491642 [details]
backtrace of nsRootPresContext::IncrementDOMGeneration call

(In reply to comment #2)
> This is super bad. We should not be modifying the DOM in any way during
> painting.

So these DOM modifications all seem to be tweaks to XUL elements in the helper-document -- in particular, <scrollbar> elements.  This happens during our SVGDocumentWrapper::UpdateViewportBounds call (which is needed to resize our helper SVG document, at paint-time, to potentially align its viewBox with whatever viewport it has been given.)

For example, see attached backtrace for one of the hits in IncrementDOMGeneration calls.  In the backtrace, we're inside of "nsGfxScrollFrameInner::FinishReflowForScrollbar", which has four calls to SetCoordAttribute (for scrollbars), each of which is effectively a call to nsGenericElement::SetAttrAndNotify.  And SetAttrAndNotify creates a mozAutoDocUpdate which calls IncrementDOMGeneration. (and ends up freaking out our FrameLayerBuilder later on)

Note that we don't actually want or need scrollbars in our helper document -- I'm avoiding drawing them by passing nsIPresShell::RENDER_IGNORE_VIEWPORT_SCROLLING to PresShell::RenderDocument (in VectorImage.cpp).  However, the RenderDocument call happens *after* we've hit this warning.  Maybe there's something I should be doing earlier disable scrollbars entirely?  (If so, that'd fix this or at least improve the situation here.)
Comment 22 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2010-11-18 14:05:54 PST
Hmm, so the problem here is that the image document uses itself as an image at a different size, so while we're drawing it we try to reflow it to a different size. That is actually really bad.

I doubt there are any reasonable uses of this, so I suggest we simply prevent reentrant VectorImage::Draw calls on the same object. We'll need that to prevent infinite recursion in some cases, anyway. We should have a test for the obvious infinite-recursion case.

As a side issue, we should probably not construct scrollbars for image documents, for efficiency. See nsGfxScrollFrameInner::CreateAnonymousContent.
Comment 23 User image Daniel Holbert [:dholbert] 2010-11-18 15:01:14 PST
Ah, of course -- I thought I checked for that, but I guess not.  Yup -- disallowing reentrant "Draw()" calls sounds like a good plan.

I'll add a check for IsBeingUsedAsImage to nsGfxScrollFrameInner::CreateAnonymousContent, too -- thanks for the pointer on that.
Comment 24 User image Daniel Holbert [:dholbert] 2010-11-18 16:21:36 PST
Created attachment 491707 [details] [diff] [review]
Patch 3 v2: Fix mac crash: set imageSurfaceEquiv to null if it fails

This addresses roc's suggestions on the cairo patch, from Comment 20.
Comment 25 User image Daniel Holbert [:dholbert] 2010-11-18 16:24:32 PST
(Note that the bugzilla's shiny diff-viewer chokes on lines that start with "++", so it doesn't show most of the changed lines in new patch I've added in gfx/cairo.  Those lines show up fine in the raw patch-file view, though.)
Comment 26 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2010-11-18 16:25:30 PST
You need to add a line to the README file as well
Comment 27 User image Daniel Holbert [:dholbert] 2010-11-18 16:29:54 PST
Created attachment 491711 [details] [diff] [review]
Patch 4: Prevent reentrant VectorImage::Draw calls, & disable scrollbars in image docs

As discussed in Comment 22 & Comment 23, this patch adds a flag to VectorImage to prevent reentrant Draw calls.  It also adds a tweak to prevent scrollbar creation in svg-as-an-image documents.

(For the latter tweak, my patch collapses two formerly-nested "if" checks into a single "if" check, and then adds "presContext->Document()->IsBeingUsedAsImage()" as a new possible condition for triggering the contents of that clause.)
Comment 28 User image Daniel Holbert [:dholbert] 2010-11-18 16:38:34 PST
Created attachment 491714 [details] [diff] [review]
Patch 3 v3: Fix mac crash: set imageSurfaceEquiv to null if it fails (now with README note)

Added line to README per comment 26.
Comment 29 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2010-11-18 17:13:09 PST
Comment on attachment 491711 [details] [diff] [review]
Patch 4: Prevent reentrant VectorImage::Draw calls, & disable scrollbars in image docs

Need a testcase for recursive image drawing.
Comment 30 User image Daniel Holbert [:dholbert] 2010-11-18 21:59:40 PST
Yup -- I'll include both testcases from this bug as crashtests.

Thanks for the help & reviews!
Comment 32 User image Daniel Holbert [:dholbert] 2011-04-28 23:56:43 PDT
Any objections to un-hiding this bug?

This only affected gecko2-era code (after SVG-as-an-image support was added), and it was fixed quickly, 5 months ago, well before Firefox 4 was released.
Comment 33 User image Jesse Ruderman 2014-02-14 13:09:08 PST
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #7)
> We already have that warning in comment #0. We can't make it an assertion
> because we know plugins can trigger it in "valid" situations without async
> plugin painting.

For fuzzing, I'd be more than happy with an assertion that ignores the plugin case. Even if it's as an exclusion as wide as MOZ_ASSERT_IF(!everLoadedAnyPlugin, ...).

Note You need to log in before you can comment on or make changes to this bug.