Closed Bug 746896 Opened 12 years ago Closed 12 years ago

Crash in _cairo_surface_snapshot_copy_on_write during CC after printing a large canvas

Categories

(Core :: Graphics, defect)

x86_64
Linux
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla16
Tracking Status
firefox13 --- wontfix
firefox14 + verified
firefox15 + verified
firefox16 + verified
firefox-esr10 14+ verified

People

(Reporter: jruderman, Assigned: nrc)

References

Details

(Keywords: crash, sec-critical, testcase, Whiteboard: [sg:critical][advisory-tracking+])

Crash Data

Attachments

(3 files, 3 obsolete files)

Attached file testcase
1. Set
  user_pref("gfx.canvas.azure.enabled", true);
2. Load the testcase.
3. Print the page (to PDF).
4. Quit Firefox.

Result: crash during a shutdown CC.

This might be the cause of bug 740325. I'm filing this separately because I'm not sure, and because the testcase demonstrates that it's a security bug.
Assigning to Joe as a placeholder to find the right developer to work on this.
Assignee: nobody → joe
From Joe: Azure is enabled by default since FF7 on windows.
I believe I saw a patch for some similar bug? Jeff? Do you remember anything?
(In reply to Bas Schouten (:bas) from comment #4)
> I believe I saw a patch for some similar bug? Jeff? Do you remember anything?

Nothing jumps to mind.
Bas wins!
Assignee: joe → bas.schouten
I'm not seeing this on windows. Am I correct in understanding this is on Linux with Azure canvas preffed on manually?
Right. I haven't tested other platforms.
No longer blocks: 379903, 740325
Blocks: 379903, 740325
Are ever intending to ship Azure on linux? If not this is a less severe bug.

Does this problem repro on the Mac build? If so we need to fix it
If it's related to bug 740325 then it may be a regression (and happens on Windows, too). Or maybe some more recent code change is just making an old crash more likely.
We are planning on shipping Azure everywhere, but that doesn't mean that the same bugs will appear everywhere. In particular, the Cairo backend for Azure is quite unfinished.

However, Nick is working on the Cairo backend, so maybe he can fix this.
Assignee: bas.schouten → ncameron
Hi Nick, any traction here?
Hi working on it (In reply to David Bolter [:davidb] from comment #12)
> Hi Nick, any traction here?

Hi, working on it right now, I can reproduce with no problem, and have lots of info on how the crash is happening, but no fix yet (or even an idea of how to fix).
OK, figured it out now and I have a hack-ey fix, will post a proper fix soon.

See also bug 760349, which has similar steps to reproduce, but I think it totally different. My fix does not fix that bug.
Attached patch patch (obsolete) — Splinter Review
This fixes the crash, but causes a copy when we often don't need one. We should replace this with a copy-on-write link to data in GetThebesSurfaceForDrawTarget, but that is a fair amount of work (we would need a new class, gfxSourceSurfaceSurface or something), so I suggest we make that a separate bug, especially as this is a security issue, but Azure canvas is not properly supported yet.
Attachment #629055 - Flags: review?(joe)
Attached patch patch (obsolete) — Splinter Review
Attachment #629055 - Attachment is obsolete: true
Attachment #629055 - Flags: review?(joe)
Attachment #629112 - Flags: review?(joe)
Comment on attachment 629112 [details] [diff] [review]
patch

Review of attachment 629112 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxImageSurface.cpp
@@ +199,5 @@
> +    }
> +
> +    gfxIntSize size(data->GetSize().width, data->GetSize().height);
> +    if (size != mSize)
> +    {

{ on previous line

@@ +203,5 @@
> +    {
> +        return false;
> +    }
> +
> +    CopyForStride(mData, data->GetData(), size, mStride, data->Stride());

You'll need to have a check to see if we're A1 or A8; I don't think you'll be able to copy a DataSourceSurface to ourselves in that case.

(Aside: Is DataSourceSurface always 32-bit?)

::: gfx/thebes/gfxImageSurface.h
@@ +75,5 @@
>      /* Fast copy from another image surface; returns TRUE if successful, FALSE otherwise */
>      bool CopyFrom (gfxImageSurface *other);
>  
> +    /**
> +     * Fast copy from an Azure source surface; returns TRUE if successful, FALSE otherwise

s/ Azure//

Azure is the project, not the API :)

::: gfx/thebes/gfxPlatform.cpp
@@ +596,5 @@
>  
>      IntSize size = data->GetSize();
>      gfxASurface::gfxImageFormat format = OptimalFormatForContent(ContentForFormat(data->GetFormat()));
>  
> +    // We need to make a copy here because data might change it's data under us

its

@@ +597,5 @@
>      IntSize size = data->GetSize();
>      gfxASurface::gfxImageFormat format = OptimalFormatForContent(ContentForFormat(data->GetFormat()));
>  
> +    // We need to make a copy here because data might change it's data under us
> +    nsRefPtr<gfxImageSurface> imageSurf = new gfxImageSurface(gfxIntSize(size.width, size.height), format, false);

This is going to introduce a ton of copies :( Does this always get used with the Cairo backend?
Attachment #629112 - Flags: review?(joe) → review-
(In reply to Joe Drew (:JOEDREW!) from comment #17)
> Comment on attachment 629112 [details] [diff] [review]
> @@ +203,5 @@
> > +    {
> > +        return false;
> > +    }
> > +
> > +    CopyForStride(mData, data->GetData(), size, mStride, data->Stride());
> 
> You'll need to have a check to see if we're A1 or A8; I don't think you'll
> be able to copy a DataSourceSurface to ourselves in that case.
> 
> (Aside: Is DataSourceSurface always 32-bit?)
>

DataSourceSurface doesn't expose any API about the format of its data, which is why there is no check. I don't know if they are always 32bit, I'll check both things with Bas next week.

> @@ +597,5 @@
> >      IntSize size = data->GetSize();
> >      gfxASurface::gfxImageFormat format = OptimalFormatForContent(ContentForFormat(data->GetFormat()));
> >  
> > +    // We need to make a copy here because data might change it's data under us
> > +    nsRefPtr<gfxImageSurface> imageSurf = new gfxImageSurface(gfxIntSize(size.width, size.height), format, false);
> 
> This is going to introduce a ton of copies :( Does this always get used with
> the Cairo backend?

Yes, unfortunately so. I'm not sure what you mean by always. Any time we hit this path, there will be a copy. The bug is with the Skia backend (Cairo is only being used because of the printing), which is probably why this bug only manifests when doing a print and on Linux (although I think we should take the same path with Direct2D, so not sure why there is no crash on Windows). With the Cairo backend, we take the if path in this method (the change is in the else) path, so we would avoid a copy.

We should replace the copy with a copy on write class, but it might be a fair bit of work and isn't necessary for a fix here (and this (Azure/Skia canvas) is not even properly supported yet), so I propose doing that work later, I can file a bug if you agree.
> DataSourceSurface doesn't expose any API about the format of its data, which
> is why there is no check. I don't know if they are always 32bit, I'll check
> both things with Bas next week.

WRONG! Fixing this now...
> (Aside: Is DataSourceSurface always 32-bit?)

No, I think they can be 565 or A8 also
Attached patch updated patch (obsolete) — Splinter Review
Attachment #629112 - Attachment is obsolete: true
Attachment #629988 - Flags: review?(joe)
Comment on attachment 629988 [details] [diff] [review]
updated patch

Review of attachment 629988 [details] [diff] [review]:
-----------------------------------------------------------------

One very important fix is necessary, but I don't need to review it again.

::: gfx/thebes/gfx2DGlue.h
@@ +2,5 @@
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
> +#ifndef GFX_2D_GLUE_H
> +#define GFX_2D_GLUE_H

*shakes fist at Bas*

::: gfx/thebes/gfxImageSurface.cpp
@@ +202,5 @@
> +        !(a1 == gfxASurface::ImageFormatRGB24 &&
> +          a2 == gfxASurface::ImageFormatARGB32)) {
> +        return false;
> +    }
> +}

missing "return true"
Attachment #629988 - Flags: review?(joe) → review+
Made the change from the review and added an assertion, carrying r=joe

changed

imageSurf->CopyFrom(source);

to

bool resultOfCopy = imageSurf->CopyFrom(source);
NS_ASSERTION(resultOfCopy, "Failed to copy surface.");

in gfxPlatform::GetThebesSurfaceForDrawTarget because the review comment made me realise I never actually checked the returned result. It should always be true because the memory is allocated earlier and the format is picked to be correct.
Attachment #629988 - Attachment is obsolete: true
Attachment #630245 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/52b9c50593c3
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
(In reply to Nick Cameron [:nrc] from comment #25)

I don't know how I managed to change tracking-firefox-esr10, but I don't seem to be able to change it back.

dveditz: do you want to change this back, and should we land this on beta/aurora?
(In reply to Nick Cameron [:nrc] from comment #26)
> https://hg.mozilla.org/mozilla-central/rev/52b9c50593c3

False alarm.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
https://hg.mozilla.org/mozilla-central/rev/28add393ae55
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
(In reply to Nick Cameron [:nrc] from comment #27)
> dveditz: do you want to change this back, and should we land this on
> beta/aurora?

Please request approval to land this patch on Aurora/Beta/ESR, thanks.
Comment on attachment 630245 [details] [diff] [review]
updated patch with review comments

[Approval Request Comment]
Bug caused by (feature/regressing bug #): unknown
User impact if declined: crashes on printing, security vulnerability
Testing completed (on m-c, etc.): yes, see above comments
Risk to taking this patch (and alternatives if risky): can increase copying and thus impact performance
String or UUID changes made by this patch: none
Attachment #630245 - Flags: approval-mozilla-esr10?
Attachment #630245 - Flags: approval-mozilla-beta?
Attachment #630245 - Flags: approval-mozilla-aurora?
Comment on attachment 630245 [details] [diff] [review]
updated patch with review comments

Apparently this breaks Azure/Skia canvas. Mattwoodrow is working on a patch, I'll post more details and a bug number on Monday. I don't think we need to backout, but we should probably hold off on pushing to Beta etc.
Attachment #630245 - Flags: approval-mozilla-esr10?
Attachment #630245 - Flags: approval-mozilla-beta?
Attachment #630245 - Flags: approval-mozilla-aurora?
Comment on attachment 630245 [details] [diff] [review]
updated patch with review comments

[Approval Request Comment]
Bug caused by (feature/regressing bug #): unknown
User impact if declined: crashes on printing, security vulnerability
Testing completed (on m-c, etc.): yes, see above comments
Risk to taking this patch (and alternatives if risky): can increase copying and thus impact performance; causes incorrect rendering with Azure/Skia canvas, but this is not enabled by default on any platform
String or UUID changes made by this patch: none
Attachment #630245 - Flags: approval-mozilla-esr10?
Attachment #630245 - Flags: approval-mozilla-beta?
Attachment #630245 - Flags: approval-mozilla-aurora?
Attachment #630245 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Before landing on beta/esr, have we checked all applicable perf dashboards to ensure that we have no new regressions there?
Comment on attachment 630245 [details] [diff] [review]
updated patch with review comments

From Nick in email:
 I checked the dashboards and there are no performance regressions around the time the patch landed on inbound or Aurora (the code changed is pref'ed off, so this is expected)

Approving for beta and esr10, please update status flags when landed.
Attachment #630245 - Flags: approval-mozilla-esr10?
Attachment #630245 - Flags: approval-mozilla-esr10+
Attachment #630245 - Flags: approval-mozilla-beta?
Attachment #630245 - Flags: approval-mozilla-beta+
Whiteboard: [sg:critical] → [sg:critical][advisory-tracking+]
Firefox 13.0.1: crash
Firefox 14.0b11: no crash
Firefox 15.0a2 2012-07-11: no crash
Firefox 16.0a1 2012-07-11: no crash
Firefox 10.0.6esrpre 2012-07-11: no crash
Group: core-security
As suggested earlier, this patch prevented the crash by breaking azure/skia canvas. It hasn't shown up anywhere since this isn't a supported configuration and requires a perf to be activated.

I'm not really sure why this was classified as sg:crit and uplifted to all the branches.

The crash is caused by the created gfxImageSurface relying on the pixel data owned by the draw target without taking a ref to the draw target to ensure that the data lives long enough.

In nsCanvasRenderingContext2DAzure::Reset(), we delete the draw target before the thebes surface, leaving it with a dangling pointer to the pixel memory.

This really isn't specific to skia, except that the gfxImageSurface destructor apparently touches this deleted memory, whereas the others (e.g. gfxQuartzSurface) isn't.

The obvious answer of just storing a ref to the DrawTarget as user data on the gfxASurface won't work since that will create a cycle and prevent them from ever being collected.

I think we do want to use this ref though, and change the DrawTarget -> gfxASurface lookup code to use something other than a reference stored on the DrawTarget's UserData.
Blocks: 746883
You need to log in before you can comment on or make changes to this bug.