Crash in _cairo_surface_snapshot_copy_on_write during CC after printing a large canvas

VERIFIED FIXED in Firefox 14

Status

()

Core
Graphics
--
critical
VERIFIED FIXED
5 years ago
5 years ago

People

(Reporter: Jesse Ruderman, Assigned: nrc)

Tracking

(Blocks: 1 bug, {crash, sec-critical, testcase})

Trunk
mozilla16
x86_64
Linux
crash, sec-critical, testcase
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox13 wontfix, firefox14+ verified, firefox15+ verified, firefox16+ verified, firefox-esr1014+ verified)

Details

(Whiteboard: [sg:critical][advisory-tracking+], crash signature)

Attachments

(3 attachments, 3 obsolete attachments)

(Reporter)

Description

5 years ago
Created attachment 616478 [details]
testcase

1. Set
  user_pref("gfx.canvas.azure.enabled", true);
2. Load the testcase.
3. Print the page (to PDF).
4. Quit Firefox.

Result: crash during a shutdown CC.

This might be the cause of bug 740325. I'm filing this separately because I'm not sure, and because the testcase demonstrates that it's a security bug.
(Reporter)

Comment 1

5 years ago
Created attachment 616479 [details]
stack trace

Opt: bp-f44efcb1-d956-4e1f-8b83-42e382120419
Assigning to Joe as a placeholder to find the right developer to work on this.
Assignee: nobody → joe
status-firefox14: --- → affected
status-firefox15: --- → affected
From Joe: Azure is enabled by default since FF7 on windows.
status-firefox12: --- → wontfix
status-firefox13: --- → affected
status-firefox-esr10: --- → affected
I believe I saw a patch for some similar bug? Jeff? Do you remember anything?
Keywords: sec-critical
(In reply to Bas Schouten (:bas) from comment #4)
> I believe I saw a patch for some similar bug? Jeff? Do you remember anything?

Nothing jumps to mind.
Bas wins!
Assignee: joe → bas.schouten
I'm not seeing this on windows. Am I correct in understanding this is on Linux with Azure canvas preffed on manually?
(Reporter)

Comment 8

5 years ago
Right. I haven't tested other platforms.
No longer blocks: 379903, 740325

Updated

5 years ago
Blocks: 379903, 740325
Are ever intending to ship Azure on linux? If not this is a less severe bug.

Does this problem repro on the Mac build? If so we need to fix it
If it's related to bug 740325 then it may be a regression (and happens on Windows, too). Or maybe some more recent code change is just making an old crash more likely.
We are planning on shipping Azure everywhere, but that doesn't mean that the same bugs will appear everywhere. In particular, the Cairo backend for Azure is quite unfinished.

However, Nick is working on the Cairo backend, so maybe he can fix this.
Assignee: bas.schouten → ncameron
Hi Nick, any traction here?
(Assignee)

Comment 13

5 years ago
Hi working on it (In reply to David Bolter [:davidb] from comment #12)
> Hi Nick, any traction here?

Hi, working on it right now, I can reproduce with no problem, and have lots of info on how the crash is happening, but no fix yet (or even an idea of how to fix).
(Assignee)

Comment 14

5 years ago
OK, figured it out now and I have a hack-ey fix, will post a proper fix soon.

See also bug 760349, which has similar steps to reproduce, but I think it totally different. My fix does not fix that bug.
(Assignee)

Comment 15

5 years ago
Created attachment 629055 [details] [diff] [review]
patch

This fixes the crash, but causes a copy when we often don't need one. We should replace this with a copy-on-write link to data in GetThebesSurfaceForDrawTarget, but that is a fair amount of work (we would need a new class, gfxSourceSurfaceSurface or something), so I suggest we make that a separate bug, especially as this is a security issue, but Azure canvas is not properly supported yet.
Attachment #629055 - Flags: review?(joe)
(Assignee)

Comment 16

5 years ago
Created attachment 629112 [details] [diff] [review]
patch
Attachment #629055 - Attachment is obsolete: true
Attachment #629055 - Flags: review?(joe)
Attachment #629112 - Flags: review?(joe)
Comment on attachment 629112 [details] [diff] [review]
patch

Review of attachment 629112 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxImageSurface.cpp
@@ +199,5 @@
> +    }
> +
> +    gfxIntSize size(data->GetSize().width, data->GetSize().height);
> +    if (size != mSize)
> +    {

{ on previous line

@@ +203,5 @@
> +    {
> +        return false;
> +    }
> +
> +    CopyForStride(mData, data->GetData(), size, mStride, data->Stride());

You'll need to have a check to see if we're A1 or A8; I don't think you'll be able to copy a DataSourceSurface to ourselves in that case.

(Aside: Is DataSourceSurface always 32-bit?)

::: gfx/thebes/gfxImageSurface.h
@@ +75,5 @@
>      /* Fast copy from another image surface; returns TRUE if successful, FALSE otherwise */
>      bool CopyFrom (gfxImageSurface *other);
>  
> +    /**
> +     * Fast copy from an Azure source surface; returns TRUE if successful, FALSE otherwise

s/ Azure//

Azure is the project, not the API :)

::: gfx/thebes/gfxPlatform.cpp
@@ +596,5 @@
>  
>      IntSize size = data->GetSize();
>      gfxASurface::gfxImageFormat format = OptimalFormatForContent(ContentForFormat(data->GetFormat()));
>  
> +    // We need to make a copy here because data might change it's data under us

its

@@ +597,5 @@
>      IntSize size = data->GetSize();
>      gfxASurface::gfxImageFormat format = OptimalFormatForContent(ContentForFormat(data->GetFormat()));
>  
> +    // We need to make a copy here because data might change it's data under us
> +    nsRefPtr<gfxImageSurface> imageSurf = new gfxImageSurface(gfxIntSize(size.width, size.height), format, false);

This is going to introduce a ton of copies :( Does this always get used with the Cairo backend?
Attachment #629112 - Flags: review?(joe) → review-
(Assignee)

Comment 18

5 years ago
Try push: https://tbpl.mozilla.org/?tree=Try&rev=d07787580fac
(Assignee)

Comment 19

5 years ago
(In reply to Joe Drew (:JOEDREW!) from comment #17)
> Comment on attachment 629112 [details] [diff] [review]
> @@ +203,5 @@
> > +    {
> > +        return false;
> > +    }
> > +
> > +    CopyForStride(mData, data->GetData(), size, mStride, data->Stride());
> 
> You'll need to have a check to see if we're A1 or A8; I don't think you'll
> be able to copy a DataSourceSurface to ourselves in that case.
> 
> (Aside: Is DataSourceSurface always 32-bit?)
>

DataSourceSurface doesn't expose any API about the format of its data, which is why there is no check. I don't know if they are always 32bit, I'll check both things with Bas next week.

> @@ +597,5 @@
> >      IntSize size = data->GetSize();
> >      gfxASurface::gfxImageFormat format = OptimalFormatForContent(ContentForFormat(data->GetFormat()));
> >  
> > +    // We need to make a copy here because data might change it's data under us
> > +    nsRefPtr<gfxImageSurface> imageSurf = new gfxImageSurface(gfxIntSize(size.width, size.height), format, false);
> 
> This is going to introduce a ton of copies :( Does this always get used with
> the Cairo backend?

Yes, unfortunately so. I'm not sure what you mean by always. Any time we hit this path, there will be a copy. The bug is with the Skia backend (Cairo is only being used because of the printing), which is probably why this bug only manifests when doing a print and on Linux (although I think we should take the same path with Direct2D, so not sure why there is no crash on Windows). With the Cairo backend, we take the if path in this method (the change is in the else) path, so we would avoid a copy.

We should replace the copy with a copy on write class, but it might be a fair bit of work and isn't necessary for a fix here (and this (Azure/Skia canvas) is not even properly supported yet), so I propose doing that work later, I can file a bug if you agree.
(Assignee)

Comment 20

5 years ago
> DataSourceSurface doesn't expose any API about the format of its data, which
> is why there is no check. I don't know if they are always 32bit, I'll check
> both things with Bas next week.

WRONG! Fixing this now...
(Assignee)

Comment 21

5 years ago
> (Aside: Is DataSourceSurface always 32-bit?)

No, I think they can be 565 or A8 also
(Assignee)

Comment 22

5 years ago
Created attachment 629988 [details] [diff] [review]
updated patch
Attachment #629112 - Attachment is obsolete: true
Attachment #629988 - Flags: review?(joe)
Comment on attachment 629988 [details] [diff] [review]
updated patch

Review of attachment 629988 [details] [diff] [review]:
-----------------------------------------------------------------

One very important fix is necessary, but I don't need to review it again.

::: gfx/thebes/gfx2DGlue.h
@@ +2,5 @@
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
> +#ifndef GFX_2D_GLUE_H
> +#define GFX_2D_GLUE_H

*shakes fist at Bas*

::: gfx/thebes/gfxImageSurface.cpp
@@ +202,5 @@
> +        !(a1 == gfxASurface::ImageFormatRGB24 &&
> +          a2 == gfxASurface::ImageFormatARGB32)) {
> +        return false;
> +    }
> +}

missing "return true"
Attachment #629988 - Flags: review?(joe) → review+
(Assignee)

Comment 24

5 years ago
Created attachment 630245 [details] [diff] [review]
updated patch with review comments

Made the change from the review and added an assertion, carrying r=joe

changed

imageSurf->CopyFrom(source);

to

bool resultOfCopy = imageSurf->CopyFrom(source);
NS_ASSERTION(resultOfCopy, "Failed to copy surface.");

in gfxPlatform::GetThebesSurfaceForDrawTarget because the review comment made me realise I never actually checked the returned result. It should always be true because the memory is allocated earlier and the format is picked to be correct.
Attachment #629988 - Attachment is obsolete: true
Attachment #630245 - Flags: review+
status-firefox12: wontfix → ---
status-firefox13: affected → wontfix
status-firefox16: --- → affected
tracking-firefox-esr10: --- → 14+
tracking-firefox14: --- → +
tracking-firefox15: --- → +
tracking-firefox16: --- → +
(Assignee)

Comment 25

5 years ago
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=88c6ee3a9aef
tracking-firefox-esr10: 14+ → ---
Target Milestone: --- → mozilla16
(Assignee)

Comment 26

5 years ago
https://hg.mozilla.org/mozilla-central/rev/52b9c50593c3
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
(Assignee)

Comment 27

5 years ago
(In reply to Nick Cameron [:nrc] from comment #25)

I don't know how I managed to change tracking-firefox-esr10, but I don't seem to be able to change it back.

dveditz: do you want to change this back, and should we land this on beta/aurora?
(Assignee)

Comment 28

5 years ago
(In reply to Nick Cameron [:nrc] from comment #26)
> https://hg.mozilla.org/mozilla-central/rev/52b9c50593c3

False alarm.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 29

5 years ago
Another go: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=28add393ae55
https://hg.mozilla.org/mozilla-central/rev/28add393ae55
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago5 years ago
status-firefox16: affected → fixed
Resolution: --- → FIXED
tracking-firefox-esr10: --- → 14+
(In reply to Nick Cameron [:nrc] from comment #27)
> dveditz: do you want to change this back, and should we land this on
> beta/aurora?

Please request approval to land this patch on Aurora/Beta/ESR, thanks.
(Assignee)

Comment 32

5 years ago
Comment on attachment 630245 [details] [diff] [review]
updated patch with review comments

[Approval Request Comment]
Bug caused by (feature/regressing bug #): unknown
User impact if declined: crashes on printing, security vulnerability
Testing completed (on m-c, etc.): yes, see above comments
Risk to taking this patch (and alternatives if risky): can increase copying and thus impact performance
String or UUID changes made by this patch: none
Attachment #630245 - Flags: approval-mozilla-esr10?
Attachment #630245 - Flags: approval-mozilla-beta?
Attachment #630245 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 33

5 years ago
Comment on attachment 630245 [details] [diff] [review]
updated patch with review comments

Apparently this breaks Azure/Skia canvas. Mattwoodrow is working on a patch, I'll post more details and a bug number on Monday. I don't think we need to backout, but we should probably hold off on pushing to Beta etc.
Attachment #630245 - Flags: approval-mozilla-esr10?
Attachment #630245 - Flags: approval-mozilla-beta?
Attachment #630245 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 34

5 years ago
Comment on attachment 630245 [details] [diff] [review]
updated patch with review comments

[Approval Request Comment]
Bug caused by (feature/regressing bug #): unknown
User impact if declined: crashes on printing, security vulnerability
Testing completed (on m-c, etc.): yes, see above comments
Risk to taking this patch (and alternatives if risky): can increase copying and thus impact performance; causes incorrect rendering with Azure/Skia canvas, but this is not enabled by default on any platform
String or UUID changes made by this patch: none
Attachment #630245 - Flags: approval-mozilla-esr10?
Attachment #630245 - Flags: approval-mozilla-beta?
Attachment #630245 - Flags: approval-mozilla-aurora?

Updated

5 years ago
Attachment #630245 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Before landing on beta/esr, have we checked all applicable perf dashboards to ensure that we have no new regressions there?
https://hg.mozilla.org/releases/mozilla-aurora/rev/d88144263365
status-firefox15: affected → fixed
Comment on attachment 630245 [details] [diff] [review]
updated patch with review comments

From Nick in email:
 I checked the dashboards and there are no performance regressions around the time the patch landed on inbound or Aurora (the code changed is pref'ed off, so this is expected)

Approving for beta and esr10, please update status flags when landed.
Attachment #630245 - Flags: approval-mozilla-esr10?
Attachment #630245 - Flags: approval-mozilla-esr10+
Attachment #630245 - Flags: approval-mozilla-beta?
Attachment #630245 - Flags: approval-mozilla-beta+
(Assignee)

Comment 38

5 years ago
https://tbpl.mozilla.org/?tree=Mozilla-Beta&rev=43660c6f8500
status-firefox14: affected → fixed
(Assignee)

Comment 39

5 years ago
https://tbpl.mozilla.org/?tree=Mozilla-Esr10&rev=8597b02bbad1
status-firefox-esr10: affected → fixed
Whiteboard: [sg:critical] → [sg:critical][advisory-tracking+]
Firefox 13.0.1: crash
Firefox 14.0b11: no crash
Firefox 15.0a2 2012-07-11: no crash
Firefox 16.0a1 2012-07-11: no crash
Firefox 10.0.6esrpre 2012-07-11: no crash
Status: RESOLVED → VERIFIED
status-firefox-esr10: fixed → verified
status-firefox14: fixed → verified
status-firefox15: fixed → verified
status-firefox16: fixed → verified
Group: core-security
As suggested earlier, this patch prevented the crash by breaking azure/skia canvas. It hasn't shown up anywhere since this isn't a supported configuration and requires a perf to be activated.

I'm not really sure why this was classified as sg:crit and uplifted to all the branches.

The crash is caused by the created gfxImageSurface relying on the pixel data owned by the draw target without taking a ref to the draw target to ensure that the data lives long enough.

In nsCanvasRenderingContext2DAzure::Reset(), we delete the draw target before the thebes surface, leaving it with a dangling pointer to the pixel memory.

This really isn't specific to skia, except that the gfxImageSurface destructor apparently touches this deleted memory, whereas the others (e.g. gfxQuartzSurface) isn't.

The obvious answer of just storing a ref to the DrawTarget as user data on the gfxASurface won't work since that will create a cycle and prevent them from ever being collected.

I think we do want to use this ref though, and change the DrawTarget -> gfxASurface lookup code to use something other than a reference stored on the DrawTarget's UserData.
(Assignee)

Updated

5 years ago
Blocks: 746883
You need to log in before you can comment on or make changes to this bug.