[10.5] Crash in _cairo_quartz_surface_mask @ _cairo_user_data_array_fini

RESOLVED WORKSFORME

Status

()

Core
Graphics
--
critical
RESOLVED WORKSFORME
6 years ago
5 years ago

People

(Reporter: Scoobidiver (away), Assigned: smichaud)

Tracking

({crash, regression, topcrash})

11 Branch
x86
Mac OS X
crash, regression, topcrash
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(crash signature)

Attachments

(4 attachments)

(Reporter)

Description

6 years ago
It's #4 top browser crasher in 11.0 on Mac OS X.

Stacks look like:
Frame 	Module 	Signature 	Source
0 	XUL 	_cairo_user_data_array_fini 	gfx/cairo/cairo/src/cairo-array.c:389
1 	XUL 	_moz_cairo_surface_destroy 	gfx/cairo/cairo/src/cairo-surface.c:654
2 	XUL 	_cairo_surface_to_cgimage 	gfx/cairo/cairo/src/cairo-quartz-surface.c:1116
3 	XUL 	_cairo_quartz_surface_mask_with_surface 	gfx/cairo/cairo/src/cairo-quartz-surface.c:2926
4 	XUL 	_cairo_quartz_surface_mask 	gfx/cairo/cairo/src/cairo-quartz-surface.c:2999
5 	XUL 	_cairo_surface_mask 	gfx/cairo/cairo/src/cairo-surface.c:2164
6 	XUL 	_cairo_gstate_mask 	gfx/cairo/cairo/src/cairo-gstate.c:1124
7 	XUL 	_moz_cairo_mask 	gfx/cairo/cairo/src/cairo.c:2314
8 	XUL 	nsSVGIntegrationUtils::PaintFramesWithEffects 	layout/svg/base/src/nsSVGIntegrationUtils.cpp:362
9 	XUL 	nsDisplaySVGEffects::Paint 	layout/base/nsDisplayList.cpp:2968
10 	XUL 	mozilla::FrameLayerBuilder::DrawThebesLayer 	layout/base/FrameLayerBuilder.cpp:2158
11 	XUL 	mozilla::layers::BasicShadowableThebesLayer::PaintBuffer 	gfx/layers/basic/BasicLayers.cpp:563
12 	XUL 	mozilla::layers::BasicThebesLayer::PaintThebes 	gfx/layers/basic/BasicLayers.cpp:748
13 	XUL 	mozilla::layers::BasicLayerManager::PaintLayer 	gfx/layers/basic/BasicLayers.cpp:1902
14 	XUL 	mozilla::layers::BasicLayerManager::PaintLayer 	gfx/layers/basic/BasicLayers.cpp:1917
15 	XUL 	mozilla::layers::BasicLayerManager::EndTransactionInternal 	gfx/layers/basic/BasicLayers.cpp:1614
16 	XUL 	mozilla::layers::BasicShadowLayerManager::EndTransaction 	gfx/layers/basic/BasicLayers.cpp:1565
17 	XUL 	nsDisplayList::PaintForFrame 	layout/base/nsDisplayList.cpp:635
18 	XUL 	nsDisplayList::PaintRoot 	layout/base/nsDisplayList.cpp:540
19 	XUL 	nsLayoutUtils::PaintFrame 	layout/base/nsLayoutUtils.cpp:1769
20 	XUL 	PresShell::Paint 	layout/base/nsPresShell.cpp:5502
...

or
Frame 	Module 	Signature 	Source
0 	XUL 	_cairo_user_data_array_fini 	gfx/cairo/cairo/src/cairo-array.c:389
1 	XUL 	_moz_cairo_surface_destroy 	gfx/cairo/cairo/src/cairo-surface.c:654
2 	XUL 	_cairo_surface_to_cgimage 	gfx/cairo/cairo/src/cairo-quartz-surface.c:1116
3 	XUL 	_cairo_quartz_surface_mask_with_surface 	gfx/cairo/cairo/src/cairo-quartz-surface.c:2926
4 	XUL 	_cairo_quartz_surface_mask 	gfx/cairo/cairo/src/cairo-quartz-surface.c:2999
5 	XUL 	_cairo_surface_mask 	gfx/cairo/cairo/src/cairo-surface.c:2164
6 	XUL 	_cairo_gstate_mask 	gfx/cairo/cairo/src/cairo-gstate.c:1124
7 	XUL 	_moz_cairo_mask_surface 	gfx/cairo/cairo/src/cairo.c:2314
8 	XUL 	gfxContext::Mask 	gfx/thebes/gfxContext.cpp:797
9 	XUL 	gfxAlphaBoxBlur::Paint 	gfx/thebes/gfxBlur.cpp:119
10 	XUL 	nsContextBoxBlur::DoPaint 	layout/base/nsCSSRendering.cpp:4275
11 	XUL 	nsTextFrame::PaintOneShadow 	layout/generic/nsTextFrameThebes.cpp:4976
12 	XUL 	nsTextFrame::PaintText 	layout/generic/nsTextFrameThebes.cpp:5430
13 	XUL 	nsDisplayText::Paint 	layout/generic/nsTextFrameThebes.cpp:4281
14 	XUL 	mozilla::FrameLayerBuilder::DrawThebesLayer 	layout/base/FrameLayerBuilder.cpp:2158
15 	XUL 	mozilla::layers::BasicThebesLayer::PaintThebes 	gfx/layers/basic/BasicLayers.cpp:708
16 	XUL 	mozilla::layers::BasicLayerManager::PaintLayer 	gfx/layers/basic/BasicLayers.cpp:1902
17 	XUL 	mozilla::layers::BasicLayerManager::PaintLayer 	gfx/layers/basic/BasicLayers.cpp:1917
18 	XUL 	mozilla::layers::BasicLayerManager::PaintLayer 	gfx/layers/basic/BasicLayers.cpp:1917
19 	XUL 	mozilla::layers::BasicLayerManager::EndTransactionInternal 	gfx/layers/basic/BasicLayers.cpp:1614
20 	XUL 	mozilla::layers::BasicShadowLayerManager::EndTransaction 	gfx/layers/basic/BasicLayers.cpp:1565
21 	XUL 	nsDisplayList::PaintForFrame 	layout/base/nsDisplayList.cpp:635
22 	XUL 	nsDisplayList::PaintRoot 	layout/base/nsDisplayList.cpp:540
23 	XUL 	nsLayoutUtils::PaintFrame 	layout/base/nsLayoutUtils.cpp:1769
24 	XUL 	PresShell::Paint 	layout/base/nsPresShell.cpp:5502
...

or
Frame 	Module 	Signature 	Source
0 	XUL 	_cairo_user_data_array_fini 	gfx/cairo/cairo/src/cairo-array.c:389
1 	XUL 	_moz_cairo_surface_destroy 	gfx/cairo/cairo/src/cairo-surface.c:654
2 	XUL 	_cairo_surface_to_cgimage 	gfx/cairo/cairo/src/cairo-quartz-surface.c:1116
3 	XUL 	_cairo_quartz_surface_mask_with_surface 	gfx/cairo/cairo/src/cairo-quartz-surface.c:2926
4 	XUL 	_cairo_quartz_surface_mask 	gfx/cairo/cairo/src/cairo-quartz-surface.c:2999
5 	XUL 	_cairo_surface_mask 	gfx/cairo/cairo/src/cairo-surface.c:2164
6 	XUL 	_cairo_gstate_mask 	gfx/cairo/cairo/src/cairo-gstate.c:1124
7 	XUL 	_moz_cairo_mask_surface 	gfx/cairo/cairo/src/cairo.c:2314
8 	XUL 	gfxContext::Mask 	gfx/thebes/gfxContext.cpp:797
9 	XUL 	gfxAlphaBoxBlur::Paint 	gfx/thebes/gfxBlur.cpp:119
10 	XUL 	nsCSSRendering::PaintBoxShadowOuter 	layout/base/nsCSSRendering.cpp:4275
11 	XUL 	nsDisplayBoxShadowOuter::Paint 	layout/base/nsDisplayList.cpp:1502
12 	XUL 	mozilla::FrameLayerBuilder::DrawThebesLayer 	layout/base/FrameLayerBuilder.cpp:2158
13 	XUL 	mozilla::layers::BasicShadowableThebesLayer::PaintBuffer 	gfx/layers/basic/BasicLayers.cpp:563
14 	XUL 	mozilla::layers::BasicThebesLayer::PaintThebes 	gfx/layers/basic/BasicLayers.cpp:748
15 	XUL 	mozilla::layers::BasicLayerManager::PaintLayer 	gfx/layers/basic/BasicLayers.cpp:1902
16 	XUL 	mozilla::layers::BasicLayerManager::PaintLayer 	gfx/layers/basic/BasicLayers.cpp:1917
17 	XUL 	mozilla::layers::BasicLayerManager::PaintLayer 	gfx/layers/basic/BasicLayers.cpp:1917
18 	XUL 	mozilla::layers::BasicLayerManager::EndTransactionInternal 	gfx/layers/basic/BasicLayers.cpp:1614
19 	XUL 	mozilla::layers::BasicShadowLayerManager::EndTransaction 	gfx/layers/basic/BasicLayers.cpp:1565
20 	XUL 	nsDisplayList::PaintForFrame 	layout/base/nsDisplayList.cpp:635
21 	XUL 	nsDisplayList::PaintRoot 	layout/base/nsDisplayList.cpp:540
22 	XUL 	nsLayoutUtils::PaintFrame 	layout/base/nsLayoutUtils.cpp:1769
23 	XUL 	PresShell::Paint 	layout/base/nsPresShell.cpp:5502
...

More reports at:
https://crash-stats.mozilla.com/report/list?signature=_cairo_user_data_array_fini
Component: Layout → Graphics
QA Contact: layout → thebes
I'm crossing my fingers that Steven can diagnose this!
(Assignee)

Comment 2

6 years ago
These seem to start full-blown with the 20120228210006 build (which I think is FF 11.0b5), without appearing in nightlies.

That's extremely puzzling.  Do you have any idea, Joe, what could have triggered this?

(There are tons of other crashes with this same signature, especially on Linux.  But the stacks are otherwise quite different.)
(Assignee)

Comment 3

6 years ago
Of course my patch for bug 721663 did land on the 12 and 11 branches on 2012/02/27, just in time for the 11.0b5 "release" :-(

So I suppose there's some connection.  Though I can't believe my patch *caused* these crashes.
(Assignee)

Comment 4

6 years ago
Interestingly, many of the comments indicate that these are crashes on startup.
(Assignee)

Comment 5

6 years ago
I'll bet this is a simple NULL-deference bug -- _cairo_array_index() can return NULL.

Patch coming up.
(Assignee)

Comment 6

6 years ago
> Patch coming up.

Not quite yet.

The 'slots' pointer in _cairo_user_data_array_fini() *does* appear to be NULL.  But I can't figure out how this could happen if 'num_slots' is non-zero.

I'll continue work on this tomorrow.
(Reporter)

Comment 7

6 years ago
(In reply to Steven Michaud from comment #3)
> Of course my patch for bug 721663 did land on the 12 and 11 branches on
> 2012/02/27, just in time for the 11.0b5 "release" :-(
The Beta-11 (50,000 ADU on Mac) regression range is indeed:
http://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=0338a18c2bc8&tochange=8c9e4873d419
Bug 721663 belongs to the Beta regression range.
The first crash in Aurora 12 (5,000 ADU on Mac) happened in 12.0a2/20120303, but it's discontinuous across builds.
There are no crashes in 13.0a1 or 14.0a1 (1,500 ADU on Mac).
Blocks: 721663
Keywords: regression
Summary: crash in nsDisplayList::PaintForFrame @ _cairo_user_data_array_fini on Mac OS X 10.5 → [10.5] Crash in _cairo_quartz_surface_mask @ _cairo_user_data_array_fini
(Assignee)

Comment 8

6 years ago
I'm about to throw my hands up on this.

The only thing that's clear is that the value of 'array->elements' in _cairo_user_data_array_t is invalid, and so also may be value of 'array' itself.

I've no idea how this could happen.  And until we can reproduce this bug I'm unlikely to be able to find out.

I spent most of the day looking for obvious causes (like uninitialized memory), but I couldn't find any.

Possibly it's worth mentioning that bug 721663 involved memory corruption.  I thought I'd completely worked around that ... but maybe I didn't.

Anyone know if platform font code might somehow be involved here (and/or in bug 721663)?

I know we use ATS fonts on OS X 10.5, and in the past they've proved to be very flaky.

Jon and Jonathan, is it possible to avoid using ATS fonts on OS X 10.5 (to make gfxMacPlatformFontList::UseATSFontEntry() return false even on 10.5) without blowing things up?

I'm going to try doing that in combination with backing out my patch for bug 721663, to see if it makes any difference.
I will take a look at some of the crash reports and see if I can reproduce.
(Correcting comment #8)

> The only thing that's clear is that the value of 'array->elements' in
> _cairo_user_data_array_t is invalid, and so also may be value of 'array' itself.

_cairo_user_data_array_t -> _cairo_user_data_array_fini()

http://hg.mozilla.org/mozilla-central/annotate/9023803c5d64/gfx/cairo/cairo/src/cairo-array.c#l379
So far I generated one crash while testing this, but it was not in this stack.

I looked at the correlation reports and there is small correlation to Evernote, so I installed that and the addon.

Comment 12

6 years ago
It's not clear from the notes here, do you have a testcase that reproduces the problem?

The patch for bug 721663 is in the regression range from comment 7.  Something to do with that patch seems like a more likely candidate for the root cause of this.

> Jon and Jonathan, is it possible to avoid using ATS fonts on OS X
> 10.5 (to make gfxMacPlatformFontList::UseATSFontEntry() return false
> even on 10.5) without blowing things up?

I wouldn't do this unless you have clear evidence that the cause is somehow font related.
(Reporter)

Comment 13

6 years ago
(In reply to Steven Michaud from comment #8)
> I'm going to try doing that in combination with backing out my patch for bug
> 721663, to see if it makes any difference.
I don't think it's a good idea because bug 721663 was representing 1.8% of all Mac OS X crashes while this bug represent 0.8% of all Mac OS X crashes.
> I'm going to try doing that in combination with backing out my patch for bug
> 721663, to see if it makes any difference.

I'm not suggesting this as a solution.  I'll be doing it for debugging purposes -- to learn more about what's going on.
I tried turning off ATS fonts on OS X 10.5 and also backing out my patch for bug 721663.  I still crashed.

So it looks like this bug's crashes have nothing to do with ATS fonts, or (probably) with font code in general.

By the way, turning off ATS fonts on OS X 10.5 didn't make things blow up.  But it did make text very ugly (probably because we fall back to some default font).
>I tried turning off ATS fonts on OS X 10.5 and also backing out my patch for bug
> 721663.  I still crashed.

I still crashed with the STR from bug 721663 comment #3.
So it really does look like I'm stuck until someone can figure out how to reproduce this bug's crashes.
There's been exactly one of this bug's crashes on the 13 or 14 branches in the last month:

bp-76ec25e6-61f3-410f-835d-8b48b2120322

So there's some reason to hope that they've been (somehow) worked around in current code.
(Following up comment #15)

I also tried turning on ATS fonts on OS X 10.6.8.  This *didn't* make the STR from bug 721663 comment #3 start working on 10.6.8.
(Reporter)

Comment 20

6 years ago
(In reply to Steven Michaud from comment #18)
> So there's some reason to hope that they've been (somehow) worked around in
> current code.
I won't be so sure because Aurora (5,000 ADU) and Nightly (1,500 ADU) Mac populations are low comparing with 4M ADU in Release.
Created attachment 608900 [details] [diff] [review]
Possible patch

I've found another way to fix bug 721663 than my original patch for it
-- changing my patch for bug 671064 to use a snapshot of a
cairo_source_t object, instead of just retaining and releasing the
original object.

My "new" patch for bug 671064 is arguably more correct than my
original patch.  And since bug 721663 does seem to be related to this
bug (bug 737509), there's a good chance my patch will also fix this
bug.

Using a snapshot in _cairo_surface_to_cgimage() is more resource
intensive than just using the "original" cairo_source_t object,
suitably addrefed.  But my gut tells me the penalty won't be large --
_cairo_surface_to_cgimage() isn't called *that* often.  In any case I
don't know how to measure this presumed performance penalty.  If
anyone knows of tools that might be able to help, please comment here.

Andrea, this patch follows your advice in bug 671064 comment #30.
Please let me know if I've got it right.

I'm still convinced, as I was at bug 721663 comment #26, that these
bugs aren't caused by accessing deleted objects.  (For example, it
doesn't help to comment out every call to CG.*Release() in
cairo-quartz-surface.c.)  It's pretty clearly some sort of memory
corruption, though I still haven't figured out exactly how it happens.

However I do now have a	hunch:

As I understand it, a Cairo snapshot is	immutable.  My hunch is that
bug 721663's and this bug's crashes happen when data is altered	in the
midst of Core Graphics calls to draw objects (for example
CGImageMaskCreate()).  So using an immutable snapshot prevents these
changes from happening, thereby fixing the crashes.  That the crashes
happen only on OS X 10.5 may mean that on 10.6 and 10.7 the Core
Graphics calls make copies of the data before acting on it.

Andrea, do you think this is likely/possible?
Assignee: nobody → smichaud
Status: NEW → ASSIGNED
Attachment #608900 - Flags: review?(ranma42)
Attachment #608900 - Flags: review?(joe)
I've started tryserver builds, which should be available by tomorrow morning.
Here's a tryserver build made with my patch from comment #21:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-1eb583c860a1/try-macosx64/firefox-14.0a1.en-US.mac.dmg

By the way, there were no non-spurious test failures during the tryserver runs.
Attachment #608900 - Flags: review?(joe) → review?(jmuizelaar)
(Reporter)

Comment 24

6 years ago
It's #4 top browser crasher in 11.0 and #12 in 12.0b2 on Mac OS X.
Created attachment 612977 [details] [diff] [review]
Debug version of my patch

(Following up comment #21)

> However I do now have a hunch:
>
> As I understand it, a Cairo snapshot is immutable.  My hunch is that
> bug 721663's and this bug's crashes happen when data is altered in
> the midst of Core Graphics calls to draw objects (for example
> CGImageMaskCreate()).  So using an immutable snapshot prevents these
> changes from happening, thereby fixing the crashes.

I've used this debugging patch to confirm my hunch.

On both OS X 10.5.8 and 10.6.8, following the STR from bug 721663
comment #3 causes the following sequence to be logged:

_cairo_surface_to_cgimage(1): Took snapshot 0x12a113cd0
_cairo_surface_snapshot_copy_on_write(1): surface 0x12a113cd0
_cairo_surface_to_cgimage(1): Took snapshot 0x12a113e20
_cairo_surface_snapshot_copy_on_write(1): surface 0x12a113e20
_cairo_surface_to_cgimage(1): Took snapshot 0x12a113f70
_cairo_surface_snapshot_copy_on_write(1): surface 0x12a113f70
_cairo_surface_to_cgimage(1): Took snapshot 0x12a1140c0
_cairo_surface_snapshot_copy_on_write(1): surface 0x12a1140c0
DataProviderReleaseCallback(): Done with snapshot 0x12a113e20
DataProviderReleaseCallback(): Done with snapshot 0x149016d10
DataProviderReleaseCallback(): Done with snapshot 0x12a113cd0
DataProviderReleaseCallback(): Done with snapshot 0x12a113f70
DataProviderReleaseCallback(): Done with snapshot 0x12a1140c0

This shows several cases of _cairo_surface_snapshot_copy_on_write()
being called on a snapshot before cairo_surface_destroy() is called on
it from DataProviderReleaseCallback().  This indicates an attempt to
"change" the snapshot before CGImageMaskCreate() is finished with it.
But since snapshots are copy-on-write, in fact no change takes place
to the snapshot itself -- the change is made to a copy of the
snapshot.  Which is what we want, and why my patch is a good thing.

This debugging patch includes the actual patch for this bug (from
comment #21).

My logs don't show any "illegal" attempts to change the snapshots --
none of the "ERROR!" messages appear in them.  Which is also a good
thing, but isn't the result of my patch.
Comment on attachment 608900 [details] [diff] [review]
Possible patch

Sure. Very sorry this took so long.
Attachment #608900 - Flags: review?(jmuizelaar) → review+
Comment on attachment 608900 [details] [diff] [review]
Possible patch

Landed on mozilla-central:
http://hg.mozilla.org/mozilla-central/rev/17e4143dd6f0

Andrea, I'm not going to wait for your review.  But let us know if you have any comments.

I actually intended to land this on mozilla-inbound ... but my fingers lead the way to mozilla-central.  No bad thing, though -- this should probably land as soon as possible.

I'll wait a few days, then request approval for aurora.  I'm also tempted to request beta approval, though it may be a bit too late for that.
Attachment #608900 - Flags: review?(ranma42)
(Assignee)

Updated

6 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Sorry, this patch may have caused this new crash [@ _cairo_surface_snapshot_copy_on_write]:
https://tbpl.mozilla.org/php/getParsedLog.php?id=10755974&tree=Firefox

I've backed out the patch for now, until the cause of the crash is identified:
https://hg.mozilla.org/mozilla-central/rev/52ee2124247d
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The crash was on OS X 10.5.8.

I'll try running tests locally on 10.5.8 and see what happens.
I retriggered the test and the second run was green.  So the crash is intermittent.  (That might make it harder to debug, and it means we aren't sure this patch caused the crash.)
> I'll try running tests locally on 10.5.8 and see what happens.

The trunk won't build on 10.5.8 anymore, at least with the mozconfig I normally use.

So for the moment I'm stuck.  I'll see what I can turn up tomorrow.
(Following up comment #31)

Actually, Bob Clary's advice in bug 721744 comment #8 gets the build working again.

But it'll be tomorrow before I start playing with it.
Last night I ran another set of tryserver tests for my patch, and this morning I discovered the same crash again, also on OS X 10.5.8.  So the crash must be related to my patch, though I can't believe my patch actually causes it.  (It's much more likely the crash is caused by another bug uncovered by my patch.)

I still haven't managed to complete a build on 10.5.8 -- now it looks like I need to upgrade Python from 2.5 to 2.6.  But I'll be spending the rest of the day on this, trying to get a gdb crash stack for the crash.

My hope is to take care of this over the next few days, by fixing the bug my patch (presumably) uncovered.  But I now think it's too risky trying to get a fix for this bug's crashes into FF 12.  We'll just have to live with them til FF 13.
> Last night I ran another set of tryserver tests for my patch, and this morning I
> discovered the same crash again, also on OS X 10.5.8.

https://tbpl.mozilla.org/php/getParsedLog.php?id=10764019&tree=Try&full=1#error0
> I still haven't managed to complete a build on 10.5.8 -- now it looks like I need
> to upgrade Python from 2.5 to 2.6.

Turns out I just needed to install simplejson.  See bug 741084 comment #7.
For the record (so it doesn't get lost), here's the log of the original crash, which occurred during a different browser-chrome test:

https://tbpl.mozilla.org/php/getParsedLog.php?id=10755974&tree=Firefox&full=1#error0
And here's yet another crash, which happened during one of the tryserver runs of my patch for bug 313700 (which was built during the short time this bug's patch was on the trunk):

https://tbpl.mozilla.org/php/getParsedLog.php?id=10762011&tree=Try&full=1#error0

This crash happened during the same browser-chrome test as did the one from comment #36.
I'm making progress on this ... but it's very indirect and painfully slow.

The crash stacks from comment #34, comment #36 and comment #37 are at least partially corrupt.  For example cairo_surface_flush() does *not* contain line 331 of cairo-surface.c.  But what *does* appear at line 331 (a call (indirectly) to _cairo_surface_snapshot_copy_on_write() in cairo_surface_detach_snapshot) makes sense as part of the stack.

The gdb stacks I've been able to get are much worse -- as best I can tell they're completely corrupt and useless.  Maybe it's something about the test harness environment (since I'm invoking gdb with the --debugger=gdb parameter to runtests.py).

But I *am* able to reproduce crashes in gdb with browser_tabview_bug627288.js with my patch, which don't happen without it.  So at least I can mount a brute force attack on these crashes.

I'll be spending all day tomorrow on this ... and very possibly also the rest of this week.
Created attachment 615762 [details] [diff] [review]
Work in progress

I've found the bug uncovered by my original patch for this bug, and
I've created a new patch that fixes it.  But I'm not entirely
satisfied with it, because it may be too specifically tailored to the
mochitest crashes.

What causes these crashes is that code added by Joe's patch for bug
685767 to gfxBlur.cpp (to gfxAlphaBoxBlur::Init()) makes an assumption
that turns out to be incorrect.

http://hg.mozilla.org/mozilla-central/annotate/40455cbb1ad3/gfx/thebes/gfxBlur.cpp#l73

The incorrect assumption is that gfxAlphaBoxBlur::mImageSurface,
created by gfxAlphaBoxBlur::Init(), will live no longer than the
gfxAlphaBoxBlur object that created it:

A gfxImageSurface object (or any subclass of gfxASurface) lives as
long as its internal cairo_surface_t object (gfxASurface::mSurface)
does, and the reference count of this object may be increased by cairo
code -- which would make it live longer than its "owner" (the
gfxAlphaBoxBlur object that created it).

On the basis of the above (incorrect) assumption,
gfxAlphaBoxBlur::Init() creates an AlphaBoxBlur object (mBlur), whose
data it passes to the gfxImageSurface object (mImageSurface) as
"existing" data (data not owned by the gfxImageSurface object), and
which gets destroyed in the gfxAlphaBoxBlur destructor.  But if the
gfxImageSurface object lives longer than its "owning" gfxAlphaBoxBlur
object, then the "existing" data will become invalid before the
gfxAlphaBoxBlur object is destroyed.

My current patch addresses this problem by making an AlphaBoxBlur
object ref-countable, by passing this object (mBlur) to the
gfxImageSurface constructor, and by making the gfxImageSurface object
hold a reference to it.

Doing this fixes the mochitest crashes.  But I can't help but wonder
if gfxImageSurface objects need to hold a reference to *any*
"existing" data.

I'm going to devise some tests for this, and start playing around with
them.  The first thing I'll do is make the gfxImageSurface destructor
deliberately reference any non-owned data (in mData), so see if this
causes crashes.  I'll do these tests on OS X 10.5, on which jemalloc
is disabled.
> I'm going to devise some tests for this, and start playing around
> with them.  The first thing I'll do is make the gfxImageSurface
> destructor deliberately reference any non-owned data (in mData), so
> see if this causes crashes.  I'll do these tests on OS X 10.5, on
> which jemalloc is disabled.

I've run all our browser-chrome and chrome tests on this test build,
together with most of the regular mochitests (I couldn't get that test
run to finish).

I found that the following regular mochitests crashed consistently in
my altered gfxImageSurface destructor (even when run individually):

content/canvas/test/test_2d.composite.uncovered.pattern.destination-atop.html
content/canvas/test/test_2d.composite.uncovered.pattern.destination-in.html
content/canvas/test/test_2d.composite.uncovered.pattern.source-in.html
content/canvas/test/test_2d.composite.uncovered.pattern.source-out.html
content/canvas/test/test_canvas.html
layout/base/tests/test_bug731777.html

And the following test crashed once (though I couldn't get it to crash
again):

layout/base/tests/test_bug93077-1.html

So it does look like the "new" bug (the one uncovered by my original
patch for this bug, and partially fixed by my patch from comment #39)
does need a more general fix.

I'll be working on this.
Created attachment 617123 [details] [diff] [review]
Debugging version of work-in-progress patch

For those who are curious, here's the debugging patch I've been working with.  It incorporates my patch from comment #39.
(Reporter)

Comment 42

5 years ago
It's #5 top crasher in 13.0 and #28 in 14.0b6 on Mac OS X.
(Reporter)

Comment 43

5 years ago
Let's close it as WFM now that Mac OS X 10.5 is no longer supported in Firefox 17.
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago5 years ago
Resolution: --- → WORKSFORME
(In reply to comment #43)

That's fine with me.

There's still unfinished work here, but I'm not going to be able to get back to it for a while.  There are too many other more bugs that are more urgent than this one, especially now that OS X 10.5 has been desupported.

When (if?) I do get back to this, I'll do it in a different bug.
You need to log in before you can comment on or make changes to this bug.