Closed Bug 444864 Opened 16 years ago Closed 16 years ago

Crash [@ nsNativeThemeCocoa::DrawCellWithScaling] with <html:input>, large letter-spacing

Categories

(Core :: Widget: Cocoa, defect, P1)

x86
macOS
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: jruderman, Assigned: smichaud)

References

Details

(4 keywords, Whiteboard: [sg:critical?])

Crash Data

Attachments

(7 files, 4 obsolete files)

147 bytes, text/html
Details
3.75 KB, text/plain
Details
6.82 KB, text/plain
Details
622 bytes, text/plain
Details
2.64 KB, text/plain
Details
4.62 KB, patch
jaas
: review+
Details | Diff | Splinter Review
1.35 KB, patch
jaas
: review+
Details | Diff | Splinter Review
Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000031ff4988
Crashed Thread:  0

Thread 0 Crashed:
0   argb32_sample_ARGB32 + 383
1   argb32_image_mark + 601
2   argb32_image + 390
3   ripd_Mark + 326
4   ripl_BltImage + 1299
5   ripc_RenderImage + 273
6   ripc_DrawImage + 4973
7   CGContextDrawImage + 397
8   nsNativeThemeCocoa::DrawCellWithScaling(NSCell*, CGContext*, CGRect const&, _NSControlSize, float, float, float, float, float const (*) [3][4], int) + 2082 (nsNativeThemeCocoa.mm:304)
9   nsNativeThemeCocoa::DrawPushButton(CGContext*, CGRect const&, int, int, int) + 1482 (nsNativeThemeCocoa.mm:463)
10  nsNativeThemeCocoa::DrawWidgetBackground(nsIRenderingContext*, nsIFrame*, unsigned char, nsRect const&, nsRect const&) + 2978 (nsNativeThemeCocoa.mm:1116)

Related to bug 444260?
My stack looked a bit different from Jesse's when I crashed with his testcase.  nsNativeThemeCocoa::DrawCellWithScaling wasn't called for me.
Whiteboard: [sg:critical?]
Here's a stack trace I made in gdb using a 1.9.0-branch opt build with
debug symbols.  Notice that the deleted object appears to be a Gecko
object (its address appears as a parameter in several Gecko methods in
the trace).  This is different from bug 444260, where the deleted
object appears to be an internal Cocoa object (one that's not directly
visible anywhere in browser code).

Oddly, though I crash on load (on both the trunk and the 1.9.0
branches) when I open the testcase from bmo, it's much more difficult
to crash when I download the testcase and open it using a file:// URL.
However, even when I don't crash, the browser's performance slows to a
miserable crawl (the slowness lasts until I leave the current page or
close the current window).

So I think there are really two bugs here:

1) A crash bug caused by referencing a deleted object, possibly in
   cross-platform code.

2) An OS-X-specific performance bug caused by calling
   CGContextDrawImage() (a system call) with an
   inappropriately-constructed CGImageRef.

The most urgent bug is #1.  But once it's fixed, I think we also need
to address #2.  When I get to that point, I'll open a new bug to
address #2 (which I won't label security sensitive).

> Related to bug 444260?

I don't think so.
Assignee: joshmoz → smichaud
Status: NEW → ASSIGNED
Attachment #329472 - Attachment mime type: application/octet-stream → text/plain
(In reply to comment #2)
> Here's a stack trace I made in gdb using a 1.9.0-branch opt build with
> debug symbols.

Oh, right.  Debug symbols :-)  Thanks.
I also crash on Linux when I load this bug's testcase into FF3
(loading it from bmo or from a file:// URL).  But Breakpad doesn't
come up, so all I have is what appears in the console.
I don't crash (in FF3) on Windows.
(Following up comment #2)

> Here's a stack trace I made in gdb using a 1.9.0-branch opt build
> with debug symbols.  Notice that the deleted object appears to be a
> Gecko object (its address appears as a parameter in several Gecko
> methods in the trace).

I'm wrong about the deleted object being a Gecko object -- when I
break on argb32_sample_ARGB32 (just before the crash) and get a stack
trace (doing bt), none of the functions' parameters match the address
of the object that eventually causes the crash (as they do when I get
a stack after the crash).  I have to conclude that the stack I get
after the crash is partially corrupt (at least with respect to the
function parameters that it reports).

After a bunch of digging around, I think it's most likely this bug's
deleted object is also an internal object (possibly a local variable
in the method where the crash takes place), like that of bug 444260.
Though it's not a Cocoa or CoreFoundation object (since neither
NSZombieEnabled=YES nor CFZombieLevel=4 makes the crashes stop).

So it looks like I'm going to have to tackle this bug's crash
indirectly, by resolving problem #2 (from comment #2).
Finally, I'm no longer at all sure this bug's invalid pointer
(0x33c1b988 from comment #2's trace) corresponds to a deleted object.
This address is (I think) more likely to lie just below the address
space occupied by a valid object.
Attached patch Provisional fix (obsolete) — Splinter Review
Here's a patch modeled on my patch for bug 444260 (attachment 328698 [details] [diff] [review]),
but which I think obsoletes it.  This patch fixes both this bug (bug
444864) and bug 444260.
Attachment #329570 - Flags: review?(joshmoz)
Flags: wanted1.9.1?
Flags: wanted1.9.0.x?
Priority: -- → P1
Josh, can you review this?
Flags: wanted1.9.0.x?
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.3?
My patch (attachment 329570 [details] [diff] [review]) also fixes bug 449111.  So that's now
three (potentially) security-sensitive crash bugs that it fixes -- bug
444260, bug 444864 and bug 449111.
Attachment #329570 - Flags: review?(joshmoz) → review?(mstange)
Attachment #329570 - Flags: review?(joshmoz)
After talking to Josh and Markus last week about my patch (attachment
329570 [review]) on IRC, I went over the code (and my patch) again to see if I
could either come up with a better patch, or come up with better
justifications for it.

I've not been able to come up with a better patch (save for a minor
change to how I get the focused view's bounds).  But until and unless
we (or someone else) _can_ come up with something better, I think my
patch (as altered) is a reasonable way to fix these three bugs (bug
444260, bug 444864 and bug 449111).  Here's why.

The coordinate system for destRect (with which
nsNativeThemeCocoa::DrawCellWithScaling() is called) is one for an
nsIFrame object (which means that it's ultimately the coordinate
system for the NSView into which DrawCellWithScaling() will draw).
Because of this, I've replaced (in my new patch) focusViewFrame with
focusViewBounds, which I initialize with a call to [focusView bounds]
(which uses focusView's coordinate system).

The problems of all three bugs arise when we call the scaling code (in
DrawCellWithScaling) on an object whose destRect is unresonably large.
These include crashes.  But they also include the performance bug (#2)
that I describe in comment #2 above.  So getting rid of these crashes
isn't really enough (by itself) to "fix" these problems.

Also (aside from the performance problem) the buttons' display (from
all three bugs) is much more "broken" when I just tackle the crash
directly (e.g. using my first patch for bug 444260, attachment 328691 [details] [diff] [review])
than it is with my current patch.  My current patch also makes the
buttons' display conform much more closely to how they're drawn by FF3
on Windows and Linux (and by FF2 on all platforms).

So I think it makes sense to avoid the scaling code "when we need to".

My current patch decides this using a rough rule of thumb -- we should
avoid the scaling code when the object we're drawing will overflow the
NSView object into which it's being drawn.

I considered (and tried out) simply truncating the size of the object
to be drawn (making sure drawRect is never larger than the NSView into
which we're drawing).  But it sometimes makes sense to draw a larger
NSCell object into a smaller NSView object -- for example (as in this
bug's case) it sometimes makes sense to make an NSCell object larger
than the current browser page (so that one has to scroll the page to
see the whole object).

Next I considered limiting the size of the NSCell object to some
multiple of the size of the NSView object into which it's drawn.  This
could be a reasonable approach ... but how do you decide what the
limits are?

My rough rule of thumb, though rough, has the advantage of being very
simple.

Finally, Markus noticed (last week) that my patch doesn't get rid of
all drawing problems -- for example, parts of the buttons tend to
disappear when you scroll the browser window horizontally (in all
these bugs' examples).  But 1) I'm not sure there's anything we can do
about this (I suspect it's due to an Apple bug in [NSCell
drawWithFrame:inView:]) and 2) It's not unreasonable that that buttons
with such unreasonable dimensions don't draw perfectly.  It's only
unreasonable that we crash while drawing them :-)
Attachment #329570 - Attachment is obsolete: true
Attachment #335379 - Flags: review?(mstange)
Attachment #335379 - Flags: review?(joshmoz)
Attachment #329570 - Flags: review?(mstange)
Attachment #329570 - Flags: review?(joshmoz)
Comment on attachment 335379 [details] [diff] [review]
Provisional fix rev1 (use correct coordinate system)

I'm fine with this approach.
Attachment #335379 - Flags: review?(mstange) → review+
Flags: blocking1.9.1?
Here's a patch that takes a new approach -- it disables scaling if the
area of drawRect (in pixels^2) is too large.

This "works", but I think it's less robust (and therefore not as good)
as my previous patch.  The main reason is that you need to make
CELL_SCALING_MAX_AREA quite small in order to avoid both crashes and
(very) poor drawing performance.  (I'll talk more about this in a
later comment.)

In my next comment I'll post another patch that (more or less) follows
the approach of my previous patch (with some improvements).

Both patches now check for [NSView focusView] returning NULL -- which
I discovered can happen, and can itself lead to crashes.
Attachment #335379 - Attachment is obsolete: true
Attachment #335379 - Flags: review?(joshmoz)
Here's a patch that improves on my previous patch, while still taking
(more or less) the same approach.

Instead of checking if drawRect is larger than the currently focused
NSView object, it checks if either its width or its height is more
than twice as large as the window containing the currently focused
NSView object.  This deals with the edge case of focusedView being
substantially smaller than its window.
Josh asked me to find an absolute "pixel size" above which we can
avoid scaling.  As I say in comment #15, I did find a way to do this.
But (for various reasons), that approach isn't as good as that taken
by my rev3 patch from comment #16.

The main difficulty is that (as I say in comment #2) this bug is
really two bugs:  At very large "pixel sizes", crashes will happen
(using the testcases for all three bugs mentioned here -- bug 444260,
bug 444864 and bug 449111).  But if you alter the testcases for bug
444864 and bug 444260 to make their "pixel sizes" just small enough to
avoid crashes, it will take horribly long to scroll or resize those
(altered) examples' windows.

For some reason the performance problem is particularly bad if
CGContextDrawImage() rescales its image (in either dimension) by a
factor close to but less than 1 (e.g. by 0.9):  You need to make
CELL_SCALING_MAX_AREA quite small to avoid this worst-case scenario.

In tests on nsNativeThemeCocoa::DrawCellWithScaling(), with either
xscale or yscale set to 0.9, I found that I needed to set
CELL_SCALING_MAX_AREA to approximately 500000 (I tested on a very fast
machine (an 8-core MacPro) running OS X 10.5.4 and a moderately fast
machine (an early model MacBook Pro) running OS X 10.4.11).

In tests using Markus's test app from comment #14, I found I could get
reasonable performance (with 0.9 scaling) using an "area" as large as
750000.  But my tests with nsNativeThemeCocoa::DrawCellWithScaling()
are obviously more directly relevant.
Side note:  Interestingly, I found that CELL_SCALING_MAX_AREA doesn't
seem to depend on how much RAM you have installed.  My
just-small-enough-to-avoid-crashes value for CELL_SCALING_MAX_AREA was
the same on my MacPro (with 4GB of RAM) as on my MacBook Pro (with 2GB
of RAM).
Attachment #337535 - Flags: review?(joshmoz)
Attachment #337532 - Flags: review+
Attachment #337535 - Flags: review?(joshmoz) → review-
Attachment #337532 - Flags: superreview?(roc)
Comment on attachment 337535 [details] [diff] [review]
Fix rev3 (avoid scaling if drawRect too much larger than its window)

I'm still not convinced my rev2 patch is better than my rev3 patch.

But my rev2 patch will do the job.  And it has the advantage that it
will also catch (i.e. prevent) attempts to rescale large cells (large
in area) that might otherwise cause performance problems.
Attachment #337535 - Attachment is obsolete: true
Flags: blocking1.9.0.3? → blocking1.9.0.3+
Attachment #337532 - Flags: superreview?(roc) → superreview+
rev2 patch (attachment 337532 [details] [diff] [review]) landed on mozilla-central.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment on attachment 337532 [details] [diff] [review]
Fix rev2 (avoid scaling if drawRect too large)

Do people feel this needs test(s)?  If so, what kind of test(s)?
Attachment #337532 - Flags: approval1.9.0.3?
I think crashtests would be good - just put the testcases into /widget/src/cocoa/crashtests and update crashtests.list.
Flags: blocking1.9.1? → blocking1.9.1+
Flags: wanted1.9.1?
> rev2 patch (attachment 337532 [details] [diff] [review]) landed on mozilla-central.

Backed out because it likely caused some reftest failures (see bug
455346).

I'm looking into this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Turns out the reftest failures were a real issue (and not just another
example of test flakiness, or test-machine flakiness).

They were caused by the code I added (to both rev2 and rev3) to take
an early out when [NSView focusView] returns nil.  Turns out this is
unnecessary for my rev2 patch ([NSCell drawWithFrame:inView:] can deal
with it if [NSView focusView] returns nil).  And the fact that
nsNativeThemeCocoa::DrawCellWithScaling() could return without drawing
anything was (apparently) what caused the reftest failures.

The reftest failures mentioned in bug 455346 no longer happen when I
run reftests locally.  With luck they won't happen on the unit test
machines, either.

When I reland this patch, I intend to do it in the morning.  The unit
tests can take many hours to run, and I don't want to have to watch
this into the wee hours.

(The crashes I saw, which the nil focusView early out was needed to
fix, only happened with my rev3 patch.)
Attachment #337532 - Attachment is obsolete: true
Attachment #338736 - Flags: superreview?(roc)
Attachment #338736 - Flags: review?(joshmoz)
Attachment #337532 - Flags: approval1.9.0.3?
Attachment #338736 - Flags: review?(joshmoz) → review+
Attachment #338736 - Flags: superreview?(roc) → superreview+
rev4 patch landed on mozilla-central.

Over the next few hours, I'll be watching for any reftest failures.
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
I followed Markus' suggestion from comment #22 and added the testcases for
this bug, bug 444260 and bug 449111 to widget/src/cocoa/crashtests.
Attachment #339118 - Flags: review?(joshmoz)
Comment on attachment 338736 [details] [diff] [review]
Fix rev4 (fix rev2's reftest failures)

Crashtests are available (attachment 339118 [details] [diff] [review]).
Attachment #338736 - Flags: approval1.9.0.3?
Comment on attachment 339118 [details] [diff] [review]
Crashtests for bugs 444864, 444260 and 449111

thanks
Attachment #339118 - Flags: review?(joshmoz) → review+
Flags: wanted1.8.1.x-
Blocks: 449111
Attachment #339118 - Flags: superreview?(roc)
Attachment #339118 - Flags: superreview?(roc) → superreview+
Crashtests (attachment 338736 [details] [diff] [review]) landed on mozilla-central.
> Crashtests (attachment 338736 [details] [diff] [review]) landed on mozilla-central.

Actually they're in attachment 339118 [details] [diff] [review].
Attachment #338736 - Flags: approval1.9.0.4? → approval1.9.0.4+
Comment on attachment 338736 [details] [diff] [review]
Fix rev4 (fix rev2's reftest failures)

Approved for 1.9.0.4, a=dveditz for release-drivers
Attachment #339118 - Flags: approval1.9.0.4?
Attachment #339118 - Flags: approval1.9.0.4?
Comment on attachment 339118 [details] [diff] [review]
Crashtests for bugs 444864, 444260 and 449111

Tests don't need approval to land, even on the branch. However, these shouldn't land until this bug is opened up publicly, after 3.0.4 is released.
Flags: in-testsuite?
Landed my rev4 patch on the 1.9.0 branch:

Checking in widget/src/cocoa/nsNativeThemeCocoa.mm;
/cvsroot/mozilla/widget/src/cocoa/nsNativeThemeCocoa.mm,v  <--  nsNativeThemeCocoa.mm
new revision: 1.94; previous revision: 1.93
done

(Sorry for the delay.)

I'll land the tests after 3.0.4 is released.
Keywords: fixed1.9.0.4
Verified with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.4pre) Gecko/2008102104 GranParadiso/3.0.4pre for 1.9.0.4.
Group: core-security
Steven Michaud already checked in crashtests (comment 26):

http://hg.mozilla.org/mozilla-central/rev/427a44cf7ccd
Flags: in-testsuite? → in-testsuite+
verified fixed on the 1.9.1 branch using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20081229 Shiretoko/3.1b3pre and verified on trunk using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20081229 Minefield/3.2a1pre
Status: RESOLVED → VERIFIED
Crash Signature: [@ nsNativeThemeCocoa::DrawCellWithScaling]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: