Closed Bug 380494 Opened 17 years ago Closed 17 years ago

crash [@ moz_cairo_win32_surface_get_image][@ cairo_win32_surface_create_for_dc]

Categories

(Core :: Graphics, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED

People

(Reporter: Peter6, Assigned: vlad)

References

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(2 files, 3 obsolete files)

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a5pre) Gecko/20070511 Minefield/3.0a5pre ID:2007051106 [cairo]

since Bug 371135 landed (uncertain) some people get unreproducable crash.
While browsing FF simply crashes, no obvious reason.

Incident ID: 32028548
Stack Signature	_cairo_win32_surface_create_for_dc 658fc8f7
Product ID	FirefoxTrunk
Build ID	2007051004
Trigger Time	2007-05-11 00:24:50.0
Platform	Win32
Operating System	Windows NT 5.1 build 2600
Module	firefox.exe + (004ed4cb)
URL visited	
User Comments	sudden death after startup, no obvious reason
Since Last Crash	43231 sec
Total Uptime	43231 sec
Trigger Reason	Access violation
Source File, Line No.	e:\builds\tinderbox\fx-trunk\winnt_5.2_depend\mozilla\gfx\cairo\cairo\src\cairo-win32-surface.c, line 373
Stack Trace 	
_cairo_win32_surface_create_for_dc  [mozilla/gfx/cairo/cairo/src/cairo-win32-surface.c, line 373]
gfxWindowsNativeDrawing::PaintToContext  [mozilla/gfx/thebes/src/gfxwindowsnativedrawing.cpp, line 272]
nsNativeThemeWin::DrawWidgetBackground  [mozilla/widget/src/windows/nsnativethemewin.cpp, line 852]
0x002b7058
Keywords: crash
FWIW, your crash involved _cairo_win32_surface_create_for_dc, but someone else had a crash with _moz_cairo_win32_surface_get_image with the same two functions in the stack.

Stack Signature	 _moz_cairo_win32_surface_get_image de8e4739
Product ID	FirefoxTrunk
Build ID	2007051106
Trigger Time	2007-05-11 19:31:02.0
Platform	Win32
Operating System	Windows NT 5.1 build 2600
Module	firefox.exe + (004ed8fa)
URL visited	
User Comments	
Since Last Crash	34552 sec
Total Uptime	34552 sec
Trigger Reason	Access violation
Source File, Line No.	e:\builds\tinderbox\fx-trunk\winnt_5.2_depend\mozilla\gfx\cairo\cairo\src\cairo-win32-surface.c, line 1856
Stack Trace 	
_moz_cairo_win32_surface_get_image  [mozilla/gfx/cairo/cairo/src/cairo-win32-surface.c, line 1856]
gfxWindowsNativeDrawing::PaintToContext  [mozilla/gfx/thebes/src/gfxwindowsnativedrawing.cpp, line 253]
nsNativeThemeWin::DrawWidgetBackground  [mozilla/widget/src/windows/nsnativethemewin.cpp, line 968]
Severity: major → critical
Summary: crash [@ moz_cairo_win32_surface_get_image] → crash [@ moz_cairo_win32_surface_get_image][@ cairo_win32_surface_create_for_dc]
Seems to happen for me whenever I open tools -> add-ons.

Talkback TB32069036Q
CCing vlad since this appears to be the same crash we were talking about with aja a few nights ago in IRC.
OS: Windows XP → All
Hardware: PC → All
(In reply to comment #4)
> CCing vlad since this appears to be the same crash we were talking about with
> aja a few nights ago in IRC.
> 
FWIW, crashes didn't occur for me in -safe-mode, or when I disabled two extensions (cmSiteNavigator Toolbar 0.8, and Link Widgets 1.6). I also tried launching the buttons.png from Link Widgets into the content window and mousing over the buttons in it repeatedly, and no crash.  
I verified bug 371135 as the culprit via my own builds by pulling source by time.
Blocks: 371135
I created a debug build and put some debug prints into CheckSurfaceSize and what i found was that it was the first check for either width or height being <=0 that was triggering this issue.  I cot the following debug print an assertion:

Size <=0 - w=69, h=0,l=0
###!!! ASSERTION: gfxASurface::AddRef without mSurface: 'mSurface != nsnull', file c:/mozilla/trunk/gfx/thebes/src/gfxASurface.cpp, line 66

Further investigation found that the call causing this was this one:

http://lxr.mozilla.org/seamonkey/source/gfx/thebes/src/gfxWindowsNativeDrawing.cpp#173

I have no idea why we should be here with a height of zero.

I tried adding a check here for the surface creation failing by making it return nsnull, but that does not prevent the browser from crashing.

There would appear to be 2 issues here.

1.  Why should we be at this point in the code at all with a zero height.

and

2.  If this surface creation does return an error what needs to be done to prevent a subsequent browser crash.

During my investigation I also found that the code here:

http://lxr.mozilla.org/seamonkey/source/gfx/thebes/src/gfxASurface.cpp#170

is incorrect in that is should be only doing the ADDREF if result is not null.

Attached patch workaround (obsolete) — Splinter Review
I don't think this is a proper fix.   This merely avoids the crash by permitting zero as a valid value of height and/or width of a surface.
Attached patch potential fix (obsolete) — Splinter Review
Can someone who can reproduce this try this fix?  I can't manage to trigger the crash.
Assignee: nobody → vladimir
Status: NEW → ASSIGNED
(In reply to comment #9)
> Created an attachment (id=264771) [details]
> potential fix
> 
> Can someone who can reproduce this try this fix?  I can't manage to trigger the
> crash.
> 

This resolves the crash issue for me.
Comment on attachment 264771 [details] [diff] [review]
potential fix

roc, does this seem like a good approach to fixing this (we'll need similar wallpaper on linux)?  The problem is that now we're explicitly failing on attempts to get surfaces that are 0 width or 0 height; before, there was code in cairo that made those 1 pixel in size that we put in to wallpaper.

I think we should fix the callers; otherwise, we can just take the earlier patch in this bug and make 0 width/height valid for surfaces... (though that causes problems with mallocs and whatever, so.. we'll see)
Attachment #264771 - Flags: review?(roc)
Comment on attachment 264771 [details] [diff] [review]
potential fix

I think this is fine, modulo the answers to two questions:
1) Why 2 and not 1 for the size?
2) What is the rationale to not supporting zero-sized surfaces in cairo? That seems like an unnecessary complication to the API.
Attachment #264771 - Flags: superreview+
Attachment #264771 - Flags: review?(roc)
Attachment #264771 - Flags: review+
Ok, based on:

16:10 < cworth> vlad_: I don't know that we've ever tested it, but I'd consider cairo broken to not handle 0-sized surfaces, (just like I curse broken xlib for not handling these).

I believe win32, xlib, and quartz all have explicit support for this.  So, we allow zero-sized surfaces.
Attachment #264698 - Attachment is obsolete: true
Attachment #264771 - Attachment is obsolete: true
Attachment #264809 - Flags: review?(roc)
(In reply to comment #13)
> Created an attachment (id=264809) [details]
> allow for zero-sized surfaces

That works for me as well.  I like your cleverer way to avoid dividing by zero.
Comment on attachment 264809 [details] [diff] [review]
allow for zero-sized surfaces

If any of these tests are true then it's very likely due to a
programming error, so I would like to have assertions there.
I've been able to reasonably reproduce this at litmus.mozilla.org. Not simple, but fairly reliable. I hope it helps verify a fix.

-open a test run for trunk smoketests
-unhide all the test cases
-click a radio button for a test case
-hide that test case by clicking its title
-repeat with next test case until crash. usually between 2 and 7 testcases.

other talkback suggest crashing with this stack at Tools > Addons. But I haven't been able to repro that.
Keywords: topcrash
(In reply to comment #15)
> (From update of attachment 264809 [details] [diff] [review])
> If any of these tests are true then it's very likely due to a
> programming error, so I would like to have assertions there.

Yeah, good point.  I just added in NS_ERROR's to those cases.
With this patch installed I ran into an issue where the browser would sometimes hang when switching back and forth between extensions and themes in the addons window.  I added the following which seems to have helped.

Index: gfx/thebes/src/gfxImageSurface.cpp
===================================================================
RCS file: /cvsroot/mozilla/gfx/thebes/src/gfxImageSurface.cpp,v
retrieving revision 1.12
diff -u -8 -r1.12 gfxImageSurface.cpp
--- gfx/thebes/src/gfxImageSurface.cpp  10 May 2007 19:58:09 -0000      1.12
+++ gfx/thebes/src/gfxImageSurface.cpp  15 May 2007 20:08:16 -0000
@@ -44,17 +44,20 @@
 gfxImageSurface::gfxImageSurface(const gfxIntSize& size, gfxImageFormat format)
 :
     mSize(size), mFormat(format)
 {
     mStride = ComputeStride();

     if (!CheckSurfaceSize(size))
         return;

-    mData = (unsigned char *) malloc(mSize.height * mStride);
+    if (mSize.height)
+        mData = (unsigned char *) malloc(mSize.height * mStride);
+    else
+        mData = (unsigned char *) malloc(1);
     if (!mData)
         return;

     mOwnsData = PR_TRUE;

     cairo_surface_t *surface =
         cairo_image_surface_create_for_data((unsigned char*)mData,
                                             (cairo_format_t)format,
Ugh, checked in before I saw that comment.  That makes sense, I'll get that in as a followup.  I still don't like that something is allocating a zero sized surface, but I guess it's ok.
Attached patch followup (obsolete) — Splinter Review
Followup patch to avoid malloc(0)
Attachment #264913 - Flags: review?(roc)
malloc(0) returns NULL, so why is this needed? cairo_image_surface_create_for_data shouldn't be looking at the data if it's a size-zero surface. Is there an underlying bug here?
It doesn't, as far as I can tell -- is malloc(0) returning NULL portable?

However, the problem is that there's a check there if the allocated memory is NULL... so maybe that patch should be more like

if (height * stride > 0) {
   data = malloc(...);
   if (!data)
      return;
} else {
   data = nsnull;
}

?
(In reply to comment #22)
> It doesn't, as far as I can tell -- is malloc(0) returning NULL portable?

Yes.

> However, the problem is that there's a check there if the allocated memory is
> NULL... so maybe that patch should be more like
> 
> if (height * stride > 0) {
>    data = malloc(...);
>    if (!data)
>       return;
> } else {
>    data = nsnull;
> }

Sure.
What's been checked in already appears to have resolved the reproducible crash
I'd been having with the 2 link navigation extensions mentioned in comment 5
above.
(In reply to comment #23)
> (In reply to comment #22)
> > It doesn't, as far as I can tell -- is malloc(0) returning NULL portable?
> 
> Yes.
> 
Googling for malloc 0 and msvc malloc 0 got me to discussions that seem to indicate that MSVC and some versions of Linux do not return NULL for a zero byte malloc.  I am going to do do a debug build with with some extra debug printf's near the malloc and try to figure out what is really causing the hang (assuming I can reproduce it).
(In reply to comment #25)
> Googling for malloc 0 and msvc malloc 0 got me to discussions that seem to
> indicate that MSVC and some versions of Linux do not return NULL for a zero
> byte malloc.

Links?

Anyway it doesn't matter in this case.
(In reply to comment #26)
> (In reply to comment #25)
> > Googling for malloc 0 and msvc malloc 0 got me to discussions that seem to
> > indicate that MSVC and some versions of Linux do not return NULL for a zero
> > byte malloc.
> 
> Links?
Here are a mess of links that all come down to the bottom line of, if you are trying to write portable code you should never call malloc with a zero argument.

But, I see that calling malloc with a zero and getting null in return takes a different codepath in this case so I will try the new patch and report back.
You're right. But anyway, it still shouldn't matter.
(In reply to comment #23)
> > However, the problem is that there's a check there if the allocated memory is
> > NULL... so maybe that patch should be more like
> > 
> > if (height * stride > 0) {
> >    data = malloc(...);
> >    if (!data)
> >       return;
> > } else {
> >    data = nsnull;
> > }

Well, for some odd reason I could not reproduce the issue with a debug build and then rebuilt without debug and could not reproduce it there either.  I might have to go back to trying this on the work computer where I had the issue originally, but I won't be back there until Thursday.

In any event I did create a build with code like above and it at least does not seem to introduce any new issues and des seem to make sense.
Attached patch followupSplinter Review
Fixed followup.  Whether this was causing the crashing or not, we should take this anyway -- otherwise we can't create 0-sized gfxImageSurfaces.
Attachment #264913 - Attachment is obsolete: true
Attachment #265053 - Flags: review?(roc)
Attachment #264913 - Flags: review?(roc)
Followup checked in.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
verified with trunk build from 5/24 and this hasn't appeared in talkback reports since the follow check in
Status: RESOLVED → VERIFIED
Crash Signature: [@ moz_cairo_win32_surface_get_image] [@ cairo_win32_surface_create_for_dc]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: