Closed Bug 411369 Opened 17 years ago Closed 17 years ago

Crash while trying to print [@ @0x0 @0x1000 _cairo_meta_surface_create] [@ _moz_cairo_surface_finish ]

Categories

(Core :: Printing: Output, defect, P1)

x86
Windows Vista
defect

VERIFIED FIXED

People

(Reporter: martijn.martijn, Assigned: sayrer)

Details

(5 keywords)

Attachments

(4 files, 4 obsolete files)

With every page, this crash happens for me here on current trunk build on Vista.

http://crash-stats.mozilla.com/report/index/4a0b1611-be42-11dc-aa45-001a4bd43ef6

On branch I get a "Printing failed when starting the document." error instead, which is at least better than crashing.

I haven't looked at when this regressed.
Flags: blocking1.9?
It also happens when printing to a file.
I don't crash with my debug build, but that build is a month old (20071204).
Let's see what happens after I have updated my debug build.
I just installed the XPS Document Writer driver on my Windows XP machine, but I don't crash there.
Flags: blocking1.9?
Hrm, I'm actually crashing with everything while trying to print with Firefox trunk on my Vista machine.
And now it's also happening on my Windows XP machine. There is clearly something wacky going on.
Can anyone else reproduce this?
Flags: blocking1.9?
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9b3pre) Gecko/2008010912 Minefield/3.0b3pre ID:2008010912

crash here too when trying to print with pdfcreator
http://sourceforge.net/projects/pdfcreator

I'm sure this worked a few weeks ago
Thanks for that regression window, CC-ing Neil, although it seems weird to me those check-ins would have caused this.
(In reply to comment #7)
> Thanks for that regression window, CC-ing Neil, although it seems weird to me
> those check-ins would have caused this.
> 
I double-checked, using the builds before and after the regression window.
It's Bug 376578 or Bug 409242.
FWIW I am also crashing on Vista trying to print anything but a plain text page, like a bug here. 

I have no idea what XPS DocWriter is... mostly just a plain VistaHP install here as I've been a bit gun-shy to go off tweaking and adding stuff willy-nilly.

Latest hourly:
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9b3pre) Gecko/2008010915 Minefield/3.0b3pre Firefox/3.0 ID:2008010915
I can't print at all, crashing every time, using any printer, on Windows XP.
I can confirm the regression window: Minefield 2008010805 works, but 2008010813 is broken.
Summary: Crash [@ @0x0] while trying to print with the Microsoft XPS Document Writer → Crash [@ @0x0] while trying to print
Version: unspecified → Trunk
Sounds more like a blocker, then.
The XPS DocWriter stuff is more or less MS PDF. It's just a way of printing to a file instead of wasting tons of paper.
Severity: critical → blocker
Here's a crash I got which is in _cairo_meta_surface_create
bp-4e0f7ebd-bf8b-11dc-bf04-001a4bd46e84

Also if you widen where the bonsai query searches:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=1199823480&maxdate=1199831099&cvsroot=%2Fcvsroot

It looks like Bug 361343 is the culprit due to the new optimization.  
Blocks: 361343
No longer blocks: 376578
Severity: blocker → critical
Severity: critical → blocker
Seems like the first order of business is running this under Purify. I can do this tomorrow, but maybe someone can get to it sooner.
note: the fix from Bug 408538 didn't fix this one
This page doesn't crash when trying to print the page: 
http://fr.wikipedia.org/wiki/P%C3%A9trusse

This one crashes: http://fr.wikipedia.org/wiki/Alzette

call stack: bp-29999611-c04e-11dc-9e52-001a4bd43e5c
Summary: Crash [@ @0x0] while trying to print → Crash while trying to print [@ @0x0 @0x1000 _cairo_meta_surface_create]
This is a critical regression, a blocker for smoketests, a problem for dogfooding on Windows Vista, and I consider it a Zero Tolerance New Crash.

This bug should be blocking and fixed ASAP or the patches in the regression window need to be backed out.
No crash with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9b3pre) Gecko/2008011104 Minefield/3.0b3pre ID:2008011104

Seems to be a Windows only bug.
Summary: Crash while trying to print [@ @0x0 @0x1000 _cairo_meta_surface_create] → Crash while trying to print [@ @0x0 @0x1000 _cairo_meta_surface_create] [@ _moz_cairo_surface_finish ]
Be nice to get this fixed for B3 given the reproducibility and the number of dups.
Flags: blocking1.9? → blocking1.9+
Priority: -- → P1
Attached file gavin's purify session
While linking, I get this:

c:\wpo_firefox\mozilla\jpeg\jidctint.c(918) : 
warning C4731: 'ippiDCTQuantInv8x8LS_JPEG_16s8u_C1R' : frame pointer register 'ebx' modified by inline assembly code

Attached image wikipedia image 1 (obsolete)
Attached image wikipedia image 2 (obsolete)
Attached file test page (obsolete)
Attachment #296878 - Attachment is obsolete: true
Attachment #296879 - Attachment is obsolete: true
Attached image test image 1
Attachment #296880 - Attachment is obsolete: true
Comment on attachment 296882
test image 1

Trying to print this image will crash you.
ruled out bug 406580.
Attachment #296877 - Attachment is patch: false
Based on comment #12, #24 and the fact that my builds do not crash, I would say this definitely smells like bug 361343 to me.  I am going to do a new build with the code from that bug added to my .mozconfig and report back.
Bug 361343 is definitely the culprit.  Adding those 3 lines to my .mozconfig results in a build which crashes when trying to print that jpeg image.
This crash is related to cairo and jpeg.  Both of these are in C and not C++. So, I tried removing just this line:

export CFLAGS="-GL"

and leaving these 2:

export CXXFLAGS="-GL"
export LD_FLAGS="-LTCG"
 
That seems to be sufficient to avoid the crash.
That's just masking the problem though.  There's an actual bug here, exposed by the greater optimization.
Well, from the stack:

>	xul.dll!_moz_cairo_surface_finish(_cairo_surface * surface=0x00000000)  Line 517 + 0x8 bytes	C
 	xul.dll!_cairo_win32_surface_release_source_image(void * abstract_surface=0x03626ae8, _cairo_image_surface * image=0x03782b70, void * image_extra=0x0012ed80)  Line 592 + 0x18 bytes	C

_cairo_win32_surface_release_source_image calls cairo_surface_destroy() on image_extra if it's not NULL.

cairo_surface_destroy decrements the reference count, and if it's 0, it calls cairo_surface_finish on the passed-in surface.

We're ending up in cairo_surface_finish with a surface of NULL, when it wasn't NULL in _cairo_win32_surface_release_source_image.  cairo_surface_destroy also does a NULL check on surface right at the start and returns if it's NULL, and that's not being triggered.  (This is all assuming that we can trust the argument values in that function call, though.)

So something's not right here, especially since cairo_surface_finish /also/ does a null check of surface right at the start.  Line 517 is:

    status = surface->backend->finish (surface);

which indeed would blow up if surface or backend was NULL, but there were many references to both surface and backend in that function previously and they're not blowing up.
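
To make the reasoning above concrete, here is a minimal sketch of the call chain as described in this comment. It is not cairo's source: the toy_ names and the struct are hypothetical stand-ins, and only the order of the NULL checks and the reference-count decrement is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cairo surface; not the real cairo type. */
typedef struct toy_surface {
    int ref_count;  /* number of live references */
    int finished;   /* set once finish has run */
} toy_surface;

/* Counts how often the finish body actually ran. */
static int toy_finish_calls = 0;

/* Models the described finish: NULL check first, then the real work. */
static void toy_surface_finish(toy_surface *surface)
{
    if (surface == NULL)
        return;
    if (surface->finished)
        return;
    surface->finished = 1;
    toy_finish_calls++;
}

/* Models the described destroy: NULL check, drop one reference,
   and finish only when the count reaches zero. */
static void toy_surface_destroy(toy_surface *surface)
{
    if (surface == NULL)
        return;
    assert(surface->ref_count > 0);
    if (--surface->ref_count > 0)
        return;
    toy_surface_finish(surface);
}

/* Models the caller: destroy image_extra only if it is not NULL. */
static void toy_release_source_image(toy_surface *image_extra)
{
    if (image_extra != NULL)
        toy_surface_destroy(image_extra);
}
```

With this shape, the finish body can only be reached with a non-NULL pointer, so a NULL surface at that point would mean the value changed somewhere between the caller's check and the callee. That is the inconsistency the stack appears to show, assuming the argument values in the stack can be trusted.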
(In reply to comment #36)
> So something's not right here, especially since cairo_surface_finish /also/
> does a null check of surface right at the start.  Line 517 is:
> 
>     status = surface->backend->finish (surface);
> 
> which indeed would blow up if surface or backend was NULL, but there were many
> references to both surface and backend in that function previously and they're
> not blowing up.
> 

The code in that area does not look that complicated.  This smells like a compiler bug to me.
(In reply to comment #37)
>
> The code in that area does not look that complicated.  This smells like a
> compiler bug to me.

I might agree, if not for attachment 296877. If we don't solve this in a few days, we'll give cairo a -GL- option to turn it off for that module.

This shouldn't be too hard to figure out from stepping through the assembly, assuming you can catch that cairo_win32_surface_release_source_image reliably before things blow up.  I wish we had some better tools to trace data flow at the instruction/register level while debugging.
This can be avoided by compiling without WPO in jpeg/ cairo/cairo and cairo/libpixman. I think this is a bug with the GL option and inline asm. Could be related to this issue:

http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=99487

Assignee: nobody → sayrer
That claims that the issue was fixed in SP1 (VS2005 SP1, I assume).  Are the tinderboxes running SP1?
(In reply to comment #40)
> This can be avoided by compiling without WPO in jpeg/ cairo/cairo and
> cairo/libpixman. I think this is a bug with the GL option and inline asm. Could
> be related to this issue:
> 
> http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=99487
> 
Well, if we are to do that, shouldn't we also make sure breakpad is not compiled with the GL option either as I believe it also has inline asm code.
(In reply to comment #42)
> Well, if we are to do that, shouldn't we also make sure breakpad is not
> compiled with the GL option either as I believe it also has inline asm code.

Uh, no.  Only the Mac and Linux client code has inline assembly.
We have inline assembly in other chunks of the code though; e.g. xpconnect.  The bug described there doesn't exist in SP1, which we're building with.  There may still be problems with -GL when inline assembly is present though, but I don't know how much faith I have in an optimization flag that breaks down in that way.
(In reply to comment #44)
> We have inline assembly in other chunks of the code though; e.g. xpconnect. 
> The bug described there doesn't exist in SP1, which we're building with.

Did anyone verify that? IIRC, the message in there was a little confusing.

OK, looking at the release date of SP1 and the comments there, seems obvious that it's in.
Yep, I verified by taking the code that reproduces the problem from that bug report and testing that it works with SP1.
(In reply to comment #38)
> If we don't solve this in a few days, we'll give cairo a -GL- option
> to turn it off for that module.

I'm going to request that, given it's been a few days now, we go with this option.

Is there any good reason why the changes in bug 361343 weren't backed out while work was being done to figure out this bug? Given the number of people that are seeing this (the _moz_cairo_surface_finish signature alone is topcrash #6), it seems logical to have backed that change out until a fix was determined.
Yes please, I just crashed there again.  This makes trunk nightlies hard to dogfood.  Let's either back out the original while you fix it, or disable -GL in those modules.  sayrer: can we land *something* today?
(In reply to comment #49)
> 
> Is there any good reason why the changes in bug 361343 weren't backed out while
> work was being done to figure out this bug? 

Yes, we want to find bugs in the product related to bug 361343. There could have been more.
Do we have the capability to override this on a per module basis without causing build bustage for people building with other compilers?

That is kind of why I suggested only doing the -GL for the C++ code, and not the C code, in comment #34.
Depends on: 413019
Attachment #297874 - Flags: review?(ted.mielczarek)
Attachment #297874 - Attachment is obsolete: true
Attachment #297874 - Flags: review?(ted.mielczarek)
Attachment #297877 - Flags: review?(ted.mielczarek)
Attachment #297877 - Flags: review?(ted.mielczarek) → review+
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b3pre) Gecko/2008011900 Minefield/3.0b3pre ID:2008011900

remaining ?
for printpreview - the printpreview popup stays open with message "preparing", this only seems to happen for a first printpreview, the next time it works.

this doesn't happen all the time, sometimes it closes after a while
(In reply to comment #55)
> Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b3pre) Gecko/2008011900
> Minefield/3.0b3pre ID:2008011900
> 
> remaining ?
> for printpreview - the printpreview popup stays open with message "preparing",
> this only seems to happen for a first printpreview, the next time it works.
> 
> this doesn't happen all the time, sometimes it closes after a while
> 
This happens every time for me.

I have verified that this does not happen with my own builds which are not compiled with -GL.
Interestingly, I cannot reproduce this issue with my own build compiled with the WPO stuff enabled.
(In reply to comment #57)
> Interestingly, I cannot reproduce this issue with my own build compiled with
> the WPO stuff enabled.
> 

Major differences in my build would be compiling using vc8 express and using cygwin and not mozillabuild.
Depends on: 413169
Depends on: 413170
Finding problems printing attachment 296882 in valgrind. Purify hasn't been much help--going to try some alternative tools on Windows.
(In reply to comment #55)
> remaining ?
> for printpreview - the printpreview popup stays open with message "preparing"

See bug 413200 comment 11
(In reply to comment #60)
> (In reply to comment #55)
> > remaining ?
> > for printpreview - the printpreview popup stays open with message "preparing"
> 
> See bug 413200 comment 11
> 
That would explain why I could not duplicate the issues in my builds.  I include the patch for bug 412598, which fixes the problems introduced by bug 352791.


The patches in bug 413169 and bug 413170 might fix this crash, but I still have my tinfoil hat on. Marking this fixed, someone can take bug 413019 if they really want WPO turned back on for gfx. There are FIXME comments in all the right places.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Yes, thanks, I can print again with current trunk build.
Verified fixed, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b3pre) Gecko/2008012104 Minefield/3.0b3pre
Status: RESOLVED → VERIFIED
I have a PGO Win64 build running with -GL back in for PNG, GFX (including libpixman) where print works. JPEG is still a problem in jdsamples.
(In reply to comment #42)
> (In reply to comment #40)
> > This can be avoided by compiling without WPO in jpeg/ cairo/cairo and
> > cairo/libpixman. I think this is a bug with the GL option and inline asm. Could
> > be related to this issue:
> > 
> > http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=99487
> > 
> Well, if we are to do that, shouldn't we also make sure breakpad is not
> compiled with the GL option either as I believe it also has inline asm code.

The Microsoft support note indicates that the problem is with MMX in the inline assembly around the emms instruction. The workaround is to use the equivalent intrinsic. JPEG only uses MMX in the jidctfst code, which isn't used by Mozilla.
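
For reference, a minimal sketch of what "use the equivalent intrinsic" could look like. The intrinsic form of emms is _mm_empty() from <mmintrin.h>. The helper name and the SKETCH_HAVE_MMX macro are hypothetical, this is not code from jpeg/ or cairo, and the plain C fallback is there only so the sketch builds on targets without MMX.

```c
#include <assert.h>

/* Use MMX only where the compiler advertises it; otherwise fall back to
   plain C so the sketch still builds on other targets. */
#if defined(__MMX__) || defined(_M_IX86)
#include <mmintrin.h>
#define SKETCH_HAVE_MMX 1
#else
#define SKETCH_HAVE_MMX 0
#endif

/* Hypothetical helper: adds two ints through an MMX register. */
static int sketch_add_via_mmx(int a, int b)
{
#if SKETCH_HAVE_MMX
    __m64 va = _mm_cvtsi32_si64(a);   /* load a into the low 32 bits */
    __m64 vb = _mm_cvtsi32_si64(b);   /* load b into the low 32 bits */
    int sum = _mm_cvtsi64_si32(_mm_add_pi32(va, vb));
    _mm_empty();  /* intrinsic form of emms, instead of an inline asm block */
    return sum;
#else
    return a + b;
#endif
}

/* Small self-check showing the intended use. */
static void sketch_self_check(void)
{
    assert(sketch_add_via_mmx(2, 3) == 5);
}
```

The point of the intrinsic is that the compiler sees the MMX state change itself rather than an opaque asm block, which is presumably why the note recommends it for -GL builds.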
Crash Signature: [@ @0x0 @0x1000 _cairo_meta_surface_create] [@ _moz_cairo_surface_finish ]