Crash while trying to print [@ @0x0 @0x1000 _cairo_meta_surface_create] [@ _moz_cairo_surface_finish ]

Status: VERIFIED FIXED

Product: Core
Component: Printing: Output
Priority: P1
Severity: blocker
Reported: 10 years ago
Modified: 7 years ago

People

Reporter: Martijn Wargers (dead)
Assignee: Robert Sayre

Tracking

Version: Trunk
Hardware: x86
OS: Windows Vista
Keywords: crash, dogfood, regression, smoketest, topcrash+
Bug Flags: blocking1.9 +
Firefox Tracking Flags: (Not tracked)

Attachments: 4 attachments, 4 obsolete attachments

(Reporter)

Description

10 years ago
This crash happens for me with every page on a current trunk build on Vista.

http://crash-stats.mozilla.com/report/index/4a0b1611-be42-11dc-aa45-001a4bd43ef6

On branch I get a "Printing failed when starting the document." error instead, which is at least better than crashing.

I haven't looked at when this regressed.
Flags: blocking1.9?
(Reporter)

Comment 1

10 years ago
It also happens when printing to a file.
I don't crash with my debug build, but that build is a month old (20071204).
Let's see what happens after I have updated my debug build.
(Reporter)

Comment 2

10 years ago
I just installed the XPS Document Writer driver on my Windows XP machine, but I don't crash there.
Flags: blocking1.9?
(Reporter)

Comment 3

10 years ago
Hrm, I'm actually crashing with everything while trying to print with Firefox trunk on my Vista machine.
(Reporter)

Comment 4

10 years ago
And now it's also happening on my Windows XP machine. There is clearly something wacky going on.
Can anyone besides me reproduce this?
Flags: blocking1.9?
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9b3pre) Gecko/2008010912 Minefield/3.0b3pre ID:2008010912

Crash here too when trying to print with PDFCreator:
http://sourceforge.net/projects/pdfcreator

I'm sure this worked a few weeks ago.
Keywords: regression
(Reporter)

Comment 7

10 years ago
Thanks for that regression window, CC-ing Neil, although it seems weird to me those check-ins would have caused this.
(In reply to comment #7)
> Thanks for that regression window, CC-ing Neil, although it seems weird to me
> those check-ins would have caused this.
> 
I double-checked, using the builds before and after the regression window:
it's Bug 376578 or Bug 409242.
Blocks: 376578
FWIW, I am also crashing on Vista when trying to print anything but a plain-text page, such as a bug page here.

I have no idea what XPS DocWriter is... This is mostly just a plain Vista HP install, as I've been a bit gun-shy about tweaking and adding stuff willy-nilly.

Latest hourly:
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9b3pre) Gecko/2008010915 Minefield/3.0b3pre Firefox/3.0 ID:2008010915

Comment 10

10 years ago
I can't print at all, crashing every time, using any printer, on Windows XP.
I can confirm the regression window: Minefield 2008010805 works, but 2008010813 is broken.
Summary: Crash [@ @0x0] while trying to print with the Microsoft XPS Document Writer → Crash [@ @0x0] while trying to print
Version: unspecified → Trunk
(Reporter)

Comment 11

10 years ago
Sounds more like a blocker, then.
The XPS DocWriter stuff is more or less MS PDF. It's just a way of printing to a file instead of wasting tons of paper.
Severity: critical → blocker

Comment 12

10 years ago
Here's a crash I got which is in _cairo_meta_surface_create
bp-4e0f7ebd-bf8b-11dc-bf04-001a4bd46e84

Also, if you widen the time range of the Bonsai query:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=1199823480&maxdate=1199831099&cvsroot=%2Fcvsroot

It looks like Bug 361343 is the culprit, due to its new optimization.
Blocks: 361343
No longer blocks: 376578
Severity: blocker → critical
(Reporter)

Updated

10 years ago
Severity: critical → blocker
(Assignee)

Comment 13

10 years ago
Seems like the first order of business is running this under Purify. I can do this tomorrow, but maybe someone can get to it sooner.
Note: the fix from Bug 408538 didn't fix this one.

Updated

10 years ago
Duplicate of this bug: 411868

Comment 16

10 years ago
Printing this page doesn't crash:
http://fr.wikipedia.org/wiki/P%C3%A9trusse

Printing this one crashes: http://fr.wikipedia.org/wiki/Alzette

call stack: bp-29999611-c04e-11dc-9e52-001a4bd43e5c
Summary: Crash [@ @0x0] while trying to print → Crash while trying to print [@ @0x0 @0x1000 _cairo_meta_surface_create]

Updated

10 years ago
Duplicate of this bug: 411649
This is a critical regression, a blocker for smoketests, a problem for dogfooding on Windows Vista, and I consider it a Zero Tolerance New Crash.

This bug should be blocking and fixed ASAP or the patches in the regression window need to be backed out.
Keywords: topcrash → dogfood, smoketest, topcrash+, zt4newcrash
No crash with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9b3pre) Gecko/2008011104 Minefield/3.0b3pre ID:2008011104

Seems to be a Windows only bug.
Duplicate of this bug: 412178
Summary: Crash while trying to print [@ @0x0 @0x1000 _cairo_meta_surface_create] → Crash while trying to print [@ @0x0 @0x1000 _cairo_meta_surface_create] [@ _moz_cairo_surface_finish ]

Comment 22

10 years ago
Would be nice to get this fixed for B3, given the reproducibility and the number of dups.
Flags: blocking1.9? → blocking1.9+
Priority: -- → P1
(Assignee)

Comment 23

10 years ago
Created attachment 296877 [details]
gavin's purify session
(Assignee)

Comment 24

10 years ago
While linking, I get this:

c:\wpo_firefox\mozilla\jpeg\jidctint.c(918) : 
warning C4731: 'ippiDCTQuantInv8x8LS_JPEG_16s8u_C1R' : frame pointer register 'ebx' modified by inline assembly code

(Assignee)

Comment 25

10 years ago
Created attachment 296878 [details]
wikipedia image 1
(Assignee)

Comment 26

10 years ago
Created attachment 296879 [details]
wikipedia image 2
(Assignee)

Comment 27

10 years ago
Created attachment 296880 [details]
test page
(Assignee)

Updated

10 years ago
Attachment #296878 - Attachment is obsolete: true
(Assignee)

Updated

10 years ago
Attachment #296879 - Attachment is obsolete: true
(Assignee)

Comment 28

10 years ago
Created attachment 296882 [details]
test image 1
(Assignee)

Updated

10 years ago
Attachment #296880 - Attachment is obsolete: true
(Assignee)

Comment 29

10 years ago
Comment on attachment 296882 [details]
test image 1

Trying to print this image will crash you.
(Assignee)

Comment 30

10 years ago
Ruled out bug 406580.
(Assignee)

Comment 31

10 years ago
Created attachment 296908 [details]
stack of crash from VS debugger
Attachment #296877 - Attachment is patch: false
Based on comments #12 and #24 and the fact that my builds do not crash, I would say this definitely smells like bug 361343 to me. I am going to do a new build with the code from that bug added to my .mozconfig and report back.
Bug 361343 is definitely the culprit.  Adding those 3 lines to my .mozconfig results in a build which crashes when trying to print that jpeg image.
This crash is related to cairo and jpeg. Both of these are written in C, not C++. So I tried removing just this line:

export CFLAGS="-GL"

and leaving these 2:

export CXXFLAGS="-GL"
export LD_FLAGS="-LTCG"
 
That seems to be sufficient to avoid the crash.
That's just masking the problem though.  There's an actual bug here, exposed by the greater optimization.
Well, from the stack:

>	xul.dll!_moz_cairo_surface_finish(_cairo_surface * surface=0x00000000)  Line 517 + 0x8 bytes	C
 	xul.dll!_cairo_win32_surface_release_source_image(void * abstract_surface=0x03626ae8, _cairo_image_surface * image=0x03782b70, void * image_extra=0x0012ed80)  Line 592 + 0x18 bytes	C

_cairo_win32_surface_release_source_image calls cairo_surface_destroy() on image_extra if it's not NULL.

cairo_surface_destroy decrements the reference count, and if it's 0, it calls cairo_surface_finish on the passed-in surface.

We're ending up in cairo_surface_finish with a surface of NULL, when it wasn't NULL in _cairo_win32_surface_release_source_image.  cairo_surface_destroy also does a NULL check on surface right at the start and returns if it's NULL, and that's not being triggered.  (This is all assuming that we can trust the argument values in that function call, though.)

So something's not right here, especially since cairo_surface_finish /also/ does a null check of surface right at the start.  Line 517 is:

    status = surface->backend->finish (surface);

which indeed would blow up if surface or backend was NULL, but there were many references to both surface and backend in that function previously and they're not blowing up.
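The reference-counting path described above can be sketched as follows. This is a minimal illustration under stated assumptions, not cairo's source: `surface_t`, `backend_t`, `surface_finish`, `surface_destroy`, and `release_source_image` are hypothetical, simplified stand-ins for `cairo_surface_t`, `cairo_surface_finish`, `cairo_surface_destroy`, and `_cairo_win32_surface_release_source_image`, modeled only on the behavior described in this comment. It shows that in correctly compiled code a NULL surface should never reach the `surface->backend->finish (surface)` call, because both functions return early on NULL.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for cairo's surface types;
 * the real cairo_surface_t carries many more fields. */
typedef struct surface surface_t;

typedef struct {
    /* Backend-specific teardown, called once when the surface finishes. */
    int (*finish)(surface_t *surface);
} backend_t;

struct surface {
    const backend_t *backend;
    int ref_count;
    int finished;
};

/* Like cairo_surface_finish as described: NULL check first, then the
 * surface->backend->finish (surface) call that crashed at line 517. */
static int surface_finish(surface_t *surface)
{
    int status = 0;

    if (surface == NULL || surface->finished)
        return 0;
    if (surface->backend->finish != NULL)
        status = surface->backend->finish(surface);
    surface->finished = 1;
    return status;
}

/* Like cairo_surface_destroy as described: ignore NULL, drop one
 * reference, and finish (then free) only when the count reaches 0. */
static void surface_destroy(surface_t *surface)
{
    if (surface == NULL)
        return;
    if (--surface->ref_count > 0)
        return;
    surface_finish(surface);
    free(surface);
}

/* Like _cairo_win32_surface_release_source_image as described: destroy
 * image_extra only if it is not NULL. */
static void release_source_image(void *image_extra)
{
    if (image_extra != NULL)
        surface_destroy((surface_t *) image_extra);
}
```

Because every path to the dereference is guarded, a NULL `surface` arriving there (as in the debugger stack) is hard to explain from the source logic alone, which is consistent with the miscompilation suspicion raised in the replies.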
(In reply to comment #36)
> So something's not right here, especially since cairo_surface_finish /also/
> does a null check of surface right at the start.  Line 517 is:
> 
>     status = surface->backend->finish (surface);
> 
> which indeed would blow up if surface or backend was NULL, but there were many
> references to both surface and backend in that function previously and they're
> not blowing up.
> 

The code in that area does not look that complicated.  This smells like a compiler bug to me.
(Assignee)

Comment 38

10 years ago
(In reply to comment #37)
>
> The code in that area does not look that complicated.  This smells like a
> compiler bug to me.

I might agree, if not for attachment 296877 [details]. If we don't solve this in a few days, we'll give cairo a -GL- option to turn it off for that module.

This shouldn't be too hard to figure out by stepping through the assembly, assuming you can reliably catch _cairo_win32_surface_release_source_image before things blow up. I wish we had better tools for tracing data flow at the instruction/register level while debugging.
(Assignee)

Comment 40

10 years ago
This can be avoided by compiling without WPO in jpeg/, cairo/cairo, and cairo/libpixman. I think this is a bug with the -GL option and inline asm. It could be related to this issue:

http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=99487

(Assignee)

Updated

10 years ago
Assignee: nobody → sayrer
That claims that the issue was fixed in SP1 (VS2005 SP1, I assume).  Are the tinderboxes running SP1?
(In reply to comment #40)
> This can be avoided by compiling without WPO in jpeg/ cairo/cairo and
> cairo/libpixman. I think this is a bug with the GL option and inline asm. Could
> be related to this issue:
> 
> http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=99487
> 
Well, if we are to do that, shouldn't we also make sure breakpad is not compiled with the GL option either as I believe it also has inline asm code.
(In reply to comment #42)
> Well, if we are to do that, shouldn't we also make sure breakpad is not
> compiled with the GL option either as I believe it also has inline asm code.

Uh, no.  Only the Mac and Linux client code has inline assembly.
We have inline assembly in other chunks of the code though; e.g. xpconnect.  The bug described there doesn't exist in SP1, which we're building with.  There may still be problems with -GL when inline assembly is present though, but I don't know how much faith I have in an optimization flag that breaks down in that way.
(Assignee)

Comment 45

10 years ago
(In reply to comment #44)
> We have inline assembly in other chunks of the code though; e.g. xpconnect. 
> The bug described there doesn't exist in SP1, which we're building with.

Did anyone verify that? IIRC, the message in there was a little confusing.

(Assignee)

Comment 46

10 years ago
OK, looking at the release date of SP1 and the comments there, it seems obvious that the fix is in.
Yep, I verified by taking the code that reproduces the problem from that bug report and testing that it works with SP1.
Duplicate of this bug: 412791
(In reply to comment #38)
> If we don't solve this in a few days, we'll give cairo a -GL- option
> to turn it off for that module.

I'm going to request that, given it's been a few days now, we go with this option.

Is there any good reason why the changes in bug 361343 weren't backed out while work was being done to figure out this bug? Given the number of people who are seeing this (the _moz_cairo_surface_finish signature alone is topcrash #6), it seems logical to have backed that change out until a fix was found.
Yes please, I just crashed there again.  This makes trunk nightlies hard to dogfood.  Let's either back out the original while you fix it, or disable -GL in those modules.  sayrer: can we land *something* today?
(Assignee)

Comment 51

10 years ago
(In reply to comment #49)
> 
> Is there any good reason why the changes in bug 361343 weren't backed out while
> work was being done to figure out this bug? 

Yes, we want to find bugs in the product related to bug 361343. There could have been more.
Do we have the capability to override this on a per module basis without causing build bustage for people building with other compilers?

That is kind of why I suggested in comment #34 using -GL only for the C++ code, and not for the C code.
(Assignee)

Updated

10 years ago
Depends on: 413019
(Assignee)

Comment 53

10 years ago
Created attachment 297874 [details] [diff] [review]
turn off WPO in areas discussed at firefox meeting
Attachment #297874 - Flags: review?(ted.mielczarek)
(Assignee)

Comment 54

10 years ago
Created attachment 297877 [details] [diff] [review]
turn off WPO in areas discussed at firefox meeting v2
Attachment #297874 - Attachment is obsolete: true
Attachment #297874 - Flags: review?(ted.mielczarek)
(Assignee)

Updated

10 years ago
Attachment #297877 - Flags: review?(ted.mielczarek)
Attachment #297877 - Flags: review?(ted.mielczarek) → review+
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b3pre) Gecko/2008011900 Minefield/3.0b3pre ID:2008011900

remaining ?
for printpreview - the printpreview popup stays open with message "preparing", this only seems to happen for a first printpreview, the next time it works.

this doesn't happen all the time, sometimes it closes after a while
(In reply to comment #55)
> Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b3pre) Gecko/2008011900
> Minefield/3.0b3pre ID:2008011900
> 
> remaining ?
> for printpreview - the printpreview popup stays open with message "preparing",
> this only seems to happen for a first printpreview, the next time it works.
> 
> this doesn't happen all the time, sometimes it closes after a while
> 
This happens every time for me.

I have verified that this does not happen with my own builds which are not compiled with -GL.
Interestingly, I cannot reproduce this issue with my own build compiled with the WPO stuff enabled.
(In reply to comment #57)
> Interestingly, I cannot reproduce this issue with my own build compiled with
> the WPO stuff enabled.
> 

The major differences in my build are that I compile with VC8 Express and use Cygwin rather than MozillaBuild.
(Assignee)

Updated

10 years ago
Depends on: 413169
(Assignee)

Updated

10 years ago
Depends on: 413170
(Assignee)

Comment 59

10 years ago
Valgrind finds problems when printing attachment 296882 [details]. Purify hasn't been much help; going to try some alternative tools on Windows.
(In reply to comment #55)
> remaining ?
> for printpreview - the printpreview popup stays open with message "preparing"

See bug 413200 comment 11
(In reply to comment #60)
> (In reply to comment #55)
> > remaining ?
> > for printpreview - the printpreview popup stays open with message "preparing"
> 
> See bug 413200 comment 11
> 
That would explain why I could not reproduce the issues in my builds. I include the patch for bug 412598, which fixes the problems introduced by bug 352791.
(Assignee)

Comment 62

10 years ago
The patches in bug 413169 and bug 413170 might fix this crash, but I still have my tinfoil hat on. Marking this fixed, someone can take bug 413019 if they really want WPO turned back on for gfx. There are FIXME comments in all the right places.
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
(Reporter)

Comment 63

10 years ago
Yes, thanks, I can print again with current trunk build.
Verified fixed, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b3pre) Gecko/2008012104 Minefield/3.0b3pre
Status: RESOLVED → VERIFIED

Comment 64

10 years ago
I have a PGO Win64 build running with -GL re-enabled for PNG and GFX (including libpixman), and printing works. JPEG is still a problem in jdsamples.

Comment 65

10 years ago
(In reply to comment #42)
> (In reply to comment #40)
> > This can be avoided by compiling without WPO in jpeg/ cairo/cairo and
> > cairo/libpixman. I think this is a bug with the GL option and inline asm. Could
> > be related to this issue:
> > 
> > http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=99487
> > 
> Well, if we are to do that, shouldn't we also make sure breakpad is not
> compiled with the GL option either as I believe it also has inline asm code.

The Microsoft support note indicates that the problem is with MMX in inline assembly around the EMMS instruction. The workaround is to use the equivalent intrinsic. JPEG only uses MMX in the jidctfst code, which isn't used by Mozilla.
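A minimal sketch of the intrinsic-based workaround, assuming an x86 compiler that ships `<mmintrin.h>`: `_mm_empty()` is the intrinsic form of EMMS, which must run after MMX code because the MMX registers alias the x87 floating-point stack. Using the intrinsic instead of a hand-written `__asm emms` leaves the state transition to the compiler. The helper `add4_i16` is hypothetical and is not taken from libjpeg; on targets without MMX intrinsics (for example MSVC x64) it falls back to plain C.

```c
#include <stdint.h>
#include <string.h>

#if defined(__MMX__) || defined(_M_IX86)
#include <mmintrin.h>
#define HAVE_MMX_INTRINSICS 1
#endif

/* Hypothetical helper (not from libjpeg): add four 16-bit lanes,
 * assuming the sums fit in int16_t. */
static void add4_i16(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
#ifdef HAVE_MMX_INTRINSICS
    /* _mm_set_pi16 takes the highest lane first, so a[0] is lane 0. */
    __m64 va = _mm_set_pi16(a[3], a[2], a[1], a[0]);
    __m64 vb = _mm_set_pi16(b[3], b[2], b[1], b[0]);
    __m64 vc = _mm_add_pi16(va, vb);
    memcpy(out, &vc, sizeof vc);   /* lane 0 lands in out[0] on x86 */
    /* Intrinsic instead of inline "__asm emms": restores the x87 state
     * before any floating-point code runs. */
    _mm_empty();
#else
    /* Portable fallback when MMX intrinsics are unavailable. */
    for (int i = 0; i < 4; i++)
        out[i] = (int16_t) (a[i] + b[i]);
#endif
}
```

After `add4_i16` returns, ordinary floating-point code is safe to run again on the MMX path, because `_mm_empty()` has already cleared the MMX state.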
Crash Signature: [@ @0x0 @0x1000 _cairo_meta_surface_create] [@ _moz_cairo_surface_finish ]