Closed Bug 1019063 Opened 9 years ago Closed 8 years ago

crash in mozilla::gfx::UserData::Add(mozilla::gfx::UserDataKey*, void*, void (*)(void*))

Categories

(Core :: Graphics, defect)

32 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox36 --- wontfix
firefox37 + wontfix
firefox38 + wontfix
firefox39 + wontfix
firefox40 + fixed
firefox41 + fixed
thunderbird_esr38 44+ fixed

People

(Reporter: nhirata, Assigned: milan)

References

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(2 files, 4 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-8a8cc5d8-659c-4568-bd6b-1291f2140531.
=============================================================
Crashing Thread
Frame 	Module 	Signature 	Source
0 	libxul.so 	mozilla::gfx::UserData::Add(mozilla::gfx::UserDataKey*, void*, void (*)(void*)) 	/home/geeksphone/FOS/peak/objdir-gecko/layout/base/../../dist/include/mozilla/gfx/UserData.h:30
1 	libxul.so 	mozilla::FrameLayerBuilder::ClippedDisplayItem::~ClippedDisplayItem 	/home/geeksphone/FOS/peak/objdir-gecko/layout/base/../../dist/include/Layers.h:481
2 	libxul.so 	nsTArray_Impl<mozilla::FrameLayerBuilder::ClippedDisplayItem, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned int, unsigned int) 	/home/geeksphone/FOS/peak/objdir-gecko/layout/base/../../dist/include/nsTArray.h:536
3 	libxul.so 	nsTHashtable<mozilla::FrameLayerBuilder::ThebesLayerItemsEntry>::s_ClearEntry(PLDHashTable*, PLDHashEntryHdr*) 	/home/geeksphone/FOS/peak/objdir-gecko/layout/base/../../dist/include/nsTArray.h:1313
4 	libxul.so 	PL_DHashTableFinish(PLDHashTable*) 	/home/geeksphone/FOS/peak/gecko/xpcom/glue/pldhash.cpp:320
5 	libxul.so 	mozilla::FrameLayerBuilder::~FrameLayerBuilder 	/home/geeksphone/FOS/peak/objdir-gecko/layout/base/../../dist/include/nsTHashtable.h:399
6 	libxul.so 	mozilla::FrameLayerBuilder::~FrameLayerBuilder 	/home/geeksphone/FOS/peak/gecko/layout/base/FrameLayerBuilder.h:166
7 	libxul.so 	mozilla::layers::LayerManagerUserDataDestroy 	/home/geeksphone/FOS/peak/objdir-gecko/layout/base/../../dist/include/Layers.h:142
8 	libxul.so 	mozilla::gfx::UserData::Add(mozilla::gfx::UserDataKey*, void*, void (*)(void*)) 	/home/geeksphone/FOS/peak/objdir-gecko/layout/base/../../dist/include/mozilla/gfx/UserData.h:33
9 	libxul.so 	nsDisplayList::PaintForFrame(nsDisplayListBuilder*, nsRenderingContext*, nsIFrame*, unsigned int) const 	/home/geeksphone/FOS/peak/objdir-gecko/layout/base/../../dist/include/Layers.h:481
10 	libxul.so 	nsDisplayList::PaintRoot(nsDisplayListBuilder*, nsRenderingContext*, unsigned int) const 	/home/geeksphone/FOS/peak/gecko/layout/base/nsDisplayList.cpp:1209
11 	libxul.so 	nsLayoutUtils::PaintFrame(nsRenderingContext*, nsIFrame*, nsRegion const&, unsigned int, unsigned int) 	/home/geeksphone/FOS/peak/gecko/layout/base/nsLayoutUtils.cpp:2817
12 	libxul.so 	PresShell::Paint(nsView*, nsRegion const&, unsigned int) 	/home/geeksphone/FOS/peak/gecko/layout/base/nsPresShell.cpp:5917
13 	libxul.so 	nsViewManager::ProcessPendingUpdatesPaint(nsIWidget*) 	/home/geeksphone/FOS/peak/gecko/view/src/nsViewManager.cpp:443
14 	libxul.so 	nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) 	/home/geeksphone/FOS/peak/gecko/view/src/nsViewManager.cpp:384
15 	libxul.so 	nsViewManager::ProcessPendingUpdates() 	/home/geeksphone/FOS/peak/gecko/view/src/nsViewManager.cpp:1074
16 	libxul.so 	nsRefreshDriver::Tick(long long, mozilla::TimeStamp) 	/home/geeksphone/FOS/peak/gecko/layout/base/nsRefreshDriver.cpp:1214
17 	libxul.so 	mozilla::RefreshDriverTimer::TimerTick(nsITimer*, void*) 	/home/geeksphone/FOS/peak/gecko/layout/base/nsRefreshDriver.cpp:168
18 	libxul.so 	nsTimerImpl::Fire() 	/home/geeksphone/FOS/peak/gecko/xpcom/threads/nsTimerImpl.cpp:555
19 	libxul.so 	nsTimerEvent::Run() 	/home/geeksphone/FOS/peak/gecko/xpcom/threads/nsTimerImpl.cpp:639
20 	libxul.so 	nsThread::ProcessNextEvent(bool, bool*) 	/home/geeksphone/FOS/peak/gecko/xpcom/threads/nsThread.cpp:715
21 	libxul.so 	NS_ProcessNextEvent(nsIThread*, bool) 	/home/geeksphone/FOS/peak/gecko/xpcom/glue/nsThreadUtils.cpp:263
22 	libxul.so 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	/home/geeksphone/FOS/peak/gecko/ipc/glue/MessagePump.cpp:136
23 	libxul.so 	MessageLoop::RunInternal() 	/home/geeksphone/FOS/peak/gecko/ipc/chromium/src/base/message_loop.cc:229
24 	libxul.so 	MessageLoop::Run() 	/home/geeksphone/FOS/peak/gecko/ipc/chromium/src/base/message_loop.cc:222
25 	libxul.so 	nsBaseAppShell::Run() 	/home/geeksphone/FOS/peak/gecko/widget/xpwidgets/nsBaseAppShell.cpp:164
26 	libxul.so 	nsAppStartup::Run() 	/home/geeksphone/FOS/peak/gecko/toolkit/components/startup/nsAppStartup.cpp:278
27 	libxul.so 	XREMain::XRE_mainRun() 	/home/geeksphone/FOS/peak/gecko/toolkit/xre/nsAppRunner.cpp:4023
28 	libxul.so 	XREMain::XRE_main(int, char**, nsXREAppData const*) 	/home/geeksphone/FOS/peak/gecko/toolkit/xre/nsAppRunner.cpp:4092
29 	libxul.so 	XRE_main 	/home/geeksphone/FOS/peak/gecko/toolkit/xre/nsAppRunner.cpp:4304
30 	b2g 	main 	/home/geeksphone/FOS/peak/gecko/b2g/app/nsBrowserApp.cpp:163
31 	libc.so 	__libc_init 	/home/geeksphone/FOS/keon_nightly/bionic/libc/bionic/libc_init_dynamic.c:114
32 	b2g 	NS_UTF16ToCString 	/home/geeksphone/FOS/peak/gecko/xpcom/glue/standalone/nsXPCOMGlue.cpp:764 

More Crashes: 
https://crash-stats.mozilla.com/report/list?product=B2G&signature=mozilla%3A%3Agfx%3A%3AUserData%3A%3AAdd%28mozilla%3A%3Agfx%3A%3AUserDataKey*%2C+void*%2C+void+%28*%29%28void*%29%29

Looks like peak crashes
Blocks: 1123550
Currently at #43 in the 36.0b Top Crash list (it's likely going to climb though since
many slots at the moment are about bug 1133190 which has been fixed).

Here's a better crash report with links to source code:
bp-595b8009-ddbe-41b2-a5e4-f55f42150217

#0 mozilla::gfx::UserData::Add(mozilla::gfx::UserDataKey*, void*, void (*)(void*))
#1 nsTHashtable<mozilla::FrameLayerBuilder::PaintedLayerItemsEntry>::s_ClearEntry(PLDHashTable*, PLDHashEntryHdr*)
#2 mozilla::FrameLayerBuilder::`scalar deleting destructor'(unsigned int)
#3 mozilla::layers::LayerManagerUserDataDestroy
#4 mozilla::gfx::UserData::Add(mozilla::gfx::UserDataKey*, void*, void (*)(void*))
#5 nsDisplayList::PaintRoot(nsDisplayListBuilder*, nsRenderingContext*, unsigned int)
...

I guess #1 is the destruction of FrameLayerBuilder::mPaintedLayerItems (with
type nsTHashtable<PaintedLayerItemsEntry>), I don't see how we get from
~PaintedLayerItemsEntry to frame #0 though:
http://hg.mozilla.org/mozilla-central/annotate/5f1009731a97/layout/base/FrameLayerBuilder.h#l593
(perhaps the symbol in frame #0 is bogus?)

Could it be a double-delete of the FrameLayerBuilder in #2?
Keywords: topcrash
OS: Android → All
There's also a print related stack leading up to the same signature in:
bp-04c2ae9a-113e-479f-bf3a-8b39b2150216

mozilla::gfx::UserData::Add(mozilla::gfx::UserDataKey*, void*, void (*)(void*))
nsDeviceContext::CreateRenderingContext()
PresShell::CreateReferenceRenderingContext()
PresShell::DoReflow(nsIFrame*, bool)
PresShell::ProcessReflowCommands(bool)
PresShell::FlushPendingNotifications(mozilla::ChangesToFlush)
PresShell::FlushPendingNotifications(mozFlushType)
nsPrintEngine::ReflowPrintObject(nsPrintObject*)
nsPrintEngine::ReflowDocList(nsPrintObject*, bool)
nsPrintEngine::InitPrintDocConstruction(bool)
nsPrintEngine::DoCommonPrint(bool, nsIPrintSettings*, nsIWebProgressListener*, nsIDOMDocument*)
nsPrintEngine::CommonPrint(bool, nsIPrintSettings*, nsIWebProgressListener*, nsIDOMDocument*)
nsPrintEngine::Print(nsIPrintSettings*, nsIWebProgressListener*)
...
I managed to reproduce this issue on Firefox 39.0a1 (2015-03-02) using Windows 7 x64.

STR
1.Launch Firefox with clean profile.
2.Open a HTML page (e.g. http://mozqa.com/data/firefox/layout/mozilla.html )
3.Go to File -> Print.
4.Print the page.

AR
The tab crashed. Crash ID: 4ba319ca-e1ee-462d-9178-a1dcc2150303

ER
The page is printed successfully.

Additional notes:
- This issue is e10s related.
- This issue is not reproducible on Mac 10.8.5.
- “e10s printing is not implemented yet. Bug 927188” message is displayed on Ubuntu 13.10 x64.
OS: All → Windows 7
Version: 32 Branch → 39 Branch
OS: Windows 7 → All
[Tracking Requested - why for this release]: topcrash

It's currently at #19 in 36.0.1 Top Crashers list and rising.

I tried the STR in comment 3 in Nightly (39) on Linux64 and it worked fine for me,
both with e10s enabled and disabled.  Print Preview and Print to file.

It might be more reproducible on Windows though.
Top crash, tracking.
Removing [b2g-crash] as it impacts also fx.
(In reply to Vasilica Mihasca, QA [:vasilica_mihasca] from comment #3)

Vasilica can you get a regression window?
Flags: needinfo?(vasilica.mihasca)
Here is the regression window:

Last good revision: 88037f94b7d7 (2014-12-31)
First bad revision: 3c296aa11c51 (2015-01-01)
Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=88037f94b7d7&tocha
nge=3c296aa11c51
Flags: needinfo?(vasilica.mihasca)
Can you get the inbound regression window?
Flags: needinfo?(vasilica.mihasca)
We don't yet have enough information to go on here. We are not likely to have a fix for 37 but I'm going to leave this on the tracked list in case we can take a fix in a point release.
This is the inbound regression window, I hope it is ok:

Last good revision: b17e7747d3fb
First bad revision: fa8a9954fdb5
Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=b17e7747d3fb&tochange=fa8a9954fdb5
Flags: needinfo?(vasilica.mihasca)
That doesn't seem like the right regression window. Can you try again?
Flags: needinfo?(vasilica.mihasca)
I have tried to push further but I'm pretty sure this is not what we are looking for. What do you think?

(m-c)
Last good revision: df3fc7cb7e80 (2014-11-30)
First bad revision: af5fc587f98b (2014-12-01)
Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=df3fc7cb7e80&tochange=af5fc587f98b

(m-i)
Last good revision: c25bf9e8b5a5
First bad revision: f76fa8396ca8
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c25bf9e8b5a5&tochange=f76fa8396ca8
Flags: needinfo?(vasilica.mihasca)
Yeah, those don't look suspicious. What's going wrong during the regression window search? Can you not reproduce it reliably? Are you using mozregression?
Flags: needinfo?(vasilica.mihasca)
I confirm that I used mozregression. I repeated this 2 times, but with the same result. I do not realize why this does not return a satisfactory pushlog.
Flags: needinfo?(vasilica.mihasca)
Let's assume that the regression range in comment 12 is correct, despite our reservations; Bob, can you see how this could break printing?  Can you reproduce the problem?

Vasilica, what kind of a printer are you printing to?  Can you reproduce with "print to XPS" that exists on Windows even without any printers installed?
Flags: needinfo?(bobowen.code)
(In reply to Milan Sreckovic [:milan] from comment #15)
> Let's assume that the regression range in comment 12 is correct, despite our
> reservations; Bob, can you see how this could break printing?  Can you
> reproduce the problem?
> 
> Vasilica, what kind of a printer are you printing to?  Can you reproduce
> with "print to XPS" that exists on Windows even without any printers
> installed?

Well I suppose it's possible, but I don't see how this could be the original problem as it landed after the bug was reported.
I also don't see how it would cause a problem in that particular bit of code, but maybe it's in the lead up to it.

Anyway, if it can be reproduced, you could then try running with an environment variable of MOZ_DISABLE_CONTENT_SANDBOX=1.
This disables the sandbox and would prove it.
Flags: needinfo?(bobowen.code)
Milan what's the next step here? What might help here either with QA or in helping find someone to work on it?
Flags: needinfo?(milan)
Trying to reproduce it locally - comment 15 has a question for Vasilica, so that we can try and reproduce the scenario that gets us the crash.
Flags: needinfo?(milan)
(In reply to Milan Sreckovic [:milan] from comment #15)

> Vasilica, what kind of a printer are you printing to? 

I am using a Kyocera ECOSYS P2135dn printer.

> Can you reproduce with "print to XPS" that exists on Windows even without any printers installed?

This crash is also reproducible with "print to XPS" on Firefox 40.0a1 (2015-04-14) using Windows 7 64-bit.
Still can't reproduce locally, on Windows 7 64-bit OS, using either 32-bit or 64-bit Firefox.

Vasilica, what's your about:support for graphics?
Flags: needinfo?(vasilica.mihasca)
These are the Graphics details:

Adapter Description	AMD 760G
Adapter Drivers	atiu9p64 aticfx64 atiu9pag aticfx32 atiumd64 atidxx64 atiumdag atidxx32 atiumdva atiumd6a atitmm64
Adapter RAM	512
Asynchronous Pan/Zoom	none
Device ID	0x9616
Direct2D Enabled	true
DirectWrite Enabled	true (6.2.9200.16492)
Driver Date	4-6-2010
Driver Version	8.723.0.0
GPU #2 Active	false
GPU Accelerated Windows	1/1 Direct3D 11 (OMTC)
Subsys ID	00000000
Vendor ID	0x1002
WebGL Renderer	Google Inc. -- ANGLE (AMD 760G Direct3D11 vs_4_0 ps_4_0)
windowLayerManagerRemote	true
AzureCanvasBackend	direct2d 1.1
AzureContentBackend	direct2d 1.1
AzureFallbackCanvasBackend	cairo
AzureSkiaAccelerated	0
Flags: needinfo?(vasilica.mihasca)
Windows 7 64bit, administrator user, nightly 32 bit.
Experiencing crash.
set MOZ_DISABLE_CONTENT_SANDBOX=1 printing works
Also print preview is blank. i.e. no paper just chrome and dark background.

If you can't reproduce with administrator try; I have UAC bottom off setting, tried with it top on and still same. (Just restarted nightly. I didn't log out / reboot so maybe UAC didn't switch when I tested it.)

Standard user and/or nightly 64bit work without problem.
Thanks Vasilica and Jonathan, your particular scenario now makes sense.

From looking at the crash stacks, I think there are at least three different problems here all leading to the same crash signature.

* Printing with e10s enabled (and therefore the default USER_NON_ADMIN sandbox) and UAC turned off.
* Other Printing issues (as there seem to be printing crashes outside of Nightly, so not e10s related).
* One or more other problems (I'm certainly no expert in this area, but there seem to be crashes not related to printing).

(I've also realised during this investigation that the thumbnail content process in Release will be getting sandboxed, but I don't think that could be causing the other printing issues as they happen on the main process. So, I'll fix that in a separate unrelated bug.)

The case where we try to print with e10s (and therefore the default USER_NON_ADMIN sandbox) and UAC turned off, is the one that Vasilica and Jonathan are hitting.
This is because, when I first landed that sandbox, I didn't fully understand the problems that having different initial and delayed integrity levels can cause with GDI calls.

What is happening here is that the first GDI call must happen at the initial integrity level (high with UAC turned off on a local-admin account), this sets the UI privilege level to high.
Subsequent calls, once the process integrity level has been dropped to medium then fail, as it is trying to interact with a higher integrity level.

Specifically here the call at [1] fails and this means the surface gets created with a null device context.
This then means we get a null DrawTarget at [2], which leads to our crash.

I'll file a blocking bug to fix the sandbox for this case.

I wonder if the other printing failures could be caused by [1] failing for other reasons, I think a null check and return of NS_ERROR_GFX_PRINTER_NAME_NOT_FOUND would cause an error dialog with "The selected printer could not be found.", which would seem appropriate.

Adding a null check to [2] might help to stop a crash in some of the non-printing cases, although some of them seem to be caused by some strange destructor re-entry loop caused by [3] in some circumstances.

Jeff - what do you think about the above two suggestions?

[1] https://hg.mozilla.org/mozilla-central/file/37d60e3b8be6/widget/windows/nsDeviceContextSpecWin.cpp#l306
[2] https://hg.mozilla.org/mozilla-central/file/37d60e3b8be6/gfx/src/nsDeviceContext.cpp#l409
[3] https://hg.mozilla.org/mozilla-central/file/37d60e3b8be6/gfx/2d/UserData.h#l33
Flags: needinfo?(jmuizelaar)
Depends on: 1158773
This is too late for 38 but we can take a patch for 39.
I hit that crash when printing (via MS XPS Document Writer) a PDF with FF40 and e10s enabled.
It crashes the Print Host Driver for 32-bit applications (splwow64.exe) then if I print again, I get the crash with the current signature.
https://crash-stats.mozilla.com/report/index/392955d0-c0f4-4d55-8016-cd1eb2150510
Some of the crashes are because we try to make 0x0 surface and fail.  Any fix in nsDeviceContext::CreateRenderingContext may need an answer about bug 996351 in order to not crash in the caller.
Loic, you don't have multiple monitors by any chance?  Or a laptop in a docking station?
Flags: needinfo?(jmuizelaar) → needinfo?(epinal99-bugzilla2)
Nope, just a single notebook with Win 7.
Flags: needinfo?(epinal99-bugzilla2)
Loic, this is a big favour to ask, but it would help us a lot if you can do it when you have a chance.

1. Download the build from here: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/msreckovic@mozilla.com-9fbbe0881f7c/try-win32/

(This build is from this test build, https://treeherder.mozilla.org/#/jobs?repo=try&revision=9fbbe0881f7c&filter-searchStr=Windows%20XP%20opt%20Build%20%28B%29, based on the code change https://hg.mozilla.org/try/rev/eb18ac02f26f if you want to take a look.)

2. Do the custom install (so that it doesn't change your installed Firefox.)

Then:
3. create a new integer preference in about:config named gfx.logging.crash.length, set it to 50.  (This just lets us see more than the default 6 error messages in step 5 below.)

4. Restart the browser, and in a new session, run the test above that reproduces the crash.

5a. If it does crash, attach the crash report here OR...
5b. If it does not crash, attach the content of about:support graphics section, which should be longer than default and have a number of error messages in it.
Flags: needinfo?(epinal99-bugzilla2)
(In reply to Milan Sreckovic [:milan] from comment #30)
> Loic, this is a big favour to ask, but it would help us a lot if you can do
> it when you have a chance.

Done with this standalone version
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/msreckovic@mozilla.com-9fbbe0881f7c/try-win32/firefox-41.0a1.en-US.win32.zip and a new profile.
It crashed.
CR:
https://crash-stats.mozilla.com/report/index/0fffa34f-14be-43ad-8786-857b52150528
https://crash-stats.mozilla.com/report/index/2f3ab37f-2f83-4055-bb9a-837db2150528
Flags: needinfo?(epinal99-bugzilla2)
Awesome, thanks.

Some notes since I don't know this code well:

We leave nsDeviceContext::CalcPrintingSize with size 0x0. We set it "if (inPoints)" branch, which makes sense for PDF printing, in which case gfxPDFSurface is size 0x0; I guess it is also possible we came in with an unrecognized type for mPrintingSurface.

nsDeviceContext::InitForPrinting, which first gets the printing surface from nsDeviceContextSpecWin::GetSurfaceForPrinter, then calls Init, then calls CalcPrintingSize.

GetSurfaceForPrinter, in the case of PDF, uses nsPrintSettings::GetEffectivePageSize to get the size.  If that returns zero, we get in the trouble above, but that can happen only if something explicitly sets those values to zero using setPaperWidth/setPaperHeight, which doesn't really make sense.

Do we have the wrong surface type for the mPrintingSurface?  Coming from nsDeviceContextSpecWin::GetSurfaceForPrinter (as it is not null in the above report), it is either gfxPDFSurface (in which case we are in the PDF scenario above), or gfxWindowsSurface.  Where we should end with gfxSurfaceType::Win32Printing type, which apparently we're not.

I'll do another custom build with more info, to see which scenario we're hitting.
Loic, if you wouldn't mind repeating steps from comment 30, except with this build: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/msreckovic@mozilla.com-03ec4936420e/try-win32/ that would be most awesome.
If it doesn't crash, just attach what graphics section of about:support looks like.  ALso, looking at your second crash in comment 31, it looks like it may be worth increasing gfx.logging.crash.length to a 100 instead.
Flags: needinfo?(epinal99-bugzilla2)
Again, thanks a lot.


(In reply to Milan Sreckovic [:milan] from comment #32)
>
> Do we have the wrong surface type for the mPrintingSurface?  Coming from
> nsDeviceContextSpecWin::GetSurfaceForPrinter (as it is not null in the above
> report), it is either gfxPDFSurface (in which case we are in the PDF
> scenario above), or gfxWindowsSurface.  Where we should end with
> gfxSurfaceType::Win32Printing type, which apparently we're not.

This seems to be the case.  I'll dig a bit more through the logic.
With some repeats from previous comments to have it all in one place.

It starts with nsPrintEngine::DoCommonPrint.
That eventually creates a new nsDeviceContext and calls nsDeviceContext::InitForPrinting.
This calls nsDeviceContextSpecWin::GetSurfaceForPrinter.
This creates a new gfxWindowsSurface, using the HDC + flags constructor.
That calls cairo_win32_printing_surface_create with the same HDC, which fails to create a valid surface.
The invalid surface is passed to gfxASurface::Init, which then sets mValidSurface to false, after which GetType() returns -1.
We end up going to the default case inside of nsDeviceContext::CalcPrintingSize and are left with size 0.  
That eventually has us failing inside of nsDeviceContext::CreateRenderingContext in CreateDrawTargetForSurface (we do not like size 0).
And we crash.

The original failure is cairo_win32_printing_surface_create creating invalid surface from the HDC we gave it.  Haven't captured what the failure code is.
Since it seems that we're failing in cairo, here's a change with Jeff's idea to just crash when we hit the cairo error in order to see where and what's happening.

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/msreckovic@mozilla.com-f9435c041875/try-win32/

Loic, if you have a chance...
Flags: needinfo?(epinal99-bugzilla2)
(In reply to Milan Sreckovic [:milan] from comment #37)

> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/msreckovic@mozilla.
> com-f9435c041875/try-win32/
> 
> Loic, if you have a chance...

Build crashed (new profile, gfx.logging.crash.length=100):
https://crash-stats.mozilla.com/report/index/e3926407-c132-4b85-a5c6-69c412150530
https://crash-stats.mozilla.com/report/index/731c307f-37e2-41be-bdec-e2c292150530
Flags: needinfo?(epinal99-bugzilla2)
I would like to add I didn't crash with the builds before May.
The regression range I found:
good=2015-04-29
bad=2015-04-30
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1ad65cbeb2f4&tochange=4b9b12c248dc
Loic rules!

(In reply to Bob Owen (:bobowen) from comment #24)
> 
> What is happening here is that the first GDI call must happen at the initial
> integrity level (high with UAC turned off on a local-admin account), this
> sets the UI privilege level to high.
> Subsequent calls, once the process integrity level has been dropped to
> medium then fail, as it is trying to interact with a higher integrity level.
> 
> Specifically here the call at [1] fails and this means the surface gets
> created with a null device context.
> This then means we get a null DrawTarget at [2], which leads to our crash.

I can confirm Loic's scenario is this one; ::CreateDCW fails, we don't handle it along the way.

> 
> I'll file a blocking bug to fix the sandbox for this case.

Bob, just to be clear, this is bug 1158773, and it being fixed still has ::CreateDCW failing as the "correct" behaviour, right?
Flags: needinfo?(bobowen.code)
(In reply to Milan Sreckovic [:milan] from comment #40)
> Loic rules!
 
> > 
> > I'll file a blocking bug to fix the sandbox for this case.
> 
> Bob, just to be clear, this is bug 1158773, and it being fixed still has
> ::CreateDCW failing as the "correct" behaviour, right?

Hmm, I thought I reproduced and checked the fix for bug 1158773 was working.
Maybe changing the sandbox to low integrity has broken this again.

Loic - sorry about this, but would you mind running another couple of tests on Nightly with e10s enabled?
First try this:
* set the pref security.sandbox.content.level=0
* restart Firefox
* try to reproduce

If that doesn't work try:
* with Firefox not started, set an environment variable of MOZ_DISABLE_CONTENT_SANDBOX with a value of 1
* start Firefox
* try to reproduce

Remember to remove the environment variable and reset the pref to 1, if you want the latest content process sandbox setting when you have finished testing.

I'll look into reproducing myself tomorrow as well (leaving the NI).

As I said in comment 24, it looks like this crash signature might happen for other reasons as well.
I added a couple of suggestions for trying to fix those, but we might want to added some sort of reporting as well so we don't miss the failures.
Flags: needinfo?(epinal99-bugzilla2)
The "real" change is in nsDeviceContextSpecWin.cpp - the rest are documentation and user messages.  If the change is OK, probably don't want to land until we figure out why ::CreateDCW still fails.
Attachment #8613625 - Flags: review?(bobowen.code)
Comment on attachment 8613625 [details] [diff] [review]
Watch for failing CreateDCW. r=dvander

Sorry Milan, I'm really not the right person to review this.

It hits a few areas, none of which I am a peer in.
Attachment #8613625 - Flags: review?(bobowen.code)
(In reply to Bob Owen (:bobowen) from comment #41)
> (In reply to Milan Sreckovic [:milan] from comment #40)
> > Loic rules!
>  
> > > 
> > > I'll file a blocking bug to fix the sandbox for this case.
> > 
> > Bob, just to be clear, this is bug 1158773, and it being fixed still has
> > ::CreateDCW failing as the "correct" behaviour, right?
> 
> Hmm, I thought I reproduced and checked the fix for bug 1158773 was working.
> Maybe changing the sandbox to low integrity has broken this again.
> 
> Loic - sorry about this, but would you mind running another couple of tests
> on Nightly with e10s enabled?
> First try this:
> * set the pref security.sandbox.content.level=0
> * restart Firefox
> * try to reproduce

With e10s enabled and security.sandbox.content.level=0, the latest Nightly doesn't crash when printing a PDF with MS XPS Document Writer.
Flags: needinfo?(epinal99-bugzilla2)
(In reply to Loic from comment #44)

> With e10s enabled and security.sandbox.content.level=0, the latest Nightly
> doesn't crash when printing a PDF with MS XPS Document Writer.

Thanks for checking that, it confirms that the fix for bug 1158773 worked, but that it has re-broken with low integrity (bug 1151767).

There is already an outstanding bug for other low integrity printing problems, but not one that causes a crash.
The fix for that will probably fix this, but I'll try to reproduce and see if we need a separate bug.

I've been distracted by other things at the moment, but the printing bug was next on my list before these other things came up.

Thanks again for your help.
Attachment #8613625 - Flags: review?(dvander)
Attachment #8613625 - Attachment description: Watch for failing CreateDCW. r=bobowen → Watch for failing CreateDCW. r=dvander
Assignee: nobody → milan
I've tried this with UAC turned off and it doesn't crash for me.

However, I am getting a different problem where the printing isn't working even if I try to save the xps to a low integrity directory.
This is with UAC on or off.

This is because as part of the print we seem to create a temporary file in the users Appdata\Local\Temp directory and the sandbox is blocking this.
The directory service TmpD directory should be overridden to a low integrity temp, so I can only assume that whatever is creating the temp file isn't using the directory service.
Looks like it might be something inside Windows own DLLs.

Either way around printing using XPS at low integrity just doesn't work.
I think that many of the issues related to the sandbox will be fixed when I (or someone) finally get around to bug 1090454.

Loic, I'm not sure why the CreateDCW call is failing for you with UAC turned off and the low integrity sandbox, but Milan's patch will at least stop the crash.
Although printing to XPS with the low integrity sandbox will still not work for the moment.

By the way, one other option is to open a "New Non-e10s" window from the hamburger menu and print to XPS from that window.
Flags: needinfo?(bobowen.code)
Comment on attachment 8613625 [details] [diff] [review]
Watch for failing CreateDCW. r=dvander

Review of attachment 8613625 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the late review - r=me

::: gfx/src/nsDeviceContext.cpp
@@ +414,5 @@
>                                                               gfx::IntSize(mWidth, mHeight));
>  
> +    if (!dt) {
> +        gfxCriticalError() << "Failed to create draw target in device context sized " << mWidth << "x" << mHeight << " and pointers " << hexa(mPrintingSurface) << " and " << hexa(printingSurface);
> +        MOZ_CRASH("Cannot CreateDrawTargetForSurface");

Do we crash here because our callers assume this function is infallible?

::: widget/windows/nsDeviceContextSpecWin.cpp
@@ +313,4 @@
>  
>        // have this surface take over ownership of this DC
>        newSurface = new gfxWindowsSurface(dc, gfxWindowsSurface::FLAG_TAKE_DC | gfxWindowsSurface::FLAG_FOR_PRINTING);
> +      if (newSurface->GetType() == (gfxSurfaceType)-1) {

It this the standard way of checking if a surface is invalid? Would be nice to have an IsValid() helper or something.
Attachment #8613625 - Flags: review?(dvander) → review+
Liz, this bug affects 38.0.1 with 60.45%, and 39.0b1 with 7.66% (511 crashes). People can't print things and seem to be very upset about this in the comments.
Milan, I noticed that this patch was r+ a few days back. Are we ready to land this in moz-central? Do you think we will be able to fix this in time for 39 Beta5? We literally have a day or two window for that to happen.
Flags: needinfo?(milan)
Yes, we may as well check it in, it won't hurt.  Note that it will "convert" a crash into failure to print, so the users are still not happy.  However, the claim is that bug 1151767 actually fixes the problem, so that should be investigated and uplifted in order to really take care of the "no printing" scenario.
Depends on: 1151767
Flags: needinfo?(milan) → needinfo?(bobowen.code)
Keywords: checkin-needed
Comment on attachment 8613625 [details] [diff] [review]
Watch for failing CreateDCW. r=dvander

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: Crash instead of failed print (if print is to fail for whatever reason.)
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: 
[String/UUID change made/needed]:
Attachment #8613625 - Flags: approval-mozilla-beta?
Attachment #8613625 - Flags: approval-mozilla-aurora?
(In reply to Milan Sreckovic [:milan] from comment #50)
> Yes, we may as well check it in, it won't hurt.  Note that it will "convert"
> a crash into failure to print, so the users are still not happy.  However,
> the claim is that bug 1151767 actually fixes the problem, so that should be
> investigated and uplifted in order to really take care of the "no printing"
> scenario.

Er no, bug 1151767 breaks printing to XPS for e10s, but in Nightly only.
Flags: needinfo?(bobowen.code)
for the checkin can we get a try run ?
Flags: needinfo?(milan)
(In reply to Carsten Book [:Tomcat] from comment #53)
> for the checkin can we get a try run ?

It actually needs a rebase anyway.
Flags: needinfo?(milan)
Rebase and https://treeherder.mozilla.org/#/jobs?repo=try&revision=68264afaa04e
Attachment #8613625 - Attachment is obsolete: true
Attachment #8613625 - Flags: approval-mozilla-beta?
Attachment #8613625 - Flags: approval-mozilla-aurora?
Attachment #8620973 - Flags: review+
Comment on attachment 8620973 [details] [diff] [review]
Watch for failing CreateDCW. Carry r=dvander

Approval Request Comment
See above, just rebasing the patch.
Attachment #8620973 - Flags: approval-mozilla-beta?
Attachment #8620973 - Flags: approval-mozilla-aurora?
Milan, since the try run failed, does this still need work?  Should it land on m-c before I approve it for uplift?
Flags: needinfo?(milan)
Let's wait for the clean try before approving.  https://hg.mozilla.org/try/rev/804210a7f561
Flags: needinfo?(milan)
Attachment #8622584 - Flags: review+
Sigh. https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3e7f6636f81
Attachment #8620973 - Attachment is obsolete: true
Attachment #8622584 - Attachment is obsolete: true
Attachment #8620973 - Flags: approval-mozilla-beta?
Attachment #8620973 - Flags: approval-mozilla-aurora?
Attachment #8622667 - Flags: review+
(In reply to Milan Sreckovic [:milan] from comment #58)
> Created attachment 8622584 [details] [diff] [review]
> Watch for failing CreateDCW. Carry r=dvander
> 
> Let's wait for the clean try before approving. 
> https://hg.mozilla.org/try/rev/804210a7f561

Don't forget to update the reviewer in the comment. :-)

Also, dvander had a couple of questions in his review in comment 47.
I don't know if you contacted him outside of bmo, but it would be useful to have the answers here.
(Useful for me at least, next quarter I'm going to try and tackle printing via the chrome process for e10s and I'll need all the information / help I can get. :-) )
(In reply to David Anderson [:dvander] from comment #47)
> ...
> >  
> > +    if (!dt) {
> > +        gfxCriticalError() << "Failed to create draw target in device context sized " << mWidth << "x" << mHeight << " and pointers " << hexa(mPrintingSurface) << " and " << hexa(printingSurface);
> > +        MOZ_CRASH("Cannot CreateDrawTargetForSurface");
> 
> Do we crash here because our callers assume this function is infallible?

Correct, that's bug 1168934.

> 
> ::: widget/windows/nsDeviceContextSpecWin.cpp
> @@ +313,4 @@
> >  
> >        // have this surface take over ownership of this DC
> >        newSurface = new gfxWindowsSurface(dc, gfxWindowsSurface::FLAG_TAKE_DC | gfxWindowsSurface::FLAG_FOR_PRINTING);
> > +      if (newSurface->GetType() == (gfxSurfaceType)-1) {
> 
> It this the standard way of checking if a surface is invalid? Would be nice
> to have an IsValid() helper or something.

I agree, probably worth a separate bug to cover that.
(In reply to Milan Sreckovic [:milan] from comment #59)
> Created attachment 8622667 [details] [diff] [review]
> Watch for failing CreateDCW. Carry r=dvander
> 
> Sigh. https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3e7f6636f81

Still not green, but this seems to be unrelated.  Will retry.
(In reply to Milan Sreckovic [:milan] from comment #62)
> (In reply to Milan Sreckovic [:milan] from comment #59)
> > Created attachment 8622667 [details] [diff] [review]
> > Watch for failing CreateDCW. Carry r=dvander
> > 
> > Sigh. https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3e7f6636f81
> 
> Still not green, but this seems to be unrelated.  Will retry.

Looks like one of the crash tests is triggering that "ASurface Init failing with Cairo status " message and the gfx critical logger is set up to assert when used.
Right, but not something I'd expect to see from the change - I ran another one: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0fb1d17692e7 but that didn't run all the tests on XP, not sure if there is a way to trigger those.
This should pass try now; the old code was handling failed init, may as well continue that.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=80e237966214
Attachment #8622667 - Attachment is obsolete: true
Attachment #8624268 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/f57fc897b7bb
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
I think it is too late/risky for beta uplift. I'd like us to try it out in 40 though. Then we will be able to see how the fix affects the larger beta audience who seem to consistently hit this issue. 

Milan, can you request aurora uplift if you think that's a good idea?
Flags: needinfo?(milan)
Comment on attachment 8624268 [details] [diff] [review]
Watch for failing CreateDCW. Carry r=dvander

Approval Request Comment
[Feature/regressing bug #]: Not sure about the regression window.
[User impact if declined]: Top crash
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: Low risk, we're handling invalid returns.
[String/UUID change made/needed]:
Flags: needinfo?(milan)
Attachment #8624268 - Flags: approval-mozilla-aurora?
Comment on attachment 8624268 [details] [diff] [review]
Watch for failing CreateDCW. Carry r=dvander

Top crash, taking it.
Attachment #8624268 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
i guess this needs a rebase for aurora
(In reply to Bob Owen (:bobowen) from comment #45)
> I've been distracted by other things at the moment, but the printing bug was
> next on my list before these other things came up.

Which is the bug report?
(In reply to Loic from comment #74)
> (In reply to Bob Owen (:bobowen) from comment #45)
> > I've been distracted by other things at the moment, but the printing bug was
> > next on my list before these other things came up.
> 
> Which is the bug report?

Bug 1156742 was the original report.
Bug 1090454 is the older more general one, that should fix this.
See Also: → 1222487
Blocks: 1222487
Comment on attachment 8731152 [details] [diff] [review]
patch for esr38

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: Could cause a crash during printing. It was reported in Bug 1222487 comment 0.
Fix Landed on Version: firefox40
Risk to taking this patch (and alternatives if risky): low risk. The patch just add oom error handling.
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8731152 - Flags: approval-mozilla-esr38?
Crash Signature: [@ mozilla::gfx::UserData::Add(mozilla::gfx::UserDataKey*, void*, void (*)(void*))] → [@ mozilla::gfx::UserData::Add(mozilla::gfx::UserDataKey*, void*, void (*)(void*))] [@ mozilla::gfx::UserData::Add]
Comment on attachment 8731152 [details] [diff] [review]
patch for esr38

Sorry, it doesn't match the esr criteria
Attachment #8731152 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38-
You need to log in before you can comment on or make changes to this bug.