Last Comment Bug 651498 - gfx initialization requires that prefs be read (e.g., gfx.direct2d.disabled no longer has any effect)
: gfx initialization requires that prefs be read (e.g., gfx.direct2d.disabled n...
Status: RESOLVED FIXED
: regression
Product: Core
Classification: Components
Component: Graphics (show other bugs)
: Trunk
: x86 Windows 7
: -- normal with 5 votes (vote)
: ---
Assigned To: Zack Weinberg (:zwol)
:
Mentors:
Depends on: 651926
Blocks: 651017 652754
  Show dependency treegraph
 
Reported: 2011-04-20 07:51 PDT by Kai Liu
Modified: 2013-12-27 14:21 PST (History)
25 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed


Attachments
the obvious fix that might well be wrong (1.05 KB, patch)
2011-04-20 09:50 PDT, Zack Weinberg (:zwol)
zackw: review-
Details | Diff | Splinter Review
hopefully a good fix (13.08 KB, patch)
2011-04-21 11:21 PDT, Zack Weinberg (:zwol)
no flags Details | Diff | Splinter Review
fix v1.1 (13.08 KB, patch)
2011-04-21 12:14 PDT, Zack Weinberg (:zwol)
no flags Details | Diff | Splinter Review
fix v2 (still doesn't work) (18.20 KB, patch)
2011-04-25 15:11 PDT, Zack Weinberg (:zwol)
zackw: review-
Details | Diff | Splinter Review
v3 (should actually work) (5.51 KB, patch)
2011-04-26 07:42 PDT, Zack Weinberg (:zwol)
joe: review+
Details | Diff | Splinter Review
stack (from WinDbg) (12.82 KB, text/plain)
2011-04-26 15:15 PDT, Kai Liu
no flags Details
v3.1 (fix windows GfxInfo bug) (7.31 KB, patch)
2011-04-26 17:25 PDT, Zack Weinberg (:zwol)
benjamin: review+
Details | Diff | Splinter Review

Description Kai Liu 2011-04-20 07:51:00 PDT
Works: 2b974a79020f
Broken: 1764e405eb02

-> Bug 651017
Comment 1 Kai Liu 2011-04-20 08:20:51 PDT
The problem is with the part 2 patch and the moving of the initialization to app-startup.  The user profile (and thus preferences) are not yet loaded at this stage.  profile-do-change would probably be more appropriate than app-startup.
Comment 2 Zack Weinberg (:zwol) 2011-04-20 09:46:12 PDT
Ok, so it's trivial to change the category from app-startup to profile-do-change (will attach patch shortly), and it's pretty clear from code inspection that gfxPlatform::Init needs preferences, and profile-do-change is documented as the earliest preferences are available.  I regret not noticing this earlier.

But I don't know if profile-do-change is too LATE for gfxPlatform::Init.  There are lots of gfx objects that can be created without going through XPCOM, it's not clear to me how early this can happen, and it's not clear to me how disastrous it is if any of those objects are created before gfxPlatform::Init has been called.  libpr0n takes some pains to make sure it's been called; taking that out has no negative effect on a smoke test on my machine, but this bug demonstrates that there are dependencies in here that are invisible both to smoke tests and our build farm.

(See also long whine on m.d.platform -- I don't even know if my approach to ensuring ::Init gets called is sane!)

So I'm reluctant to put in the obvious fix without further advice.
Comment 3 Zack Weinberg (:zwol) 2011-04-20 09:50:32 PDT
Created attachment 527304 [details] [diff] [review]
the obvious fix that might well be wrong
Comment 4 Benjamin Smedberg [:bsmedberg] 2011-04-20 10:47:00 PDT
See bug 576540 for relevant history here. This patch is definitely incorrect, because we don't fire profile notifications in some cases, such as when showing profile migration UI or the profile manager. I also put more details and suggestions in mozilla.dev.platform
Comment 5 Zack Weinberg (:zwol) 2011-04-20 10:53:28 PDT
Comment on attachment 527304 [details] [diff] [review]
the obvious fix that might well be wrong

Your comments in .platform were very helpful.  I'll have a hard look at what thebes actually needs, but I may not be able to do it till tomorrow.  Do you think this is a serious enough problem that we should back out bug 651017?
Comment 6 Benjamin Smedberg [:bsmedberg] 2011-04-20 10:59:10 PDT
That's a question for jrm/bas, I think.
Comment 7 Joe Drew (not getting mail) 2011-04-20 11:58:30 PDT
This doesn't require a backout. Turning off hardware acceleration via our blacklist should still work just fine, as well as turning off layers acceleration (which is read in widget iirc).

This'll impede testing enough that it needs to be tracked for Fx6 though.
Comment 8 Bas Schouten (:bas.schouten) 2011-04-20 12:17:51 PDT
Not to mention people who don't like HW accelerated font rendering or are having other issues. Those people should really be able to switch it off.
Comment 9 Bas Schouten (:bas.schouten) 2011-04-20 12:19:39 PDT
If I wasn't clear enough, this makes the 'Use hardware acceleration' checkbox, which is part of the actual menu, not work properly I think. If you use that now you'll get BasicLayers (since that pref is read by widget and probably still used) with Direct2D in theory, that's an untested combination and a big issue!
Comment 10 Leon Sorokin 2011-04-20 14:40:09 PDT
forced hardware acceleration font rendering makes Leon a sad panda :(

in lieu of eyes bleeding, reverting to 04/19 nightly till resolved.
Comment 11 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-04-20 16:41:06 PDT
I think probably we should backout to fix this.
Comment 12 Zack Weinberg (:zwol) 2011-04-21 11:21:18 PDT
Created attachment 527590 [details] [diff] [review]
hopefully a good fix

This should return things to a state of being no worse than they were before I did anything. :)  gfxPlatform::Init is now called from the first call to gfxPlatform::GetPlatform, which should mean that it happens exactly as early as it has to, and no earlier.  It appears already to cope with being called either before or after preferences are available.  gfxPlatform::Shutdown is called from an xpcom-shutdown observer installed as the last act of gfxPlatform::Init, and nsDeviceContext::ClearCachedSystemFonts is called from an xpcom-shutdown observer installed on the first call to GetSystemFont.  Collectively, this means we don't need to do anything in the gfx module constructor or destructor anymore, and we don't need to worry about whether that module has been initialized.

Not sure who's an appropriate reviewer.  Also, none of my computers have a hardware/OS combination for which a non-blacklisted graphics driver exists, so I can't confirm that this fixes the original regression.  Smoke-tested ok on my Linux box, pushed to try.
Comment 13 Zack Weinberg (:zwol) 2011-04-21 11:24:51 PDT
As a side note, gfxPlatform::Init ought to crash rather than returning an error code -- even in production builds -- if anything fails to start up, because nothing checks for a null pointer return from gfxPlatform::GetPlatform, so we're going to crash anyway.  But this was already broken, so I think it should get its own bug.
Comment 14 Zack Weinberg (:zwol) 2011-04-21 11:35:39 PDT
(In reply to comment #13)
> But this was already broken, so I think it should get its own bug.

Namely, bug 651926.
Comment 15 Zack Weinberg (:zwol) 2011-04-21 12:14:27 PDT
Created attachment 527612 [details] [diff] [review]
fix v1.1

Revised patch now with (hopefully) 100% fewer build-breaking typos.
Comment 16 Joe Drew (not getting mail) 2011-04-21 12:23:09 PDT
Comment on attachment 527612 [details] [diff] [review]
fix v1.1

do like! I'll test if it works when we get the try builds :)
Comment 17 Zack Weinberg (:zwol) 2011-04-21 13:18:18 PDT
Try builds are now turning up at http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/zackw@panix.com-30f79c37b8dc/ -- no Windows yet, though.  And there seems to be a memory leak in need of plugging, but it should still be possible to test for the regression having been fixed.

See also http://tbpl.mozilla.org/?tree=MozillaTry&rev=30f79c37b8dc
Comment 18 Kai Liu 2011-04-21 16:04:59 PDT
The tryserver build (In reply to comment #17)
> it should still be possible to test for the regression having been fixed.

I can confirm that the tryserver build does fix the regression.
Comment 19 Kai Liu 2011-04-21 16:06:43 PDT
...and that it also crashes on shutdown.  :)
Comment 21 Zack Weinberg (:zwol) 2011-04-21 16:38:49 PDT
If anything it appears xpcom-shutdown is too EARLY.  Check out this stack trace:
(from http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1303427904.1303428391.11630.gz)

###!!! ABORT: Already started???: file ../../../gfx/thebes/gfxPlatform.cpp, line 267
gfxPlatform::Init [gfxPlatform.cpp:269]
gfxPlatform::GetPlatform [gfxPlatform.cpp:260]
PresShell::GetReferenceRenderingContext [nsPresShell.cpp:3765]
PresShell::DoReflow [nsPresShell.cpp:7639]
PresShell::ProcessReflowCommands [nsPresShell.cpp:7848]
PresShell::FlushPendingNotifications [nsPresShell.cpp:4812]
mozilla::imagelib::SVGDocumentWrapper::FlushLayout [SVGDocumentWrapper.cpp:468]
mozilla::imagelib::SVGDocumentWrapper::OnStopRequest [SVGDocumentWrapper.cpp:308]
mozilla::imagelib::VectorImage::OnStopRequest [VectorImage.cpp:660]
imgRequest::OnStopRequest [imgRequest.cpp:927]
ProxyListener::OnStopRequest [imgLoader.cpp:2002]
imgCacheValidator::OnStopRequest [imgLoader.cpp:2158]
nsBaseChannel::OnStopRequest [nsBaseChannel.cpp:728]
nsInputStreamPump::OnStateStop [nsInputStreamPump.cpp:579]
nsInputStreamPump::OnInputStreamReady [nsInputStreamPump.cpp:403]
nsInputStreamReadyEvent::Run [nsStreamUtils.cpp:115]
nsThread::ProcessNextEvent [nsThread.cpp:618]
NS_ProcessPendingEvents_P [nsThreadUtils.cpp:200]
mozilla::ShutdownXPCOM [nsXPComInit.cpp:625]
NS_ShutdownXPCOM_P [nsXPComInit.cpp:583]
ScopedXPCOMStartup::~ScopedXPCOMStartup [nsAppRunner.cpp:1112]
XRE_main [nsAppRunner.cpp:3800]
main [nsBrowserApp.cpp:158]

The very first thing mozilla::ShutdownXPCOM does is call the shutdown observers, and it spins the event loop several times after that point.  I'm not sure we have any business calling into *reflow*, of all things, after shutdown has been initiated, but it might be very hard to prevent.

Kinda at a loss for options.
Comment 22 Zack Weinberg (:zwol) 2011-04-21 16:47:16 PDT
Note also that this is not the only way we can loop back into gfx after gfxPlatform::Shutdown -- http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1303428751.1303429087.14086.gz shows us leaking gfxFontEntry objects, and I'm pretty sure that's because they're getting created after ::Shutdown was called, but somehow *without* hitting the tripwires I put in both gfxPlatform::Init and gfxFontCache::Init.
Comment 23 Zack Weinberg (:zwol) 2011-04-25 15:11:22 PDT
Created attachment 528193 [details] [diff] [review]
fix v2 (still doesn't work)

We have a shutdown-ordering problem here which is beyond my ability to solve.  The attached patch scores all-green on the try server except for a leak of one lousy gfxFontEntry on the Linux debug-mode crashtests -- but that's the tip of a very large iceberg.

1) gfxPlatform::Shutdown must not be called until after we are completely done spinning the event loop; in particular, xpcom-shutdown observers are much too early.  (This is what was wrong with the previous patch.)

2) It must also not be called until after nsLayoutStatics::Shutdown (unsurprisingly).

3) nsLayoutStatics::Shutdown can be called as early as the xpcom-shutdown observers or as late as nsCycleCollector_shutdown, depending on conditions which are unclear to me.  Therefore, because of (1) we cannot simply call gfxPlatform::Shutdown _from_ nsLayoutStatics::Shutdown.

4) There is, as far as I can tell, no usable hook point after nsCycleCollector_shutdown.  The thing I'm doing in this patch to call gfxPlatform::Shutdown from FreeServices() (which happens immediately *before* nsCycleCollector_shutdown) is already pretty damn ugly and it's not good enough.
Comment 24 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-04-25 16:41:35 PDT
Can we remove requirement #1 somehow (what causes that requirement?) and then call gfxPlatform::Shutdown from nsLayoutStatics::Shutdown?
Comment 25 Zack Weinberg (:zwol) 2011-04-25 16:54:21 PDT
(In reply to comment #24)
> Can we remove requirement #1 somehow (what causes that requirement?) and then
> call gfxPlatform::Shutdown from nsLayoutStatics::Shutdown?

Anytime the event loop spins inside ShutdownXPCOM, we can wind up inside reflow, and from there try to call gfx.  See for instance the stack trace in comment 21. This appears to be possible *even if* nsLayoutStatics::Shutdown has already happened -- that doesn't seem like it should work to me, but some of the crashes I got on the try server only make sense if that's what happened.

However, an approach which _should_ avoid all of these problems occurred to me right after I gave up (isn't that always the way?)  I'm not going to post it till I see it pass try, though.
Comment 26 Zack Weinberg (:zwol) 2011-04-26 07:42:22 PDT
Created attachment 528309 [details] [diff] [review]
v3 (should actually work)

It finally dawned on me yesterday that I was trying too hard.

To fix this bug, we only need to change when the first call to gfxPlatform::Init() happens, it having been demonstrated that an app-startup notification is too early.  We do *not* need to change when gfxPlatform::Shutdown() happens.  That works fine now from component teardown, and changing that is what was causing all the trouble.

The only trick, then, is making sure that the gfx component is registered even if we don't need any of the remaining XPCOM classes in there, so that its destructor gets called and can call ::Shutdown.  That's easily done by calling do_CreateInstance on that dummy contract ID that I added the last time around -- from gfxPlatform::Init, rather than from a category entry.  ::Init is still called lazily from ::GetPlatform, which was the change that actually addressed the bug, 'way up above.

This is all green on the try server.  It would probably be good if someone re-tested the try builds - https://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/zackw@panix.com-68e1263d0c08/ - to make sure this does fix the bug.

We might want to try again to find a better place to call ::Shutdown if we ever get rid of the remaining gfx XPCOM classes (three remain: nsIRegion, nsIScriptableRegion, nsIFontEnumerator) but since two of them are scriptable and actually used from JS in the tree, I don't think we need to worry about it right now.
Comment 27 Kai Liu 2011-04-26 08:50:21 PDT
Your try-build is hitting this:
xpcom_runtime_abort(###!!! ABORT: Already started???: file e:/builds/moz2_slave/try-w32/build/gfx/thebes/gfxPlatform.cpp, line 245)
Comment 28 Zack Weinberg (:zwol) 2011-04-26 08:56:59 PDT
(In reply to comment #27)
> Your try-build is hitting this:
> xpcom_runtime_abort(###!!! ABORT: Already started???: file
> e:/builds/moz2_slave/try-w32/build/gfx/thebes/gfxPlatform.cpp, line 245)

Can you get a stack trace?  (XPCOM_DEBUG_BREAK=stack-and-abort should do it.)
Comment 29 Kai Liu 2011-04-26 15:15:27 PDT
Created attachment 528458 [details]
stack (from WinDbg)

> Can you get a stack trace?  (XPCOM_DEBUG_BREAK=stack-and-abort should do it.)

Sorry, been away this afternoon.  Does XPCOM_DEBUG_BREAK work with NS_RUNTIMEABORT?  It's not giving me anything, so I used WinDbg instead...
Comment 30 Zack Weinberg (:zwol) 2011-04-26 15:21:42 PDT
(In reply to comment #29)
> 
> Sorry, been away this afternoon.  Does XPCOM_DEBUG_BREAK work with
> NS_RUNTIMEABORT?  It's not giving me anything, so I used WinDbg instead...

I guess it doesn't :-(  Your WinDbg stack trace would be fine except that it seems to have been cut off after the first few frames on thread 0 (which is the only thread whose stack trace is in any way relevant).  Can you do that again, please, and this time trace only thread 0, but all the way to the top?

If it claims that that's all the stack frames that there are, then it's confused, and I need you to find a tool that doesn't get confused.  Or switch to the debug build, PGO might be why it's confused.

Also, exactly what were you doing when you got this crash?  Closing the browser, or?
Comment 31 Joe Drew (not getting mail) 2011-04-26 15:31:58 PDT
Comment on attachment 528309 [details] [diff] [review]
v3 (should actually work)

Review of attachment 528309 [details] [diff] [review]:

I'm sad that we couldn't delete that dummy component, but happy that this is a much less involved patch than the previous one.

::: modules/libpr0n/build/nsImageModule.cpp
@@ -119,5 @@
 imglib_Initialize()
 {
-  // We need the gfx module to be initialized because we use gfxPlatform
-  // in imgFrame. It should have happened by now, but make sure.
-  nsCOMPtr<nsISupports> dummy = do_GetService("@mozilla.org/gfx/init;1");

Should we really remove this? I don't know if we're able to guarantee ordering of component bringup.
Comment 32 neil@parkwaycc.co.uk 2011-04-26 15:49:33 PDT
Comment on attachment 528458 [details]
stack (from WinDbg)

>WARNING: Stack unwind information not available. Following frames may be wrong.
This means you're not getting any symbols. Please configure symbols as per https://developer.mozilla.org/en/Using_the_Mozilla_symbol_server
Comment 33 Kai Liu 2011-04-26 16:12:08 PDT
Comment on attachment 528458 [details]
stack (from WinDbg)

Ah, I was using the wrong symbol server.

.  0  Id: d5c.be4 Suspend: 1 Teb: fffdd000 Unfrozen
ChildEBP RetAddr  
0025c434 695180b6 mozalloc!mozalloc_abort(char * msg = 0x6905261f "???")+0x2b [e:\builds\moz2_slave\try-w32\build\memory\mozalloc\mozalloc_abort.cpp @ 77]
0025c85c 6905261f xul!NS_DebugBreak_P(unsigned int aSeverity = 3, char * aStr = 0x698bc3e4 "Already started???", char * aExpr = 0x00000000 "", char * aFile = 0x698bc3a4 "e:/builds/moz2_slave/try-w32/build/gfx/thebes/gfxPlatform.cpp", int aLine = 0n245)+0x1da [e:\builds\moz2_slave\try-w32\build\xpcom\base\nsdebugimpl.cpp @ 345]
0025c8a8 68d7c06b xul!gfxPlatform::Init+0x1b90c8
0025c8b0 690d7d2a xul!gfxPlatform::GetPlatform(void)+0x14 [e:\builds\moz2_slave\try-w32\build\gfx\thebes\gfxplatform.cpp @ 238]
0025cf14 68eba9cf xul!mozilla::widget::GfxInfo::Init+0x27bc24
0025cf24 68ce8df8 xul!mozilla::widget::GfxInfoConstructor(class nsISupports * aOuter = 0x698fc06c, struct nsID * aIID = 0x0092e140, void ** aResult = 0x0095e040)+0x41 [e:\builds\moz2_slave\try-w32\build\widget\src\build\nswinwidgetfactory.cpp @ 103]
0025cf34 68cb02fc xul!mozilla::GenericFactory::CreateInstance(class nsISupports * aOuter = 0x0092e140, struct nsID * aIID = 0x0095e040, void ** aResult = 0x009821ac)+0x18 [e:\builds\moz2_slave\try-w32\build\obj-firefox\xpcom\build\genericfactory.cpp @ 48]
0025cf64 68d4b9fc xul!nsComponentManagerImpl::CreateInstanceByContractID(char * aContractID = 0x0092e140 "???", class nsISupports * aDelegate = 0x0095e040, struct nsID * aIID = 0x009821ac, void ** aResult = 0x0095e0bc)+0xcc [e:\builds\moz2_slave\try-w32\build\xpcom\components\nscomponentmanager.cpp @ 1302]
0025cfb8 68d73485 xul!nsComponentManagerImpl::GetServiceByContractID(char * aContractID = 0x7866672f "--- memory read error at address 0x7866672f ---", struct nsID * aIID = 0x666e692f, void ** result = 0x00313b6f)+0x2ec [e:\builds\moz2_slave\try-w32\build\xpcom\components\nscomponentmanager.cpp @ 1706]
0025cfd4 68e995d7 xul!nsCOMPtr_base::assign_from_gs_contractid(class nsGetServiceByContractID gs = class nsGetServiceByContractID, struct nsID * iid = 0x00000003)+0x25 [e:\builds\moz2_slave\try-w32\build\obj-firefox\xpcom\build\nscomptr.cpp @ 132]
0025d010 68d7c06b xul!gfxPlatform::Init(void)+0x80 [e:\builds\moz2_slave\try-w32\build\gfx\thebes\gfxplatform.cpp @ 271]
0025d018 68e07284 xul!gfxPlatform::GetPlatform(void)+0x14 [e:\builds\moz2_slave\try-w32\build\gfx\thebes\gfxplatform.cpp @ 238]
0025d020 68e06e6b xul!ShouldUseImageSurfaces(void)+0x6 [e:\builds\moz2_slave\try-w32\build\modules\libpr0n\src\imgframe.cpp @ 126]
0025d040 68e0a82f xul!imgFrame::Init(int aX = 0n0, int aY = 0n0, int aWidth = 0n2, int aHeight = 0n0, gfxASurface::gfxImageFormat aFormat = ImageFormatARGB32 (0n0), char aPaletteDepth = 0n0 '')+0x58 [e:\builds\moz2_slave\try-w32\build\modules\libpr0n\src\imgframe.cpp @ 211]
0025d09c 68c63e63 xul!mozilla::imagelib::RasterImage::InternalAddFrame(unsigned int framenum = 0, int aX = 0n0, int aY = 0n0, int aWidth = 0n2, int aHeight = 0n2, gfxASurface::gfxImageFormat aFormat = ImageFormatARGB32 (0n0), unsigned char aPaletteDepth = 0x00 '', unsigned char ** imageData = 0x06b4903c, unsigned int * imageLength = 0x0025d128, unsigned int ** paletteData = 0x00000000, unsigned int * paletteLength = 0x00000000)+0x5f [e:\builds\moz2_slave\try-w32\build\modules\libpr0n\src\rasterimage.cpp @ 820]
0025d0d4 68e4d97d xul!mozilla::imagelib::RasterImage::AppendFrame(int aX = 0n0, int aY = 0n0, int aWidth = 0n2, int aHeight = 0n2, gfxASurface::gfxImageFormat aFormat = ImageFormatARGB32 (0n0), unsigned char ** imageData = 0x06b4903c, unsigned int * imageLength = 0x0025d128)+0x45 [e:\builds\moz2_slave\try-w32\build\modules\libpr0n\src\rasterimage.cpp @ 881]
0025d120 68e4dd2d xul!mozilla::imagelib::nsGIFDecoder2::BeginImageFrame(unsigned short aDepth = 1)+0x4d [e:\builds\moz2_slave\try-w32\build\modules\libpr0n\decoders\nsgifdecoder2.cpp @ 229]
0025d144 68e432c4 xul!mozilla::imagelib::nsGIFDecoder2::WriteInternal(char * aBuffer = <Memory access error>, unsigned int aCount = <Memory access error>)+0x36a [e:\builds\moz2_slave\try-w32\build\modules\libpr0n\decoders\nsgifdecoder2.cpp @ 949]
0025d158 68e43336 xul!mozilla::imagelib::RasterImage::WriteToDecoder(char * aBuffer = 0x68c99327 "???", unsigned int aCount = 0x5ff2ce8)+0x2c [e:\builds\moz2_slave\try-w32\build\modules\libpr0n\src\rasterimage.cpp @ 2265]
0025d16c 68e433cc xul!mozilla::imagelib::RasterImage::AddSourceData(char * aBuffer = 0x68c99327 "???", unsigned int aCount = 0x5ff2ce8)+0x30 [e:\builds\moz2_slave\try-w32\build\modules\libpr0n\src\rasterimage.cpp @ 1237]
0025d180 68de7d51 xul!mozilla::imagelib::RasterImage::WriteToRasterImage(class nsIInputStream * __formal = 0x68c99327, void * aClosure = 0x05ff2ce8, char * aFromRawSegment = 0x68e433b8 "V???", unsigned int aCount = 0x2c, unsigned int * aWriteCount = 0x0025d224)+0x14 [e:\builds\moz2_slave\try-w32\build\modules\libpr0n\src\rasterimage.cpp @ 2761]
0025d1b8 68c99327 xul!nsPipeInputStream::ReadSegments(<function> * writer = 0x68e433b8, void * closure = 0x068fcbc0, unsigned int count = 0x2c, unsigned int * readCount = 0x0025d224)+0xa1 [e:\builds\moz2_slave\try-w32\build\xpcom\io\nspipe3.cpp @ 800]
0025d324 68c6b190 xul!imgRequest::OnDataAvailable(class nsIRequest * aRequest = 0x00000001, class nsISupports * ctxt = 0x00000000, class nsIInputStream * inStr = 0x06841280, unsigned int sourceOffset = 1, unsigned int count = 0x73710000)+0x3c7 [e:\builds\moz2_slave\try-w32\build\modules\libpr0n\src\imgrequest.cpp @ 1152]
0025d344 68c79bfb xul!ProxyListener::OnDataAvailable(class nsIRequest * aRequest = 0x060fef80, class nsISupports * ctxt = 0x00000000, class nsIInputStream * inStr = 0x05ff2ce8, unsigned int sourceOffset = 0, unsigned int count = 0x2c)+0x26 [e:\builds\moz2_slave\try-w32\build\modules\libpr0n\src\imgloader.cpp @ 2014]
0025d370 68c7d193 xul!nsJARChannel::OnDataAvailable(class nsIRequest * req = 0x068ffec0, class nsISupports * ctx = 0x00000000, class nsIInputStream * stream = 0x05ff2ce8, unsigned int offset = 0, unsigned int count = 0x2c)+0x24 [e:\builds\moz2_slave\try-w32\build\modules\libjar\nsjarchannel.cpp @ 944]
0025d3b4 68c7d009 xul!nsInputStreamPump::OnStateTransfer(void)+0xd3 [e:\builds\moz2_slave\try-w32\build\netwerk\base\src\nsinputstreampump.cpp @ 510]
0025d3c8 68dd9c11 xul!nsInputStreamPump::OnInputStreamReady(class nsIAsyncInputStream * stream = 0x05ff2ce8)+0x28 [e:\builds\moz2_slave\try-w32\build\netwerk\base\src\nsinputstreampump.cpp @ 407]
0025d3d8 68d4c90a xul!nsInputStreamReadyEvent::Run(void)+0x1d [e:\builds\moz2_slave\try-w32\build\xpcom\io\nsstreamutils.cpp @ 115]
0025d3fc 68e7e096 xul!nsThread::ProcessNextEvent(int mayWait = <Memory access error>, int * result = <Memory access error>)+0x15a [e:\builds\moz2_slave\try-w32\build\xpcom\threads\nsthread.cpp @ 624]
0025d440 68f06ee2 xul!nsThread::GetObserver(class nsIThreadObserver ** obs = 0x00000001)+0x25 [e:\builds\moz2_slave\try-w32\build\xpcom\threads\nsthread.cpp @ 692]
0025d44c 68e94bc5 xul!NS_InvokeByIndex_P(class nsISupports * that = 0x00000001, unsigned int methodIndex = 1, unsigned int paramCount = 0x934400, struct nsXPTCVariant * params = 0x0095aee0)+0x27 [e:\builds\moz2_slave\try-w32\build\xpcom\reflect\xptcall\src\md\win32\xptcinvoke.cpp @ 103]
0025d464 68d4c90a xul!nsProxyObjectCallInfo::Run(void)+0x1a [e:\builds\moz2_slave\try-w32\build\xpcom\proxy\src\nsproxyevent.cpp @ 182]
0025d520 68d7582e xul!nsThread::ProcessNextEvent(int mayWait = <Memory access error>, int * result = <Memory access error>)+0x15a [e:\builds\moz2_slave\try-w32\build\xpcom\threads\nsthread.cpp @ 624]
0025d554 68edacaf xul!mozilla::ipc::MessagePump::Run(class base::MessagePump::Delegate * aDelegate = 0x698773b0)+0x6e [e:\builds\moz2_slave\try-w32\build\ipc\glue\messagepump.cpp @ 110]
0025d560 68edac98 xul!MessageLoop::RunInternal(void)+0x11 [e:\builds\moz2_slave\try-w32\build\ipc\chromium\src\base\message_loop.cc @ 219]
0025d578 68bb808b xul!MessageLoop::RunHandler(void)+0x1d [e:\builds\moz2_slave\try-w32\build\ipc\chromium\src\base\message_loop.cc @ 203]
0025d598 68edac44 MOZCRT19!_VEC_memzero(void * dst = 0x0025d5e0, int val = 0n1, int len = 0n1759201280)+0x36
0025d5b8 68ec473d xul!MessageLoop::Run(void)+0x15 [e:\builds\moz2_slave\try-w32\build\ipc\chromium\src\base\message_loop.cc @ 177]
0025d5c4 68ef109c xul!nsBaseAppShell::Run(void)+0x34 [e:\builds\moz2_slave\try-w32\build\widget\src\xpwidgets\nsbaseappshell.cpp @ 191]
0025f518 68ef1046 xul!nsAppShell::Run(void)+0x42 [e:\builds\moz2_slave\try-w32\build\widget\src\windows\nsappshell.cpp @ 254]
0025f524 68c6b050 xul!nsAppStartup::Run(void)+0x1e [e:\builds\moz2_slave\try-w32\build\toolkit\components\startup\nsappstartup.cpp @ 225]
0025f8ac 003d134c xul!XRE_main(int argc = 0n1, char ** argv = 0x0092b0a8, struct nsXREAppData * aAppData = 0x00915300)+0xde7 [e:\builds\moz2_slave\try-w32\build\toolkit\xre\nsapprunner.cpp @ 3769]
0025f8fc 003d16f2 firefox!wmain(int argc = 0n1, wchar_t ** argv = 0x0092d790)+0x34c [e:\builds\moz2_slave\try-w32\build\toolkit\xre\nswindowswmain.cpp @ 128]
0025f93c 764033ca firefox!__tmainCRTStartup(void)+0x152 [e:\builds\moz2_slave\try-w32\build\obj-firefox\memory\jemalloc\crtsrc\crtexe.c @ 591]
0025f948 76ee9ed2 kernel32!BaseThreadInitThunk+0xe
0025f988 76ee9ea5 ntdll!__RtlUserThreadStart+0x70
0025f9a0 00000000 ntdll!_RtlUserThreadStart+0x1b
Comment 34 Kai Liu 2011-04-26 16:13:23 PDT
(In reply to comment #30)
> Also, exactly what were you doing when you got this crash?

Absolutely nothing.  It's dying on startup, before anything is shown.  It does so with a clean profile, too.
Comment 35 Kai Liu 2011-04-26 16:16:25 PDT
If it helps, the OS is 6.1.7601/x86-64, and the GPU is the Sandy Bridge onboard.
Comment 36 Zack Weinberg (:zwol) 2011-04-26 17:01:23 PDT
(In reply to comment #31)
> -  // We need the gfx module to be initialized because we use gfxPlatform
> -  // in imgFrame. It should have happened by now, but make sure.
> -  nsCOMPtr<nsISupports> dummy = do_GetService("@mozilla.org/gfx/init;1");
> 
> Should we really remove this? I don't know if we're able to guarantee ordering
> of component bringup.

I believe it to be unnecessary, because the code in libpr0n that needs gfx will end up calling gfxPlatform::GetPlatform anyway.
Comment 37 Zack Weinberg (:zwol) 2011-04-26 17:18:09 PDT
(In reply to comment #33)
> Comment on attachment 528458 [details]
> stack (from WinDbg)
> 
> Ah, I was using the wrong symbol server.
[snip]

Thanks for the update.  This turns out to be caused by a *recursive* call to gfxPlatform::Init which only happens on Windows with Intel graphics drivers (which I guess does not describe the build farm).  Cleaning out all the extra stuff, the interesting part of the stack trace goes

mozalloc_abort
NS_DebugBreak_P
gfxPlatform::Init
gfxPlatform::GetPlatform
mozilla::widget::GfxInfo::Init
mozilla::widget::GfxInfoConstructor
mozilla::GenericFactory::CreateInstance
nsComponentManagerImpl::CreateInstanceByContractID
nsComponentManagerImpl::GetServiceByContractID
nsCOMPtr_base::assign_from_gs_contractid
gfxPlatform::Init
gfxPlatform::GetPlatform
ShouldUseImageSurfaces
imgFrame::Init

gfxPlatform::Init tries to pull up the GfxInfo service right before it initializes gPlatform (and thus prevents such recursion) -- that appears to be intentional, or at least there's a comment to the effect that GfxInfo::Init must take care not to do things that require GetPlatform() to succeed.  But the Windows version of GfxInfo::Init goes and does this:

  if (mAdapterVendorID == vendorIntel) {
    // we've had big crashers (bugs 590373 and 595364) apparently correlated
    // with bad Intel driver installations where the DriverVersion reported
    // by the registry was not the version of the DLL.
    PRBool is64bitApp = sizeof(void*) == 8;
    const PRUnichar *dllFileName = is64bitApp
                                 ? L"igd10umd64.dll"
                                 : L"igd10umd32.dll";
    nsString dllVersion;
    // if GetDLLVersion fails, it gives "0.0.0.0"
    gfxWindowsPlatform::GetPlatform()->GetDLLVersion((PRUnichar*)dllFileName, dllVersion);

That GetPlatform() is totally unnecessary, because GetDLLVersion is a static method, which is why it used to work (GetPlatform would just return NULL and the compiler would cheerfully ignore it).

Updated patch shortly; try builds will take a couple hours, but I'll post the link to the build directory as soon as I know what it is.
Comment 38 Zack Weinberg (:zwol) 2011-04-26 17:25:32 PDT
Created attachment 528494 [details] [diff] [review]
v3.1 (fix windows GfxInfo bug)
Comment 39 Zack Weinberg (:zwol) 2011-04-26 19:41:37 PDT
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/zackw@panix.com-fa543f5d94f0/ for builds (only Linux so far)
Comment 40 Kai Liu 2011-04-26 22:03:01 PDT
The v3.1 patch does fix the regression for me without crashing.
Comment 41 neil@parkwaycc.co.uk 2011-04-27 01:09:10 PDT
Ideally we would fix the other callers too to avoid copying the mistake:
http://mxr.mozilla.org/mozilla-central/search?string=%3EGetDLLVersion
Comment 42 Kemcy BEST 2011-04-27 01:39:50 PDT
(In reply to comment #39)
> https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/zackw@panix.com-fa543f5d94f0/
> for builds (*****only Linux so far*****)

But isn't this bug for Win? Linux builds don't use d2d.
Comment 43 Kai Liu 2011-04-27 06:28:50 PDT
(In reply to comment #42)
> But isn't this bug for Win? Linux builds don't use d2d.

The failure to read graphics prefs on initialization is a cross-platform problem, and these patches affect the code paths of every version of Firefox.  The problem is simply more noticeable on Windows because the D2D implementation makes use of a number of initialization prefs.
Comment 44 Zack Weinberg (:zwol) 2011-04-27 07:27:06 PDT
(In reply to comment #41)
> Ideally we would fix the other callers too to avoid copying the mistake:
> http://mxr.mozilla.org/mozilla-central/search?string=%3EGetDLLVersion

Can do.

(In reply to comment #42)
> (In reply to comment #39)
> > https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/zackw@panix.com-fa543f5d94f0/
> > for builds (*****only Linux so far*****)
> 
> But isn't this bug for Win? Linux builds don't use d2d.

I was just posting the link to the build-results directory in advance of the Windows builds showing up (so I could go to bed :)  Windows builds are much slower than Linux builds.
Comment 45 Leon Sorokin 2011-04-29 15:15:07 PDT
been running that try build for a few days, it has fixed the issue without ill side effects for me.
Comment 46 Zack Weinberg (:zwol) 2011-04-29 16:50:40 PDT
http://hg.mozilla.org/mozilla-central/rev/8a650b9e55db
Comment 47 Joe Drew (not getting mail) 2011-04-29 17:22:44 PDT
\o/
Comment 48 Jim Mathies [:jimm] 2011-04-29 19:09:42 PDT
(In reply to comment #47)
> \o/

indeed!
Comment 49 Virgil Dicu [:virgil] [QA] 2011-08-12 09:07:22 PDT
Could you please provide a test case for this issue in order to check it?
Comment 50 Zack Weinberg (:zwol) 2011-08-12 09:14:14 PDT
Toggle gfx.direct2d.disabled in about:config and observe whether it does anything (you have to restart the browser) (note: I don't actually know what visible effect this has, except that it's something to do with font rendering).

Note You need to log in before you can comment on or make changes to this bug.