Closed Bug 951139 Opened 9 years ago Closed 9 years ago

Intermittent 508908-1-ref.xul | This test left crash dumps behind, but we weren't expecting it to! | application crashed [@ google_breakpad::ExceptionHandler::WriteMinidump(std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > ...

Categories

(Core Graveyard :: Plug-ins, defect)

x86
Windows 8
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: RyanVM, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, intermittent-failure)

Some sort of crash while shutting down the plugin process? Having a hard time reading this stack.

https://tbpl.mozilla.org/php/getParsedLog.php?id=32045064&tree=Fx-Team

WINNT 6.2 fx-team opt test reftest on 2013-12-16 13:46:06 PST for push 8ab821223047
slave: t-w864-ix-061

13:56:40     INFO -  REFTEST TEST-START | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/bugs/508908-1.xul
13:56:40     INFO -  REFTEST TEST-LOAD | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/bugs/508908-1.xul | 3270 / 10343 (31%)
13:56:41     INFO -  REFTEST TEST-LOAD | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/bugs/508908-1-ref.xul | 3270 / 10343 (31%)
13:58:16     INFO -  ###!!! [Parent][MessageChannel::Call] Error: Channel timeout: cannot send/recv
13:58:16     INFO -  REFTEST TEST-PASS | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/bugs/508908-1.xul | image comparison (==)
13:58:16     INFO -  REFTEST TEST-UNEXPECTED-FAIL | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/bugs/508908-1-ref.xul | This test left crash dumps behind, but we weren't expecting it to!
13:58:16     INFO -  REFTEST INFO | Found unexpected crash dump file c:\users\cltbld~1.t-w\appdata\local\temp\tmphmdsdr.mozrunner\minidumps\815a9ffd-35d9-4e97-8dc8-50f2f56ae057-browser.dmp.
13:58:16     INFO -  REFTEST INFO | Found unexpected crash dump file c:\users\cltbld~1.t-w\appdata\local\temp\tmphmdsdr.mozrunner\minidumps\815a9ffd-35d9-4e97-8dc8-50f2f56ae057.dmp.
13:58:16     INFO -  REFTEST INFO | Found unexpected crash dump file c:\users\cltbld~1.t-w\appdata\local\temp\tmphmdsdr.mozrunner\minidumps\815a9ffd-35d9-4e97-8dc8-50f2f56ae057.extra.
13:58:16     INFO -  REFTEST INFO | Loading a blank page
13:58:16     INFO -  REFTEST TEST-END | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/bugs/508908-1.xul

14:07:15     INFO -  REFTEST FINISHED: Slowest test took 94708ms (file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/bugs/508908-1-ref.xul)
14:07:15     INFO -  REFTEST INFO | Result summary:
14:07:15     INFO -  REFTEST INFO | Successful: 9959 (9940 pass, 19 load only)
14:07:15  WARNING -  REFTEST INFO | Unexpected: 1 (1 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception)
14:07:15  WARNING -  One or more unittests failed.
14:07:15     INFO -  REFTEST INFO | Known problems: 385 (206 known fail, 0 known asserts, 111 random, 68 skipped, 0 slow)
14:07:15     INFO -  REFTEST INFO | Total canvas count = 8
14:07:17     INFO -  INFO | automation.py | Application ran for: 0:14:50.774000
14:07:17     INFO -  INFO | zombiecheck | Reading PID log: c:\users\cltbld~1.t-w\appdata\local\temp\tmplxtqk4pidlog
14:07:17     INFO -  ==> process 3556 launched child process 340 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel=3556.ea9b9b0.1225346891 "c:\users\cltbld~1.t-w\appdata\local\temp\tmphmdsdr.mozrunner\plugins\nptest.dll" -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" - 3556 "\\.\pipe\gecko-crash-server-pipe.3556" plugin)
14:07:17     INFO -  ==> process 3556 launched child process 1132 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel=3556.bbd1bc0.2007691327 "c:\users\cltbld~1.t-w\appdata\local\temp\tmphmdsdr.mozrunner\plugins\nptest.dll" -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" - 3556 "\\.\pipe\gecko-crash-server-pipe.3556" plugin)
14:07:17     INFO -  ==> process 3556 launched child process 4084 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel=3556.8271670.337672489 "c:\users\cltbld~1.t-w\appdata\local\temp\tmphmdsdr.mozrunner\plugins\nptest.dll" -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" - 3556 "\\.\pipe\gecko-crash-server-pipe.3556" plugin)
14:07:17     INFO -  mozcrash INFO | Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/fx-team-win32/1387225007/firefox-29.0a1.en-US.win32.crashreporter-symbols.zip
14:07:27  WARNING -  PROCESS-CRASH | Main app process exited normally | application crashed [@ google_breakpad::ExceptionHandler::WriteMinidump(std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > const &,bool (*)(wchar_t const *,wchar_t const *,void *,_EXCEPTION_POINTERS *,MDRawAssertionInfo *,bool),void *)]
14:07:27     INFO -  Crash dump filename: c:\users\cltbld~1.t-w\appdata\local\temp\tmphmdsdr.mozrunner\minidumps\815a9ffd-35d9-4e97-8dc8-50f2f56ae057-browser.dmp
14:07:27     INFO -  Operating system: Windows NT
14:07:27     INFO -                    6.2.9200
14:07:27     INFO -  CPU: x86
14:07:27     INFO -       GenuineIntel family 6 model 30 stepping 5
14:07:27     INFO -       8 CPUs
14:07:27     INFO -  Crash reason:  EXCEPTION_NONCONTINUABLE_EXCEPTION
14:07:27     INFO -  Crash address: 0x0
14:07:27     INFO -  Thread 0 (crashed)
14:07:27     INFO -   0  xul.dll!google_breakpad::ExceptionHandler::WriteMinidump(std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > const &,bool (*)(wchar_t const *,wchar_t const *,void *,_EXCEPTION_POINTERS *,MDRawAssertionInfo *,bool),void *) [exception_handler.cc:8ab821223047 : 741 + 0xb]
14:07:27     INFO -      eip = 0x73154abe   esp = 0x006ee974   ebp = 0x006eea54   ebx = 0x00000000
14:07:27     INFO -      esi = 0x006ee978   edi = 0x000008b4   eax = 0x006ee64c   ecx = 0x006ee918
14:07:27     INFO -      edx = 0x00000000   efl = 0x00200212
14:07:27     INFO -      Found by: given as instruction pointer in context
14:07:27     INFO -   1  xul.dll!CrashReporter::CreatePairedMinidumps(void *,unsigned long,nsIFile * *) [nsExceptionHandler.cpp:8ab821223047 : 2806 + 0x14]
14:07:27     INFO -      eip = 0x73150674   esp = 0x006eea5c   ebp = 0x006eead8
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -   2  xul.dll!mozilla::dom::CrashReporterParent::GeneratePairedMinidump<mozilla::plugins::PluginModuleParent>(mozilla::plugins::PluginModuleParent *) [CrashReporterParent.h:8ab821223047 : 117 + 0x10]
14:07:27     INFO -      eip = 0x729af570   esp = 0x006eeae0   ebp = 0x006eeaf0
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -   3  xul.dll!mozilla::plugins::PluginModuleParent::TerminateChildProcess(MessageLoop *) [PluginModuleParent.cpp:8ab821223047 : 457 + 0x7]
14:07:27     INFO -      eip = 0x729b09c9   esp = 0x006eeaf8   ebp = 0x006eec04
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -   4  xul.dll!mozilla::plugins::PluginModuleParent::ShouldContinueFromReplyTimeout() [PluginModuleParent.cpp:8ab821223047 : 425 + 0xc]
14:07:27     INFO -      eip = 0x729b1182   esp = 0x006eec0c   ebp = 0x006eeca0
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -   5  xul.dll!mozilla::ipc::MessageChannel::ShouldContinueFromTimeout() [MessageChannel.cpp:8ab821223047 : 1219 + 0xb]
14:07:27     INFO -      eip = 0x7252049c   esp = 0x006eec18   ebp = 0x006eeca0
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -   6  xul.dll!mozilla::ipc::MessageChannel::InterruptCall(IPC::Message *,IPC::Message *) [MessageChannel.cpp:8ab821223047 : 624 + 0x8]
14:07:27     INFO -      eip = 0x7252321a   esp = 0x006eec28   ebp = 0x006eeca0
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -   7  xul.dll!mozilla::ipc::MessageChannel::Call(IPC::Message *,IPC::Message *) [MessageChannel.cpp:8ab821223047 : 557 + 0x4]
14:07:27     INFO -      eip = 0x725239ac   esp = 0x006eeca8   ebp = 0x006eecb4
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -   8  xul.dll!mozilla::plugins::PPluginInstanceParent::CallNPP_Destroy(short *) [PPluginInstanceParent.cpp:8ab821223047 : 748 + 0xf]
14:07:27     INFO -      eip = 0x72598ca9   esp = 0x006eecbc   ebp = 0x006eecf8
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -   9  xul.dll!mozilla::plugins::PluginInstanceParent::Destroy() [PluginInstanceParent.cpp:8ab821223047 : 149 + 0xa]
14:07:27     INFO -      eip = 0x729aa983   esp = 0x006eed00   ebp = 0x006eed0c
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -  10  xul.dll!mozilla::plugins::PluginModuleParent::NPP_Destroy(_NPP *,_NPSavedData * *) [PluginModuleParent.cpp:8ab821223047 : 879 + 0x6]
14:07:27     INFO -      eip = 0x729aed18   esp = 0x006eed14   ebp = 0x006eed20
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -  11  xul.dll!nsNPAPIPluginInstance::Stop() [nsNPAPIPluginInstance.cpp:8ab821223047 : 327 + 0x12]
14:07:27     INFO -      eip = 0x7298f639   esp = 0x006eed28   ebp = 0x006eed68
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -  12  xul.dll!nsPluginHost::StopPluginInstance(nsNPAPIPluginInstance *) [nsPluginHost.cpp:8ab821223047 : 2949 + 0x9]
14:07:27     INFO -      eip = 0x729946c3   esp = 0x006eed70   ebp = 0x006eedac
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -  13  xul.dll!nsObjectLoadingContent::StopPluginInstance() [nsObjectLoadingContent.cpp:8ab821223047 : 2855 + 0x58]
14:07:27     INFO -      eip = 0x72b9f273   esp = 0x006eedb4   ebp = 0x006eedcc
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -  14  xul.dll!FreezeElement [nsPresShell.cpp:8ab821223047 : 7694 + 0x5]
14:07:27     INFO -      eip = 0x72e6fd34   esp = 0x006eedd4   ebp = 0x006eeddc
14:07:27     INFO -      Found by: call frame info
14:07:27     INFO -  15  xul.dll!EnumWalkAllRules [nsBindingManager.cpp:8ab821223047 : 856 + 0xd]
14:07:27     INFO -      eip = 0x72d6ac44   esp = 0x006eede4   ebp = 0x006eedec
14:07:27     INFO -      Found by: call frame info
Per those calls:
  PluginModuleParent::ShouldContinueFromReplyTimeout()
  MessageChannel::ShouldContinueFromTimeout()

... it is a plugin hang while waiting for the plugin to shut down (CallNPP_Destroy(), StopPluginInstance(), ...).
With nothing else in the log jumping out on me at first glance it looks like the test plugin took too long to shut down here - that would mean it exceeded the 45sec timeout though.
Hm, the above have all timed out here, just waiting for the childs reply:
http://hg.mozilla.org/mozilla-central/file/940c52e93281/ipc/glue/MessageChannel.cpp#l624

The childs stack is:
* gfx::SharedDIBWin::SetupSurface(HDC__ *,BITMAPV4HEADER *) [SharedDIBWin.cpp:940c52e93281 : 130 + 0x1f]
* gfx::SharedDIBWin::Create(HDC__ *,unsigned int,unsigned int,bool) [SharedDIBWin.cpp:940c52e93281 : 65 + 0xb]
* gfx::SharedDIBSurface::Create(HDC__ *,unsigned int,unsigned int,bool) [SharedDIBSurface.cpp:940c52e93281 : 21 + 0x15]
* PluginInstanceChild::CreateOptSurface() [PluginInstanceChild.cpp:940c52e93281 : 2825 + 0x12]
* PluginInstanceChild::EnsureCurrentBuffer() [PluginInstanceChild.cpp:940c52e93281 : 2938 + 0x6]
* PluginInstanceChild::ShowPluginFrame() [PluginInstanceChild.cpp:940c52e93281 : 3337 + 0x6]
* PluginInstanceChild::InvalidateRectDelayed() [PluginInstanceChild.cpp:940c52e93281 : 3588 + 0x6]
* MessageLoop::RunTask(Task *)

I wonder if something can go wrong with setting up surfaces when we're tearing down the plugin.
seems same problem for 645951-1.html
https://tbpl.mozilla.org/php/getParsedLog.php?id=34860083&tree=Fx-Team

Woah, sure started hit these a lot today...
Flags: needinfo?(benjamin)
This is an unexpected plugin hang.

Browser stack:
* event loop...
* nsDocument::UnblockOnload
* nsDocLoader::doStopDocumentLoad...
* PresShell::UnsuppressPainting...
* nsDocumentViewer::Show
* nsDocumentViewer::Destroy()
* PresShell::Freeze()
* nsIDocument::EnumerateFreezableElements...
* nsObjectLoadingContent::StopPluginInstance
* nsObjectLoadingContent::DoStopPlugin...
* mozilla::plugins::PPluginInstanceParent::CallNPP_Destroy...
* IPC hung

Plugin stack:
* event loop...
* mozilla::plugins::PluginInstanceChild::InvalidateRectDelayed
* mozilla::plugins::PluginInstanceChild::ShowPluginFrame()
* mozilla::plugins::PluginInstanceChild::EnsureCurrentBuffer()
* mozilla::plugins::PluginInstanceChild::CreateOptSurface()
* mozilla::gfx::SharedDIBSurface::Create(HDC__ *,unsigned int,unsigned int,bool)
* mozilla::gfx::SharedDIBWin::Create(HDC__ *,unsigned int,unsigned int,bool)
* mozilla::gfx::SharedDIBWin::SetupSurface(HDC__ *,BITMAPV4HEADER *)
* unknown GDI and ntdll stuff

I don't have a clear indication of IPC deadlock or infinite looping or anything abnormal in those stacks, so I'm not sure I have many more ideas here. Are the slaves overloaded to the point where normal testing operations might hit a 45-second timeout?

Ryan do you have a regression range link I could look through?
Flags: needinfo?(benjamin) → needinfo?(ryanvm)
First things first - I can't help but notice that all of these instances today are on PGO builds (as they were in most of the prior instances as well). For whatever that's worth.

https://tbpl.mozilla.org/php/getParsedLog.php?id=34820405&tree=Mozilla-Inbound (https://tbpl.mozilla.org/?rev=5775ed3239fa&tree=Mozilla-Inbound) appears to the earliest instance (and appearances on other trees do seem to be after this was merged to them). Retriggers on the PGO run prior to that are green (unfortunately, there was another one in the middle of that range that was canceled, oh well I guess). So going off that, here's the range between the last good PGO run and the failing one:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=4f9f58d41eac&tochange=5775ed3239fa

Maybe something from evilpies' push?
Flags: needinfo?(ryanvm)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #21)
> Are the slaves overloaded to the point where normal testing operations
> might hit a 45-second timeout?

And this seems doubtful given that it does reproduce on rev 5775ed3239fa and not on rev 4f9f58d41eac in the range above.
Nothing obvious steps out :-(

Some fairly unlikely possibilities:

9945bc3b7d36	Masatoshi Kimura — Bug 969918 - Use decltype to declare pointers for dynamic-loaded functions. r=jimm sr=roc
Only if there were a bug in the code which caused us to stop importing some important function.

43e799fdb80b	Masayuki Nakano — Bug 965685 Use IMENotification struct for nsIWidget::NotifyIME() and merge NotifyIMEOfTextChange() with it r=smaug, sr=roc
Mainly because IME is known to do awful synchronous API calls on Windows. But the tests shouldn't be using IME and the hang doesn't point to widget-layer stuff.

47b135232731	Timothy Nikkel — Bug 970250. Add the ability to set a display port on scroll frames that aren't the root scroll frame to reftests. r=dbaron
But that seems exceedingly unlikely.

I'm a bit stumped.
I can reproduce on rev 43e799fdb80b, so triggering more further back.
Updated regression range:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9a268a4c902b&tochange=2dd72beeb8f2

Rev 2dd72beeb8f2 is confirmed crashing. Running PGO on rev right now 9a268a4c902b.
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #35)
> Rev 2dd72beeb8f2 is confirmed crashing. Running PGO on rev right now
> 9a268a4c902b.

That second rev should have been aa631f68a4fe (bug 972099), sorry.
Rev aa631f68a4fe is showing the issue too, so it appears that this is a regression from bug 972099.
Blocks: 972099
Backing out bug 972099 per Ben's recommendation on IRC.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #21)
> This is an unexpected plugin hang.
> ...

We've been under the impression that Windows builds were running with color management system "force sRGB is on" preference setting.  Turns out, there was a period between starting Firefox and that option being set that anybody checking the preference or dealing with images, etc. was running with the default "force sRGB is off".
With bug 972099, that "force sRGB on" is now set from the very beginning.

Knowing absolutely nothing about plug-ins or not much else, but seeing an image comparison up there, my guess is that this somehow has an effect. I'll push to try a simple change to the default settings of gfx.color_management.force_srgb set to true, as that's basically what bug 972099 was doing, but for reftests only.

Any of this make any sense?
Flags: needinfo?(benjamin)
gfx::SharedDIBWin::SetupSurface does call ::CreateDIBSection, which does pay attention to the color management stuff.  Forgot to mention above that the existing code calls ShutdownCMS right at the beginning, as the preference is changed from default to, well, not, which releases the QCMS profile, never to come back.  With the change in bug 972099, that release does not happen until the gfxPlatform::Shutdown.  Seems  related - the patch in bug 972099 changes the shutdown behavior for color management, we hang in shutdown in a function that is affected by color management.
The problem is that the modified shutdown behavior is the correct one.  Benoit, any thoughts?
Flags: needinfo?(bgirard)
I suppose PGO/LTO doing something funny would also fit that pattern then too?
The gfx::SharedDIBWin::SetupSurface call in question is not running in the Firefox process, it's running in the plugin process, which doesn't have any access to preferences (since preferences are an XPCOM construct).

I suspect that there is some kind of mismatch between the processes causing the issue, but I don't actually know what would cause a hang here.
Flags: needinfo?(benjamin)
Right, it isn't so much the preference, it's the fact that the preference now has a different default, so isn't modified after the initial value, so QCMS shutdown does more stuff than it previously did.
I don't think that shutdown is related to this at all. The problem seems to be that CreateDIBSection is now hanging because the CMS value changed.
OK, you're saying not related to shutdown of CMS, but to CreateDIBSection being called with the existing QCMS output profile, where before it wasn't there? Anyway, I'll leave it to :BenWa to comment.
Bas, BenWa, I'll leave the needinfo on you so that we don't forget, but this should wait until 31.
Closing bugs where TBPLbot has previously commented, but have now not been modified for >3 months & do not contain the whiteboard strings for disabled/annotated tests or use the keyword leave-open. Filter on: mass-intermittent-bug-closure-2014-07
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Flags: needinfo?(bgirard)
Flags: needinfo?(bas)
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.