Closed Bug 1268332 Opened 9 years ago Closed 9 years ago

Frequent WinXP debug e10s test_zmedia_cleanup.html | application crashed [@ RtlpWaitForCriticalSection + 0x5b] on Ash

Categories

(Core :: IPC, defect)

Unspecified
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
e10s + ---
firefox47 --- unaffected
firefox48 --- fixed
firefox49 --- fixed
firefox-esr45 --- unaffected

People

(Reporter: RyanVM, Assigned: jya)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [e10s-orangeblockers] fixed by bug 1264694)

+++ This bug was initially created as a clone of Bug #1264694 +++

This has a separate regression range from bug 1264694. AFAICT, going off Try bisection, this first started happening around the time that bug 1262898 landed.

https://treeherder.mozilla.org/logviewer.html#?job_id=171856&repo=ash

09:55:13 WARNING - TEST-UNEXPECTED-FAIL | dom/media/tests/mochitest/test_zmedia_cleanup.html | application terminated with exit code 1
09:55:13 INFO - runtests.py | Application ran for: 0:04:26.487000
09:55:13 INFO - zombiecheck | Reading PID log: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpcs8jpkpidlog
09:55:13 INFO - ==> process 3840 launched child process 1380 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel="3840.0.1410468376\1186251921" -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -sandbox -appdir "C:\slave\test\build\application\firefox\browser" 3840 "\\.\pipe\gecko-crash-server-pipe.3840" tab)
09:55:13 INFO - ==> process 3840 launched child process 2416 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel="3840.5.994893901\2116887052" -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -sandbox -appdir "C:\slave\test\build\application\firefox\browser" 3840 "\\.\pipe\gecko-crash-server-pipe.3840" tab)
09:55:13 INFO - ==> process 3840 launched child process 792 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel="3840.11.710806168\1852140024" "c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpms4re_.mozrunner\plugins\gmp-fakeopenh264\1.0" "C:\slave\test\build\application\firefox\voucher.bin" -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -sandbox -appdir "C:\slave\test\build\application\firefox\browser" 3840 "\\.\pipe\gecko-crash-server-pipe.3840" geckomediaplugin)
09:55:13 INFO - ==> process 3840 launched child process 2588 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel="3840.13.1497891729\709073335" "c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpms4re_.mozrunner\plugins\gmp-fakeopenh264\1.0" "C:\slave\test\build\application\firefox\voucher.bin" -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -sandbox -appdir "C:\slave\test\build\application\firefox\browser" 3840 "\\.\pipe\gecko-crash-server-pipe.3840" geckomediaplugin)
09:55:13 INFO - zombiecheck | Checking for orphan process with PID: 1380
09:55:13 INFO - zombiecheck | Checking for orphan process with PID: 2416
09:55:13 INFO - zombiecheck | Checking for orphan process with PID: 792
09:55:13 INFO - zombiecheck | Checking for orphan process with PID: 2588
09:55:13 INFO - mozcrash Copy/paste: C:\slave\test\build\win32-minidump_stackwalk.exe c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpms4re_.mozrunner\minidumps\0d8c5de2-b2d3-40ee-936c-054ffa13542e.dmp C:\slave\test\build\symbols
09:55:31 INFO - mozcrash Saved minidump as C:\slave\test\build\blobber_upload_dir\0d8c5de2-b2d3-40ee-936c-054ffa13542e.dmp
09:55:31 INFO - mozcrash Saved app info as C:\slave\test\build\blobber_upload_dir\0d8c5de2-b2d3-40ee-936c-054ffa13542e.extra
09:55:31 WARNING - PROCESS-CRASH | dom/media/tests/mochitest/test_zmedia_cleanup.html | application crashed [@ RtlpWaitForCriticalSection + 0x5b]
09:55:31 INFO - Crash dump filename: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpms4re_.mozrunner\minidumps\0d8c5de2-b2d3-40ee-936c-054ffa13542e.dmp
09:55:31 INFO - Operating system: Windows NT
09:55:31 INFO - 5.1.2600 Service Pack 3
09:55:31 INFO - CPU: x86
09:55:31 INFO - GenuineIntel family 6 model 30 stepping 5
09:55:31 INFO - 8 CPUs
09:55:31 INFO - Crash reason: EXCEPTION_ACCESS_VIOLATION_WRITE
09:55:31 INFO - Crash address: 0xffffffffe5e5e5f5
09:55:31 INFO - Process uptime: 267 seconds
09:55:31 INFO - Thread 3 (crashed)
09:55:31 INFO - 0 ntdll.dll!RtlpWaitForCriticalSection + 0x5b
09:55:31 INFO - eip = 0x7c91b1fa esp = 0x01affc70 ebp = 0x01affce4 ebx = 0x00000000
09:55:31 INFO - esi = 0x138ce86c edi = 0x00000000 eax = 0xe5e5e5e5 ecx = 0x00000000
09:55:31 INFO - edx = 0x138ce86c efl = 0x00010282
09:55:31 INFO - Found by: given as instruction pointer in context
09:55:31 INFO - 1 ntdll.dll!RtlEnterCriticalSection + 0x46
09:55:31 INFO - eip = 0x7c901046 esp = 0x01affcec ebp = 0x01affd14 ebx = 0x138ce86c
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 2 xul.dll!LockImpl::Lock() [lock_impl_win.cc:ab0044bfa1df : 45 + 0x7]
09:55:31 INFO - eip = 0x03733cb7 esp = 0x01affcf4 ebp = 0x01affd14
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 3 xul.dll!MessageLoop::PostTask_Helper(tracked_objects::Location const &,Task *,int) [message_loop.cc:ab0044bfa1df : 311 + 0x7]
09:55:31 INFO - eip = 0x037462c9 esp = 0x01affd1c ebp = 0x01affd50
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 4 xul.dll!MessageLoop::PostTask(tracked_objects::Location const &,Task *) [message_loop.cc:ab0044bfa1df : 263 + 0xd]
09:55:31 INFO - eip = 0x03746230 esp = 0x01affd58 ebp = 0x01affd64
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 5 xul.dll!mozilla::ipc::MessageChannel::PostErrorNotifyTask() [MessageChannel.cpp:ab0044bfa1df : 2109 + 0x25]
09:55:31 INFO - eip = 0x03773541 esp = 0x01affd6c ebp = 0x01affd84
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 6 xul.dll!mozilla::ipc::ProcessLink::OnChannelError() [MessageLink.cpp:ab0044bfa1df : 428 + 0x8]
09:55:31 INFO - eip = 0x0377220b esp = 0x01affd8c ebp = 0x01affdc8
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 7 xul.dll!IPC::Channel::ChannelImpl::OnIOCompleted(base::MessagePumpForIO::IOContext *,unsigned long,unsigned long) [ipc_channel_win.cc:ab0044bfa1df : 537 + 0x8]
09:55:31 INFO - eip = 0x03739995 esp = 0x01affd9c ebp = 0x01affdc8
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 8 xul.dll!base::MessagePumpForIO::WaitForIOCompletion(unsigned long,base::MessagePumpForIO::IOHandler *) [message_pump_win.cc:ab0044bfa1df : 492 + 0xc]
09:55:31 INFO - eip = 0x03734ed6 esp = 0x01affdd0 ebp = 0x01affe00
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 9 xul.dll!base::MessagePumpForIO::DoRunLoop() [message_pump_win.cc:ab0044bfa1df : 438 + 0xb]
09:55:31 INFO - eip = 0x037345f7 esp = 0x01affe08 ebp = 0x01affe34
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 10 xul.dll!base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate *,base::MessagePumpWin::Dispatcher *) [message_pump_win.cc:ab0044bfa1df : 54 + 0x5]
09:55:31 INFO - eip = 0x03734d3a esp = 0x01affe1c ebp = 0x01affe34
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 11 xul.dll!base::MessagePumpWin::Run(base::MessagePump::Delegate *) [message_pump_win.h:ab0044bfa1df : 78 + 0xd]
09:55:31 INFO - eip = 0x03734cfa esp = 0x01affe3c ebp = 0x01affe44
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 12 xul.dll!MessageLoop::RunInternal() [message_loop.cc:ab0044bfa1df : 230 + 0xf]
09:55:31 INFO - eip = 0x03747c05 esp = 0x01affe4c ebp = 0x01affe64
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 13 xul.dll!MessageLoop::RunHandler() [message_loop.cc:ab0044bfa1df : 223 + 0x5]
09:55:31 INFO - eip = 0x03747bbd esp = 0x01affe6c ebp = 0x01affe98
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 14 xul.dll!MessageLoop::Run() [message_loop.cc:ab0044bfa1df : 203 + 0x7]
09:55:31 INFO - eip = 0x03747916 esp = 0x01affea0 ebp = 0x01affeb8
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 15 xul.dll!base::Thread::ThreadMain() [thread.cc:ab0044bfa1df : 174 + 0xb]
09:55:31 INFO - eip = 0x03748741 esp = 0x01affec0 ebp = 0x01afffac
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 16 xul.dll!`anonymous namespace'::ThreadFunc [platform_thread_win.cc:ab0044bfa1df : 26 + 0x8]
09:55:31 INFO - eip = 0x03735ab1 esp = 0x01afffb4 ebp = 0x01afffb4
09:55:31 INFO - Found by: call frame info
09:55:31 INFO - 17 kernel32.dll!BaseThreadStart + 0x37
09:55:31 INFO - eip = 0x7c80b713 esp = 0x01afffbc ebp = 0x01afffec
09:55:31 INFO - Found by: call frame info
(In reply to Ryan VanderMeulen [:RyanVM] from comment #0) > +++ This bug was initially created as a clone of Bug #1264694 +++ > > This has a separate regression range from bug 1264694. AFAICT, going off Try > bisection, this first started happening around the time that bug 1262898 > landed. > > [...] I'll have a look at this. I suspect the reason this started happening after bug 1262898 is that we simply weren't cleaning up causing stuff to leak rather than the actual crash having anything directly to do with this.
Hrm, I cannot reproduce this issue on central, and on ash I can't seem to run mochitests locally at all.
Ash is straight-up mozilla-central. It just has e10s tests enabled on platforms we can't run in production due to capacity constraints. Note that e10s is enabled by default now for local mochitest runs, so just |./mach mochitest dom/media/tests/mochitest| should give you what you want.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #3) > Ash is straight-up mozilla-central. It just has e10s tests enabled on > platforms we can't run in production due to capacity constraints. Note that > e10s is enabled by default now for local mochitest runs, so just |./mach > mochitest dom/media/tests/mochitest| should give you what you want. Yeah, I did run the e10s tests on m-c. I've tried fiddling with prefs to make things more like they would be on XP. Matt.. which prefs would I have to switch in media land to get as close as possible to an XP environment?
Flags: needinfo?(matt.woodrow)
Chris would be a better person to answer this. I don't think XP supports WMF at all, so media.wmf.enabled=false.
Flags: needinfo?(matt.woodrow) → needinfo?(cpearce)
(In reply to Matt Woodrow (:mattwoodrow) from comment #5) > Chris would be a better person to answer this. > > I don't think XP supports WMF at all, so media.wmf.enabled=false. At least with that pref, still no luck reproducing anything.
Not sure I understand the rationale for P5 here. This is an e10s near-permafail on a Tier 1 supported platform on a job that would be running in production if we had enough test machines to actually do so. If we can't prioritize investigating this issue, I'm going to have to resort to bisecting and backing out whatever caused it.
Flags: needinfo?(ajones)
Whiteboard: [e10s-orangeblockers]
Chris - can you take a look at this one?
Flags: needinfo?(ajones)
Priority: P5 → P2
(In reply to Ryan VanderMeulen [:RyanVM] from comment #8) > Not sure I understand the rationale for P5 here. This is an e10s > near-permafail on a Tier 1 supported platform on a job that would be running > in production if we had enough test machines to actually do so. If we can't > prioritize investigating this issue, I'm going to have to resort to > bisecting and backing out whatever caused it. Backing that out would just make the tests go back to leaking, which affects far more than this one WinXP test. I believe it was my shutting down the child process properly that changed the timings in such a way that this started occurring. Having said that, I suspect it's closely related to bug 1264694, and I'm making progress there on finding the offenders behind this sort of issue.
(In reply to Bas Schouten (:bas.schouten) from comment #4) > Matt.. which prefs would I have to switch in media land to get as close as > possible to an XP environment? (In reply to Matt Woodrow (:mattwoodrow) from comment #5) > Chris would be a better person to answer this. > > I don't think XP supports WMF at all, so media.wmf.enabled=false. Set media.wmf.enabled=false and ensure that media.gmp.decoder.enabled=false. Note: test_zmedia_cleanup.html is a WebRTC-related test, not a media playback test.
Component: Audio/Video: Playback → WebRTC
Flags: needinfo?(cpearce)
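For anyone trying to reproduce this locally, the two prefs suggested above could be passed on the mochitest command line instead of being flipped by hand. The snippet below is only a sketch: the helper name `mochitest_command` is made up for illustration, and it assumes the mochitest harness accepts a `--setpref name=value` option, which should be checked against `./mach mochitest --help` on the tree in question.

```python
def mochitest_command(test_path, prefs):
    """Build an argv list for a local mochitest run with extra prefs.

    Assumption: the harness accepts `--setpref name=value`.
    `mochitest_command` itself is a hypothetical helper, not part of mach.
    """
    cmd = ["./mach", "mochitest"]
    for name, value in prefs.items():
        # Firefox prefs spell booleans in lowercase: true / false.
        rendered = str(value).lower() if isinstance(value, bool) else str(value)
        cmd += ["--setpref", f"{name}={rendered}"]
    cmd.append(test_path)
    return cmd


# Prefs suggested in this bug for approximating a WinXP media environment.
XP_LIKE_PREFS = {
    "media.wmf.enabled": False,
    "media.gmp.decoder.enabled": False,
}

if __name__ == "__main__":
    print(" ".join(mochitest_command("dom/media/tests/mochitest", XP_LIKE_PREFS)))
```

This only approximates the XP configuration; per the comments above, flipping prefs on another platform did not reproduce the crash.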
Will let the WebRTC team re-prioritize this.
Flags: needinfo?(mreavy)
Priority: P2 → --
Wasn't it determined that bug 1264694 was the culprit and the hang would only show up here because this test is the last one to run?
(In reply to Jean-Yves Avenard [:jya] from comment #13) > Wasn't it determined that bug 1264694 was the culprit and the hang would > only show up here because this test is the last one to run? Dunno. The thread stacks looked different to me.
They are. This is a use-after-free issue with WebRTC; this is not media/playback.
So from looking up the code from the stack dump, I don't see any WebRTC-related code in there. This is "just" all IPC code trying to deliver some event and crashing while trying to acquire a mutex, I'm guessing while Firefox is shutting down. Yes, this happens on the last WebRTC mochitest. As test_zmedia_cleanup.html is a B2G-specific test, we could actually just remove/disable that test. But I think that would only be hiding a potential real IPC problem. Maire: can you help find an IPC expert who can take a look at this and decide whether they want to investigate, or whether we should simply remove the test?
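The observation that the crashing stack contains only IPC code can be checked mechanically. The following is a rough sketch that pulls frame entries out of the stackwalk lines as they appear in the test log from comment 0 and searches the symbol names for a substring. The regular expression is fitted to the lines in this bug only, and the names `parse_frames` and `mentions` are invented for the example.

```python
import re

# Matches frame lines such as:
#   "09:55:31 INFO - 2 xul.dll!LockImpl::Lock() [lock_impl_win.cc:... : 45 + 0x7]"
#   "09:55:31 INFO - 0 ntdll.dll!RtlpWaitForCriticalSection + 0x5b"
# Register dumps and "Found by" lines do not match because no
# "<index> <module>!" prefix follows "INFO - ".
FRAME_RE = re.compile(r"INFO\s+-\s+(\d+)\s+(\S+?)!(.*?)(?: \[| \+ 0x|$)")


def parse_frames(log_text):
    """Return a list of (frame_index, module, function) tuples."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
    return frames


def mentions(frames, needle):
    """True if any frame's function name contains `needle` (case-insensitive)."""
    return any(needle.lower() in function.lower() for _, _, function in frames)


# A few lines copied from the crash log in comment 0.
SAMPLE = "\n".join([
    "09:55:31 INFO - Thread 3 (crashed)",
    "09:55:31 INFO - 0 ntdll.dll!RtlpWaitForCriticalSection + 0x5b",
    "09:55:31 INFO - eip = 0x7c91b1fa esp = 0x01affc70 ebp = 0x01affce4 ebx = 0x00000000",
    "09:55:31 INFO - 2 xul.dll!LockImpl::Lock() [lock_impl_win.cc:ab0044bfa1df : 45 + 0x7]",
    "09:55:31 INFO - 5 xul.dll!mozilla::ipc::MessageChannel::PostErrorNotifyTask() [MessageChannel.cpp:ab0044bfa1df : 2109 + 0x25]",
])

if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    print(frames)
    print("webrtc in stack:", mentions(frames, "webrtc"))
```

A missing substring only shows that the crashing thread was not executing WebRTC code at the time; it says nothing about which code freed the object.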
Ok. First, this is an e5e5 crash -> UAF -> sec issue. Second: IPC should never UAF, even if the code calling it does something stupid, so likely there's an IPC bug being tripped over here. Third: I strongly suspect given the regression-range from comment 0 that this is somehow related to the gfx and/or compositor IPC channels at shutdown/cleanup time (which is what's happening here; everything in shutdown gets tagged to the last test). Bas: what's the impact of bug 1262898? Jimm/Billm: any thoughts how that bug (or anything else) could trigger a UAF here in IPC? Ryan: Can we try this with bug 1262898 backed out just to verify that's the proximate cause, if this hasn't been done? Doing that on Try should be ok, especially if not clearly linked to a sec bug.
Group: media-core-security, core-security
Flags: needinfo?(wmccloskey)
Flags: needinfo?(ryanvm)
Flags: needinfo?(mreavy)
Flags: needinfo?(jmathies)
Flags: needinfo?(bas)
Component: WebRTC → IPC
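The "e5e5 crash -> UAF" reasoning above rests on the allocator's poison pattern: mozjemalloc overwrites freed memory with the byte 0xe5, so a pointer loaded from a freed object reads as 0xe5e5e5e5 on a 32-bit build. In the log from comment 0, eax is 0xe5e5e5e5 and the crash address is that value plus 0x10. The snippet below is a small sketch of that check; the function name and the 0x1000 offset window are arbitrary choices for illustration.

```python
POISON_BYTE = 0xE5  # byte mozjemalloc writes over freed memory


def poison_offset(address, pointer_bits=32, max_offset=0x1000):
    """Return the offset of `address` from the poison pointer, or None.

    The crash address in the log is sign-extended to 64 bits
    (0xffffffffe5e5e5f5), so it is masked to the pointer width first.
    `max_offset` is an arbitrary window covering small field offsets.
    """
    width = pointer_bits // 8
    addr = address & ((1 << pointer_bits) - 1)
    poison = int.from_bytes(bytes([POISON_BYTE]) * width, "big")
    offset = addr - poison
    return offset if 0 <= offset <= max_offset else None


if __name__ == "__main__":
    # Values taken from the crash report in comment 0.
    print(poison_offset(0xFFFFFFFFE5E5E5F5))  # crash address
    print(poison_offset(0x138CE86C))          # esi, an ordinary pointer
```

A small positive offset from the poison value is consistent with dereferencing a field of a structure reached through a freed pointer, which matches a critical section being entered on a lock whose owner has already been destroyed.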
Confirmed on Try that bug 1262898 is when this started. FWIW, bug 1271657 is currently masking this issue on Ash since we end up crashing before getting to the zmedia test. With any luck, a fix for that will land soon so we can see if bug 1264694 helps or not.
Flags: needinfo?(ryanvm)
(In reply to Randell Jesup [:jesup] from comment #17) > Ok. First, this is an e5e5 crash -> UAF -> sec issue. > > Second: IPC should never UAF, even if the code calling it does something > stupid, so likely there's an IPC bug being tripped over here. > > Third: I strongly suspect given the regression-range from comment 0 that > this is somehow related to the gfx and/or compositor IPC channels at > shutdown/cleanup time (which is what's happening here; everything in > shutdown gets tagged to the last test). > > Bas: what's the impact of bug 1262898? > Jimm/Billm: any thoughts how that bug (or anything else) could trigger a UAF > here in IPC? > Ryan: Can we try this with bug 1262898 backed out just to verify that's the > proximate cause, if this hasn't been done? Doing that on Try should be ok, > especially if not clearly linked to a sec bug. Before bug 1262898, on Windows all IPC channels and such related to graphics would simply get leaked. (This is why most of our tests were failing on e10s Windows, since they were leaking the world.) It's hard to be certain that this is a bug -in- IPC code, but if it is, the fact that we clean it up now certainly makes it not surprising that this started occurring. It should also be noted -before- bug 1262898 I guess we didn't have any cross-process IPC that was off the main thread in the parent process ever being shut down properly. It's possible this is somehow related? I'll have a look at this as well with the patch for bug 1271657.
Flags: needinfo?(bas)
Group: core-security → dom-core-security
I ran a Try push on top of m-c tip (which includes bug 1264694) and also applied Jeff's WIP patch from bug 1271657, and XP debug M-e10s(mda) looks much happier now: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2e92d85e56ea&group_state=expanded I can try to bisect the fix to confirm that it was bug 1264694 if people think it's worth the effort doing so. Or we can just dupe it over.
Usually this sort of error means that an IPC channel is still open while the actor or MessageLoop associated with it has been freed. We should deal with this better in the IPC code, but right now it's up to callers to do it correctly. I would have expected bug 1262898 to make this better rather than worse, but I guess it could have just changed the timing of something. > It should also be noted -before- bug 1262898 I guess we didn't have any cross-process IPC > that was off the main thread in the parent process ever being shut down properly. Well, PBackground, PProcessHangMonitor, and maybe some GMP stuff run off thread in the parent and shut down correctly as far as I know. I don't really understand why bug 1271657 would fix this.
Flags: needinfo?(wmccloskey)
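The failure mode described above, a channel that is still open after the MessageLoop it posts to has been freed, can be modelled in a few lines. This is a toy sketch only: `MessageLoop` and `Channel` here are hypothetical stand-ins, not the Gecko classes, and the weak reference is one possible way to make the lifetime problem visible rather than the fix that landed in bug 1264694.

```python
import gc
import weakref


class MessageLoop:
    """Toy stand-in for a thread's message loop (not the Gecko class)."""

    def __init__(self):
        self.tasks = []

    def post_task(self, task):
        self.tasks.append(task)


class Channel:
    """Toy stand-in for an IPC channel that reports errors to a loop."""

    def __init__(self, loop):
        # Hold the loop weakly so the channel can tell when it is gone,
        # instead of keeping a raw pointer that may dangle.
        self._loop_ref = weakref.ref(loop)

    def on_channel_error(self):
        """Post an error notification; return False if the loop is gone."""
        loop = self._loop_ref()
        if loop is None:
            # In the real crash this is where a freed lock was entered.
            return False
        loop.post_task("notify-error")
        return True


if __name__ == "__main__":
    loop = MessageLoop()
    channel = Channel(loop)
    print(channel.on_channel_error())  # loop alive: task is posted
    del loop
    gc.collect()
    print(channel.on_channel_error())  # loop destroyed: nothing to post to
```

In C++ the equivalent of the `loop is None` branch does not exist for a raw pointer, which is why the caller currently has to guarantee that the channel is closed before the loop or actor goes away.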
Bug 1264694 is what most-likely fixed it. Bug 1271657 was just making it impossible to tell (see comment 18).
(In reply to Ryan VanderMeulen [:RyanVM] from comment #22) > Bug 1264694 is what most-likely fixed it. Bug 1271657 was just making it > impossible to tell (see comment 18). Yep, that's my theory too, and my investigation concurs: post bug 1271657 I cannot reproduce this problem.
Flags: needinfo?(jmathies)
Assignee: nobody → jyavenard
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Group: media-core-security, dom-core-security → core-security-release
Whiteboard: [e10s-orangeblockers] → [e10s-orangeblockers] fixed by bug 1264694
Group: core-security-release