Closed Bug 1044245 Opened 10 years ago Closed 10 years ago

GMP with OpenH264 crashes on Windows.

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

x86_64
Windows 8.1
defect

Tracking

()

VERIFIED FIXED
mozilla34
Tracking Status
firefox33 --- verified
firefox34 --- verified

People

(Reporter: ehugg, Assigned: jesup)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

Testing OpenH264 works for a while on Windows and then crashes. On Debug builds it hits an assert in the dtor of SyncStackFrame. Here's a stack from a debug build:

 	KernelBase.dll!_DebugBreak@0()	Unknown
>	xul.dll!RealBreak() Line 504	C++
 	xul.dll!Break(const char * aMsg) Line 579	C++
 	xul.dll!NS_DebugBreak(unsigned int aSeverity, const char * aStr, const char * aExpr, const char * aFile, int aLine) Line 464	C++
 	xul.dll!mozilla::ipc::MessageChannel::SyncStackFrame::~SyncStackFrame() Line 660	C++
 	xul.dll!mozilla::ipc::MessageChannel::Send(IPC::Message * aMsg, IPC::Message * aReply) Line 593	C++
 	xul.dll!mozilla::layers::PLayerTransactionChild::SendUpdate(const nsTArray<mozilla::layers::Edit> & cset, const unsigned __int64 & id, const mozilla::layers::TargetConfig & targetConfig, const bool & isFirstPaint, const bool & scheduleComposite, const unsigned int & paintSequenceNumber, const bool & isRepeatTransaction, nsTArray<mozilla::layers::EditReply> * reply) Line 237	C++
 	xul.dll!mozilla::layers::ShadowLayerForwarder::EndTransaction(nsTArray<mozilla::layers::EditReply> * aReplies, const nsIntRegion & aRegionToClear, unsigned __int64 aId, bool aScheduleComposite, unsigned int aPaintSequenceNumber, bool aIsRepeatTransaction, bool * aSent) Line 586	C++
 	xul.dll!mozilla::layers::ClientLayerManager::ForwardTransaction(bool aScheduleComposite) Line 447	C++
 	xul.dll!mozilla::layers::ClientLayerManager::EndTransaction(void (mozilla::layers::ThebesLayer *, gfxContext *, const nsIntRegion &, mozilla::layers::DrawRegionClip, const nsIntRegion &, void *) * aCallback, void * aCallbackData, mozilla::layers::LayerManager::EndTransactionFlags aFlags) Line 240	C++
 	xul.dll!nsDisplayList::PaintForFrame(nsDisplayListBuilder * aBuilder, nsRenderingContext * aCtx, nsIFrame * aForFrame, unsigned int aFlags) Line 1306	C++
 	xul.dll!nsDisplayList::PaintRoot(nsDisplayListBuilder * aBuilder, nsRenderingContext * aCtx, unsigned int aFlags) Line 1159	C++
 	xul.dll!nsLayoutUtils::PaintFrame(nsRenderingContext * aRenderingContext, nsIFrame * aFrame, const nsRegion & aDirtyRegion, unsigned int aBackstop, unsigned int aFlags) Line 3063	C++
 	xul.dll!PresShell::Paint(nsView * aViewToPaint, const nsRegion & aDirtyRegion, unsigned int aFlags) Line 6236	C++
 	xul.dll!nsViewManager::ProcessPendingUpdatesPaint(nsIWidget * aWidget) Line 443	C++
 	xul.dll!nsViewManager::ProcessPendingUpdatesForView(nsView * aView, bool aFlushDirtyRegion) Line 386	C++
 	xul.dll!nsViewManager::ProcessPendingUpdates() Line 1076	C++
 	xul.dll!nsRefreshDriver::Tick(__int64 aNowEpoch, mozilla::TimeStamp aNowTime) Line 1280	C++
 	xul.dll!mozilla::RefreshDriverTimer::TickDriver(nsRefreshDriver * driver, __int64 jsnow, mozilla::TimeStamp now) Line 172	C++
 	xul.dll!mozilla::RefreshDriverTimer::Tick() Line 162	C++
 	xul.dll!mozilla::RefreshDriverTimer::TimerTick(nsITimer * aTimer, void * aClosure) Line 189	C++
 	xul.dll!nsTimerImpl::Fire() Line 618	C++
 	xul.dll!nsTimerEvent::Run() Line 716	C++
 	xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 770	C++
 	xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 265	C++
 	xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 99	C++
 	xul.dll!MessageLoop::RunInternal() Line 230	C++
 	xul.dll!MessageLoop::RunHandler() Line 223	C++
 	xul.dll!MessageLoop::Run() Line 197	C++
 	xul.dll!nsBaseAppShell::Run() Line 166	C++
 	xul.dll!nsAppShell::Run() Line 191	C++
 	xul.dll!nsAppStartup::Run() Line 278	C++
 	xul.dll!XREMain::XRE_mainRun() Line 4013	C++
 	xul.dll!XREMain::XRE_main(int argc, char * * argv, const nsXREAppData * aAppData) Line 4084	C++
 	xul.dll!XRE_main(int argc, char * * argv, const nsXREAppData * aAppData, unsigned int aFlags) Line 4298	C++
 	firefox.exe!do_main(int argc, char * * argv, nsIFile * xreDirectory) Line 282	C++
 	firefox.exe!NS_internal_main(int argc, char * * argv) Line 643	C++
 	firefox.exe!wmain(int argc, wchar_t * * argv) Line 105	C++
 	[External Code]
Whiteboard: [openh264-uplift]
My crash IDs from an Aurora install on Win7:

e8a06803-78d1-493c-8072-a49e02140727
45a76d8d-cc9d-44bc-ad94-e31122140727
2f26baf3-10be-44ff-bb63-563c72140727
Here are a couple of crashes from today's Nightly downloaded fresh from nightly.mozilla.org and with a fresh profile:


https://crash-stats.mozilla.com/report/index/4176064a-9f57-4903-b6ae-4457a2140727
https://crash-stats.mozilla.com/report/index/5af60d76-1a17-43bd-a7e5-730582140727
Assignee: nobody → rjesup
This will get uplifted to Fx33 once we have a fix.
Target Milestone: --- → mozilla34
Priority: -- → P1
Info from email discussion:

In a debug build on Windows:

[6260] ###!!! ASSERTION: Received "nonqueued" message 49456 during a synchronous IPC message for window -1978793492 ("MozillaHiddenWindowClass"), sending it to DefWindowProc instead of the normal window procedure.: 'Error', file c:/mozilla/inbound2/ipc/glue/WindowsMessageLoop.cpp, line 186
[6260] ###!!! ASSERTION: Received "nonqueued" message 49456 during a synchronous IPC message for window 2688224 ("nsAppShell:EventWindowClass"), sending it to DefWindowProc instead of the normal window procedure.: 'Error', file c:/mozilla/inbound2/ipc/glue/WindowsMessageLoop.cpp, line 186
[6260] ###!!! ASSERTION: Mismatched static Interrupt stack frames: 'this == sStaticTopFrame', file c:/mozilla/inbound2/ipc/glue/WindowsMessageLoop.cpp, line 660

[6260] ###!!! ASSERTION: Should only set this once!: '!gNeuteredWindows', file c:/mozilla/inbound2/ipc/glue/WindowsMessageLoop.cpp, line 649
1406454516905   addons.update-checker   WARN    Update manifest for {972ce4c6-7e08-4474-a285-3208198ce6fd} did not contain an updates property
[6260] ###!!! ASSERTION: Mismatched static Interrupt stack frames: 'this == sStaticTopFrame', file c:/mozilla/inbound2/ipc/glue/WindowsMessageLoop.cpp, line 660

They occur in bursts every minute or two. With the NS_ASSERTION
popup disabled, after 20 minutes I got a crash in WeakReference::get(),
from WindowsMessageLoop.cpp:687
(channel->Listener()->ProcessRemoteNativeEventsInInterruptCall();)

I presume they're related to the 'intr' class of IPC message used to
implement NeedShmem (a synchronous interrupting request for an Shmem
segment from the Child to the Parent), which we discussed before. Note
that the Parent may continue to queue async transactions to the child
before the Parent processes the intr request, and that allocating an
Shmem itself in response to the NeedShmem causes an IPC message (which,
if you recall, was why we moved it to the intr class to begin with: a
sync NeedShmem request fails when the parent needs to allocate an Shmem
in response).

I did notice that the MessageLoop stuff (and I think intr in particular)
is different between Windows and other platforms.


We may also want to (longer term) take a look at better ways to handle
shmem needs from a sandboxed Child that can't create them directly (due
to fd/open() issues). I have the NeedShmem, plus local pools on both
sides to minimize creation/destruction and the number of IPC requests
(NeedShmem). In the OpenH264 case this works well, since the Shmem
usage is fairly balanced (modulo framerate). Other uses of codecs (pure
decoders or encoders, etc.) may be permanently unbalanced, though similar
pooling and transfer mechanisms may help or handle this (piggybacking
them on other requests, or using the ShmemForPool IPC message I added to
allow one side to transfer ownership of an Shmem to the pool on the
other side).
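A minimal sketch of the per-side pooling idea, assuming a segment can be modelled as an owned byte vector rather than a real shared mapping. `Segment`, `ShmemPool`, `SimulateFrames` and the request counter are hypothetical names for illustration, not the GMP code; the counter stands in for NeedShmem round trips. In this model balanced usage costs a single request in total, one-way usage costs one request per frame, and a ShmemForPool-style transfer can refill the other side's pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Shmem segment. A real Shmem is a shared
// mapping; here it is just owned bytes so the sketch runs anywhere.
using Segment = std::vector<std::uint8_t>;

// Hypothetical per-side pool. Get() reuses a pooled segment that is large
// enough; otherwise it allocates and counts a simulated NeedShmem request.
class ShmemPool {
 public:
  Segment Get(std::size_t size) {
    assert(size > 0);
    for (auto it = mFree.begin(); it != mFree.end(); ++it) {
      if (it->size() >= size) {
        Segment s = std::move(*it);
        mFree.erase(it);
        return s;
      }
    }
    ++mNeedShmemRequests;  // would be an IPC round trip in the real code
    return Segment(size);
  }

  // Returns a segment to this side's pool for reuse.
  void Put(Segment s) { mFree.push_back(std::move(s)); }

  // Models the ShmemForPool idea: hand one pooled segment to the other side.
  bool TransferTo(ShmemPool& other) {
    if (mFree.empty()) {
      return false;
    }
    other.Put(std::move(mFree.back()));
    mFree.pop_back();
    return true;
  }

  int NeedShmemRequests() const { return mNeedShmemRequests; }
  std::size_t Pooled() const { return mFree.size(); }

 private:
  std::vector<Segment> mFree;
  int mNeedShmemRequests = 0;
};

// Runs `frames` fixed-size frames through one side's pool. If `balanced`,
// each segment comes back to the pool (frames flow both ways); otherwise it
// leaves this side for good, as with a pure encoder or decoder.
int SimulateFrames(ShmemPool& pool, int frames, bool balanced) {
  const std::size_t kFrameSize = 4096;
  for (int i = 0; i < frames; ++i) {
    Segment s = pool.Get(kFrameSize);
    if (balanced) {
      pool.Put(std::move(s));
    }
  }
  return pool.NeedShmemRequests();
}
```

The unbalanced case is where transfers (or piggybacking segments on other messages) would have to keep the starved side's pool topped up.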

One moderately painful (and not for uplift to 33) alternative would be
to implement our own shared-memory allocator on top of a set of large
shmem blocks. It's better not to need to constantly allocate at all,
and to avoid N IPC transactions per frame; best would be one in each
direction (DecodeThis() -> Decoded(), etc.) and no extra shmem traffic
at all, or so I presume.
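One possible shape for such an allocator, sketched under the assumption of fixed-size slots: carve a single large block into equal slots and hand out offsets instead of pointers, since the two processes may map the block at different addresses. `SlabAllocator` is a hypothetical name and the block is an ordinary vector standing in for one large Shmem mapping; a real version would need variable sizes and a way to keep both sides' view of free slots in sync.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical fixed-slot allocator over one large block. Offsets, not
// pointers, are handed out so they stay valid in either process.
class SlabAllocator {
 public:
  SlabAllocator(std::size_t blockSize, std::size_t slotSize)
      : mBlock(blockSize), mSlotSize(slotSize) {
    assert(slotSize > 0 && blockSize >= slotSize);
    const std::size_t slots = blockSize / slotSize;
    // Push in reverse so the lowest offset is handed out first.
    for (std::size_t i = slots; i > 0; --i) {
      mFreeSlots.push_back(i - 1);
    }
  }

  // Byte offset of a free slot, or nullopt when the block is exhausted
  // (the caller would then fall back to requesting another large block).
  std::optional<std::size_t> Alloc() {
    if (mFreeSlots.empty()) {
      return std::nullopt;
    }
    const std::size_t slot = mFreeSlots.back();
    mFreeSlots.pop_back();
    return slot * mSlotSize;
  }

  // Returns a slot; the offset must be one previously handed out by Alloc().
  void Free(std::size_t offset) {
    assert(offset % mSlotSize == 0 && offset < mBlock.size());
    mFreeSlots.push_back(offset / mSlotSize);
  }

  // Translates an offset into an address in this process's mapping.
  std::uint8_t* At(std::size_t offset) { return mBlock.data() + offset; }
  std::size_t FreeSlots() const { return mFreeSlots.size(); }

 private:
  std::vector<std::uint8_t> mBlock;  // stands in for one large Shmem mapping
  std::size_t mSlotSize;
  std::vector<std::size_t> mFreeSlots;
};
```

With this shape, per-frame messages would only carry offsets, which is what makes the "one message in each direction, no extra shmem traffic" goal plausible.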
I'm not having much luck reproducing this. I ran a debug build for about twenty minutes with debuggers attached to both processes and didn't get a crash. I do see the occasional "ASSERTION: Mismatched static Interrupt stack frames" error, but I think this is unrelated.
So I've managed to reproduce some problems using a laptop that has a real camera. My previous tests were using something called ManyCam on a system without a built-in camera.

I was able to reproduce the crash in MessageChannel::NotifyGeckoEventDispatch() during the call to ProcessRemoteNativeEventsInInterruptCall(). The crash occurred when trying to send an Encode message from PGMPVideoEncoderParent. Since this notification isn't actually needed unless you're running plugins that show modal dialogs, I commented it out and continued testing.

Next I got this crash in webrtc code:

mozglue.dll!jemalloc_crash(...) Line 1572	C
mozglue.dll!arena_run_reg_dalloc(arena_run_s * run=0x0fcd0000, arena_bin_s * bin=0x00250220, void * ptr=0x0fcd00e0, unsigned int size=32) Line 3270	C
mozglue.dll!arena_dalloc_small(arena_s * arena=0x00250040, arena_chunk_s * chunk=0x0fc00000, void * ptr=0x0fcd00e0, arena_chunk_map_s * mapelm=0x0fc009d0) Line 4462	C
mozglue.dll!arena_dalloc(void * ptr=0x0fcd00e0, unsigned int offset=852192) Line 4590	C
mozglue.dll!je_free(void * ptr=0x0fcd00e0) Line 6510	C
mozglue.dll!free_impl(void * ptr=0x0fcd00e0) Line 201	C
mozalloc.dll!moz_free(void * ptr=0x0fcd00e0) Line 46	C++
xul.dll!operator delete(void * ptr=0x0fcd00e0) Line 225	C++
xul.dll!std::allocator<std::_List_node<webrtc::media_optimization::MediaOptimization::EncodedFrameSample,void *> >::deallocate(std::_List_node<webrtc::media_optimization::MediaOptimization::EncodedFrameSample,void *> * _Ptr=0x0fcd00e0, unsigned int __formal=1) Line 586	C++
xul.dll!std::_Wrap_alloc<std::allocator<std::_List_node<webrtc::media_optimization::MediaOptimization::EncodedFrameSample,void *> > >::deallocate(std::_List_node<webrtc::media_optimization::MediaOptimization::EncodedFrameSample,void *> * _Ptr=0x0fcd00e0, unsigned int _Count=1) Line 888	C++
xul.dll!std::_List_buy<webrtc::media_optimization::MediaOptimization::EncodedFrameSample,std::allocator<webrtc::media_optimization::MediaOptimization::EncodedFrameSample> >::_Freenode(std::_List_node<webrtc::media_optimization::MediaOptimization::EncodedFrameSample,void *> * _Pnode=0x0fcd00e0) Line 862	C++
xul.dll!std::list<webrtc::media_optimization::MediaOptimization::EncodedFrameSample,std::allocator<webrtc::media_optimization::MediaOptimization::EncodedFrameSample> >::erase(std::_List_const_iterator<std::_List_val<std::_List_simple_types<webrtc::media_optimization::MediaOptimization::EncodedFrameSample> > > _Where={...}) Line 1434	C++
xul.dll!std::list<webrtc::media_optimization::MediaOptimization::EncodedFrameSample,std::allocator<webrtc::media_optimization::MediaOptimization::EncodedFrameSample> >::pop_front() Line 1283	C++
xul.dll!webrtc::media_optimization::MediaOptimization::PurgeOldFrameSamples(__int64 now_ms=1378372565) Line 495	C++
xul.dll!webrtc::media_optimization::MediaOptimization::SentBitRate() Line 340	C++
xul.dll!webrtc::vcm::VideoSender::Process() Line 94	C++
xul.dll!webrtc::`anonymous namespace'::VideoCodingModuleImpl::Process() Line 104	C++
xul.dll!webrtc::ProcessThreadImpl::Process() Line 174	C++
xul.dll!webrtc::ProcessThreadImpl::Run(void * obj=0x09462d60) Line 134	C++
xul.dll!webrtc::ThreadWindows::Run() Line 170	C++
xul.dll!webrtc::ThreadWindows::StartThread(void * lp_parameter=0x095d58d0) Line 67	C++
msvcr110.dll!_callthreadstartex() Line 354	C
msvcr110.dll!_threadstartex(void * ptd=0x1c507278) Line 332	C

Running some more I get:

xul.dll!mozilla::VectorBase<mozilla::ipc::MessageChannel::InterruptFrame,0,mozilla::MallocAllocPolicy,mozilla::Vector<mozilla::ipc::MessageChannel::InterruptFrame,0,mozilla::MallocAllocPolicy> >::reserved() Line 335	C++
xul.dll!mozilla::VectorBase<mozilla::ipc::MessageChannel::InterruptFrame,0,mozilla::MallocAllocPolicy,mozilla::Vector<mozilla::ipc::MessageChannel::InterruptFrame,0,mozilla::MallocAllocPolicy> >::append<mozilla::ipc::MessageChannel::InterruptFrame>(mozilla::ipc::MessageChannel::InterruptFrame && aU={...}) Line 1086	C++
xul.dll!mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame(mozilla::ipc::MessageChannel & that={...}, mozilla::ipc::Direction direction=IN_MESSAGE, const IPC::Message * msg=0x068af60c) Line 149	C++
xul.dll!mozilla::ipc::MessageChannel::OnMaybeDequeueOne() Line 1049	C++
xul.dll!DispatchToMethod<mozilla::ipc::MessageChannel,bool (__thiscall mozilla::ipc::MessageChannel::*)(void)>(mozilla::ipc::MessageChannel * obj=0x06f9d430, bool (void) * method=0x55a1b823, const Tuple0 & arg={...}) Line 384	C++
xul.dll!RunnableMethod<mozilla::ipc::MessageChannel,bool (__thiscall mozilla::ipc::MessageChannel::*)(void),Tuple0>::Run() Line 307	C++
xul.dll!mozilla::ipc::MessageChannel::RefCountedTask::Run() Line 390	C++
xul.dll!mozilla::ipc::MessageChannel::DequeueTask::Run() Line 407	C++
xul.dll!MessageLoop::RunTask(Task * task=0x1a0fecc0) Line 358	C++
xul.dll!MessageLoop::DeferOrRunPendingTask(const MessageLoop::PendingTask & pending_task={...}) Line 368	C++
xul.dll!MessageLoop::DoWork() Line 443	C++
xul.dll!mozilla::ipc::DoWorkRunnable::Run() Line 234	C++
xul.dll!nsThread::ProcessNextEvent(bool aMayWait=false, bool * aResult=0x068af7ad) Line 766	C++
xul.dll!NS_ProcessNextEvent(nsIThread * aThread=0x0641bb80, bool aMayWait=false) Line 265	C++
xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate * aDelegate=0x064891f0) Line 326	C++
xul.dll!MessageLoop::RunInternal() Line 230	C++
xul.dll!MessageLoop::RunHandler() Line 223	C++
xul.dll!MessageLoop::Run() Line 197	C++
xul.dll!nsThread::ThreadFunc(void * aArg=0x0641bb80) Line 353	C++
nss3.dll!_PR_NativeRunThread(void * arg=0x01e129b0) Line 397	C
nss3.dll!pr_root(void * arg=0x01e129b0) Line 90	C
msvcr110.dll!_callthreadstartex() Line 354	C
msvcr110.dll!_threadstartex(void * ptd=0x027f1010) Line 332	C

This was also on a send of an Encode message.

At this point I'm not sure what's going wrong. The randomness and the trashed memory I see in objects scream memory corruption, fwiw.

One other thing I've noticed when sending an Encode message, here:

http://mxr.mozilla.org/mozilla-central/source/content/media/gmp/GMPVideoEncoderParent.cpp#148

frameData didn't appear to be initialized: the lengths of the three color channels were positive, but the buffer was null. I don't know this code yet, so maybe this is expected. Will dig some more.
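If positive lengths with a null buffer are not expected, a defensive check before sending Encode would catch that state early instead of letting it reach the child. This is a sketch only: `Plane`, `FrameData` and `IsConsistent` are hypothetical and do not reflect the actual GMPVideoEncoderParent types.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: one entry per color channel (e.g. Y, U, V).
// These are not the real GMP frame types.
struct Plane {
  const std::uint8_t* buffer = nullptr;
  std::size_t length = 0;
};

struct FrameData {
  std::array<Plane, 3> planes;
};

// A channel that claims a non-zero length must point at real memory; the
// state observed above (positive lengths, null buffer) fails this check.
bool IsConsistent(const FrameData& frame) {
  for (const Plane& p : frame.planes) {
    if (p.length > 0 && p.buffer == nullptr) {
      return false;
    }
  }
  return true;
}
```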
Attached patch: patch for jesup (obsolete)
Disables special handling for plugins on Windows in the IPC code.
I wonder if there's some way we can move the parent off the main thread? Looks like we would bypass most of the stack frame stuff, which is really only meant for the ui thread.
Depends on: 1047442
(In reply to Jim Mathies [:jimm] from comment #9)
> I wonder if there's some way we can move the parent off the main thread?
> Looks like we would bypass most of the stack frame stuff, which is really
> only meant for the ui thread.

We only use the GMPParent's IPC channel from the GMPThread...

Perhaps this is the reason:
  // The GMPParent inherits from IToplevelProtocol, which must be created
  // on the main thread to be threadsafe. See Bug 1035653.
This patch addresses the crashing in the parent for me, after a run of about 20 minutes. I did get one child crash, which was a failed deserialization of an shmem request, but I'm guessing that's unrelated to this.
Attachment #8465741 - Attachment is obsolete: true
With your patch and my two (which are now on inbound), I've done a 6-hour call and am now 7 1/2 hours into a second, all in the same browser, without problems. Twice, near the start of the first call, I saw one decoder freeze, but "Disable video" followed by "Enable video" caused it to recover (note that this would cause a switch to pure black input and might induce an IDR as well). It may be a failed recovery, or it might be a decoder error that doesn't propagate up and trigger a refresh request (in async decode, errors might be reported differently). In any case, these freezes don't appear to have any relationship to the IPC issues here.
Attachment #8466404 - Attachment description: don't track stack frames on non-gui threads (wip) → don't track stack frames on non-gui threads v.1
Attachment #8466404 - Flags: review?(benjamin)
Comment on attachment 8466404
don't track stack frames on non-gui threads v.1


I applied this patch on top of M-C which now has the patches from 1047442 on it and I can no longer replicate this crash.
Note that in bug 1009590 we tried to fix this, but that work was incomplete. While we avoided all the GUI thread hoops, we didn't turn off stack checking.
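The mismatch can be illustrated with a simplified, single-threaded replay, assuming the GUI thread and a non-GUI thread (such as the one driving GMP IPC) both push frames onto one process-wide top-of-stack pointer, as the 'this == sStaticTopFrame' assertion suggests. `Frame`, `TopFrameTracker` and `ReplayInterleaving` are hypothetical; they are not the MessageChannel implementation. In the real case the two threads race, and unsynchronized access to a shared static is itself a data race; the replay just fixes one bad interleaving so the outcome is deterministic.

```cpp
// Hypothetical model of a stack frame linked into a shared stack.
struct Frame {
  Frame* prev = nullptr;
};

// One instance stands in for the single process-wide top-of-stack pointer.
// When guiThreadOnly is true, frames from other threads are not tracked.
class TopFrameTracker {
 public:
  explicit TopFrameTracker(bool guiThreadOnly) : mGuiThreadOnly(guiThreadOnly) {}

  void Enter(Frame& f, bool onGuiThread) {
    if (!Tracks(onGuiThread)) return;
    f.prev = mTop;
    mTop = &f;
  }

  // Returns false where a debug build would report mismatched frames.
  bool Leave(Frame& f, bool onGuiThread) {
    if (!Tracks(onGuiThread)) return true;
    if (mTop != &f) return false;
    mTop = f.prev;
    return true;
  }

 private:
  bool Tracks(bool onGuiThread) const { return onGuiThread || !mGuiThreadOnly; }

  bool mGuiThreadOnly;
  Frame* mTop = nullptr;
};

// Replays one interleaving: the GUI thread enters a frame, a worker thread
// enters another, then the GUI thread leaves before the worker does.
// Returns true if the GUI thread's Leave matched the top frame.
bool ReplayInterleaving(bool guiThreadOnly) {
  TopFrameTracker tracker(guiThreadOnly);
  Frame gui, worker;
  tracker.Enter(gui, /*onGuiThread=*/true);
  tracker.Enter(worker, /*onGuiThread=*/false);
  const bool guiMatched = tracker.Leave(gui, /*onGuiThread=*/true);
  tracker.Leave(worker, /*onGuiThread=*/false);
  return guiMatched;
}
```

With tracking limited to the GUI thread, in the spirit of "don't track stack frames on non-gui threads", the worker's frame never touches the shared pointer, so the GUI thread's pops stay matched.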
Attachment #8466404 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/5a9241e29c61
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment on attachment 8466404
don't track stack frames on non-gui threads v.1

Approval Request Comment
[Feature/regressing bug #]: OpenH264

[User impact if declined]: OpenH264 becomes effectively unusable on fx33; we'd have to defer the feature to 34.

[Describe test coverage new/current, TBPL]: Requires manual testing because a) hitting the bug randomly takes a long time, and b) we can't test against real OpenH264 plugins in TBPL.

[Risks and why]: Turns off a non-threadsafe optimization/construct for 'intr' protocols when used off the GUI thread. Good overall assertion coverage. GMP is the only protocol currently using 'intr' off the GUI thread.

[String/UUID change made/needed]: none
Attachment #8466404 - Flags: approval-mozilla-aurora?
Attachment #8466404 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
QA Contact: alexandra.lucinet
Reproduced the issue on old Nightly builds, verified as fixed on Windows 8.1 64bit using latest Nightly, latest Aurora, debug Nightly and Aurora.

Debug Nightly build 25.07.2014:
[6076] ###!!! ASSERTION: Mismatched static Interrupt stack frames: 'this == sStaticTopFrame', file c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/ipc/glue/WindowsMessageLoop.cpp, line 660

Nightly build 25.07.2014:
bp-4ab0d2f3-f9e3-4597-9fe4-be9682140811

Latest Nightly build 10.08.2014:
Works fine.

Debug Nightly build 10.08.2014:
Works fine.

Latest Aurora build 10.08.2014:
Works fine.

Debug Aurora build 10.08.2014:
Works fine.
Status: RESOLVED → VERIFIED
Whiteboard: [openh264-uplift]