If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

crash in sipcc::PeerConnectionWrapper::PeerConnectionWrapper

RESOLVED INCOMPLETE

Status

()

Core
WebRTC
--
critical
RESOLVED INCOMPLETE
4 years ago
2 years ago

People

(Reporter: jsmith, Unassigned)

Tracking

({crash})

22 Branch
x86
Windows XP
crash
Points:
---

Firefox Tracking Flags

(firefox22 affected)

Details

(Whiteboard: [WebRTC][blocking-webrtc-], crash signature)

Attachments

(2 attachments)

(Reporter)

Description

4 years ago
This bug was filed from the Socorro interface and is 
report bp-a7f63872-53d8-467f-a7a1-2c6e32130610 .
============================================================= 

Frame 	Module 	Signature 	Source
0 	xul.dll 	sipcc::PeerConnectionWrapper::PeerConnectionWrapper 	media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp:1398
1 	xul.dll 	vcmSetIceMediaParams_m 	media/webrtc/signaling/src/media/VcmSIPCCBinding.cpp:833
2 	xul.dll 	mozilla::dom::NativeInterface2JSObjectAndThrowIfFailed 	dom/bindings/BindingUtils.cpp:593
3 	xul.dll 	mozilla::dom::WrapNativeISupportsParent<nsISupports> 	obj-firefox/dist/include/mozilla/dom/BindingUtils.h:971
4 	xul.dll 	nsClientRect::WrapObject 	content/html/content/src/nsClientRect.cpp:45
5 	xul.dll 	mozilla::dom::ElementBinding::getBoundingClientRect 	obj-firefox/dom/bindings/ElementBinding.cpp:1371

Updated

4 years ago
status-firefox22: --- → affected
OS: Windows NT → Windows XP
Version: Trunk → 22 Branch
(Reporter)

Updated

4 years ago
Whiteboard: [WebRTC][blocking-webrtc-]

Comment 1

4 years ago
Problem might be with the jump from:
2. mozilla::dom::NativeInterface2JSObjectAndThrowIfFailed
To
1. vcmSetIceMediaParams_m

We might have invalid data within "NativeInterface2JSObjectAndThrowIfFailed" causing the jump to vcmSetIceMediaParams_m

If the jump is correct, then the crash is likely due to "PeerConnectionCtx::GetInstance()" being NULL.
i.e: 
- crash is read from address 0x8. 
- Inside "PeerConnectionCtx" class, "mPeerConnections" should have offset 0x8.


Bigger callstack would have been useful here...

Comment 2

4 years ago
Found other similar bugs like:
https://crash-stats.mozilla.com/report/index/a74f05d5-40f5-4450-9fa9-551ae2131109
https://crash-stats.mozilla.com/report/index/27725a5a-aebc-4672-8ff5-64c7c2131108


Is this a lone occurance? (stack trace ending up in PeerCOnnectionWrapper?)
If so, issue is likely in "NativeInterface2JSObjectAndThrowIfFailed" (jump at random).
Flags: needinfo?(jsmith)
(Reporter)

Comment 3

4 years ago
I don't know.
Flags: needinfo?(jsmith)

Comment 4

4 years ago
Created attachment 830101 [details] [diff] [review]
corruption.diff

Fixes for issues:
https://crash-stats.mozilla.com/report/index/a74f05d5-40f5-4450-9fa9-551ae2131109
https://crash-stats.mozilla.com/report/index/27725a5a-aebc-4672-8ff5-64c7c2131108

Issue here is a side effect. Problem is with jump from:
2. mozilla::dom::NativeInterface2JSObjectAndThrowIfFailed
To
1. vcmSetIceMediaParams_m

This happens because "cache" may point to invalid data.

In crash:
https://crash-stats.mozilla.com/report/index/27725a5a-aebc-4672-8ff5-64c7c2131108

Frame 4: nsJSEventListener::HandleEvent(nsIDOMEvent *)
Lines:
 JS::Value retval =
    handler->Call(mTarget, *(aEvent->InternalDOMEvent()), rv);

May have "mTarget" invalid. That is because we just store a pointer and not a smart pointer. 

Quick & dirty fix - store a nsCOMPtr

In crash:
https://crash-stats.mozilla.com/report/index/a74f05d5-40f5-4450-9fa9-551ae2131109

We have a bigger problem:
- nsGlobalWindow inherits multiple sources of "nsISupports". (which is used for reference count) 
- This inheritance is not virtual. (CORBA does not allow virtual inheritance, classes which we derive are generated from idl files)
- Current code only protected one super class (and not all :) )
- - We only protected: "nsIScriptGlobalObject" superclass.
- If ANY of the other superclasses are deleted, main class is destroyed as well via virtual destructor.

Actual crash is likely due to:
- "RunTimeoutHandler" destroying one of the other superclasses.
- That caused main class to be destroyed as well (disregarding our protection)
- Corrupted data trickled down to another "RunTimeoutHandler"
- That second call crashed.

Fix is relatively simple, since we can't have multiple inheritance, we protect all "nsISupports" derived superclasses.
When protector class is destroyed, it needs to stop when a delete happens.

Notes:
- This is all in DOM code - Please ask a area owner to review.
- This might have security implications... 
- I had no idea where to place the MultipleInheritance protector class, so it might need moving (if approved)
- Wanted to keep explanation short and to the point, I can detail further if anyone wants.
Flags: needinfo?

Comment 5

4 years ago
Additional note:
- I'm new to this, so please add details regarding next steps (ex: what I need to test, etc)

Updated

4 years ago
Flags: needinfo?(rjesup)

Comment 6

4 years ago
Created attachment 830703 [details] [diff] [review]
corruptionfix.patch

Request review, please add any other persons that need to review this.

Details are in previous comments. (This seems like a DOM issue and I don't know who to add to foreview)
Attachment #830703 - Flags: review?(rjesup)
Comment on attachment 830703 [details] [diff] [review]
corruptionfix.patch

Review of attachment 830703 [details] [diff] [review]:
-----------------------------------------------------------------

bz: not my area...  You should check if the analysis seems valid.  Seems like a quite fundamental change; if the analysis is correct, we should also consider alternatives.

Adrian:  Thanks for taking a run at such a confusing crash.  Even if the analysis turns out to be wrong (and it can be tough to diagnose a blind crash report like this), it's great that you're doing this.  I'll be more available on IRC (#media is best) in a few days, as I'm traveling for standards work at the moment.
Attachment #830703 - Flags: review?(rjesup) → review?(bzbarsky)
Comment on attachment 830703 [details] [diff] [review]
corruptionfix.patch

> - nsGlobalWindow inherits multiple sources of "nsISupports". (which is used for reference
> count) 

Not quite.  nsISupports is a pure virtual class with no data members.  The reference count lives in concrete classes as needed.  

Looking at nsGlobalWindow, it inherits from mozilla::dom::EventTarget, a bunch of interfaces (which have no members), PRCListStr, and nsSupportsWeakReference.

nsSupportsWeakReference and PRCListStr have no refcount members.  

mozilla::dom::EventTarget has no refcount member, and inherits from nsIDOMEventTarget and nsWrapperCache.  Neither of those has a refcount member either.

nsGlobalWindow _does_ however declare a refcount member:

  NS_DECL_CYCLE_COLLECTING_ISUPPORTS

and that's the one reference count window objects have.

>- - We only protected: "nsIScriptGlobalObject" superclass.

No, because its addref/release are virtual so calling them calls nsGlobalWindow::AddRef/Release.

So the MultipleInheritanceCOMPtr bit should not be necessary.  If it actually fixes something (do we have evidence that it does?) something is fishy.

As for the mTarget of nsIJSEventListener.h, there is a comment right before the class declaration:

168 // Note, mTarget is a raw pointer and the owner of the nsIJSEventListener object
169 // is expected to call Disconnect()!

Is that not happening?
Attachment #830703 - Flags: review?(bzbarsky) → review-
Crash-stats seems to be down, so I can't look at the reports above, but I assume that we have no reliable way to reproduce testcase for any of these crashes?
OK, crash-stats is back up.

The stacks in comment 4 are quite different from the original stack this bug was filed on.  We should get a separate bug filed on those...

That said, I agree that even the stack in comment 0 has nothing to do with PeerConnection per se.  It seems to be corrupt memory in a place where that just shouldn't be happening.
Flags: needinfo?

Comment 11

4 years ago
bz - thanks a lot for the review and explanations, had some trouble following the refcount.

As for this crash, the callstack seems too small to figure out what would be corrupted, That's why I was looking at other crash reports.

Updated

4 years ago
Flags: needinfo?(rjesup)
Overtaken by events.  In any case, thanks Adrian for taking a run at such a confusing bug
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.