Last Comment Bug 834100 - Null deref if you call addIceCandidate on an RTCPeerConnection before setting localDesc [@ fsmdef_ev_addcandidate]
: Null deref if you call addIceCandidate on an RTCPeerConnection before setting...
Status: VERIFIED FIXED
[webrtc][blocking-webrtc+]
: crash
Product: Core
Classification: Components
Component: WebRTC: Signaling (show other bugs)
: Trunk
: All All
: P2 critical (vote)
: mozilla21
Assigned To: Andrew Miller
: Jason Smith [:jsmith]
Mentors:
Depends on:
Blocks: 833043 834270
  Show dependency treegraph
 
Reported: 2013-01-23 18:30 PST by Andrew Miller
Modified: 2013-02-05 01:41 PST (History)
7 users (show)
ryanvm: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Test case (crashes trunk browser if media.peerconnection.enabled set to on) (1.19 KB, text/html)
2013-01-23 18:30 PST, Andrew Miller
no flags Details
Fix bug 834100 by initialising the SDP structure if it isn't already (1.51 KB, patch)
2013-01-23 18:36 PST, Andrew Miller
adam: review-
Details | Diff | Review
Crashtest (950 bytes, patch)
2013-01-29 01:44 PST, Andrew Miller
no flags Details | Diff | Review
Attempt at a more complete fix where we give an error for out-of-sequence operations (14.02 KB, patch)
2013-01-29 02:02 PST, Andrew Miller
no flags Details | Diff | Review
Revised patch that adds error and success callbacks. Includes the crashtest (24.37 KB, patch)
2013-01-30 21:16 PST, Andrew Miller
adam: review+
Details | Diff | Review
Fix the issues identified in the previous review (25.30 KB, patch)
2013-01-31 11:49 PST, Andrew Miller
adam: review+
Details | Diff | Review

Description Andrew Miller 2013-01-23 18:30:21 PST
Created attachment 705688 [details]
Test case (crashes trunk browser if media.peerconnection.enabled set to on)

If you run the attached test case (with media.peerconnection.enabled set to on) Firefox crashes with a null pointer deref in fsmdef_ev_addcandidate on this line,
  gsmsdp_set_ice_attribute (SDP_ATTR_ICE_CANDIDATE, level,
        dcb->sdp->dest_sdp, candidate);
because dcb->sdp is NULL.
Comment 1 Andrew Miller 2013-01-23 18:36:03 PST
Created attachment 705689 [details] [diff] [review]
Fix bug 834100 by initialising the SDP structure if it isn't already
Comment 2 Henrik Skupin (:whimboo) 2013-01-24 00:15:42 PST
The testcase doesn't work for me. I see an empty error in the error console. A crashtest would be nice to have.
Comment 3 Andrew Miller 2013-01-24 02:03:21 PST
(In reply to Henrik Skupin (:whimboo) from comment #2)
> The testcase doesn't work for me. I see an empty error in the error console.

This is what happens if media.peerconnection.enabled is set to off. Are you sure you enabled peerconnection in about:config?
Comment 4 Henrik Skupin (:whimboo) 2013-01-24 02:27:55 PST
Oh, my profile has been reset. Most likely it was due to some regression tests I had to do recently. :/ After retesting I can confirm the crash. Bad sadly we fail in correctly analyzing the crash at least on OS X:

Report: bp-f6c32ea9-9a49-4c2d-996f-4add72130124

Ted, are you aware of an issue?

Otherwise a crashtest would be nice to have for this case. I would be happy to help you out with everything you need to get this implemented.
Comment 5 Ted Mielczarek [:ted.mielczarek] 2013-01-24 06:10:56 PST
Huh, the symbol file for libxul is completely missing from that symbol package for some reason. I filed bug 834228 on it. Thanks for the heads-up!
Comment 6 Henrik Skupin (:whimboo) 2013-01-25 03:59:47 PST
Andrew, is it a nightly (optimized) or debug build?
Comment 7 Andrew Miller 2013-01-25 12:36:35 PST
(In reply to Henrik Skupin (:whimboo) from comment #6)
> Andrew, is it a nightly (optimized) or debug build?
I have been testing with debug builds of mozilla-central (ac_add_options --disable-optimize, 
ac_add_options --enable-debug, ac_add_options --enable-debugger-info-modules, ac_add_options --enable-debug-symbols).
Comment 8 Adam Roach [:abr] 2013-01-25 15:07:33 PST
Comment on attachment 705689 [details] [diff] [review]
Fix bug 834100 by initialising the SDP structure if it isn't already

Review of attachment 705689 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think this is the approach we really want to take. Fundamentally, any calls into fsmdef_ev_addcandidate with a null SDP poitner indicate that either the application has performed a set of operations in the wrong order, or (possible but less likely) we've messed up the queuing operations somewhere between the application and here. In either case, silently creating the SDP block is not the right action to take.

Unfortunately, the current WebRTC spec doesn't provide success and error callbacks for addIceCandidate. I am of the opinion that this is a simple oversight, and that it will be fixed shortly.

I'm pretty certain that the correct solution here is to implement success and failure callbacks for addIceCandidate (which requires touching PeerConnnection.js, PeerConnectionImpl.{h,cpp}, and adding a new message in the sipcc->PeerConnectionImpl plumbing to communicate success or failure); and then trigger the failure callback here if there isn't an SDP block already. This would overhang the standardization by a little bit, but I'm comfortable that we can fix the standard (and then adjust compliance if it ends up using a different API).

I recognize that this is something that you may want to fix more rapidly than the timeframe required to develop and test a patch of that magnitude. I would be perfectly happy if you want to land a wallpaper patch that simply makes this event a no-op if there's no SDP, and then open another bug to track handling it as I describe above. Basically, instead of the new code you've added, you would simply add:

> if (!dcb->sdp) {
>   FSM_DEBUG_SM(DEB_F_PREFIX"dcb->sdp is NULL. Has the "
>                "remote description been set yet?\n",
>                DEB_F_PREFIX_ARGS(FSM, __FUNCTION__));
>   return (SM_RC_END);
> }

In case you're going to dive into the larger fix, I've made a couple of small comments on the code itself. If you do the wallpaper fix instead, feel free to assign the new, long-term-fix bug to me.

::: media/webrtc/signaling/src/sipcc/core/gsm/fsmdef.c
@@ +3625,5 @@
>          FSM_DEBUG_SM(DEB_F_PREFIX"dcb is NULL.\n", DEB_F_PREFIX_ARGS(FSM, __FUNCTION__));
>          return SM_RC_CLEANUP;
>      }
>  
> +    if (dcb->sdp == NULL) {

When testing a pointer, please use (!myPtr) or (myPtr) rather than comparing to NULL or nullptr.

@@ +3628,5 @@
>  
> +    if (dcb->sdp == NULL) {
> +        cause = gsmsdp_create_local_sdp(dcb, FALSE, TRUE, TRUE, TRUE, TRUE);
> +        if (cause != CC_CAUSE_OK) {
> +            ui_create_offer(evCreateOfferError, line, call_id,

evCreateOfferError is the wrong event to return to the PeerConnection; it will produce surprising results in PeerConnection.js.

Unfortunately, there is no error event for addcandidate at the moment (see my general comment), so we would need to define one in order to send any kind of failure indication back up to the peer connection from here.

@@ +3631,5 @@
> +        if (cause != CC_CAUSE_OK) {
> +            ui_create_offer(evCreateOfferError, line, call_id,
> +                dcb->caller_id.call_instance_id, strlib_empty());
> +            FSM_DEBUG_SM(get_debug_string(FSM_DBG_SDP_BUILD_ERR));
> +            return (fsmdef_release(fcb, cause, FALSE));

Finally, I don't think this is the kind of error that would warrant tearing down the call. In general, if an error condition doesn't, in and of itself, indicate that the call cannot be continued, I think we want to try as hard as possible to let it do so. For this kind of situation, I would send an error back to the application (if possible), and then simply return an SM_RC_END (which, contrary to what its name might imply, doesn't end the call -- that's what SM_RC_CLEANUP would do -- it just ends execution of the chain of events that is underway).
Comment 9 Adam Roach [:abr] 2013-01-25 15:21:59 PST
I've sent a message to the W3C list to try to resolve the spec bug I mention above: http://lists.w3.org/Archives/Public/public-webrtc/2013Jan/0075.html
Comment 10 Randell Jesup [:jesup] 2013-01-28 16:10:09 PST
Ok, I'm removing the block on preffing on.  This would block release, so leaving blocking+. 

Let's take a wallpaper fix for now as mentioned by adam, unless a full fix that can pass review is in the offing.
Comment 11 Andrew Miller 2013-01-29 01:44:25 PST
Created attachment 707503 [details] [diff] [review]
Crashtest
Comment 12 Andrew Miller 2013-01-29 02:02:20 PST
Created attachment 707508 [details] [diff] [review]
Attempt at a more complete fix where we give an error for out-of-sequence operations

This isn't finished yet - it needs testing, but I don't have time today. If someone wants to take this bug in the interim, please feel free, otherwise I may be able to look at it further tomorrow and get it ready for review.
Comment 13 Adam Roach [:abr] 2013-01-30 10:29:44 PST
Comment on attachment 707508 [details] [diff] [review]
Attempt at a more complete fix where we give an error for out-of-sequence operations

Review of attachment 707508 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, this looks like it's going in the right direction. I have a few comments inline, the overall direction of which is: we need both success and error callbacks.

::: dom/media/PeerConnection.js
@@ +551,5 @@
>    updateIce: function(config, constraints, restart) {
>      return Cr.NS_ERROR_NOT_IMPLEMENTED;
>    },
>  
> +  addIceCandidate: function(cand, onError) {

It seems extremely likely that any standardization of this method will be of the form addIceCandidate(cand, onSuccess, onError), so we're probably better off going down that path.

Also, the way we currently service the queue relies on calls down to SIPCC *either* having two callbacks (success/fail) *or* having none.

Those that have no callbacks pass "wait: false" to the _queueOrRun call, which will execute the next command in the queue without waiting for this one to complete (appropriate for those commands with no callbacks). Those that have two callbacks will pass "wait: true" to the _queueOrRun call, which won't move on to the next command. When this happens, we rely on the success and error callbacks calling _dompc._executeNext() to trigger execution of the next operation in the queue. If we only get callbacks for only one of the two operations (success or failure), then the whole scheme kind of falls apart.

@@ +566,4 @@
>      this._queueOrRun({
>        func: this._pc.addIceCandidate,
>        args: [cand.candidate, cand.sdpMid || "", cand.sdpMLineIndex],
>        wait: false

Just to reiterate (because the resulting bug is quite difficult to diagnose if we get it wrong): if we're calling _executeNext() in the callbacks, "wait" has to be true.

::: media/webrtc/signaling/src/sipcc/core/common/ui.c
@@ +1689,5 @@
>  }
>  
> +/**
> + * Send data from addIceCandidate to the UI
> + * 

Mind the trailing whitespace...

::: media/webrtc/signaling/src/sipcc/core/gsm/fsmdef.c
@@ +3643,5 @@
>      */
>  
>      /* XXX: Bug 833043. Need to sanity check that this is actually a valid candidate */
>      /* Update remote SDP with new candidate information */
>      candidate = (char *)msg->data.candidate.candidate;

About 36 lines down from here, we have a ui_update_remote_description call. Now that we're adding callbacks to addIceCandidate, it would make more sense to use ui_ice_candidate_add; something like:

    ui_ice_candidate_add(evAddIceCandidate, line, call_id,
        dcb->caller_id.call_instance_id, strlib_malloc(remote_sdp,-1));

In other words, I propose that we remove ui_update_remote_description altogether and use ui_ice_candidate_add in its place; I would also replace evUpdateRemoteDesc/UPDATE_REMOTE_DESC with evAddIceCandidate/ADD_ICE_CANDIDATE.

This has the side effect of having an SDP parameter on your ui_ice_candidate_add function make sense. :)
Comment 14 Andrew Miller 2013-01-30 21:16:23 PST
Created attachment 708454 [details] [diff] [review]
Revised patch that adds error and success callbacks. Includes the crashtest
Comment 15 Adam Roach [:abr] 2013-01-31 07:40:30 PST
Comment on attachment 708454 [details] [diff] [review]
Revised patch that adds error and success callbacks. Includes the crashtest

Review of attachment 708454 [details] [diff] [review]:
-----------------------------------------------------------------

In general, this looks excellent. Thanks for the additional work in signaling_unittests.cpp!

I have a few comments below, with the only substantial one being that the success callback shouldn't return SDP (as doing so is inconsistent with the way that the WebRTC spec generally works).

::: dom/media/PeerConnection.js
@@ +760,5 @@
>      }
>      this._dompc._executeNext();
>    },
>  
> +  onAddIceCandidateSuccess: function(sdp) {

In the current W3C document, SDP is only present in success callbacks if the operation that succeeded is one that creates our own local SDP (i.e., createOffer and createAnswer).

I think the model you want to follow here is more like setRemoteDescription, since that's effectively what we're doing here -- that is:

  onAddIceCandidateSuccess: function(code)

::: dom/media/bridge/IPeerConnection.idl
@@ +41,5 @@
>    void onSetLocalDescriptionSuccess(in unsigned long code);
>    void onSetRemoteDescriptionSuccess(in unsigned long code);
>    void onSetLocalDescriptionError(in unsigned long code);
>    void onSetRemoteDescriptionError(in unsigned long code);
> +  void onAddIceCandidateSuccess(in string sdp);

Should be:

  void onAddIceCandidateSuccess(in unsigned long code);

(See comments in PeerConnection.js)

::: dom/media/tests/crashtests/834100.html
@@ +6,5 @@
> +  remotePC = new mozRTCPeerConnection();
> +  var cand = new mozRTCIceCandidate(
> +                {candidate: "1 1 UDP 1 127.0.0.1 34567 type host",
> +                 sdpMid: "helloworld",
> +                 sdbMid: "helloworld"

What is this sdbMid for? Remove or add a comment, please.

@@ +8,5 @@
> +                {candidate: "1 1 UDP 1 127.0.0.1 34567 type host",
> +                 sdpMid: "helloworld",
> +                 sdbMid: "helloworld"
> +                });
> +  cand.sdpMLineIndex = 1;

Why are we setting this externally rather than putting it in the dictionary above? Move or add a comment, please.

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +133,5 @@
>        mRemoteStream = mPC->media()->GetRemoteStream(streams->media_stream_id);
>        MOZ_ASSERT(mRemoteStream);
>      }
> +    if ((mCallState == CREATEOFFER) || (mCallState == CREATEANSWER) ||
> +        (mCallState == ADDICECANDIDATE)) {

Since we're not returning SDP to the observer for ADDICECANDIDATE (see comments in PeerConnection.js), we don't need to copy it into this object.

@@ +175,5 @@
>          mObserver->OnSetRemoteDescriptionError(mCode);
>          break;
>  
> +      case ADDICECANDIDATE:
> +        mObserver->OnAddIceCandidateSuccess(mSdpStr.c_str());

mObserver->OnAddIceCandidateSuccess(mCode);

@@ +1165,5 @@
>        mLocalSDP = aInfo->getSDP();
>        break;
>  
>      case SETREMOTEDESC:
>      case UPDATEREMOTEDESC:

With this change, we're *replacing* UPDATEREMOTEDESC with ADDICECANDIDATE (since new ice candidates are the only way we can ever have the remote description get updated).

What's here works just fine, but it does leave some dead code lying around. Please either remove evUpdateRemoteDesc/UPDATEREMOTEDESC and ui_update_remote_description/UPDATE_REMOTE_DESC, or open an additional bug to remind us to perform this cleanup later.
Comment 16 Andrew Miller 2013-01-31 11:49:56 PST
Created attachment 708711 [details] [diff] [review]
Fix the issues identified in the previous review
Comment 17 Adam Roach [:abr] 2013-01-31 11:57:54 PST
Comment on attachment 708711 [details] [diff] [review]
Fix the issues identified in the previous review

Review of attachment 708711 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great. Thanks for all your work on this.
Comment 18 Andrew Miller 2013-01-31 12:34:01 PST
Note to accompany checkin-needed whiteboard status: The patch has been tested on x86_64 Linux against a recent mozilla-central. The signalling tests run cleanly and the new crashtest passes on that platform. I don't have try server access or access to all platforms so haven't tested it on other systems (although nothing in the patch is likely to trigger differences between platforms).
Comment 19 Adam Roach [:abr] 2013-01-31 12:39:44 PST
Andrew -- I've pushed the patch to try on all platforms, with mochi and crash testing:

https://tbpl.mozilla.org/?tree=Try&rev=aea6ff4da1b3
Comment 20 Ryan VanderMeulen [:RyanVM] 2013-01-31 12:43:32 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/b1e8d7154488

Andrew, to make life easier for those checking in your behalf, can you make sure that you've got hg configured to generate all the necessary metadata for checkin? Thanks!
https://developer.mozilla.org/en-US/docs/Creating_a_patch_that_can_be_checked_in
Comment 21 Ryan VanderMeulen [:RyanVM] 2013-01-31 12:44:35 PST
Nice mid-air there. Adam - I cancelled the Try job to save resources. If the patch is bad, we'll find out on inbound soon enough :)
Comment 22 Mozilla RelEng Bot 2013-01-31 13:00:41 PST
Try run for aea6ff4da1b3 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=aea6ff4da1b3
Results (out of 19 total builds):
    exception: 13
    failure: 6
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/adam@nostrum.com-aea6ff4da1b3
Comment 23 Adam Roach [:abr] 2013-01-31 13:58:51 PST
I'm putting this here so I can find it again easily if I need to:
https://tbpl.mozilla.org/?rev=b1e8d7154488&tree=Mozilla-Inbound
Comment 24 Ryan VanderMeulen [:RyanVM] 2013-01-31 14:15:47 PST
Backed out for mochitest timeouts (available in comment 23).
https://hg.mozilla.org/integration/mozilla-inbound/rev/c6a2e0805de7
Comment 25 Ryan VanderMeulen [:RyanVM] 2013-01-31 19:40:43 PST
Re-landed:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f8e40ab9bf6c
Comment 26 Ryan VanderMeulen [:RyanVM] 2013-02-01 13:18:46 PST
https://hg.mozilla.org/mozilla-central/rev/f8e40ab9bf6c
Comment 27 Jason Smith [:jsmith] 2013-02-01 16:54:07 PST
Crash test included = automatic verification.

Note You need to log in before you can comment on or make changes to this bug.