Closed Bug 834100 Opened 12 years ago Closed 12 years ago

Null deref if you call addIceCandidate on an RTCPeerConnection before setting localDesc [@ fsmdef_ev_addcandidate]


(Core :: WebRTC: Signaling, defect, P2)






(Reporter: andrew, Assigned: andrew)


(Blocks 1 open bug)


(Keywords: crash, Whiteboard: [webrtc][blocking-webrtc+])


(2 files, 4 obsolete files)

If you run the attached test case (with media.peerconnection.enabled set to on) Firefox crashes with a null pointer deref in fsmdef_ev_addcandidate on this line,
  gsmsdp_set_ice_attribute (SDP_ATTR_ICE_CANDIDATE, level,
        dcb->sdp->dest_sdp, candidate);
because dcb->sdp is NULL.
Assignee: nobody → ak.miller
The testcase doesn't work for me. I see an empty error in the error console. A crashtest would be nice to have.
Severity: major → critical
Flags: in-testsuite?
Keywords: crash
Summary: Null deref if you call addIceCandidate on an RTCPeerConnection before setting localDesc → Null deref if you call addIceCandidate on an RTCPeerConnection before setting localDesc [@ fsmdef_ev_addcandidate]
(In reply to Henrik Skupin (:whimboo) from comment #2)
> The testcase doesn't work for me. I see an empty error in the error console.

This is what happens if media.peerconnection.enabled is set to off. Are you sure you enabled peerconnection in about:config?
Oh, my profile has been reset. Most likely it was due to some regression tests I had to do recently. :/ After retesting I can confirm the crash. Bad sadly we fail in correctly analyzing the crash at least on OS X:

Report: bp-f6c32ea9-9a49-4c2d-996f-4add72130124

Ted, are you aware of an issue?

Otherwise a crashtest would be nice to have for this case. I would be happy to help you out with everything you need to get this implemented.
Flags: needinfo?(ted)
Huh, the symbol file for libxul is completely missing from that symbol package for some reason. I filed bug 834228 on it. Thanks for the heads-up!
Flags: needinfo?(ted)
Andrew, is it a nightly (optimized) or debug build?
(In reply to Henrik Skupin (:whimboo) from comment #6)
> Andrew, is it a nightly (optimized) or debug build?
I have been testing with debug builds of mozilla-central (ac_add_options --disable-optimize, 
ac_add_options --enable-debug, ac_add_options --enable-debugger-info-modules, ac_add_options --enable-debug-symbols).
Comment on attachment 705689 [details] [diff] [review]
Fix bug 834100 by initialising the SDP structure if it isn't already

Review of attachment 705689 [details] [diff] [review]:

I don't think this is the approach we really want to take. Fundamentally, any calls into fsmdef_ev_addcandidate with a null SDP poitner indicate that either the application has performed a set of operations in the wrong order, or (possible but less likely) we've messed up the queuing operations somewhere between the application and here. In either case, silently creating the SDP block is not the right action to take.

Unfortunately, the current WebRTC spec doesn't provide success and error callbacks for addIceCandidate. I am of the opinion that this is a simple oversight, and that it will be fixed shortly.

I'm pretty certain that the correct solution here is to implement success and failure callbacks for addIceCandidate (which requires touching PeerConnnection.js, PeerConnectionImpl.{h,cpp}, and adding a new message in the sipcc->PeerConnectionImpl plumbing to communicate success or failure); and then trigger the failure callback here if there isn't an SDP block already. This would overhang the standardization by a little bit, but I'm comfortable that we can fix the standard (and then adjust compliance if it ends up using a different API).

I recognize that this is something that you may want to fix more rapidly than the timeframe required to develop and test a patch of that magnitude. I would be perfectly happy if you want to land a wallpaper patch that simply makes this event a no-op if there's no SDP, and then open another bug to track handling it as I describe above. Basically, instead of the new code you've added, you would simply add:

> if (!dcb->sdp) {
>   FSM_DEBUG_SM(DEB_F_PREFIX"dcb->sdp is NULL. Has the "
>                "remote description been set yet?\n",
>                DEB_F_PREFIX_ARGS(FSM, __FUNCTION__));
>   return (SM_RC_END);
> }

In case you're going to dive into the larger fix, I've made a couple of small comments on the code itself. If you do the wallpaper fix instead, feel free to assign the new, long-term-fix bug to me.

::: media/webrtc/signaling/src/sipcc/core/gsm/fsmdef.c
@@ +3625,5 @@
>          return SM_RC_CLEANUP;
>      }
> +    if (dcb->sdp == NULL) {

When testing a pointer, please use (!myPtr) or (myPtr) rather than comparing to NULL or nullptr.

@@ +3628,5 @@
> +    if (dcb->sdp == NULL) {
> +        cause = gsmsdp_create_local_sdp(dcb, FALSE, TRUE, TRUE, TRUE, TRUE);
> +        if (cause != CC_CAUSE_OK) {
> +            ui_create_offer(evCreateOfferError, line, call_id,

evCreateOfferError is the wrong event to return to the PeerConnection; it will produce surprising results in PeerConnection.js.

Unfortunately, there is no error event for addcandidate at the moment (see my general comment), so we would need to define one in order to send any kind of failure indication back up to the peer connection from here.

@@ +3631,5 @@
> +        if (cause != CC_CAUSE_OK) {
> +            ui_create_offer(evCreateOfferError, line, call_id,
> +                dcb->caller_id.call_instance_id, strlib_empty());
> +            FSM_DEBUG_SM(get_debug_string(FSM_DBG_SDP_BUILD_ERR));
> +            return (fsmdef_release(fcb, cause, FALSE));

Finally, I don't think this is the kind of error that would warrant tearing down the call. In general, if an error condition doesn't, in and of itself, indicate that the call cannot be continued, I think we want to try as hard as possible to let it do so. For this kind of situation, I would send an error back to the application (if possible), and then simply return an SM_RC_END (which, contrary to what its name might imply, doesn't end the call -- that's what SM_RC_CLEANUP would do -- it just ends execution of the chain of events that is underway).
Attachment #705689 - Flags: review?(adam) → review-
I've sent a message to the W3C list to try to resolve the spec bug I mention above:
Blocks: 796463
Whiteboard: [webrtc][blocking-webrtc+]
Ok, I'm removing the block on preffing on.  This would block release, so leaving blocking+. 

Let's take a wallpaper fix for now as mentioned by adam, unless a full fix that can pass review is in the offing.
No longer blocks: 796463
Priority: -- → P2
Attached patch Crashtest (obsolete) — Splinter Review
This isn't finished yet - it needs testing, but I don't have time today. If someone wants to take this bug in the interim, please feel free, otherwise I may be able to look at it further tomorrow and get it ready for review.
Attachment #705689 - Attachment is obsolete: true
Comment on attachment 707508 [details] [diff] [review]
Attempt at a more complete fix where we give an error for out-of-sequence operations

Review of attachment 707508 [details] [diff] [review]:

Yes, this looks like it's going in the right direction. I have a few comments inline, the overall direction of which is: we need both success and error callbacks.

::: dom/media/PeerConnection.js
@@ +551,5 @@
>    updateIce: function(config, constraints, restart) {
>    },
> +  addIceCandidate: function(cand, onError) {

It seems extremely likely that any standardization of this method will be of the form addIceCandidate(cand, onSuccess, onError), so we're probably better off going down that path.

Also, the way we currently service the queue relies on calls down to SIPCC *either* having two callbacks (success/fail) *or* having none.

Those that have no callbacks pass "wait: false" to the _queueOrRun call, which will execute the next command in the queue without waiting for this one to complete (appropriate for those commands with no callbacks). Those that have two callbacks will pass "wait: true" to the _queueOrRun call, which won't move on to the next command. When this happens, we rely on the success and error callbacks calling _dompc._executeNext() to trigger execution of the next operation in the queue. If we only get callbacks for only one of the two operations (success or failure), then the whole scheme kind of falls apart.

@@ +566,4 @@
>      this._queueOrRun({
>        func: this._pc.addIceCandidate,
>        args: [cand.candidate, cand.sdpMid || "", cand.sdpMLineIndex],
>        wait: false

Just to reiterate (because the resulting bug is quite difficult to diagnose if we get it wrong): if we're calling _executeNext() in the callbacks, "wait" has to be true.

::: media/webrtc/signaling/src/sipcc/core/common/ui.c
@@ +1689,5 @@
>  }
> +/**
> + * Send data from addIceCandidate to the UI
> + * 

Mind the trailing whitespace...

::: media/webrtc/signaling/src/sipcc/core/gsm/fsmdef.c
@@ +3643,5 @@
>      */
>      /* XXX: Bug 833043. Need to sanity check that this is actually a valid candidate */
>      /* Update remote SDP with new candidate information */
>      candidate = (char *)msg->data.candidate.candidate;

About 36 lines down from here, we have a ui_update_remote_description call. Now that we're adding callbacks to addIceCandidate, it would make more sense to use ui_ice_candidate_add; something like:

    ui_ice_candidate_add(evAddIceCandidate, line, call_id,
        dcb->caller_id.call_instance_id, strlib_malloc(remote_sdp,-1));

In other words, I propose that we remove ui_update_remote_description altogether and use ui_ice_candidate_add in its place; I would also replace evUpdateRemoteDesc/UPDATE_REMOTE_DESC with evAddIceCandidate/ADD_ICE_CANDIDATE.

This has the side effect of having an SDP parameter on your ui_ice_candidate_add function make sense. :)
Attachment #707503 - Attachment is obsolete: true
Attachment #707508 - Attachment is obsolete: true
Attachment #708454 - Flags: review?
Attachment #708454 - Flags: review? → review?(adam)
Comment on attachment 708454 [details] [diff] [review]
Revised patch that adds error and success callbacks. Includes the crashtest

Review of attachment 708454 [details] [diff] [review]:

In general, this looks excellent. Thanks for the additional work in signaling_unittests.cpp!

I have a few comments below, with the only substantial one being that the success callback shouldn't return SDP (as doing so is inconsistent with the way that the WebRTC spec generally works).

::: dom/media/PeerConnection.js
@@ +760,5 @@
>      }
>      this._dompc._executeNext();
>    },
> +  onAddIceCandidateSuccess: function(sdp) {

In the current W3C document, SDP is only present in success callbacks if the operation that succeeded is one that creates our own local SDP (i.e., createOffer and createAnswer).

I think the model you want to follow here is more like setRemoteDescription, since that's effectively what we're doing here -- that is:

  onAddIceCandidateSuccess: function(code)

::: dom/media/bridge/IPeerConnection.idl
@@ +41,5 @@
>    void onSetLocalDescriptionSuccess(in unsigned long code);
>    void onSetRemoteDescriptionSuccess(in unsigned long code);
>    void onSetLocalDescriptionError(in unsigned long code);
>    void onSetRemoteDescriptionError(in unsigned long code);
> +  void onAddIceCandidateSuccess(in string sdp);

Should be:

  void onAddIceCandidateSuccess(in unsigned long code);

(See comments in PeerConnection.js)

::: dom/media/tests/crashtests/834100.html
@@ +6,5 @@
> +  remotePC = new mozRTCPeerConnection();
> +  var cand = new mozRTCIceCandidate(
> +                {candidate: "1 1 UDP 1 34567 type host",
> +                 sdpMid: "helloworld",
> +                 sdbMid: "helloworld"

What is this sdbMid for? Remove or add a comment, please.

@@ +8,5 @@
> +                {candidate: "1 1 UDP 1 34567 type host",
> +                 sdpMid: "helloworld",
> +                 sdbMid: "helloworld"
> +                });
> +  cand.sdpMLineIndex = 1;

Why are we setting this externally rather than putting it in the dictionary above? Move or add a comment, please.

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +133,5 @@
>        mRemoteStream = mPC->media()->GetRemoteStream(streams->media_stream_id);
>        MOZ_ASSERT(mRemoteStream);
>      }
> +    if ((mCallState == CREATEOFFER) || (mCallState == CREATEANSWER) ||
> +        (mCallState == ADDICECANDIDATE)) {

Since we're not returning SDP to the observer for ADDICECANDIDATE (see comments in PeerConnection.js), we don't need to copy it into this object.

@@ +175,5 @@
>          mObserver->OnSetRemoteDescriptionError(mCode);
>          break;
> +      case ADDICECANDIDATE:
> +        mObserver->OnAddIceCandidateSuccess(mSdpStr.c_str());


@@ +1165,5 @@
>        mLocalSDP = aInfo->getSDP();
>        break;
>      case SETREMOTEDESC:

With this change, we're *replacing* UPDATEREMOTEDESC with ADDICECANDIDATE (since new ice candidates are the only way we can ever have the remote description get updated).

What's here works just fine, but it does leave some dead code lying around. Please either remove evUpdateRemoteDesc/UPDATEREMOTEDESC and ui_update_remote_description/UPDATE_REMOTE_DESC, or open an additional bug to remind us to perform this cleanup later.
Attachment #708454 - Flags: review?(adam) → review+
Blocks: 834270
Blocks: 833043
Attachment #708454 - Attachment is obsolete: true
Attachment #708711 - Flags: review?
Attachment #708711 - Flags: review? → review?(adam)
Comment on attachment 708711 [details] [diff] [review]
Fix the issues identified in the previous review

Review of attachment 708711 [details] [diff] [review]:

This looks great. Thanks for all your work on this.
Attachment #708711 - Flags: review?(adam) → review+
Note to accompany checkin-needed whiteboard status: The patch has been tested on x86_64 Linux against a recent mozilla-central. The signalling tests run cleanly and the new crashtest passes on that platform. I don't have try server access or access to all platforms so haven't tested it on other systems (although nothing in the patch is likely to trigger differences between platforms).
Whiteboard: [webrtc][blocking-webrtc+] → [webrtc][blocking-webrtc+][checkin-needed]
Keywords: checkin-needed
Andrew -- I've pushed the patch to try on all platforms, with mochi and crash testing:

Andrew, to make life easier for those checking in your behalf, can you make sure that you've got hg configured to generate all the necessary metadata for checkin? Thanks!
Flags: in-testsuite? → in-testsuite+
Keywords: checkin-needed
Whiteboard: [webrtc][blocking-webrtc+][checkin-needed] → [webrtc][blocking-webrtc+]
Nice mid-air there. Adam - I cancelled the Try job to save resources. If the patch is bad, we'll find out on inbound soon enough :)
Try run for aea6ff4da1b3 is complete.
Detailed breakdown of the results available here:
Results (out of 19 total builds):
    exception: 13
    failure: 6
Builds (or logs if builds failed) available at:
I'm putting this here so I can find it again easily if I need to:
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Crash test included = automatic verification.
You need to log in before you can comment on or make changes to this bug.