Last Comment Bug 792659 - PeerConnectionImpl needs better error handling and logging
: PeerConnectionImpl needs better error handling and logging
Status: RESOLVED FIXED
[WebRTC], [blocking-webrtc+] [qa-]
:
Product: Core
Classification: Components
Component: WebRTC: Signaling (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla18
Assigned To: Ethan Hugg [:ehugg]
: Jason Smith [:jsmith]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-09-19 19:08 PDT by Ethan Hugg [:ehugg]
Modified: 2012-10-12 12:10 PDT (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch 1 - PeerConnectionImpl needs better error handling and logging (25.95 KB, patch)
2012-09-21 14:45 PDT, Ethan Hugg [:ehugg]
ekr: feedback+
Details | Diff | Splinter Review
Patch 1 - PeerConnectionImpl needs better error handling and logging (32.46 KB, patch)
2012-09-24 11:12 PDT, Ethan Hugg [:ehugg]
no flags Details | Diff | Splinter Review
Patch 1 - PeerConnectionImpl needs better error handling and logging (32.36 KB, patch)
2012-09-24 11:20 PDT, Ethan Hugg [:ehugg]
no flags Details | Diff | Splinter Review
Patch 1 - PeerConnectionImpl needs better error handling and logging (32.45 KB, patch)
2012-09-24 11:40 PDT, Ethan Hugg [:ehugg]
rjesup: feedback+
Details | Diff | Splinter Review
Patch 1 - PeerConnectionImpl needs better error handling and logging (32.43 KB, patch)
2012-09-24 14:28 PDT, Ethan Hugg [:ehugg]
no flags Details | Diff | Splinter Review

Description Ethan Hugg [:ehugg] 2012-09-19 19:08:52 PDT
PeerConnectionImpl needs better error handling and reporting.  Specifically:

- Add error checking of all function calls - sometime we assert but continue, and sometimes we don't check at all.
- Add error checking of parameters at top of each method.
- Remove some unneeded code such as the body to NotifyQueuedTrackChanges 
- Make logging more consistent - CSFLogErrorS when error, DebugS when debug, use stream versions rather than printf-style versions.
Comment 1 Ethan Hugg [:ehugg] 2012-09-21 14:45:33 PDT
Created attachment 663566 [details] [diff] [review]
Patch 1 - PeerConnectionImpl needs better error handling and logging
Comment 2 Ethan Hugg [:ehugg] 2012-09-21 14:55:49 PDT
Comment on attachment 663566 [details] [diff] [review]
Patch 1 - PeerConnectionImpl needs better error handling and logging


I decided to break this into a couple patches.  If you agree with what I've done so far, I can finish PeerConnectionImpl and PeerConnectionCtx the same way in a second patch.
Comment 3 Eric Rescorla (:ekr) 2012-09-21 15:58:13 PDT
Comment on attachment 663566 [details] [diff] [review]
Patch 1 - PeerConnectionImpl needs better error handling and logging

Review of attachment 663566 [details] [diff] [review]:
-----------------------------------------------------------------

This looks like a good start. I think you may be being a little aggressive with the PR_ASSERTs. some of these things seem like they might legitimately fail for some unforeseen runtime conditions, and we don't want to crash. I know I had a lot of asserts, but I think we need to be more cautious when we actually plan to ship it.  I've commented the ones I think are most relevant.

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +256,5 @@
>  PeerConnectionImpl* PeerConnectionImpl::CreatePeerConnection()
>  {
>    PeerConnectionImpl *pc = new PeerConnectionImpl();
>  
> +  if (!pc)

I believe that new already returns a valid object (or crashes), so you don't need to check this.

@@ +316,4 @@
>  
> +  CSFLogDebugS(logTag, "PeerConnection " << static_cast<void*>(this) 
> +    << ": Created media stream " << static_cast<void*>(stream.get()) 
> +    << " inner: " << static_cast<void*>(stream->GetStream()));

Do you need to do stream,get() as the input to the static cast?

@@ +325,3 @@
>  PeerConnectionImpl::MakeRemoteSource(nsDOMMediaStream* stream, RemoteSourceStreamInfo** info)
>  {
> +  PR_ASSERT(info);

I could live without non-assert checks on arguments if it's a programming error for them to be wrong.

@@ +362,3 @@
>      ), NS_DISPATCH_SYNC);
>    }
> +  PR_ASSERT(NS_SUCCEEDED(res));

We should back this up with a non-assert check in case someone compiles without assert.

@@ +378,2 @@
>    ), NS_DISPATCH_SYNC);
> +  PR_ASSERT(NS_SUCCEEDED(res));

Same as above.

@@ +418,5 @@
>    }
>  
>    mCall = pcctx->createCall();
> +
> +  PR_ASSERT(mCall.get());

I'm not sure you need to .get() on these because they declare operator T*

@@ +454,5 @@
> +  RefPtr<NrIceMediaStream> audioStream = mIceCtx->CreateStream("stream1", 2);
> +  RefPtr<NrIceMediaStream> videoStream = mIceCtx->CreateStream("stream2", 2);
> +  RefPtr<NrIceMediaStream> dcStream = mIceCtx->CreateStream("stream3", 2);
> +
> +  PR_ASSERT(audioStream.get());

I think maybe remove these asserts? It seems like there might be legit reasons why this could happen.

@@ +489,5 @@
>  
>    // Create the DTLS Identity
>    mIdentity = DtlsIdentity::Generate("self");
>  
> +  PR_ASSERT(mIdentity.get());

Same here.

@@ +554,3 @@
>      ), NS_DISPATCH_SYNC);
>    }
> +  PR_ASSERT(NS_SUCCEEDED(res));

This could totally fail, I suspect.
Comment 4 Randell Jesup [:jesup] 2012-09-22 23:41:49 PDT
Comment on attachment 663566 [details] [diff] [review]
Patch 1 - PeerConnectionImpl needs better error handling and logging

Review of attachment 663566 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +256,5 @@
>  PeerConnectionImpl* PeerConnectionImpl::CreatePeerConnection()
>  {
>    PeerConnectionImpl *pc = new PeerConnectionImpl();
>  
> +  if (!pc)

Eric is right, infallible malloc means new can't fail

@@ +306,3 @@
>  PeerConnectionImpl::MakeMediaStream(PRUint32 hint, nsIDOMMediaStream** retval)
>  {
> +  PR_ASSERT(retval);

MOZ_ASSERT for debug-only asserts, NS_ASSERTION for production asserts, generally.

@@ +325,4 @@
>  PeerConnectionImpl::MakeRemoteSource(nsDOMMediaStream* stream, RemoteSourceStreamInfo** info)
>  {
> +  PR_ASSERT(info);
> +  if (!info) {

Agree with ekr; either use NS_ASSERTION and include the test in production code (and drop the "if (!...)", or decide it's only worth checking for while debugging and use MOZ_ASSERT (and drop the "if (!....)")
Comment 5 Ethan Hugg [:ehugg] 2012-09-24 11:12:35 PDT
Created attachment 664131 [details] [diff] [review]
Patch 1 - PeerConnectionImpl needs better error handling and logging
Comment 6 Eric Rescorla (:ekr) 2012-09-24 11:19:48 PDT
Comment on attachment 664131 [details] [diff] [review]
Patch 1 - PeerConnectionImpl needs better error handling and logging

Review of attachment 664131 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm with one trivial comment

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +415,5 @@
> +  RefPtr<NrIceMediaStream> audioStream = mIceCtx->CreateStream("stream1", 2);
> +  RefPtr<NrIceMediaStream> videoStream = mIceCtx->CreateStream("stream2", 2);
> +  RefPtr<NrIceMediaStream> dcStream = mIceCtx->CreateStream("stream3", 2);
> +
> +  if (!audioStream) {

IMO we should return an error for this failure.
Comment 7 Ethan Hugg [:ehugg] 2012-09-24 11:20:31 PDT
Created attachment 664136 [details] [diff] [review]
Patch 1 - PeerConnectionImpl needs better error handling and logging
Comment 8 Ethan Hugg [:ehugg] 2012-09-24 11:40:57 PDT
Created attachment 664145 [details] [diff] [review]
Patch 1 - PeerConnectionImpl needs better error handling and logging
Comment 9 Ethan Hugg [:ehugg] 2012-09-24 11:46:45 PDT
Comment on attachment 664145 [details] [diff] [review]
Patch 1 - PeerConnectionImpl needs better error handling and logging


get() is not needed for nsRefPtrs but it is still required for the ones that derive from linked_ptr like CC_CallPtr.

I used MOZ_ASSERT for things that "can't happen" including null pointer params and runtime error checking/setting nsresult for things that could.

Removed checking of memory allocators.
Comment 10 Randell Jesup [:jesup] 2012-09-24 12:07:53 PDT
Comment on attachment 664145 [details] [diff] [review]
Patch 1 - PeerConnectionImpl needs better error handling and logging

Review of attachment 664145 [details] [diff] [review]:
-----------------------------------------------------------------

r+=me modulo dealing with the nits.

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +338,5 @@
>      ), NS_DISPATCH_SYNC);
>    }
>  
> +  if (NS_FAILED(res)) {
> +    return NS_ERROR_FAILURE;

Is there a reason for not using "return res" here? Do we need to clean things to NS_OK/NS_ERROR_FAILURE?

@@ +357,4 @@
>    ), NS_DISPATCH_SYNC);
> +
> +  if (NS_FAILED(res)) {
> +    return NS_ERROR_FAILURE;

Is there a reason for not using "return res" here?  Do we need to clean things to NS_OK/NS_ERROR_FAILURE?

@@ +358,5 @@
> +
> +  if (NS_FAILED(res)) {
> +    return NS_ERROR_FAILURE;
> +  }
> +  

very minor, trailing space

@@ +387,4 @@
>  
>    mCall = pcctx->createCall();
> +
> +  MOZ_ASSERT(mCall.get());

.get()?  or MOZ_ASSERT(mCall)?

@@ +446,5 @@
>    );
> +
> +  if (NS_FAILED(res)) {
> +    CSFLogErrorS(logTag, __FUNCTION__ << ": StartGathering failed: " << res);
> +    return NS_ERROR_FAILURE;

NS_ERROR_FAILURE, or "return res;"?  (Repeat for most other uses of this idiom in the patch)

@@ +1051,5 @@
>  PeerConnectionImpl::onCallEvent(ccapi_call_event_e callEvent,
>    CSF::CC_CallPtr call, CSF::CC_CallInfoPtr info)
>  {
> +  MOZ_ASSERT(call.get());
> +  MOZ_ASSERT(info.get());

.get() or MOZ_ASSERT(info)?
Comment 11 Ethan Hugg [:ehugg] 2012-09-24 14:05:46 PDT
>@@ +1051,5 @@
>>  PeerConnectionImpl::onCallEvent(ccapi_call_event_e callEvent,
>>    CSF::CC_CallPtr call, CSF::CC_CallInfoPtr info)
>>  {
>> +  MOZ_ASSERT(call.get());
>> +  MOZ_ASSERT(info.get());
>
>.get() or MOZ_ASSERT(info)?

CC_CallPtr and CC_CallInfoPtr derive from linked_ptr which needs this .get() - won't compile without it.

Good point on re-mapping res to ERROR_FAILURE, I'll fix all of those.
Comment 12 Ethan Hugg [:ehugg] 2012-09-24 14:28:08 PDT
Created attachment 664216 [details] [diff] [review]
Patch 1 - PeerConnectionImpl needs better error handling and logging
Comment 13 Ethan Hugg [:ehugg] 2012-09-24 14:36:00 PDT
Comment on attachment 664216 [details] [diff] [review]
Patch 1 - PeerConnectionImpl needs better error handling and logging


Pushed to Alder - http://hg.mozilla.org/projects/alder/rev/2b62d30467e7
Comment 14 Jason Smith [:jsmith] 2012-10-10 18:27:08 PDT
Ethan - Is this testable through hitting the DOM API with PeerConnection? I'm wondering if it would be worth running some quick DOM tests here or not.
Comment 15 Ethan Hugg [:ehugg] 2012-10-10 18:34:24 PDT
(In reply to Jason Smith [:jsmith] from comment #14)
> Ethan - Is this testable through hitting the DOM API with PeerConnection?
> I'm wondering if it would be worth running some quick DOM tests here or not.

Yes, your fuzz testing is already testing this.  Essentially there were error conditions where did the wrong things - crashed instead of returning NS_ERROR_FAILURE, or didn't check return codes of subroutines so  we would continue and crash later, things like that.  We're still refining this as you've seen from 791270 and 791278 where I didn't realize the JS layer would pass through NULLs and we had MOZ_ASSERTs instead of runtime checking.
Comment 16 Jason Smith [:jsmith] 2012-10-12 12:10:28 PDT
(In reply to Ethan Hugg [:ehugg] from comment #15)
> (In reply to Jason Smith [:jsmith] from comment #14)
> > Ethan - Is this testable through hitting the DOM API with PeerConnection?
> > I'm wondering if it would be worth running some quick DOM tests here or not.
> 
> Yes, your fuzz testing is already testing this.  Essentially there were
> error conditions where did the wrong things - crashed instead of returning
> NS_ERROR_FAILURE, or didn't check return codes of subroutines so  we would
> continue and crash later, things like that.  We're still refining this as
> you've seen from 791270 and 791278 where I didn't realize the JS layer would
> pass through NULLs and we had MOZ_ASSERTs instead of runtime checking.

Okay. The security assurance team has been managing this, so I'll keep this out of my scope testing wise (I'm focusing on the other areas outside of the fuzzing).

Note You need to log in before you can comment on or make changes to this bug.