Closed Bug 980609 Opened 10 years ago Closed 9 years ago

Tab-to-tab call while VPNed cause failed assertion: (ctx->state == NR_TURN_CLIENT_STATE_INITTED || ctx->state == NR_TURN_CLIENT_STATE_CONNECTED)

Categories

(Core :: WebRTC, defect, P4)

30 Branch
x86
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla42
Tracking Status
firefox42 --- fixed

People

(Reporter: thomas, Assigned: bwc)

Details

Attachments

(7 files, 4 obsolete files)

Attached file Log dump
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36

Steps to reproduce:

1. Connect to OpenVPN (settings in attachment)
2. Open two tabs in https://appear.in to perform call


Actual results:

Firefox Nightly Debug crashes with assertion error:
Assertion failed: (ctx->state == NR_TURN_CLIENT_STATE_INITTED || ctx->state == NR_TURN_CLIENT_STATE_CONNECTED)


Expected results:

Call should have been established and/or failed gracefully.
Attached file OpenVPN config file
Summary: Tab-to-tab call while tunneled cause failed assertion: (ctx->state == NR_TURN_CLIENT_STATE_INITTED || ctx->state == NR_TURN_CLIENT_STATE_CONNECTED) → Tab-to-tab call while VPNed cause failed assertion: (ctx->state == NR_TURN_CLIENT_STATE_INITTED || ctx->state == NR_TURN_CLIENT_STATE_CONNECTED)
Can you supply a stack trace, perhaps?
Attached file Stacktrace
Stacktrace. Not from the same session as the log dump, but it's the same error.
Hmm.. This isn't giving us the nICEr symbols. 

Can you provide two new things:
1. A run with ICE logging (R_LOG_LEVEL=9, R_LOG_DESTINATION=stderr)
2. The value of ctx->state

Thanks
1. Log provided
2. I haven't built Firefox myself (ever), I'm using the nightly debug binaries. I guess that is required in order to provide this value?
bruun: thanks those should help.  Some logs are only available in debug builds only; signaling and ICE logs are available in opt/release builds.

BTW, I gave you cancofirm and editbugs privs in bugzilla

Byron, can you look at the logs and confirm when you have some time?
Flags: needinfo?(docfaraday)
Looking at the code, it seems probable that a TCP connection failure to a TURN server before our timer for forming the allocation pops could hit this assertion.
Flags: needinfo?(docfaraday)
Assignee: nobody → docfaraday
Looking at the logging, I see lots of TURN TCP client contexts timing out toward the end, and no logging that indicates that any of them had tried forming an allocation yet. This seems to fit.
Also, this service uses _6_ relay candidates for each side ( _11_ candidates total!!! ), resulting in a whopping 86 candidate pairs once pruning is carried out. We're definitely pushing the limits of what is reasonable here.
(Although, my machine is multi-homed, and this may have exacerbated the pileup of candidates/pairs somewhat)
For context: At the time of the crash, we had a lot more. 

In the iceServers config, we had: 
3 STUN urls – 3 availability zones.
12 TURN urls – 3 availability zones * 2 (TCP and UDP) * 2 different ports.

We removed most of those today, as they proved to slow things down a lot:
6 TURN urls – 3 availability zones * 2 (TCP and UDP)
Make it easier to reproduce this bug, by disabling the TCP connected callback, and reducing the TCP connect timeout. Do not check this in, please.
Logging this no-op is probably gratuitous; we have already logged the failure earlier.
Attachment #8387905 - Attachment is obsolete: true
Attachment #8387932 - Flags: review?(ekr)
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #8387932 - Attachment is obsolete: true
Attachment #8387932 - Flags: review?(ekr)
Attachment #8438578 - Flags: review?(ekr)
Byron -- Can you unrot this and revector to Nils when you're both back from PTO?
backlog: --- → webRTC+
Rank: 45
Flags: needinfo?(docfaraday)
Priority: -- → P4
Comment on attachment 8438578 [details] [diff] [review]
Do not assert that TURN TCP client contexts can't fail before attempting to allocate.

Review of attachment 8438578 [details] [diff] [review]:
-----------------------------------------------------------------

Clearing r?. Feel free to r? when this is un-rotted.
Attachment #8438578 - Flags: review?(ekr)
Comment on attachment 8438578 [details] [diff] [review]
Do not assert that TURN TCP client contexts can't fail before attempting to allocate.

Review of attachment 8438578 [details] [diff] [review]:
-----------------------------------------------------------------

Patch applies cleanly still. I really doubt we'll see any problems on try (given that this should only happen with TURN TCP, which we have never tested in CI), but here's one just for completeness:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2beaff7aa72b
Attachment #8438578 - Flags: review?(ekr)
Flags: needinfo?(docfaraday)
Comment on attachment 8438578 [details] [diff] [review]
Do not assert that TURN TCP client contexts can't fail before attempting to allocate.

Review of attachment 8438578 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/third_party/nICEr/src/stun/turn_client_ctx.c
@@ +532,5 @@
>    nr_turn_stun_ctx *stun = 0;
>    int r,_status;
>  
> +  if(ctx->state == NR_TURN_CLIENT_STATE_FAILED ||
> +     ctx->state == NR_TURN_CLIENT_STATE_CANCELLED){

How do you end up with allocate being called when you are in cancelled. It seems like the only time we call nr_turn_client_cancel() is in nr_turn_client_ctx_destroy()
Attachment #8438578 - Flags: review?(ekr)
(In reply to Eric Rescorla (:ekr) from comment #20)
> Comment on attachment 8438578 [details] [diff] [review]
> Do not assert that TURN TCP client contexts can't fail before attempting to
> allocate.
> 
> Review of attachment 8438578 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: media/mtransport/third_party/nICEr/src/stun/turn_client_ctx.c
> @@ +532,5 @@
> >    nr_turn_stun_ctx *stun = 0;
> >    int r,_status;
> >  
> > +  if(ctx->state == NR_TURN_CLIENT_STATE_FAILED ||
> > +     ctx->state == NR_TURN_CLIENT_STATE_CANCELLED){
> 
> How do you end up with allocate being called when you are in cancelled. It
> seems like the only time we call nr_turn_client_cancel() is in
> nr_turn_client_ctx_destroy()

   This is the case right now, yes. I think I wanted this to continue doing the right thing if we started canceling for whatever reason.
Please add a comment to that effect, then r=me
Attachment #8438578 - Attachment is obsolete: true
Attachment #8627722 - Attachment is obsolete: true
Attachment #8627723 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/26c012160bbc
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: