no relay candidates after ice restarts

VERIFIED FIXED in Firefox 49

Status

()

defect
P1
normal
Rank:
15
VERIFIED FIXED
3 years ago
3 years ago

People

(Reporter: fippo, Assigned: mjf)

Tracking

50 Branch
mozilla51
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox48 wontfix, firefox49 fixed, firefox-esr45 unaffected, firefox50 fixed, firefox51 verified)

Details

Attachments

(2 attachments)

Andreas Pehrsons just experienced an ice failure and I noticed something interesting about it. After the ice restart no relay candidates were gathered.

I managed to reproduce this in a fiddle: https://jsfiddle.net/ww22ayuk/4/
(turn credentials might have to be updated soon)

The odd behaviour is that before the restart, there are relay candidates gathered but none after the restart (which happens after about 10 seconds)
mjf, I think this one is for you. Wasn't sure if I should make it a P1 or a P2 since it's not a regression but I think a P1 is justified since this kind of breaks the purpose of ICE restart.
Rank: 15
Flags: needinfo?(mfroman)
Priority: -- → P1
Assignee: nobody → mfroman
this is a pcap from captured while running the fiddle. We can see the initial TURN allocation at the beginning of this with the allocation that generates the candidate in packet #17 (I am not sure why it allocates two different ports in packets #18 and 19?!).

There are refresh requests with a lifetime of 0 in packets #34 and #35. This is probably closing the ports from the first generation of ice candidates. But I do not see any attempt to create new allocations which should happen in parallel.

the fiddle does currently not use a stun server, this might be worth checking in addition.
At least in the case of this fiddle, it looks like only the offerer side is actually restarting ice.  I'm getting an rr run to see what is really happening.
Flags: needinfo?(mfroman)
Comment on attachment 8783052 [details]
Bug 1295552 - restart ice ctx needs stun/turn/dns setup to match original ctx.

https://reviewboard.mozilla.org/r/73024/#review70788

One smaller issue, but otherwise LGTM.

::: media/mtransport/nricectxhandler.cpp:110
(Diff revision 1)
> +  NrIceResolver* resolver =
> +    static_cast<NrIceResolver*>(this->current_ctx->ctx_->resolver->obj);
> +  if (!resolver ||
> +      NS_FAILED(new_ctx->SetResolver(resolver->AllocateResolver()))) {

This makes me wonder if we should try to preserve the DNS lookup results from the old ICE ctx (to avoid looking them up again)?
Might be worth filling a follow up ticket for later optimization.

::: media/mtransport/third_party/nICEr/src/ice/ice_ctx.c:187
(Diff revision 1)
> +        ABORT(R_NO_MEMORY);
> +      }
> +      if (r = r_data_create(&ctx->turn_servers[i].password,
> +                            servers[i].password->data,
> +                            servers[i].password->len)) {
> +        RFREE(ctx->turn_servers[i].username);

You think it's worth trying to free the N's username here, but leave the usernames and password for 1..N-1 allocated?
As your function does not return how many turn servers is copy over at this point I would prefer to either clean up everything, or don't free the username here either (if memory allocation fails we are pretty much doomed any how).
Attachment #8783052 - Flags: review?(drno) → review+
Comment on attachment 8783052 [details]
Bug 1295552 - restart ice ctx needs stun/turn/dns setup to match original ctx.

https://reviewboard.mozilla.org/r/73024/#review70788

> This makes me wonder if we should try to preserve the DNS lookup results from the old ICE ctx (to avoid looking them up again)?
> Might be worth filling a follow up ticket for later optimization.

Created Bug 1296755.
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/e7b0bdaa0661
restart ice ctx needs stun/turn/dns setup to match original ctx. r=drno
Keywords: checkin-needed
Backout by ihsiao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9f442b970ac1
Backed out changeset e7b0bdaa0661 for GTest crashed @mozilla::NrIceCtxHandler::CreateCtx
(In reply to Iris Hsiao [:ihsiao] from comment #9)
> Sorry had to backout for GTest crashed @mozilla::NrIceCtxHandler::CreateCtx,
> e.g.,
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=2286074&repo=autoland#L0-L9855
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=2284755&repo=autoland#L8027

I have pushed a new change to review that fixes this issue.  Sorry for the need to backout!
Flags: needinfo?(mfroman)
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/60145a6103ee
restart ice ctx needs stun/turn/dns setup to match original ctx. r=drno
Keywords: checkin-needed
Comment on attachment 8783052 [details]
Bug 1295552 - restart ice ctx needs stun/turn/dns setup to match original ctx.

Approval Request Comment
[Feature/regressing bug #]: Bug 906986 - Support ICE restart.
[User impact if declined]: Ice restart through firewall/NAT will fail because no relay candidates will be found.  Without this fix, Ice restart doesn't work, and ICE restart is an important tool for improving call success rates for WebRTC users.
[Describe test coverage new/current, TreeHerder]: GTest for ice_unittest currently tests the restart mechanism, but does not inspect for relay candidates.
[Risks and why]: There is little risk in accepting this patch.  The changes are localized to a method (NrIceCtxHandler::CreateCtx) that is only called during ice restart, plus a new function in ice_ctx.c that is only used from NrIceCtxHandler::CreateCtx.  Normal ice functionality is not affected.
[String/UUID change made/needed]: none
Attachment #8783052 - Flags: approval-mozilla-beta?
Attachment #8783052 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/60145a6103ee
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Hello Philipp, could you please verify this issue is fixed as expected on a latest Nightly build? Thanks!
Flags: needinfo?(fippo)
I'd like this to stabilize on Nightly for a few days and perhaps uplift early next week.
works for me in 51.0a1 (2016-08-28)!
Flags: needinfo?(fippo)
(In reply to Philipp Hancke from comment #18)
> works for me in 51.0a1 (2016-08-28)!

Thanks for verifying!
Status: RESOLVED → VERIFIED
Comment on attachment 8783052 [details]
Bug 1295552 - restart ice ctx needs stun/turn/dns setup to match original ctx.

Fix was verified on Nightly and has stabilized over the weekend, Aurora50+
Attachment #8783052 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8783052 [details]
Bug 1295552 - restart ice ctx needs stun/turn/dns setup to match original ctx.

Verified fix, sounds like this is unlikely to affect anyone beyond the users who already are not in an ideal situation. Let's take this for beta 9.
Attachment #8783052 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.