Closed Bug 1309585 Opened 8 years ago Closed 8 years ago

GetAdaptersAddresses failure results in call failure on Win

Categories

(Core :: WebRTC: Networking, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: mreavy, Assigned: drno)

References

Details

Attachments

(3 files)

Dan Minor and I tried to use a appear.in for our 1:1 today.  We had 2 attempts calls before we finally connected.  Dan refreshed the page twice (second time worked).  I'll be attaching my logs and Dan's.
Here is Dan's side of the call
Nils, Byron, Michael -- Can one of you have a look at our logs?  Both Dan and I were using Nightly. Thanks.
Flags: needinfo?(mfroman)
Flags: needinfo?(drno)
Flags: needinfo?(docfaraday)
Rank: 10
I see the following on Maire's side:

(stun/ERR) Error getting buf len from GetAdaptersAddresses()

(ice/ERR) ICE(PC:1476280719236000 (id=2147484481 url=https://appear.in/mreavy)): unable to find local addresses
Flags: needinfo?(docfaraday)
So I understand - those messages only appear in nr_ice_get_local_addresses and stun_get_win32_addrs.  So this looks like a windows networking failure of some type?
Flags: needinfo?(mfroman)
(In reply to Byron Campen [:bwc] from comment #3)
> I see the following on Maire's side:
> 
> (stun/ERR) Error getting buf len from GetAdaptersAddresses()

This is one of the low level error, which I believed so far is fatal. According to old Hello logs this happens from time to time on Windows machines. How often? We don't know. Another reason why we need nICEr to report into Telemetry.
 
> (ice/ERR) ICE(PC:1476280719236000 (id=2147484481
> url=https://appear.in/mreavy)): unable to find local addresses

What I don't get is that I always thought the above GetAdaptersAddresses() call is a fatal failure and everything would stop afterwards.
But looking at the log here it looks like that PC still gathered quite a bit of TURN addresses. Then eventually it deallocates the relay addresses and starts over with new port numbers, and that seems to work better.

First of all I think the GetAdaptersAddresses() should probably log the return code when it fails. The other good news is that the later success on the same PC means this was apparently only a temporary error. So maybe we should simply try to put a loop around these GetAdaptersAddresses() calls to try it three times before we give up?
Flags: needinfo?(drno)
Attachment #8800536 - Flags: review?(docfaraday)
Comment on attachment 8800536 [details]
Bug 1309585: use default address dicovery as backup for interface iteration.

https://reviewboard.mozilla.org/r/85460/#review84554

I wonder... would using nr_ice_get_default_address let us bypass failures in GetAdaptersAddresses? Maybe we should always call nr_ice_get_default_address, make that address the top priority, and then get additional addresses with nr_stun_get_addrs (if configured to)?

::: media/mtransport/third_party/nICEr/src/stun/addrs.c:172
(Diff revision 3)
>  
> -    r = GetAdaptersAddresses(AF_UNSPEC, 0, NULL, AdapterAddresses, &buflen);
> +    while (n < 3) {
> +      r = GetAdaptersAddresses(AF_UNSPEC, GAA_FLAG_SKIP_MULTICAST | GAA_FLAG_SKIP_DNS_SERVER, NULL, AdapterAddresses, &buflen);
> -    if (r != ERROR_BUFFER_OVERFLOW) {
> +      if (r != ERROR_BUFFER_OVERFLOW) {
> -      r_log(NR_LOG_STUN, LOG_ERR, "Error getting buf len from GetAdaptersAddresses()");
> +        r_log(NR_LOG_STUN, LOG_ERR, "Error (%d) getting buf len from GetAdaptersAddresses()", r);
> +      } else if (r == NO_ERROR) {

r == ERROR_BUFFER_OVERFLOW here, always, so this will never be true.

::: media/mtransport/third_party/nICEr/src/stun/addrs.c:177
(Diff revision 3)
> +    if (n > 2) {
> +      r_log(NR_LOG_STUN, LOG_ERR, "Failed 3 times to get buf len from GetAdaptersAddresses()", r);
>        ABORT(R_INTERNAL);
>      }
>  

You'll need to set n back to 0 after this.

::: media/mtransport/third_party/nICEr/src/stun/addrs.c:178
(Diff revision 3)
> +        break;
> +      }
> +      n++;
> +    }
> +    if (n > 2) {
> +      r_log(NR_LOG_STUN, LOG_ERR, "Failed 3 times to get buf len from GetAdaptersAddresses()", r);

This needs a %d somewhere.
Attachment #8800536 - Flags: review?(docfaraday) → review-
Assignee: nobody → drno
Comment on attachment 8800536 [details]
Bug 1309585: use default address dicovery as backup for interface iteration.

https://reviewboard.mozilla.org/r/85460/#review84554

Yes, that sounds like a really good idea to me. I'll see if I can get it coded up.
Comment on attachment 8800536 [details]
Bug 1309585: use default address dicovery as backup for interface iteration.

https://reviewboard.mozilla.org/r/85460/#review85086

::: media/mtransport/third_party/nICEr/src/ice/ice_ctx.c:666
(Diff revision 4)
>    abort:
>      nr_socket_destroy(&sock);
>      return(_status);
>    }
>  
> -static int nr_ice_get_default_local_address(nr_ice_ctx *ctx, int ip_version, nr_local_addr* addrs, int addr_ct, nr_local_addr *addrp)
> +static int nr_ice_get_default_local_address(nr_ice_ctx *ctx, int ip_version, nr_local_addr addrs[], int addr_ct)

We should probably just have this take (nr_ice_ctx*, int ip_version, nr_local_addr*)

::: media/mtransport/third_party/nICEr/src/ice/ice_ctx.c:677
(Diff revision 4)
> -      }
> -    }
> -    if (i==addr_ct)
> -      ABORT(R_NOT_FOUND);
>  
> +    strlcpy(addrs[addr_ct].addr.ifname, "unknown", sizeof(addrs[addr_ct].addr.ifname));

Let's use "default route" here.

::: media/mtransport/third_party/nICEr/src/ice/ice_ctx.c:692
(Diff revision 4)
> -      /* First, gather all the local addresses we have */
> -      if((r=nr_stun_find_local_addresses(local_addrs,MAXADDRS,&addr_ct))) {
> -        r_log(LOG_ICE,LOG_ERR,"ICE(%s): unable to find local addresses",ctx->label);
> -        ABORT(r);
> +      /* First get the default IPv4 and IPv6 addrs */
> +      if (!nr_ice_get_default_local_address(ctx, NR_IPV4, local_addrs, addr_ct)) {
> +        ++addr_ct;
> +      }
> +      if (!nr_ice_get_default_local_address(ctx, NR_IPV6, local_addrs, addr_ct)) {
> +        ++addr_ct;
> +      }
> +
> +      if (!(ctx->flags & NR_ICE_CTX_FLAGS_ONLY_DEFAULT_ADDRS)) {
> +        /* Then gather all remaining local addresses we have */
> +        if((r=nr_stun_find_local_addresses(local_addrs, MAXADDRS, &addr_ct))) {
> +          r_log(LOG_ICE, LOG_ERR, "ICE(%s): failed to find more local addresses", ctx->label);
> +        }

Hmm, we get duplicates here. Maybe the thing to do is try calling nr_stun_find_local_addresses, and if it fails (or is prevented by NR_ICE_CTX_FLAGS_ONLY_DEFAULT_ADDRESS), fall back to nr_ice_get_default_local_address.

::: media/mtransport/third_party/nICEr/src/net/local_addr.c:59
(Diff revision 4)
>                         (addr_type & NR_INTERFACE_TYPE_MOBILE) ? "mobile" :
>                         "unknown";
>  
>      snprintf(buf, len, "%s%s, estimated speed: %d kbps",
>               vpn, type, addr->interface.estimated_speed);
> +    buf[len] = '\0';

I thought snprintf always wrote a null terminator?

::: media/mtransport/third_party/nICEr/src/stun/addrs.c
(Diff revision 4)
> -    *count = 0;
> -

Any particular reason you removed this?

::: media/mtransport/third_party/nICEr/src/stun/addrs.c:259
(Diff revision 4)
> -  *count=0;
> +  if (maxaddrs <= 0)
> +    ABORT(R_INTERNAL);

Again, any particular reason you removed the *count = 0; ?

::: media/mtransport/third_party/nICEr/src/stun/addrs.c:375
(Diff revision 4)
> -    for (i = 0; i < *count; ++i) {
> +    /* reverse order to give interface dicovered addresses preference over the
> +     * default address discovery result which is missing the ifname */
> +    for (i = *count-1; i >= 0 ; --i) {

If we prevent duplicates above, we don't end up needing this change. Also, the interface prioritizer is probably just going to mess this up anyhow.
Attachment #8800536 - Flags: review?(docfaraday) → review-
Comment on attachment 8800536 [details]
Bug 1309585: use default address dicovery as backup for interface iteration.

https://reviewboard.mozilla.org/r/85460/#review86912

::: media/mtransport/third_party/nICEr/src/net/local_addr.c:59
(Diff revision 4)
>                         (addr_type & NR_INTERFACE_TYPE_MOBILE) ? "mobile" :
>                         "unknown";
>  
>      snprintf(buf, len, "%s%s, estimated speed: %d kbps",
>               vpn, type, addr->interface.estimated_speed);
> +    buf[len] = '\0';

Up to Win10 it is not guaranteed on Win https://msdn.microsoft.com/en-us/library/2ts7cx93.aspx#Remarks
Comment on attachment 8800536 [details]
Bug 1309585: use default address dicovery as backup for interface iteration.

https://reviewboard.mozilla.org/r/85460/#review85086

> Any particular reason you removed this?

In this version of patch I had to drop these, because the gathering through all the interfaces happened after the default address gathering. So |count| would already be 1 or 2 and it had to append to it, rather then start at zero. No longer relevant, because of the re-ordering from the above comment.
Comment on attachment 8800536 [details]
Bug 1309585: use default address dicovery as backup for interface iteration.

https://reviewboard.mozilla.org/r/85460/#review85086

> We should probably just have this take (nr_ice_ctx*, int ip_version, nr_local_addr*)

Even though I agree that nr_ice_get_default_local_address() signature looks ugly, I think what you are asking for would basically be nr_ice_get_default_address(). But as nr_ice_get_default_address() fails to figure out the interface name we need some function to patch that up. The only way I see around that would be if nr_ice_get_default_local_address() would always use "defaulte route" for the interface name. But that would screw over the interface prioritizer. Therefore I'll leave it like it is for now. Feel free to flag it again if you think we should change it.
Summary: 2 failed calls ("your networking is blocking call set up") → GetAdaptersAddresses failure results in call failure on Win
Byron, it looks like Chrome used the same approach of calling GetAdaptersAddresses() twice and had some failures when doing that: https://bugs.chromium.org/p/chromium/issues/detail?id=616908
As they switched to the recommended solution from MSDN (https://msdn.microsoft.com/en-us/library/windows/desktop/aa365915(v=vs.85).aspx) of doing the initial GetAdaptersAddresses with a 15k buffer already (https://chromium.googlesource.com/chromium/src.git/+/e5a246f582215340d3c71022a87d4c152a8269df%5E%21/#F0) I think we should try the same, rather then staying on old approach.
And hopefully combining this with the default route idea as a backup this is really going to make it even better.
Comment on attachment 8800536 [details]
Bug 1309585: use default address dicovery as backup for interface iteration.

https://reviewboard.mozilla.org/r/85460/#review87694

Some nits, otherwise good.

::: media/mtransport/third_party/nICEr/src/stun/addrs.c:167
(Diff revision 5)
>  
>      if (maxaddrs <= 0)
>        ABORT(R_INTERNAL);
>  
> -    /* Call GetAdaptersAddresses() twice.  First, just to get the buf length */
> -
> +    /* According to MSDN (see above) we have try GetAdapterAddresses() multiple times */
> +    while (n < 5) {

for loop?

::: media/mtransport/third_party/nICEr/src/stun/addrs.c:181
(Diff revision 5)
> +    }
> +    if (n >= 5) {

A line break here would be good.

::: media/mtransport/third_party/nICEr/src/stun/addrs.c:264
(Diff revision 5)
> +  if (maxaddrs <= 0)
> +    ABORT(R_INTERNAL);

Probably should be R_BAD_ARGS.
Attachment #8800536 - Flags: review?(docfaraday) → review+
Is this related to bug 1107702?
Flags: needinfo?(drno)
(In reply to Randell Jesup [:jesup] from comment #20)
> Is this related to bug 1107702?

Yes kind of. If the new code here works as we think/hope it does then we should be able to remove the extra code from bug 1107702. Because if FF has crossed the magic 2GB mark GetAdaptersAddresses will/might (?) fail (since the IPv6 support landed we no longer use GetAdaptersInfo - therefore it is questionable if the code from bug 1107702 does any good any how), we then fall back to the new default route discovery which does not rely on any of the Windows specific API's.

So we could either remove the code from bug 1107702 together with this landing or leave it in for now and remove it later once we know more about it.
Flags: needinfo?(drno)
See Also: → 1107702
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/a18f7b70b808
use default address dicovery as backup for interface iteration. r=bwc
https://hg.mozilla.org/mozilla-central/rev/a18f7b70b808
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Depends on: 1324995
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: