Closed
Bug 1287874
Opened 8 years ago
Closed 8 years ago
Assert fail on ctx->timeout_ms
Categories
(Core :: WebRTC: Signaling, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla50
Tracking | Status | |
---|---|---|
firefox50 | --- | fixed |
People
(Reporter: q1, Unassigned)
References
()
Details
Attachments
(1 file)
While attempting to establish a Hello session between two instances of FF 47.0 x64 debug hosted on the same machine, I hit an assertion failure in media\mtransport\third_party\nICEr\src\stun\stun_client_ctx.c, line 419: 419: assert(ctx->timeout_ms); I had the debugger attached, so I was able to investigate further. The proximate cause of the problem appears to be that |ctx->maximum_transmits_timeout_ms| == 0. This matters because the condition on line 401 was satisfied: 401: if (ctx->request_ct >= ctx->maximum_transmits) { 402: /* Reliable transport only get here once. Unreliable get here for 403: * their final timeout. */ 404: ctx->timeout_ms += ctx->maximum_transmits_timeout_ms; 405: } so, if |ctx_timeout_ms| was 0 previously, it would remain 0 and thus trigger the assert. The instance of FF that failed was displaying the page http://iris.not.iac.es/view/index.shtml . I opened Hello in that instance, copied the conversation link, and pasted it into a different instance of FF. After ~30 seconds (this is a debug build!), the assert appeared. The stack trace is: ucrtbase.dll!000007feea3e6ab0() Unknown > xul.dll!nr_stun_client_send_request(nr_stun_client_ctx_ * ctx=0x000000000a6c400c) Line 419 C xul.dll!nr_stun_client_start(nr_stun_client_ctx_ * ctx=0x000000000a6c400c, int mode=0x00000014, void(*)(void *, int, void *) finished_cb=0x000007fecd1fa660, void * cb_arg=0x00000000150b3cac) Line 143 C xul.dll!nr_turn_stun_ctx_start(nr_turn_stun_ctx_ * ctx=0x00000000150b3cac) Line 212 C xul.dll!nr_turn_client_allocate(nr_turn_client_ctx_ * ctx=0x0000000018d2a40c, void(*)(void *, int, void *) finished_cb=0x000007fecd1cd630, void * cb_arg=0x000000001880300c) Line 648 C xul.dll!nr_ice_start_relay_turn_timer_cb(void * s=0x0000000000000000, int how=0x00000000, void * cb_arg=0x000000001880300c) Line 771 C xul.dll!mozilla::nrappkitTimerCallback::Notify(nsITimer * timer=0x0000000019dcda60) Line 126 C++ xul.dll!nsTimerImpl::Fire() Line 528 C++ xul.dll!nsTimerEvent::Run() Line 290 C++ xul.dll!nsThread::ProcessNextEvent(bool aMayWait=true, bool * aResult=0x000000000462f122) Line 994 C++ xul.dll!NS_ProcessNextEvent(nsIThread * aThread=0x0000000003602f00, bool aMayWait=true) Line 297 C++ xul.dll!nsSocketTransportService::Run() Line 894 C++ xul.dll!nsThread::ProcessNextEvent(bool aMayWait=false, bool * aResult=0x000000000462f5f2) Line 994 C++ xul.dll!NS_ProcessNextEvent(nsIThread * aThread=0x0000000003602f00, bool aMayWait=false) Line 297 C++ xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate * aDelegate=0x0000000000967530) Line 326 C++ xul.dll!MessageLoop::RunInternal() Line 235 C++ xul.dll!MessageLoop::RunHandler() Line 228 C++ xul.dll!MessageLoop::Run() Line 202 C++ xul.dll!nsThread::ThreadFunc(void * aArg=0x0000000003602f00) Line 398 C++ nss3.dll!_PR_NativeRunThread(void * arg=0x0000000000923c00) Line 406 C nss3.dll!pr_root(void * arg=0x0000000000923c00) Line 96 C ucrtbase.dll!000007feea3e82dd() Unknown kernel32.dll!BaseThreadInitThunk() Unknown ntdll.dll!RtlUserThreadStart() Unknown I also have a dump file, which I will be happy to examine (but not upload) if you have specific questions. I have not attempted to reproduce the problem.
Hmm, I just noticed that *both* instances of FF crash with the assert. Also, the crash occurs even when using brand-new profiles in both instances.
Comment 3•8 years ago
|
||
This is most likely an issue in the core webrtc code, so moving to that product/component for better visibility.
Component: Client → WebRTC
Product: Hello (Loop) → Core
Updated•8 years ago
|
Rank: 21
Component: WebRTC → WebRTC: Signaling
Priority: -- → P2
Comment 4•8 years ago
|
||
|maximum_transmits_timeout_ms| should never be 0, looking at the code. Does the dump file show that this is the case? If so, what kind of shape is the rest of |ctx| in?
Flags: needinfo?(q1)
Comment 5•8 years ago
|
||
|maximum_transmits_timeout_ms| could be 0 if "stun.client.final_retransmit_backoff" or "stun.client.retransmission_timeout" are set to 0. But that does not match the description that this happens even with fresh profiles. Very strange is that the |ctx| just got created in the nr_turn_client_allocate() call 3 frames up here: http://searchfox.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/stun/turn_client_ctx.c#639
(In reply to Byron Campen [:bwc] from comment #4) > |maximum_transmits_timeout_ms| should never be 0, looking at the code. Does > the dump file show that this is the case? Yes. > If so, what kind of shape is the rest of |ctx| in? I'm not sure what it should look like, but it isn't overwritten with obvious trash. For example, most of the pointers leading out of the object point to valid memory or contain nullptr, the |label| field contains believable text, the |my_addr| and |peer_addr| fields contain believable addresses, etc. The |params| field, though, seems uninitialized; should it contain anything sensible?
(In reply to q1 from comment #6) > (In reply to Byron Campen [:bwc] from comment #4) > > |maximum_transmits_timeout_ms| should never be 0, looking at the code. Does > > the dump file show that this is the case? > > Yes. To clarify, the dump file shows |maximum_transmits_timeout_ms| == 0.
Comment 8•8 years ago
|
||
(In reply to q1 from comment #6) > (In reply to Byron Campen [:bwc] from comment #4) > > |maximum_transmits_timeout_ms| should never be 0, looking at the code. Does > > the dump file show that this is the case? > > Yes. > > > If so, what kind of shape is the rest of |ctx| in? > > I'm not sure what it should look like, but it isn't overwritten with obvious > trash. For example, most of the pointers leading out of the object point to > valid memory or contain nullptr, the |label| field contains believable text, > the |my_addr| and |peer_addr| fields contain believable addresses, etc. The > |params| field, though, seems uninitialized; should it contain anything > sensible? How strange. I would expect |params| to just be zeros. What about |rto_ms|, |retransmission_backoff_factor|, and |maximum_transmits|?
(In reply to Byron Campen [:bwc] from comment #8) > (In reply to q1 from comment #6) > > (In reply to Byron Campen [:bwc] from comment #4) > > > |maximum_transmits_timeout_ms| should never be 0, looking at the code. Does > > > the dump file show that this is the case? > > > > Yes. > > > > > If so, what kind of shape is the rest of |ctx| in? > > > > I'm not sure what it should look like, but it isn't overwritten with obvious > > trash. For example, most of the pointers leading out of the object point to > > valid memory or contain nullptr, the |label| field contains believable text, > > the |my_addr| and |peer_addr| fields contain believable addresses, etc. The > > |params| field, though, seems uninitialized; should it contain anything > > sensible? > > How strange. I would expect |params| to just be zeros. The first two bytes of |params| are 0x10, 0x0e, and the rest are 0. > What about |rto_ms|, 0x64 > |retransmission_backoff_factor|, 2.0 > and |maximum_transmits|? 1 Have you tried to reproduce this crash? It happens every time for me. I am using 47.0 x64 debug, built with no modifications, but not including Crash Reporter, which had build errors; I used the --disable-crashreporter switch to exclude it.
Comment 10•8 years ago
|
||
This is bizarre. It's almost as if pow is malfunctioning here: https://dxr.mozilla.org/mozilla-release/source/media/mtransport/third_party/nICEr/src/stun/stun_client_ctx.c#102
Reporter | ||
Comment 11•8 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #10) > This is bizarre. It's almost as if pow is malfunctioning here: > https://dxr.mozilla.org/mozilla-release/source/media/mtransport/third_party/ > nICEr/src/stun/stun_client_ctx.c#102 Hmm. My build log says of that line "warning C4013: 'pow' undefined; assuming extern returning int", so perhaps it's linked to something unexpected.
Comment 12•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/67462/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/67462/
Reporter | ||
Comment 13•8 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #10) > This is bizarre. It's almost as if pow is malfunctioning here: > https://dxr.mozilla.org/mozilla-release/source/media/mtransport/third_party/ > nICEr/src/stun/stun_client_ctx.c#102 It *is* malfunctioning. The first argument, |ctx->retransmission_backoff_factor|, == 2.0, and the second argument, |ctx->maximum_transmits| == 7, so the result should be 128.0. However, the function instead returns 0 in register |rax|, which the generated code then multiplies by |ctx->rto_ms| using the |imul| instruction. This implies that the compiler is indeed calling the wrong version of |pow| because of the missing declaration.
Comment 14•8 years ago
|
||
Neato.
Updated•8 years ago
|
Attachment #8775282 -
Flags: review?(drno)
Reporter | ||
Comment 15•8 years ago
|
||
So the question is, why does this code ever work? Have I found this problem only because the debug build is so slow?
Comment 16•8 years ago
|
||
Could just be compiler version.
Reporter | ||
Comment 17•8 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #16) > Could just be compiler version. FWIW, I used VS 2015 Pro update 3 RC to build. But I think the compiler did the "right" thing given that it had no function prototype for |pow|: it sent the function the arguments, and expected a 32-bit int back.
Updated•8 years ago
|
Attachment #8775282 -
Flags: review?(drno) → review+
Comment 18•8 years ago
|
||
Comment on attachment 8775282 [details] Bug 1287874: Add missing math.h include. https://reviewboard.mozilla.org/r/67462/#review64676
Comment 19•8 years ago
|
||
(In reply to q1 from comment #17) > FWIW, I used VS 2015 Pro update 3 RC to build. I guess we should check how Mozilla's current official Windows compiler is handling this.
Comment 20•8 years ago
|
||
So the current treeherder builds show some other C4013 warnings, for which I opened bug 1289984. But from the logs it also looks like treeherder uses VS2015 u2.
Comment 21•8 years ago
|
||
Pushed by bcampen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8e0019ff663e Add missing math.h include. r=drno
Comment 22•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/8e0019ff663e
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox50:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment 23•8 years ago
|
||
I verified with VS 2015 U2 (the compiler currently in use on our build machines) in the debugger that pow() does the correct thing. The strange thing though is that even after upgrading to VS 2015 U3 I don't see the C4013 compiler warning for the call to pow() (I do see the warning for the things fixed in bug 1289984). Is is possible that either some environment things, like special compiler flags or libs installed on the system, have caused this to only show up on the system of the bug reporter?
Comment 24•8 years ago
|
||
As we are not in control of the compiler version used on our build machines I think we should uplift this as high a possible to prevent any problems in case RelEng decides to start using VS 2015 U3.
Reporter | ||
Comment 25•8 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #23) > I verified with VS 2015 U2 (the compiler currently in use on our build > machines) in the debugger that pow() does the correct thing. > The strange thing though is that even after upgrading to VS 2015 U3 I don't > see the C4013 compiler warning for the call to pow() (I do see the warning > for the things fixed in bug 1289984). Is is possible that either some > environment things, like special compiler flags or libs installed on the > system, have caused this to only show up on the system of the bug reporter? I don't know of any special libraries or of unusual build flags other than --disable-crashreporter (which I needed because crashreporter had build errors. I did verify the source tarball's SHA512, so I shouldn't have broken source). I built using Mozillabuild 2.2.0. Did you build the x64 image? That's the one I built.
Comment 26•8 years ago
|
||
Finally got it. I get the warning for the pow() function only with 64bit build and enable-debug and disable-optimize. And even though I still can't repro the 0 value at least all the C4013 warnings are gone (except for one in sqlite). Unfortunately I can't go back to U2 any more (VS keeps crashing after the downgrade).
Reporter | ||
Comment 27•8 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #26) > Finally got it. I get the warning for the pow() function only with 64bit > build and enable-debug and disable-optimize. And even though I still can't > repro the 0 value That is bizarre. Maybe it happens only with U3 RC, and not U3 release? > Unfortunately I can't go back to U2 any more (VS keeps crashing after the > downgrade). Ugh.
Comment 28•8 years ago
|
||
But the fact that the warning only shows up on 64bit debug builds lets me sleep better. And I don't think we need to uplift it for that small build group.
You need to log in
before you can comment on or make changes to this bug.
Description
•