Closed Bug 922068 Opened 11 years ago Closed 11 years ago

Intermittent crash in test_peerConnection_setLocalOfferInHaveRemoteOffer.html | application crashed [@ msvcr100.dll + 0x1e2ad][@ nr_ice_format_candidate_attribute(ns_ice_candidate *,char*,in)]

Categories

(Core :: WebRTC: Networking, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla27
Tracking Status
firefox25 --- unaffected
firefox26 --- unaffected
firefox27 --- fixed
firefox-esr17 --- unaffected
firefox-esr24 --- unaffected
b2g18 --- unaffected
b2g-v1.2 --- unaffected

People

(Reporter: jesup, Assigned: ekr)

References

Details

(4 keywords, Whiteboard: [webrtc])

Attachments

(2 files, 1 obsolete file)

mis-filed under bug 921156 -- different stack, almost certainly a different bug.

From the report:
gcp
https://tbpl.mozilla.org/php/getParsedLog.php?id=28544292&tree=Mozilla-Inbound
Windows 7 32-bit mozilla-inbound opt test mochitest-3 on 2013-09-29 22:54:53
revision: d09f86ec45cd
slave: t-w732-ix-110

PROCESS-CRASH | /tests/dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | application crashed [@ msvcr100.dll + 0x1e2ad]
Return code: 1

Crash reason:  EXCEPTION_ACCESS_VIOLATION_WRITE
INFO -  Crash address: 0x74

Crash is in snprintf() called from  nr_ice_format_candidate_attribute()
Attached file callstack
On a quick read of the code, my guess is that we're running into a synchronization issue here.

This stack trace shows nr_ice_media_stream_get_attributes being called on the main thread, which I'm not sure is safe: it reads the nr_ice_media_stream components and their candidates directly. I believe these structures are manipulated by nICEr on the STS thread.

In particular, this function first iterates over the components and their candidates to get a total count of how many buffers to allocate. It then allocates those buffers and performs a second iteration over the components and candidates, filling in the buffers. If the list of candidates increases between the initial count and the filling in of the buffers -- say, because the nICEr stack has received a STUN binding_request response -- then snprintf is going to try to format a candidate into whatever location is pointed to by a random bit of heap memory.

EKR -- does that sound about right to you?
Flags: needinfo?(ekr)
Group: media-core-security
I think this is mostly but not quite right.

Specifically I think this is a new with trickle issue because in
non-trickle this fxn would only be called after gathering is
complete but before we start ICE checking, so the candidate
list should not be changing at this point.

With that said, now that we have implemented trickle,
we do need to push this onto the STS thread somehow. Unfortunately,
it's not as simple as just calling from here. What we probably need
to do is to acquire a RefPtr<NrIceCtx> and RefPtr<NrIceMediaStream>
from the main thread and then we can call these fxns on the STS thread.
I agree that this situation is impossible prior to the trickle ice patches. Before those landed, we would have all our candidates gathered (except peer reflexive) before either SetRemoteDescription or CreateAnswer could be called (because of the way the PC.js code enqueued those commands). And the peer reflexive candidates couldn't start to be gathered until after the setremote/createanswer cycle was done, since the remote browser could not begin STUN checks without the answer.
Assignee: nobody → ekr
Flags: needinfo?(ekr)
Group: core-security
Blocks: 921225
Per email discussion, marking public intermittent bug as depending on this one (and not noting the security aspects there)
In past s-s intermittents (including a prior WebRTC one), we've just lived with manually starring the s-s bug. Having multiple bugs open to track the same issue is begging for confusion. Let's just star this bug and get on with life.
Comment on attachment 814388 [details] [diff] [review]
Move ICE candidate retrieval to the STS thread.

Review of attachment 814388 [details] [diff] [review]:
-----------------------------------------------------------------

I see one typo in a comment.

This has passed one try run:
https://tbpl.mozilla.org/?tree=Try&rev=f46ed73550db

Adam, do you know how reliable failures are here.
Attachment #814388 - Flags: review?(adam)
(In reply to Eric Rescorla (:ekr) from comment #13)
> Adam, do you know how reliable failures are here.

Retriggered a few more for you ;)
Thanks.
I was hopeful that this patch would fix the unittest as well (bug 922836), but it doesn't change that behavior so these must be two different bugs.
Yes, I still need to diagnose 922836.
Comment on attachment 814388 [details] [diff] [review]
Move ICE candidate retrieval to the STS thread.

50 green runs each on XP/7/8 tell me that this patch works :)
Attachment #814388 - Flags: feedback+
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #19)
> Comment on attachment 814388 [details] [diff] [review]
> Move ICE candidate retrieval to the STS thread.
> 
> 50 green runs each on XP/7/8 tell me that this patch works :)

Good.

Do we need any kind of security approval to land this since the
problem is just on m-c?
Trunk-only bugs don't need sec-approval.
Comment on attachment 814388 [details] [diff] [review]
Move ICE candidate retrieval to the STS thread.

Review of attachment 814388 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good for a quick fix to the problem. r+ with minor nits.

::: media/webrtc/signaling/src/media/VcmSIPCCBinding.cpp
@@ +485,5 @@
>   *  @param[in]  mcap_id - Media Capability ID
>   *  @param[in]  group_id - group identifier to which stream belongs.
>   *  @param[in]  stream_id - stream identifier
>   *  @param[in]  call_handle  - call identifier
>   *  @param[in]  peerconnection - the peerconnection in use

Probably want to to document "level" here.

@@ +487,5 @@
>   *  @param[in]  stream_id - stream identifier
>   *  @param[in]  call_handle  - call identifier
>   *  @param[in]  peerconnection - the peerconnection in use
> + *  @param[out] ctx - the NrIceCtx
> + *  @param[out] ctream - the NrIceStream

stream

@@ +514,5 @@
> +  if (!*ctx)
> +    return VCM_ERROR;
> +
> +  CSFLogDebug( logTag, "%s: Getting stream %d", __FUNCTION__, level);
> +  *stream = pc.impl()->media()->ice_media_stream(level-1);

It would be nice to have a comment here indicating the the SDP level is one-based, while the ice_media_stream level is zero-based.

@@ +551,5 @@
>  {
> +  // Assign these to a RefPtr so we release the strong reference
> +  // on exit.
> +  RefPtr<NrIceCtx> ctx(ctx_in);
> +  RefPtr<NrIceMediaStream> stream(stream_in);

I don't think this is necessary: TemporaryRef manipulates the RefPtr-style refcounts to keep them alive for as long as the TemporaryRef is alive. If I read this correctly, ctx/ctx_in/stream/stream_in will all go away on stack unwinding when this method returns, making the explicit RefPtrs superfluous.

@@ +608,5 @@
>   *  @param[in]  group_id - group identifier to which stream belongs.
>   *  @param[in]  stream_id - stream identifier
>   *  @param[in]  call_handle  - call identifier
>   *  @param[in]  peerconnection - the peerconnection in use
> + *  @param[in]   level        - the m-line index

Nit: "level" is indented further than the other parameters. Perhaps mention that the index is one-based?

@@ +640,2 @@
>    mozilla::SyncRunnable::DispatchToThread(VcmSIPCCBinding::getMainThread(),
> +                        WrapRunnableNMRet(&vcmGetIceStream_m,

This change in indenting makes it look like the following list are parameters to DispatchToThread rather than parameters to WrapRunnableNMRet. I think the original indenting was much clearer.

@@ +654,5 @@
> +  // Now get the ICE parameters from the STS thread.
> +  // We .forget() the strong refs so that they can be
> +  // released on the STS thread.
> +  mozilla::SyncRunnable::DispatchToThread(VcmSIPCCBinding::getSTSThread(),
> +                        WrapRunnableNMRet(&vcmRxAllocICE_s,

Same comment regarding indent as above.
Comment on attachment 814388 [details] [diff] [review]
Move ICE candidate retrieval to the STS thread.

Review of attachment 814388 [details] [diff] [review]:
-----------------------------------------------------------------

This time with 100% more r+.
Attachment #814388 - Flags: review?(adam) → review+
Attachment #814388 - Attachment is obsolete: true
Comment on attachment 814617 [details] [diff] [review]
Move ICE candidate retrieval to the STS thread.

Review of attachment 814617 [details] [diff] [review]:
-----------------------------------------------------------------

Carrying forward r+ from abr
Attachment #814617 - Flags: review+
Attachment #814617 - Flags: checkin?
I'd suggest checking this in - as it's only on 27, there's no need for security approval.  (Not sure why it was marked "checkin?")
Comment on attachment 814617 [details] [diff] [review]
Move ICE candidate retrieval to the STS thread.

https://hg.mozilla.org/integration/mozilla-inbound/rev/994c850a583a
Attachment #814617 - Flags: checkin? → checkin+
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Group: media-core-security
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: