Closed Bug 856423 Opened 8 years ago Closed 8 years ago

WebRTC assertion failure: pc_stream_id == 0 and crash [@gsmsdp_negotiate_media_lines]

Categories

(Core :: WebRTC: Signaling, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla22
Tracking Status
firefox22 + fixed

People

(Reporter: posidron, Assigned: ehugg)

References

Details

(Keywords: assertion, crash, testcase, Whiteboard: [WebRTC] [blocking-webrtc+][qa-])

Attachments

(5 files, 1 obsolete file)

Attached file callstack
This crash occurs by reloading the attached testcase multiple times.

/*
 * Bubble the stream added event up to the PC UI
 */
if (notify_stream_added) {
    for (j=0; j < CC_MAX_STREAMS; j++ ) {
        if (dcb_p->remote_media_stream_tbl->streams[j].
            num_tracks &&
            (!dcb_p->remote_media_stream_tbl->streams[j].
             num_tracks_notified)) {
            /* Note that we only notify when the number of tracks
               changes from 0 -> !0 (i.e. on creation).
               TODO(adam@nostrum.com): Figure out how to notify
               when streams gain tracks */
            ui_on_remote_stream_added(evOnRemoteStreamAdd,
                dcb_p->line, dcb_p->call_id,
                dcb_p->caller_id.call_instance_id,
                dcb_p->remote_media_stream_tbl->streams[j]);

            dcb_p->remote_media_stream_tbl->streams[j].num_tracks_notified =
                dcb_p->remote_media_stream_tbl->streams[j].num_tracks;  <<====


Tested with m-i changeset: 126761:c99f1fd792db
Attached file testcase
Flags: in-testsuite?
Assignee: nobody → adam
Whiteboard: [WebRTC] [blocking-WebRTC?]
Whiteboard: [WebRTC] [blocking-WebRTC?] → [WebRTC] [blocking-webrtc?]
cdiehl indicates this crash happens every 20 -30 reloads.
Priority: -- → P2
Whiteboard: [WebRTC] [blocking-webrtc?] → [WebRTC] [blocking-webrtc+]
I've tried many, many times to reproduce this with no luck on a Linux asan build (current inbound approximately).

I also tried adding automatic reloads either at the end of the test with a small delay, or off a 1-1.5 second delay from the start of the test (1 second often interrupted it).  I've let it run many hundred iterations with no crashes or asan hits.

Cristoph, can you re-verify this happens, and give more info about how you're triggering it?  are you waiting for the page to finish loading?  If not, when are you triggering?  I'm guessing you're on mac, though that's unlikely to cause this (but not impossible).

Thanks.  If this continues to be hard/impossible to reproduce we may move it to blocking-, at least unless/until we can get a handle on the cause.
Flags: needinfo?(cdiehl)
FWIW, I also can't reproduce this crash after loads of reloads of the attached test case on Win 7.
I still can reproduce it with "cmd+shift+r" against http://hg.mozilla.org/integration/mozilla-inbound/rev/3e19dec4264f
Flags: needinfo?(cdiehl)
I have tested it now again and hit it on the second try. 
As soon as you see "close pc2" inside the terminal; force the reload - that seems to be a reliable way.
(In reply to Christoph Diehl [:cdiehl] from comment #6)
> I have tested it now again and hit it on the second try. 
> As soon as you see "close pc2" inside the terminal; force the reload - that
> seems to be a reliable way.

Is this ASAN only? Cause even with this updated STR, I still can't reproduce on Nightly, even after adding a small modification to the test case to force a reload right after closed pc 2.
No, that's a plain null ptr crash.
Assertion failure: pc_stream_id == 0, at /Users/cdiehl/dev/repos/mozilla/mozilla-inbound/media/webrtc/signaling/src/sipcc/core/gsm/gsm_sdp.c:4627
ASAN:SIGSEGV

/* This is a hack to keep all the media in a single
   stream.
   TODO(ekr@rtfm.com): revisit when we have media
   assigned to streams in the SDP */
if (!created_media_stream){
    lsm_add_remote_stream (dcb_p->line,
                           dcb_p->call_id,
                           media,
                           &pc_stream_id);
    MOZ_ASSERT(pc_stream_id == 0);
    /* Use index 0 because we only have one stream */
    result = gsmsdp_add_remote_stream(0,
                                      pc_stream_id,
                                      dcb_p);
Keywords: assertion
Summary: WebRTC crash [@gsmsdp_negotiate_media_lines] → WebRTC assertion failure: pc_stream_id == 0 and crash [@gsmsdp_negotiate_media_lines]
I can reproduce the behavior in comment 10 on a debug build on Win 7 very easily.

Assertion failure: pc_stream_id == 0, at c:/Users/jsmith/Documents/mozilla-centr
al/obj-i686-pc-mingw32/media/webrtc/signaling/signaling_sipcc/../../../../../med
ia/webrtc/signaling/src/sipcc/core/gsm/gsm_sdp.c:4627

What I did for STR was I opened up my modified reload at the end test case in about 8 tabs, reloaded all of tabs in a row a few times, and I eventually will crash pretty quickly on an assertion failure.
Attached file Console Output
Thanks; reloading it in multiple tabs seems to be the key.  With a 1.5 second automatic reload (started before createOffer/etc), I got it open in 3 tabs before crashing.
Ok, the crash is due to pc_stream_id being -1.  This is caused because lsm_add_remote_stream() gets the id from vcmCreateRemoteStream(), which can fail.  It returns a failure to the lsm (and as a side-effect leaves pc_stream_id -1), but lsm_add_remote_stream doesn't check for errors and has no error return itself.

The code there needs to handle errors (and perhaps modify the call to gsmsdp_add_remote_stream() as well, though right now that "can't" fail (it asserts also if told to add a stream that wasn't created -- with PR_ASSERT) and the lsm code needs to check-for and pass along errors.

-> Ethan to resolve

Thanks Jason!
Assignee: adam → ethanhugg
OS: Mac OS X → All
Priority: P2 → P1
Hardware: x86_64 → All
Whiteboard: [WebRTC] [blocking-webrtc+] → [WebRTC] [blocking-webrtc+][webrtc-uplift]
Target Milestone: --- → mozilla22
Attachment #734704 - Attachment is obsolete: true
Comment on attachment 734719 [details] [diff] [review]
Signaling - check return codes from vcmCreateRemoteStream


Hitting CTRL-R many times on this test case should have an error like this in the log instead of crashing.

$ tail -f ~/tmp/nspr.log | grep lsm_add_remote_stream 
-1523382528[7f959f07f030]: [GSM Task|lsm] lsm.c:5317: lsm_add_remote_stream: vcmCreateRemoteStream returned error: -1
Attachment #734719 - Flags: review?(rjesup)
Attachment #734719 - Flags: review?(rjesup) → review+
https://hg.mozilla.org/mozilla-central/rev/3146a645d19e
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: mozilla22 → mozilla23
Not worth getting a crashtest here - it's a really strong edge case that's going to be tough to reproduce in a crashtest.
Flags: in-testsuite? → in-testsuite-
Whiteboard: [WebRTC] [blocking-webrtc+][webrtc-uplift] → [WebRTC] [blocking-webrtc+][webrtc-uplift][qa-]
Comment on attachment 734719 [details] [diff] [review]
Signaling - check return codes from vcmCreateRemoteStream

[Approval Request Comment]
Bug caused by (feature/regressing bug #): n/a

User impact if declined: race condition can lead to a null deref/crash.  Hard to hit, but risk is timing/load stress related.

Testing completed (on m-c, etc.): on m-c; local testing

Risk to taking this patch (and alternatives if risky): low risk; mostly just adds error codes and passes them around

String or IDL/UUID changes made by this patch: none
Attachment #734719 - Flags: approval-mozilla-aurora?
Attachment #734719 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/bf6348f00690
Whiteboard: [WebRTC] [blocking-webrtc+][webrtc-uplift][qa-] → [WebRTC] [blocking-webrtc+][qa-]
Target Milestone: mozilla23 → mozilla22
You need to log in before you can comment on or make changes to this bug.