Assertion failure: cb_hdr and crash [@ fim_process_event/FIM_DEBUG]

RESOLVED FIXED in mozilla21

Status

()

defect
P3
critical
RESOLVED FIXED
7 years ago
7 years ago

People

(Reporter: posidron, Assigned: ehugg)

Tracking

(Blocks 1 bug, {assertion, crash, testcase})

Trunk
mozilla21
x86_64
macOS
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [WebRTC], [blocking-webrtc-], [qa-], crash signature)

Attachments

(4 attachments, 1 obsolete attachment)

Posted file callstack
./media/webrtc/signaling/src/sipcc/core/gsm/fim.c:538

[...]
if (call_chn == NULL) {                                                                                                     
    FIM_DEBUG(get_debug_string(GSM_DBG1), "FIM", call_id, fname, "no call_chn");
[...]


@rforbes
SEED: 1355734171.18
Fault was detected on test 111

Tested with m-i changeset: 116243:9d6e95e77855
Assignee: nobody → ethanhugg
Priority: -- → P3
Whiteboard: [WebRTC], [blocking-webrtc-]
Unfortunately this callstack is not informative enough for me to figure this out.  If you get another crash in this area, please post it.  Thanks.
Just a note that this crash still pops up relatively often with m-c changeset:   118049:928550157e6e
I am seeing this assertion failure before the crash:

Assertion failure: cb_hdr, at /Users/cdiehl/Code/Mozilla/mc-inbound-asan/media/webrtc/signaling/src/sipcc/core/gsm/fim.c:604
Keywords: assertion
Summary: WebRTC crash [@fim_process_event/FIM_DEBUG] → Assertion failure: cb_hdr and crash [@ fim_process_event/FIM_DEBUG]
I just got this to happen under the debugger.  It appears to happen when we go over the MAX_CONFIG_LINES limit in SIPCC.  If I hit reload quickly perhaps I'm creating PCs faster than they are being destructed.

Not sure what to do about it yet, whether we even need a maximum number here, whether it could be arbitrarily large, or whether we can make a saner error case when it's hit.
(In reply to Ethan Hugg [:ehugg] from comment #4)
> I just got this to happen under the debugger. If I hit reload quickly
> perhaps I'm creating PCs faster than they are being destructed.

That's great. That practice covers with my fuzzing procedures.
Since I added a delay of 3 seconds between each test I am not seeing this crash anymore.
Low-brow testcase that creates and holds many PeerConnection objects and hits this assert every time for me.
Keywords: testcase
Blocks: 796463
Comment on attachment 702490 [details] [diff] [review]
Handle creation of more PeerConnections than MAX_LINES

Review of attachment 702490 [details] [diff] [review]:
-----------------------------------------------------------------

In this case we are running out of a resource known as the feature control block which are allocated in a fixed array in fsm.c.  This is the simple fix where we handle this with an error instead of a crash.

Also note that after making 45 PeerConnections you will see this

JavaScript error: file:///home/ehugg/mozilla/mozilla-central/obj-ff-dbg/dist/bin/components/PeerConnection.js, line 278: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [IPeerConnection.initialize]

Which is from PeerConnectionMedia::Init/NrIceCtx::StartGathering which is has this maximum limit.  I did not try to fix this since it is handled with a proper error and doesn't crash,  but it appears 45 is our limit for making PCs.

I have also considered changing the fcbs to be either a dynamically expandable array or a linked list so that they would only run out with memory.  I feel that needs its own patch, but let me know if you think that would be a priority.
Attachment #702490 - Flags: review?(rjesup)
Comment on attachment 702490 [details] [diff] [review]
Handle creation of more PeerConnections than MAX_LINES

Review of attachment 702490 [details] [diff] [review]:
-----------------------------------------------------------------

We will likely need to address the number of slots as well, in a different bug
Attachment #702490 - Flags: review?(rjesup) → review+
(In reply to Ethan Hugg [:ehugg] from comment #8) 
> Also note that after making 45 PeerConnections you will see this
> 
> JavaScript error:
> file:///home/ehugg/mozilla/mozilla-central/obj-ff-dbg/dist/bin/components/
> PeerConnection.js, line 278: NS_ERROR_FAILURE: Component returned failure
> code: 0x80004005 (NS_ERROR_FAILURE) [IPeerConnection.initialize]

Keep in mind that this is bug 824919 and will hopefully be fixed soon.
https://hg.mozilla.org/mozilla-central/rev/096ee26f5a0d
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Whiteboard: [WebRTC], [blocking-webrtc-] → [WebRTC], [blocking-webrtc-], [qa-]
Posted patch Crashtest (obsolete) — Splinter Review
Comment on attachment 706855 [details] [diff] [review]
Crashtest

Crashtest for this bug. I've taken Ethan's test case, cleaned some of the code up, and made it a crash test.
Attachment #706855 - Flags: review?(ethanhugg)
Comment on attachment 706855 [details] [diff] [review]
Crashtest

>+    var Logger = {
>+      console: function(msg) {
>+        if (window.dump) {
>+          window.dump(msg);
>+        } else if (window.console && window.console.log) {
>+          window.console.log(msg);
>+        }
>+      }
>+    }
[..]
>+
>+    function onFailure(code) {
>+      Logger.console("\n########## Failure: " + code + "\n");
>+    }

I do not think that logging is necessary here to reproduce the crash. It has only put in to the manual testcase to see what's going on.
Attachment #706855 - Flags: feedback-
(In reply to Henrik Skupin (:whimboo) from comment #15)
> Comment on attachment 706855 [details] [diff] [review]
> Crashtest
> 
> >+    var Logger = {
> >+      console: function(msg) {
> >+        if (window.dump) {
> >+          window.dump(msg);
> >+        } else if (window.console && window.console.log) {
> >+          window.console.log(msg);
> >+        }
> >+      }
> >+    }
> [..]
> >+
> >+    function onFailure(code) {
> >+      Logger.console("\n########## Failure: " + code + "\n");
> >+    }
> 
> I do not think that logging is necessary here to reproduce the crash. It has
> only put in to the manual testcase to see what's going on.

I'm not sure that's right. Take a look at the patch attached here. I'll wait for Ethan's feedback. Clearing feedback- as there isn't a clear indication is correct or not.
Attachment #706855 - Flags: feedback-
Comment on attachment 706855 [details] [diff] [review]
Crashtest

Review of attachment 706855 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me, just a couple nits.

::: dom/media/tests/crashtests/822197.html
@@ +4,5 @@
> +https://bugzilla.mozilla.org/show_bug.cgi?id=822197
> +-->
> +<head>
> +  <meta charset="utf-8">
> +  <title>Bug 822917 - Many PeerConnections with CreateOffer</title>

The bug number is 822197 - typo.

@@ +22,5 @@
> +    function onCreateOfferSuccess(offer) {
> +    }
> +
> +    function onFailure(code) {
> +      Logger.console("\n########## Failure: " + code + "\n");

As Henrik mentions, logging is only applicable to a manual run.  The loop below should create and hold enough PCs to test the crash.
Attachment #706855 - Flags: review?(ethanhugg) → review+
Attachment #706855 - Attachment is obsolete: true
Comment on attachment 707185 [details] [diff] [review]
Crashtest v2

Cleaned up per review comments.
Attachment #707185 - Flags: review?(ethanhugg)
Attachment #707185 - Flags: review?(ethanhugg) → review+
You need to log in before you can comment on or make changes to this bug.