Closed Bug 799419 Opened 7 years ago Closed 7 years ago

WebRTC Assertion failure: mod != NULL, at pk11slot.c:1766

Categories

(Core :: WebRTC: Networking, defect, P1, critical)

x86_64
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla19
Tracking Status
firefox15 --- unaffected
firefox16 --- unaffected
firefox17 --- unaffected
firefox18 - affected
firefox19 --- fixed
firefox-esr10 --- unaffected

People

(Reporter: posidron, Assigned: jesup)

References

(Blocks 1 open bug)

Details

(Keywords: crash, testcase, Whiteboard: [WebRTC], [blocking-webrtc+] [qa-])

Attachments

(3 files, 4 obsolete files)

Attached file testcase
The testcase crashes with an m-c optimized -O1 build.

changeset:   109677:87f3548e638e (tip)
Does it also crash with {video: true, fake: true}?
Yup
Also crashes with a debug build

The stack backtrace is:

#2  0x00007ffff7ac3e53 in PR_Assert (s=0x7ffff72348f4 "mod != NULL", file=0x7ffff7234818 "pk11slot.c", ln=1766) at ../../../../../nsprpub/pr/src/io/prlog.c:554
#3  0x00007ffff71450e2 in PK11_GetInternalSlot () at pk11slot.c:1766
#4  0x00007ffff43929e9 in mozilla::nr_crypto_nss_random_bytes (buf=0x7fffffff4930 "%K\027\365\377\177", len=4) at nricectx.cpp:93
#5  0x00007ffff437873d in nr_ice_random_string (str=0x7fffffff49e0 "", len=8) at ../../../../../../media/mtransport/third_party/nICEr/src/ice/ice_ctx.c:439
#6  0x00007ffff4377af5 in nr_ice_ctx_create (label=0x7fffe8c741a8 "PC:a07cadf4ff7f0000", flags=5, ctxp=0x7fffe71acb38)
    at ../../../../../../media/mtransport/third_party/nICEr/src/ice/ice_ctx.c:209
#7  0x00007ffff4393715 in mozilla::NrIceCtx::Create (name="PC:a07cadf4ff7f0000", offerer=true, set_interface_priorities=true) at nricectx.cpp:291
#8  0x00007ffff43ed1fd in sipcc::PeerConnectionImpl::Initialize (this=0x7fffd71f1d50, aObserver=0x7fffd8cfed80, aWindow=0x7fffd9764000, aThread=0x7ffff7d3d230)
    at ../../../../../media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp:411
Pretty sure this is a "called nss without being nss properly initialized" issue.
Whiteboard: [WebRTC] → [WebRTC], [blocking-webrtc+]
Yes. You will recall that bsmith has been hassling us about making sure that NSS is available in PC? This is it.

Bsmith: can you point us to the incantations we need to catch NSS startup and NSS shutdown?
To initialize PSM, which is responsible for initializing NSS:
http://mxr.mozilla.org/mozilla-central/source/toolkit/identity/IdentityCryptoService.cpp?rev=4af7c8a2159f#220

For shutdown, you need to observe the "profile-change-net-teardown" event, and tear down all network connections and stop using NSS and PSM functionality. After that point, you should always fail to create any new WebRTC connections when requested.

if you need to use NSS beyond the "profile-change-net-teardown" event, then you need to implement the nsNSSShutdownObject protocol, which is documented in https://mxr.mozilla.org/mozilla/source/security/manager/ssl/src/nsNSSShutDown.h#197
You should look at how nsHttpHandler deals with profile-change-net-teardown and profile-change-net-restore.

Handling those events in a similar way will also get you support for offline mode "for free".

Also, I see that DataChannel.cpp is already observing "profile-change-net-teardown" but not "profile-change-net-restore".
Priority: -- → P1
Not tracking this for FF18,considering there is no movement on the bug and as WebRTC lands as an experimental feature in FF18 , not a blocker for release.
Flags: in-testsuite?
Assignee: nobody → rjesup
Work-in-process patch - Doesn't actually disable network access in the transportlayerfoo classes.  For some reason I haven't figured out, ::Observe() is never getting called (I verified AddObserver was called) on linux debug (where I'm testing) when closing the window, Quitting, or going offline and back.
Attachment #682807 - Flags: feedback?(mcmanus)
Attachment #682807 - Flags: feedback?(ekr)
Attachment #682807 - Flags: feedback?(bsmith)
I'm not sure this is the angle I would take. A few comments:

There are a lot of places in WebRTC that use NSS and/or STS so I think individually disabling all of them is problematic. Instead, what I would do is catch the event in a single place and then destroy all the PCs. This will shut down all use of NSS and the STS without having to have catches in a whole bunch of places. Certainly there's no reason why we couldn't force NSS *startup* in PC.

I'm especially concerned about having TransportFlow do this, because it has no current dependency on either of these services. If we *must* have elements of mtransport monitor this stuff (and again, I think this is a bad idea), then that should be limited to those which use NSS and/or STS.
Brian,

Can you elaborate on the conditions when "profile-change-net-teardown" fires? Is it both offline and change of profile?

My view is that when the profile changes that's like a differnet user and we should tear down the call. Maybe not so much with network failures.

Regardless, I suggest that for now we just fix the startup problem which we know is causing crashes and is simple and leave the shutdown problem for later.
See https://developer.mozilla.org/en-US/docs/Observer_Notifications
and https://wiki.mozilla.org/XPCOM_Shutdown
and https://groups.google.com/forum/#!topic/mozilla.dev.platform/cMQ0-ICOW-w

We don't actually support profile switching anymore. so net-teardown should only occur during shutdown and maybe when entering offline mode. ("Work Offline" is in the developer menu, instead of the main menu, so I'm guessing that it is now intended as a "simulate what happens when there is no network connection" feature at this point.)

I agree with Eric that you should have a single place that observes the event and tears down all connections, instead of handling shutdown in multiple places. If you do that in net-teardown, then you will still have NSS alive throughout, as NSS is shut down in profile-change-teardown, which occurs after net-teardown. Just make sure you don't allow any data channels to be created between net-teardown and net-restore. (On shutdown, you will never get a net-restore, but I guess you may get one if somebody turns off Offline Mode after turning it on. Test it out and see.)
(In reply to Brian Smith (:bsmith) from comment #12)
> ("Work Offline" is in the developer menu, instead of the main menu, so I'm 
> guessing that it is now intended as a "simulate what happens when there is
> no network connection" feature at this point.)

Ignore the above sentence.

> Just make sure you don't allow any data channels

s/data channels/PeerConnections/.

Put another way, you must ensure that after net-teardown that you don't use NSS anymore unless/until you get a net-restore.
ok, revised heavily to Close() all peerconnections on teardown or incipient offline status ('work offline').  Note that for this, I've made it synchronous close() because once we return from the observer it will continue on with shutdown.
Attachment #682807 - Attachment is obsolete: true
Attachment #682807 - Flags: feedback?(mcmanus)
Attachment #682807 - Flags: feedback?(ekr)
Attachment #682807 - Flags: feedback?(bsmith)
Comment on attachment 682857 [details] [diff] [review]
Watch network (tear)down events and kill PeerConnections; ensure init of NSS

Note: I realize I need to rev the UUID in IPeerConnection.idl as well
Attachment #682857 - Flags: review?(ekr)
Attachment #682857 - Flags: review?(bugs)
Attachment #682857 - Flags: review?(bsmith)
Comment on attachment 682857 [details] [diff] [review]
Watch network (tear)down events and kill PeerConnections; ensure init of NSS

Review of attachment 682857 [details] [diff] [review]:
-----------------------------------------------------------------

I would suggest solving this bug with a patch like the one I attached and then file a separate bug to address the shutdown problem.

::: media/mtransport/transportlayerdtls.cpp
@@ +437,5 @@
>    if (verification_mode_ == VERIFY_UNSET) {
>      MOZ_MTLOG(PR_LOG_ERROR, "Can't start DTLS without specifying a verification mode");
>      return false;
>    }
>  

This doesn't happen early enough. If you look at the stack trace you will see that the error is happening in PCImpl::Initialize(), long before any DTLS structures are fired up. That's because both ICE and PC itself use NSS.

I believe a better approach would be to allow mtransport to assume NSS is available and instead to set it up at the top of PeerConnectionImpl. I've attached a patch that does that.

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +964,5 @@
>    // Note that no media calls may be made after this point
>    // because we have removed the pointer.
>    // Forget the reference so that we can transfer it to
>    // SelfDestruct().
>    RUN_ON_THREAD(mThread, WrapRunnable(mMedia.forget().get(),

I believe that using NS_DISPATCH_SYNC here will result in a regression on bug 811183, which resulted from synchronously closing.
Attachment #682857 - Flags: review?(ekr) → review-
Comment on attachment 682888 [details] [diff] [review]
Force NSS startup

Good by me.  All uses of NSS via transport/ice/etc are hung off PeerConnection, so this is good.  If mtransport/etc is ever used independently of PeerConnection, then we'd have to move or copy this down into that code.
Attachment #682888 - Flags: review+
Comment on attachment 682888 [details] [diff] [review]
Force NSS startup

Should be an easy review
Attachment #682888 - Flags: review?(bsmith)
Blocks: 812886
Comment on attachment 682857 [details] [diff] [review]
Watch network (tear)down events and kill PeerConnections; ensure init of NSS

Now in the dependent bug
Attachment #682857 - Attachment is obsolete: true
Attachment #682857 - Flags: review?(bugs)
Attachment #682857 - Flags: review?(bsmith)
Comment on attachment 682888 [details] [diff] [review]
Force NSS startup

Review of attachment 682888 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +370,5 @@
> +#ifdef MOZILLA_INTERNAL_API
> +  // This code interferes with the C++ unit test startup code.
> +  nsCOMPtr<nsISupports> nssDummy = do_GetService("@mozilla.org/psm;1", &res);
> +  NS_ENSURE_SUCCESS(res, res);
> +#endif

IMO, the C++ unit tests should be changed to initialize NSS the same way Gecko does, so that this is a non-issue. Not in this patch though.

Use the name "rv" for nsresults or "nrv" when there is another rv variable in the same scope.
Attachment #682888 - Flags: review?(bsmith) → review+
https://hg.mozilla.org/mozilla-central/rev/1e3d1096ab5e
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Jesse, which type of test could be used here or don't we have a chance at all to get this tested? Or even not worth the time?
Flags: needinfo?(jruderman)
The testcase looks like it would work as a crashtest.
Flags: needinfo?(jruderman)
Whiteboard: [WebRTC], [blocking-webrtc+] → [WebRTC], [blocking-webrtc+] [qa-]
This is not fixed and the testcase still crashes the browser. In this case I have tested with the recent nightly build Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:20.0) Gecko/20.0 Firefox/20.0 ID:20121125030729

Crash report: bp-1c2c040f-1bcd-467e-bb30-e269c2121126

0 		@0x10b6ecb80 	
1 	CMIOUnits 	CMIOUnits@0x63e4b 	
2 	CMIOUnits 	CMIOUnits@0x63d95 	
3 	VDC 	VDC@0x16a73
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Henrik Skupin (:whimboo) from comment #26)
> This is not fixed and the testcase still crashes the browser. In this case I
> have tested with the recent nightly build Mozilla/5.0 (Macintosh; Intel Mac
> OS X 10.7; rv:20.0) Gecko/20.0 Firefox/20.0 ID:20121125030729
> 
> Crash report: bp-1c2c040f-1bcd-467e-bb30-e269c2121126
> 
> 0 		@0x10b6ecb80 	
> 1 	CMIOUnits 	CMIOUnits@0x63e4b 	
> 2 	CMIOUnits 	CMIOUnits@0x63d95 	
> 3 	VDC 	VDC@0x16a73

Hmm...the stack doesn't tell us much. Is this the same bug with the same stack or a different bug? I recall there being some other bug open that was tracking a crash that would happen after this patch was applied as a followup.
Make probably more sense to file it as a new bug. I will do it right away.
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Duplicate of this bug: 795114
Attached patch Crashtest v1 (obsolete) — Splinter Review
Attachment #688161 - Attachment is obsolete: true
Attached patch Crashtest v2 (obsolete) — Splinter Review
Attachment #688162 - Attachment is obsolete: true
Comment on attachment 688164 [details] [diff] [review]
Crashtest v3

Here's another crashtest that basically ports the test case into a crashtest with fake video instead.
Attachment #688164 - Flags: review?(rjesup)
Attachment #688164 - Flags: review?(rjesup) → review+
Flags: in-testsuite? → in-testsuite+
You need to log in before you can comment on or make changes to this bug.