Closed Bug 1301407 Opened 8 years ago Closed 7 years ago

Crash in nsNSSComponent::nsNSSComponent

Categories

(Core :: Networking, defect)

49 Branch
Unspecified
Android
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 1339267
Tracking Status
firefox49 - wontfix
firefox50 + wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- fixed

People

(Reporter: marcia, Assigned: keeler)

References

Details

(Keywords: crash, regression, Whiteboard: [necko-active][tbird crash])

Crash Data

Attachments

(5 files, 10 obsolete files)

2.12 KB, patch
dragana
: review+
Details | Diff | Splinter Review
855 bytes, patch
keeler
: review+
Details | Diff | Splinter Review
2.57 KB, patch
dragana
: review+
Details | Diff | Splinter Review
2.62 KB, patch
dragana
: review+
Details | Diff | Splinter Review
4.73 KB, patch
dragana
: review+
Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is 
report bp-3d2b8867-3bba-488b-ae04-5ae892160908.
=============================================================

Seen while looking at Fennec beta 11 crashes: http://bit.ly/2cwb43x

New signature that is currently low volume in RC but didn't appear in released versions. Also present in very small volume on Aurora and Nightly.

Not sure if this should be bucketed in the NSS component.
This is coming from bug 1273475.
Flags: needinfo?(jjones)
Flags: needinfo?(dkeeler)
It looks like we're doing a speculative connect before PSM/NSS has been initialized on the main thread:

> 16 libxul.so mozilla::net::nsHttpConnectionMgr::OnMsgSpeculativeConnect netwerk/protocol/http/nsHttpConnectionMgr.cpp:2923

I tried to figure out how this could be happening, but all the code paths I'm finding have already initialized PSM/NSS on the main thread. Maybe Patrick has an idea?
Flags: needinfo?(dkeeler) → needinfo?(mcmanus)
Flags: needinfo?(jjones)
I don't really know what path, but you can probably do this and I'd r+ it

   if (!IsNeckoChild()) {
        // HACK: make sure PSM gets initialized on the main thread.
        net_EnsurePSMInit();
    }
in

nsHttpConnectionMgr::SpeculativeConnect() (in nsHttpConnectionMgr.cpp)
Flags: needinfo?(mcmanus)
Assignee: nobody → dkeeler
Component: Security → Networking
Whiteboard: [necko-active]
[Tracking Requested - why for this release]:
this is making up 1.7% of crashes on android release right now and almost all of them crashes on startup.

@patrick - could you also explicitly r+ this change?
Flags: needinfo?(mcmanus)
Comment on attachment 8793963 [details]
bug 1301407 - ensure PSM is initialized on the main thread before doing a speculative connect

https://reviewboard.mozilla.org/r/80544/#review82244
Attachment #8793963 - Flags: review?(mcmanus) → review+
I'm sorry about the review delay. I already reviewed this but I must not have succeeded in publishing it and it fell off the radar. thanks for the ni
Flags: needinfo?(mcmanus)
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8450da83e7bc
ensure PSM is initialized on the main thread before doing a speculative connect r=mcmanus
Thanks! (shakes fist at review board...)
https://hg.mozilla.org/mozilla-central/rev/8450da83e7bc
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
can we uplift the fix in terms of scope & risk? (as noted in comment #5, it's a top crasher on android)
Flags: needinfo?(dkeeler)
Comment on attachment 8793963 [details]
bug 1301407 - ensure PSM is initialized on the main thread before doing a speculative connect

Approval Request Comment
[Feature/regressing bug #]: speculative connect causing PSM initialization off the main thread
[User impact if declined]: startup crashes
[Describe test coverage new/current, TreeHerder]: the related features have tests, but nothing covers this specific case
[Risks and why]: low - the fix is small and takes existing behavior and applies it to the new case of speculative connect. Basically the worst that could happen is this doesn't fix the issue.
[String/UUID change made/needed]: none
Flags: needinfo?(dkeeler)
Attachment #8793963 - Flags: approval-mozilla-beta?
Attachment #8793963 - Flags: approval-mozilla-aurora?
Comment on attachment 8793963 [details]
bug 1301407 - ensure PSM is initialized on the main thread before doing a speculative connect

Fixes a top crasher on Fennec, Aurora51+, Beta50+
Attachment #8793963 - Flags: approval-mozilla-beta?
Attachment #8793963 - Flags: approval-mozilla-beta+
Attachment #8793963 - Flags: approval-mozilla-aurora?
Attachment #8793963 - Flags: approval-mozilla-aurora+
just making sure dkeeler see comment #17
Flags: needinfo?(dkeeler)
(In reply to David Keeler [:keeler] (use needinfo?) from comment #12)
...
> Basically the worst that could happen is this doesn't fix the issue.

:(

On a more serious note, I'm sort-of out of ideas as to how this could still be happening.
Status: RESOLVED → REOPENED
Flags: needinfo?(dkeeler)
Resolution: FIXED → ---
Patrick - I think we/keeler need some necko eyes on this for help in tracking down what could be happening in the stack to even get here. Can you point us to someone who can take a look? Thanks!
Flags: needinfo?(mcmanus)
Could we call net_EnsurePSMInit in nsHttpHandler::Init() ? Or is that too early in the startup process?
Is calling GetSocketProvider("ssl") enough, or do we need to keep a ref to the socket provider around?
Target Milestone: mozilla52 → ---
Comment on attachment 8793963 [details]
bug 1301407 - ensure PSM is initialized on the main thread before doing a speculative connect

(Clearing the flags so it doesn't show up on Sheriff queries)
Attachment #8793963 - Flags: approval-mozilla-beta+
Attachment #8793963 - Flags: approval-mozilla-aurora+
(In reply to Valentin Gosu [:valentin] from comment #21)
> Could we call net_EnsurePSMInit in nsHttpHandler::Init() ? Or is that too
> early in the startup process?
> Is calling GetSocketProvider("ssl") enough, or do we need to keep a ref to
> the socket provider around?

The question above is more for security people :) I will needinfo them.

we can also do it just after socketTransport thread is created (in nsIOService), that is the earliest :)
Flags: needinfo?(jjones)
Flags: needinfo?(dkeeler)
(In reply to Valentin Gosu [:valentin] from comment #21)
> Could we call net_EnsurePSMInit in nsHttpHandler::Init() ? Or is that too
> early in the startup process?

It looks like that could work. The only requirements are that it be on the main thread and that, if a profile directory is available, it happens after the directory service is set up to return that directory if queried for it.

> Is calling GetSocketProvider("ssl") enough, or do we need to keep a ref to
> the socket provider around?

My understanding is that since it's a service, it should stay around in the service manager (which keeps a reference to it). Note that GetSocketProvider("ssl") is a bit indirect - you could call do_GetService(PSM_COMPONENT_CONTRACTID) to be more clear that the purpose is to ensure that PSM has properly initialized.

(In reply to Dragana Damjanovic [:dragana] from comment #23)
> we can also do it just after socketTransport thread is created (in
> nsIOService), that is the earliest :)

Again, as long as that's main-thread-only and the profile directory is available, that should work. Earlier is probably better (if we can simplify the reasoning about when PSM/NSS will be available, that would be nice).
Flags: needinfo?(dkeeler)
This crash is interesting:

https://crash-stats.mozilla.com/report/index/bfd8eed6-fd15-44ba-830d-e87862161022

It crashes in OnMsgSpeculativeConnect(). The only place we call (correctly we dispatch it to the socketThread) this is in SpeculativeConnect
https://hg.mozilla.org/releases/mozilla-beta/annotate/cefddaa8e227/netwerk/protocol/http/nsHttpConnectionMgr.cpp#l455

A couple of lines above this call we call net_EnsurePSMInit()
https://hg.mozilla.org/releases/mozilla-beta/annotate/cefddaa8e227/netwerk/protocol/http/nsHttpConnectionMgr.cpp#l413

So the only way this can happen is if net_EnsurePSMInit fails somewhere or reference to the service is not kept.

We could check whether functions called by net_EnsurePSMInit have succeeded and if not we should crash. In that way we can get the real reason this fails.

Maybe keeler have an idea in which cases it could fail.
This will not resolved the bug but just make it fail earlier if PSM cannot be initialized.

try run is in comment 26.
Attachment #8803725 - Flags: review?(mcmanus)
Attachment #8803725 - Flags: review?(dkeeler)
Of course it must be MOZ_RELEASE_ASSERT.
Attachment #8803725 - Attachment is obsolete: true
Attachment #8803725 - Flags: review?(mcmanus)
Attachment #8803725 - Flags: review?(dkeeler)
Attachment #8803769 - Flags: review?(mcmanus)
Attachment #8803769 - Flags: review?(dkeeler)
test are failing :)
Attachment #8793963 - Attachment is obsolete: true
Attachment #8803769 - Attachment is obsolete: true
Attachment #8803769 - Flags: review?(mcmanus)
Attachment #8803769 - Flags: review?(dkeeler)
Attachment #8803869 - Flags: review?(mcmanus)
Attachment #8803869 - Flags: review?(dkeeler)
Comment on attachment 8803869 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

Review of attachment 8803869 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +414,2 @@
>          // HACK: make sure PSM gets initialized on the main thread.
> +        nsCOMPtr<nsISupports> nss = do_GetService(PSM_COMPONENT_CONTRACTID, &rv);

isn't this logic that net_EnsurePSMInit should do?
 (In reply to Patrick McManus [:mcmanus] from comment #32)
> Comment on attachment 8803869 [details] [diff] [review]
> bug_1301407_ensure_nss_initialized.patch
> 
> Review of attachment 8803869 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
> @@ +414,2 @@
> >          // HACK: make sure PSM gets initialized on the main thread.
> > +        nsCOMPtr<nsISupports> nss = do_GetService(PSM_COMPONENT_CONTRACTID, &rv);
> 
> isn't this logic that net_EnsurePSMInit should do?

It does not do it directly but using GetSocketProvider("ssl"). The reason I am doing it this way is to make sure that nssComponent is created, e.g. to assert on NS_SUCCEEEDED(rv).

SpeculativeConnect function calls net_EnsurePSMInit but we still create nssComponent on socketThread and of course fail assertion. (crash from comment 25)
Flags: needinfo?(jjones)
Comment on attachment 8803869 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

Review of attachment 8803869 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +414,2 @@
>          // HACK: make sure PSM gets initialized on the main thread.
> +        nsCOMPtr<nsISupports> nss = do_GetService(PSM_COMPONENT_CONTRACTID, &rv);

Right - maybe we should just fix net_EnsurePSMInit to directly initialize PSM, check for failure, and raise a release assertion if it fails? (Unfortunately, it won't tell us why PSM is failing to initialize...)
Attachment #8803869 - Flags: review?(dkeeler)
(In reply to David Keeler [:keeler] (use needinfo?) from comment #35)
> Comment on attachment 8803869 [details] [diff] [review]
> bug_1301407_ensure_nss_initialized.patch
> 
> Review of attachment 8803869 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
> @@ +414,2 @@
> >          // HACK: make sure PSM gets initialized on the main thread.
> > +        nsCOMPtr<nsISupports> nss = do_GetService(PSM_COMPONENT_CONTRACTID, &rv);
> 
> Right - maybe we should just fix net_EnsurePSMInit to directly initialize
> PSM, check for failure, and raise a release assertion if it fails?
> (Unfortunately, it won't tell us why PSM is failing to initialize...)

I know, but I should have been more precise what I want to do.
I want to uplift this to beta, maybe next beta this one is late in the cycle. I was hoping to get a crash stack on the main thread, maybe that will tell us more. This is not a solution, this patch should gave us hopefully more info what is happening. This crash is bad on android.
Attachment #8803869 - Attachment is obsolete: true
Attachment #8803869 - Flags: review?(mcmanus)
Attachment #8804167 - Flags: review?(mcmanus)
Attachment #8804167 - Flags: review?(dkeeler)
there is still a possibility that nssComponent get destroyed after creation.
Comment on attachment 8804167 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

Review of attachment 8804167 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/nsNetUtil.cpp
@@ +2126,5 @@
>  
>  void net_EnsurePSMInit()
>  {
> +    nsresult rv;
> +    nsCOMPtr<nsISupports> nss = do_GetService(PSM_COMPONENT_CONTRACTID, &rv);

s/nss/psm ?

@@ -2128,5 @@
> -    nsCOMPtr<nsISocketProviderService> spserv =
> -            do_GetService(NS_SOCKETPROVIDERSERVICE_CONTRACTID);
> -    if (spserv) {
> -        nsCOMPtr<nsISocketProvider> provider;
> -        spserv->GetSocketProvider("ssl", getter_AddRefs(provider));

I didn't do the research to make sure this is made redundant by the new code.. just making sure you did :)
Attachment #8804167 - Flags: review?(mcmanus) → review+
(In reply to Patrick McManus [:mcmanus] from comment #40)
> Comment on attachment 8804167 [details] [diff] [review]
> bug_1301407_ensure_nss_initialized.patch
> 
> Review of attachment 8804167 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/base/nsNetUtil.cpp
> @@ +2126,5 @@
> >  
> >  void net_EnsurePSMInit()
> >  {
> > +    nsresult rv;
> > +    nsCOMPtr<nsISupports> nss = do_GetService(PSM_COMPONENT_CONTRACTID, &rv);
> 
> s/nss/psm ?
> 
> @@ -2128,5 @@
> > -    nsCOMPtr<nsISocketProviderService> spserv =
> > -            do_GetService(NS_SOCKETPROVIDERSERVICE_CONTRACTID);
> > -    if (spserv) {
> > -        nsCOMPtr<nsISocketProvider> provider;
> > -        spserv->GetSocketProvider("ssl", getter_AddRefs(provider));
> 
> I didn't do the research to make sure this is made redundant by the new
> code.. just making sure you did :)

Comment 24 states that and I also look through the code a bit.
Comment on attachment 8804167 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

Review of attachment 8804167 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/nsNetUtil.cpp
@@ -2128,5 @@
> -    nsCOMPtr<nsISocketProviderService> spserv =
> -            do_GetService(NS_SOCKETPROVIDERSERVICE_CONTRACTID);
> -    if (spserv) {
> -        nsCOMPtr<nsISocketProvider> provider;
> -        spserv->GetSocketProvider("ssl", getter_AddRefs(provider));

From my reading, the new code really is just a simplification of this original code (the important part was always that it would cause PSM to be initialized).
Attachment #8804167 - Flags: review?(dkeeler) → review+
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a14a46bd1ea2
Ensure nss initialized during nsHttpHandler initialization. r=mcmanus, r=keeler
Keywords: checkin-needed
Too late for a fix in 49. But we still have time to get a fix into 50.
https://hg.mozilla.org/mozilla-central/rev/a14a46bd1ea2
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
I forgot to set "leave-open".
Probably, this patch will not fix the issue. We do not know what goes wrong and this patch should give us more clues.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I will ask for a uplift to aurora after it runs for a day in nightly.
There is no crashes on nightly but there are some on aurora.
Comment on attachment 8804360 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: Startup crash. This patch will not fix the issue, but it could help us by giving us some more clues. Nightly is not crashing therefore we need uplift to aurora (there are some crashes on aurora, very low volume).
[Describe test coverage new/current, TreeHerder]: This code path is very heavily used.
[Risks and why]: low
[String/UUID change made/needed]: none
Attachment #8804360 - Flags: approval-mozilla-aurora?
Comment on attachment 8804360 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

Take this patch to help investigate the issue. Aurora51+.
Attachment #8804360 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Unsure if this means 51 gets marked as fixed, feel free to correct it if I'm wrong.
Flags: needinfo?(dkeeler)
(In reply to Wes Kocher (:KWierso) from comment #52)
> Unsure if this means 51 gets marked as fixed, feel free to correct it if I'm
> wrong.

This patch should help us find the problem, most probably it will not fix it. I will change to affected.
Flags: needinfo?(dkeeler)
Crash Signature: [@ nsNSSComponent::nsNSSComponent] → [@ nsNSSComponent::nsNSSComponent] [@ net_EnsurePSMInit]
Now we have a bit more clue what is happening.

net_EnsurePSMInit crashes on assert.
For some reason we cannot initialize PSM service.
any idea ^?
Flags: needinfo?(dkeeler)
Well, we could add a release assertion in every place where nsNSSComponent initialization might return a failing error code and see if that tells us which operation is failing. I'll put together a patch.
Flags: needinfo?(dkeeler)
Comment on attachment 8808374 [details] [diff] [review]
1301407-nsNSSComponent-asserts.diff

Review of attachment 8808374 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you for doing this.
Attachment #8808374 - Flags: review?(dd.mozilla) → review+
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/db6a856a9d37
temporarily add assertions during nsNSSComponent::Init to find out what's failing on Android r=dragana
Backout by ihsiao@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/788b95bf4ea3
Backed out changeset db6a856a9d37 for Assertion failures at nsNSSComponent.cpp
Sorry had to back out for Assertion failures like https://treeherder.mozilla.org/logviewer.html#?job_id=38802703&repo=mozilla-inbound#L4854
Flags: needinfo?(dkeeler)
I would remove release assert before this aurora get to beta.

This assert will failed for wrong tls versions(see the back out above), but in that case nsNSSComponent is already created and the original crash will not happened. The original crash was in nsNSSComponent constructor. This assertion will produce even more crashes.
Attachment #8808509 - Flags: review?(dkeeler)
Flags: needinfo?(mcmanus)
Attachment #8808509 - Flags: review?(dkeeler) → review+
Looks like some of those asserts were a little over-zealous. This looks better: https://treeherder.mozilla.org/#/jobs?repo=try&revision=56451399609e8b58fd63e9d0526278bf63b1543c
Attachment #8808374 - Attachment is obsolete: true
Flags: needinfo?(dkeeler)
Attachment #8808820 - Flags: review+
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/425f66fa9b10
temporarily add assertions during nsNSSComponent::Init to find out what's failing on Android r=dragana
Comment on attachment 8808509 [details] [diff] [review]
bug_130147_remove_release_assert.patch

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: I believe that this will cause same assertion crashes. We believed that it was safe to set an assertion at that point in code, but some assertion crashes on Nightly and aurora show that this will happen in cases that we do not want it to happen and that should not crash the browser.
[Describe test coverage new/current, TreeHerder]: It only changes MOZ_RELEASE_ASSERT into MOZ_ASSERT. Current code on beta do not have this assertion, so if we change assertion to MOZ_ASSERT the behavior will be the same as on beta50
[Risks and why]: none
[String/UUID change made/needed]: none
Attachment #8808509 - Flags: approval-mozilla-aurora?
checkin-needed for bug_130147_remove_release_assert.patch
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/166114662d40
Remove release assert for nss initialization. r=keeler
Keywords: checkin-needed
Comment on attachment 8808509 [details] [diff] [review]
bug_130147_remove_release_assert.patch

Fix a crash. Aurora51+.
Attachment #8808509 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I have no good explanation for this, so here's more debugging info.
The bad news is this isn't why this code is failing in 50/51, because this code is only here because of a change made in 52 (see bug 1227638). But maybe once we figure this failure out we can find the underlying failure.
Attachment #8810598 - Flags: review?(dd.mozilla)
Attachment #8810598 - Flags: review?(dd.mozilla) → review+
Crash Signature: [@ nsNSSComponent::nsNSSComponent] [@ net_EnsurePSMInit] → [@ nsNSSComponent::nsNSSComponent] [@ net_EnsurePSMInit] [@ nsNSSComponent::InitializeNSS]
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/68d6f69e0837
add annotated crashes to find out why PSM initialization is failing r=dragana
(In reply to Dragana Damjanovic [:dragana] from comment #70)
> It is crashing at:
> 1861 rv = LoadExtendedValidationInfo();
> 1862 #ifdef ANDROID
> 1863 MOZ_RELEASE_ASSERT(NS_SUCCEEDED(rv));
> 1864 #endif
> https://hg.mozilla.org/mozilla-central/annotate/336759fad462/security/
> manager/ssl/nsNSSComponent.cpp#l1863
> 
> https://crash-stats.mozilla.com/signature/?version=52.
> 0a1&signature=nsNSSComponent%3A%3AInitializeNSS&date=%3E%3D2016-11-
> 04T09%3A40%3A05.000Z&date=%3C2016-11-11T09%3A40%3A05.
> 000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_colum
> ns=platform&_columns=reason&_columns=address&_sort=-date&page=1#reports

Hmmm - crash stats may have been giving us misleading information here. The report does link to that line of source code, but the "crash reason" is "MOZ_RELEASE_ASSERT(init_rv == SECSuccess)", which would point to this line:

https://hg.mozilla.org/mozilla-central/annotate/336759fad462/security/manager/ssl/nsNSSComponent.cpp#l1823

This makes a bit more sense, because there's really nothing in LoadExtendedValidationInfo that would be failing in some configurations but not others (other than out-of-memory errors, but I imagine those wouldn't be consistently failing in the same place).  That said, I'm also not aware of any reason NSS_NoDB_Init would fail, because that should be doing memory-only operations that don't depend on any previous or external state.
(In reply to David Keeler [:keeler] (use needinfo?) from comment #77)
> (In reply to Dragana Damjanovic [:dragana] from comment #70)
> > It is crashing at:
> > 1861 rv = LoadExtendedValidationInfo();
> > 1862 #ifdef ANDROID
> > 1863 MOZ_RELEASE_ASSERT(NS_SUCCEEDED(rv));
> > 1864 #endif
> > https://hg.mozilla.org/mozilla-central/annotate/336759fad462/security/
> > manager/ssl/nsNSSComponent.cpp#l1863
> > 
> > https://crash-stats.mozilla.com/signature/?version=52.
> > 0a1&signature=nsNSSComponent%3A%3AInitializeNSS&date=%3E%3D2016-11-
> > 04T09%3A40%3A05.000Z&date=%3C2016-11-11T09%3A40%3A05.
> > 000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_colum
> > ns=platform&_columns=reason&_columns=address&_sort=-date&page=1#reports
> 
> Hmmm - crash stats may have been giving us misleading information here. The
> report does link to that line of source code, but the "crash reason" is
> "MOZ_RELEASE_ASSERT(init_rv == SECSuccess)", which would point to this line:
> 
> https://hg.mozilla.org/mozilla-central/annotate/336759fad462/security/
> manager/ssl/nsNSSComponent.cpp#l1823
> 

You are right.
We should probably try to narrow this down when we already went so far.
David, can you make a patch?
Flags: needinfo?(dkeeler)
Attached patch 1301407-1_3-backout.diff (obsolete) — Splinter Review
Since NSS_NoDB_Init is NSS code and it's not clear where to start looking for the problem, I thought it might make sense to get some annotated crash reports that tell us what the PR error code from the failure was. With any luck, that will be specific enough to narrow down the search.

At the same time, I'm going to back out some of the now-unnecessary patches with the android-specific release assertions.
Attachment #8808820 - Attachment is obsolete: true
Attachment #8810598 - Attachment is obsolete: true
Flags: needinfo?(dkeeler)
Attachment #8820979 - Flags: review?(dd.mozilla)
Attached patch 1301407-2_3-backout.diff (obsolete) — Splinter Review
Attachment #8820980 - Flags: review?(dd.mozilla)
Attachment #8820979 - Flags: review?(dd.mozilla) → review+
Attachment #8820980 - Flags: review?(dd.mozilla) → review+
Attachment #8820981 - Flags: review?(dd.mozilla) → review+
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f0bd0f0e2b53
Backed out changeset 68d6f69e0837 for not being necessary any longer r=dragana
https://hg.mozilla.org/integration/mozilla-inbound/rev/b9bfeb3cf38a
Backed out changeset 425f66fa9b10 for not being necessary any longer r=dragana
https://hg.mozilla.org/integration/mozilla-inbound/rev/15f2799c3990
capture NSS_NoDB_Init failure error codes in crash reports r=dragana
Target Milestone: mozilla52 → ---
What's next for this bug?
Flags: needinfo?(mcmanus)
Flags: needinfo?(dkeeler)
The 3 reports I can find that are from the most recent changes have the annotation "NSS_NoDB_Init failed with PRErrorCode -5925". -5925 is PR_CALL_ONCE_ERROR. Presumably a PR_CallOnce call failed in an earlier call to NSS_Initialize. Unfortunately the API doesn't save the original error, so we'll have to grab them manually and include them in the annotation if we want to know what it is. That might lead us to what's actually failing.
Flags: needinfo?(dkeeler)
Attachment #8820979 - Attachment is obsolete: true
Attachment #8820980 - Attachment is obsolete: true
Attachment #8830515 - Flags: review?(dd.mozilla)
Flags: needinfo?(mcmanus)
Comment on attachment 8830515 [details] [diff] [review]
1301407-save-prerror.diff

Review of attachment 8830515 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.
Attachment #8830515 - Flags: review?(dd.mozilla) → review+
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/54cecb685bca
save PRErrorCode from all NSS initialization calls and include in annotated crash report r=dragana
Looks like all the recent reports with this signature have been on desktop. Should we maybe uplift the last patch further up the chain to see if we get more useful reports?
Flags: needinfo?(dkeeler)
For some reason I had the impression this was android-only, but it looks like that's not the case. Since I'm not seeing any reports for the most recent patch, perhaps the thing to do would be to take out the #ifdef ANDROID bits and see if we get more information.
Flags: needinfo?(dkeeler)
From one of the Windows crashes, it's crashing in an assert:
 MOZ_RELEASE_ASSERT(NS_IsMainThread());
Flags: needinfo?(dkeeler)
(In reply to Randell Jesup [:jesup] from comment #96)
> From one of the Windows crashes, it's crashing in an assert:
>  MOZ_RELEASE_ASSERT(NS_IsMainThread());

Right - for platforms without the diagnostic crash reporting code, the first (every?) attempt at initialization (on the main thread) seems to fail. It then appears that something off the main thread uses something that causes nsNSSComponent to be initialized, but since it's not the main thread, that assertion fails.

I'm hoping bug 1339267 will fix this, but since that's not quite ready, we might as well enable the diagnostic crash reporting on all platforms and see if it sheds any more light on this.
Flags: needinfo?(dkeeler)
Attachment #8837826 - Flags: review?(dd.mozilla) → review+
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0cd8df20768f
enable nsNSSComponent initialization diagnostic crash report on all platforms r=dragana
Whiteboard: [necko-active] → [necko-active][tbird crash]
Is this bug fixed by bug 1339267, or should we do something in addition?
Flags: needinfo?(dkeeler)
I think so - I'm not seeing any crashes with this signature after when that landed. I'm going to keep looking for a bit, though.
Flags: needinfo?(dkeeler)
Too late for firefox 52, mass-wontfix.
At this point, it's looking like this was fixed by bug 1339267. (The error codes from the few reports received after this landed correspond to SEC_ERROR_LEGACY_DATABASE and SEC_ERROR_PKCS11_DEVICE_ERROR. If a user encounters these at this point, really the only thing they can do is start in safe mode or try with a new profile.)
Status: REOPENED → RESOLVED
Closed: 8 years ago7 years ago
Resolution: --- → DUPLICATE
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: