Crash in nsNSSComponent::nsNSSComponent

RESOLVED DUPLICATE of bug 1339267

Status

()

Core
Networking
--
critical
RESOLVED DUPLICATE of bug 1339267
a year ago
8 months ago

People

(Reporter: marcia, Assigned: keeler)

Tracking

({crash, leave-open, regression})

49 Branch
Unspecified
Android
crash, leave-open, regression
Points:
---

Firefox Tracking Flags

(firefox49- wontfix, firefox50+ wontfix, firefox51 wontfix, firefox52 wontfix, firefox53 wontfix, firefox54 wontfix, firefox55 fixed)

Details

(Whiteboard: [necko-active][tbird crash], crash signature)

Attachments

(5 attachments, 10 obsolete attachments)

2.12 KB, patch
dragana
: review+
Details | Diff | Splinter Review
855 bytes, patch
keeler
: review+
Details | Diff | Splinter Review
2.57 KB, patch
dragana
: review+
Details | Diff | Splinter Review
2.62 KB, patch
dragana
: review+
Details | Diff | Splinter Review
4.73 KB, patch
dragana
: review+
Details | Diff | Splinter Review
(Reporter)

Description

a year ago
This bug was filed from the Socorro interface and is 
report bp-3d2b8867-3bba-488b-ae04-5ae892160908.
=============================================================

Seen while looking at Fennec beta 11 crashes: http://bit.ly/2cwb43x

New signature that is currently low volume in RC but didn't appear in released versions. Also present in very small volume on Aurora and Nightly.

Not sure if this should be bucketed in the NSS component.
This is coming from bug 1273475.
Flags: needinfo?(jjones)
Flags: needinfo?(dkeeler)
It looks like we're doing a speculative connect before PSM/NSS has been initialized on the main thread:

> 16 libxul.so mozilla::net::nsHttpConnectionMgr::OnMsgSpeculativeConnect netwerk/protocol/http/nsHttpConnectionMgr.cpp:2923

I tried to figure out how this could be happening, but all the code paths I'm finding have already initialized PSM/NSS on the main thread. Maybe Patrick has an idea?
Flags: needinfo?(dkeeler) → needinfo?(mcmanus)

Updated

a year ago
Flags: needinfo?(jjones)
I don't really know what path, but you can probably do this and I'd r+ it

   if (!IsNeckoChild()) {
        // HACK: make sure PSM gets initialized on the main thread.
        net_EnsurePSMInit();
    }
in

nsHttpConnectionMgr::SpeculativeConnect() (in nsHttpConnectionMgr.cpp)
Flags: needinfo?(mcmanus)
Comment hidden (mozreview-request)
(Assignee)

Updated

a year ago
Assignee: nobody → dkeeler
Component: Security → Networking
Whiteboard: [necko-active]

Updated

a year ago
status-firefox52: --- → affected
Keywords: regression

Comment 5

a year ago
[Tracking Requested - why for this release]:
this is making up 1.7% of crashes on android release right now and almost all of them crashes on startup.

@patrick - could you also explicitly r+ this change?
tracking-firefox49: --- → ?
tracking-firefox50: --- → ?
Flags: needinfo?(mcmanus)

Comment 6

a year ago
mozreview-review
Comment on attachment 8793963 [details]
bug 1301407 - ensure PSM is initialized on the main thread before doing a speculative connect

https://reviewboard.mozilla.org/r/80544/#review82244
Attachment #8793963 - Flags: review?(mcmanus) → review+
I'm sorry about the review delay. I already reviewed this but I must not have succeeded in publishing it and it fell off the radar. thanks for the ni
Flags: needinfo?(mcmanus)

Comment 8

a year ago
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8450da83e7bc
ensure PSM is initialized on the main thread before doing a speculative connect r=mcmanus
Thanks! (shakes fist at review board...)

Comment 10

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/8450da83e7bc
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox52: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52

Comment 11

a year ago
can we uplift the fix in terms of scope & risk? (as noted in comment #5, it's a top crasher on android)
Flags: needinfo?(dkeeler)
Comment on attachment 8793963 [details]
bug 1301407 - ensure PSM is initialized on the main thread before doing a speculative connect

Approval Request Comment
[Feature/regressing bug #]: speculative connect causing PSM initialization off the main thread
[User impact if declined]: startup crashes
[Describe test coverage new/current, TreeHerder]: the related features have tests, but nothing covers this specific case
[Risks and why]: low - the fix is small and takes existing behavior and applies it to the new case of speculative connect. Basically the worst that could happen is this doesn't fix the issue.
[String/UUID change made/needed]: none
Flags: needinfo?(dkeeler)
Attachment #8793963 - Flags: approval-mozilla-beta?
Attachment #8793963 - Flags: approval-mozilla-aurora?
Tracked as it's a top crasher.
tracking-firefox50: ? → +
Comment on attachment 8793963 [details]
bug 1301407 - ensure PSM is initialized on the main thread before doing a speculative connect

Fixes a top crasher on Fennec, Aurora51+, Beta50+
Attachment #8793963 - Flags: approval-mozilla-beta?
Attachment #8793963 - Flags: approval-mozilla-beta+
Attachment #8793963 - Flags: approval-mozilla-aurora?
Attachment #8793963 - Flags: approval-mozilla-aurora+

Comment 15

a year ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/3e11e42465cc
status-firefox51: affected → fixed

Comment 16

a year ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/6b1a3f769aa6
status-firefox50: affected → fixed

Comment 17

a year ago
hi, the patch has landed in 50.0b6 but crashes with this signature are continuing:
https://crash-stats.mozilla.com/signature/?product=Firefox&product=FennecAndroid&process_type=browser&version=50.0b6&signature=nsNSSComponent%3A%3AnsNSSComponent#reports

should we reopen the bug?
status-firefox50: fixed → affected
status-firefox51: fixed → affected
status-firefox52: fixed → affected
just making sure dkeeler see comment #17
Flags: needinfo?(dkeeler)
(In reply to David Keeler [:keeler] (use needinfo?) from comment #12)
...
> Basically the worst that could happen is this doesn't fix the issue.

:(

On a more serious note, I'm sort-of out of ideas as to how this could still be happening.
Status: RESOLVED → REOPENED
Flags: needinfo?(dkeeler)
Resolution: FIXED → ---

Comment 20

a year ago
Patrick - I think we/keeler need some necko eyes on this for help in tracking down what could be happening in the stack to even get here. Can you point us to someone who can take a look? Thanks!
Flags: needinfo?(mcmanus)
Could we call net_EnsurePSMInit in nsHttpHandler::Init() ? Or is that too early in the startup process?
Is calling GetSocketProvider("ssl") enough, or do we need to keep a ref to the socket provider around?
Target Milestone: mozilla52 → ---
Comment on attachment 8793963 [details]
bug 1301407 - ensure PSM is initialized on the main thread before doing a speculative connect

(Clearing the flags so it doesn't show up on Sheriff queries)
Attachment #8793963 - Flags: approval-mozilla-beta+
Attachment #8793963 - Flags: approval-mozilla-aurora+
(In reply to Valentin Gosu [:valentin] from comment #21)
> Could we call net_EnsurePSMInit in nsHttpHandler::Init() ? Or is that too
> early in the startup process?
> Is calling GetSocketProvider("ssl") enough, or do we need to keep a ref to
> the socket provider around?

The question above is more for security people :) I will needinfo them.

we can also do it just after socketTransport thread is created (in nsIOService), that is the earliest :)
Flags: needinfo?(jjones)
Flags: needinfo?(dkeeler)
(In reply to Valentin Gosu [:valentin] from comment #21)
> Could we call net_EnsurePSMInit in nsHttpHandler::Init() ? Or is that too
> early in the startup process?

It looks like that could work. The only requirements are that it be on the main thread and that, if a profile directory is available, it happens after the directory service is set up to return that directory if queried for it.

> Is calling GetSocketProvider("ssl") enough, or do we need to keep a ref to
> the socket provider around?

My understanding is that since it's a service, it should stay around in the service manager (which keeps a reference to it). Note that GetSocketProvider("ssl") is a bit indirect - you could call do_GetService(PSM_COMPONENT_CONTRACTID) to be more clear that the purpose is to ensure that PSM has properly initialized.

(In reply to Dragana Damjanovic [:dragana] from comment #23)
> we can also do it just after socketTransport thread is created (in
> nsIOService), that is the earliest :)

Again, as long as that's main-thread-only and the profile directory is available, that should work. Earlier is probably better (if we can simplify the reasoning about when PSM/NSS will be available, that would be nice).
Flags: needinfo?(dkeeler)
This crash is interesting:

https://crash-stats.mozilla.com/report/index/bfd8eed6-fd15-44ba-830d-e87862161022

It crashes in OnMsgSpeculativeConnect(). The only place we call (correctly we dispatch it to the socketThread) this is in SpeculativeConnect
https://hg.mozilla.org/releases/mozilla-beta/annotate/cefddaa8e227/netwerk/protocol/http/nsHttpConnectionMgr.cpp#l455

A couple of lines above this call we call net_EnsurePSMInit()
https://hg.mozilla.org/releases/mozilla-beta/annotate/cefddaa8e227/netwerk/protocol/http/nsHttpConnectionMgr.cpp#l413

So the only way this can happen is if net_EnsurePSMInit fails somewhere or reference to the service is not kept.

We could check whether functions called by net_EnsurePSMInit have succeeded and if not we should crash. In that way we can get the real reason this fails.

Maybe keeler have an idea in which cases it could fail.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c2c1b51f6025
Created attachment 8803725 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

This will not resolved the bug but just make it fail earlier if PSM cannot be initialized.

try run is in comment 26.
Attachment #8803725 - Flags: review?(mcmanus)
Attachment #8803725 - Flags: review?(dkeeler)
Created attachment 8803769 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

Of course it must be MOZ_RELEASE_ASSERT.
Attachment #8803725 - Attachment is obsolete: true
Attachment #8803725 - Flags: review?(mcmanus)
Attachment #8803725 - Flags: review?(dkeeler)
Attachment #8803769 - Flags: review?(mcmanus)
Attachment #8803769 - Flags: review?(dkeeler)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2ae45aeb3fb0
test are failing :)
Created attachment 8803869 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch
Attachment #8793963 - Attachment is obsolete: true
Attachment #8803769 - Attachment is obsolete: true
Attachment #8803769 - Flags: review?(mcmanus)
Attachment #8803769 - Flags: review?(dkeeler)
Attachment #8803869 - Flags: review?(mcmanus)
Attachment #8803869 - Flags: review?(dkeeler)
Comment on attachment 8803869 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

Review of attachment 8803869 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +414,2 @@
>          // HACK: make sure PSM gets initialized on the main thread.
> +        nsCOMPtr<nsISupports> nss = do_GetService(PSM_COMPONENT_CONTRACTID, &rv);

isn't this logic that net_EnsurePSMInit should do?
 (In reply to Patrick McManus [:mcmanus] from comment #32)
> Comment on attachment 8803869 [details] [diff] [review]
> bug_1301407_ensure_nss_initialized.patch
> 
> Review of attachment 8803869 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
> @@ +414,2 @@
> >          // HACK: make sure PSM gets initialized on the main thread.
> > +        nsCOMPtr<nsISupports> nss = do_GetService(PSM_COMPONENT_CONTRACTID, &rv);
> 
> isn't this logic that net_EnsurePSMInit should do?

It does not do it directly but using GetSocketProvider("ssl"). The reason I am doing it this way is to make sure that nssComponent is created, e.g. to assert on NS_SUCCEEEDED(rv).

SpeculativeConnect function calls net_EnsurePSMInit but we still create nssComponent on socketThread and of course fail assertion. (crash from comment 25)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4bb430ed92ff

Updated

a year ago
Flags: needinfo?(jjones)
Comment on attachment 8803869 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

Review of attachment 8803869 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +414,2 @@
>          // HACK: make sure PSM gets initialized on the main thread.
> +        nsCOMPtr<nsISupports> nss = do_GetService(PSM_COMPONENT_CONTRACTID, &rv);

Right - maybe we should just fix net_EnsurePSMInit to directly initialize PSM, check for failure, and raise a release assertion if it fails? (Unfortunately, it won't tell us why PSM is failing to initialize...)
Attachment #8803869 - Flags: review?(dkeeler)
(In reply to David Keeler [:keeler] (use needinfo?) from comment #35)
> Comment on attachment 8803869 [details] [diff] [review]
> bug_1301407_ensure_nss_initialized.patch
> 
> Review of attachment 8803869 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
> @@ +414,2 @@
> >          // HACK: make sure PSM gets initialized on the main thread.
> > +        nsCOMPtr<nsISupports> nss = do_GetService(PSM_COMPONENT_CONTRACTID, &rv);
> 
> Right - maybe we should just fix net_EnsurePSMInit to directly initialize
> PSM, check for failure, and raise a release assertion if it fails?
> (Unfortunately, it won't tell us why PSM is failing to initialize...)

I know, but I should have been more precise what I want to do.
I want to uplift this to beta, maybe next beta this one is late in the cycle. I was hoping to get a crash stack on the main thread, maybe that will tell us more. This is not a solution, this patch should gave us hopefully more info what is happening. This crash is bad on android.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=66cc17067bdd
Created attachment 8804167 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch
Attachment #8803869 - Attachment is obsolete: true
Attachment #8803869 - Flags: review?(mcmanus)
Attachment #8804167 - Flags: review?(mcmanus)
Attachment #8804167 - Flags: review?(dkeeler)
there is still a possibility that nssComponent get destroyed after creation.
Comment on attachment 8804167 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

Review of attachment 8804167 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/nsNetUtil.cpp
@@ +2126,5 @@
>  
>  void net_EnsurePSMInit()
>  {
> +    nsresult rv;
> +    nsCOMPtr<nsISupports> nss = do_GetService(PSM_COMPONENT_CONTRACTID, &rv);

s/nss/psm ?

@@ -2128,5 @@
> -    nsCOMPtr<nsISocketProviderService> spserv =
> -            do_GetService(NS_SOCKETPROVIDERSERVICE_CONTRACTID);
> -    if (spserv) {
> -        nsCOMPtr<nsISocketProvider> provider;
> -        spserv->GetSocketProvider("ssl", getter_AddRefs(provider));

I didn't do the research to make sure this is made redundant by the new code.. just making sure you did :)
Attachment #8804167 - Flags: review?(mcmanus) → review+
(In reply to Patrick McManus [:mcmanus] from comment #40)
> Comment on attachment 8804167 [details] [diff] [review]
> bug_1301407_ensure_nss_initialized.patch
> 
> Review of attachment 8804167 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/base/nsNetUtil.cpp
> @@ +2126,5 @@
> >  
> >  void net_EnsurePSMInit()
> >  {
> > +    nsresult rv;
> > +    nsCOMPtr<nsISupports> nss = do_GetService(PSM_COMPONENT_CONTRACTID, &rv);
> 
> s/nss/psm ?
> 
> @@ -2128,5 @@
> > -    nsCOMPtr<nsISocketProviderService> spserv =
> > -            do_GetService(NS_SOCKETPROVIDERSERVICE_CONTRACTID);
> > -    if (spserv) {
> > -        nsCOMPtr<nsISocketProvider> provider;
> > -        spserv->GetSocketProvider("ssl", getter_AddRefs(provider));
> 
> I didn't do the research to make sure this is made redundant by the new
> code.. just making sure you did :)

Comment 24 states that and I also look through the code a bit.
Comment on attachment 8804167 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

Review of attachment 8804167 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/nsNetUtil.cpp
@@ -2128,5 @@
> -    nsCOMPtr<nsISocketProviderService> spserv =
> -            do_GetService(NS_SOCKETPROVIDERSERVICE_CONTRACTID);
> -    if (spserv) {
> -        nsCOMPtr<nsISocketProvider> provider;
> -        spserv->GetSocketProvider("ssl", getter_AddRefs(provider));

From my reading, the new code really is just a simplification of this original code (the important part was always that it would cause PSM to be initialized).
Attachment #8804167 - Flags: review?(dkeeler) → review+
Created attachment 8804360 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch
Attachment #8804167 - Attachment is obsolete: true
Attachment #8804360 - Flags: review+
Keywords: checkin-needed

Comment 44

a year ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a14a46bd1ea2
Ensure nss initialized during nsHttpHandler initialization. r=mcmanus, r=keeler
Keywords: checkin-needed
Too late for a fix in 49. But we still have time to get a fix into 50.
status-firefox49: affected → wontfix
tracking-firefox49: ? → -

Comment 46

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a14a46bd1ea2
Status: REOPENED → RESOLVED
Last Resolved: a year agoa year ago
status-firefox52: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
I forgot to set "leave-open".
Probably, this patch will not fix the issue. We do not know what goes wrong and this patch should give us more clues.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I will ask for a uplift to aurora after it runs for a day in nightly.
There is no crashes on nightly but there are some on aurora.
Comment on attachment 8804360 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: Startup crash. This patch will not fix the issue, but it could help us by giving us some more clues. Nightly is not crashing therefore we need uplift to aurora (there are some crashes on aurora, very low volume).
[Describe test coverage new/current, TreeHerder]: This code path is very heavily used.
[Risks and why]: low
[String/UUID change made/needed]: none
Attachment #8804360 - Flags: approval-mozilla-aurora?
Comment on attachment 8804360 [details] [diff] [review]
bug_1301407_ensure_nss_initialized.patch

Take this patch to help investigate the issue. Aurora51+.
Attachment #8804360 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Comment 51

a year ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/60538c8091f1
status-firefox51: affected → fixed
Unsure if this means 51 gets marked as fixed, feel free to correct it if I'm wrong.
Flags: needinfo?(dkeeler)
(In reply to Wes Kocher (:KWierso) from comment #52)
> Unsure if this means 51 gets marked as fixed, feel free to correct it if I'm
> wrong.

This patch should help us find the problem, most probably it will not fix it. I will change to affected.
Flags: needinfo?(dkeeler)
status-firefox51: fixed → affected
status-firefox52: fixed → affected
Crash Signature: [@ nsNSSComponent::nsNSSComponent] → [@ nsNSSComponent::nsNSSComponent] [@ net_EnsurePSMInit]
Now we have a bit more clue what is happening.

net_EnsurePSMInit crashes on assert.
For some reason we cannot initialize PSM service.
any idea ^?
Flags: needinfo?(dkeeler)
Well, we could add a release assertion in every place where nsNSSComponent initialization might return a failing error code and see if that tells us which operation is failing. I'll put together a patch.
Flags: needinfo?(dkeeler)
Created attachment 8808374 [details] [diff] [review]
1301407-nsNSSComponent-asserts.diff
Attachment #8808374 - Flags: review?(dd.mozilla)
Comment on attachment 8808374 [details] [diff] [review]
1301407-nsNSSComponent-asserts.diff

Review of attachment 8808374 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you for doing this.
Attachment #8808374 - Flags: review?(dd.mozilla) → review+
(Assignee)

Updated

a year ago
Keywords: leave-open

Comment 59

a year ago
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/db6a856a9d37
temporarily add assertions during nsNSSComponent::Init to find out what's failing on Android r=dragana

Comment 60

a year ago
Backout by ihsiao@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/788b95bf4ea3
Backed out changeset db6a856a9d37 for Assertion failures at nsNSSComponent.cpp
Sorry had to back out for Assertion failures like https://treeherder.mozilla.org/logviewer.html#?job_id=38802703&repo=mozilla-inbound#L4854
Flags: needinfo?(dkeeler)
Created attachment 8808509 [details] [diff] [review]
bug_130147_remove_release_assert.patch

I would remove release assert before this aurora get to beta.

This assert will failed for wrong tls versions(see the back out above), but in that case nsNSSComponent is already created and the original crash will not happened. The original crash was in nsNSSComponent constructor. This assertion will produce even more crashes.
Attachment #8808509 - Flags: review?(dkeeler)
Flags: needinfo?(mcmanus)
(Assignee)

Updated

a year ago
Attachment #8808509 - Flags: review?(dkeeler) → review+
Created attachment 8808820 [details] [diff] [review]
1301407-nsNSSComponent-asserts.diff v2

Looks like some of those asserts were a little over-zealous. This looks better: https://treeherder.mozilla.org/#/jobs?repo=try&revision=56451399609e8b58fd63e9d0526278bf63b1543c
Attachment #8808374 - Attachment is obsolete: true
Flags: needinfo?(dkeeler)
Attachment #8808820 - Flags: review+

Comment 64

a year ago
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/425f66fa9b10
temporarily add assertions during nsNSSComponent::Init to find out what's failing on Android r=dragana
Comment on attachment 8808509 [details] [diff] [review]
bug_130147_remove_release_assert.patch

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: I believe that this will cause same assertion crashes. We believed that it was safe to set an assertion at that point in code, but some assertion crashes on Nightly and aurora show that this will happen in cases that we do not want it to happen and that should not crash the browser.
[Describe test coverage new/current, TreeHerder]: It only changes MOZ_RELEASE_ASSERT into MOZ_ASSERT. Current code on beta do not have this assertion, so if we change assertion to MOZ_ASSERT the behavior will be the same as on beta50
[Risks and why]: none
[String/UUID change made/needed]: none
Attachment #8808509 - Flags: approval-mozilla-aurora?
checkin-needed for bug_130147_remove_release_assert.patch
Keywords: checkin-needed

Comment 67

a year ago
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/166114662d40
Remove release assert for nss initialization. r=keeler
Keywords: checkin-needed

Comment 68

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/425f66fa9b10

Comment 69

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/166114662d40
It is crashing at:
1861 rv = LoadExtendedValidationInfo();
1862 #ifdef ANDROID
1863 MOZ_RELEASE_ASSERT(NS_SUCCEEDED(rv));
1864 #endif
https://hg.mozilla.org/mozilla-central/annotate/336759fad462/security/manager/ssl/nsNSSComponent.cpp#l1863

https://crash-stats.mozilla.com/signature/?version=52.0a1&signature=nsNSSComponent%3A%3AInitializeNSS&date=%3E%3D2016-11-04T09%3A40%3A05.000Z&date=%3C2016-11-11T09%3A40%3A05.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#reports
Comment on attachment 8808509 [details] [diff] [review]
bug_130147_remove_release_assert.patch

Fix a crash. Aurora51+.
Attachment #8808509 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Comment 72

a year ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/6c8d0b43d402
status-firefox51: affected → fixed
Created attachment 8810598 [details] [diff] [review]
annotate LoadExtendedValidationInfo failures

I have no good explanation for this, so here's more debugging info.
The bad news is this isn't why this code is failing in 50/51, because this code is only here because of a change made in 52 (see bug 1227638). But maybe once we figure this failure out we can find the underlying failure.
Attachment #8810598 - Flags: review?(dd.mozilla)
Attachment #8810598 - Flags: review?(dd.mozilla) → review+
Crash Signature: [@ nsNSSComponent::nsNSSComponent] [@ net_EnsurePSMInit] → [@ nsNSSComponent::nsNSSComponent] [@ net_EnsurePSMInit] [@ nsNSSComponent::InitializeNSS]

Comment 74

a year ago
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/68d6f69e0837
add annotated crashes to find out why PSM initialization is failing r=dragana
Thanks! We'll see how this goes.
(This was the try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=480e52ffa4ecd8797b6f6e70df7848c53475092a )

Comment 76

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/68d6f69e0837
(In reply to Dragana Damjanovic [:dragana] from comment #70)
> It is crashing at:
> 1861 rv = LoadExtendedValidationInfo();
> 1862 #ifdef ANDROID
> 1863 MOZ_RELEASE_ASSERT(NS_SUCCEEDED(rv));
> 1864 #endif
> https://hg.mozilla.org/mozilla-central/annotate/336759fad462/security/
> manager/ssl/nsNSSComponent.cpp#l1863
> 
> https://crash-stats.mozilla.com/signature/?version=52.
> 0a1&signature=nsNSSComponent%3A%3AInitializeNSS&date=%3E%3D2016-11-
> 04T09%3A40%3A05.000Z&date=%3C2016-11-11T09%3A40%3A05.
> 000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_colum
> ns=platform&_columns=reason&_columns=address&_sort=-date&page=1#reports

Hmmm - crash stats may have been giving us misleading information here. The report does link to that line of source code, but the "crash reason" is "MOZ_RELEASE_ASSERT(init_rv == SECSuccess)", which would point to this line:

https://hg.mozilla.org/mozilla-central/annotate/336759fad462/security/manager/ssl/nsNSSComponent.cpp#l1823

This makes a bit more sense, because there's really nothing in LoadExtendedValidationInfo that would be failing in some configurations but not others (other than out-of-memory errors, but I imagine those wouldn't be consistently failing in the same place).  That said, I'm also not aware of any reason NSS_NoDB_Init would fail, because that should be doing memory-only operations that don't depend on any previous or external state.
Too late to fix in 50.1.0 release
status-firefox50: affected → wontfix
(In reply to David Keeler [:keeler] (use needinfo?) from comment #77)
> (In reply to Dragana Damjanovic [:dragana] from comment #70)
> > It is crashing at:
> > 1861 rv = LoadExtendedValidationInfo();
> > 1862 #ifdef ANDROID
> > 1863 MOZ_RELEASE_ASSERT(NS_SUCCEEDED(rv));
> > 1864 #endif
> > https://hg.mozilla.org/mozilla-central/annotate/336759fad462/security/
> > manager/ssl/nsNSSComponent.cpp#l1863
> > 
> > https://crash-stats.mozilla.com/signature/?version=52.
> > 0a1&signature=nsNSSComponent%3A%3AInitializeNSS&date=%3E%3D2016-11-
> > 04T09%3A40%3A05.000Z&date=%3C2016-11-11T09%3A40%3A05.
> > 000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_colum
> > ns=platform&_columns=reason&_columns=address&_sort=-date&page=1#reports
> 
> Hmmm - crash stats may have been giving us misleading information here. The
> report does link to that line of source code, but the "crash reason" is
> "MOZ_RELEASE_ASSERT(init_rv == SECSuccess)", which would point to this line:
> 
> https://hg.mozilla.org/mozilla-central/annotate/336759fad462/security/
> manager/ssl/nsNSSComponent.cpp#l1823
> 

You are right.
We should probably try to narrow this down when we already went so far.
David, can you make a patch?
Flags: needinfo?(dkeeler)
Created attachment 8820979 [details] [diff] [review]
1301407-1_3-backout.diff

Since NSS_NoDB_Init is NSS code and it's not clear where to start looking for the problem, I thought it might make sense to get some annotated crash reports that tell us what the PR error code from the failure was. With any luck, that will be specific enough to narrow down the search.

At the same time, I'm going to back out some of the now-unnecessary patches with the android-specific release assertions.
Attachment #8808820 - Attachment is obsolete: true
Attachment #8810598 - Attachment is obsolete: true
Flags: needinfo?(dkeeler)
Attachment #8820979 - Flags: review?(dd.mozilla)
Created attachment 8820980 [details] [diff] [review]
1301407-2_3-backout.diff
Attachment #8820980 - Flags: review?(dd.mozilla)
Created attachment 8820981 [details] [diff] [review]
1301407-3_3-annotate-NSS_NoDB_Init.diff
Attachment #8820981 - Flags: review?(dd.mozilla)
Attachment #8820979 - Flags: review?(dd.mozilla) → review+
Attachment #8820980 - Flags: review?(dd.mozilla) → review+
Attachment #8820981 - Flags: review?(dd.mozilla) → review+
Thanks!

Comment 84

a year ago
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f0bd0f0e2b53
Backed out changeset 68d6f69e0837 for not being necessary any longer r=dragana
https://hg.mozilla.org/integration/mozilla-inbound/rev/b9bfeb3cf38a
Backed out changeset 425f66fa9b10 for not being necessary any longer r=dragana
https://hg.mozilla.org/integration/mozilla-inbound/rev/15f2799c3990
capture NSS_NoDB_Init failure error codes in crash reports r=dragana
Thanks!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cdd258a8782518e09fd59c271e64f3bff91ae95b

Comment 86

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/15f2799c3990
(The backouts merged to m-c, too.)
https://hg.mozilla.org/mozilla-central/rev/f0bd0f0e2b53
https://hg.mozilla.org/mozilla-central/rev/b9bfeb3cf38a
status-firefox53: --- → affected
(Reporter)

Updated

11 months ago
status-firefox54: --- → affected
status-firefox51: fixed → wontfix
Target Milestone: mozilla52 → ---
What's next for this bug?
Flags: needinfo?(mcmanus)
Flags: needinfo?(dkeeler)
(Assignee)

Comment 89

11 months ago
The 3 reports I can find that are from the most recent changes have the annotation "NSS_NoDB_Init failed with PRErrorCode -5925". -5925 is PR_CALL_ONCE_ERROR. Presumably a PR_CallOnce call failed in an earlier call to NSS_Initialize. Unfortunately the API doesn't save the original error, so we'll have to grab them manually and include them in the annotation if we want to know what it is. That might lead us to what's actually failing.
Flags: needinfo?(dkeeler)
(Assignee)

Comment 90

11 months ago
Created attachment 8830515 [details] [diff] [review]
1301407-save-prerror.diff
Attachment #8820979 - Attachment is obsolete: true
Attachment #8820980 - Attachment is obsolete: true
Attachment #8830515 - Flags: review?(dd.mozilla)
Flags: needinfo?(mcmanus)
Comment on attachment 8830515 [details] [diff] [review]
1301407-save-prerror.diff

Review of attachment 8830515 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.
Attachment #8830515 - Flags: review?(dd.mozilla) → review+

Comment 92

11 months ago
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/54cecb685bca
save PRErrorCode from all NSS initialization calls and include in annotated crash report r=dragana

Comment 93

11 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/54cecb685bca
Looks like all the recent reports with this signature have been on desktop. Should we maybe uplift the last patch further up the chain to see if we get more useful reports?
Flags: needinfo?(dkeeler)
(Assignee)

Comment 95

10 months ago
For some reason I had the impression this was android-only, but it looks like that's not the case. Since I'm not seeing any reports for the most recent patch, perhaps the thing to do would be to take out the #ifdef ANDROID bits and see if we get more information.
Flags: needinfo?(dkeeler)

Comment 96

10 months ago
From one of the Windows crashes, it's crashing in an assert:
 MOZ_RELEASE_ASSERT(NS_IsMainThread());
Flags: needinfo?(dkeeler)
(Assignee)

Comment 97

10 months ago
(In reply to Randell Jesup [:jesup] from comment #96)
> From one of the Windows crashes, it's crashing in an assert:
>  MOZ_RELEASE_ASSERT(NS_IsMainThread());

Right - for platforms without the diagnostic crash reporting code, the first (every?) attempt at initialization (on the main thread) seems to fail. It then appears that something off the main thread uses something that causes nsNSSComponent to be initialized, but since it's not the main thread, that assertion fails.

I'm hoping bug 1339267 will fix this, but since that's not quite ready, we might as well enable the diagnostic crash reporting on all platforms and see if it sheds any more light on this.
Flags: needinfo?(dkeeler)
(Assignee)

Comment 98

10 months ago
Created attachment 8837826 [details] [diff] [review]
1301407-enable-on-all-platforms.diff
Attachment #8837826 - Flags: review?(dd.mozilla)
Attachment #8837826 - Flags: review?(dd.mozilla) → review+

Comment 99

10 months ago
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0cd8df20768f
enable nsNSSComponent initialization diagnostic crash report on all platforms r=dragana

Comment 100

10 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/0cd8df20768f

Updated

10 months ago
Whiteboard: [necko-active] → [necko-active][tbird crash]
Is this bug fixed by bug 1339267, or should we do something in addition?
Flags: needinfo?(dkeeler)
I think so - I'm not seeing any crashes with this signature after when that landed. I'm going to keep looking for a bit, though.
Flags: needinfo?(dkeeler)
Too late for firefox 52, mass-wontfix.
status-firefox52: affected → wontfix
At this point, it's looking like this was fixed by bug 1339267. (The error codes from the few reports received after this landed correspond to SEC_ERROR_LEGACY_DATABASE and SEC_ERROR_PKCS11_DEVICE_ERROR. If a user encounters these at this point, really the only thing they can do is start in safe mode or try with a new profile.)
Status: REOPENED → RESOLVED
Last Resolved: a year ago8 months ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1339267
status-firefox53: affected → wontfix
status-firefox54: affected → wontfix
status-firefox55: --- → fixed
You need to log in before you can comment on or make changes to this bug.