Closed Bug 1650654 Opened 4 years ago Closed 3 years ago

Crash in [@ PK11_TraverseCertsInSlot]

Categories

(Core :: Security: PSM, defect, P1)

79 Branch
All
Unspecified
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox78 --- unaffected
firefox79 + disabled
firefox80 + disabled
firefox81 --- disabled
firefox82 --- disabled

People

(Reporter: philipp, Assigned: keeler)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 obsolete file)

This bug is for crash report bp-802f5a47-ae5b-445f-ba81-295a30200705.

Top 10 frames of crashing thread:

0 nss3.dll PK11_TraverseCertsInSlot security/nss/lib/pk11wrap/pk11cert.c:2353
1 nss3.dll PK11_ListCertsInSlot security/nss/lib/pk11wrap/pk11cert.c:2893
2 xul.dll IntermediatePreloadingHealerCallback security/manager/ssl/nsNSSComponent.cpp:2186
3 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:562
4 xul.dll nsTimerEvent::Run xpcom/threads/TimerThread.cpp:251
5 xul.dll mozilla::TaskQueue::Runner::Run xpcom/threads/TaskQueue.cpp:158
6 xul.dll nsThreadPool::Run xpcom/threads/nsThreadPool.cpp:299
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1234
8 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:504
9 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:332

this crash signature is starting to show up in the 79.0b cycle and looks related to the changes from bug 1630434.

Crash volume looks a bit concerning.

Flags: needinfo?(dkeeler)

Ryan, it looks like our best next step here would be to land a diagnostic assert directly into beta's copy of NSS, and pull it out after we get some more data. Such an edit-to-the-RTM-tagged-NSS move wouldn't work for Linux, but it appears to be pretty much only Windows, where that kind of trickery works.

Assignee: nobody → dkeeler
Status: NEW → ASSIGNED
Severity: -- → S3
Flags: needinfo?(dkeeler)
Keywords: leave-open
Priority: -- → P1

Comment on attachment 9162009 [details]
Bug 1650654 - diagnostic assertions to narrow down where PK11_TraverseCertsInSlot is crashing r?jcj

Beta/Release Uplift Approval Request

  • User impact if declined: we're using this to (hopefully) narrow down where/why PK11_TraverseCertsInSlot is crashing when we try to run the intermediate preloading healer
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): this should be low-risk - the behavior we're changing is how/where we crash if NSS would dereference a null pointer (so, this will crash if and only if it would already crash)
    also, we're going to back this out before release
  • String changes made/needed:
Attachment #9162009 - Flags: approval-mozilla-beta?

Comment on attachment 9162009 [details]
Bug 1650654 - diagnostic assertions to narrow down where PK11_TraverseCertsInSlot is crashing r?jcj

Diagnostic asserts to hopefully track down the cause of this crash. Approved for 79.0b5.

Attachment #9162009 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9162009 [details]
Bug 1650654 - diagnostic assertions to narrow down where PK11_TraverseCertsInSlot is crashing r?jcj

Uplifted to Beta. Clearing the approval flag to get it off the needs-uplift radar.
https://hg.mozilla.org/releases/mozilla-beta/rev/68fb048ce9ee

Attachment #9162009 - Flags: approval-mozilla-beta+

who knew that kernel32 wasn't around?

I'm debugging this in the phabricator review... pretty much debug-by-trypush, nothing elegent.

Comment on attachment 9162009 [details]
Bug 1650654 - diagnostic assertions to narrow down where PK11_TraverseCertsInSlot is crashing r?jcj

Beta/Release Uplift Approval Request

Flags: needinfo?(dkeeler)
Attachment #9162009 - Flags: approval-mozilla-beta?

Comment on attachment 9162009 [details]
Bug 1650654 - diagnostic assertions to narrow down where PK11_TraverseCertsInSlot is crashing r?jcj

Approved for 79.0b6.

Attachment #9162009 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9162009 [details]
Bug 1650654 - diagnostic assertions to narrow down where PK11_TraverseCertsInSlot is crashing r?jcj

Landed on Beta for 79.0b6.
https://hg.mozilla.org/releases/mozilla-beta/rev/5f02771d855b

Removing the approval flag to get this off the needs-uplift radar.

Attachment #9162009 - Flags: approval-mozilla-beta+

loads of reports crashing on the added assertion are already coming in after 79.0b6 got released - so far all of them are crashing here: https://hg.mozilla.org/releases/mozilla-beta/annotate/49c7a261c59c4c5ace83bc2d727d7eef0dd29e9c/security/nss/lib/dev/devslot.c#l150

Crash Signature: [@ PK11_TraverseCertsInSlot] → [@ PK11_TraverseCertsInSlot] [@ nssSlot_IsTokenPresent | nssSlot_GetToken | nssTrustDomain_FindTrustForCertificate | stan_GetCERTCertificate] [@ nssSlot_IsTokenPresent | nssSlot_GetToken | nssTrustDomain_FindCertificateByIssuerAndSerialNumber | NSSTrust…

Backed out for 79.0b7 per discussion with Dana and JC.
https://hg.mozilla.org/releases/mozilla-beta/rev/b479804e64c1

Attachment #9162009 - Attachment is obsolete: true

Beta79 is no longer affected by this due to bug 1651155 disabling the regressing feature.

This will also be disabled as of 80.0b7 in a couple of weeks.

Hi Dana, can we assume that this remains disabled also for 81 late beta?

Flags: needinfo?(dkeeler)

Yes - this will probably remain disabled for the foreseeable future while we re-work some aspects of how gecko uses NSS.

Flags: needinfo?(dkeeler)
See Also: → 1663661

The leave-open keyword is there and there is no activity for 6 months.
:keeler, maybe it's time to close this bug?

Flags: needinfo?(dkeeler)

Yeah, I'm not sure it makes sense to keep this bug open.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(dkeeler)
Resolution: --- → FIXED

security.intermediate_preloading_healer.enabled is still set to true in Nightly right? So is this crash still happening there or was it fixed in another bug?

Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: