Closed Bug 1612116 Opened 1 year ago Closed 1 month ago

Crash in [@ mozilla::psm::GetXPCOMFromNSSError]

Categories

(Core :: Security: PSM, defect, P1)

defect

Tracking

()

RESOLVED FIXED
90 Branch
Tracking Status
firefox-esr78 --- fixed
firefox72 --- wontfix
firefox73 --- wontfix
firefox88 --- wontfix
firefox89 --- fixed
firefox90 --- fixed

People

(Reporter: philipp, Assigned: keeler)

Details

(Keywords: crash, Whiteboard: [psm-assigned])

Crash Data

Attachments

(3 files)

This bug is for crash report bp-c072d82e-9353-44e0-a4f5-f8c410200129.

Top 10 frames of crashing thread:

0 xul.dll mozilla::psm::GetXPCOMFromNSSError security/manager/ssl/NSSErrorsService.cpp:76
1 xul.dll mozilla::net::nsHttpConnection::EnsureNPNComplete netwerk/protocol/http/nsHttpConnection.cpp:572
2 xul.dll mozilla::net::nsHttpConnection::OnSocketWritable netwerk/protocol/http/nsHttpConnection.cpp:2068
3 xul.dll mozilla::net::TLSFilterTransaction::StartTimerCallback netwerk/protocol/http/TunnelUtils.cpp:579
4 xul.dll mozilla::net::TLSFilterTransaction::Notify netwerk/protocol/http/TunnelUtils.cpp:558
5 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:564
6 xul.dll nsTimerEvent::Run xpcom/threads/TimerThread.cpp:259
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1241
8 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
9 xul.dll mozilla::net::nsSocketTransportService::Run netwerk/base/nsSocketTransportService2.cpp:1013

this browser process crash signature is spiking up in volume in the past couple of hours and reports generally come with MOZ_CRASH(Function failed without calling PR_GetError). i'm not entirely sure if this bug is better belonging in security:psm or networking.

i can't see any obvious correlation yet of what might explain the sudden spike - some of the more affected locales are id & tr which is something we don't commonly see and NSFW site are quite common among the urls reported for crashing.

Looks like this spiked on/around January 29 and then went back down to the previous "a handful of crashes per day" volume.

The priority flag is not set for this bug.
:keeler, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dkeeler)

It's not immediately obvious what's going on here, but we could start by adding more assertions that TransportSecurityInfo::mErrorCode and mCanceled are consistent. Since the volume went back down I'm not too concerned.

Flags: needinfo?(dkeeler)
Priority: -- → P2
Whiteboard: [psm-backlog]
Severity: normal → S3

This spiked up again yesterday with 1300+ crash reports.

(I'm also wondering whether we should move this over to necko instead of PSM?)

I think we'll need to land some diagnostic assertions in PSM and go from there, so let's leave it here for now. We can have a look at this next week.

Assignee: nobody → bugs
Status: NEW → ASSIGNED
Keywords: leave-open
Priority: P2 → P1
Whiteboard: [psm-backlog] → [psm-assigned]
Pushed by ncsoregi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7c9218657abb
Added diagnostics to ensure mErrorCode and mCanceled are consistent r=keeler

Looks like we're seeing this crash signature pick up in the last few days, on both release and esr.

There's also some crashes on aurora with the diagnostic assert added here, e.g. bp-0b2abc87-602f-42b1-82dd-273170201102

Crash Signature: [@ mozilla::psm::GetXPCOMFromNSSError] → [@ mozilla::psm::GetXPCOMFromNSSError] [@ nsNSSSocketInfo::DriveHandshake]

This assertion is failing:

  SECStatus rv = SSL_ForceHandshake(mFd);

  if (rv != SECSuccess) {
    PRErrorCode errorCode = PR_GetError();
    MOZ_DIAGNOSTIC_ASSERT(errorCode, "handshake failed without error code");

which means that NSS is returning a non-success status while not setting the error code, which probably means there's a bug in NSS.

Assignee: bugs → nobody
Severity: S3 → --
Status: ASSIGNED → NEW
Component: Security: PSM → Libraries
Priority: P1 → --
Product: Core → NSS
QA Contact: jjones
Whiteboard: [psm-assigned]
Version: unspecified → other

The severity field is not set for this bug.
:jcj, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jc)

At least it's calmed back down. Leaving for triage.

Flags: needinfo?(jc)
Severity: -- → S4
Priority: -- → P5
Whiteboard: [nss-fx]

This started to spike again across channels yesterday, many of the urls mention https://outlook.office365.com I cannot access it but get redirected to https://login.microsoftonline.com which interestingly has a recently issued certificate (serial 01:71:20:D9:F6:91:37:4B:45:0C:3C:9F:2F:3D:D6:58) with validity Not Before Fri, 05 Mar 2021 00:00:00 GMT.

Benjamin, could you check what causes the spike and what fails, please?

Flags: needinfo?(bbeurdouche)

I might be able to help with reproducing on outlook, but while I have an Outlook 365 subscription, it uses a different domain. That one is probably not phishing, but I can't tell for sure.

More seriously, why do we care to retain this crash? There might be a bug in NSS somewhere (correction: there almost certainly is a bug in NSS somewhere), but tracking this one down without a reproducer is a lot of work (you can't do it manually, so we'd need to break new ground and write a tool to trace all the SECFailure returns that a failure in SSL_ForceHandshake might trace back to).

I know that that might bury this, but at these failure rates, finding a reliable way to reproduce the error is going to be very hard.

Over 700 reported crashes in the last week is not a low-volume crash.

Since this isn't really actionable, let's paper over it by converting "no error code set" to SEC_ERROR_LIBRARY_FAILURE or something in this specific place.

Assignee: nobody → dkeeler
Severity: S4 → S3
Component: Libraries → Security: PSM
Keywords: leave-open
Priority: P5 → P1
Product: NSS → Core
Whiteboard: [nss-fx] → [psm-assigned]
Version: other → unspecified

Sometimes SSL_ForceHandshake will return SECFailure without setting an error
code. When this happens, calling GetXPCOMFromNSSError on that not-an-error-code
will fail. This patch first checks for this situation and substitutes
SEC_ERROR_LIBRARY_FAILURE if applicable.

Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/aa53405e47cc
turn NSS not setting an error code into SEC_ERROR_LIBRARY_FAILURE r=bbeurdouche
Status: NEW → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch
Flags: needinfo?(bbeurdouche)

Please nominate this for uplifts when you get a chance.

Flags: needinfo?(dkeeler)

Comment on attachment 9221453 [details]
Bug 1612116 - turn NSS not setting an error code into SEC_ERROR_LIBRARY_FAILURE r?bbeurdouche

Beta/Release Uplift Approval Request

  • User impact if declined: Rare assertion failures leading to crashes.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch gets rid of the assertion failure by first checking for the condition that causes it and coming up with a generic error to report to the user instead. It's very small and localized, so it shouldn't be risky.
  • String changes made/needed: none
Flags: needinfo?(dkeeler)
Attachment #9221453 - Flags: approval-mozilla-release?
Attachment #9221453 - Flags: approval-mozilla-beta?

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: crashes
User impact if declined: occasional crashes
Fix Landed on Version: 90
Risk to taking this patch (and alternatives if risky): low - it's small and localized

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Attachment #9222280 - Flags: review+
Attachment #9222280 - Flags: approval-mozilla-esr78?

Comment on attachment 9221453 [details]
Bug 1612116 - turn NSS not setting an error code into SEC_ERROR_LIBRARY_FAILURE r?bbeurdouche

Low risk stability fix, approved for 89 beta 14, thanks.

Attachment #9221453 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9222280 [details] [diff] [review]
1612116-patch-for-esr78.diff

approved for 78.11, thanks

Attachment #9222280 - Flags: approval-mozilla-esr78? → approval-mozilla-esr78+

Comment on attachment 9221453 [details]
Bug 1612116 - turn NSS not setting an error code into SEC_ERROR_LIBRARY_FAILURE r?bbeurdouche

Fx89 is in RC now, no need to take in 88.

Attachment #9221453 - Flags: approval-mozilla-release? → approval-mozilla-release-
You need to log in before you can comment on or make changes to this bug.