Crash in [@ mozilla::psm::GetXPCOMFromNSSError]
Categories
(Core :: Security: PSM, defect, P1)
Tracking
()
People
(Reporter: philipp, Assigned: keeler)
Details
(Keywords: crash, Whiteboard: [psm-assigned])
Crash Data
Attachments
(3 files)
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-release-
|
Details | Review |
|
1.38 KB,
patch
|
keeler
:
review+
jcristau
:
approval-mozilla-esr78+
|
Details | Diff | Splinter Review |
This bug is for crash report bp-c072d82e-9353-44e0-a4f5-f8c410200129.
Top 10 frames of crashing thread:
0 xul.dll mozilla::psm::GetXPCOMFromNSSError security/manager/ssl/NSSErrorsService.cpp:76
1 xul.dll mozilla::net::nsHttpConnection::EnsureNPNComplete netwerk/protocol/http/nsHttpConnection.cpp:572
2 xul.dll mozilla::net::nsHttpConnection::OnSocketWritable netwerk/protocol/http/nsHttpConnection.cpp:2068
3 xul.dll mozilla::net::TLSFilterTransaction::StartTimerCallback netwerk/protocol/http/TunnelUtils.cpp:579
4 xul.dll mozilla::net::TLSFilterTransaction::Notify netwerk/protocol/http/TunnelUtils.cpp:558
5 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:564
6 xul.dll nsTimerEvent::Run xpcom/threads/TimerThread.cpp:259
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1241
8 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
9 xul.dll mozilla::net::nsSocketTransportService::Run netwerk/base/nsSocketTransportService2.cpp:1013
this browser process crash signature is spiking up in volume in the past couple of hours and reports generally come with MOZ_CRASH(Function failed without calling PR_GetError). i'm not entirely sure if this bug is better belonging in security:psm or networking.
i can't see any obvious correlation yet of what might explain the sudden spike - some of the more affected locales are id & tr which is something we don't commonly see and NSFW site are quite common among the urls reported for crashing.
Comment 1•5 years ago
|
||
Looks like this spiked on/around January 29 and then went back down to the previous "a handful of crashes per day" volume.
Comment 2•5 years ago
|
||
The priority flag is not set for this bug.
:keeler, could you have a look please?
For more information, please visit auto_nag documentation.
| Assignee | ||
Comment 3•5 years ago
|
||
It's not immediately obvious what's going on here, but we could start by adding more assertions that TransportSecurityInfo::mErrorCode and mCanceled are consistent. Since the volume went back down I'm not too concerned.
| Assignee | ||
Updated•5 years ago
|
Comment 4•5 years ago
|
||
This spiked up again yesterday with 1300+ crash reports.
Comment 5•5 years ago
|
||
(I'm also wondering whether we should move this over to necko instead of PSM?)
| Assignee | ||
Comment 6•5 years ago
|
||
I think we'll need to land some diagnostic assertions in PSM and go from there, so let's leave it here for now. We can have a look at this next week.
Comment 7•5 years ago
|
||
Updated•5 years ago
|
| Assignee | ||
Updated•5 years ago
|
Comment 9•5 years ago
|
||
| bugherder | ||
Comment 10•5 years ago
|
||
Looks like we're seeing this crash signature pick up in the last few days, on both release and esr.
There's also some crashes on aurora with the diagnostic assert added here, e.g. bp-0b2abc87-602f-42b1-82dd-273170201102
| Assignee | ||
Comment 11•5 years ago
|
||
This assertion is failing:
SECStatus rv = SSL_ForceHandshake(mFd);
if (rv != SECSuccess) {
PRErrorCode errorCode = PR_GetError();
MOZ_DIAGNOSTIC_ASSERT(errorCode, "handshake failed without error code");
which means that NSS is returning a non-success status while not setting the error code, which probably means there's a bug in NSS.
Comment 12•5 years ago
|
||
The severity field is not set for this bug.
:jcj, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•4 years ago
|
Comment 14•4 years ago
|
||
This started to spike again across channels yesterday, many of the urls mention https://outlook.office365.com I cannot access it but get redirected to https://login.microsoftonline.com which interestingly has a recently issued certificate (serial 01:71:20:D9:F6:91:37:4B:45:0C:3C:9F:2F:3D:D6:58) with validity Not Before Fri, 05 Mar 2021 00:00:00 GMT.
Benjamin, could you check what causes the spike and what fails, please?
Comment 15•4 years ago
|
||
I might be able to help with reproducing on outlook, but while I have an Outlook 365 subscription, it uses a different domain. That one is probably not phishing, but I can't tell for sure.
More seriously, why do we care to retain this crash? There might be a bug in NSS somewhere (correction: there almost certainly is a bug in NSS somewhere), but tracking this one down without a reproducer is a lot of work (you can't do it manually, so we'd need to break new ground and write a tool to trace all the SECFailure returns that a failure in SSL_ForceHandshake might trace back to).
I know that that might bury this, but at these failure rates, finding a reliable way to reproduce the error is going to be very hard.
Comment 16•4 years ago
|
||
Over 700 reported crashes in the last week is not a low-volume crash.
| Assignee | ||
Comment 17•4 years ago
|
||
Since this isn't really actionable, let's paper over it by converting "no error code set" to SEC_ERROR_LIBRARY_FAILURE or something in this specific place.
| Assignee | ||
Comment 18•4 years ago
|
||
Sometimes SSL_ForceHandshake will return SECFailure without setting an error
code. When this happens, calling GetXPCOMFromNSSError on that not-an-error-code
will fail. This patch first checks for this situation and substitutes
SEC_ERROR_LIBRARY_FAILURE if applicable.
Comment 19•4 years ago
|
||
Comment 20•4 years ago
|
||
| bugherder | ||
Updated•4 years ago
|
Comment 21•4 years ago
|
||
Please nominate this for uplifts when you get a chance.
| Assignee | ||
Comment 22•4 years ago
|
||
Comment on attachment 9221453 [details]
Bug 1612116 - turn NSS not setting an error code into SEC_ERROR_LIBRARY_FAILURE r?bbeurdouche
Beta/Release Uplift Approval Request
- User impact if declined: Rare assertion failures leading to crashes.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This patch gets rid of the assertion failure by first checking for the condition that causes it and coming up with a generic error to report to the user instead. It's very small and localized, so it shouldn't be risky.
- String changes made/needed: none
| Assignee | ||
Comment 23•4 years ago
|
||
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: crashes
User impact if declined: occasional crashes
Fix Landed on Version: 90
Risk to taking this patch (and alternatives if risky): low - it's small and localized
See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Comment 24•4 years ago
|
||
Comment on attachment 9221453 [details]
Bug 1612116 - turn NSS not setting an error code into SEC_ERROR_LIBRARY_FAILURE r?bbeurdouche
Low risk stability fix, approved for 89 beta 14, thanks.
Comment 25•4 years ago
|
||
| bugherder uplift | ||
Comment 26•4 years ago
|
||
Comment on attachment 9222280 [details] [diff] [review]
1612116-patch-for-esr78.diff
approved for 78.11, thanks
Comment 27•4 years ago
|
||
| bugherder uplift | ||
Comment 28•4 years ago
|
||
Comment on attachment 9221453 [details]
Bug 1612116 - turn NSS not setting an error code into SEC_ERROR_LIBRARY_FAILURE r?bbeurdouche
Fx89 is in RC now, no need to take in 88.
Updated•4 years ago
|
Description
•