Open Bug 1789458 Opened 2 years ago Updated 8 months ago

Assertion failure: success (Transport Security Getters should not fail.), at security/manager/ssl/nsNSSIOLayer.cpp:1220

Categories

(Core :: Security: PSM, defect, P3)

Tracking

REOPENED
106 Branch
Tracking Status
firefox106 --- fix-optional

People

(Reporter: dlrobertson, Unassigned)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [psm-backlog])

Attachments

(1 file)

Steps to reproduce:

  • Have a debug build of Firefox
  • Run ./mach run <some url>

Expected results:

Firefox starts and shows the URL

Actual results:

Assertion failure: success (Transport Security Getters should not fail.), at /Users/danrobertson/work/mozilla/central/security/manager/ssl/nsNSSIOLayer.cpp:1220
#01: (anonymous namespace)::checkHandshake(int, bool, PRFileDesc*, nsNSSSocketInfo*)[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0xfa45a8c]
#02: mozilla::net::nsSocketOutputStream::Write(char const*, unsigned int, unsigned int*)[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x1066c08]
#03: mozilla::net::nsHttpConnection::OnReadSegment(char const*, unsigned int, unsigned int*)[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x1ca3ed4]
#04: mozilla::net::Http2Session::FlushOutputQueue()[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x199d398]
#05: mozilla::net::Http2Session::ReadSegmentsAgain(mozilla::net::nsAHttpSegmentReader*, unsigned int, unsigned int*, bool*)[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x19b2f10]
#06: mozilla::net::nsHttpConnection::OnSocketWritable()[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x1ca488c]
#07: mozilla::net::nsHttpConnection::OnOutputStreamReady(nsIAsyncOutputStream*)[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x1c9a0f0]
#08: mozilla::net::nsHttpConnection::Activate(mozilla::net::nsAHttpTransaction*, unsigned int, int)[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x1c97bac]
#09: mozilla::net::nsHttpConnectionMgr::DispatchAbstractTransaction(mozilla::net::ConnectionEntry*, mozilla::net::nsAHttpTransaction*, unsigned int, mozilla::net::HttpConnectionBase*, int)[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x1cd4058]
#10: mozilla::net::nsHttpConnectionMgr::DispatchTransaction(mozilla::net::ConnectionEntry*, mozilla::net::nsHttpTransaction*, mozilla::net::HttpConnectionBase*)[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x1cd2930]
#11: mozilla::net::DnsAndConnectSocket::SetupConn(bool, nsresult)[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x1951fe4]
#12: mozilla::net::DnsAndConnectSocket::OnOutputStreamReady(nsIAsyncOutputStream*)[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x1953f88]
#13: mozilla::net::nsSocketOutputStream::OnSocketReady(nsresult)[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x10664fc]
#14: mozilla::net::nsSocketTransport::OnSocketReady(PRFileDesc*, short)[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x10751f8]
#15: mozilla::net::nsSocketTransportService::DoPollIteration(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>*)[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x108f9d8]
#16: mozilla::net::nsSocketTransportService::Run()[/Users/danrobertson/work/mozilla/central/objdir-asan/toolkit/library/build/XUL +0x108e188]

Note: The build I am using is an AddressSanitizer (ASan) build and I am on macOS.

Flags: needinfo?(djackson)

The assertion message says "Transport Security Getters should not fail.", but while TransportSecurityInfo::GetUsedPrivateDNS and TransportSecurityInfo::GetMadeOCSPRequests indeed cannot fail, both TransportSecurityInfo::GetIsAcceptedEch and TransportSecurityInfo::GetProtocolVersion have a failure path.
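To illustrate the distinction, here is a minimal standalone sketch. It is not the PSM code: FakeSecurityInfo, its members, and CountFailedGetters are hypothetical names, and the assumption that the fallible getters fail when their value has not been recorded yet is mine, not something taken from TransportSecurityInfo.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a security-info object. This is NOT the real
// TransportSecurityInfo; it only models "always set" vs. "maybe unset" fields.
struct FakeSecurityInfo {
  bool usedPrivateDNS = false;              // always has a value
  bool madeOCSPRequests = false;            // always has a value
  std::optional<uint16_t> protocolVersion;  // assumed unset until recorded
  std::optional<bool> isAcceptedEch;        // assumed unset until recorded

  // Infallible getters: there is always a value to hand back.
  bool GetUsedPrivateDNS(bool* out) const {
    *out = usedPrivateDNS;
    return true;
  }
  bool GetMadeOCSPRequests(bool* out) const {
    *out = madeOCSPRequests;
    return true;
  }

  // Fallible getters: report failure when the value was never recorded.
  bool GetProtocolVersion(uint16_t* out) const {
    if (!protocolVersion) return false;
    *out = *protocolVersion;
    return true;
  }
  bool GetIsAcceptedEch(bool* out) const {
    if (!isAcceptedEch) return false;
    *out = *isAcceptedEch;
    return true;
  }
};

// Calls all four getters and counts how many report failure. An assert on
// each individual result would fire whenever this count is non-zero.
int CountFailedGetters(const FakeSecurityInfo& info) {
  bool flag = false;
  uint16_t version = 0;
  int failures = 0;
  if (!info.GetUsedPrivateDNS(&flag)) ++failures;
  if (!info.GetMadeOCSPRequests(&flag)) ++failures;
  if (!info.GetProtocolVersion(&version)) ++failures;
  if (!info.GetIsAcceptedEch(&flag)) ++failures;
  return failures;
}
```

In this model a default-constructed FakeSecurityInfo makes two of the four getters fail, and none fail once both optional fields are populated.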

This is blocking development for me. (Windows)

Severity: -- → S1
Priority: -- → P1

Specifically, this is hitting here within GetProtocolVersion:
https://searchfox.org/mozilla-central/rev/9f8e74292115b4fbbb41698bfe9a7d8cc12b31cf/security/manager/ssl/TransportSecurityInfo.cpp#1197

If that assertion is skipped, it hits a similar line in GetIsAcceptedEch.

A workaround for debug builds is to comment out the MOZ_ASSERT lines following both GetProtocolVersion and GetIsAcceptedEch.

Severity: S1 → S2
Priority: P1 → --
Priority: -- → P1

Honestly I can't decide between S1 and S2 :)

Flags: needinfo?(djackson)

Assignee: nobody → djackson

Hi Kelsey and Dan,

Thank you for the report. I'm not able to reproduce this in either normal browsing or via ./mach run, with debug or ASan builds. Is this crashing for you 100% of the time or only for a proportion of runs? Also, can you confirm that you're using an up-to-date pull of central? These asserts briefly landed, were backed out because of a crash in them, and then re-landed with fixes.

Whilst the getters can fail in general, these asserts are only hit if the TLS connection is marked as succeeding, at which point the protocol version and ECH status should be set.

This crashes for me 100% of the time, on macOS. I have locally commented out the asserts after GetProtocolVersion and GetIsAcceptedEch to proceed.

(In reply to Dennis Jackson from comment #7)

Thank you for the report. I'm not able to reproduce this in either normal browsing or via ./mach run, with debug or asan builds. Is this crashing for you 100% of the time or only for a proportion of runs?

For me 100% of the time.

Also, can you confirm that you're using an up-to-date pull of central? These asserts briefly landed, were backed out because of a crash in them, and then re-landed with fixes.

Rebased on 663615ef7a19, and still crashes.

Are you trying something non-TLS, such as file:/// access? I know that was busted for me. I tried one other website and it seemed to also hit the asserts, but it might also have been some weird caching/restore thing going on.

Flags: needinfo?(djackson)
Summary: Assertion failure: success (Transport Security Getters should not fail.) → Assertion failure: success (Transport Security Getters should not fail.), at security/manager/ssl/nsNSSIOLayer.cpp:1220

(In reply to Kelsey Gilbert [:jgilbert] from comment #10)

Are you trying something non-TLS, such as file:/// access? I know that was busted for me. I tried one other website and it seemed to also hit the asserts, but it might also have been some weird caching/restore thing going on.

There is definitely some weird caching/restore thing going on. If I rm -rf <object dir>/tmp, I don't hit this assert for a while, but eventually, as I keep testing whatever bug I'm working on, I start hitting it again.

I'm backing the asserts out until I can look into this further, I'm still unable to reproduce it locally.

To summarise the reports as I understand them:

  • Once a crash occurs, it then occurs 100% of the time until the profile directory is cleared.
  • There's no special networking configuration (e.g. a proxy) for the impacted builds.
  • The crash only happens on macOS for Dan and Brad.
  • Kelsey sees the crash even when accessing resources which don't use TLS like file:///.

@Kelsey are you seeing this only on macOS as well?

Flags: needinfo?(djackson) → needinfo?(jgilbert)
Severity: S2 → S4
Status: NEW → ASSIGNED
Keywords: leave-open

I saw this on Linux

Pushed by djackson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1ddbfd389a34
Backout asserts from 1788290. r=keeler
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 106 Branch

I think this should have been left open due to the leave-open keyword being set. Please feel free to close if I've misunderstood things.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

(In reply to Dennis Jackson from comment #13)

I'm backing the asserts out until I can look into this further, I'm still unable to reproduce it locally.

To summarise the reports as I understand them:

  • Once a crash occurs, it then occurs 100% of the time until the profile directory is cleared.
  • There's no special networking configuration (e.g. a proxy) for the impacted builds.
  • The crash only happens on macOS for Dan and Brad.
  • Kelsey sees the crash even when accessing resources which don't use TLS like file:///.

@Kelsey are you seeing this only on macOS as well?

No, Windows 10 here!

Flags: needinfo?(jgilbert)
Flags: needinfo?(djackson)
Flags: needinfo?(djackson)
See Also: → 1795831
Severity: S4 → N/A
Priority: P1 → P3
Severity: N/A → S4
Assignee: djackson → nobody

The leave-open keyword is there and there is no activity for 6 months.
:keeler, maybe it's time to close this bug?
For more information, please visit BugBot documentation.

Flags: needinfo?(dkeeler)
Flags: needinfo?(dkeeler)
Whiteboard: [psm-backlog]