Closed Bug 1711738 Opened 3 years ago Closed 3 years ago

Crash in [@ mozilla::net::DnsAndConnectSocket::TransportSetup::SetupStreams]

Categories

(Core :: Networking: HTTP, defect, P1)

Unspecified
All
defect

RESOLVED FIXED
91 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox88 --- unaffected
firefox89 --- unaffected
firefox90 --- unaffected
firefox91 + fixed

People

(Reporter: aryx, Assigned: dragana)

References

(Regression)

Details

(Keywords: crash, regression, Whiteboard: [necko-triaged])

Crash Data

Attachments

(3 files)

13 crashes with 9 installations, all OS

Crash report: https://crash-stats.mozilla.org/report/index/a7a05ac8-4146-40d1-9790-0da350210518

MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(ent)

Top 10 frames of crashing thread:

0 xul.dll mozilla::net::DnsAndConnectSocket::TransportSetup::SetupStreams netwerk/protocol/http/DnsAndConnectSocket.cpp:1176
1 xul.dll mozilla::net::DnsAndConnectSocket::TransportSetup::OnLookupComplete netwerk/protocol/http/DnsAndConnectSocket.cpp:1249
2 xul.dll mozilla::net::DnsAndConnectSocket::OnLookupComplete netwerk/protocol/http/DnsAndConnectSocket.cpp:419
3 xul.dll mozilla::detail::RunnableFunction<`lambda at /builds/worker/checkouts/gecko/netwerk/dns/DNSListenerProxy.cpp:28:30'>::Run xpcom/threads/nsThreadUtils.h:534
4 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1153
5 xul.dll mozilla::net::nsSocketTransportService::Run netwerk/base/nsSocketTransportService2.cpp:1200
6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1153
7 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:300
8 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:328
9 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:310
Severity: -- → S2
Flags: needinfo?(dd.mozilla)
Assignee: nobody → dd.mozilla
Status: NEW → ASSIGNED
Flags: needinfo?(dd.mozilla)
Priority: -- → P1
Whiteboard: [necko-triaged]
Keywords: leave-open
Blocks: 1713689
See Also: → 1711038
No longer blocks: 1713689

Bug 1705065 was backed out, so 90 is not affected anymore.

Attachment #9224258 - Attachment description: Add diagnostic assertions to beer understand bug 1711738 and 1711038 → Add diagnostic assertions to better understand bug 1711738 and 1711038
Pushed by ddamjanovic@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9099f9d14ed6 Add diagnostic assertions to better understand bug 1711738 and 1711038 r=necko-reviewers,valentin

So far there have been 5 assertion failures hitting this in the latest Nightly, e.g. bp-5223a070-9a20-45cc-a79c-d9fd40210610

Crash Signature: [@ mozilla::net::DnsAndConnectSocket::TransportSetup::SetupStreams] → [@ mozilla::net::DnsAndConnectSocket::CheckIsDone] [@ mozilla::net::DnsAndConnectSocket::TransportSetup::SetupStreams]
Flags: needinfo?(dd.mozilla)

Thanks to the new asserts, we have more information about what is happening. I think I can fix this issue. I will work on a patch today.

Flags: needinfo?(dd.mozilla)

Can we at least disable the diagnostic asserts asap?

(In reply to Julien Cristau [:jcristau] from comment #8)

Can we at least disable the diagnostic asserts asap?

I will try to resolve the issues as soon as possible.

There are two kinds of crashes:

  1. The most common one should be resolved by D117706.
  2. The other kind of crash is:
    a988968b-91c0-4601-a3ad-9acff0210614
    I am not sure how it gets into that state. I will add more assertions to figure it out.
Pushed by ddamjanovic@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/af3e019e67c8 When retrying to resolve a host always set the state to TransportSetupState::RETRY_RESOLVING. When a connection fails the status was set to TransportSetupState::RESOLVING which have a bad influence on the state of DnsAndConnectSocket. r=necko-reviewers,kershaw
Pushed by ddamjanovic@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/eeb43a0ccb88 Add more diagnostic assertion to DnsAndConnectSocket. r=necko-reviewers,kershaw
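
The state change described in the first commit message above can be illustrated with a small stand-alone sketch. This is not the Firefox code: the class TransportSetupSketch, its methods, and every state other than RESOLVING and RETRY_RESOLVING are hypothetical names chosen for illustration. It only shows the idea that a re-resolve after a failed connection is recorded as RETRY_RESOLVING rather than RESOLVING, so the rest of the state machine can tell the two apart.

```cpp
#include <cassert>

// Simplified, hypothetical model of a transport setup state machine.
// Only RESOLVING and RETRY_RESOLVING are names taken from the commit
// message; the remaining states are assumptions made for this sketch.
enum class TransportSetupState {
  INIT,
  RESOLVING,
  RETRY_RESOLVING,
  CONNECTING,
  DONE,
};

class TransportSetupSketch {
 public:
  TransportSetupState State() const { return mState; }

  // First DNS lookup for this transport.
  void StartResolving() {
    assert(mState == TransportSetupState::INIT);
    mState = TransportSetupState::RESOLVING;
  }

  // A lookup finished. It may have been the first lookup or a retry.
  void OnLookupComplete(bool aSuccess) {
    assert(mState == TransportSetupState::RESOLVING ||
           mState == TransportSetupState::RETRY_RESOLVING);
    mState = aSuccess ? TransportSetupState::CONNECTING
                      : TransportSetupState::DONE;
  }

  // A connection attempt failed. When the host is resolved again, the
  // state is always RETRY_RESOLVING, never RESOLVING.
  void OnConnectFailed(bool aCanRetry) {
    assert(mState == TransportSetupState::CONNECTING);
    mState = aCanRetry ? TransportSetupState::RETRY_RESOLVING
                       : TransportSetupState::DONE;
  }

  // True only while the very first lookup is in flight.
  bool IsFirstResolve() const {
    return mState == TransportSetupState::RESOLVING;
  }

 private:
  TransportSetupState mState = TransportSetupState::INIT;
};
```

In this sketch, a caller that sees a failed connection calls OnConnectFailed(true) and ends up in RETRY_RESOLVING, so IsFirstResolve() returns false for the retry.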

Comment on attachment 9226936 [details]
When retrying to resolve a host always set the state to TransportSetupState::RETRY_RESOLVING. When a connection fails the status was set to TransportSetupState::RESOLVING which have a bad influence on the state of DnsAndConnectSocket.

Beta/Release Uplift Approval Request

  • User impact if declined: Diagnostic asserts added in bug 1705065 showed that DnsAndConnectSocket is sometimes in a wrong state. This patch fixes that. The effect on users is unclear. This may help with the crash in bug 1667102, but I do not have a good explanation of how it would help.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It fixes the DnsAndConnectSocket state machine, and the fix has been verified in Nightly with many diagnostic asserts that check that DnsAndConnectSocket is in a proper state.
  • String changes made/needed:
Attachment #9226936 - Flags: approval-mozilla-beta?
See Also: → CVE-2021-43535

Given the low (known) impact, and given we're in the last week for beta 90, I'd prefer to let this ride the trains.

Attachment #9226936 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

Should this be marked as fixed now, or is there more work to do? (the leave-open keyword is still around from the first diagnostic patch)

Flags: needinfo?(dd.mozilla)

I was waiting a bit to be sure there is no other issue.

This can be closed now.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(dd.mozilla)
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → 91 Branch
Has Regression Range: --- → yes
See Also: → 1837252