Closed Bug 1767920 Opened 3 years ago Closed 3 years ago

Crash in [@ PR_MD_PR_POLL | mozilla::net::nsSocketTransportService::Run]

Categories

(Core :: Networking, defect, P2)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
108 Branch
Tracking Status
firefox-esr102 107+ fixed
firefox106 --- wontfix
firefox107 + fixed
firefox108 + fixed

People

(Reporter: gsvelto, Assigned: kershaw)

References

(Blocks 1 open bug)

Details

(Keywords: crash, sec-high, Whiteboard: [necko-triaged][necko-priority-review][post-critsmash-triage][adv-main107+r][adv-esr102.5+r])

Crash Data

Attachments

(2 files, 1 obsolete file)

Crash report: https://crash-stats.mozilla.org/report/index/1eef1d6e-094f-4bea-8e8c-0c4440220422

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0 nss3.dll PR_MD_PR_POLL nsprpub/pr/src/md/windows/w32poll.c:273
1 xul.dll mozilla::net::nsSocketTransportService::Run netwerk/base/nsSocketTransportService2.cpp:1150
2 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:467
3 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:300
4 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:324
5 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:306
6 xul.dll static nsThread::ThreadFunc xpcom/threads/nsThread.cpp:389
7 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:399
8 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:139
9 ucrtbase.dll thread_start<unsigned int , 1> 

This appears to be some kind of wild-pointer access. The crashing address is either in rcx or rdx. When we're hitting rdx it's usually a near-NULL access, but when we're hitting rcx it's almost always a non-canonical address and it most likely looks like some kind of integer data. In at least one case - the crash above - rcx also contains the poison pattern. So it doesn't look like a straight UAF and there are crashes which are stack overflows so we may be stomping over memory that's on the stack.

Blocks: clouseau

This looks like bug 1760611, which already has a patch ready to land.
Let's see if bug 1760611 can fix this.

Priority: -- → P2
Whiteboard: [necko-triaged]
Depends on: 1760611
Keywords: sec-high
Whiteboard: [necko-triaged] → [necko-triaged] maybe will be fixed by bug 1760611
Whiteboard: [necko-triaged] maybe will be fixed by bug 1760611 → [necko-triaged] [necko-priority-review]

Waiting for release of this patch to assess if has resolved in production release 104 2022-08-23.

Fix landed with bug 1782740 in 104.0b8 - no crashes since then.

(In reply to Valentin Gosu [:valentin] (he/him) from comment #3)

Fix landed with bug 1782740 in 104.0b8 - no crashes since then.

Let's close this one.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [necko-triaged] [necko-priority-review] → [necko-triaged][necko-priority-queue]
Assignee: nobody → kershaw

Hi Jed,

Since you fixed a similar issue in bug 1760611, do you probably have an idea about this one?
The crashes in this bug seem to be all on windows platform, but I can't figure out how to fix this.
It'd be really appreciated if you can provide some insights.

Thanks.

Flags: needinfo?(jld)

Windows select seems to work differently: fd_set is an array of fds, according to Microsoft's documentation, so it's not a bitset like typical Unix implementations and it won't overflow from being given a numerically large fd. (And, according to comments in NSPR, FD_SET will silently fail if there are more fds than can fit in the array.)

The crash in comment #5 has the crash reason EXCEPTION_GUARD_PAGE (and crashed in _chkstk at the top of PR_MD_PR_POLL), so that might be a stack overflow; are we setting the SocketTransportService thread's stack size to something that's too small?

And the crash linked in the initial bug report blames the call to PR_GetIdentitiesLayer, so that's from dealing with NSPR's data structures, not fd_sets.

So, I don't think this is related to bug 1760611 in a useful way.

Flags: needinfo?(jld)

(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #7)

Windows select seems to work differently: fd_set is an array of fds, [according to Microsoft's documentation][ms-doc], so it's not a bitset like typical Unix implementations and it won't overflow from being given a numerically large fd. (And, according to [comments in NSPR][nspr-comment], FD_SET will silently fail if there are more fds than can fit in the array.)

The crash in comment #5 has the crash reason EXCEPTION_GUARD_PAGE (and crashed in _chkstk at the top of PR_MD_PR_POLL), so that might be a stack overflow; are we setting the SocketTransportService thread's stack size to something that's too small?

Thanks for the feedback.
I think trying to increase the stack size might be a good way to go.

Comment on attachment 9297245 [details]
Bug 1767920 - Increase thread stack size on windows, r=#necko

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Unknown. This patch only increases socket thread's stack size.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: I think we don't have to uplift this.
  • How likely is this patch to cause regressions; how much testing does it need?: Low. I think increasing the stack size should be fine.
  • Is Android affected?: No
Attachment #9297245 - Flags: sec-approval?

Comment on attachment 9297245 [details]
Bug 1767920 - Increase thread stack size on windows, r=#necko

Approved to land and uplift after beta branches

Attachment #9297245 - Flags: sec-approval? → sec-approval+
Whiteboard: [necko-triaged][necko-priority-queue] → [necko-triaged][necko-priority-review]
Group: network-core-security → core-security-release
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 108 Branch

This is going to need a rebased patch for ESR102 uplift. It grafts cleanly to Beta as-landed.

Flags: needinfo?(kershaw)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #13)

This is going to need a rebased patch for ESR102 uplift. It grafts cleanly to Beta as-landed.

I think we don't need to uplift this to ESR, since the crash rate is low. Also, this patch does not really fix the crash. It only increases the stack size, since we still don't understand the root cause.

Flags: needinfo?(kershaw)

The patch landed in nightly and beta is affected.
:kershaw, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox107 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(kershaw)

(In reply to Kershaw Chang [:kershaw] from comment #14)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #13)

This is going to need a rebased patch for ESR102 uplift. It grafts cleanly to Beta as-landed.

I think we don't need to uplift this to ESR, since the crash rate is low. Also, this patch does not really fix the crash. It only increases the stack size, since we still don't understand the root cause.

In general, sec-high bugs are always uplifted (and comment 11 seems to imply that as well). Please confirm with Tom and the security team that they're OK with not taking this fix on branches.

In general, sec-high bugs are always uplifted (and comment 11 seems to imply that as well). Please confirm with Tom and the security team that they're OK with not taking this fix on branches.

Hi Tom,
What do you think? Should we uplift this patch to beta and ESR?(In reply to Ryan VanderMeulen [:RyanVM] from comment #16)

Flags: needinfo?(kershaw) → needinfo?(tom)
Flags: qe-verify-
Whiteboard: [necko-triaged][necko-priority-review] → [necko-triaged][necko-priority-review][post-critsmash-triage]

Sorry, I remember typing a reply but I guess I never submitted it. I don't see why we shouldn't uplift it. Although it may not fix the issue, it makes the behavior universal at least, and if it can be exploited at a lower size but is more difficult at the higher size we don't look bad in not doing so.

Flags: needinfo?(tom)

Comment on attachment 9297245 [details]
Bug 1767920 - Increase thread stack size on windows, r=#necko

Beta/Release Uplift Approval Request

  • User impact if declined: Possible crash.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: N/A
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch only increases the stack size of socket thread.
  • String changes made/needed: N/A
  • Is Android affected?: Yes
Attachment #9297245 - Flags: approval-mozilla-beta?

Comment on attachment 9297245 [details]
Bug 1767920 - Increase thread stack size on windows, r=#necko

Approved for 107.0b9

Attachment #9297245 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9301594 [details]
Bug 1767920 - Increase thread stack size on windows (for ESR 102), r=#necko

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration:
  • User impact if declined: Possible crash.
  • Fix Landed on Version: 108
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch only incrrease the stack size of socket thread.
Attachment #9301594 - Flags: approval-mozilla-esr102?
Attachment #9301595 - Attachment is obsolete: true

Comment on attachment 9301594 [details]
Bug 1767920 - Increase thread stack size on windows (for ESR 102), r=#necko

Approved for 102.5esr.

Attachment #9301594 - Flags: approval-mozilla-esr102? → approval-mozilla-esr102+
Whiteboard: [necko-triaged][necko-priority-review][post-critsmash-triage] → [necko-triaged][necko-priority-review][post-critsmash-triage][adv-main107+r]
Whiteboard: [necko-triaged][necko-priority-review][post-critsmash-triage][adv-main107+r] → [necko-triaged][necko-priority-review][post-critsmash-triage][adv-main107+r][adv-esr102.5+r]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: