Closed Bug 1459680 Opened 7 years ago Closed 2 years ago

Crash in QueryDnsForFamily

Categories

(External Software Affecting Firefox :: Other, defect, P3)

All
Windows

Tracking

(firefox-esr52 wontfix, firefox-esr60 wontfix, firefox-esr115 wontfix, firefox60 wontfix, firefox61 wontfix, firefox62 wontfix, firefox120 wontfix, firefox121 wontfix, firefox122 fixed)

RESOLVED FIXED
122 Branch
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox-esr115 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox120 --- wontfix
firefox121 --- wontfix
firefox122 --- fixed

People

(Reporter: philipp, Assigned: gstoll)

References

Details

(Keywords: crash, csectype-wildptr, sec-vector, Whiteboard: [necko-triaged] [qa-not-actionable][adv-main122-])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-2b415f8d-f874-4eac-a4a4-c09680180507.
=============================================================
Top 10 frames of crashing thread:

0 @0x7ff8457d5387
1 ws2_32.dll QueryDnsForFamily
2 ws2_32.dll QueryDns
3 ws2_32.dll LookupAddressForName
4 ws2_32.dll GetAddrInfoW
5 ws2_32.dll getaddrinfo
6 nss3.dll PR_GetAddrInfoByName nsprpub/pr/src/misc/prnetdb.c:2037
7 xul.dll mozilla::net::GetAddrInfo netwerk/dns/GetAddrInfo.cpp:345
8 xul.dll nsHostResolver::ThreadFunc netwerk/dns/nsHostResolver.cpp:1805
9 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397
=============================================================

these crashes on windows have been around for a while already - they mostly appear to be wild-pointer issues. the user comment at bp-f6c15ab6-ef9c-40b9-b54d-228060180121 said "i disconnect vpn application and firefox crashed"...
Group: core-security → network-core-security
This is not our bug -- crashing deep in windows sockets. All we're doing is looking up a simple name. In the case of the user with the comment maybe the VPN didn't clean up after itself and windows sockets are calling out into a library that's not there.
Group: network-core-security
Keywords: sec-vector
I don't think there is much we can really do about it. Keeping it open for tracking purposes.
Priority: -- → P3
Whiteboard: [necko-triaged]

We should re-evaluate looking into this crash even though it's low volume. While the call stacks on Socorro are often truncated the actual call stack is always the same:

1. ws2_32.dll!QueryDnsForFamily()	Unknown
2. ws2_32.dll!QueryDns()	Unknown
3. ws2_32.dll!LookupAddressForName()	Unknown
4. ws2_32.dll!GetAddrInfoW()	Unknown
5. ws2_32.dll!getaddrinfo()	Unknown
6. nss3.dll!PR_GetAddrInfoByName(const char * hostname, unsigned short af, int flags) Line 2171	C
7. [Inline Frame] xul.dll!mozilla::net::_GetAddrInfo_Portable(const nsTSubstring<char> & aCanonHost, unsigned short aAddressFamily, unsigned short aFlags, mozilla::net::AddrInfo * * aAddrInfo) Line 241	C++
8. xul.dll!mozilla::net::GetAddrInfo(const nsTSubstring<char> & aHost, unsigned short aAddressFamily, unsigned short aFlags, mozilla::net::AddrInfo * * aAddrInfo, bool aGetTtl) Line 363	C++
9. xul.dll!nsHostResolver::ThreadFunc() Line 2221	C++
10. [Inline Frame] xul.dll!mozilla::detail::RunnableMethodArguments<>::applyImpl(nsMemoryReporterManager * o, nsresult(nsMemoryReporterManager::*)() m, mozilla::Tuple<> & args, std::integer_sequence<unsigned long long>) Line 1148	C++
11. [Inline Frame] xul.dll!mozilla::detail::RunnableMethodArguments<>::apply(nsMemoryReporterManager * o, nsresult(nsMemoryReporterManager::*)() m) Line 1154	C++
12. xul.dll!mozilla::detail::RunnableMethodImpl<nsMemoryReporterManager *,nsresult (nsMemoryReporterManager::*)(),1,mozilla::RunnableKind::Standard>::Run() Line 1204	C++
13. xul.dll!nsThreadPool::Run() Line 305	C++
14. xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1164	C++
15. [Inline Frame] xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 548	C++
16. xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate * aDelegate) Line 302	C++
17. [Inline Frame] xul.dll!MessageLoop::RunInternal() Line 335	C++
18. xul.dll!MessageLoop::RunHandler() Line 329	C++
19. xul.dll!MessageLoop::Run() Line 311	C++
20. xul.dll!nsThread::ThreadFunc(void * aArg) Line 393	C++
21. nss3.dll!_PR_NativeRunThread(void * arg) Line 421	C
22. nss3.dll!pr_root(void * arg) Line 140	C

The failure is always the same too: it's a FAST_FAIL_STACK_COOKIE_CHECK_FAILURE, which means the stack cookie check failed because the stack was corrupted somehow. It is happening on very recent versions of Windows (see this crash for example), which makes it somewhat unlikely to be an issue in Microsoft code that went undetected for years.

We probably need to pick apart a nightly minidump (such as this one) and figure out exactly what data was written over the stack cookie.
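
As a rough illustration of what a stack cookie check catches, here is a toy Python model. It is not the actual MSVC /GS implementation: the frame layout, the sizes, and the function name `cookie_survives` are assumptions made for the sketch. A fixed-size local buffer sits directly below a cookie, and an unchecked copy that runs past the buffer changes the cookie, which a function epilogue would then detect.

```python
# Toy model of a stack cookie (canary) check. The layout and names are
# illustrative assumptions, not the real MSVC /GS implementation.

BUF_SIZE = 8      # size of the stack-allocated buffer in the toy frame
COOKIE_SIZE = 8   # size of the cookie placed right after the buffer


def cookie_survives(payload: bytes, cookie: bytes) -> bool:
    """Copy payload into the toy frame without a bounds check and
    report whether the cookie is still intact afterwards."""
    if len(cookie) != COOKIE_SIZE:
        raise ValueError("cookie must be COOKIE_SIZE bytes")
    # Frame layout: [ buffer | cookie ]
    frame = bytearray(BUF_SIZE) + bytearray(cookie)
    # Unchecked copy: anything longer than BUF_SIZE spills into the cookie.
    frame[:len(payload)] = payload
    # Epilogue check: compare the stored cookie with the expected value.
    return bytes(frame[BUF_SIZE:BUF_SIZE + COOKIE_SIZE]) == cookie
```

In this model a payload that fits the buffer leaves the cookie alone, while a 12-byte payload overwrites its first four bytes. Comparing the damaged cookie in a minidump against the expected value would show which bytes were written over it.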

This may be caused by a third-party application that hooks ws2_32.dll!QueryDnsForFamily, as Proxifier does (bug 1698057). If a hook function mistakenly shifts the stack pointer and then returns to the original function, the cookie check fails like this.
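
To make the stack-pointer scenario concrete, here is a toy Python model. It assumes the scheme that, as far as I understand it, MSVC commonly uses on x64: the prologue stores the process cookie XORed with the stack pointer, and the epilogue undoes the XOR using the current stack pointer. The constants, the slot offset, and the function names are made up for illustration.

```python
# Toy model: a hook that returns with a shifted stack pointer makes the
# cookie check fail even though no buffer was overrun. All constants and
# names here are hypothetical.

MASK64 = (1 << 64) - 1
COOKIE = 0x1122334455667788   # arbitrary stand-in for the process cookie
SLOT_OFFSET = 0x20            # arbitrary offset of the cookie slot from sp


def prologue(stack: dict, sp: int) -> None:
    """Store the cookie XORed with the stack pointer in the frame."""
    stack[sp + SLOT_OFFSET] = (COOKIE ^ sp) & MASK64


def epilogue_ok(stack: dict, sp: int) -> bool:
    """Reload the slot relative to the current sp and undo the XOR."""
    stored = stack.get(sp + SLOT_OFFSET, 0)
    return (stored ^ sp) & MASK64 == COOKIE


def returns_cleanly(sp_shift: int) -> bool:
    """Run prologue, a hooked call that shifts sp by sp_shift bytes,
    then the epilogue check."""
    stack: dict = {}
    sp = 0x7FFE00001000
    prologue(stack, sp)
    sp_after_hook = sp + sp_shift  # a correct hook leaves this at 0
    return epilogue_ok(stack, sp_after_hook)
```

With `sp_shift` equal to 0 the check passes. Any nonzero shift makes the epilogue read the wrong slot and XOR with the wrong value, so the check fails.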

Most of the crashes with this signature had FwcWsp64.dll loaded (Microsoft Forefront TMG Client?). There are several crashes without FwcWsp64.dll, but those crashed at a different instruction, such as ec4f22db-e71e-4f82-a722-75f1e0210329 or 46668267-7a60-4ece-b43a-c69c10210402.

I looked at 5 crashes relatively randomly and all 5 had the FwcWsp64.dll you mention. 3 of the 5 had several .DLLs loaded from "crypto pro", including one called "cpwinet.dll" which is maybe relevant given comment 4 and the functionality of wininet.dll

Pretty sure the crashes weren't STATUS_STACK_BUFFER_OVERRUN / FAST_FAIL_STACK_COOKIE_CHECK_FAILURE when I unhid this bug in comment 1 three years ago. That sounds a bit scary, but definitely shouldn't be something we did simply by passing in a string to a windows API, even if we passed in a bogus pointer (that would trigger access violations instead).

Whiteboard: [necko-triaged] → [necko-triaged] [qa-not-actionable]

I've analyzed the remaining crashes here and the original problem is entirely gone. What we're seeing here are crashes from bug 1698057. They're appearing under this signature and not the original one because, when we switched stackwalkers earlier this year, the new stackwalker started ignoring unloaded modules when generating the crash stack. As a result the first frame inside Proxifier is ignored and we end up with this signature. I'll write a fix for that problem, and when we ship it we can close this bug and reopen bug 1698057.
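
A small sketch of why dropping unloaded modules changes the signature. This is a toy model, not Socorro's or the stackwalker's actual logic: the `Frame` shape, the module name `unloaded_hook.dll`, and the rule "use the top frame's function, or its module name if there is no symbol" are all assumptions.

```python
# Toy model of signature generation with and without unloaded-module
# support. Names and rules are hypothetical, for illustration only.

from typing import List, NamedTuple, Optional, Set


class Frame(NamedTuple):
    module: str
    function: Optional[str]  # None when no symbol is available


def walk(frames: List[Frame], unloaded: Set[str],
         keep_unloaded: bool) -> List[Frame]:
    """Return the stack as reported, optionally dropping frames that
    belong to modules which were already unloaded at crash time."""
    if keep_unloaded:
        return list(frames)
    return [f for f in frames if f.module not in unloaded]


def signature(frames: List[Frame]) -> str:
    """Use the top frame's function, or its module if unsymbolized."""
    if not frames:
        return "EMPTY"
    top = frames[0]
    return top.function or top.module
```

With a top frame in an unloaded module followed by `QueryDnsForFamily`, the walker that ignores unloaded modules yields the `QueryDnsForFamily` signature, while the one that keeps them attributes the crash to the unloaded module.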

Severity: critical → S2
Summary: Crash in QueryDnsForFamily → Crash in QueryDnsForFamily -- due to Proxifier Portable Network Engine

I'm reprocessing the crashes under this signature now that unloaded module support is back. I'll re-open bug 1698057 and close this one if we find out that all remaining crashes are caused by the Proxifier Portable Network Engine.

Summary: Crash in QueryDnsForFamily -- due to Proxifier Portable Network Engine → Crash in QueryDnsForFamily
See Also: → 1797732

I filed bug 1797732 with the new signatures that we can reliably attribute to the Proxifier Portable Network Engine. Let's see what happens to the remaining volume here.

The rate is down to 3-6/day. All still appear to be stack overflows, though the stack trace is only about 16 levels deep.

Flags: needinfo?(gsvelto)

I've checked the remaining crashes and they're not overflows, but rather scenarios where a stack cookie is being smashed (possibly by overrunning a stack-allocated buffer). However, I don't think we should worry about it because all the remaining crashes have something in common: FwcWsp64.dll appears to be injected into the process and it is always the same version, 7.0.7734.100. This is a component of Microsoft Forefront Threat Management Gateway, a piece of software that hasn't seen a release in 13 years.

Gian-Carlo, while this is Microsoft software, it seems like a piece of leftover junk that people installed ages ago. Can we consider blocking it and getting rid of this crash for good?

Flags: needinfo?(gsvelto) → needinfo?(gpascutto)

Sounds good to me.

Flags: needinfo?(gpascutto)

Alright, moving it to the right component.

Component: Networking → Other
Product: Core → External Software Affecting Firefox

I haven't been able to extract information from the 3rd party ping on sql.telemetry.mozilla.org... maybe I should try BigQuery.

I did a quick STMO query and verified that every crash in the last 2 weeks has that version of FwcWsp64.dll loaded. I'll block it.
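
For reference, the check amounts to something like the following Python sketch. The record shape (an `id` plus a `modules` list of `name`/`version` dicts) is hypothetical and is not the actual STMO schema or query; only the DLL name and version come from this bug.

```python
# Sketch: find crash reports that do NOT have the suspect module loaded.
# The crash record layout is an assumption made for this example.

from typing import Dict, Iterable, List

TARGET_NAME = "fwcwsp64.dll"
TARGET_VERSION = "7.0.7734.100"


def has_target_module(modules: Iterable[Dict[str, str]]) -> bool:
    """True if the module list contains the target DLL at the target
    version. DLL names are compared case-insensitively."""
    return any(
        m.get("name", "").lower() == TARGET_NAME
        and m.get("version") == TARGET_VERSION
        for m in modules
    )


def crashes_without_target(crashes: List[Dict]) -> List[str]:
    """Return the ids of crashes that lack the target module."""
    return [
        c["id"] for c in crashes
        if not has_target_module(c.get("modules", []))
    ]
```

An empty result from `crashes_without_target` corresponds to "every crash has that version loaded".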

Assignee: nobody → gstoll
Status: NEW → ASSIGNED
Pushed by gstoll@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f76de864f845 block fwcwsp64.dll r=gsvelto,win-reviewers,handyman
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 122 Branch
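
Conceptually, blocking a DLL by name and version works along these lines. This is a Python sketch of the idea only, not the code from the patch and not the real blocklist format. The entry below uses the version seen in the crashes as the highest blocked version, which is an assumption; the actual patch may block a different range.

```python
# Sketch of a name-plus-version blocklist decision. The table contents
# and function names are hypothetical.

from typing import Dict, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '7.0.7734.100' into (7, 0, 7734, 100) for comparison."""
    return tuple(int(part) for part in text.split("."))


# Maps lowercase DLL name to the highest blocked version, or None to
# block every version. The entry is an assumption for illustration.
BLOCKLIST: Dict[str, Optional[str]] = {
    "fwcwsp64.dll": "7.0.7734.100",
}


def should_block(name: str, version: str,
                 blocklist: Dict[str, Optional[str]] = BLOCKLIST) -> bool:
    """Block when the name matches (case-insensitively) and the version
    is at or below the highest blocked version."""
    key = name.lower()
    if key not in blocklist:
        return False
    max_version = blocklist[key]
    if max_version is None:
        return True
    return parse_version(version) <= parse_version(max_version)
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as treating "7.0.7734.99" as newer than "7.0.7734.100".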

The patch landed in nightly and beta is affected.
:gstoll, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox121 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(gstoll)

It's pretty low-volume, so let's not uplift.

Flags: needinfo?(gstoll)
Whiteboard: [necko-triaged] [qa-not-actionable] → [necko-triaged] [qa-not-actionable][adv-main122-]