Closed Bug 1222933 Opened 9 years ago Closed 7 months ago

crash in nsSocketOutputStream::Write due to PR_Write interception

Categories

(Core :: Networking, defect)

x86
Windows

Tracking

()

RESOLVED INCOMPLETE
Future
Tracking Status
firefox42 - wontfix
firefox43 + wontfix
firefox44 + wontfix
firefox45 + wontfix
firefox46 --- wontfix
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 + wontfix
firefox-esr45 --- wontfix
firefox50 + wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr60 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix

People

(Reporter: kairo, Unassigned)

References

(Depends on 1 open bug)

Details

(4 keywords, Whiteboard: [necko-backlog][tbird crash])

Crash Data

Attachments

(1 obsolete file)

[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is 
report bp-39e49265-b663-4bb2-a383-f082c2151105.
=============================================================

Stack Trace:
0 		@0xbe62ad0 	
1 		@0xbe8f06a 	
2 		@0xbe8ef36 	
3 	xul.dll 	nsSocketOutputStream::Write(char const*, unsigned int, unsigned int*) 	netwerk/base/nsSocketTransport2.cpp
4 	xul.dll 	mozilla::net::nsHttpConnection::OnReadSegment(char const*, unsigned int, unsigned int*) 	netwerk/protocol/http/nsHttpConnection.cpp
5 	xul.dll 	mozilla::net::nsHttpTransaction::ReadRequestSegment(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*) 	netwerk/protocol/http/nsHttpTransaction.cpp
6 	xul.dll 	nsBufferedInputStream::ReadSegments(nsresult (*)(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*) 	netwerk/base/nsBufferedStreams.cpp
7 	xul.dll 	mozilla::net::nsHttpTransaction::ReadSegments(mozilla::net::nsAHttpSegmentReader*, unsigned int, unsigned int*) 	netwerk/protocol/http/nsHttpTransaction.cpp


I'm filing this under networking because that is where the stacks are, but it looks like the crash is at least triggered by adware. We should find out whether it is actually caused by the malware, or is our own issue that the malware merely triggers. If it is the former, we should find out whether we can harden our side so that it does not fall over.

This is the #4 crash in 43.0b1 at this time, with 1.9% of the total crashes in this beta. Almost half of those crashes occur within the first 60s of uptime, which makes them what we call "startup crashes".

Looking at correlations, I find this:
  nsSocketOutputStream::Write|EXCEPTION_ACCESS_VIOLATION_READ (262 crashes)
     97% (253/262) vs.  10% (1488/14505) sfc_os.dll
     86% (225/262) vs.   2% (255/14505) AM32-34121.dll
     98% (256/262) vs.  51% (7433/14505) secur32.dll
     95% (249/262) vs.  57% (8334/14505) RpcRtRemote.dll
     68% (177/262) vs.  30% (4385/14505) SensApi.dll
     94% (245/262) vs.  56% (8185/14505) wship6.dll
     96% (251/262) vs.  60% (8680/14505) WSHTCPIP.DLL
     96% (251/262) vs.  61% (8823/14505) Wldap32.dll
     73% (190/262) vs.  38% (5548/14505) rtutils.dll

AM32-34121.dll is "Ad Muncher 32-bit Hook DLL" - see http://www.freefixer.com/library/file/AM32-34121.dll-166831/ - which seems to be adware.
And http://www.file.net/process/sfc_os.dll.html says "Some malware camouflages itself as sfc_os.dll." so this looks like another issue causing/triggering the same crash.
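To make the 43.0b1 correlation lines above easier to compare, here is a minimal Python sketch that ranks the modules by how over-represented they are in this signature. The counts are copied from the list above; the helper names `lift` and `rank_by_lift` are illustrative only and are not part of Socorro. A high ratio shows association only, not that the module causes the crash.

```python
# Counts copied from the 43.0b1 correlation list above.
SIG_TOTAL = 262     # crashes with this signature
ALL_TOTAL = 14505   # all crashes in the comparison set

# module -> (crashes in this signature with the module loaded,
#            crashes overall with the module loaded)
CORRELATIONS = {
    "sfc_os.dll": (253, 1488),
    "AM32-34121.dll": (225, 255),
    "secur32.dll": (256, 7433),
    "RpcRtRemote.dll": (249, 8334),
    "SensApi.dll": (177, 4385),
    "wship6.dll": (245, 8185),
    "WSHTCPIP.DLL": (251, 8680),
    "Wldap32.dll": (251, 8823),
    "rtutils.dll": (190, 5548),
}


def lift(in_sig, in_all, sig_total=SIG_TOTAL, all_total=ALL_TOTAL):
    """Module frequency within the signature divided by its frequency overall."""
    return (in_sig / sig_total) / (in_all / all_total)


def rank_by_lift(correlations):
    """Return module names sorted from most to least over-represented."""
    return sorted(correlations, key=lambda m: lift(*correlations[m]), reverse=True)


# Print the modules in ranked order with both percentages and the ratio.
for module in rank_by_lift(CORRELATIONS):
    in_sig, in_all = CORRELATIONS[module]
    print(f"{module:16s} {in_sig / SIG_TOTAL:5.0%} vs. {in_all / ALL_TOTAL:5.0%}"
          f"  ratio {lift(in_sig, in_all):5.1f}")
```

With these numbers AM32-34121.dll and sfc_os.dll come out far ahead of the rest, while the remaining modules are common system DLLs whose ratios stay close to 2 or below.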
Summary: crash in nsSocketOutputStream::Write → crash in nsSocketOutputStream::Write mostly with Ad Muncher
This also affects 42.0, being 0.7% of the overall crashes there and 2% of the startup crashes in that release. Correlations for 42 release look as follows (for yesterday):

  nsSocketOutputStream::Write|EXCEPTION_ACCESS_VIOLATION_READ (223 crashes)
     90% (200/223) vs.   9% (5337/59068) sfc_os.dll
     57% (128/223) vs.   0% (193/59068) AM32-34121.dll
     93% (207/223) vs.  50% (29385/59068) RpcRtRemote.dll
Ad Muncher seems to be another ad-blocking tool working at the system level.
Tracking for 43, topcrash
Our code is calling PR_Write, but the stack is showing that we actually end up executing in a mapped virtual address range that contains executable code. Unfortunately we don't get enough info from crash reporter dumps to know who is responsible for that.

I suspect that xul.dll's import address table (IAT) has been tampered with in order to make such a call stack possible, but that information doesn't look to be available in the dump either.
It probably uses a Windows interface for hooking I/O (e.g. an LSP) and crashes in its own code executing in our address space. If someone could get a repro going, we might be able to experiment to find out what the trigger is, but short of that a blocklist entry is probably the only near-term answer. All Firefox knows is that it is writing something out to the network.
Attached patch DLL Blocklist Patch (obsolete) — Splinter Review
Assignee: nobody → aklotz
Status: NEW → ASSIGNED
Attachment #8685644 - Flags: review?(benjamin)
Tracking for 43+ since this sounds like a fairly bad crash on release and beta.
Attachment #8685644 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/594fd9ec7e88
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Aaron, want to request uplift for this? It's the top crash on beta right now. We go to build on Thursday for beta 7 (or we could aim for beta 8 next Wednesday).
Flags: needinfo?(aklotz)
Comment on attachment 8685644 [details] [diff] [review]
DLL Blocklist Patch

Approval Request Comment
[Feature/regressing bug #]: DLL blocklist
[User impact if declined]: crashes caused by foreign dll
[Describe test coverage new/current, TreeHerder]: N/A, really needs to be deployed against infringing dll
[Risks and why]: Low, adds entry to blocklist data
[String/UUID change made/needed]: None
Flags: needinfo?(aklotz)
Attachment #8685644 - Flags: approval-mozilla-beta?
Attachment #8685644 - Flags: approval-mozilla-aurora?
Comment on attachment 8685644 [details] [diff] [review]
DLL Blocklist Patch

Prevents a topcrash, please uplift to aurora and beta.
Attachment #8685644 - Flags: approval-mozilla-beta?
Attachment #8685644 - Flags: approval-mozilla-beta+
Attachment #8685644 - Flags: approval-mozilla-aurora?
Attachment #8685644 - Flags: approval-mozilla-aurora+
This patch went into 43.0b7, but unfortunately the crash volume for the signature hasn't dropped at all (it's still #2 by top-crash score).
So it looks like the DLL correlates with this crash but does not directly cause it. Should we back out the blocklist patch in that case?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Aaron, any thoughts here? Still over 500 crashes in 43 beta 7 with this signature.
Flags: needinfo?(aklotz)
I'll investigate further. In the meantime I'm going to back out the Ad Muncher blocklist entry since it is ineffective. Aurora and beta approvals by Ritu and Liz over IRC.

https://hg.mozilla.org/releases/mozilla-beta/rev/7496aae828b6
https://hg.mozilla.org/releases/mozilla-aurora/rev/bb474924f23b
Flags: needinfo?(aklotz)
Keywords: leave-open
I am vewwy vewwy curious as to whether Windows App Compatibility shims might be playing into this somehow. I've messaged my Twitter contact to see what I can find out.
Philipp, according to comment 17, this is fixed in 44 and 45. Agreed?
Flags: needinfo?(philipp)
Hi, you probably needinfo'd a different Philipp than the one you wanted to reach :-)

As per comment 17, we backed out a failed blocklisting attempt across the mentioned channels; that attempt was how we originally tried to tackle this crash.
So these versions are still affected by the crash described in comment 0, and it is indeed still one of the top crashers around. In last week's beta data this crash is at #16 and makes up ~1% of all crashes there...
Given that we don't have a fix ready and we are into RC mode, it's too late and this is now a wontfix for Fx44.
Attachment #8685644 - Attachment is obsolete: true
Attachment #8685644 - Flags: approval-mozilla-beta+
Attachment #8685644 - Flags: approval-mozilla-aurora+
Whiteboard: [necko-active]
If I look at the dump from https://crash-stats.mozilla.com/report/index/9971859f-a51f-4b04-a76d-a45212160220, the only thing that I can determine is that there is an IAT patch on PR_Write.

Using the new !iat command that I added to mozdbgext (shameless plug), it outputs the following:
Expected target: nss3.dll!PR_Write
Actual target: 0x247c094

Unfortunately 0x247c094 is not included in our crash dump, so I have no further information.
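The check described above boils down to asking whether the address stored in the IAT slot still falls inside the module that is supposed to export the function. Here is a minimal Python sketch of that idea. It is not how mozdbgext's !iat is implemented; the `Module` type, the helper names, and the base/size values in `MODULES` are all hypothetical. Only the target address 0x247c094 comes from this comment.

```python
from typing import NamedTuple, Optional, Sequence


class Module(NamedTuple):
    name: str
    base: int   # load address
    size: int   # image size in bytes


def owning_module(address: int, modules: Sequence[Module]) -> Optional[Module]:
    """Return the module whose mapped range contains the address, if any."""
    for mod in modules:
        if mod.base <= address < mod.base + mod.size:
            return mod
    return None


def describe_iat_target(expected_module: str, actual: int,
                        modules: Sequence[Module]) -> str:
    """Classify an IAT slot value relative to the module expected to export it."""
    owner = owning_module(actual, modules)
    if owner is None:
        return f"target {actual:#x} is outside every known module (possible hook)"
    if owner.name.lower() != expected_module.lower():
        return (f"target {actual:#x} is in {owner.name}, "
                f"expected {expected_module} (possible hook)")
    return f"target {actual:#x} is inside {expected_module}"


# Hypothetical module list: the base and size values are invented for illustration.
MODULES = [
    Module("xul.dll", 0x60000000, 0x03000000),
    Module("nss3.dll", 0x70000000, 0x00200000),
]

# 0x247c094 is the actual PR_Write target reported in this comment.
print(describe_iat_target("nss3.dll", 0x247c094, MODULES))
```

An address that belongs to no loaded module, as in this sketch, is consistent with a hook that jumps into dynamically allocated executable memory, which would also explain why the top stack frames carry no module name.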
Renaming -- Ad Muncher was not the problem.
Summary: crash in nsSocketOutputStream::Write mostly with Ad Muncher → crash in nsSocketOutputStream::Write due to PR_Write interception
Wontfix as Aaron suggested.
Aaron--any updates here?
Flags: needinfo?(aklotz)
Unfortunately no -- there is just not enough data included in our dumps to determine the origin or the content of those unknown frames.

bug 1250687 and/or bug 1251395 would probably help with this but they require some investigation and possibly upstream changes to breakpad.
Flags: needinfo?(aklotz)
thanks
Whiteboard: [necko-active] → [necko-backlog]
Crash volume for signature 'nsSocketOutputStream::Write':
 - beta    (version 48): 2042 crashes from 2016-06-06.
 - release (version 47): 8865 crashes from 2016-05-31.
 - esr     (version 45): 126 crashes from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          0          0          0          0          0          0          0
 - aurora           0          0          0          0          0          0          0
 - beta           330        307        262        249        339        330        106
 - release       1302       1331       1176       1250       1457       1430        515
 - esr              3          7         14          8          2          6          1

Affected platforms: Windows, Linux
starting with 49 this crash is showing up as [@ mozilla::net::nsSocketOutputStream::Write ]
Crash Signature: [@ nsSocketOutputStream::Write] → [@ nsSocketOutputStream::Write] [@ mozilla::net::nsSocketOutputStream::Write ]
See Also: → 1302343
Tracking in hopes we will see the crash rate go down as Norton rolls out its fix.
Crash volume for signature 'mozilla::net::nsSocketOutputStream::Write':
 - nightly (version 52): 1 crash from 2016-09-19.
 - aurora  (version 51): 0 crashes from 2016-09-19.
 - beta    (version 50): 663 crashes from 2016-09-20.
 - release (version 49): 2852 crashes from 2016-09-05.
 - esr     (version 45): 0 crashes from 2016-07-25.

Crash volume on the last weeks (Week N is from 10-17 to 10-23):
            W. N-1  W. N-2  W. N-3  W. N-4
 - nightly       0       0       0       0
 - aurora        0       0       0       0
 - beta         90     208     269      58
 - release     822     819     746     191
 - esr           0       0       0       0

Affected platform: Windows

Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly #918
 - aurora
 - beta    #144
 - release #69
 - esr
See Also: → 1334907
See Also: → 1352206
Too late for firefox 52, mass-wontfix.
Assignee: aklotz → nobody
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
This user crashes 1-3 times per day bp-b4bd2d49-b4d7-4c96-bdee-468650180618  mozilla::net::nsSocketOutputStream::Write
Whiteboard: [necko-backlog] → [necko-backlog][tbird crash]
Keywords: leave-open
OS: Windows NT → Windows
Target Milestone: mozilla45 → Future
Flags: needinfo?(nhnguyen)
Keywords: topcrash

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

Junior, can you have a look? Thanks!

Flags: needinfo?(nhnguyen) → needinfo?(juhsu)

An additional clue is that the crash rate went up in 71.
We might get more people to brainstorm which patch in 71 could have raised it.
On the other hand, let's see if there is any clue in the crash reports that would give us STR.

Flags: needinfo?(juhsu)
QA Whiteboard: qa-not-actionable
See Also: → 1742233
Severity: critical → S2

Ongoing crash at low-ish rates (average of ~2/day). Frequent wild pointers.

Group: core-security
Priority: P3 → --
Group: core-security → network-core-security
Crash Signature: [@ nsSocketOutputStream::Write] [@ mozilla::net::nsSocketOutputStream::Write ] → [@ mozilla::net::nsSocketOutputStream::Write ]
Duplicate of this bug: 1742233
Status: REOPENED → NEW

These are invalid reads, and nearly all the reports are flagged as high-confidence bit flips.

Status: NEW → RESOLVED
Closed: 9 years ago → 7 months ago
Resolution: --- → INCOMPLETE

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit BugBot documentation.

Keywords: stalled
Group: network-core-security