Closed Bug 798045 (CVE-2012-4191) Opened 12 years ago Closed 12 years ago

crash in mozilla::net::FailDelayManager::Lookup

Categories

(Core :: Networking: WebSockets, defect)

16 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla19
Tracking Status
firefox16 + fixed
firefox17 --- fixed
firefox18 --- fixed
firefox19 --- verified
firefox-esr10 --- unaffected

People

(Reporter: scoobidiver, Assigned: jduell.mcbugs)

Details

(5 keywords, Whiteboard: [blocking bug 711793, but see comment 8][qa-])

Crash Data

Attachments

(2 files, 1 obsolete file)

It's #114 top browser crasher in 16.0b5 and seems to happen only in Beta versions with this signature: 15.0 or 16.0.

Signature 	arena_dalloc_small | je_free | nsACString_internal::`scalar deleting destructor''(unsigned int) More Reports Search
UUID	c977c9fe-233d-4d8a-a1be-19ea42121004
Date Processed	2012-10-04 19:56:15
Uptime	82347
Last Crash	22.9 hours before submission
Install Age	4.4 days since version was first installed.
Install Time	2012-06-28 10:56:11
Product	Firefox
Version	16.0
Build ID	20120925201946
Release Channel	beta
OS	Windows NT
OS Version	6.1.7600
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 42 stepping 7
Crash Reason	EXCEPTION_BREAKPOINT
Crash Address	0x6e8f01f1
App Notes 	
AdapterVendorID: 0x8086, AdapterDeviceID: 0x0116, AdapterSubsysID: 15ea10cf, AdapterDriverVersion: 8.15.10.2287
D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 
EMCheckCompatibility	True
Adapter Vendor ID	0x8086
Adapter Device ID	0x0116
Total Virtual Memory	2147352576
Available Virtual Memory	1357484032
System Memory Use Percentage	54
Available Page File	5218824192
Available Physical Memory	1687126016

Frame 	Module 	Signature 	Source
0 	mozglue.dll 	arena_dalloc_small 	memory/mozjemalloc/jemalloc.c:4510
1 	mozglue.dll 	je_free 	memory/mozjemalloc/jemalloc.c:6565
2 	xul.dll 	nsACString_internal::`scalar deleting destructor' 	
3 	xul.dll 	mozilla::net::FailDelayManager::Lookup 	netwerk/protocol/websocket/WebSocketChannel.cpp:218
4 	xul.dll 	mozilla::net::FailDelayManager::DelayOrBegin 	netwerk/protocol/websocket/WebSocketChannel.cpp:229
5 	xul.dll 	mozilla::net::nsWSAdmissionManager::ConditionallyConnect 	netwerk/protocol/websocket/WebSocketChannel.cpp:339
6 	xul.dll 	mozilla::net::WebSocketChannel::OnLookupComplete 	netwerk/protocol/websocket/WebSocketChannel.cpp:2217
7 	xul.dll 	`anonymous namespace'::DNSListenerProxy::OnLookupCompleteRunnable::Run 	netwerk/dns/nsDNSService2.cpp:552
8 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:624
9 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:82
10 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:201
11 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:175
12 	xul.dll 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:163
13 	xul.dll 	nsAppShell::Run 	widget/windows/nsAppShell.cpp:232
14 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:257
15 	xul.dll 	XREMain::XRE_mainRun 	toolkit/xre/nsAppRunner.cpp:3794
16 	xul.dll 	XREMain::XRE_main 	toolkit/xre/nsAppRunner.cpp:3871
17 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3947
18 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:100
19 	firefox.exe 	__tmainCRTStartup 	crtexe.c:552
20 	kernel32.dll 	BaseThreadInitThunk 	
21 	ntdll.dll 	__RtlUserThreadStart 	
22 	ntdll.dll 	_RtlUserThreadStart 	

More reports at:
https://crash-stats.mozilla.com/report/list?signature=arena_dalloc_small+|+je_free+|+nsACString_internal%3A%3A%60scalar+deleting+destructor%27%27%28unsigned+int%29
> 2 xul.dll nsACString_internal::`scalar deleting destructor' 	

Do we know how this gets generated (instead of the normal foo.cpp:line# stack trace)?

Assuming this is indeed happening in a CString, that would be FailDelay.mAddress.  But I don't see any obvious errors in how it's allocated/freed.   Do we know of any jemalloc bugs that might be the cause of this?
With combined signatures, it's #15 top browser crasher in 16.0b6.

There are no interesting correlations.
Crash Signature: [@ arena_dalloc_small | je_free | nsACString_internal::`scalar deleting destructor''(unsigned int)] → [@ arena_dalloc_small | je_free | nsACString_internal::`scalar deleting destructor''(unsigned int)] [@ nsAString_internal::Finalize()] [@ arena_dalloc_small | je_free | nsAutoString::`scalar deleting destructor''(unsigned int)] [@ memmove | nsTArray_ba…
Keywords: regression, topcrash
Assigning to you for investigation Jason, since this is in WebSockets. Also including Brian, since he landed the two networking changes in FF16b6.
Assignee: nobody → jduell.mcbugs
Group: core-security
bsmith suggests this may be sec-critical
<bsmith> I don't think it is that race.
<bsmith> Seems like it gets deleted during static destruction of nsWSAdmissionManager
<bsmith> I did notice this:
<bsmith>       if (fail->mAddress.Equals(address) && fail->mPort == port) {
<bsmith>         if (outIndex)
<bsmith>           *outIndex = i;
<bsmith>         result = fail;
<bsmith>       } else if (fail->IsExpired(rightNow)) {
<bsmith> seems like there shoul be a "break;" after result = fail;
<bsmith> Oh, I got it, I think
<bsmith> I'm pretty sure that's it
<bsmith> We set *outIndex = i;
<bsmith> then, later, because we don't break, we delete some entries between 0 and i
<bsmith> but, we don't decrease i
I would expect this to crash in line 252/253, not line 218. But, I think the compiler might be inlining this code so that a crash at line 253 is reported as a crash in line 218. See:

https://mxr.mozilla.org/mozilla-beta/source/netwerk/protocol/websocket/WebSocketChannel.cpp?rev=6cd5f16a8c3c#216

216       } else if (fail->IsExpired(rightNow)) {
217         mEntries.RemoveElementAt(i);
218         delete fail;
219       }

...

https://mxr.mozilla.org/mozilla-beta/source/netwerk/protocol/websocket/WebSocketChannel.cpp?rev=6cd5f16a8c3c#251

251         } else if (fail->IsExpired(rightNow)) {
252           mEntries.RemoveElementAt(failIndex);
253           delete fail;
254         }

See also:
https://crash-stats.mozilla.com/query/query?product=Firefox&version=ALL%3AALL&range_value=1&range_unit=weeks&date=10%2F08%2F2012+23%3A27%3A41&query_search=signature&query_type=contains&query=DelayOrBegin&reason=&build_id=&process_type=any&hang_type=any&do_query=1

Or, it could be a different problem.
(In reply to Alex Keybl [:akeybl] from comment #4)
> bsmith suggests this may be sec-critical

This seems likely to have been caused by the patch for bug 711793, which also caused use-after-free bug 771318, which we're planning to release an advisory for. So, if this is exploitable, it is likely as close to (if not) a zero-day, especially if we release that advisory.

To avoid drawing even more attention to this code, I avoided adding it as blocking bug 711793 and I avoided mentioning this bug (and adding this comment) in bug 771318, just in case bug 771318 gets opened up.
Whiteboard: [blocking bug 711793, but see comment]
Attached patch v1Splinter Review
Actually crash @ line 218 makes perfect sense here:  when Lookup() returns an index for an item that is no longer valid, and it now happens to be off the end of the nsTArray, we only have MOZ_ASSERT checks in RemoveElementAt() so it passes through in release builds and we then try to delete some garbage.  Solution as bsmith suggested is to break once Lookup() finds a match.  We'll get rid of any expired entries the next time it's called and nothing is found (which is the most common case), so stale entries are not an issue.

thanks to bsmith for tracking this down.

> Don't we need kung-fu death grip?

Good question, but not the bug at hand IMO.  I've emailed bsmedberg about whether our sWebsocketAdmissions usage is safe, and will file a bug if not.
Attachment #669347 - Flags: review?(bsmith)
(In reply to Brian Smith (:bsmith) from comment #8)
> (In reply to Alex Keybl [:akeybl] from comment #4)
> > bsmith suggests this may be sec-critical
> 
> This seems likely to have been caused by the patch for bug 711793, which
> also caused use-after-free bug 771318, which we're planning to release an
> advisory for. So, if this is exploitable, it is likely as close to (if not)
> a zero-day, especially if we release that advisory.

bug 771318 is in a roll-up advisory for Abhishek's ASAN bugs. We don't list it or a similar bug by number in the advisory because the issues never shipped in a release, being found and fixed before becoming public.
Attachment #669347 - Flags: review?(bsmith)
Attachment #669347 - Flags: review+
Attachment #669347 - Flags: approval-mozilla-release?
Attachment #669347 - Flags: approval-mozilla-beta?
Attachment #669347 - Flags: approval-mozilla-aurora?
Comment on attachment 669347 [details] [diff] [review]
v1

[Security approval request comment]
> How easily can the security issue be deduced from the patch?

I found the potential out-of-bounds memory access / double-free within 30 minutes, and I am not particularly good at this stuff.

> Do comments in the patch, the check-in comment, or tests
> included in the patch paint a bulls-eye on the security problem?

Review the comment in the patch.

> Which older supported branches are affected by this flaw?
> If not all supported branches, which bug introduced the flaw?

Bug 711793.

> Do you have backports for the affected branches? 
> If not, how different, hard to create, and risky will they be?

Should apply easily.

> How likely is this patch to cause regressions; how much testing does it need?

Very unlikely / not much.

> Is there anything we could do to make this sec-approval form better?

Yes, please have it automatically formatted like I have done here.
Attachment #669347 - Flags: sec-approval?
https://hg.mozilla.org/integration/mozilla-inbound/rev/f9469bf48ff6

re: exploitability.  It's possible to cause this crash by creating several websockets to different host:ports and then delaying the TCP connect process differently per socket at reconnect time.  I'm not sure how easy it is to then deterministically exploit the crash.
meh--sorry, still getting used to the sec-approval process.
https://hg.mozilla.org/mozilla-central/rev/f9469bf48ff6
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
a nit in the comment: "will be be deleted"
Comment on attachment 669347 [details] [diff] [review]
v1

[Triage Comment]
Since this landed on m-i/m-c already, let's also get this on Aurora/Beta in preparation for possible 16.0.1 inclusion.
Attachment #669347 - Flags: approval-mozilla-beta?
Attachment #669347 - Flags: approval-mozilla-beta+
Attachment #669347 - Flags: approval-mozilla-aurora?
Attachment #669347 - Flags: approval-mozilla-aurora+
Comment on attachment 669347 [details] [diff] [review]
v1

[Triage Comment]
To help prevent the possibility of a 16.0.2, let's land this fix in 16.0.1. Please land on mozilla-release asap.
Attachment #669347 - Flags: sec-approval?
Attachment #669347 - Flags: sec-approval+
Attachment #669347 - Flags: approval-mozilla-release?
Attachment #669347 - Flags: approval-mozilla-release+
Is there a testcase or steps to reproduce this crash? Given this is being uplifted to release, QA would like to be able to verify it is fixed.
Marking as ESR10 unaffected based upon comment 11.
If you have any suggestions on how to test this, please let us know. We'll have Fx16.0.1 builds ready sometime this evening, and any instructions would be helpful.
I'm writing a test for this right now... stay tuned.
Attached file web page that demonstrates issue (obsolete) —
So this webpage (you can either open it as a file or serve it off a webserver) crashes without the patch and is OK with it.  Crashes in opt-builds (such as the release) are harder to cause deterministically (in debug mode we hit an assert: in release mode we try to delete off the end of an array and that doesn't always crash), but I'm confident this is catching the cause of the crash.

Note that this test takes 70 seconds to run, so I don't think we want to make a mochitest out of it.  We're not likely to regress this codepath in this way again, so I think that's ok.
Status: RESOLVED → VERIFIED
Alias: CVE-2012-4191
Keywords: sec-critical
Whiteboard: [blocking bug 711793, but see comment] → [blocking bug 711793, but see comment 8]
I tried Fx16 and Fx16.0.1 builds but neither of them crashed on Windows 7. I tried a Mac nightly debug build where an older version crashed but the latest one didn't. I'm changing the status-firefox16 flag based on this and comment #24.
I was unable to reproduce a crash with the attached testcase using Firefox 17.0b1#1, 17.0b1#2, 2012-10-10 mozilla-beta debug, 2012-10-09 mozilla-beta debug, and 2012-10-07 mozilla-beta debug builds on Windows 7. 

Given these results, I don't think we can mark this verified for Firefox 17. If someone else is able to reproduce this more easily, please help by verifying this for Firefox 17.0b1#2, Firefox 18.0a2, and Firefox 19.0a1 builds.

That said, I'm not going to block Firefox 17b1's release if this goes unverified. I'm basing this decision on comment 24 and 25.
(In reply to juan becerra [:juanb] from comment #25)
> I tried Fx16 and Fx16.0.1 builds but neither of them crashed on Windows 7.
So it's not verified.
Attached file test v2
This version has just crashed FF 16 for me on linux 3 out of 6 times.  I consider that good enough, given that we have crash reports that point us right at the line number that's failing, a very exact understanding of the bug, and deterministic failure in debug mode.
Attachment #670198 - Attachment is obsolete: true
Unfortunately, I'm still not able to trigger a crash using test v2. I tried Firefox 17.0b1 build 1 on Linux 64-bit and 2012-10-09 mozilla-beta debug (Firefox 17) on Windows 7 32-bit. I tried 7 times on each waiting 2 minutes before reloading the test. Still no crash.

I think we are going to have to ship without explicit verification here, trusting Jason's assessment and crash reports as pointed out in comment 29.
Whiteboard: [blocking bug 711793, but see comment 8] → [blocking bug 711793, but see comment 8][qa-]
Followup on comment 5 (which we were wondering might need a separate bug):

> Don't we need kung-fu death grip [on sWebSocketAdmissions]?

After talking to bsmedberg about this:  No, we're guaranteed that since we don't destroy sWebSocketAdmissions until XPCOM shutdown, we can rely on it existing during any XPCOM events that we dispatch.  We won't handle any events after shutdown is called.
Group: core-security
I tried to reproduce a crash with attachment from comment 29 on Firefox 16.0 beta 5 with no luck. 

No crashes also for Firefox 19.0 beta 5
Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0
Build ID: 20130206083616

Crash reports are not showing any crashes with these signatures.
Moving this to verified.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: