Last Comment Bug 798045 - (CVE-2012-4191) crash in mozilla::net::FailDelayManager::Lookup
(CVE-2012-4191)
: crash in mozilla::net::FailDelayManager::Lookup
Status: VERIFIED FIXED
[blocking bug 711793, but see comment...
: crash, regression, sec-critical, steps-wanted, topcrash
Product: Core
Classification: Components
Component: Networking: WebSockets (show other bugs)
: 16 Branch
: x86 Windows 7
: -- critical (vote)
: mozilla19
Assigned To: Jason Duell [:jduell] (needinfo me)
:
: Patrick McManus [:mcmanus]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-10-04 14:24 PDT by Scoobidiver (away)
Modified: 2015-10-16 11:39 PDT (History)
14 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
fixed
fixed
verified
unaffected


Attachments
v1 (1.11 KB, patch)
2012-10-08 17:24 PDT, Jason Duell [:jduell] (needinfo me)
brian: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta+
akeybl: approval‑mozilla‑release+
akeybl: sec‑approval+
Details | Diff | Splinter Review
web page that demonstrates issue (1.32 KB, text/html)
2012-10-10 17:42 PDT, Jason Duell [:jduell] (needinfo me)
no flags Details
test v2 (1.37 KB, text/html)
2012-10-11 10:57 PDT, Jason Duell [:jduell] (needinfo me)
no flags Details

Description Scoobidiver (away) 2012-10-04 14:24:46 PDT
It's #114 top browser crasher in 16.0b5 and seems to happen only in Beta versions with this signature: 15.0 or 16.0.

Signature 	arena_dalloc_small | je_free | nsACString_internal::`scalar deleting destructor''(unsigned int) More Reports Search
UUID	c977c9fe-233d-4d8a-a1be-19ea42121004
Date Processed	2012-10-04 19:56:15
Uptime	82347
Last Crash	22.9 hours before submission
Install Age	4.4 days since version was first installed.
Install Time	2012-06-28 10:56:11
Product	Firefox
Version	16.0
Build ID	20120925201946
Release Channel	beta
OS	Windows NT
OS Version	6.1.7600
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 42 stepping 7
Crash Reason	EXCEPTION_BREAKPOINT
Crash Address	0x6e8f01f1
App Notes 	
AdapterVendorID: 0x8086, AdapterDeviceID: 0x0116, AdapterSubsysID: 15ea10cf, AdapterDriverVersion: 8.15.10.2287
D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 
EMCheckCompatibility	True
Adapter Vendor ID	0x8086
Adapter Device ID	0x0116
Total Virtual Memory	2147352576
Available Virtual Memory	1357484032
System Memory Use Percentage	54
Available Page File	5218824192
Available Physical Memory	1687126016

Frame 	Module 	Signature 	Source
0 	mozglue.dll 	arena_dalloc_small 	memory/mozjemalloc/jemalloc.c:4510
1 	mozglue.dll 	je_free 	memory/mozjemalloc/jemalloc.c:6565
2 	xul.dll 	nsACString_internal::`scalar deleting destructor' 	
3 	xul.dll 	mozilla::net::FailDelayManager::Lookup 	netwerk/protocol/websocket/WebSocketChannel.cpp:218
4 	xul.dll 	mozilla::net::FailDelayManager::DelayOrBegin 	netwerk/protocol/websocket/WebSocketChannel.cpp:229
5 	xul.dll 	mozilla::net::nsWSAdmissionManager::ConditionallyConnect 	netwerk/protocol/websocket/WebSocketChannel.cpp:339
6 	xul.dll 	mozilla::net::WebSocketChannel::OnLookupComplete 	netwerk/protocol/websocket/WebSocketChannel.cpp:2217
7 	xul.dll 	`anonymous namespace'::DNSListenerProxy::OnLookupCompleteRunnable::Run 	netwerk/dns/nsDNSService2.cpp:552
8 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:624
9 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:82
10 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:201
11 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:175
12 	xul.dll 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:163
13 	xul.dll 	nsAppShell::Run 	widget/windows/nsAppShell.cpp:232
14 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:257
15 	xul.dll 	XREMain::XRE_mainRun 	toolkit/xre/nsAppRunner.cpp:3794
16 	xul.dll 	XREMain::XRE_main 	toolkit/xre/nsAppRunner.cpp:3871
17 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3947
18 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:100
19 	firefox.exe 	__tmainCRTStartup 	crtexe.c:552
20 	kernel32.dll 	BaseThreadInitThunk 	
21 	ntdll.dll 	__RtlUserThreadStart 	
22 	ntdll.dll 	_RtlUserThreadStart 	

More reports at:
https://crash-stats.mozilla.com/report/list?signature=arena_dalloc_small+|+je_free+|+nsACString_internal%3A%3A%60scalar+deleting+destructor%27%27%28unsigned+int%29
Comment 1 Jason Duell [:jduell] (needinfo me) 2012-10-04 17:45:29 PDT
> 2 xul.dll nsACString_internal::`scalar deleting destructor' 	

Do we know how this gets generated (instead of the normal foo.cpp:line# stack trace)?

Assuming this is indeed happening in a CString, that would be FailDelay.mAddress.  But I don't see any obvious errors in how it's allocated/freed.   Do we know of any jemalloc bugs that might be the cause of this?
Comment 2 Scoobidiver (away) 2012-10-08 07:20:51 PDT
With combined signatures, it's #15 top browser crasher in 16.0b6.

There are no interesting correlations.
Comment 3 Alex Keybl [:akeybl] 2012-10-08 15:35:46 PDT
Assigning to you for investigation Jason, since this is in WebSockets. Also including Brian, since he landed the two networking changes in FF16b6.
Comment 4 Alex Keybl [:akeybl] 2012-10-08 15:46:50 PDT
bsmith suggests this may be sec-critical
Comment 5 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-10-08 15:53:39 PDT
http://hg.mozilla.org/releases/mozilla-beta/annotate/c32d775d1039/netwerk/protocol/websocket/WebSocketChannel.cpp#l2215

sWebSocketAdmissions->ConditionallyConnect(this);

Don't we need kung-fu death grip?
Comment 6 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-10-08 16:11:25 PDT
<bsmith> I don't think it is that race.
<bsmith> Seems like it gets deleted during static destruction of nsWSAdmissionManager
<bsmith> I did notice this:
<bsmith>       if (fail->mAddress.Equals(address) && fail->mPort == port) {
<bsmith>         if (outIndex)
<bsmith>           *outIndex = i;
<bsmith>         result = fail;
<bsmith>       } else if (fail->IsExpired(rightNow)) {
<bsmith> seems like there shoul be a "break;" after result = fail;
<bsmith> Oh, I got it, I think
<bsmith> I'm pretty sure that's it
<bsmith> We set *outIndex = i;
<bsmith> then, later, because we don't break, we delete some entries between 0 and i
<bsmith> but, we don't decrease i
Comment 7 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-10-08 16:39:10 PDT
I would expect this to crash in line 252/253, not line 218. But, I think the compiler might be inlining this code so that a crash at line 253 is reported as a crash in line 218. See:

https://mxr.mozilla.org/mozilla-beta/source/netwerk/protocol/websocket/WebSocketChannel.cpp?rev=6cd5f16a8c3c#216

216       } else if (fail->IsExpired(rightNow)) {
217         mEntries.RemoveElementAt(i);
218         delete fail;
219       }

...

https://mxr.mozilla.org/mozilla-beta/source/netwerk/protocol/websocket/WebSocketChannel.cpp?rev=6cd5f16a8c3c#251

251         } else if (fail->IsExpired(rightNow)) {
252           mEntries.RemoveElementAt(failIndex);
253           delete fail;
254         }

See also:
https://crash-stats.mozilla.com/query/query?product=Firefox&version=ALL%3AALL&range_value=1&range_unit=weeks&date=10%2F08%2F2012+23%3A27%3A41&query_search=signature&query_type=contains&query=DelayOrBegin&reason=&build_id=&process_type=any&hang_type=any&do_query=1

Or, it could be a different problem.
Comment 8 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-10-08 16:57:17 PDT
(In reply to Alex Keybl [:akeybl] from comment #4)
> bsmith suggests this may be sec-critical

This seems likely to have been caused by the patch for bug 711793, which also caused use-after-free bug 771318, which we're planning to release an advisory for. So, if this is exploitable, it is likely as close to (if not) a zero-day, especially if we release that advisory.

To avoid drawing even more attention to this code, I avoided adding it as blocking bug 711793 and I avoided mentioning this bug (and adding this comment) in bug 771318, just in case bug 771318 gets opened up.
Comment 9 Jason Duell [:jduell] (needinfo me) 2012-10-08 17:24:32 PDT
Created attachment 669347 [details] [diff] [review]
v1

Actually crash @ line 218 makes perfect sense here:  when Lookup() returns an index for an item that is no longer valid, and it now happens to be off the end of the nsTArray, we only have MOZ_ASSERT checks in RemoveElementAt() so it passes through in release builds and we then try to delete some garbage.  Solution as bsmith suggested is to break once Lookup() finds a match.  We'll get rid of any expired entries the next time it's called and nothing is found (which is the most common case), so stale entries are not an issue.

thanks to bsmith for tracking this down.

> Don't we need kung-fu death grip?

Good question, but not the bug at hand IMO.  I've emailed bsmedberg about whether our sWebsocketAdmissions usage is safe, and will file a bug if not.
Comment 10 Al Billings [:abillings] 2012-10-08 17:50:30 PDT
(In reply to Brian Smith (:bsmith) from comment #8)
> (In reply to Alex Keybl [:akeybl] from comment #4)
> > bsmith suggests this may be sec-critical
> 
> This seems likely to have been caused by the patch for bug 711793, which
> also caused use-after-free bug 771318, which we're planning to release an
> advisory for. So, if this is exploitable, it is likely as close to (if not)
> a zero-day, especially if we release that advisory.

bug 771318 is in a roll-up advisory for Abhishek's ASAN bugs. We don't list it or a similar bug by number in the advisory because the issues never shipped in a release, being found and fixed before becoming public.
Comment 11 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-10-08 19:38:59 PDT
Comment on attachment 669347 [details] [diff] [review]
v1

[Security approval request comment]
> How easily can the security issue be deduced from the patch?

I found the potential out-of-bounds memory access / double-free within 30 minutes, and I am not particularly good at this stuff.

> Do comments in the patch, the check-in comment, or tests
> included in the patch paint a bulls-eye on the security problem?

Review the comment in the patch.

> Which older supported branches are affected by this flaw?
> If not all supported branches, which bug introduced the flaw?

Bug 711793.

> Do you have backports for the affected branches? 
> If not, how different, hard to create, and risky will they be?

Should apply easily.

> How likely is this patch to cause regressions; how much testing does it need?

Very unlikely / not much.

> Is there anything we could do to make this sec-approval form better?

Yes, please have it automatically formatted like I have done here.
Comment 12 Jason Duell [:jduell] (needinfo me) 2012-10-08 21:49:09 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/f9469bf48ff6

re: exploitability.  It's possible to cause this crash by creating several websockets to different host:ports and then delaying the TCP connect process differently per socket at reconnect time.  I'm not sure how easy it is to then deterministically exploit the crash.
Comment 13 Jason Duell [:jduell] (needinfo me) 2012-10-08 21:51:34 PDT
meh--sorry, still getting used to the sec-approval process.
Comment 14 Ed Morley [:emorley] 2012-10-09 08:01:36 PDT
https://hg.mozilla.org/mozilla-central/rev/f9469bf48ff6
Comment 15 Scoobidiver (away) 2012-10-09 08:06:27 PDT
a nit in the comment: "will be be deleted"
Comment 16 Alex Keybl [:akeybl] 2012-10-09 10:34:18 PDT
Comment on attachment 669347 [details] [diff] [review]
v1

[Triage Comment]
Since this landed on m-i/m-c already, let's also get this on Aurora/Beta in preparation for possible 16.0.1 inclusion.
Comment 17 Jason Duell [:jduell] (needinfo me) 2012-10-09 17:11:33 PDT
https://hg.mozilla.org/releases/mozilla-beta/rev/5764e33f79d6
https://hg.mozilla.org/releases/mozilla-aurora/rev/aff619c66f1e

typo from comment 15 fixed there and in inbound.
Comment 18 Alex Keybl [:akeybl] 2012-10-10 10:24:16 PDT
Comment on attachment 669347 [details] [diff] [review]
v1

[Triage Comment]
To help prevent the possibility of a 16.0.2, let's land this fix in 16.0.1. Please land on mozilla-release asap.
Comment 19 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-10-10 10:31:37 PDT
Is there a testcase or steps to reproduce this crash? Given this is being uplifted to release, QA would like to be able to verify it is fixed.
Comment 20 Patrick McManus [:mcmanus] 2012-10-10 10:35:49 PDT
alex asked me on irc to land this on release

 https://hg.mozilla.org/releases/mozilla-release/rev/f24140f9a4d4
Comment 21 Alex Keybl [:akeybl] 2012-10-10 13:17:03 PDT
Marking as ESR10 unaffected based upon comment 11.
Comment 22 juan becerra [:juanb] 2012-10-10 15:10:52 PDT
If you have any suggestions on how to test this, please let us know. We'll have Fx16.0.1 builds ready sometime this evening, and any instructions would be helpful.
Comment 23 Jason Duell [:jduell] (needinfo me) 2012-10-10 15:30:51 PDT
I'm writing a test for this right now... stay tuned.
Comment 24 Jason Duell [:jduell] (needinfo me) 2012-10-10 17:42:51 PDT
Created attachment 670198 [details]
web page that demonstrates issue

So this webpage (you can either open it as a file or serve it off a webserver) crashes without the patch and is OK with it.  Crashes in opt-builds (such as the release) are harder to cause deterministically (in debug mode we hit an assert: in release mode we try to delete off the end of an array and that doesn't always crash), but I'm confident this is catching the cause of the crash.

Note that this test takes 70 seconds to run, so I don't think we want to make a mochitest out of it.  We're not likely to regress this codepath in this way again, so I think that's ok.
Comment 25 juan becerra [:juanb] 2012-10-10 20:29:38 PDT
I tried Fx16 and Fx16.0.1 builds but neither of them crashed on Windows 7. I tried a Mac nightly debug build where an older version crashed but the latest one didn't. I'm changing the status-firefox16 flag based on this and comment #24.
Comment 26 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-10-10 21:21:36 PDT
I was unable to reproduce a crash with the attached testcase using Firefox 17.0b1#1, 17.0b1#2, 2012-10-10 mozilla-beta debug, 2012-10-09 mozilla-beta debug, and 2012-10-07 mozilla-beta debug builds on Windows 7. 

Given these results, I don't think we can mark this verified for Firefox 17. If someone else is able to reproduce this more easily, please help by verifying this for Firefox 17.0b1#2, Firefox 18.0a2, and Firefox 19.0a1 builds.

That said, I'm not going to block Firefox 17b1's release if this goes unverified. I'm basing this decision on comment 24 and 25.
Comment 27 Scoobidiver (away) 2012-10-11 00:58:19 PDT
(In reply to juan becerra [:juanb] from comment #25)
> I tried Fx16 and Fx16.0.1 builds but neither of them crashed on Windows 7.
So it's not verified.
Comment 28 Ed Morley [:emorley] 2012-10-11 07:13:37 PDT
Typo fix:
https://hg.mozilla.org/mozilla-central/rev/b310e265f26c
Comment 29 Jason Duell [:jduell] (needinfo me) 2012-10-11 10:57:03 PDT
Created attachment 670454 [details]
test v2

This version has just crashed FF 16 for me on linux 3 out of 6 times.  I consider that good enough, given that we have crash reports that point us right at the line number that's failing, a very exact understanding of the bug, and deterministic failure in debug mode.
Comment 30 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-10-11 11:42:06 PDT
Unfortunately, I'm still not able to trigger a crash using test v2. I tried Firefox 17.0b1 build 1 on Linux 64-bit and 2012-10-09 mozilla-beta debug (Firefox 17) on Windows 7 32-bit. I tried 7 times on each waiting 2 minutes before reloading the test. Still no crash.

I think we are going to have to ship without explicit verification here, trusting Jason's assessment and crash reports as pointed out in comment 29.
Comment 31 Jason Duell [:jduell] (needinfo me) 2012-10-19 13:21:39 PDT
Followup on comment 5 (which we were wondering might need a separate bug):

> Don't we need kung-fu death grip [on sWebSocketAdmissions]?

After talking to bsmedberg about this:  No, we're guaranteed that since we don't destroy sWebSocketAdmissions until XPCOM shutdown, we can rely on it existing during any XPCOM events that we dispatch.  We won't handle any events after shutdown is called.
Comment 32 Virgil Dicu [:virgil] [QA] 2013-02-07 08:04:13 PST
I tried to reproduce a crash with attachment from comment 29 on Firefox 16.0 beta 5 with no luck. 

No crashes also for Firefox 19.0 beta 5
Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0
Build ID: 20130206083616

Crash reports are not showing any crashes with these signatures.
Moving this to verified.

Note You need to log in before you can comment on or make changes to this bug.