Closed Bug 798045 (CVE-2012-4191) Opened 12 years ago Closed 12 years ago

crash in mozilla::net::FailDelayManager::Lookup

Categories

(Core :: Networking: WebSockets, defect)

16 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla19
Tracking Status
firefox16 + fixed
firefox17 --- fixed
firefox18 --- fixed
firefox19 --- verified
firefox-esr10 --- unaffected

People

(Reporter: scoobidiver, Assigned: jduell.mcbugs)

Details

(5 keywords, Whiteboard: [blocking bug 711793, but see comment 8][qa-])

Crash Data

Attachments

(2 files, 1 obsolete file)

It's #114 top browser crasher in 16.0b5 and seems to happen only in Beta versions with this signature: 15.0 or 16.0. Signature arena_dalloc_small | je_free | nsACString_internal::`scalar deleting destructor''(unsigned int) More Reports Search UUID c977c9fe-233d-4d8a-a1be-19ea42121004 Date Processed 2012-10-04 19:56:15 Uptime 82347 Last Crash 22.9 hours before submission Install Age 4.4 days since version was first installed. Install Time 2012-06-28 10:56:11 Product Firefox Version 16.0 Build ID 20120925201946 Release Channel beta OS Windows NT OS Version 6.1.7600 Build Architecture x86 Build Architecture Info GenuineIntel family 6 model 42 stepping 7 Crash Reason EXCEPTION_BREAKPOINT Crash Address 0x6e8f01f1 App Notes AdapterVendorID: 0x8086, AdapterDeviceID: 0x0116, AdapterSubsysID: 15ea10cf, AdapterDriverVersion: 8.15.10.2287 D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ EMCheckCompatibility True Adapter Vendor ID 0x8086 Adapter Device ID 0x0116 Total Virtual Memory 2147352576 Available Virtual Memory 1357484032 System Memory Use Percentage 54 Available Page File 5218824192 Available Physical Memory 1687126016 Frame Module Signature Source 0 mozglue.dll arena_dalloc_small memory/mozjemalloc/jemalloc.c:4510 1 mozglue.dll je_free memory/mozjemalloc/jemalloc.c:6565 2 xul.dll nsACString_internal::`scalar deleting destructor' 3 xul.dll mozilla::net::FailDelayManager::Lookup netwerk/protocol/websocket/WebSocketChannel.cpp:218 4 xul.dll mozilla::net::FailDelayManager::DelayOrBegin netwerk/protocol/websocket/WebSocketChannel.cpp:229 5 xul.dll mozilla::net::nsWSAdmissionManager::ConditionallyConnect netwerk/protocol/websocket/WebSocketChannel.cpp:339 6 xul.dll mozilla::net::WebSocketChannel::OnLookupComplete netwerk/protocol/websocket/WebSocketChannel.cpp:2217 7 xul.dll `anonymous namespace'::DNSListenerProxy::OnLookupCompleteRunnable::Run netwerk/dns/nsDNSService2.cpp:552 8 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:624 9 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:82 10 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:201 11 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:175 12 xul.dll nsBaseAppShell::Run widget/xpwidgets/nsBaseAppShell.cpp:163 13 xul.dll nsAppShell::Run widget/windows/nsAppShell.cpp:232 14 xul.dll nsAppStartup::Run toolkit/components/startup/nsAppStartup.cpp:257 15 xul.dll XREMain::XRE_mainRun toolkit/xre/nsAppRunner.cpp:3794 16 xul.dll XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:3871 17 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp:3947 18 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:100 19 firefox.exe __tmainCRTStartup crtexe.c:552 20 kernel32.dll BaseThreadInitThunk 21 ntdll.dll __RtlUserThreadStart 22 ntdll.dll _RtlUserThreadStart More reports at: https://crash-stats.mozilla.com/report/list?signature=arena_dalloc_small+|+je_free+|+nsACString_internal%3A%3A%60scalar+deleting+destructor%27%27%28unsigned+int%29
> 2 xul.dll nsACString_internal::`scalar deleting destructor' Do we know how this gets generated (instead of the normal foo.cpp:line# stack trace)? Assuming this is indeed happening in a CString, that would be FailDelay.mAddress. But I don't see any obvious errors in how it's allocated/freed. Do we know of any jemalloc bugs that might be the cause of this?
With combined signatures, it's #15 top browser crasher in 16.0b6. There are no interesting correlations.
Crash Signature: [@ arena_dalloc_small | je_free | nsACString_internal::`scalar deleting destructor''(unsigned int)] → [@ arena_dalloc_small | je_free | nsACString_internal::`scalar deleting destructor''(unsigned int)] [@ nsAString_internal::Finalize()] [@ arena_dalloc_small | je_free | nsAutoString::`scalar deleting destructor''(unsigned int)] [@ memmove | nsTArray_ba…
Keywords: regression, topcrash
Assigning to you for investigation Jason, since this is in WebSockets. Also including Brian, since he landed the two networking changes in FF16b6.
Assignee: nobody → jduell.mcbugs
Group: core-security
bsmith suggests this may be sec-critical
<bsmith> I don't think it is that race. <bsmith> Seems like it gets deleted during static destruction of nsWSAdmissionManager <bsmith> I did notice this: <bsmith> if (fail->mAddress.Equals(address) && fail->mPort == port) { <bsmith> if (outIndex) <bsmith> *outIndex = i; <bsmith> result = fail; <bsmith> } else if (fail->IsExpired(rightNow)) { <bsmith> seems like there shoul be a "break;" after result = fail; <bsmith> Oh, I got it, I think <bsmith> I'm pretty sure that's it <bsmith> We set *outIndex = i; <bsmith> then, later, because we don't break, we delete some entries between 0 and i <bsmith> but, we don't decrease i
I would expect this to crash in line 252/253, not line 218. But, I think the compiler might be inlining this code so that a crash at line 253 is reported as a crash in line 218. See: https://mxr.mozilla.org/mozilla-beta/source/netwerk/protocol/websocket/WebSocketChannel.cpp?rev=6cd5f16a8c3c#216 216 } else if (fail->IsExpired(rightNow)) { 217 mEntries.RemoveElementAt(i); 218 delete fail; 219 } ... https://mxr.mozilla.org/mozilla-beta/source/netwerk/protocol/websocket/WebSocketChannel.cpp?rev=6cd5f16a8c3c#251 251 } else if (fail->IsExpired(rightNow)) { 252 mEntries.RemoveElementAt(failIndex); 253 delete fail; 254 } See also: https://crash-stats.mozilla.com/query/query?product=Firefox&version=ALL%3AALL&range_value=1&range_unit=weeks&date=10%2F08%2F2012+23%3A27%3A41&query_search=signature&query_type=contains&query=DelayOrBegin&reason=&build_id=&process_type=any&hang_type=any&do_query=1 Or, it could be a different problem.
(In reply to Alex Keybl [:akeybl] from comment #4) > bsmith suggests this may be sec-critical This seems likely to have been caused by the patch for bug 711793, which also caused use-after-free bug 771318, which we're planning to release an advisory for. So, if this is exploitable, it is likely as close to (if not) a zero-day, especially if we release that advisory. To avoid drawing even more attention to this code, I avoided adding it as blocking bug 711793 and I avoided mentioning this bug (and adding this comment) in bug 771318, just in case bug 771318 gets opened up.
Whiteboard: [blocking bug 711793, but see comment]
Attached patch v1Splinter Review
Actually crash @ line 218 makes perfect sense here: when Lookup() returns an index for an item that is no longer valid, and it now happens to be off the end of the nsTArray, we only have MOZ_ASSERT checks in RemoveElementAt() so it passes through in release builds and we then try to delete some garbage. Solution as bsmith suggested is to break once Lookup() finds a match. We'll get rid of any expired entries the next time it's called and nothing is found (which is the most common case), so stale entries are not an issue. thanks to bsmith for tracking this down. > Don't we need kung-fu death grip? Good question, but not the bug at hand IMO. I've emailed bsmedberg about whether our sWebsocketAdmissions usage is safe, and will file a bug if not.
Attachment #669347 - Flags: review?(bsmith)
(In reply to Brian Smith (:bsmith) from comment #8) > (In reply to Alex Keybl [:akeybl] from comment #4) > > bsmith suggests this may be sec-critical > > This seems likely to have been caused by the patch for bug 711793, which > also caused use-after-free bug 771318, which we're planning to release an > advisory for. So, if this is exploitable, it is likely as close to (if not) > a zero-day, especially if we release that advisory. bug 771318 is in a roll-up advisory for Abhishek's ASAN bugs. We don't list it or a similar bug by number in the advisory because the issues never shipped in a release, being found and fixed before becoming public.
Attachment #669347 - Flags: review?(bsmith)
Attachment #669347 - Flags: review+
Attachment #669347 - Flags: approval-mozilla-release?
Attachment #669347 - Flags: approval-mozilla-beta?
Attachment #669347 - Flags: approval-mozilla-aurora?
Comment on attachment 669347 [details] [diff] [review] v1 [Security approval request comment] > How easily can the security issue be deduced from the patch? I found the potential out-of-bounds memory access / double-free within 30 minutes, and I am not particularly good at this stuff. > Do comments in the patch, the check-in comment, or tests > included in the patch paint a bulls-eye on the security problem? Review the comment in the patch. > Which older supported branches are affected by this flaw? > If not all supported branches, which bug introduced the flaw? Bug 711793. > Do you have backports for the affected branches? > If not, how different, hard to create, and risky will they be? Should apply easily. > How likely is this patch to cause regressions; how much testing does it need? Very unlikely / not much. > Is there anything we could do to make this sec-approval form better? Yes, please have it automatically formatted like I have done here.
Attachment #669347 - Flags: sec-approval?
https://hg.mozilla.org/integration/mozilla-inbound/rev/f9469bf48ff6 re: exploitability. It's possible to cause this crash by creating several websockets to different host:ports and then delaying the TCP connect process differently per socket at reconnect time. I'm not sure how easy it is to then deterministically exploit the crash.
meh--sorry, still getting used to the sec-approval process.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
a nit in the comment: "will be be deleted"
Comment on attachment 669347 [details] [diff] [review] v1 [Triage Comment] Since this landed on m-i/m-c already, let's also get this on Aurora/Beta in preparation for possible 16.0.1 inclusion.
Attachment #669347 - Flags: approval-mozilla-beta?
Attachment #669347 - Flags: approval-mozilla-beta+
Attachment #669347 - Flags: approval-mozilla-aurora?
Attachment #669347 - Flags: approval-mozilla-aurora+
Comment on attachment 669347 [details] [diff] [review] v1 [Triage Comment] To help prevent the possibility of a 16.0.2, let's land this fix in 16.0.1. Please land on mozilla-release asap.
Attachment #669347 - Flags: sec-approval?
Attachment #669347 - Flags: sec-approval+
Attachment #669347 - Flags: approval-mozilla-release?
Attachment #669347 - Flags: approval-mozilla-release+
Is there a testcase or steps to reproduce this crash? Given this is being uplifted to release, QA would like to be able to verify it is fixed.
Marking as ESR10 unaffected based upon comment 11.
If you have any suggestions on how to test this, please let us know. We'll have Fx16.0.1 builds ready sometime this evening, and any instructions would be helpful.
I'm writing a test for this right now... stay tuned.
Attached file web page that demonstrates issue (obsolete) —
So this webpage (you can either open it as a file or serve it off a webserver) crashes without the patch and is OK with it. Crashes in opt-builds (such as the release) are harder to cause deterministically (in debug mode we hit an assert: in release mode we try to delete off the end of an array and that doesn't always crash), but I'm confident this is catching the cause of the crash. Note that this test takes 70 seconds to run, so I don't think we want to make a mochitest out of it. We're not likely to regress this codepath in this way again, so I think that's ok.
Status: RESOLVED → VERIFIED
Alias: CVE-2012-4191
Keywords: sec-critical
Whiteboard: [blocking bug 711793, but see comment] → [blocking bug 711793, but see comment 8]
I tried Fx16 and Fx16.0.1 builds but neither of them crashed on Windows 7. I tried a Mac nightly debug build where an older version crashed but the latest one didn't. I'm changing the status-firefox16 flag based on this and comment #24.
I was unable to reproduce a crash with the attached testcase using Firefox 17.0b1#1, 17.0b1#2, 2012-10-10 mozilla-beta debug, 2012-10-09 mozilla-beta debug, and 2012-10-07 mozilla-beta debug builds on Windows 7. Given these results, I don't think we can mark this verified for Firefox 17. If someone else is able to reproduce this more easily, please help by verifying this for Firefox 17.0b1#2, Firefox 18.0a2, and Firefox 19.0a1 builds. That said, I'm not going to block Firefox 17b1's release if this goes unverified. I'm basing this decision on comment 24 and 25.
(In reply to juan becerra [:juanb] from comment #25) > I tried Fx16 and Fx16.0.1 builds but neither of them crashed on Windows 7. So it's not verified.
Attached file test v2
This version has just crashed FF 16 for me on linux 3 out of 6 times. I consider that good enough, given that we have crash reports that point us right at the line number that's failing, a very exact understanding of the bug, and deterministic failure in debug mode.
Attachment #670198 - Attachment is obsolete: true
Unfortunately, I'm still not able to trigger a crash using test v2. I tried Firefox 17.0b1 build 1 on Linux 64-bit and 2012-10-09 mozilla-beta debug (Firefox 17) on Windows 7 32-bit. I tried 7 times on each waiting 2 minutes before reloading the test. Still no crash. I think we are going to have to ship without explicit verification here, trusting Jason's assessment and crash reports as pointed out in comment 29.
Whiteboard: [blocking bug 711793, but see comment 8] → [blocking bug 711793, but see comment 8][qa-]
Followup on comment 5 (which we were wondering might need a separate bug): > Don't we need kung-fu death grip [on sWebSocketAdmissions]? After talking to bsmedberg about this: No, we're guaranteed that since we don't destroy sWebSocketAdmissions until XPCOM shutdown, we can rely on it existing during any XPCOM events that we dispatch. We won't handle any events after shutdown is called.
Group: core-security
I tried to reproduce a crash with attachment from comment 29 on Firefox 16.0 beta 5 with no luck. No crashes also for Firefox 19.0 beta 5 Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0 Build ID: 20130206083616 Crash reports are not showing any crashes with these signatures. Moving this to verified.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: