Closed
Bug 474529
Opened 16 years ago
Closed 15 years ago
TM: Avoid artificial OOM conditions
Categories
(Core :: JavaScript Engine, defect, P1)
Core
JavaScript Engine
Tracking
()
VERIFIED
FIXED
mozilla1.9.2a1
People
(Reporter: BijuMailList, Assigned: graydon)
Details
(4 keywords, Whiteboard: [sg:critical?][crashtest?])
Attachments
(3 files)
I have modified attachment 280909 [details] on bug 396191 by adding one more iteration. Now it crashes Firefox. Steps:- 1. open attached arrayCreation.html Result:- crashes on the last iteration Expected:- dont crash
Comment 1•16 years ago
|
||
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090119 Minefield/3.2a1pre ID:20090119074946 crash report: Signature nanojit::LirBufWriter::insLinkToFar(nanojit::LOpcode, nanojit::LIns*) UUID c0652bb7-9fc9-42e4-bac3-202f32090120 perhaps related to bug 470310 ?
Comment 2•16 years ago
|
||
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090122 Minefield/3.2a1pre ID:20090122021228 Still crashes with a build from the tracemonkey tree (I don't know if the patch has made it to mozilla-central yet though). Crash happens one frame earlier than before. crash report: Signature nanojit::LirBufWriter::insLinkTo(nanojit::LOpcode, nanojit::LIns*) bp-a9ef17cb-f6de-43a9-a467-2d5a22090123
Comment 3•15 years ago
|
||
(I tested this on latest TM checkout debug build on WinXP SP3) Assertion failed: _buf->_nextPage (c:/tracemonkey/js/src/nanojit/LIR.cpp:168) ===== (I tested this on latest-trunk mozilla-central nightly build on WinXP SP3) Turning security-sensitive and nominating blocking1.9.1? just-in-case, due to !exploitable PROBABLY_EXPLOITABLE result. I also confirm this as being JIT-only, as turning JIT content off doesn't crash. 1:002> !exploitable -v HostMachine\HostUser Executing Processor Architecture is x86 Debuggee is in User Mode Debuggee is a live user mode debugging session on the local machine Event Type: Exception *** WARNING: Unable to verify checksum for C:\Documents and Settings\Administrator\Desktop\firefox\xul.dll Exception Faulting Address: 0x4 First Chance Exception Type: STATUS_ACCESS_VIOLATION (0xC0000005) Exception Sub-Type: Write Access Violation Exception Hash (Major/Minor): 0x563a0616.0x32486061 Stack Trace: js3250!nanojit::LirBufWriter::insLinkToFar+0x6 js3250!nanojit::LirBufWriter::insLinkTo+0x63 js3250!nanojit::LirBufWriter::skip+0xc9 js3250!TraceRecorder::snapshot+0x4b8 js3250!TraceRecorder::box_jsval+0x9b js3250!TraceRecorder::newArray+0x1e8 js3250!TraceRecorder::functionCall+0x13a js3250!TraceRecorder::monitorRecording+0xa82 js3250!js_Interpret+0x1d62 js3250!js_Invoke+0x289 js3250!js_InternalInvoke+0x102 js3250!JS_CallFunctionValue+0x27 xul!nsJSContext::CallEventHandler+0x162 xul!nsJSEventListener::HandleEvent+0x230 xul!nsEventListenerManager::HandleEventSubType+0x6b xul!nsEventListenerManager::HandleEvent+0x222 xul!nsEventTargetChainItem::HandleEvent+0xb4 xul!nsEventTargetChainItem::HandleEventTargetChain+0x1eb xul!nsEventDispatcher::Dispatch+0x4f5 xul!DocumentViewerImpl::LoadComplete+0x10c xul!nsDocShell::EndPageLoad+0x63 xul!nsWebShell::EndPageLoad+0xa9 xul!nsDocShell::OnStateChange+0x83 xul!nsDocLoader::FireOnStateChange+0x123 xul!nsDocLoader::doStopDocumentLoad+0x1c xul!nsDocLoader::DocLoaderIsEmpty+0x130 xul!nsDocLoader::OnStopRequest+0xc8 xul!nsLoadGroup::RemoveRequest+0xb8 xul!nsDocument::DoUnblockOnload+0x3b Instruction Address: 0x480156 Description: User Mode Write AV near NULL Short Description: WriteAV Exploitability Classification: PROBABLY_EXPLOITABLE Recommended Bug Title: Probably Exploitable - User Mode Write AV near NULL starting at js3250!nanojit::LirBufWriter::insLinkToFar+0x6 (Hash=0x563a0616.0x32486061) User mode write access violations that are near NULL are probably exploitable.
Group: core-security
Flags: blocking1.9.1?
Keywords: assertion
Summary: TM: Modified test on Bug 396191 crashes firefox → TM: Crash [@ nanojit::LirBufWriter::insLinkToFar] or "Assertion failed: _buf->_nextPage"
Whiteboard: [sg:critical?]
Updated•15 years ago
|
Flags: blocking1.9.1? → blocking1.9.1+
Updated•15 years ago
|
Priority: -- → P1
Assignee | ||
Updated•15 years ago
|
Assignee: general → graydon
Assignee | ||
Comment 4•15 years ago
|
||
This is another case in which we have run out of official memory in the LIR buffer, it has set it's "I ran out of memory" flag, and we are still piling new data into it, because it'll be a while until we next check the flag. As with all such cases, the "correct" fix is to either teach NJ to be infallible with memory allocation (say, make it abort the process when it *really* runs out of memory, and never OOM in this unofficial self-mutilating fashion due to merely passing its high water mark), or teach all the callers of it to check OOM after every call, or teach NJ to return NULL when it OOMs and check for NULL after every call. I expect the response to these suggestions will be "all those are expensive and/or intrusive and/or poor style", so we'll carry on playing russian roulette with the notion that we can check the OOM status "often enough" that we don't keep reproducing this situation. If we do, we will get what we pay for: more bugs exactly like this one.
Comment 5•15 years ago
|
||
Easy fix. Check for the OOM flag in the loop. We do that every else too. We could actually do a simple static analysis to prove that we do this inside all loops, where it really hurts. We have 4k reserve so for straight-line code we should be good.
Assignee | ||
Comment 6•15 years ago
|
||
(In reply to comment #5) > Easy fix. Check for the OOM flag in the loop. We do that every else too. We > could actually do a simple static analysis to prove that we do this inside all > loops, where it really hurts. We have 4k reserve so for straight-line code we > should be good. No. I appreciate what you're trying to do here -- try to find "the last bug" in our OOM-checking code -- but I want to propose a counterargument that I think is more correct. Currently we have two conditions being modeled via the OOM flag in NJ. One is "we really ran out of memory", like no more system memory, and one is just exceeding our 16mb or 32mb highwater mark for the TM code cache + LIR buffers. I think this is a mistake. I think that TM is actually not taking very good care to check the OOM flag often enough, and I think there are lots of cases still lurking in which it doesn't and won't. And we're about to ship a product. Your hedging language here suggests you actually lack confidence in the notion that we're looking at "the last" of these bugs too. I think we're not. I'd bet my new bike we're not. What I'd propose is -- at least for now -- modeling these two conditions separately. Currently we upgrade every highwater-overflow (which is frequent, predictable, guaranteed to be happening every few minutes on millions of desktops) to a full-scale OOM. This is unnecessary. We have, in almost every one of these cases (say 99.9999%) lots of memory left. More than a page. Probably thousands of pages. Certainly enough that there's no sense turning a modest bug (exceeding highwater) into a likely-exploitable memory error. I don't want to patch over the problem of catching *real* OOMs indefinitely, but they're so comparatively rare that I don't think it's wise to be doing what we're currently doing. I've got a patch I'm about to post that does both what you ask and also, along the lines I'm suggesting, backs off using the OOM flag for anything other than a *real* OOM. Instead, it tells the fragmento to use as much memory as it can (up to 1<<32), and probes it from time to time to flush when it notices we're over the vastly-lower limit. This is *much more practically safe* when sent out to the world. Please consider that fact. A followup bug to "audit the OOM paths" is entirely reasonable. But I think opening users up to this is not. It's not going to spontaneously audit all those paths and make them correct to ship. It'll just make a lot of people vulnerable.
Assignee | ||
Comment 7•15 years ago
|
||
Attachment #370344 -
Flags: review?(gal)
Comment 8•15 years ago
|
||
Sure, the proposed approach is much more effective. As long we check often enough, even mobile should not have an issue. Even under extreme memory pressure this should not behave worse than what we have now (we have a bunch of unchecked malloc/reallocs that crash when hit).
Updated•15 years ago
|
Attachment #370344 -
Flags: review?(gal) → review+
Comment 9•15 years ago
|
||
Comment on attachment 370344 [details] [diff] [review] A conservative fix that removes the artificial-OOM behavior I am noting with amusement that you actually added the !lirbuf->outOMem() check as well, as suggested ;)
Assignee | ||
Updated•15 years ago
|
Summary: TM: Crash [@ nanojit::LirBufWriter::insLinkToFar] or "Assertion failed: _buf->_nextPage" → TM: Avoid artificial OOM conditions
Assignee | ||
Comment 10•15 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/fe60e7460cd1
Comment 11•15 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/fe60e7460cd1
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Comment 12•15 years ago
|
||
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/183a06b83e3c
Keywords: fixed1.9.1
Comment 13•15 years ago
|
||
better implemented as a crash test and checked in when fixed.
Flags: in-testsuite?
Whiteboard: [sg:critical?] → [sg:critical?][crashtest?]
Comment 14•15 years ago
|
||
Verified fixed with testcase given in comment 0 on trunk and 1.9.1 with the following debug builds: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090422 Minefield/3.6a1pre ID:20090422224452 Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b4pre) Gecko/20090422 Shiretoko/3.5b4pre ID:20090422122043
Status: RESOLVED → VERIFIED
Keywords: fixed1.9.1 → verified1.9.1
OS: Windows XP → All
Hardware: x86 → All
Target Milestone: --- → mozilla1.9.2a1
Comment 15•15 years ago
|
||
Updated•15 years ago
|
Flags: in-testsuite? → in-testsuite+
Updated•15 years ago
|
Group: core-security
Flags: wanted1.9.0.x-
Comment 16•14 years ago
|
||
test checked into 1.9.0, 1.9.1, 1.9.2, tracemonkey. 1.9.3 will get picked up in the next merge.
You need to log in
before you can comment on or make changes to this bug.
Description
•