Bugzilla

Updated

•

15 years ago

Flags: blocking1.9.1? → blocking1.9.1+

Updated

•

15 years ago

Priority: -- → P1

Assignee

Updated

•

15 years ago

Assignee: general → graydon

Assignee

Comment 4

•

15 years ago

This is another case in which we have run out of official memory in the LIR buffer, it has set it's "I ran out of memory" flag, and we are still piling new data into it, because it'll be a while until we next check the flag.

As with all such cases, the "correct" fix is to either teach NJ to be infallible with memory allocation (say, make it abort the process when it *really* runs out of memory, and never OOM in this unofficial self-mutilating fashion due to merely passing its high water mark), or teach all the callers of it to check OOM after every call, or teach NJ to return NULL when it OOMs and check for NULL after every call.

I expect the response to these suggestions will be "all those are expensive and/or intrusive and/or poor style", so we'll carry on playing russian roulette with the notion that we can check the OOM status "often enough" that we don't keep reproducing this situation. If we do, we will get what we pay for: more bugs exactly like this one.

Comment 5

•

15 years ago

Easy fix. Check for the OOM flag in the loop. We do that every else too. We could actually do a simple static analysis to prove that we do this inside all loops, where it really hurts. We have 4k reserve so for straight-line code we should be good.

Assignee

Comment 6

•

15 years ago

(In reply to comment #5)
> Easy fix. Check for the OOM flag in the loop. We do that every else too. We
> could actually do a simple static analysis to prove that we do this inside all
> loops, where it really hurts. We have 4k reserve so for straight-line code we
> should be good.

No. I appreciate what you're trying to do here -- try to find "the last
bug" in our OOM-checking code -- but I want to propose a counterargument
that I think is more correct.

Currently we have two conditions being modeled via the OOM flag in NJ. One
is "we really ran out of memory", like no more system memory, and one is
just exceeding our 16mb or 32mb highwater mark for the TM code cache + LIR
buffers.

I think this is a mistake. I think that TM is actually not taking very good
care to check the OOM flag often enough, and I think there are lots of
cases still lurking in which it doesn't and won't. And we're about to ship
a product. Your hedging language here suggests you actually lack confidence
in the notion that we're looking at "the last" of these bugs too. I think
we're not. I'd bet my new bike we're not.

What I'd propose is -- at least for now -- modeling these two conditions
separately. Currently we upgrade every highwater-overflow (which is
frequent, predictable, guaranteed to be happening every few minutes on
millions of desktops) to a full-scale OOM.  This is unnecessary. We have,
in almost every one of these cases (say 99.9999%) lots of memory left. More
than a page. Probably thousands of pages. Certainly enough that there's no
sense turning a modest bug (exceeding highwater) into a likely-exploitable
memory error.

I don't want to patch over the problem of catching *real* OOMs
indefinitely, but they're so comparatively rare that I don't think it's
wise to be doing what we're currently doing. I've got a patch I'm about to
post that does both what you ask and also, along the lines I'm suggesting,
backs off using the OOM flag for anything other than a *real* OOM. Instead,
it tells the fragmento to use as much memory as it can (up to 1<<32), and
probes it from time to time to flush when it notices we're over the
vastly-lower limit.

This is *much more practically safe* when sent out to the world. Please
consider that fact.  A followup bug to "audit the OOM paths" is entirely
reasonable. But I think opening users up to this is not. It's not going to
spontaneously audit all those paths and make them correct to ship. It'll
just make a lot of people vulnerable.

Assignee

Comment 7

•

15 years ago

Attached patch A conservative fix that removes the artificial-OOM behavior — Details — Splinter Review

Attachment #370344 - Flags: review?(gal)

Comment 8

•

15 years ago

Sure, the proposed approach is much more effective. As long we check often enough, even mobile should not have an issue. Even under extreme memory pressure this should not behave worse than what we have now (we have a bunch of unchecked malloc/reallocs that crash when hit).

Updated

•

15 years ago

Attachment #370344 - Flags: review?(gal) → review+

Comment 9

•

15 years ago

Comment on attachment 370344 [details] [diff] [review]
A conservative fix that removes the artificial-OOM behavior

I am noting with amusement that you actually added the !lirbuf->outOMem() check as well, as suggested ;)

Assignee

Updated

•

15 years ago

Summary: TM: Crash [@ nanojit::LirBufWriter::insLinkToFar] or "Assertion failed: _buf->_nextPage" → TM: Avoid artificial OOM conditions

Assignee

Comment 10

•

15 years ago

http://hg.mozilla.org/tracemonkey/rev/fe60e7460cd1

Comment 11

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/fe60e7460cd1

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Comment 12

•

15 years ago

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/183a06b83e3c

Keywords: fixed1.9.1

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 13

•

15 years ago

better implemented as a crash test and checked in when fixed.

Flags: in-testsuite?

Whiteboard: [sg:critical?] → [sg:critical?][crashtest?]

Comment 14

•

15 years ago

Verified fixed with testcase given in comment 0 on trunk and 1.9.1 with the
following debug builds:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre)
Gecko/20090422 Minefield/3.6a1pre ID:20090422224452

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b4pre)
Gecko/20090422 Shiretoko/3.5b4pre ID:20090422122043

Status: RESOLVED → VERIFIED

Keywords: fixed1.9.1 → verified1.9.1

OS: Windows XP → All

Hardware: x86 → All

Target Milestone: --- → mozilla1.9.2a1

Comment 15

•

15 years ago

Attached file js1_5/Array/regress-474529.js — Details

Updated

•

15 years ago

Flags: in-testsuite? → in-testsuite+

Daniel Veditz [:dveditz]

Updated

•

15 years ago

Group: core-security

Flags: wanted1.9.0.x-