Closed Bug 617505 Opened 14 years ago Closed 14 years ago

Don't OOM so easily growing dense arrays

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: dmandelin, Assigned: gwagner)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files)

We have figured out the cause of the Dromaeo halt in sunspider-arrays-nsieve. A half-day WinDbg session with gregor and dvander showed that:

1. We run the test really fast on trace--500 run/s on a fast machine. This test creates 2 arrays, one of length 20,000 and one of length 40,000. (Here I estimate that we will have one array of length 32,000 and one of length 64,000. We also could have fragmentation from the old allocations, totalling a roughly equal amount of slots. So we use up to 168,000 * 8 * 500 slots here == 768,000 bytes. (Gregor backed this up by measuring a live malloc usage from this test of 885MB on a Mac.))

2. We OOM in JSContext::realloc|JSObject::growSlots, via a SETELEM called from trace. When we OOM on trace, js_ReportOutOfMemoryError does nothing--by design we return false from the function that OOM'd, so that the trace can bail and the interpreter can retry.

3. The trace exits and we retry in the interpreter. There, we OOM again. This time, js_ReportOutOfMemoryError actually does something: it sends the error on to NS_ScriptErrorReporter, which does nothing. This may be another bug. Either way, the result is that we exit JS execution without any error, so the Dromaeo test halts.

Why do we OOM, i.e., why don't we ever GC there?

1. We don't GC on trace, so clearly we have to run on trace until we OOM.

2. JSContext::(m|re)alloc can GC. Specifically, if the alloc function calls updateMallocCounter and the counter goes negative, then we GC. But realloc only calls updateMallocCounter if the input argument is NULL. [1] This mostly doesn't apply for growing big arrays, so we're not counting, and not checking.

3. JSContext::(m|re)alloc call onOutOfMemory if they get NULL from the system allocator. onOutOfMemory tries harder to find more memory, so it could potentially help us. But it only waits for any current sweep to end. It doesn't start a new GC.

Solution ideas:

1. Add a new form of JSContext::realloc that allows the original allocation size to be passed in. Use it in growSlots. Possibly use it in other places.

2. Run a GC in onOutOfMemory.

My take: #1 is easy, low risk, and should solve the immediate problem. #2 seems more powerful, but I'm not sure if there are risks to calling GC from that point. Can we GC successfully if malloc is out of memory?

Suggestions?


[1] See also bug 468840. Before that bug, realloc never allocated the counter, and it caused us to use huge amounts of memory in an array-heavy page. ISTR we considered trying to improve the accounting for realloc with non-null input, but were concerned about too much GC and there wasn't a compelling need at the time.
blocking2.0: --- → beta9+
Attached image instruments
The allocation statistics for the sunspider-arrays-nsieve benchmark:

The 2 big spikes are allocations of 256 and 512KB that account for about 1.4GB.
They are still alive after the benchmark so no GC happened.

The reason as David described in the previous post:
void* realloc(void* p, size_t bytes, JSContext *cx = NULL) {
    /*
     * For compatibility we do not account for realloc that increases
     * previously allocated memory.
     */
(In reply to comment #0)
> We have figured out the cause of the Dromaeo halt in sunspider-arrays-nsieve. 
> 
> Solution ideas:
> 
> 1. Add a new form of JSContext::realloc that allows the original allocation
> size to be passed in. Use it in growSlots. Possibly use it in other places.
> 

Lets try this first.
Assignee: general → anygregor
Yeah, the fact that we didn't do #1 never made much sense to me.  Let's give it a shot!
(In reply to comment #0)
> 2. Run a GC in onOutOfMemory.

Even with the conservative scanner working running the GC under JSContext::malloc is not safe.

First we need to monitor the code to ensure that all GC things are initialized before we call the malloc to ensure that the GC would not see halve-initialized GC thinks. Second we need to monitor the usage of non-GC structures like id auto arrays or property descriptors or shapes or XPConnect objects that points into GC things to ensure that it is safe against the GC under JS_malloc and friends.

A better approach would be to add a version of JS malloc/realloc that explicitly can do the GC and gradually replace the older code with never functions. So I suggest to start with a form of realloc that takes the size of the previous allocation and that can the GC on OOM or when other heuristics suggests so.
Blocks: 598466
Attached patch patchSplinter Review
This patch uses the information from growSlots to updated the mallocCounter for reallocs. Tryserver is green.

We could also use it for shrink slots but I don't think this would make much sense since we reset the mallocCounter after each GC anyway.
Other places where we use realloc: js_NewStringFromCharBuffer, StackTraceToString, APPEND_STRING_TO_STACK, MakeUpvarForEval, AddSpanDep
Comment on attachment 496218 [details] [diff] [review]
patch

Add an assert that new > old in realloc (in runtime, the choke point). New bug on remaining reallocs?
Attachment #496218 - Flags: review+
Bug 617805 is a followup for further changes.
http://hg.mozilla.org/tracemonkey/rev/aa1d2555b057
Whiteboard: fixed-in-tracemonkey
dmandelin will file a DOM bug to figure out why we silently failed here without a dialog saying "OOM" or something like that
http://hg.mozilla.org/mozilla-central/rev/aa1d2555b057
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
No longer blocks: 604961
No longer blocks: 609543
As per today's meeting, beta 9 will be a time-based release. Marking these all betaN+. Please move it back to beta9+ if  you believe it MUST be in the next beta (ie: trunk is in an unshippable state without this)
blocking2.0: beta9+ → betaN+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: