Closed Bug 617656 Opened 9 years ago Closed 9 years ago

Discard jitcode to avoid excessive memory consumption

Categories

(Core :: JavaScript Engine, defect)

Platform: x86
OS: All
Priority: Not set

Tracking


RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: dmandelin, Assigned: bhackett)

References

Details

(Whiteboard: [cib-memory][fixed-in-tracemonkey])

Attachments

(1 file, 4 obsolete files)

Let's avoid using up all the memory for jitcode by discarding some jitcode when we get too much. The basic feature to discard code is already there, for the debugger, but there is some key design work to be done:

Issue 1. When do we decide to discard jitcode? 

On systems with a lot of memory, it may never be necessary, so it would be nice to do it only under memory pressure. But I don't think we have time to figure out memory pressure detection, or how to do such things safely in low-memory situations, so I think we have to keep it a lot simpler. One idea would be to set a reasonable budget for total jitcode memory consumption, and if we go over, discard until we go under.

Tests so far say that the mjit consumes 380MB in the 80-tab test. Bug 611400 will cut that to 290MB. Bug 616367 and bug 616310 may shave off another 30MB or so, so we should be around 250MB after that. Apparently that's too much, so maybe we can start with 128MB as the limit. That should be enough to jit everything in a 40-tab test, while still using much less memory than before. The limit should be prefable, though.
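A rough sketch of the proposed budget check (all names here are hypothetical, and the discard order is arbitrary; this is only an illustration of "if we go over, discard until we go under"):

```cpp
#include <cstdint>
#include <vector>

// Each entry is the code size of one discardable script's jitcode.
// Discards scripts (back to front, arbitrarily) until total usage is
// at or under the limit; returns the resulting total.
static uint64_t EnforceJitBudget(std::vector<uint64_t> &scripts,
                                 uint64_t totalBytes, uint64_t limitBytes) {
    while (totalBytes > limitBytes && !scripts.empty()) {
        totalBytes -= scripts.back();
        scripts.pop_back();
    }
    return totalBytes;
}
```

With a 128MB limit and 180MB allocated across scripts of 100, 50 and 30MB, this drops the last two scripts and lands at 100MB.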

Issue 2. How do we discard jitcode?

The easiest thing would be to stop jitting once we reach the limit. That's kind of bad, though, because it means if you have a lot of stuff open and you start using a new JS-heavy thing, it won't get jitted.

I think the next easiest is to iterate over live scripts and arbitrarily discard jitcode that is not part of active frames, and then purge call ICs. One nice thing about that is that it discards only code that isn't now running, which is presumably less likely to run later. One problem is that if lots of stuff is active, we may not be able to discard, but I think that would be rare, and we can always fall back to not jitting new code in that case.

One refinement to that scheme would be to use LRU or some related scheme to try to discard code that is less likely to run later. I'm not sure we can do that without taking a perf hit, though. Maybe instead we could use the clock algorithm on compartments, to try to purge mostly out of compartments that have not run recently.

Another refinement would be to be able to discard live code if necessary. I think it's not so hard to implement, but it is higher risk and I'm not sure there's much benefit.

A third refinement would be to mark a script as unjittable if it gets discarded a lot, to reduce thrashing on compilation. Not sure we will need it.
blocking2.0: --- → beta9+
Coincidentally, I was just filing the same bug when dvander pointed out yours.  I like the basic idea you described: allocate up to a certain size limit, then, if you cross it, you kick off someone else.  I also like the idea of maintaining an LRU list of compartments: compartments are entered/exited relatively infrequently and through a few pinch points, so it should be cheap and easy to maintain an embedded linked list.

Another issue in addition to those you mentioned is "where to detect over-use and discard jit code?"  It seems GC is the right time to do this.  With everything stopped, once we decide we need to purge, it should be cheap to scan all the JSThreadData's StackSpace's StackSegments to find the list of active compartments and exclude them from consideration for purging.  This should avoid any complexities due to concurrency or purging active scripts.  Also, since we run GC during idle time, the memory should be freed by the time any human takes a measurement.
Generally the goal should be to discard those JITed scripts that provide the least performance gain. Those usually are scripts which are not running (e.g. in background tabs) or scripts that only run spuriously.

Basically, anything that hasn't run for... let's say 10 seconds should be considered discardable. Why? Because if it hasn't run for 10 seconds it doesn't run for a significant fraction of the CPU time, therefore compiling it doesn't yield much of a performance improvement and we can discard the JITed code.

The only thing that might actually bite us here is that JM compiles eagerly. So 10000 scripts where each runs every 11 seconds would incur the full compilation overhead every time they run because the JITed scripts would get discarded after 10 seconds.

But this is a worst-case scenario which could be solved in the future by using lazy compilation. (see bug 615199 comment 20).

So even a simple scheme like "drop everything that hasn't been used for 10 seconds" will only incur a 0.1% performance penalty (assuming 10ms compilation overhead every 10 seconds) in most cases. And the bad edge cases would be covered by lazy compilation in the future. And none of them would occur in benchmarks, since those keep the CPU busy and aren't inactive for 10 seconds.


In other words, we don't really need to restrict ourselves to "when under memory pressure" conditions, because the performance penalty for dropping rarely used JIT code should be minimal, while freeing up physical memory can improve other performance characteristics of a system (file cache, swapping).
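The idle-discard heuristic described above can be sketched as follows. The struct, field names, and the 10-second threshold are illustrative only, not SpiderMonkey API:

```cpp
#include <cstdint>

// Hypothetical per-script bookkeeping for the "discard if idle for N
// seconds" heuristic described in this comment.
struct ScriptInfo {
    int64_t lastRunUsec;  // timestamp of last execution
    bool onStack;         // currently part of an active frame
};

static const int64_t kIdleThresholdUsec = 10 * 1000000;  // 10 seconds

// Never discard code that is running; otherwise discard once it has
// been idle for at least the threshold.
static bool ShouldDiscard(const ScriptInfo &s, int64_t nowUsec) {
    return !s.onStack && (nowUsec - s.lastRunUsec) >= kIdleThresholdUsec;
}
```

The 0.1% figure in the comment follows from the same numbers: 10ms of compilation overhead per 10s of wall-clock time is 10/10000 of the time.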
Assignee: general → bhackett1024
(In reply to comment #2)
> Generally the goal should be to discard those JITed scripts that provide the
> least performance gain. Those usually are scripts which are not running (e.g.
> in background tabs) or scripts that only run spuriously.

I was initially in favor of time-based purging but preferred a max-limit strategy for two reasons.  One is the intermittent script execution problem you mentioned.  The other is the case of scripts run in response to user input.  In both cases, lazy script compilation would reduce latency for short-running scripts, so that's good.  However, with lazy compilation, long-running scripts that benefit from the mjit, tjit and (later) type inference will perpetually give "just loaded" performance.  One scenario where this could be felt is a browser session with multiple open webapps that the user switches between.

The downside of the max-limit-based strategy, of course, is that over a period of inactivity it'll hold more memory alive.  This suggests using both time and a memory limit to purge, except with a much longer period of inactivity.  I suspect a realistic purging strategy would ultimately contain a mix of several such heuristics.
(In reply to comment #3)
> However, with lazy compilation, long-running scripts
> that benefit from the mjit, tjit and (later) type inference will perpetually
> give "just loaded" performance.  One scenario where this could be felt is a
> browser session with multiple open webapps that the user switches between.
Only if they get GCed in the meantime due to inactivity, where inactivity means they're holding memory without trading that memory for CPU time, i.e. wasted memory.

Webapps may have timers running (ajax refresh) that would prevent them from getting GCed anyway.


Anyway, a big issue is wasted memory in inactive tabs. I have a session with 300 tabs of which I'll only use about 60-100 a day, depending on what I'm doing. Since all tabs get loaded at startup and their JS gets compiled, it would be nice if there were some mechanism eventually leading to the JIT code being discarded.

> I was initially in favor of time-based purging but preferred a max-limit
> strategy for two reasons.
Well, my concern is that a max limit may not be adaptive enough. A very heavy session might constantly run into it, leading to performance degradation due to constant recompiling (or a complete lack of JITed code), while it may be wasteful for a very small session on a weak system (such as a netbook).




Generally a max limit does not feel like "as much as possible and as little as necessary"; it's rather arbitrary.
(In reply to comment #4)
> Only if they get GCed in the meantime, due to inactivity

GC is scheduled when the browser is idle.  There are many idle moments, even when actively using a web browser, so, e.g., you may use a web app, switch away for a few seconds, come back, and the app's script's jit code has been flushed.

> Webapps may have timers running (ajax refresh) that would prevent them from
> getting GCed anyway.

I think you are misunderstanding the proposal.  If a script is GCed, we immediately release its associated mjit resources (currently), so that's not the problem.  The issue is scripts that are live, but have attached infrequently-used mjit data that we want to throw away.
Going to work on a patch for this. I agree with comment 3 that we'll end up using a mix of heuristics, and mostly need a mechanism for discarding JIT code from GC.  As a first approximation, though, I think a max limit is unappealing for the reasons in comment 4, and evicting scripts from compartments that haven't been used in a while looks nice (if there's an easy way to get that info).

Even if the initial heuristics suck you can make things more aggressive/adaptive by remembering on a script how many times it's been compiled, and reducing the likelihood of throwing away the JIT code on scripts that have been repeatedly compiled.
(In reply to comment #5)
> (In reply to comment #4)
> > Only if they get GCed in the meantime, due to inactivity
> 
> GC is scheduled when the browser is idle.  There are many idle moments, even
> when actively using a web browser, so, e.g., you may use a web app, switch away
> for a few seconds, come back, and the app's script's jit code has been flushed.

Then you misunderstood my reasoning. Not every GC should discard JIT code. I meant that during a GC we should discard JIT code that has not been used for N seconds; i.e., GCing should not discard all JIT code every time it runs.



> > Webapps may have timers running (ajax refresh) that would prevent them from
> > getting GCed anyway.
> 
> I think you are misunderstanding the proposal.  If a script is GCed, we
> immediately release its associated mjit resources (currently), so that's not
> the problem.  The issue is scripts that are live, but have attached
> infrequently-used mjit data that we want to throw away.

Yes, I know that. See above: my reasoning was that timers would touch scripts, reset their last-used timestamp, and thus prevent them from getting GCed.


Sorry for using the term garbage collection loosely.
Attached patch aggressive discard patch (obsolete) — Splinter Review
This patch does the simple and aggressive thing of waiting for a GC and then discarding all method JIT code in compartments that do not have active stack frames.  This will (probably) be too aggressive, but is a decent baseline --- now we need to decide what we want to keep, instead of what to discard.

To help with that, this patch also spews compilation/release data for all scripts and their compartments (in both debug and release modes, though I've only tested debug), to a file compilation.txt.  By post-processing with a script (to be attached shortly), we can measure how much time is spent compiling scripts for the first time, recompiling scripts, and how much code memory and JITScript memory is allocated and released.
Attached patch revised patch (obsolete) — Splinter Review
This adds timestamps and info about deleted scripts to the log, and removes all URI info (just pointer addresses, times and sizes are logged).  compilations.txt logs from browsing sessions with lots of tabs and/or JS-intensive tabs (e.g. gmail) would be very helpful!
Attachment #496519 - Attachment is obsolete: true
Attached file log script (obsolete) —
Script for getting basic compilation time and memory information from log files.
Attachment #496526 - Attachment mime type: application/octet-stream → text/plain
Run time of JS calls could be useful, to compare it to the compile time, i.e. to see how compiling overhead vs. compiled-code speedup play out against each other.
Yes, but it's hard to easily measure that info without instrumenting the generated code to see how often it runs.  Going to work on getting some semi-repeatable numbers for compilation times.
Overall time spent in content JS would work too; we could just subtract the compilation time, I think.
Depends on: 618056
Whiteboard: [cib-memory]
I rolled in a fix for bug 618056 and did some testing.  Numbers are from an OSX x64 release build.  Descriptions of the numbers:

Compile: Total time spent compiling new code.
Recompile: Total time spent recompiling code discarded by GC.
Allocated: Total allocation for JITScripts + JIT code (not including PIC stubs).

Workloads:

--- #1 Load 50 CAD tabs.

Compile: 879 ms
Recompile: 19 ms
Allocated: 117.7 MB

Per tab, this is 18ms to compile 2.4MB of code.  There's no reason I can see to keep code like this around; we should be aggressive in throwing it away.

--- #2a Sign in to GMail, open inbox (one tab).

Compile: 275 ms
Recompile: 21 ms
Allocated: 41.3 MB

--- #2b View GMail messages.

Tab out of Firefox and GC, then look at signed in GMail tab, open a thread, click on three messages, go back to inbox.  Repeat five times.  Numbers are inclusive of workload #2a.

Compile: 337 ms
Recompile: 786 ms
Allocated: 141.5 MB

This is 160ms of recompilation time every time GMail is revisited.  Should try to keep this code around.

--- #3 Load 20 Google search tabs (numbers 'one' through 'twenty').

Compile: 355 ms
Recompile: 16 ms
Allocated: 50.8 MB

This is very similar to workload #1.

--- #4a Load 5 Google maps tabs, scroll around and zoom in somewhere.

Compile: 609 ms
Recompile: 71 ms
Allocated: 84.8 MB

--- #4b Revisit map tabs.

Tab out of Firefox, then visit each tab, zoom out, scroll somewhere else, zoom in.  Repeat twice.  Numbers are inclusive of workload #4a.

Compile: 660 ms
Recompile: 490 ms
Allocated: 134.1 MB

This is 50ms of recompilation every time a maps tab is revisited and used, putting this workload between workloads #1 and #2.

---

These suggest to me that it's more important to throw code away than to keep it around, but we should still try to keep the code for frequently accessed tabs like GMail.  I think LRU is overkill; the compiler should remember the last N (5?) compartments code was compiled for, and during GC throw away all code in other compartments which are not on the stack.  We shouldn't have to watch the actual execution (AutoEnterCompartment etc.), as GMail and maps execute some new code after tabbing into them.  I don't know if GMail executes new code when polling the server for new messages.
Forgot that compartments can be shared across tabs, and when opening up 50 CAD tabs everything is in the same compartment.  I'm going to guess the 50 CAD tab workload is not too realistic (any data available from Test Pilot?) but not *that* unrealistic --- it's probably common for someone to have many tabs open for different pages on the same site, mixed in with other tabs.

Given that the goal is to avoid code bloat in long lived browser sessions, and that recompilation overhead is not that huge, I think a simple solution is to just give compiled code a half life.  If you view a webpage, compile code for it and leave the tab open, after N amount of time (10 minutes?) 50% of that code will have been thrown away, 75% after 2N, 87.5% after 3N, etc. irrespective of how often that code runs.

Code from frequently accessed webpages will still occasionally be thrown away, but the recompilation penalty isn't excessive because most code will still be there when accessing the page.  And code from tabs which haven't been used in a while will almost all be gone, even if code in other tabs from the same compartment has been used recently.
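The half-life arithmetic in this proposal can be checked with a tiny sketch (exact in binary floating point, since each period just halves what remains):

```cpp
// Fraction of compiled code still resident after `periods` half-lives
// of length N, per the scheme described above.
static double SurvivingFraction(int periods) {
    double f = 1.0;
    for (int i = 0; i < periods; i++)
        f *= 0.5;  // each elapsed period halves the surviving code
    return f;
}
```

After N, 2N, and 3N this yields 50%, 25%, and 12.5% surviving, matching the 50% / 75% / 87.5% discarded figures in the comment.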
(In reply to comment #15)
> Forgot that compartments can be shared across tabs, and when opening up 50 CAD
> tabs everything is in the same compartment.

I haven't personally inspected the code, but that doesn't match up with what I've been told: every browsing context (tab/iframe) gets its own compartment.  Perhaps Andreas knows?
> that doesn't match up with what I've been told

Then you were told wrong.  Compartments are per-origin, not per-tab at the moment.  That might change at some point, though.

Brian, see http://mxr.mozilla.org/mozilla-central/source/xpcom/ds/nsExpirationTracker.h ?  We may be able to move it somewhere where jseng can use it too, if you think it would do what you want.
Attached patch patch (obsolete) — Splinter Review
This gives each JIT script a 1/8 life of 2 minutes, which works out to a 1/2 life of about 10 minutes.  After 2 minutes have elapsed, at the next GC all scripted call ICs are purged and every eighth compiled script has its JIT code thrown away.

I don't think an LRU structure is needed here, mainly because of the overhead in tracking LRU order at a script granularity (and tracking things at a compartment granularity won't work too well per comment 15).
Attachment #496525 - Attachment is obsolete: true
Attachment #496526 - Attachment is obsolete: true
Attachment #496984 - Flags: review?(dmandelin)
(In reply to comment #17)
> Then you were told wrong.

Well, it seems I was told speculatively.
Comment on attachment 496984 [details] [diff] [review]
patch

Looks correct to me, except for one possible quasi-typo but I have some styling nits. r+ with nits addressed. 

>diff --git a/js/src/jscompartment.cpp b/js/src/jscompartment.cpp
>--- a/js/src/jscompartment.cpp
>+++ b/js/src/jscompartment.cpp

> void
>-JSCompartment::sweep(JSContext *cx)
>+JSCompartment::sweep(JSContext *cx, uint32 jitReleaseInterval)

The name 'jitReleaseInterval' is somewhat unclear. I'm not sure what's better, though. 'Modulus' seems more technically correct but probably not helpful to most readers. Maybe just explain it more explicitly in a comment (e.g., below) along the lines of 'jitReleaseInterval = N means that 1 out of every N scripts is released; except 0, which means release none'. And 'counter' would be a better name than 'interval' for the local variable.

>+
>+    /*
>+     * The JIT release interval is the frequency with which we should discard the JIT
>+     * code for compiled scripts, zero to never discard JIT code. Ignore this for
>+     * compartments which currently have active stack frames.
>+     */
>+    uint32 interval = jitReleaseInterval;
>+
>     for (JSCList *cursor = scripts.next; cursor != &scripts; cursor = cursor->next) {
>         JSScript *script = reinterpret_cast<JSScript *>(cursor);
>+        if (script->hasJITCode()) {
>+            if (active || jitReleaseInterval == 0) {
>+                mjit::ic::SweepCallICs(script, false);
>+            } else {
>+                if (--interval == 0) {
>+                    mjit::ReleaseScriptCode(cx, script);
>+                    interval = jitReleaseInterval;
>+                } else {
>+                    mjit::ic::SweepCallICs(script, true);
>+                }
>+            }
>+        }

IMO this is better (I just like fewer branches), but it is a matter of taste:

              if (!active && jitReleaseInterval != 0 && --interval == 0) {
                  mjit::ReleaseScriptCode(cx, script);
                  interval = jitReleaseInterval;
              } else {
                  mjit::ic::SweepCallICs(script, true);
              }

>--- a/js/src/jscompartment.h
>+++ b/js/src/jscompartment.h

1-line comment documenting this would be good, in particular to emphasize that it is a mutable 'tracking' variable used during GC, and not something that can be relied upon at arbitrary times.

>+    bool                         active;

>diff --git a/js/src/jsgc.cpp b/js/src/jsgc.cpp

>+    /*
>+     * Figure out how much JIT code should be released from inactive compartments.
>+     * If multiple eighth-lifes have passed, compound the release interval;
>+     * if enough time has passed, all inactive JIT code will be released.
>+     */
>+    uint32 jitReleaseInterval = 0;
>+    int64 now = PRMJ_Now();
>+    if (now >= rt->gcJitReleaseTime) {
>+        jitReleaseInterval = 8;
>+        rt->gcJitReleaseTime += JIT_SCRIPT_EIGHTH_LIFETIME;
>+        if (now >= rt->gcJitReleaseTime) {
>+            jitReleaseInterval = 4;
>+            rt->gcJitReleaseTime += JIT_SCRIPT_EIGHTH_LIFETIME * 2;
>+            if (now >= rt->gcJitReleaseTime) {
>+                jitReleaseInterval = 2;
>+                rt->gcJitReleaseTime += JIT_SCRIPT_EIGHTH_LIFETIME * 2;
>+                if (now >= rt->gcJitReleaseTime) {
>+                    jitReleaseInterval = 1;
>+                    rt->gcJitReleaseTime = now;
>+                }
>+            }
>+        }
>+    }

Hmmmm. This could be less redundant as a loop with shifts. Are you concerned that it is perf-critical and the compiler won't unroll it? Might want to check that it matters before unrolling manually.
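For illustration, the tower of ifs could be collapsed into a loop along these lines. This is a sketch, not the landed code: names mirror the patch, `EIGHTH_LIFETIME` is reduced to one abstract time unit, and the doubling step credits the third period with a factor of 4 rather than 2:

```cpp
#include <cstdint>

static const int64_t EIGHTH_LIFETIME = 1;  // one abstract time unit

// Returns the release interval (8, 4, 2, 1, or 0 for "release nothing")
// and advances releaseTime, mirroring the unrolled logic in the patch.
static uint32_t ComputeReleaseInterval(int64_t now, int64_t &releaseTime) {
    if (now < releaseTime)
        return 0;                          // no eighth-life has elapsed yet
    uint32_t interval = 8;
    releaseTime += EIGHTH_LIFETIME;
    // Each further elapsed stretch halves the interval (releasing a
    // larger fraction) and credits a doubling amount of time.
    for (int64_t step = 2; interval > 1 && now >= releaseTime; step *= 2) {
        interval >>= 1;
        releaseTime += EIGHTH_LIFETIME * step;
    }
    if (interval == 1)
        releaseTime = now;                 // fully behind: release everything
    return interval;
}
```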

Also, I think the last increment to rt->gcJitReleaseTime wants to use a factor of 4.

Finally, the compounding doesn't compute the real compounding fraction. I think that is fine, just wanted to mention it.


>diff --git a/js/src/methodjit/Compiler.cpp b/js/src/methodjit/Compiler.cpp
>--- a/js/src/methodjit/Compiler.cpp
>+++ b/js/src/methodjit/Compiler.cpp

>-        if (slow) {
>-            if (!stubcc.jumpInScript(*slow, target))
>-                stubcc.jumpInScript(*slow, target);
>-        }
>+        if (slow && !stubcc.jumpInScript(*slow, target))
>+            return false;

Ouch. Good catch.
Attachment #496984 - Flags: review?(dmandelin) → review+
(In reply to comment #18)
> This adds a 1/8 life for each JIT script of 2 minutes, which works out to a 1/2
> life of about 10 minutes.  After 2 minutes have elapsed, at the next GC all
> scripted call ICs are purged and every eighth compiled script has its JIT code
> thrown away.
If I read the patch correctly it only checks for active stack frames. Wouldn't it be better to also check for active docshells, to make sure foreground tabs don't get GCed? This would avoid recompilation pauses (100ms is noticeable to humans) in active tabs and thus avoid regressions in responsiveness. I've been told that compilation does not happen on a separate thread, so it would be important to keep compilation on the active tab to a minimum, especially for JavaScript-heavy tech demos (e.g. rendering to a canvas) where such hiccups would be quite visible and compilation time would be above average.

We also might want to be slightly more aggressive about tabs that have been opened in the background (i.e. that have never been active) to quickly discard all JITed code when mass-opening tabs or after startup.

So basically... tab-activeness should be an additional factor to script-activeness.
Is there a good way to check for tab-activeness?  The basic problem described in comment 15 is that JS works at the granularity of compartments, which can span many tabs.

GCs that occur due to allocation by a tech demo etc. will have code from that compartment on the stack, and won't discard code from that compartment (and once those GCs are per-compartment, they shouldn't discard any code at all).

Even if recompilation occurs on the foreground tab (due to a GC triggered by browser inactivity, etc.), unless the browser is left alone for several minutes only 1/8 of the scripts would be collected, which would correspond to 20ms of recompilation on GMail.  It's not a pause either, as recompilation is done per-script (a few hundred usec each) and does not stop the world like the GC.
(In reply to comment #22)
> Is there a good way to check for tab-activeness?
The same way image discarding does, I would assume. Apparently they're using the isActive property of the DocShell objects for that. I assume this is accessible from native code too.
Sorry for the spam, I missed that part.

(In reply to comment #22)
> GCs that occur due to allocation by a tech demo etc. will have code from that
> compartment on the stack, and won't discard code from that compartment (and
> once those GCs are per-compartment, they shouldn't discard any code at all).
Is there something that guarantees that property?

Because JavaScript, even in tech demos, does not run continuously, since it would block the UI if it did. It's either event-triggered or run via setTimeout/setInterval. So there will be times where no JS is running. If a GC occurs during that gap then there would be no stacks for that compartment, correct?

So it seems to me that this activity check is quite brittle.
The activity check is mostly there for safety, as I understand, so we don't discard jit code we're in the middle of running.

Going from a script to a corresponding docshell is nontrivial.  It _can_ be done for scripts running in a window, with some work (walk up to the global object, etc).  But it wouldn't really work for scripts in workers, for example.

Furthermore, nothing says one of those background tabs isn't doing a computation for you that you actually care about.  Unlike discarding images (which you can't see, since the tab is in the background), JS behavior of backround tabs is observable.

We should have a way of backgrounding tabs in a way that completely suspends all script execution in them, but that's a completely separate issue.
(In reply to comment #24)
> Because javascript, even in techdemos, does not run continously since it would
> block the UI if did. It's either event-triggered or run via
> setTimeout/setInterval. So there will be times where not JS is running. If a GC
> occurs during that gap then there would be no stacks for that compartment,
> correct?
> 
> So it seems to me that this activity check is quite brittle.

Can you point me to some of these tech demos so I can look at them?  Per the numbers in comment 14, I think we can get away with happily throwing away some code (just not all code) from all inactive compartments as the recompilation cost just isn't all that much.  GMail uses a lot of javascript, and even if it isn't as JS bound as a tech demo, it probably has more actual code and that's what counts for compilation time.
FWIW: Valgrind's code cache is divided into 8 sectors.  When the cache fills up one of the sectors is discarded.  The discard order is FIFO.  FIFO is much easier than LRU to implement and gives reasonably good results.

(In reply to comment #14)

> --- #1 Load 50 CAD tabs.
> 
> Compile: 879 ms
> Recompile: 19 ms
> Allocated: 117.7 MB

These numbers are prior to your patch, right?
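The Valgrind-style FIFO sector scheme mentioned in this comment could be sketched as follows. The class, capacity handling, and sector sizes are invented for illustration; Valgrind's actual cache works on translation sectors, not this structure:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a code cache split into up to 8 fixed sectors, with the
// oldest sector discarded (FIFO) whenever the cache overflows.
class SectorCache {
    static const size_t kSectors = 8;
    std::vector<uint64_t> sectors_;  // bytes per live sector, oldest first
    uint64_t capacity_;

  public:
    explicit SectorCache(uint64_t capacity) : capacity_(capacity) {}

    // Add a sector of compiled code; evict oldest sectors on overflow.
    void AddSector(uint64_t bytes) {
        sectors_.push_back(bytes);
        while (TotalBytes() > capacity_ || sectors_.size() > kSectors)
            sectors_.erase(sectors_.begin());  // FIFO: drop the oldest
    }

    uint64_t TotalBytes() const {
        uint64_t total = 0;
        for (uint64_t b : sectors_)
            total += b;
        return total;
    }

    size_t LiveSectors() const { return sectors_.size(); }
};
```

The appeal is that eviction needs no per-script usage tracking at all, only insertion order.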
(In reply to comment #26)
> GMail uses a lot of javascript, and even if it
> isn't as JS bound as a tech demo, it probably has more actual code and that's
> what counts for compilation time.
Right, I hadn't thought of that.

Anyway, I hope this one will do: http://mariuswatz.com/works/abstract01js/index_auto.html
Significant halting should be visible on it.
(In reply to comment #27)
> FWIW: Valgrind's code cache is divided into 8 sectors.  When the cache fills up
> one of the sectors is discarded.  The discard order is FIFO.  FIFO is much
> easier than LRU to implement and gives reasonably good results.(In reply to
> comment #14)
> 
> > --- #1 Load 50 CAD tabs.
> > 
> > Compile: 879 ms
> > Recompile: 19 ms
> > Allocated: 117.7 MB
> 
> These numbers are prior to your patch, right?

These numbers are from the patch in comment 9 (collect stuff as soon as possible).  I omitted info about released data --- pretty much everything is released, since I tabbed out before measuring.
Attached patch pool patchSplinter Review
This restructures things to randomly pick pools to destroy rather than scripts.  During script sweep, we purge all ICs which have stubs, and pick every Nth pool containing a JITScript's code to destroy.  All scripts referring to that pool are released, which will probably destroy the pool itself.  This avoids fragmentation seen with the previous patch caused by destroying most scripts in a pool but not enough to free the pool itself (pools are 64KB, which typically will hold a dozen or more scripts).
Attachment #496984 - Attachment is obsolete: true
Attachment #497506 - Flags: review?(dmandelin)
Brian, do you have any measurements showing the effect of the patch?
(In reply to comment #30)
> Created attachment 497506 [details] [diff] [review]
> pool patch

You're still debugging a leak, right? I do have a couple of initial comments, one of which might be the leak.

1. I think the code below is wrong: I think a script can have both the normal and jit ctor, and presumably both are eligible for destruction.

>+            if (discardScripts) {
>+                if (script->jitNormal &&
>+                    ScriptPoolDestroyed(cx, script->jitNormal, releaseInterval, counter)) {
>+                    mjit::ReleaseScriptCode(cx, script);
>+                    continue;
>+                }
>+                if (script->jitCtor &&
>+                    ScriptPoolDestroyed(cx, script->jitCtor, releaseInterval, counter)) {
>+                    mjit::ReleaseScriptCode(cx, script);
>+                }
>+            }

2. Please add comments on the use of the generation and destroy flags. At first I thought the code was wrong, until I realized that the generation number was being used to decide whether to destroy each script exactly once during a cycle. Maybe a different name for generation, too, to indicate that it records the gc number.
(In reply to comment #32)
> You're still debugging a leak, right? I do have a couple of initial comments,
> one of which might be the leak.
> 
> 1. I think the code below is wrong: I think a script can have both the normal
> and jit ctor, and presumably both are eligible for destruction.
> 
> >+            if (discardScripts) {
> >+                if (script->jitNormal &&
> >+                    ScriptPoolDestroyed(cx, script->jitNormal, releaseInterval, counter)) {
> >+                    mjit::ReleaseScriptCode(cx, script);
> >+                    continue;
> >+                }
> >+                if (script->jitCtor &&
> >+                    ScriptPoolDestroyed(cx, script->jitCtor, releaseInterval, counter)) {
> >+                    mjit::ReleaseScriptCode(cx, script);
> >+                }
> >+            }

This patch is green on try, it's the one for bug 577359 that has a mysterious leak.  The above code should be OK, ReleaseScriptCode will release both jitNormal and jitCtor so this will have the effect of releasing both JITScripts if either one is in a pool marked for destruction.  That's a little stupid (I can fix), but shouldn't affect the results much as scripts with both JITScripts are rare.
(In reply to comment #31)
> Brian, do you have any measurements showing the effect of the patch?

Some numbers I just collected off a debug build (compilation times are much higher than on a release build).

Allocated/Released are the sum of JITScript size and code associated with JITScripts.  PoolAllocated/PoolReleased are the size of the ExecutablePools which code is allocated from (directly measuring the amount of mmap'ed memory).

Start up, open 50 CAD tabs.

Compile: 7208 ms
Recompile: 3 ms
Allocated: 127.2 MB
Released: 29.7 MB
PoolAllocated: 109.7 MB
PoolReleased: 19.5 MB

Open up some mail applications, wait 10 minutes.

Compile: 13620 ms
Recompile: 149 ms
Allocated: 225.2 MB
Released: 147.5 MB
PoolAllocated: 186.1 MB
PoolReleased: 115.2 MB

Wait 10 more minutes, using email.

Compile: 14251 ms
Recompile: 617 ms
Allocated: 239 MB
Released: 198.9 MB
PoolAllocated: 197.8 MB
PoolReleased: 159 MB

Wait 30 more minutes (JSD2 meeting), reopen mail tabs.

Compile: 15292 ms
Recompile: 1614 ms
Allocated: 260.4 MB
Released: 249.7 MB
PoolAllocated: 216.3 MB
PoolReleased: 203.3 MB

What we'd like is for PoolReleased to track Released pretty well (so that we don't introduce much fragmentation), which it does.
(In reply to comment #34)
> Allocated/Released are the sum of JITScript size and code associated with
> JITScripts.  PoolAllocated/PoolReleased are the size of the ExecutablePools
> which code is allocated from (directly measuring the amount of mmap'ed memory).

Am I right in thinking that without your patch PoolReleased and Released would always be zero?

Also, is (PoolAllocated - PoolReleased) the current amount of mmap'd memory used by code?

It's good that the final results show only about a 10% time overhead for recompilation (1614 ms of 15292 ms total compile time), even though almost all of the code is released (249.7 MB of the 260.4 MB allocated).

Another question: the mmap'd space used for the native code is accompanied by some calloc'd space which holds ICs and other stuff.  Is it affected by this patch?
(In reply to comment #30)
> pool patch

Profiling w/ 10 cad-comic tabs.  It seems effective in that

1. without the patch, peak liveness from JSC::ExecutablePool::systemAlloc
   is about 37MB.  With it, peak liveness is 31.3MB.

2. over a period of about 30 minutes, most of that 31.3MB is discarded,
   so that only 3.3MB remains.

+1 from me.
Great news. Last question: have you run browser benchmarks with this on to make sure we don't regress there?
Comment on attachment 497506 [details] [diff] [review]
pool patch

Just realized I forgot to r+ it. There's just this advisory comment from before:

Please add comments on the use of the generation and destroy flags.  Maybe a different name for generation, too, to indicate that it records the GC number.
Attachment #497506 - Flags: review?(dmandelin) → review+
http://hg.mozilla.org/tracemonkey/rev/bd9cfa70bf18
Whiteboard: [cib-memory] → [cib-memory][fixed-in-tracemonkey]
Duplicate of this bug: 618056
(In reply to comment #21)
> We also might want to be slightly more aggressive about tabs that have been
> opened in the background (i.e. that have never been active) to quickly discard
> all JITed code when mass-opening tabs or after startup.

I've thought about this some more. What we really need is a way for a GC to discard *all* JITed code (except code on active frames) under OOM conditions.

The original issue behind this bug is OOM crashes on startup (see bug 590674). Since only 1/8th of the scripts are discarded after 2 minutes, that can already be too little, too late.

I'm not sure whether this is already implemented or just planned for the future. But if infallible malloc triggers a GC as a last-ditch effort, then that GC should free as much as it can.
#41 is right on point. We should add a "Purge-All-Caches" call that the embedding can call on OOM. The trace cache can be flushed as well.
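A sketch of what such a purge-all hook might look like (all names here are illustrative; this is not the real SpiderMonkey or embedding API — the point is only the shape of the last-ditch path):

```cpp
#include <cassert>

// Pretend flags standing in for the real work of discarding each
// rebuildable cache: method-JIT code for inactive scripts, recorded
// traces, and property caches.
struct Runtime {
    bool purgeMethodJIT;
    bool purgeTraceCache;
    bool purgePropertyCaches;
};

// The "Purge-All-Caches" entry point the embedding would call on OOM:
// everything that can be regenerated is dropped.
static void purgeAllCaches(Runtime *rt) {
    rt->purgeMethodJIT = true;
    rt->purgeTraceCache = true;
    rt->purgePropertyCaches = true;
}

// Last-ditch allocation path: if an infallible allocation is about to
// fail, purge all caches and retry once before giving up.
static bool lastDitchAlloc(Runtime *rt, bool (*tryAlloc)(Runtime *)) {
    if (tryAlloc(rt))
        return true;
    purgeAllCaches(rt);
    return tryAlloc(rt);
}
```

The purge only frees memory that can be rebuilt by recompilation, so the cost of calling it too eagerly is time, not correctness.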
http://hg.mozilla.org/mozilla-central/rev/bd9cfa70bf18
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
(In reply to comment #42)
> #41 is right on point. We should add a "Purge-All-Caches" call that the
> embedding can call on OOM. The trace cache can be flushed as well.

Should a new bug be created for this?
No longer depends on: 618056
As per today's meeting, beta 9 will be a time-based release. Marking these all betaN+. Please move it back to beta9+ if you believe it MUST be in the next beta (i.e., trunk is in an unshippable state without it).
No longer blocks: JaegerShrink
blocking2.0: beta9+ → betaN+
Why the removal of blocking bug 615199?
Also, this is already fixed on m-c, so surely the blocking status is irrelevant?
Christian, did you also mean to remove 16 people from the CC list?

I think your script has gone a bit haywire; it has made some strange changes to other bugs too (e.g. marking already landed/fixed bugs as blocking betaN rather than beta9, removing dependencies, etc.).
Re-adding the CCs removed by Christian's (presumably) malfunctioning script.

Also bump for comment 44, which might have been obscured in the muddle.
Depends on: 618056
Depends on: 623139
I presume mobile (in particular) could still benefit from the suggestion in comment 44? Has this been filed?