Closed Bug 932982 Opened 11 years ago Closed 10 years ago

IonMonkey: Don't throw away / invalidate active ion code during GC

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: h4writer, Assigned: bhackett1024)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [Shumway:m2])

Attachments

(1 file)

This used to be an utopia, but is coming closer and closer. The walls making this impossible are falling and there seems to be only a few things left, before this can become a reality.

This is important because we don't have GGC and a GC is still common. So every GC removes all ionscripts and we fall back to baseline. We need to compile every script again. As a result every GC adds some "startup time", before we are fast again. GC shouldn't push us on the ground and let us crawl, until we can stand up again.

With GGC this is also nice to have. We will still GC! So for keeping our performance during GC this will be much much much better. Especially for code that does graphical things. Constant speed is prefered in that case, without too big drops in performance.

I feel this bug is becoming more and more important. I spoke with nbp and terrence before to know what was needed, but didn't find time to look into this, especially since I'm fuzzy about a lot of details needed to implement this. Today Jan and Brian were talking about it. Both have more knowledge about it and Brian mentioned he would possibly take this after rounding up his other stuff. I'm opening this as a bug/feature/todo list. (Since I have other bugs that will depend on this)
Tested this on pdf.js with "gcPreserveCode()" and our score goes from 13873 to 17561. That would be a 26% improvement for pdf.js and bring us on par with chrome (v8).
Blocks: 807162
Since we expect Shumway to run pretty much permanently, this is of fairly high importance.
Blocks: 885526
Whiteboard: [Shumway:m2]
Attached patch patchSplinter Review
This patch removes the special type marking steps that need to be taken when we are preserving jitcode.  Instead of marking all scripts, type objects and singletons, we just mark the heap as normal and can collect whatever is dead.  Since discarding jitcode aggressively is still good for memory consumption this leaves the related heuristics unchanged, except that the lastCodeRelease catch is removed as we can still collect stuff when keeping jitcode around.  Presumably these heuristics can be relaxed to keep jitcode around more often once this patch goes in.

This patch seems stable with jit-tests with or without forcing ShouldPreserveJITCode to always return true.

Besides removal of markTypes, the changes this makes:

- Trace type constraints along with their associated type sets.  Since these constraints are what trigger invalidation they need to persist through the GC.  As with type sets though these just hold weak references.

- Remove NEW_SCRIPT_REGENERATE junk, which was a hack to deal with the fact we always threw away type constraints on GC.

- Compress the compiler outputs indexing jit compilations on GC, so that the array does not grow without bounds.

- Move some stuff from TypeCompartment to TypeZone for ease of use.

- Removal of the pending array on compartments.  This is kind of unrelated and is more cleanup; resolving constraints is no longer reentrant so the pending array isn't needed to avoid blowing the native stack.
Assignee: nobody → bhackett1024
Attachment #8338909 - Flags: review?(wmccloskey)
Attachment #8338909 - Flags: review?(jdemooij)
Depends on: 943667
Comment on attachment 8338909 [details] [diff] [review]
patch

Review of attachment 8338909 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Zone.cpp
@@ -85,5 @@
> -{
> -    /*
> -     * Mark all scripts, type objects and singleton JS objects in the
> -     * compartment. These can be referred to directly by type sets, which we
> -     * cannot modify while code which depends on these type sets is active.

Good to see this go.

::: js/src/jit/Ion.cpp
@@ +2541,1 @@
>      // If this script has Ion code on the stack, invalidation() will return

Pre-existing: s/invalidation()/invalidated()

::: js/src/jsinfer.cpp
@@ +4266,4 @@
>      /*
>       * Sweep analysis information and everything depending on it from the
>       * compartment, including all remaining mjit code if inference is
>       * enabled in the compartment.

This comment could use an update.
Attachment #8338909 - Flags: review?(jdemooij) → review+
Comment on attachment 8338909 [details] [diff] [review]
patch

Review of attachment 8338909 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks very much! Have you done measurements about the cost of the sweeping? We really need to do that before landing. I think we should run Gregor's MemBench test (http://gregor-wagner.com/tmp/mem) and record the time for PHASE_DISCARD_ANALYSIS in each slice, looking at how the distribution changes. Once you're ready to push this to try, can you post a link? I don't mind running the measurements.

Please update the comment here:
http://mxr.mozilla.org/mozilla-central/source/js/src/jsinfer.h#275
Also here? http://mxr.mozilla.org/mozilla-central/source/js/src/jsinfer.h#859

::: js/src/jsgc.cpp
@@ +2837,4 @@
>  ShouldPreserveJITCode(JSCompartment *comp, int64_t currentTime)
>  {
>      JSRuntime *rt = comp->runtimeFromMainThread();
>      if (rt->gcShouldCleanUpEverything || !comp->zone()->types.inferenceEnabled)

As long as you're touching this code, what's the purpose for the !comp->zone()->types.inferenceEnabled check here?

::: js/src/jsinfer.cpp
@@ +922,5 @@
> +        cx->zone()->types.addPendingRecompile(cx, script_);
> +    }
> +
> +    TypeConstraint *sweep(TypeZone &zone)
> +    {

Brace should go on previous line.

@@ +1031,5 @@
>                 ? property.maybeTypes()->isSubset(expected)
>                 : property.maybeTypes()->empty();
>      }
> +
> +    bool shouldSweep() { return false; }

I can't help but notice that every implementation of shouldSweep in the type constraints returns false. I guess they provide a nice place to comment on why we don't need to worry about sweeping, so it's probably okay.

@@ +1435,1 @@
>      {}

Braces should be moved up a line.

@@ +3120,5 @@
>  
>      void newType(JSContext *cx, TypeSet *source, Type type) {}
> +
> +    TypeConstraint *sweep(TypeZone &zone)
> +    {

Brace should go on previous line.

@@ +3176,5 @@
>              object->clearAddendum(cx);
>      }
> +
> +    TypeConstraint *sweep(TypeZone &zone)
> +    {

Brace should go on previous line.

::: js/src/jsinfer.h
@@ +1302,5 @@
>      bool pendingInvalidation_ : 1;
>  
> +    // During sweeping, the list of compiler outputs is compacted and invalidated
> +    // outputs are removed. This gives the new index for a valid compiler output.
> +    uint32_t sweepIndex_ : 29;

Can you put some code somewhere so that we MOZ_CRASH if we ever exceed 29 bits?

@@ +1428,5 @@
>      /* Pool for type information in this zone. */
>      static const size_t TYPE_LIFO_ALLOC_PRIMARY_CHUNK_SIZE = 8 * 1024;
>      js::LifoAlloc                typeLifoAlloc;
>  
> +    /* Valid & Invalid script referenced by type constraints. */

Could you improve this comment?
Attachment #8338909 - Flags: review?(wmccloskey) → review+
It just occurred to me that we can eliminate the code from bug 755604, either here or as a follow-up.
(In reply to Bill McCloskey (:billm) from comment #5)
> Thanks very much! Have you done measurements about the cost of the sweeping?
> We really need to do that before landing. I think we should run Gregor's
> MemBench test (http://gregor-wagner.com/tmp/mem) and record the time for
> PHASE_DISCARD_ANALYSIS in each slice, looking at how the distribution
> changes. Once you're ready to push this to try, can you post a link? I don't
> mind running the measurements.

This try run is looking pretty good.  Can you do the measurements?  Thanks!

https://tbpl.mozilla.org/?tree=Try&rev=3fffb7eca297
I measured and it looks pretty much like you would expect. Times get a little worse, but it's not too bad.

Here are the distributions for PHASE_DISCARD_ANALYSIS:

without this patch:
<5:53 <10:11 <15:8 <20:5 <25:0 <30:0 <35:1 <40:0

with this patch:
<5:41 <10:14 <15:6 <20:7 <25:1 <30:1 <35:0 <40:2

It's laid out as "<TIME:COUNT", so "<5:41" mean that 41 slices spent <5ms in PHASE_DISCARD_ANALYSIS. There are a few more slices on the high end with the patch, but it doesn't seem too bad. There's some natural variation between runs, so some of this could be noise anyway. The important thing is that there isn't a huge spike that was missing before.

I also looked at total slice times to make sure we're not blowing up somewhere else.

without this patch:
<12:269 <20:17 <30:19 <42:55 <50:10 <60:1 <80:0 <100:1 >=100:3

with this patch:
<12:210 <20:23 <30:19 <42:47 <50:10 <60:2 <80:1 <100:0 >=100:5

Again, looks similar but maybe a little worse. I looked specifically at the extra slices in the >=100 bucket. It looks like we just hit two additional non-incremental GCs in the run with the patch, which may also explain the higher numbers for PHASE_DISCARD_ANALYSIS.

Anyway, I feel comfortable taking the patch. The cost is reasonable, and it really will help us architecturally.
Keywords: perf
Restoring the test which causes us to throw away jitcode in non-inference-enabled compartments on every GC seems to make this nonsense orange go away.

https://hg.mozilla.org/integration/mozilla-inbound/rev/a29d7f5ddde6

https://tbpl.mozilla.org/?tree=Try&rev=83999c07630e
The assertion failure is actually a bug in bug 939614 exposed by an assertion added in this patch.  The mochitest-2 failure didn't happen in my Try run (https://tbpl.mozilla.org/?tree=Try&rev=e2bfcd435ff6) so it might be related to bug 939614 as well, but since you backed out both bugs at once we won't know I guess until this relands.
There were green M2 runs after bug 939614 landed until this patch landed.
Bug 939614 had several extant bugs when it landed which may have interacted with this patch.  As I said earlier, the mochitest-2 failure didn't happen on Try.
Well, this is still not showing up on try:

https://tbpl.mozilla.org/?tree=Try&rev=690be4075355

Does anyone have any ideas on how to deal with this?  I'm leaning towards WONTFIX.
https://hg.mozilla.org/integration/mozilla-inbound/rev/2e5ff5614254

Try looks good. Let's try relanding this with a clobber and see if that makes a difference.
Thanks Ryan!
https://hg.mozilla.org/mozilla-central/rev/2e5ff5614254
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Blocks: 984537
You need to log in before you can comment on or make changes to this bug.