Closed Bug 1534859 Opened 6 years ago Closed 6 years ago

Preliminary measurements to anticipate the impact of Shypes (and intermediate steps) on performance

Categories

(Core :: JavaScript Engine, enhancement, P3)


RESOLVED FIXED

People

(Reporter: mgaudet, Assigned: mgaudet)

References

Details

Attachments

(2 files)

We are interested in perhaps pursuing Shypes; however, before we do that we have some research questions that should be answered to anticipate the impact on performance (CPU and memory usage).

Research Questions

  1. What fraction of shapes see objects allocated from multiple groups?
  2. Would rooting shape trees in object groups be a memory win or loss?
  3. How effective would our shape guards be in a world where a shape guard also implies group?
    • Would our shape guards fail more frequently because of mismatched groups?
    • How many GuardShape/GuardGroup pairs collapse into GuardShape?

Experiment Description:

To answer RQ1 and RQ2 I performed the following experiment:

  1. Annotate calls to ShapedObject::initShape and ShapedObject::setShape. Dump the following:
    • Object pointer
    • Initial Shape (and depth of shape chain)
    • Final Shape (and depth of shape chain)
    • group (nullptr in case of lazy group)
  2. Annotate calls to JSObject::setGroup() to ensure we see group updates that happen during object construction.
  3. Annotate GC methods. Dump the following:
    • Finalization of object, shape and group.
    • Tenuring decisions: Allow tracking objects as they move from the nursery to the tenured heap.
  4. Run with
    • JIT compilers disabled: JIT compilers will allocate objects without exposing them to our hooks.
    • Compacting disabled: We need compaction disabled to preserve object identity.
    • UnboxedObjects disabled to control group changes.
    • Offthread GC: This is disabled to ensure we see GC events in the same timeline as all our other object events. This was accomplished by disabling all extra threads in a browser build.

Common Analysis Steps

Most of the analysis answering our research questions is based on the following basic steps.

  1. Walk through the chronologically ordered log, and keep only the last seen object shape and group (see Group Transitions under Threats to Validity). The processor uses versioned addresses to disambiguate between recycled garbage-collected pointers.

  2. Walk through the list of objects with their final shapes and groups, and convert that to a list of shapes with a count of objects and set of unique groups seen, along with the count of objects seen for each group.

  3. For each shape and group combination output the number of objects with that shape and group, as well as the probability that an object with that given shape has the group. (shape+group object count divided by shape only object count).

Shapes include their depth in the chain, which can be used to weight the increase in size.

This final data structure, a set of shapes with associated groups and object counts, helps answer the research questions.
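As a concrete illustration, the three analysis steps above can be sketched roughly as follows. This is a minimal sketch, not the actual analysis script; the log record fields `object`, `shape`, and `group` are hypothetical stand-ins for whatever the real spew format contains, and the versioned-address keys are assumed to already disambiguate recycled pointers:

```python
from collections import Counter, defaultdict

def last_seen_states(log):
    """Step 1: keep only the last (shape, group) seen per versioned object address."""
    last = {}
    for event in log:  # log is assumed chronologically ordered
        last[event["object"]] = (event["shape"], event["group"])
    return last

def shapes_to_groups(last):
    """Step 2: invert into shape -> per-group object counts."""
    by_shape = defaultdict(Counter)
    for shape, group in last.values():
        by_shape[shape][group] += 1
    return by_shape

def shape_group_probabilities(by_shape):
    """Step 3: for each (shape, group), the object count and
    P(group | shape) = shape+group count / shape-only count."""
    rows = []
    for shape, groups in by_shape.items():
        total = sum(groups.values())
        for group, n in groups.items():
            rows.append((shape, group, n, n / total))
    return rows
```

The fraction of shapes whose group set has size greater than one is then the RQ1 statistic reported below.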

Memory Computation

Using previously gathered logs, assume that each shape of depth N with M groups requires (M - 1) * N new shapes to be allocated, and each object shrinks by one pointer (the group).

Each shape is measured to be 48 bytes on my machine.

Note: This would overestimate the amount of shapes to be allocated, as this computes the cost of duplicating shape lineages, but our use of shape trees would collapse some lineages.
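The memory computation above can be written out as a short sketch (the function names are my own; the 48- and 8-byte constants are the measured sizes quoted above):

```python
SHAPE_BYTES = 48  # measured shape size on my machine
PTR_BYTES = 8     # group pointer removed from each object

def new_shapes_required(shapes):
    """shapes: iterable of (depth N, group_count M) pairs.
    Each shape of depth N with M groups is assumed to require
    (M - 1) * N new shapes (an overestimate; see note above)."""
    return sum((groups - 1) * depth for depth, groups in shapes)

def estimate_memory_delta(new_shapes, allocated_objects):
    """Bytes gained by new shapes minus bytes saved by shrinking
    each object by one pointer. Negative = estimated net savings."""
    return SHAPE_BYTES * new_shapes - PTR_BYTES * allocated_objects
```

Plugging in the Facebook numbers reported below (189695 new shapes, 1290174 allocated objects) reproduces the -1216032 byte estimate.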

Preliminary Website Reports

The following results have been copied and pasted directly from the analysis script:

Facebook

Visit facebook.com, then facebook.com/videogames

This log appears to require the allocation of 189695 more shapes

If we assume that everything allocated is live (bad assumption!) we could expect to see memory increase by -1216032 bytes
(48 * new shapes 189695  ) - (8  * allocated_objects 1290174 ) bytes

83.98993% of shapes had only one group; the remaining 16.01007% had multiple groups

Google Docs

Visit a complex google document, and type a bit

This log appears to require the allocation of 54895 more shapes

If we assume that everything allocated is live (bad assumption!) we could expect to see memory increase by -14913656 bytes
(48 * new shapes 54895  ) - (8  * allocated_objects 2193577 ) bytes

85.94751% of shapes had only one group; the remaining 14.05249% had multiple groups

CNN

Visit CNN.com, scroll to bottom

This log appears to require the allocation of 168190 more shapes

If we assume that everything allocated is live (bad assumption!) we could expect to see memory increase by -798768 bytes
(48 * new shapes 168190  ) - (8  * allocated_objects 1108986 ) bytes

79.77963% of shapes had only one group; the remaining 20.220367% had multiple groups

Wikipedia:

Visit en.wikipedia.com, and then visit featured article.

This log appears to require the allocation of 65027 more shapes

If we assume that everything allocated is live (bad assumption!) we could expect to see memory increase by -285392 bytes
(48 * new shapes 65027  ) - (8  * allocated_objects 425836 ) bytes

83.974594% of shapes had only one group; the remaining 16.025406% had multiple groups

Initial Analysis

RQ1: What fraction of shapes see objects allocated from multiple groups?

Depending on the scripts, about 1/5 seems to be a good estimate.

RQ2: Would rooting shape trees in object groups be a memory win or loss?

The computation the current analysis script does is very simple; however, there does seem to be evidence that we'd be able to save memory by rooting the shapes under the object groups.

RQ3: How effective would our shape guards be in a world where a shape guard also implies group?

I don't have evidence for or against this yet: Answer TBD.

Threats to Validity

Group Transitions

We see objects changing their groups during runs at a couple of points:

Lazy groups

Objects may transition from having a lazy group to a regular group, and vice-versa. This presents a challenge to our group counting analysis.

The current analysis code only keeps track of the last group seen for a particular object.

newScript groups

  • UpdateShapeTypeAndValueForWritableDataProp changes the group of an object to the group specified by the newScript.
    • The way this is handled in our current analysis is to ignore the change in group and only count the final group. However, this may be incorrect: it may make more sense to track a group set.
  • CreateThisForFunctionWithGroup - This appears to be a transient group assignment with a short lifespan.

Browser Shutdown

The browser shuts down aggressively. This truncates our logs. To work around this, our parser accepts partially terminated inputs and generally reflects the state of the world at shutdown.

Object Liveness

Memory estimates are computed at the end of the run, rather than as a running estimate updated on every event, and so don't capture the high-water mark of degradation or savings.

Website data was gathered like this:

SPEW_FILE=$1 SPEW=ShypesData ./mach run --setpref security.sandbox.content.level=0 --setpref javascript.options.mem.gc_compacting=false --setpref javascript.options.baselinejit=false --setpref javascript.options.ion=false --disable-e10s $1

I should note: The data gathering patch is attached because there's definitely subjectivity in how all this is put together.

Followup Investigation

I may not get to this soon, as I have some other work I'd like to get to first, but I want to write down some followup investigations that could be done to help us determine whether or not to proceed.

Many-group'd shapes:

Some shapes see thousands of groups associated with them over the course of a log. For example, selecting the 10 shapes with the highest group counts from my Wikipedia log, we see:

group_count shape.depth
       11141           1
        4058           2
        3821           2
        2959           1
        2953           1
        1876           1
        1747           2
        1159           1
        1136           1
         412           1

So we see lots of shapes with low depth and high group counts. There are a couple of causes we know about that could lead to this: object literals get their own groups, as do functions. However, it does seem odd that we see 11141 literals or functions with one property; that seems high, so it might be worth investigating where these come from and if they pose a problem for our proposals.

RQ3: How effective would our shape guards be in a world where a shape guard also implies group?

The logging I set up before probably won't suffice to answer this question. One way to answer it would be to instrument our interpreter to compute, for each bytecode, the list of (Shape, Group) pairs seen -- from that we could estimate the effectiveness of our inline caches.

(In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #2)

So we see lots of shapes with low depth and high group counts. There are a couple of causes we know about that could lead to this: object literals get their own groups, as do functions. However, it does seem odd that we see 11141 literals or functions with one property; that seems high, so it might be worth investigating where these come from and if they pose a problem for our proposals.

In case it helps: one way to do this would be to log group->clasp()->name (Function, Object, Array, etc).

Addressing RQ3: How effective would our shape guards be in a world where a shape guard also implies group?

To address RQ3 I tweaked the instrumentation system developed above. This time I replaced the shape change instrumentation with instrumentation in the IC fallback stubs.

This instrumentation logged for each fallback stub the group and shape of each argument.

In addition to logging, the ICs were disabled so no ICs would attach, to ensure we sampled every single input to the fallback stubs. When the input is an object, we log the shape and group of said object, along with the current opcode, pc, and argument index (to allow differentiating between LHS and RHS for the same IC).

Analysis

For each IC Argument (identified by the opcode, pc, and arg index), the following is done:

We gather the set of (Shape, Group) pairs, then count the number of shapes that were seen with more than one group.

  • A value of zero indicates that for every input to the IC all shapes seen were always seen with the same group.
  • A value of N indicates that N shapes saw more than one group.

This computed metric, which I have called Groups Per Shape for no good reason, is a rough surrogate metric for the number of ICs you might expect to have to attach over and above the ICs we would have attached by default.
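The GPS computation described above can be sketched roughly as follows (the tuple layout of the log samples and both function names are my own assumptions, not the actual instrumentation format):

```python
from collections import defaultdict

def groups_per_shape(samples):
    """samples: iterable of (opcode, pc, arg_index, shape, group) tuples
    from the fallback-stub log. Returns a map from each IC argument site
    (opcode, pc, arg_index) to its GPS value: the number of shapes at
    that site that were seen with more than one group."""
    seen = defaultdict(lambda: defaultdict(set))  # site -> shape -> groups
    for opcode, pc, arg_index, shape, group in samples:
        seen[(opcode, pc, arg_index)][shape].add(group)
    return {site: sum(1 for groups in shapes.values() if len(groups) > 1)
            for site, shapes in seen.items()}

def high_water_overhead(gps):
    """sum(GPS) / count(IC locations): the high-water-mark estimate of
    the fractional increase in ICs required."""
    return sum(gps.values()) / len(gps)
```

A GPS of zero at a site means every shape seen there always arrived with the same group; `high_water_overhead` corresponds to the per-benchmark summary ratio reported below.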

Measurements

I used the following preferences

// Shypes Data Collection
// - Disable sandbox for logs
user_pref("security.sandbox.content.level", 0);
// - Disable compacting so we know what is live (no moving)
user_pref("javascript.options.mem.gc_compacting", false);
// - Disable ion so we see IC hits
user_pref("javascript.options.ion", false);

added to testing/profiles/perf/user.js.

Using the raptor testing suite, I gathered data from a couple of the raptor benchmarks: raptor-tp6-facebook-firefox, raptor-tp6-docs-firefox, raptor-tp6-reddit-firefox run with --page-cycles 1 as I didn't need repetition.

Results

Distribution:

Looking at the distribution of Groups-Per-Shape (GPS), it seems that we mostly see a power-law curve, where a value of zero is 8-10x more frequent than a value of 1, and so on.

The largest value of GPS observed was 157.

High Water Mark Overhead

One way to interpret the values of GPS collected is that the sum of the GPS is the count of new ICs that might need to be attached in a world where Shape implies group.

We can summarize this value as sum(GPS)/count(IC locations), which estimates the increase in ICs required. This estimate is, however, a high-water mark, as it does not account for the fact that it is unlikely all of these ICs would be live at the same time:

  • Facebook: 0.24
  • Reddit: 0.33
  • Docs: 0.02

The very low value for Docs corresponds to the fact that the Google Docs test appears to be very shape-group monomorphic overall, with very little occurrence of shape-group divergence.

Conclusion

My expectation is that our shape guards would largely work as they do today, with the small potential for increased IC chain length in some circumstances. Despite this, I don't think these data present a compelling fundamental obstacle to shypes.

Threats to validity:

  • I did not log script GC, so there's a possibility that different IC entries could be conflated. Because I keep track of the IC opcode, I don't consider this too severe a threat, but it is worth noting. Ultimately, if this were a problem, we would expect to see higher counts than if the issue were entirely removed.

  • This is of course a preliminary study, so I didn't run it across all of our raptor benchmarks; however, the preliminary results are promising, and I don't think they suggest any blow-out results in general (i.e., we didn't see a GPS ratio anywhere near 1).

(I should add a cautionary note: there could of course be pathological cases where we see catastrophic performance collapse relative to the existing system. Such cases are easy to imagine; however, it's difficult to evaluate the impact on a particular site. We could estimate frequency by summarizing the count of entries that contribute to the final number, though how meaningful that would be overall, I am not sure.)

I think we can close this as complete.

At this point, we're crossing the Rubicon into refactoring to support Shypes and prototyping to see Shypes' impact, which seems outside the scope of this bug.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
