Closed Bug 564522 Opened 14 years ago Closed 13 years ago

analyze and speed up access-binary-trees.js

Categories: Core :: JavaScript Engine
Type: defect
Platform: x86 macOS
Priority: Not set
Severity: normal
Status: RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)


Attachments

(1 file)

Attached file Cachegrind output
Sayre says:

> access-binary-trees is worrisome to me. It's the only test where The Problem
> isn't immediately obvious. Sometimes, it's easy to find bad growth
> strategies in our code base that contribute to allocation hotspots (string
> growth seems to be one). But not in this case. So it seems to be something
> more general and bad about the way we manage memory, but I don't have a good
> handle on what that is. v8 has a much more elaborate setup than we do, but
> webkit doesn't. Both do much better on this stuff.

I've attached an i386 Cachegrind profile; it gives line-by-line instruction
counts.  Note that when compiling with optimisation, counts don't always get
attributed to lines exactly how you'd expect, especially in the presence of
inlining, but it gives a good idea.
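
(For reference, a profile like this comes from a standard Cachegrind run
followed by cg_annotate; the binary and script names below are illustrative,
not the exact harness used:)

      valgrind --tool=cachegrind ./js access-binary-trees.js
      cg_annotate --auto=yes cachegrind.out.<pid>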

js_AddProperty() features prominently: it appears multiple times in the
function-by-function counts at the top because other functions have been
inlined into it.  Those entries sum to about 8.5M instructions, i.e. just
over 10% of the total.

malloc, free and friends account for about 14.6M instructions, i.e. about 18%.

I also see this sequence in the LIR:

      &js_ObjectClass = immi 136562080
      js_NewInstance1 = calli.sro #js_NewInstance ( cx &js_ObjectClass/*136562080*/ $global0 )
      eqi8 = eqi js_NewInstance1, NULL/*0*/
      xt7: xt eqi8 -> pc=0x92c89b2 imacpc=(nil) sp+40 rp+0 (GuardID=003)

      sti.s sp[8] = js_NewInstance1
      fi = immi 153962908
      sti.r rp[0] = fi/*153962908*/
      sti.s sp[40] = NULL/*0*/
      ldi10 = ldi.o $global0[12]
      sti.s sp[48] = ldi10
      ldi9 = ldi.o js_NewInstance1[4]
      ~JSSLOT_CLASS_MASK_BITS = immi -4
      andi3 = andi ldi9, ~JSSLOT_CLASS_MASK_BITS/*-4*/
      clasp = immi 136562688
      guard(class is With) = eqi andi3, clasp/*136562688*/
      xt6: xt guard(class is With) -> pc=0x92c8409 imacpc=(nil) sp+56 rp+4 (GuardID=004)

I.e. we create an object with clasp=js_ObjectClass and then immediately test
if clasp==js_WithClass.  Will that test ever succeed?
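
For readers following the LIR: ldi.o js_NewInstance1[4] loads the object's
class word, and the andi with -4 (~JSSLOT_CLASS_MASK_BITS) strips the low
flag bits before comparing against the immediate &js_WithClass.  A minimal
C++ sketch of what the guard computes -- the struct is an illustrative
stand-in, not the real JSObject declaration:

      #include <cstdint>

      // Illustrative stand-in for the object header the trace reads.
      struct SketchObject {
          void     *map;        // offset 0: scope/shape map
          uintptr_t classword;  // offset 4 on i386: JSClass* | low flag bits
      };

      static bool GuardClassIsWith(SketchObject *obj, uintptr_t withClasp) {
          uintptr_t clasp = obj->classword & ~uintptr_t(3); // andi ldi9, -4
          return clasp == withClasp;                        // eqi ..., clasp
      }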

js_IdIsIndex() also ranks surprisingly high for such a simple function.  All
its calls come from JSScope::updateFlags(), which in turn is always called
from JSScope::extend().
Adding a lot of properties probably creates a lot of scopes, which we currently malloc. I have a patch to GC them instead; that might help quite a bit, particularly since our GC heap is thread-local and so avoids malloc/free locking. The reason the patch never landed is Talos: it subtly changes our memory use, moving score X by Y on Talos test Z, and I couldn't find out why.
I learned some things about this benchmark while trying to optimize adding properties with the PIC. 

- As you noticed, js_IdIsIndex takes a lot of the time (1/3 of the time inside js_AddProperty in my measurements). It can be specialized away at trace/PIC compile time by checking the id; see the sketch after this comment. There are other opportunities for special-casing here.

- The benchmark adds 3 properties to each object, so we allocate a dslots array. If we had more inline slots, we'd save an allocation on each object created.

- In our design, we have to make a new mutable scope for each object that actually gets a property, so we are allocating new scopes. If we got rid of JSScope, we wouldn't have to do that. This is bug 558451.

- We also need to take a lock to mutate the scope. Jason's GC futures stuff is going to get rid of that.
See also bug 561506 comment 13 for details on how to improve js_AddProperty.
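
To illustrate the specialization in David's first bullet: at trace/PIC compile
time the property id is a known constant, so "could this id be an index?" is
decidable up front and the runtime call can be compiled away.  A hedged sketch
of that check -- the names here are hypothetical, not from any actual patch:

      #include <cctype>
      #include <cstddef>

      // An array-index id is a string of decimal digits; an interned
      // identifier like "left" or "itemCheck" can never be one.
      static bool IdCouldBeIndex(const char *chars, std::size_t len) {
          if (len == 0)
              return false;
          for (std::size_t i = 0; i < len; i++) {
              if (!std::isdigit((unsigned char)chars[i]))
                  return false;
          }
          return true;
      }

      // At compile time: if !IdCouldBeIndex(chars, len) for the constant id,
      // emit the add-property fast path with no index check at all.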
(In reply to comment #1)
> Adding a lot of properties probably creates a lot of scopes, which we currently
> malloc. I have a patch to GC them. 

It would be nice to know what proportion of mallocs are from scopes, what proportion are from dslots, and what proportion are from everything else.
I'm going to take bug 558451 next week.

As David said in bug 564522, we really should not call js_IdIsIndex from (code jitted for) ops that access interned property names (atoms) -- that's just silly (my bad, layering and generality == slowness). Those ids must be lexically valid Identifiers (see the ECMA-262 lexical grammar), so cannot be indexes. Nick, want to file and fix?

/be
(In reply to comment #6)
> 
> As David said in bug 564522, we really should not call js_IdIsIndex from (code
> jitted for) ops that access interned property names (atoms) -- that's just
> silly (my bad, layering and generality == slowness). Those ids must be
> lexically valid Identifiers (see the ECMA-262 lexical grammar), so cannot be
> indexes. Nick, want to file and fix?

Filed as bug 564581.  I'm happy to attempt to fix, but I'll need to ask for some help on IRC along the way.
(In reply to comment #5)
> (In reply to comment #1)
> > Adding a lot of properties probably creates a lot of scopes, which we currently
> > malloc. I have a patch to GC them. 
> 
> It would be nice to know what proportion of mallocs are from scopes, what
> proportion are from dslots, and what proportion are from everything else.

It's dead easy to get this data from Shark. Here you go:

http://people.mozilla.com/~sayrer/2010/05/07/sunspider/mshark/access-binary-trees-malloc.mshark

Technique:

1.) visit http://people.mozilla.com/~sayrer/2010/05/07/sunspider/shark-instrumented/driver.html in a Shark build.

2.) set Shark to Sampling -> Programmatic 

3.) set the Sampling mode to Malloc Trace

4.) uncheck "Record Only Active Blocks"

5.) click the access-binary-trees link

This profile shows that 98.1% of the calls to malloc came from js_GetMutableScope; of the total, 2% arrived via js_Interpret and 96% via js_AddProperty.
So, bug 558451. Mine!

Comment 3 had this item:

> - They add 3 properties to the object, so we allocate a dslots. If we had more
> inline slots, we'd save an allocation on each object created.

But if the object is an instance of Object, then there are three fslots free after proto and parent. Comment 8 seems to confirm that dslots allocations are not an issue.

Am I interpreting this right? It would be good to know that, for access-binary-trees.js specifically, we have no dslots issue.
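
For reference, a sketch of the layout under discussion, assuming
JS_INITIAL_NSLOTS == 5 (proto, parent, and three free inline slots) as the
text above implies -- an illustrative declaration, not the real one:

      #include <cstdint>

      struct SketchJSObject {
          void      *map;        // scope / property map
          uintptr_t  classword;  // JSClass* | low flag bits
          uintptr_t  fslots[5];  // [0]=proto, [1]=parent, [2..4] free inline
          uintptr_t *dslots;     // overflow slots, malloc'd only when needed
      };

If that reading is right, the three properties each tree node gets fit in the
free inline slots, which matches comment 8's malloc breakdown.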

We can make objects bigger by default based on more general workloads. There are some bugs touching on this: bug 555128 (which I claim is mis-summarized: we do not want to merge fslots and dslots and lose small-constant-offsets-on-trace), and bug 502736, which has some data on object size populations.

/be
The even better bugs for right-sizing objects are bug 547327 and bug 557407.

/be
Depends on: 564581
Summary: analyze access-binary-trees.js → analyze and speed up access-binary-trees.js
(In reply to comment #0)
>
> I also see this sequence in the LIR:
> 
>       js_NewInstance1 = calli.sro #js_NewInstance ( cx &js_ObjectClass/*136562080*/ $global0 )
>       ...
>       ldi9 = ldi.o js_NewInstance1[4]
>       ~JSSLOT_CLASS_MASK_BITS = immi -4
>       andi3 = andi ldi9, ~JSSLOT_CLASS_MASK_BITS/*-4*/
>       clasp = immi 136562688
>       guard(class is With) = eqi andi3, clasp/*136562688*/
>       xt6: xt guard(class is With) -> pc=0x92c8409 imacpc=(nil) sp+56 rp+4 (GuardID=004)
> 
> I.e. we create an object with clasp=js_ObjectClass and then immediately test
> if clasp==js_WithClass.  Will that test ever succeed?

I filed bug 565250 to investigate this.
Depends on: 565250, 558451
We're fast enough on access-binary-trees now -- not as fast as either V8 or JSC, but close enough for it not to be a worry.  Bug 558451 undoubtedly helped a lot here.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED