Closed Bug 365851 (js-propcache) Opened 18 years ago Closed 16 years ago

get/set/call optimizations (return of the property cache)

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9beta4

People

(Reporter: brendan, Assigned: brendan)

References

Details

(Keywords: perf)

Attachments

(5 files, 22 obsolete files)

27.68 KB, patch
Details | Diff | Splinter Review
44.64 KB, application/x-tar
Details
20.16 KB, patch
igor
: review+
brendan
: approval1.9+
Details | Diff | Splinter Review
12.59 KB, text/plain
Details
201.78 KB, patch
shaver
: review+
Details | Diff | Splinter Review
First patch here will re-factor getter and setter code to pave the way for further optimizations.

/be
Attached patch cleanup patch (obsolete) — Splinter Review
Attachment #250391 - Flags: review?(igor.bukanov)
Attachment #250391 - Flags: review?(igor.bukanov) → review+
Attached patch cleanup patch, v2 (obsolete) — Splinter Review
Attachment #250391 - Attachment is obsolete: true
Attachment #250497 - Flags: review?(igor.bukanov)
Attachment #250497 - Flags: review?(igor.bukanov) → review+
Landed on trunk:

Checking in jsapi.c;
/cvsroot/mozilla/js/src/jsapi.c,v  <--  jsapi.c
new revision: 3.301; previous revision: 3.300
done
Checking in jscntxt.c;
/cvsroot/mozilla/js/src/jscntxt.c,v  <--  jscntxt.c
new revision: 3.98; previous revision: 3.97
done
Checking in jscntxt.h;
/cvsroot/mozilla/js/src/jscntxt.h,v  <--  jscntxt.h
new revision: 3.134; previous revision: 3.133
done
Checking in jsdbgapi.c;
/cvsroot/mozilla/js/src/jsdbgapi.c,v  <--  jsdbgapi.c
new revision: 3.83; previous revision: 3.82
done
Checking in jsgc.c;
/cvsroot/mozilla/js/src/jsgc.c,v  <--  jsgc.c
new revision: 3.196; previous revision: 3.195
done
Checking in jsinterp.c;
/cvsroot/mozilla/js/src/jsinterp.c,v  <--  jsinterp.c
new revision: 3.317; previous revision: 3.316
done
Checking in jsobj.c;
/cvsroot/mozilla/js/src/jsobj.c,v  <--  jsobj.c
new revision: 3.312; previous revision: 3.311
done
Checking in jsobj.h;
/cvsroot/mozilla/js/src/jsobj.h,v  <--  jsobj.h
new revision: 3.60; previous revision: 3.59
done
Checking in jsscope.c;
/cvsroot/mozilla/js/src/jsscope.c,v  <--  jsscope.c
new revision: 3.56; previous revision: 3.55
done
Checking in jsscope.h;
/cvsroot/mozilla/js/src/jsscope.h,v  <--  jsscope.h
new revision: 3.41; previous revision: 3.40
done

Onward!

/be
Status: NEW → ASSIGNED
This was a >2K code size reduction in jsinterp.o on my MacBook Pro, using -Os.  But it seems on argo-vm and luna to have bloated jsinterp.o.  Lack of -Os or -O2?

Anyone who can test Windows before/after jsinterp.obj size, please report it. Thanks,

/be
(In reply to comment #4)
> Anyone who can test Windows before/after jsinterp.obj size, please report it.

MSVC7.1 optimized:
Before: 64,924 bytes
After : 61,754 bytes
So we're using the wrong gcc options, I guess.  See bug 255433.

/be
Makefile.ref uses -O2 by default on Win32, my before/after results are:

Before patch applied:
jsinterp.obj 74387

After patch applied:
jsinterp.obj 72869


Testing with -O1 (which cl claims optimizes for minimum size) yields:

Before Patch applied:
jsinterp.obj 59343

After Patch applied:
jsinterp.obj 56935
Maybe I'm being dense, but I don't see the compile-flags relation to bug 255433
(In reply to comment #8)
> Maybe I'm being dense, but I don't see the compile-flags relation to bug 255433

Sorry, typoed bug 225433, which Stan Shebs has kindly adopted.

/be
Top of last /tmp/calltable.dump.N file after startup/load-gmail/click-message:

Method call summary:
   method callsites:   1801
 total method calls:  17240
native method calls:  12631 (73.27%)
  monomorphic calls:  12373 (71.77%)
monomorphic natives:   8595 (49.85%)
polymorphic-2 calls:   2057 (11.93%)
polymorphic-3 calls:    838 (4.86%)
polymorphic-4 calls:    618 (3.58%)
polymorphic-5 calls:    161 (0.93%)
polymorphic-6 calls:    120 (0.70%)
polymorphic-7 calls:     30 (0.17%)
polymorphic-8 calls:   1043 (6.05%)
  megamorphic calls:      0 (0.00%)

The majority of calls are native; a majority of callsites are monomorphic; over two-thirds of native callsites are monomorphic. There are zero megamorphic calls with a PIC of size 8, < 10% with size 4.

/be
Generalizing summary to fit problem space.

/be
Summary: getter and setter hard cases bloat and slow down critical paths → get/set/call optimizations
Suspect a bug due to mode at polymorphic-8 and 0 megamorphic... more tomorrow.

/be
Latest results (I've seen runs with non-zero megamorphic counts, so forget about the bug worry):

Method call summary:
        method callsites:   2742
      total method calls:  33663
   computed method calls:   3162
prototype chain searches:  31969
   prototype chain steps:  25279
    scope chain searches:    252
       scope chain steps:     59
  anonymous method calls:   2660
     native method calls:  25537 (75.86%)
     monomorphic natives:  15629 (46.43%)
       monomorphic calls:  23092 (68.60%)
     polymorphic-2 calls:   2127 (6.32%)
     polymorphic-3 calls:   1804 (5.36%)
     polymorphic-4 calls:   1270 (3.77%)
     polymorphic-5 calls:    494 (1.47%)
     polymorphic-6 calls:    414 (1.23%)
     polymorphic-7 calls:    615 (1.83%)
     polymorphic-8 calls:    685 (2.03%)
       megamorphic calls:      0 (0.00%)

This was startup/gmail/message/click-on-bugzilla-link/quit-closing-two-tabs.

Considering the low number of call sites, we could have a 4-entry PIC and use less than 175K.  Each entry would be either a method (function object) pointer, or an sprop (as with the old direct-mapped runtime-wide property cache that I recently removed).  It should be possible to split cases further to optimize for stub vs. non-stub getter and setter.

Since almost 80% of method calls search up one step in the prototype chain, would like to optimize to avoid looking in the directly-referenced receiver object when the cache says we're hitting one object "up" the prototype chain.  Notice how the scope chain searches hit on the head object more often (>75%, and such function or constructor calls are small potatoes compared to method calls).

/be
Attachment #250933 - Attachment is obsolete: true
I added function bytecode length dumping (at the end of the header line for each function), and sure enough:

$ grep ' calls ' /tmp/calltable.dump.3 | sort -n +2 -3 | tail -20
file:///Users/brendaneich/Hacking/trunk/mozilla/dist/MinefieldDebug.app/Contents/MacOS/components/nsExtensionManager.js:8085 calls 88 (4) argc 1/1 length 94
http://mail.google.com/mail/?view=page&name=js&ver=1u6fu1tdvx6hk:668 calls 101 (0) argc 0/0 length 24
http://mail.google.com/mail/?view=page&name=js&ver=1u6fu1tdvx6hk:668 calls 103 (0) argc 0/0 length 104
http://mail.google.com/mail/?view=page&name=js&ver=1u6fu1tdvx6hk:668 calls 103 (0) argc 0/0 length 54
http://mail.google.com/mail/?view=page&name=m_base&ver=aycm2mrckk0u:2 calls 106 (60) argc 1/1 length 149
chrome://global/content/globalOverlay.js:113 calls 112 (92) argc 1/1 length 92
chrome://global/content/globalOverlay.js:132 calls 112 (92) argc 2/2 length 79
chrome://global/content/globalOverlay.js:85 calls 112 (92) argc 1/1 length 238
http://mail.google.com/mail/?view=page&name=js&ver=1u6fu1tdvx6hk:909 calls 116 (0) argc 1/1 length 65
http://mail.google.com/mail/?view=page&name=m_base&ver=aycm2mrckk0u:25 calls 122 (101) argc 1/1 length 36
http://mail.google.com/mail/?view=page&name=js&ver=1u6fu1tdvx6hk:973 calls 133 (0) argc 0/0 length 5
http://mail.google.com/mail/?view=page&name=m_base&ver=aycm2mrckk0u:27 calls 138 (124) argc 1/1 length 11
file:///Users/brendaneich/Hacking/trunk/mozilla/dist/MinefieldDebug.app/Contents/MacOS/components/nsExtensionManager.js:6929 calls 159 (236) argc 2/2 length 83
file:///Users/brendaneich/Hacking/trunk/mozilla/dist/MinefieldDebug.app/Contents/MacOS/components/nsExtensionManager.js:6949 calls 159 (109) argc 2/2 length 101
file:///Users/brendaneich/Hacking/trunk/mozilla/dist/MinefieldDebug.app/Contents/MacOS/components/nsSafebrowsingApplication.js:1867 calls 162 (122) argc 1/1 length 10
http://mail.google.com/mail/?view=page&name=m_base&ver=aycm2mrckk0u:25 calls 164 (128) argc 1/1 length 13
chrome://browser/content/bookmarks/bookmarksMenu.js:1013 calls 181 (281) argc 4/4 length 101
file:///Users/brendaneich/Hacking/trunk/mozilla/dist/MinefieldDebug.app/Contents/MacOS/components/nsExtensionManager.js:7878 calls 188 (161) argc 3/3 length 104
chrome://browser/content/browser.js:3371 calls 191 (0) argc 0/0 length 60
chrome://browser/content/bookmarks/bookmarksMenu.js:1040 calls 214 (113) argc 2/2 length 53

The 20 most frequently called functions tend to be small.  The top 10 might even inline into the PIC space if these functions are called from monomorphic sites. Have to work on the dump format and stats a bit more....

/be
Depends on: 335700
Depends on: 367080
Depends on: 366292
Depends on: 366396
Hope to make up for loss of property cache (bug 128150 by way of bug 408144), and then some.

/be
Flags: wanted1.9+
Priority: -- → P1
Target Milestone: mozilla1.9alpha1 → mozilla1.9 M11
Almost all callsites (all native method callsites) are monomorphic:

callsites 2047 mean polymorphism 1.03566 sigma 0.267493

hits           1765843
misses          126516
overloaded           0
megamorphic       3182
erecycles       126808
trecycles       129787
esweeps            292
tsweeps            311

No need for poly- when mono- will do, and we aren't inlining for 1.9 (tracing after 1.9 for scripted functions, and natives can't be inlined), so I'm focusing on caller->callee fast path using a single-entry cache per callsite.

Also want to avoid memoizing function values in the property tree. Finally, want to handle getters and setters but may split that out into a followup bug.

/be
From the patch, megamorphic here means > 4 callees for the cached callsite -- collisions in the two-way associative direct cache will overwrite some entries, possibly undercounting these; see |erecycles - esweeps|. Still highly likely to be one callee per callsite in our code, only a bit less likely in gmail code.

/be
Preliminary patch to clear out this old code, which I do not want to maintain or leave bloating the source and rusting. Better approach is the jspic.[ch] approach, which could be improved to have fewer collisions, but which has the virtue of being out-of-line and less intrusive. Could even use (possibly beefed up) debugger call hooks for zero source code touch plugability.

I'd like to get this in today, ASAP. Thanks,

/be
Attachment #293190 - Flags: review?(igor)
Attachment #293190 - Flags: review?(igor) → review+
Comment on attachment 293190 [details] [diff] [review]
remove DUMP_CALL_TABLE

a=me, not part of build.

/be
Attachment #293190 - Flags: approval1.9+
DUMP_CALL_TABLE removed:

js/src/jsgc.c 3.257
js/src/jsinterp.c 3.394
js/src/jsinterp.h 3.68
js/src/jsobj.c 3.407
js/src/jsstr.c 3.179

/be
Blocks: 408144
Blocks: 410994
Flags: blocking1.9?
Attached patch work in progress, backup mostly (obsolete) — Splinter Review
Flags: blocking1.9? → blocking1.9+
Depends on: 411630
Attached patch wip backup 2, getting closer (obsolete) — Splinter Review
Passes js testsuite (I exclude spidermonkey-n.tests and slow-n.tests) -- bc, cayou try it out some time? More work to do, missing capability revocation on scopechain shadowing, e.g.

/be
Attachment #295891 - Attachment is obsolete: true
(In reply to comment #23)

> 
>  bc, > can you try it out some time? 

sure, I normally don't run with slow-n.tests either. Should I? I'll make a stab a bit later with browser|shell. If you have a wip 3 patch in the works, let me know so I can test that instead.

(In reply to comment #23)

Lots of Assertion failure: PCVAL_IS_NULL(v) || PCVAL_IS_SPROP(v), at jsinterp.c:128

just one example of many: ecma/Types/8.6.2.1-1.js shell

#0  JS_Assert (s=0x81365c4 "PCVAL_IS_NULL(v) || PCVAL_IS_SPROP(v)", file=0x8136568 "jsinterp.c", ln=128) at jsutil.c:63
#1  0x0809b7fa in js_FillPropertyCache (cache=0x9bf8a70, pc=0x9c0a75b "6", scopeIndex=0, obj=0xb7f4bae0, protoIndex=0, sprop=0x9c13910, entry=0x9bfb3c8) at jsinterp.c:128
#2  0x080c9222 in js_SetPropertyHelper (cx=0x9c00e08, obj=0xb7f4bae0, id=-1208674788, vp=0xbf92ec60, entry=0x9bfb3c8) at jsobj.c:3775
#3  0x080ae484 in js_Interpret (cx=0x9c00e08, pc=0x9c0a75b "6", result=0xbf92ef88) at jsinterp.c:3815
#4  0x0809d4d5 in js_Invoke (cx=0x9c00e08, argc=4, vp=0x9c05880, flags=1) at jsinterp.c:1178
#5  0x0809e984 in js_InvokeConstructor (cx=0x9c00e08, vp=0x9c05880, argc=4) at jsinterp.c:1748
#6  0x080a8ef3 in js_Interpret (cx=0x9c00e08, pc=0x9c1239b "#", result=0xbf92f6b4) at jsinterp.c:3402
#7  0x0809dcba in js_Execute (cx=0x9c00e08, chain=0xb7f50000, script=0x9c122c8, down=0x0, flags=0, result=0xbf92f75c) at jsinterp.c:1403
#8  0x08061aa8 in JS_ExecuteScript (cx=0x9c00e08, obj=0xb7f50000, script=0x9c122c8, rval=0xbf92f75c) at jsapi.c:4774
#9  0x0804a270 in Load (cx=0x9c00e08, obj=0xb7f50000, argc=1, argv=0x9c0585c, rval=0xbf92f7f0) at js.c:626
#10 0x0809d449 in js_Invoke (cx=0x9c00e08, argc=1, vp=0x9c05854, flags=0) at jsinterp.c:1162
#11 0x080afc29 in js_Interpret (cx=0x9c00e08, pc=0x9c0df9e ":", result=0xbf92fe94) at jsinterp.c:4084
#12 0x0809dcba in js_Execute (cx=0x9c00e08, chain=0xb7f50000, script=0x9c0df60, down=0x0, flags=0, result=0xbf930f68) at jsinterp.c:1403
#13 0x08061aa8 in JS_ExecuteScript (cx=0x9c00e08, obj=0xb7f50000, script=0x9c0df60, rval=0xbf930f68) at jsapi.c:4774
#14 0x0804980d in Process (cx=0x9c00e08, obj=0xb7f50000, filename=0x0, forceTTY=0) at js.c:306
#15 0x08049f08 in ProcessArgs (cx=0x9c00e08, obj=0xb7f50000, argv=0xbf9310f8, argc=0) at js.c:532
#16 0x0804f216 in main (argc=0, argv=0xbf9310f8, envp=0xbf9310fc) at js.c:3561


or in the browser during startup: Assertion failure: PCVAL_IS_NULL(v) || PCVAL_IS_SPROP(v), at /work/mozilla/builds/1.9.0-test/mozilla/js/src/jsinterp.c:128

wip'it up again, maestro!
Silly bogus assertion -- pay it no mind ;-). New patch soon, opt testing of that patch instead of dbg should turn out ok (pls. holler if not).

/be
Plenty of crashes (signal 11's) on linux (centos5) and maci386 in the browser version of the test but with one crash which was not repeatable, I could not reproduce it locally on fedora7 or on maci386. Not sure I trust that. I'll test the new patch when its available.
Attached file GMail crash stack
Keywords: perf
Depends on: 413097
Attached patch Daddy brings home the bacon! (obsolete) — Splinter Review
Seems stable -- sayrer, go nuts.

Needs to be cleaned up and be split into blocking bugs, for less mega-fix-mashup landing/auditing pain.

Narrative about what this is soon, although Ecma TC39 meeting will take time the next two days.

/be
Attachment #296794 - Attachment is obsolete: true
bc any chance you can run through the regression suite (if you haven't already)
I had been running the suite until I moved to Firefox to battle with gmail (which does all sorts of fun prototype chain foreshadowing); running suite now, seeing fresh problems. Bob, no need to attach results here, I should have a new patch at some point today.

/be
(In reply to comment #32)

k. 
Attached patch Makin' bacon on the beach... (obsolete) — Splinter Review
This passes the suite with the same test failures (failure details differ in one case due to assertion differences here and on the trunk). Bob, please confirm.

Sayrer, I'm hoping you can take this for a spin.

/be
Attachment #298896 - Attachment is obsolete: true
Attachment #298900 - Attachment is obsolete: true
I'm running the shell test suite (e4x, ecma*, js1_[1-8]; excluding slow-n.tests) under valgrind on sm-valgrind01 now, it'll take a couple of hours. (Restarting the shell and reloading the drivers for every test takes its toll!)

Note to self or sayrer:
  cd tests/v-logs-bacon; ../../summarize-valgrind-logs.sh
  summary-errors-only.txt is the compact output (test nerrors ncontext)
  summary-unified.txt is the full list for sanity checking

Have a sunspider-under-valgrind run going now as well.
Sunspider is valgrind-clean on 64-bit Linux.
Attachment #299534 - Attachment is obsolete: true
Attachment #299535 - Attachment is obsolete: true
(In reply to comment #34)
> Created an attachment (id=299534) [details]

> This passes the suite with the same test failures (failure details differ in
> one case due to assertion differences here and on the trunk). Bob, please
> confirm.

Tested in shell on linux. I didn't seen any difference with this patch in either opt or debug builds. I'll try attachment 299713 [details] [diff] [review]) on other platforms later today.

Took 10 hours, but valgrind says you're clean on e4x+ecma*+js1_*.
Alias: js-propcache
Depends on: 414452
Attachment #299713 - Attachment is obsolete: true
Attachment #299714 - Attachment is obsolete: true
Fix was this, in jsscope.c:

-#define SPROP_FLAGS_NOT_MATCHED SPROP_MARK
+#define SPROP_FLAGS_NOT_MATCHED (SPROP_MARK | SPROP_FLAG_PCTYPE_REGEN)

/be
Attachment #299958 - Attachment is obsolete: true
Attachment #299959 - Attachment is obsolete: true
Does this need review?
Beltzner, we're getting there. This is going to go in tomorrow, I hope -- special dispensation. The prior patches are getting in (last one I want to split out is bug 414452).

/be
Sorry - it was an honest question, as no review flag was set, so I didn't know if it was pending, etc. As you were!
Attached patch better bacon (obsolete) — Splinter Review
Still have to deal with some issues, tomorrow.

/be
Attachment #300229 - Attachment is obsolete: true
Attached patch better bacon, passing mochitests (obsolete) — Splinter Review
I need to rest and write some comments tomorrow. Shaver has heard some of the pitch, and read some whiteboard scribblings. His suggestions for where to write a sentence or an essay are welcome.

/be
Attachment #300624 - Attachment is obsolete: true
Attachment #300827 - Flags: review?(shaver)
Attached patch without debugging glorp (obsolete) — Splinter Review
Attachment #300827 - Attachment is obsolete: true
Attachment #300827 - Flags: review?(shaver)
Attached patch lightly polished for review (obsolete) — Splinter Review
I'm too sick to write good comments where needed, but shaver will guide me.

/be
Attachment #250497 - Attachment is obsolete: true
Attachment #250999 - Attachment is obsolete: true
Attachment #300911 - Attachment is obsolete: true
Attachment #301231 - Flags: review?(shaver)
Attachment #301231 - Attachment is obsolete: true
Attachment #301235 - Flags: review?(shaver)
Attachment #301231 - Flags: review?(shaver)
Pushing this out to block beta 4 ...
Target Milestone: mozilla1.9beta3 → mozilla1.9beta4
Don't let the TM deter you - we want this in on the trunk as fast as humanly possible :-)  

Attached patch latest patch (obsolete) — Splinter Review
Some other checkin regressed gmail. My last pull was 1/31. Grumble. Rebuilding and testing, want to attach this before retiring. More tomorrow.

/be
Attachment #301235 - Attachment is obsolete: true
Attachment #301457 - Flags: review?(shaver)
Attachment #301235 - Flags: review?(shaver)
Updating cured whatever ailed gmail. Not this patch, which seems to be working well. Shaver, you reviewing?

/be
Yeah, I'm a little under halfway through, hope to get done tonight.  So far my list is just a couple of places that want more expansive comments, and a wish that you hadn't rolled as many clean-up fixes into such a patch. :)
(In reply to comment #57)
> Yeah, I'm a little under halfway through, hope to get done tonight.  So far my
> list is just a couple of places that want more expansive comments,

Great -- lay them on me sooner rather than later.

> and a wish
> that you hadn't rolled as many clean-up fixes into such a patch. :)

My weakness, but I think you protest too much: the only such clean-up left (I split real work into blocking bugs) is the JOF_TYPE/JOF_OPTYPE macrology, and that is easy on the eyes.

I'm a bit concerned that I'll ding Ts/Txul/Tp somehow (e.g., top-level scripts are manually managed, not GC-managed, so js_FlushPropertyCacheForScript may rear its ugly head). I'd like to check into a quiet tree for a test cycle or three. Suggestions?

/be
> I'm a bit concerned that I'll ding Ts/Txul/Tp somehow (e.g., top-level scripts
> are manually managed, not GC-managed, so js_FlushPropertyCacheForScript may
> rear its ugly head). I'd like to check into a quiet tree for a test cycle or
> three. Suggestions?
> 

Let's close the tree for this.  Just need to know when and we'll do it.
I was thinking of the OBJ_SET_SLOT -> STOBJ_SET_SLOT changes as cleanup -- if there is other significance to them I haven't figured it out yet, but I am still not to the exciting conclusion!

The comment preceding JSRuntime.pcTypeGen says the GC prevents wraparound, but by my eyes it's js_GeneratePCType that does so, and the GC merely assists by compressing the space.  In the js_GeneratePCType code, we don't handle the case in which GC can't free up a pctype, other than by asserting in DEBUG builds.  If that case is safe for non-DEBUG, it'd be good to have a comment there expressing how; likewise in js_TraceObject and js_SweepScopeProperties.  (Is the code in js_GC re-enabling the property cache only if the overflow bit is unset sufficient?  I'm pondering the other cases in which we increment rt->pcTypeGen with only an assertion as defense.)

js_EmitTree's TOK_DOT/TOK_NAME sharing via fall-through is pretty convoluted now, and I wonder if it's worth it in order to avoid duplication of the JSOP_DUP and EMIT_INDEX_OP (though without the ternary operator selection, were it split out).

+#define PCVCAP_SCOPEMASK_LO     JS_BITMASK(PCVCAP_SCOPEBITS)
+#define PCVCAP_SCOPEMASK_HI     (PCVCAP_SCOPEMASK_LO << PCVCAP_PROTOBITS)

I think HI wants to shift by SCOPEBITS, if that's what's defining the width of LO.  (Saved by the numeric coincidence!)

Have to finish making dinner now, back after to untangle the horrible diff produced in js_Interpret under the out: label!
(In reply to comment #60)
> I was thinking of the OBJ_SET_SLOT -> STOBJ_SET_SLOT changes as cleanup -- if
> there is other significance to them I haven't figured it out yet, but I am
> still not to the exciting conclusion!

They are totally significant (I was a bit over-eager before the most recent patches that fixed this).

The layering is such that all native object arbitrary slot setting must funnel through LOCKED_OBJ_WRITE_BARRIER/GC_WRITE_BARRIER, to re-brand the scope when a branded scope's object sees function-valued properties (aka "methods") change value (old one overwritten, new one replacing a non-function value).

To make a long story short, most slot-setting was properly monitored, but I had forgotten to fix js_SetSlotThreadSafe to use the right macros. Recall that the OBJ_SET_SLOT macro calls js_SetSlotThreadSafe if the object is not native or if its scope->ownercx != cx. Yes, all my JS_THREADSAFE tests involved request-locked, contention-free object usage.

Fixing this left assertions when bad old pre-fslots/dslots code used OBJ_SET_SLOT on the well-known fslots, especially JSSLOT_PRIVATE but also JSSLOT_PROTO and JSSLOT_PARENT.

> The comment preceding JSRuntime.pcTypeGen says the GC prevents wraparound, but
> by my eyes it's js_GeneratePCType that does so, and the GC merely assists by
> compressing the space.

Good point, I will re-word.

> In the js_GeneratePCType code, we don't handle the case
> in which GC can't free up a pctype,

Nit: it's not a matter of freeing anything -- the GC renumbers the live world. But I know what you mean: In case there are 2^28 distinct branded structural types and prefixes of those types.

> other than by asserting in DEBUG builds.

I do not expect this to happen in practice, but I would like to know. Am I abusing an assertion? Could do a printf. Anyway, overflow is not a problem, because in this unlikely event (2^28 distinct pctypes among the property tree nodes and live scopes) the GC leaves the property cache disabled, until the regenerated type count reduces to not overflow.

Oops, it disables only the current cx->thread's cache! Will fix in revised patch along with other stuff you point out. Thanks.
 
> If that case is safe for non-DEBUG, it'd be good to have a comment there
> expressing how;

Will be safe once all caches are disabled ;-).

> likewise in js_TraceObject and js_SweepScopeProperties.  (Is
> the code in js_GC re-enabling the property cache only if the overflow bit is
> unset sufficient?  I'm pondering the other cases in which we increment
> rt->pcTypeGen with only an assertion as defense.)

It has to increment, or we get aliasing. Are you worried about it wrapping in 32 bits before the GC notices? I don't think that's possible, at least not on 32-bit systems -- too many sprops and scopes.

> js_EmitTree's TOK_DOT/TOK_NAME sharing via fall-through is pretty convoluted
> now, and I wonder if it's worth it in order to avoid duplication of the
> JSOP_DUP and EMIT_INDEX_OP (though without the ternary operator selection, were
> it split out).

Will give it a try.

> +#define PCVCAP_SCOPEMASK_LO     JS_BITMASK(PCVCAP_SCOPEBITS)
> +#define PCVCAP_SCOPEMASK_HI     (PCVCAP_SCOPEMASK_LO << PCVCAP_PROTOBITS)
> 
> I think HI wants to shift by SCOPEBITS, if that's what's defining the width of
> LO.  (Saved by the numeric coincidence!)

No, PROTOBITS. Low-order nybble of vcap is protobits, high-order nybble is scope. This is the typical bitfield defining idiom where a non-right-justified field's mask is left-shifted by the width of the right-justified field after it.

> Have to finish making dinner now, back after to untangle the horrible diff
> produced in js_Interpret under the out: label!

Sorry, diff -w is clean there. Check out earlier -w attachments here. You caught me, that was purely an indentation fix (Igor got off by one due to a major comment ;-).

/be
Tired, sick: for 2^28 above read 2^24. Still way more types than any 32-bit embedding is likely to see. Will ponder 64-bit scaling.

/be
(In reply to comment #61)
> > In the js_GeneratePCType code, we don't handle the case
> > in which GC can't free up a pctype,
> 
> Nit: it's not a matter of freeing anything -- the GC renumbers the live world.
> But I know what you mean: In case there are 2^28 distinct branded structural
> types and prefixes of those types.

Yeah, I meant "make available a no-longer-live pctype".

> Oops, it disables only the current cx->thread's cache! Will fix in revised
> patch along with other stuff you point out. Thanks.

I'd like to pretend that this is the case I was eventually going to come to, if all readers here would indulge my delusion.

> It has to increment, or we get aliasing. Are you worried about it wrapping in
> 32 bits before the GC notices? I don't think that's possible, at least not on
> 32-bit systems -- too many sprops and scopes.

Yeah, that was the case I was worried about, possibly on 64-bit systems.  Agree that it's far-fetched.

> > +#define PCVCAP_SCOPEMASK_LO     JS_BITMASK(PCVCAP_SCOPEBITS)
> > +#define PCVCAP_SCOPEMASK_HI     (PCVCAP_SCOPEMASK_LO << PCVCAP_PROTOBITS)
> > 
> > I think HI wants to shift by SCOPEBITS, if that's what's defining the width of
> > LO.  (Saved by the numeric coincidence!)
> 
> No, PROTOBITS. Low-order nybble of vcap is protobits, high-order nybble is
> scope. This is the typical bitfield defining idiom where a non-right-justified
> field's mask is left-shifted by the width of the right-justified field after
> it.

If the low-order nybble is PROTOBITS, then why is _LO defined in terms of SCOPEBITS?  That's the inconsistency I was pointing out: we shift according to the width of PROTOBITS, but define the right-justified field in terms of SCOPEBITS, so if we were to increase SCOPEBITS by one, _LO would overlap _HI, no?

> Sorry, diff -w is clean there. Check out earlier -w attachments here. You
> caught me, that was purely an indentation fix (Igor got off by one due to a
> major comment ;-).

I'm out of steam here, been running on Dayquil all day.  I'll resume in the morning; bacon makes a fine breakfast, and the -w patch will keep it from being too gloomy a start. :)
(In reply to comment #63)
> Yeah, that was the case I was worried about, possibly on 64-bit systems.  Agree
> that it's far-fetched.

I'll add more defense.

> > > +#define PCVCAP_SCOPEMASK_LO     JS_BITMASK(PCVCAP_SCOPEBITS)
> > > +#define PCVCAP_SCOPEMASK_HI     (PCVCAP_SCOPEMASK_LO << PCVCAP_PROTOBITS)
[. . .]
> If the low-order nybble is PROTOBITS, then why is _LO defined in terms of
> SCOPEBITS?  That's the inconsistency I was pointing out: we shift according to
> the width of PROTOBITS, but define the right-justified field in terms of
> SCOPEBITS, so if we were to increase SCOPEBITS by one, _LO would overlap _HI,
> no?

No, _LO is the right-justified version of _HI -- you can see this because _HI = (_LO << PROTOBITS). Perhaps the names could be better? I thought about using just SCOPEMASK for the shifted one, but that name is ambiguous.

/be

Still pondering 64-bit scaling.

/be
Attachment #301457 - Attachment is obsolete: true
Attachment #301636 - Flags: review?(shaver)
Attachment #301457 - Flags: review?(shaver)
Tail of /tmp/propcache.stats after big gmail/bugzilla/sunspider session:

Property cache stats for thread 17854784, GC 555
      fills    2893315
    nofills      87320
    rofills          0
   disfills     183878
   oddfills         11
   modfills    2264207
 brandfills       9380
 longchains          0
   recycles    2659730
 pcrecycles     105492
      tests   58421604
     pchits   54454495
protopchits    3997099
   settests    7871408
  addpchits     692338
  setpchits    6906280
setpcmisses      63954
  setmisses     269528
   idmisses    2699781
   komisses     117232
   vcmisses     344551
     misses    3161564
    flushes        549
hit rates: pc 93.2095% (proto 6.84182%), set 96.5344%, full 94.5884%

Sharing a single direct mapped cache between the two levels of keys (kpc,ktype) and the slower (atom,kobj) seems to be working well -- no sign of thrashing an entry from first to second level (unrelated) key.

I'm surprised setmisses is not higher, since I left the hazard

                                slot < STOBJ_NSLOTS(obj)) {

instead of calling js_AllocSlot in the "addpchit" fast path. But JSOP_INITPROP is not participating here, and setpchits (existing props hit by the first level cache test) are nice and high. Add JSOP_INITPROP to the fun is something for another bug.

/be
- why do we NULL out entry in JSOP_CALLPROP's non-native cache-miss case?  It doesn't look like we use it again afterward.

- in PROPERTY_CACHE_TEST, JS_ASSERT(&obj != &pobj) in service of the just-so restrictions?

- {LOCKED_,ST}OBJ_SET_SLOT, assertions against the function-to-non, non-to-function sets that hackers must note well?

I've finished mentally executing the paths for non-native getters/setters, and I think I'm ready to stamp this bad boy.  Have to take care of some real work, and then I'm over to arrays tomorrow for JSFastLockable refactoring or whatever, I very much hope!
Comment on attachment 301636 [details] [diff] [review]
with shaver's comments so far addressed

Oh yeah, I had to read this comment a couple of times to figure out where the list of initial conditions ended and the list of subsequent actions began.  Might be worth rewording for clarity, though it's not critical.

+     * At this point we are inevitably leaving an interpreted function or a
+     * top-level script, returning to an inline call in the same js_Interpret,
+     * an "out of line" call made through js_Invoke, a js_Execute activation,
+     * a generator (SendToGenerator, jsiter.c), or else js_FilterXMLList.

r=shaver, with gusto.
Attachment #301636 - Flags: review?(shaver) → review+
Renamed pctype => shape, etc., based on Andreas Gal's amazing parallel development of same idea.

A final once over would be good. I didn't burden DEBUG builds with extra checking under {LOCKED,ST}OBJ_SET_SLOT, since that would be pretty costly and it would require splitting out yet another sub-layer (LOCKED_OBJ_WRITE_BARRIER uses LOCKED_OBJ_SET_SLOT uses STOBJ_SET_SLOT). I'd rather do something to reduce the number of *OBJ_SET_SLOT macros, in a followup bug.

/be
Attachment #301636 - Attachment is obsolete: true
Attachment #301809 - Flags: review?(shaver)
Filed bug 416058 on *OBJ_SET_SLOT elimination.

/be
Comment on attachment 301809 [details] [diff] [review]
final patch for committing

I want to understand why you would do this mass renaming before checking in the long-sought-after patch, and mix other (albeit minor) changes into it for my review.  I thought we had trust.

Nonetheless, I have inspected the bugzilla interdiff, and I again grant my r+.
Attachment #301809 - Flags: review?(shaver) → review+
Shaver, "trust" would have been broken if I had checked in without re-requesting review. If you would rather I do the renaming and add a few more meters in a separate bug, just say so. I figured you could quickly confirm that nothing else changed, approve it without biting back and we could get on with our lives. But I am chastened....

/be
Ah, me, I was poking fun.  I am glad that we will be able to celebrate bacon together on the trunk of our software soon; my toasting glass stands ready.
I checked in:

js/src/js.c 3.192
js/src/jsapi.c 3.404
js/src/jsapi.h 3.182
js/src/jsbool.c 3.36
js/src/jscntxt.h 3.193
js/src/jsemit.c 3.301
js/src/jsexn.c 3.95
js/src/jsfun.c 3.258
js/src/jsgc.c 3.275
js/src/jsgc.h 3.103
js/src/jsinterp.c 3.417
js/src/jsinterp.h 3.70
js/src/jsiter.c 3.86
js/src/jslock.c 3.75
js/src/jslock.h 3.37
js/src/jsnum.c 3.105
js/src/jsobj.c 3.424
js/src/jsobj.h 3.83
js/src/jsopcode.c 3.287
js/src/jsopcode.h 3.58
js/src/jsopcode.tbl 3.106
js/src/jsprvtd.h 3.38
js/src/jspubtd.h 3.98
js/src/jsscope.c 3.78
js/src/jsscope.h 3.55
js/src/jsscript.c 3.169
js/src/jsscript.h 3.40
js/src/jsstr.c 3.189

Would not be surprised to see a regression, but last I tested, I was mochitest clean. Ts/Txul/Tp again make me wonder about js_FlushPropertyCacheForScript. We shall see....

/be
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Depends on: 416302
This seems to have caused a trace malloc leak increase on all platforms, e.g.

http://build-graphs.mozilla.org/graph/query.cgi?tbox=bm-xserve11.build.mozilla.org&testname=trace_malloc_leaks&units=bytes&autoscale=1&days=2&avg=1&

Possibly also a Ts regression on Linux and Mac.
Depends on: 416404
Depends on: 416406
(In reply to comment #75)
> This seems to have caused a trace malloc leak increase on all platforms, e.g.
> 
> http://build-graphs.mozilla.org/graph/query.cgi?tbox=bm-xserve11.build.mozilla.org&testname=trace_malloc_leaks&units=bytes&autoscale=1&days=2&avg=1&

That's an existing leak that grew with this bug's patch. See bug 399112.

> Possibly also a Ts regression on Linux and Mac.

Will be addressed by bug 408871.

/be
Depends on: 416460
Depends on: 416462
Depends on: 416601
Depends on: 416665
Summary: get/set/call optimizations → get/set/call optimizations (return of the property cache)
Depends on: 418139
Flags: in-testsuite-
Flags: in-litmus-
Depends on: 424636
Depends on: 425828
Depends on: 428282
Depends on: 461158
Blocks: 463239
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: