Closed Bug 62164 Opened 19 years ago Closed 18 years ago

JS scopes are malloc-happy; need JS_DOUBLE_HASHING


(Core :: JavaScript Engine, defect, P1)






(Reporter: brendan, Assigned: brendan)



(Keywords: js1.5, memory-footprint, perf)


(6 files, 16 obsolete files)

105 bytes, text/plain
4.20 KB, text/plain
683 bytes, patch
Details | Diff | Splinter Review
10.46 KB, text/plain
18.87 KB, text/html
225.14 KB, patch
: superreview+
Details | Diff | Splinter Review
See  The savings will
be substantial: one malloc for the growing/shrinking vector of hash table
entries rather than one malloc per entry *and* one per JSSymbol.  Aliasing will
devolve to defining extra properties with the same slot.

Keywords: js1.5, mozilla0.8
Priority: P3 → P1
Target Milestone: --- → mozilla0.8
Moving out, which means js1.5 final won't be until mozilla0.9 -- we need to do a
release candidate, probably at the same time as mozilla0.8.

Keywords: mozilla0.8mozilla0.9
Target Milestone: mozilla0.8 → mozilla0.9
Keywords: footprint
Keywords: perf
brendan, are you still working on this?

Quantify data show that we spend 20% F+D time during startup in malloc() and 
it's major callers are: PR_malloc() 36%, JS_malloc() 30%, JS_ArenaAllocate() 

HIIIIIIIIII.  I said I was working on this for 0.9 just the other day.  *SIGH*

Patch coming over the weekend.

Blocks: 7251
Sorry, my firefighting (my own fires, mostly) and reviewing duties kept me from
fixing this for 0.9.  I'll get a patch attached before next week.

Keywords: mozilla0.9mozilla0.9.1
Target Milestone: mozilla0.9 → mozilla0.9.1
hi brendan.
any update on this?  where is your patch, that you promised earlier?  :-)

Getting to this soon, update and patch later this week.

brendan, are you gonna make this in by tomorrow?
I hope so -- working on it now, and over the weekend.

and then my Dell Inspiron 3700 started crashing, corrupting its filesystem so
that fsck crashes, etc.  Probably a bad SIMM, but I'm doomed for 0.9.1 unless
the patch is in hand or super-easy.  For this bug, I really need to get the
patch off my laptop's disk and into this bug, as soon as I can get up long
enough to start networking and do the ftp.  Woe is me,

Target Milestone: mozilla0.9.1 → mozilla0.9.2
this is what happens to heathens the leave their bug list fully loaded
until the 11th hour of a milestone....

repent, sinner!
brendan, that's laaaaame!!!  wah...  fix it!  :-)
Honest, your dog ate my SIMM-B!

I'll attach a non-working patch in progress to show all you unbelievers, soon.

Target Milestone: mozilla0.9.2 → mozilla0.9.3
Another one needed soon for 1.5, but FastLoad work demands a 0.9.3 sacrifice.

Target Milestone: mozilla0.9.3 → mozilla0.9.4
No longer blocks: 7251
Blocks: 7251
FastLoad work preempted this in 0.9.4, but it's Next.

Target Milestone: mozilla0.9.4 → mozilla0.9.5

Target Milestone: mozilla0.9.5 → mozilla0.9.6
Comment on attachment 53981 [details] [diff] [review]
work in progress from my old laptop

I'm working on this bug now.

Attachment #53981 - Flags: needs-work+
Blocks: 92580
Just a friendly reminder to include implementations for
JS_GetFunctionArg{Count,Names} in jsdgbapi.h when this happens.
-> 0.9.7
Target Milestone: mozilla0.9.6 → mozilla0.9.7
I spent too much of my (increasingly scarce :-() coding time during 0.9.7 on bug
104077, but I'm working again on this bug, and will continue steadily and get it
landed (#ifdef'd off at first) at the start of 0.9.8.

Target Milestone: mozilla0.9.7 → mozilla0.9.8
Patch soon, it should shrink *and* speed up things. It'll be big though -- brace

Target Milestone: mozilla0.9.8 → mozilla0.9.9
helloooo.... got a working patch yet??  
can i test it out?

It's compiling, but not working.  I need to disappear from the world of IRC and
email for a day (today thru tomorrow?) and finish things up.

Clue me in. Will this reduce the number of allocations from:

    sprop = (JSScopeProperty *) JS_malloc(cx, sizeof(JSScopeProperty));

in js_NewScopeProperty()? We do ~9000 of these for startup + browser window, 36 
bytes each, is 316K.
sfraser: yes; I haven't got working code to measure, and I didn't jerry-rig old
code to estimate, but there will be quite a savings.  All properties conforming
to a given "type" (added in a given order, and with the same getter/setter,
slot, attrs, etc. for the given id) will be shared along paths, one per "type",
through a per-JSRuntime property tree.  The savings are likely to be big,
although I expect we'll still see many JS objects during startup.

Plus: double hashing means no JSSymbol mallocs.  Are you using trace-malloc? 
How many of those do you see?

Have you classified objects by their "type"?  It may be that we are using JS
badly, not only that we're using JS well but its implementation is too bloat-y 
-- see bug 72748.

This patch is fairly hideous, because I'm still using #ifdef JS_DOUBLE_HASHING
and other #if/#ifndef combos.  I'm going to rip those out and fix some of the
XXX or JS_ASSERT(0) hard cases here and attach another patch later tonight, but
this one does pass the JS testsuite.  Yay!

Attachment #53981 - Attachment is obsolete: true
QA note: these two tests have been added to the JS testsuite:


They define functions with duplicate formal parameter names,
and may help test the work done here -
Attached patch latest work in progress (obsolete) — Splinter Review
Still #ifdef'ed out the wazoo, but I'll fix that tomorrow.  Still hard cases
such as watchpoints temporarily broken, but it passes the testsuite (although I
see a fresh core file, darnit -- but no testsuite failure?  Phil, please

Attachment #68822 - Attachment is obsolete: true
Applied the latest patch on Linux, built the debug/optimized shells,
and ran the entire testsuite. I'm only getting one new failure, a crash:

       Testcase ecma/ExecutionContexts/10.1.3-1.js failed 
       Expected exit code 0, got 139
       Testcase terminated with signal 0
       Complete testcase output was:
       Testcase produced no output!

Note this is NOT the new testcase I added yesterday in the ecma_3 directory.
This one was written by Christine;I'll look into what it does. 

Here is the lxr link (I just need space for one more character!):
I have reduced Christine's testcase to the following three lines. 
This causes a crash when I run this as outlined below. If I comment out
any of the three lines, no crash. If I change the duplicate parameter
to a non-duplicate, no crash. Against the current JS trunk, no crash.

var testcases = new Array();
var f = new Function( "a", "a", "return a" );
function g(a, b, a){return a;}

I will attach that as a file below. Save that file into the
js/tests/ecma/ExecutionContexts directory and call it, say, 62164.js.

This only causes a crash if the ecma/shell.js file is loaded first,
in the way the test driver does, using the -f option to the JS shell:

[]  /data/phil/JS_EXP/mozilla/js/src/Linux_All_OPT.OBJ/js
-f  /data/phil/JS_TRUNK/mozilla/js/tests/ecma/shell.js
-f  /data/phil/JS_TRUNK/mozilla/js/tests/ecma/ExecutionContexts/62164.js

Just loading 62164.js by itself does not provoke a crash. Also, just running
the JS shell and loading these two files interactively via load() does not
cause a crash. Only running the two files via the -f option does it.

I don't know how to run gdb and provide the -f option values to the JS shell.
If someone could tell me how to do that, I can provide a stack trace.
On second thought, you probably don't want to pollute your 
js/tests/ecma/ExecutionContexts/ directory with this one-off test.

As long as your second -f option points to it, it doesn't matter
where it is -
Mike explained to me how to pass the -f options in gdb. Thanks!
I get this stack trace:

Program received signal SIGSEGV, Segmentation fault.
0x80b8ffc in js_SweepScopeProperties (rt=0x80dde90) at jsscope.c:1471
1471                    FREENODE_REMOVE(sprop);

(gdb) bt
#0  0x80b8ffc in js_SweepScopeProperties (rt=0x80dde90) at jsscope.c:1471
#1  0x807a942 in js_GC (cx=0x80e0e38, gcflags=0) at jsgc.c:1249
#2  0x807a208 in js_ForceGC (cx=0x80e0e38) at jsgc.c:980
#3  0x805a5fa in js_DestroyContext (cx=0x80e0e38, gcmode=JS_FORCE_GC)
#4  0x804ede8 in JS_DestroyContext (cx=0x80e0e38) at jsapi.c:889
#5  0x804c1c9 in main (argc=4, argv=0xbffffbc8) at js.c:2132

Let me know if there are any frames you want expanded via p, p/x, etc.
Phil, can you revert to trunk and apply this patch?  I hope it fixes the
problem you found, which I couldn't reproduce using run -f ecma/shell.js -f
foo.js in gdb (foo.js was where I saved attachment 69108 [details]).

Attachment #69018 - Attachment is obsolete: true
Attachment #69121 - Attachment is obsolete: true
My build, updated yesterday a.m. while the tree was closed, is only now
respinning.  This patch passes the js testsuite.  sfraser, jrgm, let me know if
you have troubles running the browser.	I'll test my build after sleeping.

Hiiiiiiiiii, it should be ready to test footprint and perf gains now. 
Cathleen, can you get lots of people to try the patch?	THANKS :-)

Attachment #69448 - Attachment is obsolete: true
A quick test on Win2k with an optimized build with symbols shows that mozilla
w/o this patch uses 18052k of memory, and the same test with a build where the
patch has been applied uses 17644k of memory, so 400k less with the patch! I
don't know how reliable those numbers are, but it looks like this would be a
fairly large step in the right direction :-)
Hrm. Trying to run a pageload test on linux with this patch ... most but not
all of the pages fail to execute the JS that runs the test, reporting:
"Error: Date is not a constructor", blaming 'var foo = (new Date()).getTime()'
as the invalid JS. Curiously, the same page will not report this error when 
it is reloaded. I can't reproduce this same error with a simple test page which 
has only the offending line. Something odd is happening.
Reminiscent of the quirky crash I had on the earlier patch
(Comment #30 - Comment #34). I'll see if I can reproduce it.

It, too, depended on a bizarre combination of circumstances -
Just ran the JS testsuite against the latest patch (id = 69599)
in the optimized JS shell, and got two regressions:

     Testcase ecma/ExecutionContexts/10.1.3-1.js failed 
     Expected exit code 0, got 139
     Testcase terminated with signal 0
     Complete testcase output was:
     Testcase produced no output!

     Testcase ecma_3/ExecutionContexts/10.1.3-1.js failed 
     Expected exit code 0, got 139
     Testcase terminated with signal 0
     Complete testcase output was:
     Testcase produced no output!

Note both testcases create functions with duplicate formal parameter names.
The former testcase also crashed on an earlier patch; see Comment #30 above -
I'll fix those testcases, but wanted to reassure people that they're obscure and
silly ECMA purity cases.  I still crave testing feedback _a la_ jrgm's, so do
try the patch if you have the chance -- it affects only js/src/*.[ch] and is
easy to back out.

I am in process of making an optimized build... will apply patch, and report
some results as soon as i have it! :-)
The pageload tests stall on the second or third page for me with this patch (opt 
Macho-O build)
I get a JS_ASSERT after one or two pages in the page-loader test, on either
Linux or Win32. I assume this is where things go wrong and lead to the 'Date is
not a constructor' in opt. builds

Assertion failure: 
  (uint32)slot < JS_MIN(((obj2)->map)->freeslot, ((obj2)->map)->nslots), 
  at H:\mozilla3\mozilla\js\src\jsinterp.c:2835

in jsinterpret.c

	    /* Get and push the obj[id] property's value. */
	    sprop = (JSScopeProperty *)prop;
	    slot = (uintN)sprop->slot;
	    rval = (slot != SPROP_INVALID_SLOT)
   -->		   ? LOCKED_OBJ_GET_SLOT(obj2, slot)
		   : JSVAL_VOID;
	    JS_UNLOCK_OBJ(cx, obj2);
	    ok = SPROP_GET(cx, sprop, obj, obj2, &rval);
	    JS_LOCK_OBJ(cx, obj2);
	    if (!ok) {
		OBJ_DROP_PROPERTY(cx, obj2, prop);
		goto out;
Stephend has kindly agreed to purify, and I'm still policing loose ends. 
Anyone with more debugging info, please feel free to mail it to me, or post it

Attachment #69599 - Attachment is obsolete: true
This seems to work for me, but I'm tired and I welcome all test results (anyone
who gets a crash in a debugger, please print out the relevant-looking local
vars and follow non-null pointers one level).  Thanks.

Attachment #69947 - Attachment is obsolete: true
On win32, with the latest patch, I'm getting two errors on certain pages in the 
page-loader: 'isNaN is not a function' and 'parseInt is not a function'. I'll
see what I get in a debug build tomorrow.
Comment on attachment 70049 [details] [diff] [review]
latest & greatest, loose ends tied

I crash on startup on win2k, with this patch.

FreeArenaList(JSArenaPool * 0x00000000, JSArena * 0x024b67e8, int 0) line 323
JS_ArenaRelease(JSArenaPool * 0x01026610, char * 0x0012fc18) line 354 + 18
js_Execute(JSContext * 0x01026628, JSObject * 0x01026638, JSScript *
0x00000000, JSStackFrame * 0x01026610, unsigned int 0, long * 0x0253cc01) line
js_Interpret(JSContext * 0x01026610, long * 0x0012fc18) line 3919
js_Execute(JSContext * 0x024e8ac8, JSObject * 0x024e8ac8, JSScript *
0x0253c498, JSStackFrame * 0x00000000, unsigned int 0, long * 0x0012fc18) line
JS_ExecuteScript(JSContext * 0x01026610, JSObject * 0x024e8ac8, JSScript *
0x0253c498, long * 0x0012fc18) line 3224 + 27 bytes
mozJSComponentLoader::GlobalForLocation(mozJSComponentLoader * const
0x0253ec89, const char * 0x024eed30, nsIFile * 0x024c1558) line 1201 + 17 bytes
mozJSComponentLoader::ModuleForLocation(mozJSComponentLoader * const
0x0253ec89, const char * 0x024eed30, nsIFile * 0x024c1558) line 1000
mozJSComponentLoader::AttemptRegistration(mozJSComponentLoader * const
0x0253ec89, nsIFile * 0x024c1558, int 0) line 834
mozJSComponentLoader::AutoRegisterComponent(mozJSComponentLoader * const
0x0109c6a8, int 0, nsIFile * 0x024c1558, int * 0x00000000) line 768
mozJSComponentLoader::RegisterComponentsInDir(mozJSComponentLoader * const
0x0253ec89, int 0, nsIFile * 0x024c1558) line 592
mozJSComponentLoader::AutoRegisterComponents(mozJSComponentLoader * const
0x0109c6a8, int 0, nsIFile * 0x01070a28) line 546
nsComponentManagerImpl::AutoRegisterImpl(nsComponentManagerImpl * const
0x0253ec89, int 0, nsIFile * 0x00000000, int 1) line 3186
nsComponentManagerImpl::AutoRegister(nsComponentManagerImpl * const 0x0032d8f4,
int 0, nsIFile * 0x00000000) line 3072 + 18 bytes
nsComponentManager::AutoRegister(int 0, nsIFile * 0x00000000) line 222
main1(int 1, char * * 0x00325d40, nsISupports * 0x00325d80) line 1133 + 8 bytes
main(int 1, char * * 0x00325d40) line 1625 + 24 bytes
WinMain(HINSTANCE__ * 0x00400000, HINSTANCE__ * 0x00400000, char * 0x00132d33,
HINSTANCE__ * 0x00400000) line 1643 + 21 bytes
MOZILLA! WinMainCRTStartup + 308 bytes
KERNEL32! 77e87903()
Cathleen, was that with a clobber build?  My Linux build works fine.  If you
have bad or no header file dependencies, you might have troubles.  If you did
clobber or otherwise trust your deps, how about a purify build?

The arguments in the the top few frames look all wrong, but maybe that's because
you're using an optimized build?

Brendan, my build is a depend build, optimized with symbols, so it works with
purify and quantify.  I can try clobber the whole tree, and build again to see
if it works. :-)
I got a working build.  But, I found some problems: 

- build gets stalled when running page-load test.
- pref dialog doesn't close, when press OK.  (after I cleared both memory
  and disk cache)

Here are some test results from win98, 200MHz, 64MB:

Startup (warm)
original: 8890, 8610, 8080, 8000, 8580, 8600 --> avg 8460 ms
w/ patch: 7750, 8040, 7990, 8360, 7900, 8190 --> avg 8038 ms
                                                diff  602 ms  (or 6.96%)

memory on startup (warm):
data kb  10,036  9,956  9,948
code kb   9,536  9,536  9,536

w/ patch:
data kb   9,564  9,636  9,640
code kb   9,508  9,488  9,488

shaved off some in data kb

Attached patch ok, this one's money (obsolete) — Splinter Review
My arrant failure to clear the property cache in js_Clear (which clears the
scope) led to bad cache hits and out-of-bounds slot numbers.  Thanks to jrgm
for the test-debugging setup.  Everyone please try this patch.

Attachment #70049 - Attachment is obsolete: true
okay, so with the "money" patch on Linux, I don't get the scope errors that I 
had before running the pageload test. My timing results for pageload, window
open, and startup show that there is no change to those times. I should get
some mem. stats too, but I wanted to note the above before tomorrow.
Attached patch mo' money (obsolete) — Splinter Review
Measurement shows mean entries/scope just under 5, with std. dev of ~8.  That
says I'm wasting too much space for the cycle savings (diminishing with double
hashing compared to short linear search) on scope->table, which started out at
8 words or 32 bytes in the last patch.	I revived the old jsscope.c's threshold
for switching from linear search from SCOPE_LAST_PROP(scope) to hashing.

This change required some augmented conditions in loops over SCOPE_LAST_PROP,
and conversion of loops over scope->table to instead follow the parent link up
from lastProp.	But the main changes were in jsscope.c, making the switch from
list to list+hash coherent (fortunately, I didn't have to make it reversible;
once again, I choose to penalize delete).

Faithful reviewers (hi shaver!) will want to read a plain diff of the last
patch against this one to catch up and not over-fry their brains.

Passes the js testsuite and runs fine in the big M.

Attachment #70218 - Attachment is obsolete: true
jrgm, thanks for the good news.  Can you try the "mo' money" patch when you get
a chance?  It shouldn't make any difference in times, and it helps footprint a
bit more.  Cathleen, dp, anyone: feel free to confirm.

Thanks to brendan's tireless education and patient repetition of simple
concepts, I can now say with reasonable confidence that I understand this patch.
 I wouldn't want to have to reimplement it in a closed-book exam, but I can
reason about it and perhaps even hack on it a bit.

And I like it, quite a bit.  But it's 0530, I have to leave for the airport in
90 mins, and I've been up all night reading and poking at the patch, so I don't
think I'm really going to have time to do a line-by-line review of the patch
before 0.9.9 closes (not that the patch is really amenable to such review, but
thoroughness is next to godliness).  If someone else (khanson? jband?) can do a
backstop review -- jband's special attention to lock symmetry would be
especially welcome -- I think we can land this pre-freeze.

If we miss the freeze, I recommend that drivers permit it to land in the 0.9.9
quiet period: the footprint and malloc-churn wins here are, at first
measurement, very real, and we will want at least one milestone of baking for a
patch of this size before 1.0.

Testers, please pound away, to help find any lingering errors, which I expect to
be minor and easily remedied when found.

not-that-super-r=shaver, I guess.
Attached patch most money (obsolete) — Splinter Review
I'm crashing, this one is just a comment improvement and a tweak to
js_RemoveScopeProperty to avoid creating scope->table (to stick with linear
search starting from scope->lastProp) in the case where the property being
removed is scope->lastProp.

Attachment #70263 - Attachment is obsolete: true
Attached patch most money^2 (obsolete) — Splinter Review
I meant "I'm going to bed" when I wrote "I'm crashing".  And then I read the
patch in bugzilla and saw that the linear search stanza in js_SearchScope (if
(!scope->table) { ... }) was not METERing hits and misses.  Easily fixed, this
is it until I wake.  jrgm, gimme some good news!

Attachment #70269 - Attachment is obsolete: true
Page load results from opt Mach-O build:
Before: 762 ms
After:  750 ms

No bloat data yet.
X:\trunk\mozilla\js\src\jsscope.c(1246) : warning C4101: 'newtable' : unreferenc
ed local variable

Not even used in debug code.
Stupid me didn't see the equivalent gcc -O warning.

Also improved a few comments in jsscope.c/.h, and removed the SPROP_ID macro,
which was used only once (everyone else uses the clearer sprop->id and I see no
need to hide that fact).

Attachment #70274 - Attachment is obsolete: true
... it didn't allow one to change "back" to null getter or setter (as could
happen when deleting a watch-point).  I was hopelessly attempting to overload
null for the getter arg to mean "use sprop->getter", but the caller can pass
that in (as the caller has sprop in hand -- it's an earlier param!).

I've been reading the heck out of this patch, I'm pretty happy.  Testers?

Attachment #70398 - Attachment is obsolete: true
With builds from this evening, with patch 70398 (the jband var patch) applied 
to one of the two builds, I get these crude bloat/mem measurements (sorry, 

  win2k, bookmarks sidebar showing, modern skin
  One browser window:
  with patch:
    MemUsage 17,472KB / PeakMem 17,604KB / VM Size 13,740KB
  without patch:
    MemUsage 17,824KB / PeakMem 18,024KB / VM Size 14,044KB
                352KB              420KB              404KB
  Open 6 total browser windows, same config as above:
  with patch:
    MemUsage 25,592KB / PeakMem 25,724KB / VM Size 21816KB
  without patch:
    MemUsage 26,488KB / PeakMem 26,512KB / VM Size 22748KB
                896KB              788KB             932KB
  Linux, bookmarks sidebar showing, modern skin:
                        SIZE/RSS             SIZE/RSS
  One browser window:  20804/20M            20508/20M  = 296KB
  Six browser windows: 28700/28M            27932/27M  = 768KB

For pageload, startup, windowopen ...

  On win2k, pageload and window open times are not changed in the
  tests that I did, with or without the patch.

  On win2k, for startup (warm), I am seeing some improvement when 
  running the "dp-style" startup test. Time drops from about 3.5s
  to 3.3s pretty consistently, but I'm a little skeptical (don't know
  why, just am).

  On linux (sorry, didn't get to startup and window open), again, I'm 
  somewhat skeptical (because I didn't see a change on win2k), but the 
  avg. median time drops from ~1300 to ~1280, or about . I've repeated 
  this measurement and got the same result twice, but I'm wondering if 
  I goofed somewhere.

At any rate, in summary, I'm not seeing any functional problems (and I 
did go through create a mail account, send/receive, did some ftp and 
http browsing and opened a few dialogs). The mem reduction seems solid,
and perf-time-wise, we're not slower, and may be faster in some situations.

So, I'd say this is good to go on the 9.9 branch, to get some broader shaking
and baking.
> drops from ~1300 to ~1280, or about 
 ... or about 1.5%

> this is good to go on the 9.9 branch
 ... modulo a good run through JS regression tests, of course :-)
I just applied patch 70482 to a fresh tree and ran the JS testsuite 
against it in the Linux debug shell. All six tests in js1_5/GetSet/
are failing. I will attach the test results below. All failures other
than the GetSet failures are known bugs -
My latest patch did fix a problem trying to overload null, but I'm sorry it
broke those JS tests -- I'll figure out what went wrong and have a "final" patch
later today for 0.9.9 (final if jband likes it).

So in un-overloading null for the getter and setter parameters to
js_Change{Native,Scope}PropertyAttrs, I went too fast in jsobj.c's
js_DefineNativeProperty.  That function has to handle the following hard cases:

2) defining a getter where the setter already exists, and vice versa, by
calling js_ChangeNativePropertyAttrs and returning early;
3) class-default getter and setter selection if the caller passed null;
4) getting a mutable scope if needed;
5) js_AddScopeProperty, which truly defines the non-getter/setter property;
6) clasp->addProperty notification;
7) property cache fill.

(Note for reviewers: since defining native function properties for args and
local vars, done by the compiler, do not need any of these special cases, those
definitions use js_AddScopeProperty directly.)

Step 2 involves the call to js_ChangeNativePropertyAttrs and an early return,
and the bug in the last patch was that this step didn't use the incoming attrs
to select getter and setter from the incoming getter/setter params and the
existing sprop->getter/setter.	Hence the ?: expressions in the fix (if you
diff the last patch with this one).

No other changes, testsuite passes.

Attachment #70482 - Attachment is obsolete: true
I'm reading (discovered that I didn't understand the *old* scheme as well as I
thought). I also thought it might be fun to run the xpcshell multi-threading
test: /js/src/xpconnect/tests/js/old/threads.js.

It botches a JS_ASSERT right quick:

in js_UnlockScope called from js_GetPRoperty ln 2466. in js_UnlockScope
scope->ownercx is null, but scope->u.count is 0.
D'oh, I blew it -- I had hoped to eliminate lock-holding from successful and
prop found return from OBJ_LOOKUP_PROPERTY through OBJ_DROP_PROPERTY or
equivalent in the native case.  That breaks the rules too much, precluding
future JSObjectOps wrapping, and actually requiring *more* (lock-free, usually,
thanks to the work in bug 54743, but still...) lock acquisitions and releases in
js_GetProperty's critical path.

New patch coming up.  Thanks, I wondered where that test case you wrote went!  I
had a copy on my old Dell laptop.

fwiw, my sad set of comments so far...

- jsscope.h is one tab away from tab free - do the deed!

- big comment talks about field order number (as part of hash key). Looks like
you  moved away from that and didn't fix the comment.

- aliases became sort of first class properties, no? It looks like they
enumerated and would not have been in the old code, right?

That thread test ought to run in xpcshell for others too. There are some
threadsafety imperfections in xpconnect that *can* trip it up and give 'false'
error reports (as far as the JS engine is concerned). But it is still the best
JS engine threadsafety test we have in our codebase.
There's no copying in the collection of JSScopeProperty structs, right? Any idea
if the arenas are getting fragmented enough that you don't manage to shrink back
down from highwater memory usage marks? It looks like moving JSScopeProperty
structs would be real messy. Just curious if it looks like it might ever be
worth doing.
>- jsscope.h is one tab away from tab free - do the deed!

Done, thanks.

>- big comment talks about field order number (as part of hash key). Looks like
>you  moved away from that and didn't fix the comment.

The comment says that order is part of a property tree node's label, but
the js_HashScopeProperty doesn't need to worry about order as part if
the identifier it's matching in the root ply of the tree, because all nodes at
that level have the same order, namely first!

>- aliases became sort of first class properties, no? It looks like they
>enumerated and would not have been in the old code, right?

Right, good point.  I whacked js_Enumerate to skip them.  Thanks.

>That thread test ought to run in xpcshell for others too. There are some
>threadsafety imperfections in xpconnect that *can* trip it up and give 'false'
>error reports (as far as the JS engine is concerned). But it is still the best
>JS engine threadsafety test we have in our codebase.

My latest patch passes nicely, attaching in a moment.

>There's no copying in the collection of JSScopeProperty structs, right? Any 
>idea if the arenas are getting fragmented enough that you don't manage to 
>shrink back down from highwater memory usage marks? It looks like moving 
>JSScopeProperty structs would be real messy. Just curious if it looks like it 
>might ever be worth doing.

Shaver is thinking of a copying GC.  As he points out, it's easy for the first
mark-object call to reach a JSScopeProperty to forward it to a brand new place,
leaving a pointer (forwarding address) in its old storage.  Other paths that
mark it rewrite their scopes to refer to the node's new address.

But I'm going to measure first, because if utilization is ok (it is for
JSGCArenas even though live things thread through them, potentially quite
sparsely), then I don't see the need for the added code and implementation
complexity.  More tonight or tomorrow on this issue of rt->propertyArenaPool

Relocating JSScopeProperty structures and updating the pointers isn't that hard:
compacting mark-and-sweep collectors do that all the time, and I hope that our
GC will as well, some day.  I tried to get brendan to drink my
JSScopeProperty-compaction Kool-Aid, but he didn't think it would be worth it.  

FWIW, we (could?) have similar fragmentation problems in the GCThing arena-pool,
but we would need some tighter API semantics around "external" roots to compact
there.  (Or copy to another generation, which is why a generational GC requires
some additional embedding burden for hard cases that want to store object
references out of the sight of the GC's mark phase.)
I went through every call to OBJ_LOOKUP_PROPERTY and js_LookupProperty.  I also
checked OBJ_DEFINE_PROPERTY and js_DefineProperty.  I restored all calls to
OBJ_DROP_PROPERTY that I'd deleted earlier in this bug's patches.  I found a
place in jsfun.c that ass-u-med the result of a lookup was a direct property of
the native |funobj| (call_resolve).  This led to a change to call_resolve to
find the function object for the call object's frame properly (you must use
fp->argv[-2], not fp->fun->object, because of cloned function objects).

I found a few bugs that had crept in last year or whenever strict warnings were
added, where early return missed a needed drop (jsparse.c, e.g. -- the early
return was a strict warning report where the WERROR option made the warning an
error).  Need a C++ auto-helper class -- but SpiderMonkey is C, d'oh!

I long ago optimized js_GetProperty to use JS_LOCK_SCOPE and JS_UNLOCK_SCOPE
once it had the right scope in hand.  But js_SetProperty still used the _OBJ
flavor, even though it knows (js_GetMutableScope having succeeded) by those
points that obj has its own scope, loaded in the |scope| local.  So I made
js_SetProperty match js_GetProperty by using JS_{,UN}LOCK_SCOPE instead of

The jsarray.c changes are all in #ifdef'd off code that mccabe wrote but never
got workign (JS_HAS_SPARSE_ARRAYS).

This passes the suite and runs well in Mozilla.  Testing still welcome, review
by diff'ing diffs, or re-reading the non-jsscope parts of the patch (sorry!)
still needed.

Attachment #70576 - Attachment is obsolete: true
I have just tested the latest patch (id=70732) on Linux and WinNT:

The patch passes the JS testsuite in both the debug and optimized JS shell.
The only failures I get are the 6 we know about; due to known bugs.

Same story: the patch passes with only those 6 known failures.

I cannot compile the debug shell. I keep getting this error:

cl -FoWINNT4.0_DBG.OBJ/ -c /MD /Od /Z7  -DXP_PC -DWIN32 -D_WINDOWS -D_WIN32 
/nologo /W3 /FpWINNT4.0_DBG.OBJ/js.pch  -DDEBUG -DDEBUG_   js.c
link.exe -out:"WINNT4.0_DBG.OBJ/js.exe" kernel32.lib user32.lib gdi32.lib 
winspool.lib comdlg32.lib advapi32.lib shell32.lib
ole32.lib oleaut32.lib uuid.lib oldnames.lib /nologo /subsystem:console /debug 
/pdb:none /machine:I386 WINNT4.0_DBG.OBJ/js.obj WINNT4.0_DBG.OBJ/js32.lib 
fdlibm/WINNT4.0_DBG.OBJ/fdlibm.libjs.obj : error LNK2001: unresolved external 
symbol _js_SearchScope
WINNT4.0_DBG.OBJ/js.exe : fatal error LNK1120: 1 unresolved externals
make[1]: *** [WINNT4.0_DBG.OBJ/js.exe] Error 96
make: *** [all] Error 2

Here is my version of the compiler:
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 12.00.8168 for 80x86
Thanks to rginda for the venkman hard test-case, which exposed a bad bug in
js_RemoveScopeProperty (and in the overwriting case of js_AddScopeProperty,
which also may remove scope->lastProp): if the next-to-last property is "middle
deleted", it remains in the scope->lastProp list; if the last property is then
deleted, scope->lastProp must not be left (tagged with the middle delete flag,
but that's ok and not the problem here) pointing at a node not mapped by
scope->table.  Otherwise, the GC will free such a node, which may be recycled,
but one or more scopes still refer to that node via lastProp.  The fix is to
loop removing any unmapped node at scope->lastProp.

Phil, I also added the needed JS_FRIEND_API usage for js_SearchScope, so the
debug shell can build on Windows.

Attachment #70732 - Attachment is obsolete: true
Latest patch, start venkman, type `open-dialog
chrome://venkman/content/tests/text.xul`, and click the "debugger" button on the
window that pops up.

Assertion failure: scope->ownercx != cx, at jslock.c:559

#0  0x40606a01 in __kill () from /lib/i686/
#1  0x403255bb in raise (sig=6) at signals.c:65
#2  0x40607f82 in abort () at ../sysdeps/generic/abort.c:88
#3  0x4011addb in JS_Assert () at jsutil.c:173
#4  0x400db93b in js_GetSlotThreadSafe (cx=0x838e4a8, obj=0x86cd438, slot=2)
    at jslock.c:559
#5  0x4299b33b in jsd_GetValueClassName (jsdc=0x83b6ad0, jsdval=0x86f4ac0)
    at jsd_val.c:565
#6  0x42992d9d in JSD_GetValueClassName (jsdc=0x83b6ad0, jsdval=0x86f4ac0)
    at jsdebug.c:1081
#7  0x429a07af in jsdValue::GetJsClassName (this=0x86f4af0, _rval=0xbfffbb74)
    at jsd_xpc.cpp:1879
#8  0x4024677c in XPTC_InvokeByIndex (that=0x86f4af0, methodIndex=13, 
    paramCount=1, params=0xbfffbb74) at xptcinvoke_unixish_x86.cpp:153
#9  0x408b4bf4 in ?? ()
rginda: the debugger works for me, and it's hard to see how that assertion can
botch, because the macro wrapper for js_GetSlotThreadSafe, OBJ_GET_SLOT, checks
that condition before calling js_GetSlotThreadSafe.  Some fine-grained race with
another thread who stomps scope->ownercx could do it.  Otherwise, I don't see
how it happens -- the assertion is really there to catch people who wrongly call
js_GetSlotThreadSafe directly, not via OBJ_GET_SLOT.

I'll IRC-buddy you in gdb and maybe we'll figure this out.  If anyone else who
is testing the patch could try to reproduce it and report back, that might help.
 Rob told me to 'load chrome://venkman/content/venkman-dev.js' in the debugger
console, then run the 'treetest' command, then click on Debug, the bottom button
on the dialog.  That WFM, it's what Rob did that crashed him with the assert-botch.


Looks like I needed to rebuild in js/jsd after applying the patch.  WFM after
doing that.
Comment on attachment 70894 [details] [diff] [review]
fix last-delete to skip middle-deleted props when updating scope->lastProp

Attachment #70894 - Flags: superreview+
a=asa (on behalf of drivers) for checkin to 0.9.9
It's in.  Thanks to everyone.  I need a vacation!

Closed: 18 years ago
Resolution: --- → FIXED
It's been suggested (but not yet proven either way) that this change caused bug
127557. Mentioning that fact here for posterity's sake. Anyone with an interest
in proving or disproving the theory that the bugs are related, hop on over :)

(bug 127557 is far too minor to ever consider backing out this patch for, but if
it is related to this change, it's presumably better to fix the JS engine to
work as it did before than to simply workaround the problem by changing the JS
code, as the current patch in that bug does)
Follow-on bugs such as bug 127557, bug 128258 have now been dealt with;
marking this bug Verified -
Flags: testcase+
Depends on: 507904
You need to log in before you can comment on or make changes to this bug.