Last Comment Bug 547941 - (WeakMap) Implement Harmony Weak Maps [ES6?]
(WeakMap)
: Implement Harmony Weak Maps [ES6?]
Status: RESOLVED FIXED
fixed-in-tracemonkey
: dev-doc-complete, feature
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal with 2 votes (vote)
: mozilla6
Assigned To: Andreas Gal :gal
:
Mentors:
http://wiki.ecmascript.org/doku.php?i...
Depends on: 1006099 548733 650753 653248 655297 656828 657264 668855 673468 680937 686114 688277 707313 761620 798678 804386 819131 982561 998355 1009349
Blocks: es6 486643 WeakSet
  Show dependency treegraph
 
Reported: 2010-02-23 02:23 PST by Andreas Gal :gal
Modified: 2014-05-12 19:45 PDT (History)
35 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (17.82 KB, patch)
2010-02-23 02:23 PST, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch (19.69 KB, patch)
2010-02-24 02:08 PST, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch (19.10 KB, patch)
2010-02-24 02:11 PST, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch with test case (20.19 KB, patch)
2010-02-24 02:30 PST, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch (20.66 KB, patch)
2010-02-24 02:39 PST, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch (22.92 KB, patch)
2010-02-24 17:05 PST, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch with cycle collector integration (25.88 KB, patch)
2010-02-25 09:00 PST, Andreas Gal :gal
igor: review-
Details | Diff | Splinter Review
patch (15.62 KB, patch)
2011-03-15 00:41 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch (21.70 KB, patch)
2011-03-15 00:42 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch (19.49 KB, patch)
2011-03-29 17:17 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch (19.37 KB, patch)
2011-03-29 17:19 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch (21.54 KB, patch)
2011-03-30 11:09 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch (21.91 KB, patch)
2011-04-11 15:35 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch (25.01 KB, patch)
2011-04-11 16:03 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch (26.56 KB, patch)
2011-04-11 21:50 PDT, Andreas Gal :gal
jorendorff: review+
Details | Diff | Splinter Review
weakmaps.js (4.75 KB, text/javascript)
2011-04-11 21:53 PDT, Jason Orendorff [:jorendorff]
no flags Details
fuzzmaps.js (2.37 KB, text/javascript)
2011-04-11 22:13 PDT, Jason Orendorff [:jorendorff]
no flags Details

Description Andreas Gal :gal 2010-02-23 02:23:31 PST
http://wiki.ecmascript.org/doku.php?id=strawman:ephemeron_tables

Usage:

var e = new EphemeronTable()
var x = new String("x")
e.put(x, ({}))

gc()

/* Table still contains x<->({}), because x is alive. */

x = null

gc()

/* The table is empty, which we can't verify, since we lost x, but anyway. */

Steps during GC:
1) first mark all objects and merely collect ephemeron tables in a list
2) go through all ephemeron tables and revive all values with alive keys
3) if any values were revived by 2), goto step 2)
4) remove entries with dead keys from all ephemeron tables

Iterative marking seems necessary to deal with cases like this:

var x = new String("x")
var y = new String("y")

e.put(x, y)
e.put(y, ({}))

y = null

gc()

If we mark in the order x, y everything is fine. By the time we get to y, we already revived y because x is alive and hence the entry (y, {}) will remain alive as well.

However, if we mark in the order y, x we are hosed. By the time we revive y, (y, {}) was already lost.

I am not sure whether there is a more clever way to do this, but at 2am this is the best I can think off.

The iteration is bounded by the length of the longest cycle (O^2). With this is definitely possible to build data structures that take quite a while to GC (DOS attack).
Comment 1 Andreas Gal :gal 2010-02-23 02:23:47 PST
Created attachment 428412 [details] [diff] [review]
patch
Comment 2 Brendan Eich [:brendan] 2010-02-23 10:55:49 PST
Comment on attachment 428412 [details] [diff] [review]
patch

More welcome late-night hacking -- youth is not wasted on all the wrong people :-).

>+		jsweak.cpp \

"Was that a cut?" :-P

How about jsweakptr.cpp for that proposal when it's time, and jsephemerontable.cpp or (if you don't want to get too long-winded, but I do not mind -- better to follow a simple naming rule at this point [some day we will cure the 8.3 hangover]) jsephemeron.cpp for this file?

>+class EphemeronTable {
>+    static EphemeronTable *fromJSObject(JSObject *obj);
>+
>+    JSObject *next;
>+    js::HashMap<JSObject*, JSObject *> map;
>+public:
>+

One blank line before label, not after.

>+    static JSBool put(JSContext *cx, uintN argc, jsval *vp);

Spec calls this "set" to match Object.defineProperty 'set' property in descriptor, etc.

>+    static JSBool get(JSContext *cx, uintN argc, jsval *vp);
>+
>+    static void Trace(JSTracer *trc, JSObject *obj);

Deadwood, right? Except we still support a generic heap-tracing API, so maybe this should be unified with Mark.

Nit: we don't StudlyCaps C++ member names, so trace, mark and sweep -- not Trace, Mark, and Sweep.

>+    EphemeronTable(JSContext *cx) : map(cx)
>+    {
>+    }

You don't initialize next here.

Can put "{}" on end of first line of ctor, per house C++ style.

>+EphemeronTable::put(JSContext *cx, uintN argc, jsval *vp)
>+{
>+    JSObject *obj = JS_THIS_OBJECT(cx, vp);
>+    if (!JS_InstanceOf(cx, obj, &js_EphemeronTableClass, vp ? vp + 2 : NULL))
>+        return false;
>+    if (argc < 2) {
>+        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_MORE_ARGS_NEEDED,
>+                             "EphemeronTable.put", 1, "");
>+        return false;
>+    }
>+    JSObject *key = NonNullObject(cx, vp[2]);
>+    if (!key)
>+        return false;
>+    JSObject *value = NonNullObject(cx, vp[3]);

Spec does not require non-primitive value -- this is important.

>+    if (!value)
>+        return false;
>+
>+    EphemeronTable *table = (EphemeronTable *)obj->getPrivate();
>+    if (!table) {
>+        void *mem = js_malloc(sizeof(EphemeronTable));
>+        if (!mem)
>+            goto out_of_memory;
>+        table = new (mem) EphemeronTable(cx);
>+        if (!table->map.init()) {
>+            js_free(mem);
>+            goto out_of_memory;
>+        }
>+        obj->setPrivate(table);

So you use placement new into a js_malloc'ed allocation, but the finalizer delete's that pointer -- bug? Best to be consistent and use js_free.

That reminds me: shouldn't these be cx->malloc and cx->free, for proper memory pressure accounting.

>+    }
>+
>+    *vp = JSVAL_VOID;
>+    return !!table->map.put(key, value);

The !! is not required AFAIK, unless there's a VS bug. Try without unless someone can cite proof of such.

>+EphemeronTable::get(JSContext *cx, uintN argc, jsval *vp)
>+{
. . .
>+    js::HashMap<JSObject *, JSObject*>::Ptr ptr = fromJSObject(obj)->map.lookup(key);
>+    if (!ptr) {
>+        *vp = JSVAL_VOID;
>+        return true;
>+    }
>+    *vp = OBJECT_TO_JSVAL(ptr->value);
>+    return true;

Just write

    *vp = ptr ? OBJECT_TO_JSVAL(ptr->value) : JSVAL_VOID;
    return true;

and avoid the extra return and if-then overhead.

>+EphemeronTable::Trace(JSTracer *trc, JSObject *obj)
>+{
>+    if (IS_GC_MARKING_TRACER(trc)) {
>+        EphemeronTable *table = fromJSObject(obj);
>+        if (table) {
>+            if (table->map.empty()) {
>+                delete table;

This method is deadwood currently, but note delete of js_malloc'ed space.

>+                obj->setPrivate(NULL);
>+                return;
>+            }
>+            JSRuntime *rt = trc->context->runtime;
>+            if (rt->gcWeakPtrList)
>+                fromJSObject(rt->gcWeakPtrList)->next = obj;
>+            table->next = NULL;
>+            rt->gcWeakPtrList = obj;

Again this method is deadwood, but shouldn't it be inserting obj at rt->gcWeakPtrList, so setting table->next = rt->gcWeakPtrList before rt->gcWeakPtrList = obj? Noting in case something needs to do something like what this is trying to do.

This shows that the JSRuntime member should be named gcEphemeronTableList. The WeakPtr proposal

http://wiki.ecmascript.org/doku.php?id=strawman:weak_references

(which again suggests avoiding the "Ptr" [pointer] trope, which is too low-level and suggestive of unsafe pointers -- I'm nagging MarkM here ;-) is a different proposal, and it will not obviously be able to share this rt->gcWeakPtrList member. Again as-concrete-as-possible-at-first and systematically constructed names are best.

>+EphemeronTable::Mark(JSTracer *trc)
>+{
>+    JSContext *cx = trc->context;
>+    JSRuntime *rt = cx->runtime;
>+
>+    bool again;
>+    do {
>+        again = false;
>+        JSObject *obj = rt->gcWeakPtrList;
>+        while (obj) {

Please be l33t (and scope precisely) and put the JSObject *obj decl in the while condition.

>+            EphemeronTable *table = fromJSObject(obj);
>+            for (js::HashMap<JSObject *, JSObject *>::Enum e(table->map.enumerate());
>+                 !e.empty();
>+                 e.popFront()) {
>+                if (!JS_IsAboutToBeFinalized(cx, e.front().key)) {
>+                    /* If the key is alive, check whether we should revive the value. */
>+                    if (JS_IsAboutToBeFinalized(cx, e.front().value)) {
>+                        /* If this is the only reference to the value, revive it. */
>+                        JS_CALL_OBJECT_TRACER(trc, e.front().value, "__ephemeron__");

Blank line here.

Common e.front().value into jsval v (see below).

>+                        /* We revived a value, iterate again. */
>+                        again = true;

Easy optimization, to match allowing primitive values: again = !JSVAL_IS_PRIMITIVE(v).

>+EphemeronTable::Sweep(JSContext *cx)
>+{
>+    JSRuntime *rt = cx->runtime;
>+
>+    JSObject *obj = rt->gcWeakPtrList;
>+    while (obj) {

l33t C++ nit again.

>+        EphemeronTable *table = fromJSObject(obj);
>+        for (js::HashMap<JSObject *, JSObject *>::Enum e(table->map.enumerate());
>+             !e.empty();
>+             e.popFront()) {
>+            if (JS_IsAboutToBeFinalized(cx, e.front().key))
>+                e.removeFront();
>+        }
>+        obj = table->next;
>+    }
>+
>+    rt->gcWeakPtrList = NULL;

So Sweep nulls the list, so js_GC should not (it can assert that the member is null -- runtime is zero-initialized -- no need to store two nulls back to back to the list member).

This raises the question: what nulls table->next? Only the EphemeronTable::Trace dead-wood. And the ctor lacks that member-init currently (noted above). But who appends to rt->gcWeakPtrList? Only EphemeronTable::Trace, again.

Seems we want to build the list during mark and process/kill it during sweep. Obvious in light of day I'm sure :-).

>+EphemeronTable::Construct(JSContext *cx, JSObject *obj, uintN argc, jsval *argv, jsval *rval)
>+{
>+    if (!JS_IsConstructing(cx)) {
>+        if (!(obj = js_NewObject(cx, &js_EphemeronTableClass, NULL, NULL)))

Avoid nexting assignment in if conditions where gratuitous, per house style.

>+js_InitEphemeronTableClass(JSContext *cx, JSObject *obj)
>+{
>+    JSObject *proto;

Initialize this decl.

>+    proto = js_InitClass(cx, obj, NULL, &js_EphemeronTableClass, EphemeronTable::Construct, 0,
>+                         NULL, methods, NULL, NULL);
>+    if (!proto)
>+        return NULL;
>+
>+    proto->setPrivate(NULL);
>+

No need for a blank line here.

>+void
>+js_MarkEphemeronTables(JSTracer *trc)
>+{
>+    EphemeronTable::Mark(trc);
>+}
>+
>+void
>+js_SweepEphemeronTables(JSContext *cx)
>+{
>+    EphemeronTable::Sweep(cx);
>+}

Thinner code footprint if you just expose EphemeronTable in jsephemerontable.h and have the GC call its public statics directly. The C heritage still hangs heavily over us but this seems avoidable.

/be
Comment 3 Robert Sayre 2010-02-23 11:27:12 PST
what repository are you planning to land this in?
Comment 4 Brendan Eich [:brendan] 2010-02-23 11:40:40 PST
(In reply to comment #2)
> Avoid nexting assignment in if conditions where gratuitous, per house style.

s/nexting/nesting/

(In reply to comment #3)
> what repository are you planning to land this in?

We can configure this off at first to avoid security bug risks.

We do need to publicly try out Harmony proposals pretty soon, though. I'm not at all worried about breaking developers who use support and then the proposal changes. But the security bug worry is a good one.

To focus on this patch, the hazard would be a failure to mark something while keeping a reference to it exposed to script.

/be
Comment 5 Robert Sayre 2010-02-23 12:48:42 PST
(In reply to comment #4)
> 
> We can configure this off at first to avoid security bug risks.
> 
> We do need to publicly try out Harmony proposals pretty soon, though. I'm not
> at all worried about breaking developers who use support and then the proposal
> changes. But the security bug worry is a good one.

I'm mostly worried about integration pain, since we don't yet know how the GC and Jmonkey work is going to show up. TM is actually relatively close to the supposedly shippable m-c tree, so we don't want to get a bunch of intertwined half-done and/or experimental work landing that can't be configured off.
Comment 6 Brendan Eich [:brendan] 2010-02-23 13:02:58 PST
Traditional way is to use jsversion.h macros (trim old ones, keep moving forward by making unconditional what is now a standard, de-jure or even de-facto). Now we need configure magic too, but that is easy to copy and mutate.

I have to agree with Andreas that proxies have more attack surface, but either could have security bugs. As for GC integration, ETs want just a mark (trace) and a sweep hook, in this simple conception (prior to heavy optimization). So unless Igor or Gregor sees something, the GC dependencies seem ok.

/be
Comment 7 Andreas Gal :gal 2010-02-23 13:22:33 PST
To clarify. I only want to land either of these if we think we can ship it. I don't see a point in landing stuff that creates divergence from the product path. With proxies and the tables we can implement wrappers in JS and ... trace through them :) So I think we want that for chrome code at the very least.
Comment 8 Robert Sayre 2010-02-23 13:26:29 PST
alright, I don't think we(In reply to comment #7)
> To clarify. I only want to land either of these if we think we can ship it.

Alright, I don't think we've discussed a release vehicle for this work yet. We should probably have that discussion.
Comment 9 Brendan Eich [:brendan] 2010-02-23 13:45:54 PST
We don't land stuff to pull it out later, or leave it untested and bit-rotting, of course. The proxies proposal, and some kind of ephemeron proposal, are in good position to be in the next major ES standard -- indeed proxies are already in the harmony: namespace at http://wiki.ecmascript.org/.

We should not be any more shy about leading than we have in the past (ahem, but we should have tests -- other than fuzztesting -- better in hand... *cough* JS1.7 *cough* -- in our defense it was just mrbkap, part-time igor, and me). Ok?

/be
Comment 10 Brendan Eich [:brendan] 2010-02-23 13:48:28 PST
If I had to put odds on next release vehicle plans vs. harmonious proposals making it into the next major ES spec, I wouldn't be able to hedge well. The odds are close, and related to the extent that our prototyping proves the concepts and helps the rest of the committee agree.

IMHO "next Firefox non-patch-release vehicle" seems best. There is no need for a farther-out vehicle whose odds of being real must be lower at this point than the odds that we will both profitably use and successfully standardize this stuff.

/be
Comment 11 Andreas Gal :gal 2010-02-23 13:52:38 PST
Yeah, my goal is 3.7 for this.
Comment 12 Andreas Gal :gal 2010-02-23 16:39:02 PST
Actually, the way we mark here isn't safe. The children of an ephemeron value that was revived has to be marked as well. We need a work list.
Comment 13 Brendan Eich [:brendan] 2010-02-23 16:57:34 PST
(In reply to comment #12)
> Actually, the way we mark here isn't safe. The children of an ephemeron value
> that was revived has to be marked as well. We need a work list.

Your patch uses:

    JS_CALL_OBJECT_TRACER(trc, e.front().value, "__ephemeron__");

does indeed mark the graph reachable from e.front().value. So you're good.

/be
Comment 14 Igor Bukanov 2010-02-23 17:38:30 PST
Comment on attachment 428412 [details] [diff] [review]
patch

>diff --git a/js/src/jsgc.cpp b/js/src/jsgc.cpp
> 
>+    /* Sever all weak references that point to objects we are about to finalize. */
>+    js_MarkEphemeronTables(&trc);
>+
...
>     MarkDelayedChildren(&trc);
...
> 
>     JS_ASSERT(!cx->insideGCMarkCallback);
>     if (rt->gcCallback) {

The order of calls is wrong here. MarkDelayedChildren must be called before js_MarkEphemeronTables to make sure that the whole object graph is marked. Then EphemeronTable::Mark must integrate MarkDelayedChildren to call it after JS_CALL_OBJECT_TRACER to make sure that we have marked everything indeed.

But that without rt->gcCallback complexity. That callback assumes that it is called exactly between the marking and finalization phase. Its implementation in xpconnect uses that to add its own marking. After the last mark call the implementation then starts to finalize things. To deal with that the second MarkDelayedChildren call in js_CallGCMarker has to be also followed by js_MarkEphemeronTables.

That complication can be avoided if we split JSGC_MARK_END into JSGC_EXTRA_MARK and JSGC_MARK_FINALIZE_START. But that can wait as the above workaround could make things work for now.
Comment 15 Andreas Gal :gal 2010-02-24 00:16:54 PST
I can't find any evidence for the behavior described in #14. xpconnect starts finalizing when it gets a JSGC_FINALIZE_END callback. I will file a bug to remove the insideGCMarkCallback hack.
Comment 16 Andreas Gal :gal 2010-02-24 01:20:37 PST
We need constructors to run, so I use delete here.
Comment 17 Andreas Gal :gal 2010-02-24 02:05:48 PST
>+    JSObject *obj = rt->gcWeakPtrList;
>+    while (obj) {

l33t C++ nit again.

Nice try.

while (JSObject *obj = rt->gcWeakPtrList) { }

is an endless loop here.
Comment 18 Andreas Gal :gal 2010-02-24 02:08:52 PST
Created attachment 428675 [details] [diff] [review]
patch
Comment 19 Andreas Gal :gal 2010-02-24 02:11:19 PST
Created attachment 428676 [details] [diff] [review]
patch
Comment 20 Andreas Gal :gal 2010-02-24 02:14:30 PST
The latest version adds a length property that can be queried. Without this its basically impossible to test ephemeron tables.

I have to drive in 548215, after that this patch should be ready for testing and landing.
Comment 21 Andreas Gal :gal 2010-02-24 02:30:00 PST
Created attachment 428680 [details] [diff] [review]
patch with test case
Comment 22 Andreas Gal :gal 2010-02-24 02:39:23 PST
Created attachment 428682 [details] [diff] [review]
patch

Proper behavior when setting an undefined value (delete key/value pair if present, don't add otherwise).
Comment 23 Igor Bukanov 2010-02-24 07:30:14 PST
(In reply to comment #15)
> I can't find any evidence for the behavior described in #14.

xpconnect inside JSGC_MARK_END does both marking and calls to JS_IsAboutToBeFinalized, see, for example, http://hg.mozilla.org/tracemonkey/file/1b5ca6cc5ce8/js/src/xpconnect/src/nsXPConnect.cpp#l374 and  http://hg.mozilla.org/tracemonkey/file/1b5ca6cc5ce8/js/src/xpconnect/src/xpcwrappednativescope.cpp#l444. The latter requires that we know precisely what the garbage is. That is, all weak references must be properly marked when it calls the function.
Comment 24 Brendan Eich [:brendan] 2010-02-24 07:48:11 PST
(In reply to comment #17)
> >+    JSObject *obj = rt->gcWeakPtrList;
> >+    while (obj) {
> 
> l33t C++ nit again.
> 
> Nice try.

Sorry, pattern-matching too much -- still, you have to get l33t and forget teh C and Java :-P.

/be
Comment 25 Dave Herman [:dherman] 2010-02-24 08:42:28 PST
(In reply to comment #20)
> The latest version adds a length property that can be queried. Without this its
> basically impossible to test ephemeron tables.

I knew this would come up. :) The length property breaks the guarantee of ephemeron tables that it's not observable when weak references die. Otherwise you might as well just have ephemerons.

Of course you're right that it's harder to test. Would it be possible to add some sort of simple but privileged-C++ observation mechanism for observing things that otherwise shouldn't be observable to ordinary JS programs? And that's only available to the test harness? Something like:

#ifdef JS_HAS_TEST_HARNESS
# define INTERNALS_INIT     js_InternalsClass
#else
# define INTERNALS_INIT     js_NullClass
#endif

/* ... */

JS_PROTO(Internals, INTERNALS_INIT)

And then you could use this for the test harness to make privileged observations that you otherwise shouldn't be able to. So you could do things like:

Internals.ephemeronTableLength(t)

Without something like this, we'll keep ending up adding observations to SpiderMonkey that shouldn't be there, on a case-by-case basis, without thinking carefully about what we're exposing.

Dave
Comment 26 Dave Herman [:dherman] 2010-02-24 10:30:29 PST
Submitted a new bug suggesting a simple Internals API, with boilerplate patch. See bug #548341.

Dave
Comment 27 Andreas Gal :gal 2010-02-24 10:50:11 PST
Yeah, the .length was strictly temporary. We could also just wrap it into #ifdef DEBUG.
Comment 28 Dave Herman [:dherman] 2010-02-24 11:14:16 PST
Rather than #ifdef DEBUG or a temporary hack, we should just add a global ephemeronTableLength function to the global object in shell/js.cpp so the test harness can use it.

Dave
Comment 29 Igor Bukanov 2010-02-24 12:01:36 PST
We need to consider the implications of the Ephemeron tables for the cycle collector.
Comment 30 Andreas Gal :gal 2010-02-24 12:33:19 PST
Where _exactly_ does this behavior happen in nsXPConnect.cpp. Please be specific.

The other site looks plausible. Looking into it.

(In reply to comment #23)
> (In reply to comment #15)
> > I can't find any evidence for the behavior described in #14.
> 
> xpconnect inside JSGC_MARK_END does both marking and calls to
> JS_IsAboutToBeFinalized, see, for example,
> http://hg.mozilla.org/tracemonkey/file/1b5ca6cc5ce8/js/src/xpconnect/src/nsXPConnect.cpp#l374
> and 
> http://hg.mozilla.org/tracemonkey/file/1b5ca6cc5ce8/js/src/xpconnect/src/xpcwrappednativescope.cpp#l444.
> The latter requires that we know precisely what the garbage is. That is, all
> weak references must be properly marked when it calls the function.
Comment 31 Igor Bukanov 2010-02-24 12:59:31 PST
(In reply to comment #30)
> Where _exactly_ does this behavior happen in nsXPConnect.cpp. Please be
> specific.

The marking in JSGC_MARK_END is only happen via http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/src/nsXPConnect.cpp#397 which is a cycle-collector installed callback. Which is good and indicates that the code improved and the normal marking that xpconnect needs is happens in http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/src/xpcjsruntime.cpp#279. This is much better than it was a year or two ago :)

But I am not sure yet what to do about the cycle collector callback. We need to talk to the cycle collector guys.



> 
> The other site looks plausible. Looking into it.
> 
> (In reply to comment #23)
> > (In reply to comment #15)
> > > I can't find any evidence for the behavior described in #14.
> > 
> > xpconnect inside JSGC_MARK_END does both marking and calls to
> > JS_IsAboutToBeFinalized, see, for example,
> > http://hg.mozilla.org/tracemonkey/file/1b5ca6cc5ce8/js/src/xpconnect/src/nsXPConnect.cpp#l374
> > and 
> > http://hg.mozilla.org/tracemonkey/file/1b5ca6cc5ce8/js/src/xpconnect/src/xpcwrappednativescope.cpp#l444.
> > The latter requires that we know precisely what the garbage is. That is, all
> > weak references must be properly marked when it calls the function.
Comment 32 Robert Sayre 2010-02-24 16:29:07 PST
(In reply to comment #6)
> Traditional way is to use jsversion.h macros (trim old ones, keep moving
> forward by making unconditional what is now a standard, de-jure or even
> de-facto). Now we need configure magic too, but that is easy to copy and
> mutate.

So jsversion.h should separate this stuff from ES5, then.
Comment 33 Andreas Gal :gal 2010-02-24 17:05:08 PST
Created attachment 428847 [details] [diff] [review]
patch
Comment 34 Andreas Gal :gal 2010-02-24 17:12:04 PST
I updated the patch to interact with the cycle collector. What's still missing is reporting the list of ephemeron tables to the cycle collector to be used as additional root set.

Which highlights a problem here: if the cycle collector revives entire tables that were not in the list initially, what do we do?
Comment 35 Andreas Gal :gal 2010-02-25 09:00:38 PST
Created attachment 428928 [details] [diff] [review]
patch with cycle collector integration
Comment 36 Igor Bukanov 2010-02-25 11:39:59 PST
(In reply to comment #35)
> Created an attachment (id=428928) [details]
> patch with cycle collector integration

Suppose a table T holds a key A and a value B. Suppose there is xpcom link X from A to B like A->X->B. Suppose that A, B and X are unreachable. Thus the GC must be able to collect them no matter what happens with the table T. Now suppose that some live xpcom object LiveY holds a reference to T. Then according to the patch the cycle collector view would be:

LiveY->T->A->X->B
LiveY->T->B

This makes X live. As such the cycle collector would trigger the GC marking of all objects that X holds. This marks A and B preventing them from the GC.
Comment 37 Igor Bukanov 2010-02-25 11:40:50 PST
Here is a possible non-reviving algorithm that should work both for the GC marker and the cycle collector. 

First consider the GC marker. If during the marking the GC discovers some table T, the algorithm flags all its table entries with a special flag. Also, for any object A that the GC marks the GC also checks if A is also a key in some tables. Then for all tables holding A as the key the GC also flags the corresponding table entry. Whenever the entry is flagged twice, the entry value B is recursively marked by the GC.

Now consider the case of the cycle collector. There again if the CC discovers a table T, it flags all its entries. If it discovers some object A that is a key is some tables, it also flags all entries holding A in all tables.  Whenever the entry is flagged twice, the CC records an endge from A to B and from T to B and then it proceeds recursively to discover the edges from B.

AFAICS the GC case can be implemented efficiently using spare GC flags to flag an object whenever it is a key in some table T plus some support structure to find a tables where the object is a key. For the CC case we would need an ability to record an edge from A to B when processing the table T or a link from T to B when processing A.
Comment 38 Igor Bukanov 2010-02-25 11:45:38 PST
(In reply to comment #36)
> Suppose a table T holds a key A and a value B. Suppose there is xpcom link X
> from A to B like A->X->B. Suppose that A, B and X are unreachable. Thus the GC
> must be able to collect them no matter what happens with the table T. Now
> suppose that some live xpcom object LiveY holds a reference to T. Then
> according to the patch the cycle collector view would be:
> 
> LiveY->T->A->X->B
> LiveY->T->B
> 
> This makes X live. As such the cycle collector would trigger the GC marking of
> all objects that X holds. This marks A and B preventing them from the GC.

The last statement is not correct since during the GC marking the patch would skip A. A setup that shows a link would be in addition to the above B or X also pointing to A so the CC view would be:

> LiveY->T->A->X->B
> LiveY->T->B->A
Comment 39 Andreas Gal :gal 2010-02-25 13:18:11 PST
We only color table entries as suspected, not as actual roots. I am still not convinced #36 is possible. If that was the case, all wrappers would automatically keep everything alive all the time, too. peterv?
Comment 40 Andreas Gal :gal 2010-02-25 13:20:46 PST
Independently of this discussion, I would like to go ahead and land the patch. The alternative GC method in #37 is rather involved and its not clear there are no corner cases there either. Assuming #36 is correct, in those cases ephemeron tables behave no worse (and no better) than a regular hashtable built in JS for those specific entries. We are not creating any security problems. We merely fail to collect interdependent keys that have a cycle through xpcom. Not great, but not fatal. I am interesting in exploring better cycle collection, but I don't think that's a show stopper.
Comment 41 Igor Bukanov 2010-02-25 14:16:38 PST
(In reply to comment #40)
> Assuming #36 is correct, in those cases ephemeron
> tables behave no worse (and no better) than a regular hashtable built in JS for
> those specific entries. We are not creating any security problems.

If that is correct, then the GC may collect something that the CC assume should be alive. I do not know how bad is that.
Comment 42 Andreas Gal :gal 2010-02-25 14:22:31 PST
What do you mean? How would the GC collect it? I thought the problem is the CC will mark thinks that shouldn't be marked?
Comment 43 Igor Bukanov 2010-02-25 14:34:18 PST
(In reply to comment #40)
> Independently of this discussion, I would like to go ahead and land the patch.

Why this rush is necessary? And what if it turns out that ephemeron tables are simply incompatible with the cycle collection and the current xpcom model however unlikely this is? For this reason we should have at least an understanding how to make the table fully sound even if the initial implementation would not be much better than a hashtable.

> We merely
> fail to collect interdependent keys that have a cycle through xpcom. Not great,
> but not fatal.

If landing not very sound implementation is OK, I suggest to remove any xpcom integration and just run the table marking after the cycle collector finishes its job. This way we avoid the danger of creating two view of the heap, one for CC and one for GC.
Comment 44 Igor Bukanov 2010-02-25 14:41:57 PST
(In reply to comment #42)
> What do you mean? How would the GC collect it? I thought the problem is the CC
> will mark thinks that shouldn't be marked?

CC asks GC to mark recursively any JS things that a live XPCOM object contains. If one of the things is the table, that would fail since the table will not mark its entries. So if a key or value is a JS wrapper for XPCOM thing, that would be finalized even if CC things itr should not. I do not know what are the consequences of that. It could turned out to be safe, but then we have to have a proof for that.
Comment 45 Andreas Gal :gal 2010-02-25 14:43:00 PST
> If landing not very sound implementation is OK, I suggest to remove any xpcom
> integration and just run the table marking after the cycle collector finishes
> its job. This way we avoid the danger of creating two view of the heap, one for
> CC and one for GC.

Yeah, I am not opposed to that. The table isn't worse than a regular hash table, so we don't add additional leaks. We just fail to remove some.
Comment 46 Igor Bukanov 2010-02-25 16:15:21 PST
(In reply to comment #43)
> If landing not very sound implementation is OK, I suggest to remove any xpcom
> integration and just run the table marking after the cycle collector finishes
> its job. This way we avoid the danger of creating two view of the heap, one for
> CC and one for GC.

I talked with paterv about that and things is not that simple. If EphemeronTable::reviveValuesWithLiveKeys is called after the cycle collector does its job, then it may make alive things that the CC schedules for finalization. This is bad. If reviveValuesWithLiveKeys is called before the cycle collection, then it may remove keys that the CC would make alive. A workaround for that could be to treat the ephemeron tables as normal strong hashtables whenever js_GC is invoked as a part of CC. Only when js_GC is called alone the code may do reviveValuesWithLiveKeys magic. But that IMO looks like a hack just to land leaky tables.
Comment 47 Andreas Gal :gal 2010-02-25 17:09:58 PST
"Leaky" is a pretty misleading term here. We are talking about landing ephemeron tables that are not as eagerly purged as they possibly could be in the presence of XPCOM cycles. Worst case an ephemeron table is as efficient as using an object as a hash table. In 99% of the cases its more efficient.

We clearly need a way to coordinate between cycle collection and table purging, no doubt about it. But I don't see why we have to solve everything in one step. We are strictly making this use case _less_ "leaky" if we land what we have. Currently the best thing you can do is use an object, which "leaks" everything, every time.
Comment 48 Andreas Gal :gal 2010-02-26 17:19:42 PST
Ok, I patched up the cycle collector so it can deal with additional edges being added after a node (object) was scanned. With this we can now properly report all edges. In case of (conditionally) weak edges, we report them as a regular edge. The only negative side effect of that is that it might cause cycles, which the cycle collector is of course able to deal with.
Comment 49 Igor Bukanov 2010-02-27 10:53:08 PST
I have found that it is worth to consider ephemeron tables in a pure reference counting world. In that world, when we put a value B for a key A into the table T, we must increase the reference count for B. This is necessary to prevent B from going away before the cycle collector is running.

When the CC is running we need to report to it an edge that is responsible for an extra reference. Without this the CC will assume that there is an external reference to B and it would be held alive. Where this edge may come from? It we report it as an edge from A, then, if B also references T and A is alive,  this would make both B and T alive preventing T from the collection. If we report the edge as coming from T, then, if B also points to A and T is alive, it would make A alive preventing it from the collection.

For completeness lets report the edge as coming from B itself pretending that B is a self-reference. This ensures that B to stay alive until CC runs. Now the CC runs and finds that B should go away since it belongs to some dead loop. Then before breaking the loop CC checks if A and T are both alive (that is, they do not belong to some dead cycle). If they are, then CC would not break any loop involving B. But this does not work since A or T could belong to yet another loop that should be held alive by other ephemeron values.

So we cannot model the extra reference with normal edges.
Comment 50 Igor Bukanov 2010-03-02 09:29:07 PST
Here is a possible collection algorithm for a pure reference counting world that is similar in spirit to the GC marking in the attachment from the comment 35.  Consider an ephemeron table T, a key A and a value B. We model that by a strong reference from T to B and by a weak reference from T to A. 

Suppose now the cycle collector starts to run and detected all the edges for the initial set of gray objects. Then the CC colors black (alive) any gray objects that has an external reference to it. Normally then the CC proceeds recursively to mark black any object recursively reachable from the black (alive) objects. That is modified so during the recursion the CC skips the references from T to any of its value B. 

Then, when this is finished, the algorithm consider all black tables T and for them marks as black any key B recursively if the corresponding key A is not gray.  Note that, while we consider only tables explicitly colored black, the key A may be "not gray" not only because it is black but also because it is not a member of the initial gray set. That is, A was not member of any cycle yet. This difference comes from the fact that we add strong references only from T to B and not to A.

Since this may turn more tables and keys black, that has to be run repeatedly in the same way as the GC marking does in the comment 35.

After this is done, any value B that remains gray in a black T is only referenced via T so it can be removed. This will decrease the reference count to B. After this is done for all such B, then the the CC can proceed with collecting all gray cycles the same way it does now.
Comment 51 Andreas Gal :gal 2010-03-24 12:20:53 PDT
As a little optimization, the cursor should also be on the stack. So creating an iterator leaves (private, cursor) on the stack. private is a struct that contains the idlist and shape information. This way getting the next id during iteration no longer requires a builtin call.
Comment 52 Alex Vincent [:WeirdAl] 2011-03-01 11:54:29 PST
I'd be happy to be an early tester of this code - I have several areas where I need object hashtable support, and I'm currently hacking around it by giving each key object its own unique hash string key.  I don't like the solution I came up with, though, so this would be better.

The current patch is over a year old and has r-.  I'm concerned about potential bitrot, and would like an up-to-date patch that hasn't been rejected to play with.
Comment 53 Andreas Gal :gal 2011-03-15 00:41:36 PDT
Created attachment 519353 [details] [diff] [review]
patch

Fresh patch. The constructor has been renamed to WeakMap. Debug builds contain a .length property. This patch relies on the gray/black persistent mark bitmap work we did for FF4, so no special cycle collector magic should be necessary any more.
Comment 54 Andreas Gal :gal 2011-03-15 00:42:27 PDT
Created attachment 519354 [details] [diff] [review]
patch

And this time the right patch.
Comment 55 Alex Vincent [:WeirdAl] 2011-03-15 10:49:40 PDT
:(  I know I said I'd be an early tester of this patch (and I will)... but I had hoped the patch would include some unit tests in js/src/tests.

This is particularly surprising given that comment 21 did include a xpcshell test.
Comment 56 Alex Vincent [:WeirdAl] 2011-03-15 12:43:59 PDT
gal:  Which hg.m.o repository and revision does this patch apply to?  I just tried a dry-run against mozilla-central trunk and against releases/mozilla-2.0's last patch before being tagged, and it failed:

$ patch --dry-run -p1 < ~/Downloads/weakmap
patching file js/src/Makefile.in
patching file js/src/jsapi.cpp
Hunk #1 FAILED at 79.
Hunk #2 succeeded at 1667 (offset -16 lines).
1 out of 2 hunks FAILED -- saving rejects to file js/src/jsapi.cpp.rej
patching file js/src/jsatom.cpp
Hunk #1 succeeded at 207 (offset 1 line).
patching file js/src/jsatom.h
patching file js/src/jscntxt.h
Hunk #1 succeeded at 1051 (offset -2 lines).
patching file js/src/jsgc.cpp
Hunk #1 succeeded at 85 (offset 1 line).
Hunk #2 FAILED at 2271.
Hunk #3 FAILED at 2291.
Hunk #5 FAILED at 2405.
Hunk #6 succeeded at 2461 (offset -6 lines).
3 out of 6 hunks FAILED -- saving rejects to file js/src/jsgc.cpp.rej
patching file js/src/jsgcinlines.h
patching file js/src/jsproto.tbl
patching file js/src/jsweakmap.cpp
patching file js/src/jsweakmap.h
Comment 57 :Ms2ger 2011-03-15 13:05:42 PDT
Comment on attachment 519354 [details] [diff] [review]
patch

>+ * The Original Code is Mozilla SpiderMonkey JavaScript 1.9 code, released
>+ * May 28, 2008.

Really?
Comment 58 Andreas Gal :gal 2011-03-15 16:44:28 PDT
Tests are coming. The patch is against tracemonkey tip.
Comment 59 Jason Orendorff [:jorendorff] 2011-03-17 09:37:55 PDT
>     /* Finalize watch points associated with unreachable objects. */
>     js_SweepWatchPoints(cx);
> 
>+    /* Finalize unreachable (key,value) pairs in all weak maps. */
>+    WeakMap::sweep(cx);

It looks like these aren't compartment-aware. If you do a single-compartment GC, what happens to watchpoints and weakmaps in other compartments?

That would be an existing bug in the case of js_SweepWatchPoints.
Comment 60 Jason Orendorff [:jorendorff] 2011-03-17 09:48:59 PDT
Never mind. IsAboutToBeFinalized gets it right in that case!
Comment 61 Jason Orendorff [:jorendorff] 2011-03-17 14:33:45 PDT
It's great to see this moving again.

The basic GC algorithm looks correct, with a small bug. But I don't know enough
about the cycle collector to know if that is safe; and I don't know enough
about what's using the tracing APIs to know if it's safe for its behavior to
differ from the GC marking algorithm. I imagine it's fine, but someone else
should take a look too.

This shouldn't land without comprehensive tests. If you don't want to write 'em
yourself, call in a favor.

In jsgc.cpp, MarkAndSweepCompartment:
>      * tracing.
>      */
>     gcmarker.markDelayedChildren();
> 
>     /*
>      * Mark weak roots.
>      */
>     while (true) {
>-        if (!js_TraceWatchPoints(&gcmarker))
>+        if (!js_TraceWatchPoints(&gcmarker) &&
>+            !WeakMap::markIteratively(&gcmarker)) {
>             break;
>+        }
>         gcmarker.markDelayedChildren();
>     }
> 
>     rt->gcMarkingTracer = NULL;
> 
>     if (rt->gcCallback)
>         (void) rt->gcCallback(cx, JSGC_MARK_END);
> 

MarkAndSweep:
>      * tracing.
>      */
>     gcmarker.markDelayedChildren();
> 
>     /*
>      * Mark weak roots.
>      */
>     while (true) {
>-        if (!js_TraceWatchPoints(&gcmarker))
>+        if (!js_TraceWatchPoints(&gcmarker) &&
>+            !WeakMap::markIteratively(&gcmarker)) {
>             break;
>+        }
>         gcmarker.markDelayedChildren();
>     }
> 
>     rt->gcMarkingTracer = NULL;
> 
>     if (rt->gcCallback)
>         (void) rt->gcCallback(cx, JSGC_MARK_END);
> 

Time to factor out common code, I think.

In jsweakmap.h, class WeakMap:
>+  public:
>+    static JSBool set(JSContext *cx, uintN argc, Value *vp);
>+    static JSBool get(JSContext *cx, uintN argc, Value *vp);
>+
>+    static JSBool getProperty(JSContext *cx, JSObject *obj, jsid id, Value *vp);
>+
>+    static void mark(JSTracer *trc, JSObject *obj);
>+
>+    static JSBool construct(JSContext *cx, uintN argc, Value *vp);
>+    static void finalize(JSContext *cx, JSObject *obj);
>+
>+    static bool markIteratively(JSTracer *trc);
>+    static void sweep(JSContext *cx);
>+
>+    static Class jsclass;
>+};

Ideally everything would be private except the jsclass, mark, sweep, and a
WeakMap::initClass(cx, obj) static method, since this class isn't safe to
instantiate without a JSObject backing it. But it's not a big deal--whatever
works.

In jsweakmap.cpp, WeakMap::set:
>+    if (argc < 2) {
>+        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_MORE_ARGS_NEEDED,
>+                             "WeakMap.put", 1, "");
>+        return false;
>+    }

The 1 needs to be in quotes, otherwise
  WeakMap().set("three");  // crashes

Also: "WeakMap.set", not put.

Also, I bet this gets spec'd such that if you omit the second argument, the
value is undefined, more like all the other standard builtin functions:

   map.set(Math);   // delete the entry for Math

>+        void *mem = cx->malloc(sizeof(WeakMap));
>+        if (!mem)
>+            goto out_of_memory;
>+        table = new (mem) WeakMap(cx);

Instead:
          table = js_new<WeakMap>(cx);
          if (!table)
              goto out_of_memory;

In WeakMap::get:
>+        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_MORE_ARGS_NEEDED,
>+                             "WeakMap.put", "0", "s");

"WeakMap.get"

>+    js::HashMap<JSObject *, Value>::Ptr ptr = fromJSObject(obj)->map.lookup(key);

fromJSObject(obj) can be null here:
  WeakMap().get({});  // null crash

In WeakMap::mark:
>+            JSRuntime *rt = trc->context->runtime;
>+            if (rt->gcWeakMapList)
>+                fromJSObject(rt->gcWeakMapList)->next = obj;
>+            table->next = NULL;
>+            rt->gcWeakMapList = obj;

This doesn't seem right. It's building the list in the wrong direction. The
loop in markIteratively would only run once, since this always makes
fromJSObject(rt->gcWeakMapList)->next NULL.

In WeakMap::markIteratively:
>+                if (IsAboutToBeFinalized(cx, value.toGCThing())) {
>+                    js::gc::MarkValue(trc, value, "value");
>+                    /* We revived a value, we have to iterate again. */
>+                    again = true;
>+                }

If the value is a string, we don't have to iterate, right?

In jsweakmap.cpp:
>+void
>+WeakMap::finalize(JSContext *cx, JSObject *obj)
>+{
>+    WeakMap *table = fromJSObject(obj);
>+    if (table) {
>+        delete table;
>+        obj->setPrivate(NULL);
>+    }
>+}

Is this just paranoia, or is it necessary to obj->setPrivate(NULL) here?

In jsweakmap.cpp, WeakMap::construct:
>+    if (!IsConstructing(vp) &&

Shouldn't it do the same thing whether we're constructing or not? The Array
constructor (js_Array) doesn't check IsConstructing.

In jsweakmap.cpp:
>+Class WeakMap::jsclass = {
>+    "WeakMap",
>+    JSCLASS_HAS_PRIVATE |
>+    JSCLASS_HAS_CACHED_PROTO(JSProto_WeakMap),
>+    PropertyStub,         /* addProperty */
>+    PropertyStub,         /* delProperty */
>+#ifdef DEBUG
>+    WeakMap::getProperty,
>+#else
>+    PropertyStub,
>+#endif

Having a class-wide getter or an extra property somewhere affects all sorts of
things in the jits and property cache--things that should behave exactly the
same in debug and opt builds, for sanity's sake.

How about a debug-only shell function weakMapCount(map) instead?

(Class-wide getters are evil to begin with -- passing a JSPropertyOp to
DefineProperty would be the right way.)
Comment 62 Jeff Walden [:Waldo] (remove +bmo to email) 2011-03-17 15:09:25 PDT
I don't think we should overload set() to remove() mappings.  The two verbs are quite different, and putting both add-mapping and remove-mapping under the same verb seems likely to be confusing.

Also, having remove() instead of set(..., undefined) being special would also allow weak maps to map any object to any value -- no need for the developer to specially worry if he ever wants to map |undefined|.

Permitting insertion of |undefined| would also require a has() or contains() method to make that distinction.  This seems like another good verb to have, regardless of whether |set()| is or isn't overloaded, even if the can't-insert-|undefined| behavior means it's not actually functionality overloading right now.


Now a few less-important thoughts:

(In reply to comment #61)
> It's great to see this moving again.
> 
> This shouldn't land without comprehensive tests. If you don't want to write
> 'em yourself, call in a favor.

Concur!

> Also, I bet this gets spec'd such that if you omit the second argument, the
> value is undefined, more like all the other standard builtin functions:
> 
>    map.set(Math);   // delete the entry for Math

We can always loosen to allow omitting the second argument if it ends up being specified that way.  It's harder to go the opposite direction if we guess incorrectly.

There's a js::ToObject method which does exactly what your NonNullObject method does -- use that instead.  But I'd suggest having a more specific error message when the first argument to set() isn't an object, because I suspect the most likely expectation for developers is going to be that WeakMap is a superset of a normal object's mapping functionality, *not* that it's a subset of the exact inverse functionality.  (In other words: they'll expect this "better hash" will map anything to anything, not that it will only map objects to anything [or anything-minus-|undefined| as it stands now].)

> How about a debug-only shell function weakMapCount(map) instead?

I had exactly this concern and suggestion myself.
Comment 63 Alex Vincent [:WeirdAl] 2011-03-17 15:45:07 PDT
I have a general question.  Suppose that you have a collection of objects stored in the WeakMap as values, and that the WeakMap is the only place where references to those objects are stored.

Under the WeakMap specification, which I assume you're implementing here, the only references to those objects are weak references, then, and they are vulnerable to being deleted at any time.  The spec says:

"neither keys nor values in a weak map are held strongly"

I certainly understand and wholeheartedly support weak references to keys.  I'm just wondering if we can adjust the design and/or implementation so that the values can (not must!) be strongly held, if an extra true parameter is passed in either in the WeakMap constructor, or in the set() method.

(On the downside, suppose those objects hold strong references to the keys, as my 1:1 hashtable design does.  Then you're back in a circular reference scenario.)

Is this necessary and/or desirable?
Comment 64 Mark S. Miller 2011-03-18 00:11:01 PDT
(In reply to comment #63)
> Under the WeakMap specification, which I assume you're implementing here, the
> only references to those objects are weak references, then, and they are
> vulnerable to being deleted at any time.  The spec says:
> 
> "neither keys nor values in a weak map are held strongly"

This text from Dave's question is part of a proposed summary of the semantics. To avoid your misreading, it could perhaps be better stated as 

   * neither keys nor values in a weak map are held strongly or weakly

In any case, his meaning becomes clear (and this clear meaning is correct) when he goes on to say

   * i.e., the presence of a mapping k ⇒ v in a reachable weak map m does not by itself make k or v reachable

   * for each mapping k ⇒ v in a reachable weak map m, if k is reachable then v is reachable

In other words, the weakness provided by WeakMaps is not reduced (and in fact is not reducible) to the weakness or strongness of individual references to the keys and values. The WeakMap proposal in fact does not require the existence of weak references of any sort. 


> I certainly understand and wholeheartedly support weak references to keys. I'm

There is a separate strawman for weak references, and it could co-exist with WeakMaps, but it is not expected for EcmaScript-next. No one has signed up to propose it before the May deadline. Although I wrote it, I cannot say I wholeheartedly support it. My feelings about it are mixed, but in any case it lives or dies independently of WeakMaps.


> just wondering if we can adjust the design and/or implementation so that the
> values can (not must!) be strongly held, if an extra true parameter is passed
> in either in the WeakMap constructor, or in the set() method.
> 
> (On the downside, suppose those objects hold strong references to the keys, as
> my 1:1 hashtable design does.  Then you're back in a circular reference
> scenario.)
> 
> Is this necessary and/or desirable?
Comment 65 Mark S. Miller 2011-03-18 00:22:31 PDT
(In reply to comment #62)
> I don't think we should overload set() to remove() mappings.  The two verbs are
> quite different, and putting both add-mapping and remove-mapping under the same
> verb seems likely to be confusing.
> 
> Also, having remove() instead of set(..., undefined) being special would also
> allow weak maps to map any object to any value -- no need for the developer to
> specially worry if he ever wants to map |undefined|.
> 
> Permitting insertion of |undefined| would also require a has() or contains()
> method to make that distinction.  This seems like another good verb to have,
> regardless of whether |set()| is or isn't overloaded, even if the
> can't-insert-|undefined| behavior means it's not actually functionality
> overloading right now.

Hi Jeff, I am open to this change of API though it is not my first choice. Your suggestion corresponds to the Soft Own Fields example at http://wiki.ecmascript.org/doku.php?id=harmony:weak_maps#explicit_soft_own_fields . Either it or WeakMaps as proposed can trivially be built from the other, so either would be fine as the sanctioned primitive.

However, if you wish this change in API, the right place to advocate it is on es-discuss and at EcmaScript committee meetings. While the existing API stands as the accepted proposal, is there any reason not to implement it as is, until we have general agreement to revise it?

The reason I prefer the simpler API is it has a simpler semantics: A WeakMap is a total mapping from all objects to EcmaScript values. Only a finite number of these objects map to values other than undefined. With your suggested change, we'd then need to describe it as a partial mapping. Which I'm ok with, but it seems slightly ickier to me. It also means that for applications that only need the simpler semantics, i.e., that only use get/set, that the implementation would need to devote storage to maintain a distinction that these apps would never observe.


> 
> 
> Now a few less-important thoughts:
> 
> (In reply to comment #61)
> > It's great to see this moving again.
> > 
> > This shouldn't land without comprehensive tests. If you don't want to write
> > 'em yourself, call in a favor.
> 
> Concur!
> 
> > Also, I bet this gets spec'd such that if you omit the second argument, the
> > value is undefined, more like all the other standard builtin functions:
> > 
> >    map.set(Math);   // delete the entry for Math
> 
> We can always loosen to allow omitting the second argument if it ends up being
> specified that way.  It's harder to go the opposite direction if we guess
> incorrectly.
> 
> There's a js::ToObject method which does exactly what your NonNullObject method
> does -- use that instead.  But I'd suggest having a more specific error message
> when the first argument to set() isn't an object, because I suspect the most
> likely expectation for developers is going to be that WeakMap is a superset of
> a normal object's mapping functionality, *not* that it's a subset of the exact
> inverse functionality.  (In other words: they'll expect this "better hash" will
> map anything to anything, not that it will only map objects to anything [or
> anything-minus-|undefined| as it stands now].)
> 
> > How about a debug-only shell function weakMapCount(map) instead?
> 
> I had exactly this concern and suggestion myself.
Comment 66 Jason Orendorff [:jorendorff] 2011-03-18 08:42:46 PDT
I wrote:
> > How about a debug-only shell function weakMapCount(map) instead?

But actually there's no reason for that to be debug-only, really. If it's going to be used in tests, those tests might as well work in opt builds too.

(In reply to comment #62)
> There's a js::ToObject method which does exactly what your NonNullObject method
> does -- use that instead.

They're different. js::ToObject(cx, &stringval) creates a String wrapper object; NonNullObject(cx, stringval) throws.
Comment 67 Jason Orendorff [:jorendorff] 2011-03-18 09:10:10 PDT
(In reply to comment #65)
> (In reply to comment #62)
> > Permitting insertion of |undefined| would also require a has() or contains()
> > method to make that distinction.  This seems like another good verb to have,
> > regardless of whether |set()| is or isn't overloaded, even if the
> > can't-insert-|undefined| behavior means it's not actually functionality
> > overloading right now.
> 
> Hi Jeff, I am open to this change of API though it is not my first choice. 
> [...]
> The reason I prefer the simpler API is it has a simpler semantics: A WeakMap is
> a total mapping from all objects to EcmaScript values. Only a finite number of
> these objects map to values other than undefined. With your suggested change,
> we'd then need to describe it as a partial mapping.

I was on the fence before, but this convinced me that Waldo is right. :-)

Programmers are very much used to Map meaning "partial mapping". Finite data structures are the norm.

Even from a mathematical purity standpoint it's hard to see what value there is in making a WeakMap "a total mapping from all objects to EcmaScript values where a finite number of these objects map to values other than undefined" rather than "a finite partial mapping from objects to EcmaScript values".

For almost every total mapping the programmer might actually want, a WeakMap is insufficient: you have to write a function. WeakMaps will be used as partial mappings in practice, because that's the job they can actually do. The only question is whether it's a partial mapping from objects to values (per Waldo), or a partial mapping from objects to values-except-undefined (per Mark). The former seems nicer to me.

Since programmers will use these as partial maps in practice, we might as well offer has() and remove() so that they can write what they mean when querying to see if an object has been added already or removing an object.
Comment 68 Dave Herman [:dherman] 2011-03-18 10:38:55 PDT
A few thoughts:

a) This discussion should perhaps be on es-discuss or with TC39.

b) I'm surprised at Mark's comments in this bug thread; my understanding was that one of the key benefits of not exposing the table size or whether an object is in the table is that it avoids exposing the unspecified garbage collection behavior. Being able to observe the GC behavior is a portability hazard, and may (insert standard IANA security expert disclaimer here) also create side channels for attackers to communicate information.

c) I agree with Jason that the total mapping idea is unintuitive. Infinite data structures don't have an intuitive space model, and weak maps are all about space efficiency.

d) I don't consider c) sufficient to argue against the currently spec'ed API, just insufficient to argue for it. :)

e) I disagree with Jason that just because our tests want to observe something, we should automatically expose it to everyone. If we want to test implementation details, we shouldn't automatically expose those details to everyone. It's easier to change our test suite when the implementation details change than to change the entire web.

f) Now that I think about it, though, it seems that even with the currently-spec'ed API, it's still always possible to observe that a value has disappeared from the table, by simply boxing all values you place in the map. Then if you get back undefined, you know the mapping has disappeared. Does this mean my understanding in b) was wrong all along, and that weak maps still do, in fact, expose the GC details?

Dave
Comment 69 Dave Herman [:dherman] 2011-03-18 10:50:13 PDT
> f) Now that I think about it, though, it seems that even with the
> currently-spec'ed API, it's still always possible to observe that a value has
> disappeared from the table, by simply boxing all values you place in the map.
> Then if you get back undefined, you know the mapping has disappeared. Does this
> mean my understanding in b) was wrong all along, and that weak maps still do,
> in fact, expose the GC details?

Ah. The reason it doesn't expose the GC details is that you have to already have a strong reference to the object in order to look it up. But the same is true for has(). So I am in favor of adding has().

We should *not* expose count, though. But Jason tells me he only meant we should expose count to the shell, not the web. I'm fine with that.

I will try to bring this up at the face-to-face next week, even though we have a full agenda. We should try to see if we can resolve this quickly, so that we can get it into Fx5. Harmonia ship-ienda est.

Dave
Comment 70 Mark S. Miller 2011-03-19 18:15:55 PDT
While I still mildly prefer the simpler API, I do recognize the logic of these counter-arguments. If we add has() we should also add delete(). Either way, no non-determinism is exposed. I'm rather happy with either API being the official one. Since I have not heard anyone else feel strongly in favor of the simpler API, I suspect we can indeed get quick agreement on the get/set/has/delete API.

Btw, please have it be "delete" rather than "remove".


> I'm surprised at Mark's comments in this bug thread; 

Which comments were those? 


> by simply boxing all values you place in the map.

At http://wiki.ecmascript.org/doku.php?id=harmony:weak_maps#explicit_soft_own_fields I show how to implement the get/set/has/delete abstraction out of the simpler get/set abstraction with no boxing overhead.
Comment 71 Dave Herman [:dherman] 2011-03-19 19:06:40 PDT
> Btw, please have it be "delete" rather than "remove".

I don't have strong feelings either way, but I'm curious why?

> > I'm surprised at Mark's comments in this bug thread; 
> 
> Which comments were those? 

Oh sorry, I retract my surprise. :) I was just temporarily confused about which operations expose non-determinism.

Dave
Comment 72 Mark S. Miller 2011-03-19 19:42:48 PDT
> > Btw, please have it be "delete" rather than "remove".

> I don't have strong feelings either way, but I'm curious why?

With the addition of has and delete, WeakMaps can not be used more directly in a soft-field-like way, modeling own[*] properties externally by a side table. The current ES5 APIs expose these four property manipulation operations as associated with "set", "get" (i.e., property descriptors of accessor properties), "has" ("hasOwnProperty"), and obviously "delete".


[*] this "own" would now be the only difference between WeakMap and SoftField, so they should have the same API. For genuine soft fields, the analogy argument for "delete" is even stronger.
Comment 73 Mark S. Miller 2011-03-19 19:43:31 PDT
> WeakMaps can not be used

Oops. Should be: "WeakMaps can now be used ...
Comment 74 Andreas Gal :gal 2011-03-29 17:17:17 PDT
I need a consensus for the API here. FF5 freezes in 2 weeks. I still like:

set(key, value)
get(key)

Its very rare that you want to preserve void values, and in that case you can easily wrap:

map = new WeakMap();
map.set = function(k,v) { WeakMap.prototype.set.call(this, k, { value: v }); }
map.contains = function(k) { return !!WeakMap.prototype.get.call(this, k); }
map.get = function(k) { return WeakMap.prototype.get.call(this, k).value; }

Again, I think this is a rather rare case so the overhead shouldn't be too bad.
Comment 75 Andreas Gal :gal 2011-03-29 17:17:33 PDT
Created attachment 522883 [details] [diff] [review]
patch
Comment 76 Andreas Gal :gal 2011-03-29 17:19:01 PDT
Created attachment 522884 [details] [diff] [review]
patch
Comment 77 Dave Herman [:dherman] 2011-03-29 19:47:44 PDT
Small change of heart. Here's my suggested API:

map.set(k, v)
map.contains(k)
map.remove(k)
map.get(k[, v])
    - if k is in the map, it returns the found value and ignores v
    - if k is not in the map and v is callable, it calls v with k as the single argument and returns the result of that
    - if k is not in the map and v is not callable, it returns v

(I know Mark wanted it to be called delete, but programmers won't be able to use the keyword after the dot in legacy versions of JS, so they'll have to use map["delete"] instead, which is a loss for readability and convenience.)

Rationale for the get API:
    - you don't have to pass the default if you don't want, and you'll get undefined as a reasonable, if slightly ambiguous, sentinel value indicating "not there"
    - you can easily pass common default values such as null, undefined, 0, "", etc
    - this API doesn't distinguish between not passing the second argument and passing undefined, which is usually a good idea
    - this API never has to throw an error from within the engine
    - if you want the default value to be a function (which is probably rare), you can still do it by giving a function that returns the function you want -- a nuisance, certainly, but this is a less common case

I will propose this API to TC39. I can't promise it will be the final API, though.

Dave
Comment 78 Andreas Gal :gal 2011-03-30 11:09:15 PDT
print(map.contains(key));
print(map.set(key, 5));
print(map.get(key));
print(map.get({}));
print(map.get({}, null));
print(map.get({}, function(key) { print(key); print(this); return 42; }));

false
undefined
5
undefined
null
[object Object]
[object WeakMap]
42
Comment 79 Andreas Gal :gal 2011-03-30 11:09:48 PDT
Created attachment 523058 [details] [diff] [review]
patch

Patch with new API.
Comment 80 Mark S. Miller 2011-03-31 09:13:21 PDT
(In reply to comment #77)
> Small change of heart. Here's my suggested API:
> 
> map.set(k, v)
> map.contains(k)

I prefer "has". On a map-like abstraction, I find "contains" more suggestive of a test of values. YMMV. And JS's current use of "has" is "hasOwnProperty" which is already a key-like test, so it has precedent.

In any case, "has" is shorter. Even the clearest "hasKey" is shorter than "contains".


> map.get(k[, v])
>     - if k is in the map, it returns the found value and ignores v
>     - if k is not in the map and v is callable, it calls v with k as the single
> argument and returns the result of that
>     - if k is not in the map and v is not callable, it returns v

What about the case where k is not in the map and v is absent? The two obvious possibilities: return undefined or throw something.

I'm also not sure I like the third bullet above. If v is dynamically calculated or passed in from somewhere, and is usually a non-callable, then it simple code seem to work until v happens to be callable. If JS ends up with #-functions, then I'd get rid of the third bullet case as too accident prone in exchange for too little syntactic benefit. OTOH, without #-functions, perhaps the syntactic benefit becomes the more important consideration. (Which is an interesting lesson in syntactic costs leading to semantically worse APIs.)

> map.remove(k)

> (I know Mark wanted it to be called delete, but programmers won't be able to
> use the keyword after the dot in legacy versions of JS, so they'll have to use
> map["delete"] instead, which is a loss for readability and convenience.)

I don't understand this. Under what scenario would someone use the WeakMap API on an ES3 implementation? If the answer is essentially none, then I still favor "delete".


> 
> Rationale for the get API:
>     - you don't have to pass the default if you don't want, and you'll get
> undefined as a reasonable, if slightly ambiguous, sentinel value indicating
> "not there"
>     - you can easily pass common default values such as null, undefined, 0, "",
> etc
>     - this API doesn't distinguish between not passing the second argument and
> passing undefined, which is usually a good idea
>     - this API never has to throw an error from within the engine
>     - if you want the default value to be a function (which is probably rare),
> you can still do it by giving a function that returns the function you want --
> a nuisance, certainly, but this is a less common case

That clarifies why the fourth bullet was left out. But if we get rid of the third bullet, as I think we should, then we still need to distinguish between undefined and absence. OTOH, we could replace the third bullet by a narrower condition that allows null and undefined to be passed through as default values.

A final possibility, with some pros and cons, for missed lookup handling logic, is to pass an optional missed lookup handler to the constructor rather than the get function:

WeakMap(v_opt)

where the optional v has the same signature as the argument proposed for get above. I don't yet know what I think about this.


> 
> I will propose this API to TC39. I can't promise it will be the final API,
> though.

Indeed. I like the functionality. I do not like the method names.


> 
> Dave
Comment 81 Dave Herman [:dherman] 2011-03-31 10:01:17 PDT
> I prefer "has".

+1

> > map.get(k[, v])
> >     - if k is in the map, it returns the found value and ignores v
> >     - if k is not in the map and v is callable, it calls v with k as the single
> > argument and returns the result of that
> >     - if k is not in the map and v is not callable, it returns v
> 
> What about the case where k is not in the map and v is absent? The two obvious
> possibilities: return undefined or throw something.

I hinted at this in the rationale but wasn't clear about it, sorry; the idea is that leaving out the default value is identical to passing undefined -- i.e., if you leave it out, you get back undefined.

> I'm also not sure I like the third bullet above. If v is dynamically calculated
> or passed in from somewhere, and is usually a non-callable, then it simple code
> seem to work until v happens to be callable.

I hear you -- I was against this "overloading" at first, too. I'll admit that it was when I saw that Racket actually does the same overloading that I began to soften.

The rationale here is one of ergonomics; you want the default behavior of the common API to match the common case. But you also want to make it not too inconvenient to do the general case either. If, in particular, you want to be able to distinguish between maps-to-undefined and not-found, it's much more convenient to write

    map.get(key, orfail)

than to write

    map.has(key) ? map.get(key) : fail(key)

or to monkey-patch WeakMap with a new method to abstract the above. Another alternative would be to have two different API's for lookup, one with the convenient common case and one with the callback:

    map.get(key[, def=undefined]) // returns the value if present or def otherwise
    map.lookup(key, getdef) // returns the value if present or getdef(key) otherwise

IOW, break apart the overloading into two separate methods. I'm okay with this, although it does introduce an extra method into the API.

> I don't understand this. Under what scenario would someone use the WeakMap API
> on an ES3 implementation? If the answer is essentially none, then I still favor
> "delete".

I don't have a strong argument either way, but I think the rationale for "delete" is actually kind of thin; weak maps have plenty of use cases beyond soft fields, so I don't see a need to name the method for the one use case. And it's just that I have this nervous feeling that people will be worried about syntactic compatibility and end up writing map["delete"] instead of map.delete.

> That clarifies why the fourth bullet was left out. But if we get rid of the
> third bullet, as I think we should, then we still need to distinguish between
> undefined and absence. OTOH, we could replace the third bullet by a narrower
> condition that allows null and undefined to be passed through as default
> values.

IIUC, that would not be quite expressive enough. What if I have a map that can contain both null and undefined? I want the general ability to write custom behavior for the not-found case, including throwing.

> A final possibility, with some pros and cons, for missed lookup handling logic,
> is to pass an optional missed lookup handler to the constructor rather than the
> get function:
> 
> WeakMap(v_opt)

I don't think that's flexible enough. The creator of the map and the client of the map may not be the same code. The client should be the one to decide what they want to do for the not-found case.

Dave
Comment 82 Igor Bukanov 2011-04-03 11:24:01 PDT
(In reply to comment #81)
> Another
> alternative would be to have two different API's for lookup, one with the
> convenient common case and one with the callback:
> 
>     map.get(key[, def=undefined]) // returns the value if present or def
> otherwise
>     map.lookup(key, getdef) // returns the value if present or getdef(key)
> otherwise

One can always use a private unique object to implement the lookup on top of the getmethod  if really necessary. So the lookup version is only sugar. Given that sugar does not help the case when one wants to change the control flow in the caller of map.get depending on the absence of the key, I do not see the need to have that sugar in weak maps.
Comment 83 Jason Orendorff [:jorendorff] 2011-04-10 19:42:44 PDT
This shouldn't land without comprehensive tests. If you don't want to write 'em
yourself, call in a favor.

Give js::HashMap<JSObject *, Value> a typedef.

Rename "contains" to "has" per comment 81.

>+    if (!InstanceOf(cx, obj, &WeakMap::jsclass, vp ? vp + 2 : NULL))

vp is never null, so just `vp + 2`.

>+            if (!ExternalInvoke(cx, ObjectValue(*obj), *vp, 1, argv, vp))

Dherman changed his mind since then. Let's make it completely ignore the
second argument for now. We'll add in the final behavior later.

In WeakMap::set:
>+        if (value.isUndefined())
>+            return true;

These two lines have to go because of contains().

>+        void *mem = cx->malloc(sizeof(WeakMap));
>+        if (!mem)
>+            goto out_of_memory;
>+        table = js_new<WeakMap>(cx);

Bogus.

>+    if (!value.isUndefined())
>+        return table->map.put(key, value) != NULL;
>+    table->map.remove(key);
>+    return true;

Should just be:
  return table->map.put(key, value) != NULL;
Comment 84 Dave Herman [:dherman] 2011-04-10 19:58:42 PDT
> >+            if (!ExternalInvoke(cx, ObjectValue(*obj), *vp, 1, argv, vp))
> 
> Dherman changed his mind since then. Let's make it completely ignore the
> second argument for now. We'll add in the final behavior later.

If we're going to add the |lookup| function later, can we at least add the optional second argument that defaults to undefined and is always the return value when the property is not found? This should be trivial to implement and makes the API much more flexible at minimal implementation cost.

I still want the additional method when we can, though.

BTW, since Tuesday is coming fast and furious, I'm willing to be the one to flinch on the "remove vs. delete" game of chicken. I personally like the name remove better, but I concede that my concern over contextual keyword is probably not that big a deal, and we've already gone down that road with generator.throw anyway.

Dave
Comment 85 Mark S. Miller 2011-04-10 20:33:39 PDT
(In reply to comment #84)
> If we're going to add the |lookup| function later, can we at least add the
> optional second argument that defaults to undefined and is always the return
> value when the property is not found? This should be trivial to implement and
> makes the API much more flexible at minimal implementation cost.
> 
> I still want the additional method when we can, though.

+1.


> 
> BTW, since Tuesday is coming fast and furious, I'm willing to be the one to
> flinch on the "remove vs. delete" game of chicken.

Thanks David. I did not intend to turn this into a game of chicken, but my head has been elsewhere lately for various reasons and I didn't have time to engage on this issue. Since you had earlier written "I don't have a strong argument either way" (comment 81) I inferred that either would be acceptable enough to you. Thanks for being flexible on this.


> I personally like the name
> remove better, but I concede that my concern over contextual keyword is
> probably not that big a deal, and we've already gone down that road with
> generator.throw anyway.
> 
> Dave
Comment 86 Dave Herman [:dherman] 2011-04-10 21:08:00 PDT
> Thanks David. I did not intend to turn this into a game of chicken, but my head
> has been elsewhere lately for various reasons and I didn't have time to engage
> on this issue.

That came across more adversarial than I intended, sorry about that -- I just meant that with a Firefox deadline approaching, someone was going to have to give soon, and I'm willing to be it.

> Since you had earlier written "I don't have a strong argument
> either way" (comment 81) I inferred that either would be acceptable enough to
> you. Thanks for being flexible on this.

Sure. I'm not swayed by the argument that it should match the soft field use case, because weak maps are a general mechanism; the name should match the level of abstraction appropriate for the API it belongs to, not for a different layer of abstraction that's placed on top.

Anyway, all that said, I don't feel incredibly strongly on this issue. At the end of the day, they're pretty intuitive 6-letter/two-syllable words. :)

Dave
Comment 87 Andreas Gal :gal 2011-04-11 15:35:43 PDT
Created attachment 525194 [details] [diff] [review]
patch

New patch, please re-review and I will work on tests in the meantime.
Comment 88 Andreas Gal :gal 2011-04-11 16:03:54 PDT
Created attachment 525204 [details] [diff] [review]
patch
Comment 89 Dave Herman [:dherman] 2011-04-11 16:14:53 PDT
Comment on attachment 525204 [details] [diff] [review]
patch

Pardon the drive-by, but...

>+static JSObject *
>+NonNullObject(JSContext *cx, Value *vp)
>+{
>+    if (vp->isPrimitive()) {
>+        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_NOT_NONNULL_OBJECT);
>+        return NULL;
>+    }
>+    return &vp->toObject();
>+}

this is also defined in jsproxy.cpp. Can it move into some .h somewhere?

Dave
Comment 90 Andreas Gal :gal 2011-04-11 16:21:45 PDT
comment 89: done.
Comment 91 Jason Orendorff [:jorendorff] 2011-04-11 18:37:50 PDT
>+    if (!js_DefineNativeProperty(cx, proto, ATOM_TO_JSID(cx->runtime->atomState.lengthAtom),
>+                                 UndefinedValue(), NULL, NULL,
>+                                 JSPROP_READONLY | JSPROP_PERMANENT | JSPROP_SHARED, 0, 0,
>+                                 NULL)) {
>+        return NULL;
>+    }

Kill this.

Someone proposed a shell-only function that gets the number of entries in a WeakMap. I want it for tests.
Comment 92 Andreas Gal :gal 2011-04-11 21:50:00 PDT
Created attachment 525298 [details] [diff] [review]
patch
Comment 93 Jason Orendorff [:jorendorff] 2011-04-11 21:53:26 PDT
Created attachment 525299 [details]
weakmaps.js

Here's the test. Patch passes with flying colors, except in debug builds where the bogus DEBUG-only .length property I mentioned in the previous comment causes it to fail.
Comment 94 Jason Orendorff [:jorendorff] 2011-04-11 21:57:24 PDT
Comment on attachment 525298 [details] [diff] [review]
patch

r=me with the test and with the .length property removed.

A shell-only primitive to get at the keys of a Weakmap (or just the number of keys) would be great, to test that we don't leak. However I'm fine with landing with only the attached test. (The most important thing is that we not crash.)
Comment 95 Jason Orendorff [:jorendorff] 2011-04-11 22:13:28 PDT
Created attachment 525302 [details]
fuzzmaps.js

For what it's worth, a program to randomly make tangles of weakmaps, then let the GC get rid of them. This found nothing wrong. Probably because there aren't any bugs, right?
Comment 97 David Baron :dbaron: ⌚️UTC+2 (review requests must explain patch) 2011-04-12 01:36:09 PDT
Backed out for causing jsreftest crashes on debug builds:
http://hg.mozilla.org/mozilla-central/rev/22d3f09a3f37
Comment 98 Andreas Gal :gal 2011-04-14 16:48:19 PDT
I think we were not resetting gcWeakMapList after sweeping. I will reland with that fix.
Comment 99 Andreas Gal :gal 2011-04-16 21:22:49 PDT
comment #98 was a red herring. The original patch passed try. The crazy aurora landing must have been busted. I will re-land on TM.
Comment 100 Andreas Gal :gal 2011-04-16 21:25:52 PDT
http://hg.mozilla.org/tracemonkey/rev/242947d76f73
Comment 101 Andreas Gal :gal 2011-04-16 21:26:49 PDT
Blizzard: this will be one of the major new JS features for FF6, assuming it sticks this time around.
Comment 102 Dão Gottwald [:dao] 2011-04-26 00:04:57 PDT
It looks like this hasn't relanded on mozilla-central yet.
Comment 103 Chris Leary [:cdleary] (not checking bugmail) 2011-04-26 15:43:18 PDT
http://hg.mozilla.org/mozilla-central/rev/242947d76f73
Comment 104 David Bruant 2011-05-11 11:47:02 PDT
First, I'd like to add a stress test to WeakMaps:
-----
var wm = WeakMap();
var key = {}, value = {};

while(true){
  wm.set(key, value);
  key = value;
  value = {};
}
-----
At each loop turn, the reference of the "key" object is lost, so it's garbage collectable. After a second turn, the "value" object of this entry becomes garbage-collectable too. In all rigor, this program doesn't use much memory, because it doesn't keep references. However, naive garbage-collector may just "scratch the surface" to see what is garbage-collectable and not notice that since some key objects aren't accessible, values aren't either. These values being themselves keys, they make another object unreachable, etc.


Then, I'd like to write the doc on this feature. On the tests files, I've seen that WeakMaps have a "has" and "delete" methods natively, unlike the current state of the harmony proposal (which just contains "get" and "set"). Is it normal?
Comment 105 Jason Orendorff [:jorendorff] 2011-05-12 08:39:24 PDT
(In reply to comment #104)
> First, I'd like to add a stress test to WeakMaps:

There's no actual testing in this test. Also we try to avoid tests that run forever.

To turn this into a real test, we need two things:

  * A way for a test to run GC without conservative stack scanning.
    This can only be done safely when the stack is empty, so it would
    have to be something like:

        fullGCAndThen(function () { ... stuff to do after gc ... });

    I think this is probably easy to implement both for the browser
    (using setTimeout) and for the shell, but no one has done it.

  * A way to check and see if the WeakMap entries got collected or not,
    rather than waiting for the process to run out of memory. We kicked
    this idea around in this bug earlier, but no one did it.

Neither of these things should be exposed to the Web. They're just for testing.

I will gladly draw straws for these two tasks, or split up the work, if anyone else is willing.

> Then, I'd like to write the doc on this feature. On the tests files, I've
> seen that WeakMaps have a "has" and "delete" methods natively, unlike the
> current state of the harmony proposal (which just contains "get" and "set").
> Is it normal?

See comment 62 and subsequent discussion (comment 65, comment 67, etc.)

The wiki page hadn't been updated, last I looked. I hope it will be.
Comment 106 Jesse Ruderman 2011-05-12 10:31:55 PDT
>   * A way for a test to run GC without conservative stack scanning.

OMG please yes.

It might be good for this API to take an optional compartment parameter, like the other GC APIs (after the patch in bug 650978).

>   * A way to check and see if the WeakMap entries got collected or not,
>     rather than waiting for the process to run out of memory. We kicked
>     this idea around in this bug earlier, but no one did it.

The API added in bug 597906 isn't specific to WeakMap, but I think it would work well enough to tell you whether a chain of WeakMaps got collected all the way.
Comment 107 David Bruant 2011-05-12 11:59:39 PDT
(In reply to comment #105)
> (In reply to comment #104)
> > First, I'd like to add a stress test to WeakMaps:
> 
> There's no actual testing in this test. Also we try to avoid tests that run
> forever.
Ok.
I've just run this code on the last nightly (on Ubuntu 11.04, 32bits):
----
var wm = WeakMap();
var key = {}, value = {};
var N = Math.pow(2, 20);
while(N--){
  wm.set(key, value);
  key = value;
  value = {};
}
----
The first time lasts about a second. Second run takes several minutes (and tell me the page is not responding and ask if I want to kill the script at some point). Using 21 instead of 20 makes the first run long.

> (...)
>   * A way to check and see if the WeakMap entries got collected or not,
>     rather than waiting for the process to run out of memory. We kicked
>     this idea around in this bug earlier, but no one did it.
This last point might be the explanation of the second run being slow. Should I file a bug for the long script or will there be bugs for this point?


> Neither of these things should be exposed to the Web. They're just for
> testing.
Do you mean that it is not planned that WeakMaps reach Firefox 6?


> > Then, I'd like to write the doc on this feature. On the tests files, I've
> > seen that WeakMaps have a "has" and "delete" methods natively, unlike the
> > current state of the harmony proposal (which just contains "get" and "set").
> > Is it normal?
> 
> See comment 62 and subsequent discussion (comment 65, comment 67, etc.)
Ok, thanks.


> The wiki page hadn't been updated, last I looked. I hope it will be.
In July probably according to https://mail.mozilla.org/pipermail/es-discuss/2011-May/014101.html
Comment 108 Mark S. Miller 2011-05-12 18:02:25 PDT
(In reply to comment #107)
> (In reply to comment #105)
[...]
> > The wiki page hadn't been updated, last I looked. I hope it will be.
> In July probably according to
> https://mail.mozilla.org/pipermail/es-discuss/2011-May/014101.html

y.

AFAIK, there are only two differences between what Andreas implemented and what I'm planning to propose:

1) Dave Herman suggested (what became a separate) lookup method which is like get, but, if its first key argument isn't found, in returns the result of calling its second argument. (An earlier proposal had both behaviors in one get method, dynamically switched based on whether the 2nd argument was a function. I think this was too error prone.)

2) WeakMap.prototype.set(k,v) and WeakMap.prototype.delete(k) both mutate the WeakMap.prototype itself. Freezing it doesn't prevent this mutation since the mutable state is in internal properties, not freezable properties. Instead, we agreed that the WeakMap methods, when given a WeakMap.prototype object from any frame, should throw a TypeError rather than mutate it. 

At http://codereview.appspot.com/4249052/diff/37002/src/com/google/caja/ses/es5shim.js I monkey patch the current FF6.01a Nightly implementation to fix problem #2. Search for "PATCH_MUTABLE_FROZEN_WEAKMAP_PROTO" on that page. As you can see, getting this right cross frame by monkey patching is needlessly expensive.



Should I file a separate bug on #2, or should we reopen this bug and track it here?
Comment 109 Andreas Gal :gal 2011-05-12 18:05:30 PDT
We can fix both in one bug (but a new one). Both 1) and 2) are clearly bugs that should and will be fixed.
Comment 110 Mark S. Miller 2011-05-12 19:39:56 PDT
See https://bugzilla.mozilla.org/show_bug.cgi?id=656828(In reply to comment #109)
> We can fix both in one bug (but a new one). Both 1) and 2) are clearly bugs
> that should and will be fixed.

At https://bugzilla.mozilla.org/show_bug.cgi?id=656828
Comment 111 David Bruant 2011-06-15 12:31:50 PDT
Wrote https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/WeakMap during the last MDN doc sprint. Could I have a review on this page before changing keyword to dev-doc-complete?
Thanks.
Comment 112 Jason Orendorff [:jorendorff] 2011-06-15 12:44:59 PDT
Reviewed. Very nice work. Thanks, David.

Note You need to log in before you can comment on or make changes to this bug.