Closed Bug 547941 (WeakMap) Opened 14 years ago Closed 13 years ago

Implement Harmony Weak Maps [ES6?]

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla6

People

(Reporter: gal, Assigned: gal)

References

(Blocks 1 open bug, )

Details

(Keywords: dev-doc-complete, feature, Whiteboard: fixed-in-tracemonkey)

Attachments

(3 files, 14 obsolete files)

26.56 KB, patch
jorendorff
: review+
Details | Diff | Splinter Review
4.75 KB, text/javascript
Details
2.37 KB, text/javascript
Details
http://wiki.ecmascript.org/doku.php?id=strawman:ephemeron_tables

Usage:

var e = new EphemeronTable()
var x = new String("x")
e.put(x, ({}))

gc()

/* Table still contains x<->({}), because x is alive. */

x = null

gc()

/* The table is empty, which we can't verify, since we lost x, but anyway. */

Steps during GC:
1) first mark all objects and merely collect ephemeron tables in a list
2) go through all ephemeron tables and revive all values with alive keys
3) if any values were revived by 2), goto step 2)
4) remove entries with dead keys from all ephemeron tables

Iterative marking seems necessary to deal with cases like this:

var x = new String("x")
var y = new String("y")

e.put(x, y)
e.put(y, ({}))

y = null

gc()

If we mark in the order x, y everything is fine. By the time we get to y, we already revived y because x is alive and hence the entry (y, {}) will remain alive as well.

However, if we mark in the order y, x we are hosed. By the time we revive y, (y, {}) was already lost.

I am not sure whether there is a more clever way to do this, but at 2am this is the best I can think off.

The iteration is bounded by the length of the longest cycle (O^2). With this is definitely possible to build data structures that take quite a while to GC (DOS attack).
Attached patch patch (obsolete) — Splinter Review
Assignee: general → gal
Comment on attachment 428412 [details] [diff] [review]
patch

More welcome late-night hacking -- youth is not wasted on all the wrong people :-).

>+		jsweak.cpp \

"Was that a cut?" :-P

How about jsweakptr.cpp for that proposal when it's time, and jsephemerontable.cpp or (if you don't want to get too long-winded, but I do not mind -- better to follow a simple naming rule at this point [some day we will cure the 8.3 hangover]) jsephemeron.cpp for this file?

>+class EphemeronTable {
>+    static EphemeronTable *fromJSObject(JSObject *obj);
>+
>+    JSObject *next;
>+    js::HashMap<JSObject*, JSObject *> map;
>+public:
>+

One blank line before label, not after.

>+    static JSBool put(JSContext *cx, uintN argc, jsval *vp);

Spec calls this "set" to match Object.defineProperty 'set' property in descriptor, etc.

>+    static JSBool get(JSContext *cx, uintN argc, jsval *vp);
>+
>+    static void Trace(JSTracer *trc, JSObject *obj);

Deadwood, right? Except we still support a generic heap-tracing API, so maybe this should be unified with Mark.

Nit: we don't StudlyCaps C++ member names, so trace, mark and sweep -- not Trace, Mark, and Sweep.

>+    EphemeronTable(JSContext *cx) : map(cx)
>+    {
>+    }

You don't initialize next here.

Can put "{}" on end of first line of ctor, per house C++ style.

>+EphemeronTable::put(JSContext *cx, uintN argc, jsval *vp)
>+{
>+    JSObject *obj = JS_THIS_OBJECT(cx, vp);
>+    if (!JS_InstanceOf(cx, obj, &js_EphemeronTableClass, vp ? vp + 2 : NULL))
>+        return false;
>+    if (argc < 2) {
>+        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_MORE_ARGS_NEEDED,
>+                             "EphemeronTable.put", 1, "");
>+        return false;
>+    }
>+    JSObject *key = NonNullObject(cx, vp[2]);
>+    if (!key)
>+        return false;
>+    JSObject *value = NonNullObject(cx, vp[3]);

Spec does not require non-primitive value -- this is important.

>+    if (!value)
>+        return false;
>+
>+    EphemeronTable *table = (EphemeronTable *)obj->getPrivate();
>+    if (!table) {
>+        void *mem = js_malloc(sizeof(EphemeronTable));
>+        if (!mem)
>+            goto out_of_memory;
>+        table = new (mem) EphemeronTable(cx);
>+        if (!table->map.init()) {
>+            js_free(mem);
>+            goto out_of_memory;
>+        }
>+        obj->setPrivate(table);

So you use placement new into a js_malloc'ed allocation, but the finalizer delete's that pointer -- bug? Best to be consistent and use js_free.

That reminds me: shouldn't these be cx->malloc and cx->free, for proper memory pressure accounting.

>+    }
>+
>+    *vp = JSVAL_VOID;
>+    return !!table->map.put(key, value);

The !! is not required AFAIK, unless there's a VS bug. Try without unless someone can cite proof of such.

>+EphemeronTable::get(JSContext *cx, uintN argc, jsval *vp)
>+{
. . .
>+    js::HashMap<JSObject *, JSObject*>::Ptr ptr = fromJSObject(obj)->map.lookup(key);
>+    if (!ptr) {
>+        *vp = JSVAL_VOID;
>+        return true;
>+    }
>+    *vp = OBJECT_TO_JSVAL(ptr->value);
>+    return true;

Just write

    *vp = ptr ? OBJECT_TO_JSVAL(ptr->value) : JSVAL_VOID;
    return true;

and avoid the extra return and if-then overhead.

>+EphemeronTable::Trace(JSTracer *trc, JSObject *obj)
>+{
>+    if (IS_GC_MARKING_TRACER(trc)) {
>+        EphemeronTable *table = fromJSObject(obj);
>+        if (table) {
>+            if (table->map.empty()) {
>+                delete table;

This method is deadwood currently, but note delete of js_malloc'ed space.

>+                obj->setPrivate(NULL);
>+                return;
>+            }
>+            JSRuntime *rt = trc->context->runtime;
>+            if (rt->gcWeakPtrList)
>+                fromJSObject(rt->gcWeakPtrList)->next = obj;
>+            table->next = NULL;
>+            rt->gcWeakPtrList = obj;

Again this method is deadwood, but shouldn't it be inserting obj at rt->gcWeakPtrList, so setting table->next = rt->gcWeakPtrList before rt->gcWeakPtrList = obj? Noting in case something needs to do something like what this is trying to do.

This shows that the JSRuntime member should be named gcEphemeronTableList. The WeakPtr proposal

http://wiki.ecmascript.org/doku.php?id=strawman:weak_references

(which again suggests avoiding the "Ptr" [pointer] trope, which is too low-level and suggestive of unsafe pointers -- I'm nagging MarkM here ;-) is a different proposal, and it will not obviously be able to share this rt->gcWeakPtrList member. Again as-concrete-as-possible-at-first and systematically constructed names are best.

>+EphemeronTable::Mark(JSTracer *trc)
>+{
>+    JSContext *cx = trc->context;
>+    JSRuntime *rt = cx->runtime;
>+
>+    bool again;
>+    do {
>+        again = false;
>+        JSObject *obj = rt->gcWeakPtrList;
>+        while (obj) {

Please be l33t (and scope precisely) and put the JSObject *obj decl in the while condition.

>+            EphemeronTable *table = fromJSObject(obj);
>+            for (js::HashMap<JSObject *, JSObject *>::Enum e(table->map.enumerate());
>+                 !e.empty();
>+                 e.popFront()) {
>+                if (!JS_IsAboutToBeFinalized(cx, e.front().key)) {
>+                    /* If the key is alive, check whether we should revive the value. */
>+                    if (JS_IsAboutToBeFinalized(cx, e.front().value)) {
>+                        /* If this is the only reference to the value, revive it. */
>+                        JS_CALL_OBJECT_TRACER(trc, e.front().value, "__ephemeron__");

Blank line here.

Common e.front().value into jsval v (see below).

>+                        /* We revived a value, iterate again. */
>+                        again = true;

Easy optimization, to match allowing primitive values: again = !JSVAL_IS_PRIMITIVE(v).

>+EphemeronTable::Sweep(JSContext *cx)
>+{
>+    JSRuntime *rt = cx->runtime;
>+
>+    JSObject *obj = rt->gcWeakPtrList;
>+    while (obj) {

l33t C++ nit again.

>+        EphemeronTable *table = fromJSObject(obj);
>+        for (js::HashMap<JSObject *, JSObject *>::Enum e(table->map.enumerate());
>+             !e.empty();
>+             e.popFront()) {
>+            if (JS_IsAboutToBeFinalized(cx, e.front().key))
>+                e.removeFront();
>+        }
>+        obj = table->next;
>+    }
>+
>+    rt->gcWeakPtrList = NULL;

So Sweep nulls the list, so js_GC should not (it can assert that the member is null -- runtime is zero-initialized -- no need to store two nulls back to back to the list member).

This raises the question: what nulls table->next? Only the EphemeronTable::Trace dead-wood. And the ctor lacks that member-init currently (noted above). But who appends to rt->gcWeakPtrList? Only EphemeronTable::Trace, again.

Seems we want to build the list during mark and process/kill it during sweep. Obvious in light of day I'm sure :-).

>+EphemeronTable::Construct(JSContext *cx, JSObject *obj, uintN argc, jsval *argv, jsval *rval)
>+{
>+    if (!JS_IsConstructing(cx)) {
>+        if (!(obj = js_NewObject(cx, &js_EphemeronTableClass, NULL, NULL)))

Avoid nexting assignment in if conditions where gratuitous, per house style.

>+js_InitEphemeronTableClass(JSContext *cx, JSObject *obj)
>+{
>+    JSObject *proto;

Initialize this decl.

>+    proto = js_InitClass(cx, obj, NULL, &js_EphemeronTableClass, EphemeronTable::Construct, 0,
>+                         NULL, methods, NULL, NULL);
>+    if (!proto)
>+        return NULL;
>+
>+    proto->setPrivate(NULL);
>+

No need for a blank line here.

>+void
>+js_MarkEphemeronTables(JSTracer *trc)
>+{
>+    EphemeronTable::Mark(trc);
>+}
>+
>+void
>+js_SweepEphemeronTables(JSContext *cx)
>+{
>+    EphemeronTable::Sweep(cx);
>+}

Thinner code footprint if you just expose EphemeronTable in jsephemerontable.h and have the GC call its public statics directly. The C heritage still hangs heavily over us but this seems avoidable.

/be
what repository are you planning to land this in?
(In reply to comment #2)
> Avoid nexting assignment in if conditions where gratuitous, per house style.

s/nexting/nesting/

(In reply to comment #3)
> what repository are you planning to land this in?

We can configure this off at first to avoid security bug risks.

We do need to publicly try out Harmony proposals pretty soon, though. I'm not at all worried about breaking developers who use support and then the proposal changes. But the security bug worry is a good one.

To focus on this patch, the hazard would be a failure to mark something while keeping a reference to it exposed to script.

/be
(In reply to comment #4)
> 
> We can configure this off at first to avoid security bug risks.
> 
> We do need to publicly try out Harmony proposals pretty soon, though. I'm not
> at all worried about breaking developers who use support and then the proposal
> changes. But the security bug worry is a good one.

I'm mostly worried about integration pain, since we don't yet know how the GC and Jmonkey work is going to show up. TM is actually relatively close to the supposedly shippable m-c tree, so we don't want to get a bunch of intertwined half-done and/or experimental work landing that can't be configured off.
Traditional way is to use jsversion.h macros (trim old ones, keep moving forward by making unconditional what is now a standard, de-jure or even de-facto). Now we need configure magic too, but that is easy to copy and mutate.

I have to agree with Andreas that proxies have more attack surface, but either could have security bugs. As for GC integration, ETs want just a mark (trace) and a sweep hook, in this simple conception (prior to heavy optimization). So unless Igor or Gregor sees something, the GC dependencies seem ok.

/be
To clarify. I only want to land either of these if we think we can ship it. I don't see a point in landing stuff that creates divergence from the product path. With proxies and the tables we can implement wrappers in JS and ... trace through them :) So I think we want that for chrome code at the very least.
alright, I don't think we(In reply to comment #7)
> To clarify. I only want to land either of these if we think we can ship it.

Alright, I don't think we've discussed a release vehicle for this work yet. We should probably have that discussion.
We don't land stuff to pull it out later, or leave it untested and bit-rotting, of course. The proxies proposal, and some kind of ephemeron proposal, are in good position to be in the next major ES standard -- indeed proxies are already in the harmony: namespace at http://wiki.ecmascript.org/.

We should not be any more shy about leading than we have in the past (ahem, but we should have tests -- other than fuzztesting -- better in hand... *cough* JS1.7 *cough* -- in our defense it was just mrbkap, part-time igor, and me). Ok?

/be
If I had to put odds on next release vehicle plans vs. harmonious proposals making it into the next major ES spec, I wouldn't be able to hedge well. The odds are close, and related to the extent that our prototyping proves the concepts and helps the rest of the committee agree.

IMHO "next Firefox non-patch-release vehicle" seems best. There is no need for a farther-out vehicle whose odds of being real must be lower at this point than the odds that we will both profitably use and successfully standardize this stuff.

/be
Yeah, my goal is 3.7 for this.
Actually, the way we mark here isn't safe. The children of an ephemeron value that was revived has to be marked as well. We need a work list.
(In reply to comment #12)
> Actually, the way we mark here isn't safe. The children of an ephemeron value
> that was revived has to be marked as well. We need a work list.

Your patch uses:

    JS_CALL_OBJECT_TRACER(trc, e.front().value, "__ephemeron__");

does indeed mark the graph reachable from e.front().value. So you're good.

/be
Comment on attachment 428412 [details] [diff] [review]
patch

>diff --git a/js/src/jsgc.cpp b/js/src/jsgc.cpp
> 
>+    /* Sever all weak references that point to objects we are about to finalize. */
>+    js_MarkEphemeronTables(&trc);
>+
...
>     MarkDelayedChildren(&trc);
...
> 
>     JS_ASSERT(!cx->insideGCMarkCallback);
>     if (rt->gcCallback) {

The order of calls is wrong here. MarkDelayedChildren must be called before js_MarkEphemeronTables to make sure that the whole object graph is marked. Then EphemeronTable::Mark must integrate MarkDelayedChildren to call it after JS_CALL_OBJECT_TRACER to make sure that we have marked everything indeed.

But that without rt->gcCallback complexity. That callback assumes that it is called exactly between the marking and finalization phase. Its implementation in xpconnect uses that to add its own marking. After the last mark call the implementation then starts to finalize things. To deal with that the second MarkDelayedChildren call in js_CallGCMarker has to be also followed by js_MarkEphemeronTables.

That complication can be avoided if we split JSGC_MARK_END into JSGC_EXTRA_MARK and JSGC_MARK_FINALIZE_START. But that can wait as the above workaround could make things work for now.
I can't find any evidence for the behavior described in #14. xpconnect starts finalizing when it gets a JSGC_FINALIZE_END callback. I will file a bug to remove the insideGCMarkCallback hack.
Depends on: 548215
We need constructors to run, so I use delete here.
>+    JSObject *obj = rt->gcWeakPtrList;
>+    while (obj) {

l33t C++ nit again.

Nice try.

while (JSObject *obj = rt->gcWeakPtrList) { }

is an endless loop here.
Attached patch patch (obsolete) — Splinter Review
Attachment #428412 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Attachment #428675 - Attachment is obsolete: true
The latest version adds a length property that can be queried. Without this its basically impossible to test ephemeron tables.

I have to drive in 548215, after that this patch should be ready for testing and landing.
Attached patch patch with test case (obsolete) — Splinter Review
Attachment #428676 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Proper behavior when setting an undefined value (delete key/value pair if present, don't add otherwise).
Attachment #428680 - Attachment is obsolete: true
(In reply to comment #15)
> I can't find any evidence for the behavior described in #14.

xpconnect inside JSGC_MARK_END does both marking and calls to JS_IsAboutToBeFinalized, see, for example, http://hg.mozilla.org/tracemonkey/file/1b5ca6cc5ce8/js/src/xpconnect/src/nsXPConnect.cpp#l374 and  http://hg.mozilla.org/tracemonkey/file/1b5ca6cc5ce8/js/src/xpconnect/src/xpcwrappednativescope.cpp#l444. The latter requires that we know precisely what the garbage is. That is, all weak references must be properly marked when it calls the function.
(In reply to comment #17)
> >+    JSObject *obj = rt->gcWeakPtrList;
> >+    while (obj) {
> 
> l33t C++ nit again.
> 
> Nice try.

Sorry, pattern-matching too much -- still, you have to get l33t and forget teh C and Java :-P.

/be
(In reply to comment #20)
> The latest version adds a length property that can be queried. Without this its
> basically impossible to test ephemeron tables.

I knew this would come up. :) The length property breaks the guarantee of ephemeron tables that it's not observable when weak references die. Otherwise you might as well just have ephemerons.

Of course you're right that it's harder to test. Would it be possible to add some sort of simple but privileged-C++ observation mechanism for observing things that otherwise shouldn't be observable to ordinary JS programs? And that's only available to the test harness? Something like:

#ifdef JS_HAS_TEST_HARNESS
# define INTERNALS_INIT     js_InternalsClass
#else
# define INTERNALS_INIT     js_NullClass
#endif

/* ... */

JS_PROTO(Internals, INTERNALS_INIT)

And then you could use this for the test harness to make privileged observations that you otherwise shouldn't be able to. So you could do things like:

Internals.ephemeronTableLength(t)

Without something like this, we'll keep ending up adding observations to SpiderMonkey that shouldn't be there, on a case-by-case basis, without thinking carefully about what we're exposing.

Dave
Submitted a new bug suggesting a simple Internals API, with boilerplate patch. See bug #548341.

Dave
Yeah, the .length was strictly temporary. We could also just wrap it into #ifdef DEBUG.
Rather than #ifdef DEBUG or a temporary hack, we should just add a global ephemeronTableLength function to the global object in shell/js.cpp so the test harness can use it.

Dave
We need to consider the implications of the Ephemeron tables for the cycle collector.
Where _exactly_ does this behavior happen in nsXPConnect.cpp. Please be specific.

The other site looks plausible. Looking into it.

(In reply to comment #23)
> (In reply to comment #15)
> > I can't find any evidence for the behavior described in #14.
> 
> xpconnect inside JSGC_MARK_END does both marking and calls to
> JS_IsAboutToBeFinalized, see, for example,
> http://hg.mozilla.org/tracemonkey/file/1b5ca6cc5ce8/js/src/xpconnect/src/nsXPConnect.cpp#l374
> and 
> http://hg.mozilla.org/tracemonkey/file/1b5ca6cc5ce8/js/src/xpconnect/src/xpcwrappednativescope.cpp#l444.
> The latter requires that we know precisely what the garbage is. That is, all
> weak references must be properly marked when it calls the function.
(In reply to comment #30)
> Where _exactly_ does this behavior happen in nsXPConnect.cpp. Please be
> specific.

The marking in JSGC_MARK_END is only happen via http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/src/nsXPConnect.cpp#397 which is a cycle-collector installed callback. Which is good and indicates that the code improved and the normal marking that xpconnect needs is happens in http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/src/xpcjsruntime.cpp#279. This is much better than it was a year or two ago :)

But I am not sure yet what to do about the cycle collector callback. We need to talk to the cycle collector guys.



> 
> The other site looks plausible. Looking into it.
> 
> (In reply to comment #23)
> > (In reply to comment #15)
> > > I can't find any evidence for the behavior described in #14.
> > 
> > xpconnect inside JSGC_MARK_END does both marking and calls to
> > JS_IsAboutToBeFinalized, see, for example,
> > http://hg.mozilla.org/tracemonkey/file/1b5ca6cc5ce8/js/src/xpconnect/src/nsXPConnect.cpp#l374
> > and 
> > http://hg.mozilla.org/tracemonkey/file/1b5ca6cc5ce8/js/src/xpconnect/src/xpcwrappednativescope.cpp#l444.
> > The latter requires that we know precisely what the garbage is. That is, all
> > weak references must be properly marked when it calls the function.
No longer depends on: 548215
(In reply to comment #6)
> Traditional way is to use jsversion.h macros (trim old ones, keep moving
> forward by making unconditional what is now a standard, de-jure or even
> de-facto). Now we need configure magic too, but that is easy to copy and
> mutate.

So jsversion.h should separate this stuff from ES5, then.
Attached patch patch (obsolete) — Splinter Review
Attachment #428682 - Attachment is obsolete: true
I updated the patch to interact with the cycle collector. What's still missing is reporting the list of ephemeron tables to the cycle collector to be used as additional root set.

Which highlights a problem here: if the cycle collector revives entire tables that were not in the list initially, what do we do?
Attachment #428847 - Attachment is obsolete: true
Attachment #428928 - Flags: review?(igor)
Attachment #428928 - Flags: review?(peterv)
Attachment #428928 - Flags: review?(igor) → review-
(In reply to comment #35)
> Created an attachment (id=428928) [details]
> patch with cycle collector integration

Suppose a table T holds a key A and a value B. Suppose there is xpcom link X from A to B like A->X->B. Suppose that A, B and X are unreachable. Thus the GC must be able to collect them no matter what happens with the table T. Now suppose that some live xpcom object LiveY holds a reference to T. Then according to the patch the cycle collector view would be:

LiveY->T->A->X->B
LiveY->T->B

This makes X live. As such the cycle collector would trigger the GC marking of all objects that X holds. This marks A and B preventing them from the GC.
Here is a possible non-reviving algorithm that should work both for the GC marker and the cycle collector. 

First consider the GC marker. If during the marking the GC discovers some table T, the algorithm flags all its table entries with a special flag. Also, for any object A that the GC marks the GC also checks if A is also a key in some tables. Then for all tables holding A as the key the GC also flags the corresponding table entry. Whenever the entry is flagged twice, the entry value B is recursively marked by the GC.

Now consider the case of the cycle collector. There again if the CC discovers a table T, it flags all its entries. If it discovers some object A that is a key is some tables, it also flags all entries holding A in all tables.  Whenever the entry is flagged twice, the CC records an endge from A to B and from T to B and then it proceeds recursively to discover the edges from B.

AFAICS the GC case can be implemented efficiently using spare GC flags to flag an object whenever it is a key in some table T plus some support structure to find a tables where the object is a key. For the CC case we would need an ability to record an edge from A to B when processing the table T or a link from T to B when processing A.
(In reply to comment #36)
> Suppose a table T holds a key A and a value B. Suppose there is xpcom link X
> from A to B like A->X->B. Suppose that A, B and X are unreachable. Thus the GC
> must be able to collect them no matter what happens with the table T. Now
> suppose that some live xpcom object LiveY holds a reference to T. Then
> according to the patch the cycle collector view would be:
> 
> LiveY->T->A->X->B
> LiveY->T->B
> 
> This makes X live. As such the cycle collector would trigger the GC marking of
> all objects that X holds. This marks A and B preventing them from the GC.

The last statement is not correct since during the GC marking the patch would skip A. A setup that shows a link would be in addition to the above B or X also pointing to A so the CC view would be:

> LiveY->T->A->X->B
> LiveY->T->B->A
We only color table entries as suspected, not as actual roots. I am still not convinced #36 is possible. If that was the case, all wrappers would automatically keep everything alive all the time, too. peterv?
Independently of this discussion, I would like to go ahead and land the patch. The alternative GC method in #37 is rather involved and its not clear there are no corner cases there either. Assuming #36 is correct, in those cases ephemeron tables behave no worse (and no better) than a regular hashtable built in JS for those specific entries. We are not creating any security problems. We merely fail to collect interdependent keys that have a cycle through xpcom. Not great, but not fatal. I am interesting in exploring better cycle collection, but I don't think that's a show stopper.
(In reply to comment #40)
> Assuming #36 is correct, in those cases ephemeron
> tables behave no worse (and no better) than a regular hashtable built in JS for
> those specific entries. We are not creating any security problems.

If that is correct, then the GC may collect something that the CC assume should be alive. I do not know how bad is that.
What do you mean? How would the GC collect it? I thought the problem is the CC will mark thinks that shouldn't be marked?
(In reply to comment #40)
> Independently of this discussion, I would like to go ahead and land the patch.

Why this rush is necessary? And what if it turns out that ephemeron tables are simply incompatible with the cycle collection and the current xpcom model however unlikely this is? For this reason we should have at least an understanding how to make the table fully sound even if the initial implementation would not be much better than a hashtable.

> We merely
> fail to collect interdependent keys that have a cycle through xpcom. Not great,
> but not fatal.

If landing not very sound implementation is OK, I suggest to remove any xpcom integration and just run the table marking after the cycle collector finishes its job. This way we avoid the danger of creating two view of the heap, one for CC and one for GC.
(In reply to comment #42)
> What do you mean? How would the GC collect it? I thought the problem is the CC
> will mark thinks that shouldn't be marked?

CC asks GC to mark recursively any JS things that a live XPCOM object contains. If one of the things is the table, that would fail since the table will not mark its entries. So if a key or value is a JS wrapper for XPCOM thing, that would be finalized even if CC things itr should not. I do not know what are the consequences of that. It could turned out to be safe, but then we have to have a proof for that.
> If landing not very sound implementation is OK, I suggest to remove any xpcom
> integration and just run the table marking after the cycle collector finishes
> its job. This way we avoid the danger of creating two view of the heap, one for
> CC and one for GC.

Yeah, I am not opposed to that. The table isn't worse than a regular hash table, so we don't add additional leaks. We just fail to remove some.
(In reply to comment #43)
> If landing not very sound implementation is OK, I suggest to remove any xpcom
> integration and just run the table marking after the cycle collector finishes
> its job. This way we avoid the danger of creating two view of the heap, one for
> CC and one for GC.

I talked with paterv about that and things is not that simple. If EphemeronTable::reviveValuesWithLiveKeys is called after the cycle collector does its job, then it may make alive things that the CC schedules for finalization. This is bad. If reviveValuesWithLiveKeys is called before the cycle collection, then it may remove keys that the CC would make alive. A workaround for that could be to treat the ephemeron tables as normal strong hashtables whenever js_GC is invoked as a part of CC. Only when js_GC is called alone the code may do reviveValuesWithLiveKeys magic. But that IMO looks like a hack just to land leaky tables.
"Leaky" is a pretty misleading term here. We are talking about landing ephemeron tables that are not as eagerly purged as they possibly could be in the presence of XPCOM cycles. Worst case an ephemeron table is as efficient as using an object as a hash table. In 99% of the cases its more efficient.

We clearly need a way to coordinate between cycle collection and table purging, no doubt about it. But I don't see why we have to solve everything in one step. We are strictly making this use case _less_ "leaky" if we land what we have. Currently the best thing you can do is use an object, which "leaks" everything, every time.
Attachment #428928 - Flags: review?(peterv)
Depends on: 548733
Ok, I patched up the cycle collector so it can deal with additional edges being added after a node (object) was scanned. With this we can now properly report all edges. In case of (conditionally) weak edges, we report them as a regular edge. The only negative side effect of that is that it might cause cycles, which the cycle collector is of course able to deal with.
I have found that it is worth to consider ephemeron tables in a pure reference counting world. In that world, when we put a value B for a key A into the table T, we must increase the reference count for B. This is necessary to prevent B from going away before the cycle collector is running.

When the CC is running we need to report to it an edge that is responsible for an extra reference. Without this the CC will assume that there is an external reference to B and it would be held alive. Where this edge may come from? It we report it as an edge from A, then, if B also references T and A is alive,  this would make both B and T alive preventing T from the collection. If we report the edge as coming from T, then, if B also points to A and T is alive, it would make A alive preventing it from the collection.

For completeness lets report the edge as coming from B itself pretending that B is a self-reference. This ensures that B to stay alive until CC runs. Now the CC runs and finds that B should go away since it belongs to some dead loop. Then before breaking the loop CC checks if A and T are both alive (that is, they do not belong to some dead cycle). If they are, then CC would not break any loop involving B. But this does not work since A or T could belong to yet another loop that should be held alive by other ephemeron values.

So we cannot model the extra reference with normal edges.
Here is a possible collection algorithm for a pure reference counting world that is similar in spirit to the GC marking in the attachment from the comment 35.  Consider an ephemeron table T, a key A and a value B. We model that by a strong reference from T to B and by a weak reference from T to A. 

Suppose now the cycle collector starts to run and detected all the edges for the initial set of gray objects. Then the CC colors black (alive) any gray objects that has an external reference to it. Normally then the CC proceeds recursively to mark black any object recursively reachable from the black (alive) objects. That is modified so during the recursion the CC skips the references from T to any of its value B. 

Then, when this is finished, the algorithm consider all black tables T and for them marks as black any key B recursively if the corresponding key A is not gray.  Note that, while we consider only tables explicitly colored black, the key A may be "not gray" not only because it is black but also because it is not a member of the initial gray set. That is, A was not member of any cycle yet. This difference comes from the fact that we add strong references only from T to B and not to A.

Since this may turn more tables and keys black, that has to be run repeatedly in the same way as the GC marking does in the comment 35.

After this is done, any value B that remains gray in a black T is only referenced via T so it can be removed. This will decrease the reference count to B. After this is done for all such B, then the the CC can proceed with collecting all gray cycles the same way it does now.
As a little optimization, the cursor should also be on the stack. So creating an iterator leaves (private, cursor) on the stack. private is a struct that contains the idlist and shape information. This way getting the next id during iteration no longer requires a builtin call.
Summary: Implement Harmony Ephemeron Tables [ES6?] → Implement Harmony Weak Maps (nee Ephemeron Tables) [ES6?]
Blocks: 486643
I'd be happy to be an early tester of this code - I have several areas where I need object hashtable support, and I'm currently hacking around it by giving each key object its own unique hash string key.  I don't like the solution I came up with, though, so this would be better.

The current patch is over a year old and has r-.  I'm concerned about potential bitrot, and would like an up-to-date patch that hasn't been rejected to play with.
Summary: Implement Harmony Weak Maps (nee Ephemeron Tables) [ES6?] → Implement Harmony Weak Maps [ES6?]
Attached patch patch (obsolete) — Splinter Review
Fresh patch. The constructor has been renamed to WeakMap. Debug builds contain a .length property. This patch relies on the gray/black persistent mark bitmap work we did for FF4, so no special cycle collector magic should be necessary any more.
Attachment #428928 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
And this time the right patch.
Attachment #519353 - Attachment is obsolete: true
Attachment #519354 - Flags: review?(jorendorff)
:(  I know I said I'd be an early tester of this patch (and I will)... but I had hoped the patch would include some unit tests in js/src/tests.

This is particularly surprising given that comment 21 did include a xpcshell test.
gal:  Which hg.m.o repository and revision does this patch apply to?  I just tried a dry-run against mozilla-central trunk and against releases/mozilla-2.0's last patch before being tagged, and it failed:

$ patch --dry-run -p1 < ~/Downloads/weakmap
patching file js/src/Makefile.in
patching file js/src/jsapi.cpp
Hunk #1 FAILED at 79.
Hunk #2 succeeded at 1667 (offset -16 lines).
1 out of 2 hunks FAILED -- saving rejects to file js/src/jsapi.cpp.rej
patching file js/src/jsatom.cpp
Hunk #1 succeeded at 207 (offset 1 line).
patching file js/src/jsatom.h
patching file js/src/jscntxt.h
Hunk #1 succeeded at 1051 (offset -2 lines).
patching file js/src/jsgc.cpp
Hunk #1 succeeded at 85 (offset 1 line).
Hunk #2 FAILED at 2271.
Hunk #3 FAILED at 2291.
Hunk #5 FAILED at 2405.
Hunk #6 succeeded at 2461 (offset -6 lines).
3 out of 6 hunks FAILED -- saving rejects to file js/src/jsgc.cpp.rej
patching file js/src/jsgcinlines.h
patching file js/src/jsproto.tbl
patching file js/src/jsweakmap.cpp
patching file js/src/jsweakmap.h
Comment on attachment 519354 [details] [diff] [review]
patch

>+ * The Original Code is Mozilla SpiderMonkey JavaScript 1.9 code, released
>+ * May 28, 2008.

Really?
Tests are coming. The patch is against tracemonkey tip.
Blocks: 642332
>     /* Finalize watch points associated with unreachable objects. */
>     js_SweepWatchPoints(cx);
> 
>+    /* Finalize unreachable (key,value) pairs in all weak maps. */
>+    WeakMap::sweep(cx);

It looks like these aren't compartment-aware. If you do a single-compartment GC, what happens to watchpoints and weakmaps in other compartments?

That would be an existing bug in the case of js_SweepWatchPoints.
Never mind. IsAboutToBeFinalized gets it right in that case!
It's great to see this moving again.

The basic GC algorithm looks correct, with a small bug. But I don't know enough
about the cycle collector to know if that is safe; and I don't know enough
about what's using the tracing APIs to know if it's safe for its behavior to
differ from the GC marking algorithm. I imagine it's fine, but someone else
should take a look too.

This shouldn't land without comprehensive tests. If you don't want to write 'em
yourself, call in a favor.

In jsgc.cpp, MarkAndSweepCompartment:
>      * tracing.
>      */
>     gcmarker.markDelayedChildren();
> 
>     /*
>      * Mark weak roots.
>      */
>     while (true) {
>-        if (!js_TraceWatchPoints(&gcmarker))
>+        if (!js_TraceWatchPoints(&gcmarker) &&
>+            !WeakMap::markIteratively(&gcmarker)) {
>             break;
>+        }
>         gcmarker.markDelayedChildren();
>     }
> 
>     rt->gcMarkingTracer = NULL;
> 
>     if (rt->gcCallback)
>         (void) rt->gcCallback(cx, JSGC_MARK_END);
> 

MarkAndSweep:
>      * tracing.
>      */
>     gcmarker.markDelayedChildren();
> 
>     /*
>      * Mark weak roots.
>      */
>     while (true) {
>-        if (!js_TraceWatchPoints(&gcmarker))
>+        if (!js_TraceWatchPoints(&gcmarker) &&
>+            !WeakMap::markIteratively(&gcmarker)) {
>             break;
>+        }
>         gcmarker.markDelayedChildren();
>     }
> 
>     rt->gcMarkingTracer = NULL;
> 
>     if (rt->gcCallback)
>         (void) rt->gcCallback(cx, JSGC_MARK_END);
> 

Time to factor out common code, I think.

In jsweakmap.h, class WeakMap:
>+  public:
>+    static JSBool set(JSContext *cx, uintN argc, Value *vp);
>+    static JSBool get(JSContext *cx, uintN argc, Value *vp);
>+
>+    static JSBool getProperty(JSContext *cx, JSObject *obj, jsid id, Value *vp);
>+
>+    static void mark(JSTracer *trc, JSObject *obj);
>+
>+    static JSBool construct(JSContext *cx, uintN argc, Value *vp);
>+    static void finalize(JSContext *cx, JSObject *obj);
>+
>+    static bool markIteratively(JSTracer *trc);
>+    static void sweep(JSContext *cx);
>+
>+    static Class jsclass;
>+};

Ideally everything would be private except the jsclass, mark, sweep, and a
WeakMap::initClass(cx, obj) static method, since this class isn't safe to
instantiate without a JSObject backing it. But it's not a big deal--whatever
works.

In jsweakmap.cpp, WeakMap::set:
>+    if (argc < 2) {
>+        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_MORE_ARGS_NEEDED,
>+                             "WeakMap.put", 1, "");
>+        return false;
>+    }

The 1 needs to be in quotes, otherwise
  WeakMap().set("three");  // crashes

Also: "WeakMap.set", not put.

Also, I bet this gets spec'd such that if you omit the second argument, the
value is undefined, more like all the other standard builtin functions:

   map.set(Math);   // delete the entry for Math

>+        void *mem = cx->malloc(sizeof(WeakMap));
>+        if (!mem)
>+            goto out_of_memory;
>+        table = new (mem) WeakMap(cx);

Instead:
          table = js_new<WeakMap>(cx);
          if (!table)
              goto out_of_memory;

In WeakMap::get:
>+        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_MORE_ARGS_NEEDED,
>+                             "WeakMap.put", "0", "s");

"WeakMap.get"

>+    js::HashMap<JSObject *, Value>::Ptr ptr = fromJSObject(obj)->map.lookup(key);

fromJSObject(obj) can be null here:
  WeakMap().get({});  // null crash

In WeakMap::mark:
>+            JSRuntime *rt = trc->context->runtime;
>+            if (rt->gcWeakMapList)
>+                fromJSObject(rt->gcWeakMapList)->next = obj;
>+            table->next = NULL;
>+            rt->gcWeakMapList = obj;

This doesn't seem right. It's building the list in the wrong direction. The
loop in markIteratively would only run once, since this always makes
fromJSObject(rt->gcWeakMapList)->next NULL.

In WeakMap::markIteratively:
>+                if (IsAboutToBeFinalized(cx, value.toGCThing())) {
>+                    js::gc::MarkValue(trc, value, "value");
>+                    /* We revived a value, we have to iterate again. */
>+                    again = true;
>+                }

If the value is a string, we don't have to iterate, right?

In jsweakmap.cpp:
>+void
>+WeakMap::finalize(JSContext *cx, JSObject *obj)
>+{
>+    WeakMap *table = fromJSObject(obj);
>+    if (table) {
>+        delete table;
>+        obj->setPrivate(NULL);
>+    }
>+}

Is this just paranoia, or is it necessary to obj->setPrivate(NULL) here?

In jsweakmap.cpp, WeakMap::construct:
>+    if (!IsConstructing(vp) &&

Shouldn't it do the same thing whether we're constructing or not? The Array
constructor (js_Array) doesn't check IsConstructing.

In jsweakmap.cpp:
>+Class WeakMap::jsclass = {
>+    "WeakMap",
>+    JSCLASS_HAS_PRIVATE |
>+    JSCLASS_HAS_CACHED_PROTO(JSProto_WeakMap),
>+    PropertyStub,         /* addProperty */
>+    PropertyStub,         /* delProperty */
>+#ifdef DEBUG
>+    WeakMap::getProperty,
>+#else
>+    PropertyStub,
>+#endif

Having a class-wide getter or an extra property somewhere affects all sorts of
things in the jits and property cache--things that should behave exactly the
same in debug and opt builds, for sanity's sake.

How about a debug-only shell function weakMapCount(map) instead?

(Class-wide getters are evil to begin with -- passing a JSPropertyOp to
DefineProperty would be the right way.)
Attachment #519354 - Flags: review?(jorendorff)
I don't think we should overload set() to remove() mappings.  The two verbs are quite different, and putting both add-mapping and remove-mapping under the same verb seems likely to be confusing.

Also, having remove() instead of set(..., undefined) being special would also allow weak maps to map any object to any value -- no need for the developer to specially worry if he ever wants to map |undefined|.

Permitting insertion of |undefined| would also require a has() or contains() method to make that distinction.  This seems like another good verb to have, regardless of whether |set()| is or isn't overloaded, even if the can't-insert-|undefined| behavior means it's not actually functionality overloading right now.


Now a few less-important thoughts:

(In reply to comment #61)
> It's great to see this moving again.
> 
> This shouldn't land without comprehensive tests. If you don't want to write
> 'em yourself, call in a favor.

Concur!

> Also, I bet this gets spec'd such that if you omit the second argument, the
> value is undefined, more like all the other standard builtin functions:
> 
>    map.set(Math);   // delete the entry for Math

We can always loosen to allow omitting the second argument if it ends up being specified that way.  It's harder to go the opposite direction if we guess incorrectly.

There's a js::ToObject method which does exactly what your NonNullObject method does -- use that instead.  But I'd suggest having a more specific error message when the first argument to set() isn't an object, because I suspect the most likely expectation for developers is going to be that WeakMap is a superset of a normal object's mapping functionality, *not* that it's a subset of the exact inverse functionality.  (In other words: they'll expect this "better hash" will map anything to anything, not that it will only map objects to anything [or anything-minus-|undefined| as it stands now].)

> How about a debug-only shell function weakMapCount(map) instead?

I had exactly this concern and suggestion myself.
No longer blocks: 642332
I have a general question.  Suppose that you have a collection of objects stored in the WeakMap as values, and that the WeakMap is the only place where references to those objects are stored.

Under the WeakMap specification, which I assume you're implementing here, the only references to those objects are weak references, then, and they are vulnerable to being deleted at any time.  The spec says:

"neither keys nor values in a weak map are held strongly"

I certainly understand and wholeheartedly support weak references to keys.  I'm just wondering if we can adjust the design and/or implementation so that the values can (not must!) be strongly held, if an extra true parameter is passed in either in the WeakMap constructor, or in the set() method.

(On the downside, suppose those objects hold strong references to the keys, as my 1:1 hashtable design does.  Then you're back in a circular reference scenario.)

Is this necessary and/or desirable?
(In reply to comment #63)
> Under the WeakMap specification, which I assume you're implementing here, the
> only references to those objects are weak references, then, and they are
> vulnerable to being deleted at any time.  The spec says:
> 
> "neither keys nor values in a weak map are held strongly"

This text from Dave's question is part of a proposed summary of the semantics. To avoid your misreading, it could perhaps be better stated as 

   * neither keys nor values in a weak map are held strongly or weakly

In any case, his meaning becomes clear (and this clear meaning is correct) when he goes on to say

   * i.e., the presence of a mapping k ⇒ v in a reachable weak map m does not by itself make k or v reachable

   * for each mapping k ⇒ v in a reachable weak map m, if k is reachable then v is reachable

In other words, the weakness provided by WeakMaps is not reduced (and in fact is not reducible) to the weakness or strongness of individual references to the keys and values. The WeakMap proposal in fact does not require the existence of weak references of any sort. 


> I certainly understand and wholeheartedly support weak references to keys. I'm

There is a separate strawman for weak references, and it could co-exist with WeakMaps, but it is not expected for EcmaScript-next. No one has signed up to propose it before the May deadline. Although I wrote it, I cannot say I wholeheartedly support it. My feelings about it are mixed, but in any case it lives or dies independently of WeakMaps.


> just wondering if we can adjust the design and/or implementation so that the
> values can (not must!) be strongly held, if an extra true parameter is passed
> in either in the WeakMap constructor, or in the set() method.
> 
> (On the downside, suppose those objects hold strong references to the keys, as
> my 1:1 hashtable design does.  Then you're back in a circular reference
> scenario.)
> 
> Is this necessary and/or desirable?
(In reply to comment #62)
> I don't think we should overload set() to remove() mappings.  The two verbs are
> quite different, and putting both add-mapping and remove-mapping under the same
> verb seems likely to be confusing.
> 
> Also, having remove() instead of set(..., undefined) being special would also
> allow weak maps to map any object to any value -- no need for the developer to
> specially worry if he ever wants to map |undefined|.
> 
> Permitting insertion of |undefined| would also require a has() or contains()
> method to make that distinction.  This seems like another good verb to have,
> regardless of whether |set()| is or isn't overloaded, even if the
> can't-insert-|undefined| behavior means it's not actually functionality
> overloading right now.

Hi Jeff, I am open to this change of API though it is not my first choice. Your suggestion corresponds to the Soft Own Fields example at http://wiki.ecmascript.org/doku.php?id=harmony:weak_maps#explicit_soft_own_fields . Either it or WeakMaps as proposed can trivially be built from the other, so either would be fine as the sanctioned primitive.

However, if you wish this change in API, the right place to advocate it is on es-discuss and at EcmaScript committee meetings. While the existing API stands as the accepted proposal, is there any reason not to implement it as is, until we have general agreement to revise it?

The reason I prefer the simpler API is it has a simpler semantics: A WeakMap is a total mapping from all objects to EcmaScript values. Only a finite number of these objects map to values other than undefined. With your suggested change, we'd then need to describe it as a partial mapping. Which I'm ok with, but it seems slightly ickier to me. It also means that for applications that only need the simpler semantics, i.e., that only use get/set, that the implementation would need to devote storage to maintain a distinction that these apps would never observe.


> 
> 
> Now a few less-important thoughts:
> 
> (In reply to comment #61)
> > It's great to see this moving again.
> > 
> > This shouldn't land without comprehensive tests. If you don't want to write
> > 'em yourself, call in a favor.
> 
> Concur!
> 
> > Also, I bet this gets spec'd such that if you omit the second argument, the
> > value is undefined, more like all the other standard builtin functions:
> > 
> >    map.set(Math);   // delete the entry for Math
> 
> We can always loosen to allow omitting the second argument if it ends up being
> specified that way.  It's harder to go the opposite direction if we guess
> incorrectly.
> 
> There's a js::ToObject method which does exactly what your NonNullObject method
> does -- use that instead.  But I'd suggest having a more specific error message
> when the first argument to set() isn't an object, because I suspect the most
> likely expectation for developers is going to be that WeakMap is a superset of
> a normal object's mapping functionality, *not* that it's a subset of the exact
> inverse functionality.  (In other words: they'll expect this "better hash" will
> map anything to anything, not that it will only map objects to anything [or
> anything-minus-|undefined| as it stands now].)
> 
> > How about a debug-only shell function weakMapCount(map) instead?
> 
> I had exactly this concern and suggestion myself.
I wrote:
> > How about a debug-only shell function weakMapCount(map) instead?

But actually there's no reason for that to be debug-only, really. If it's going to be used in tests, those tests might as well work in opt builds too.

(In reply to comment #62)
> There's a js::ToObject method which does exactly what your NonNullObject method
> does -- use that instead.

They're different. js::ToObject(cx, &stringval) creates a String wrapper object; NonNullObject(cx, stringval) throws.
(In reply to comment #65)
> (In reply to comment #62)
> > Permitting insertion of |undefined| would also require a has() or contains()
> > method to make that distinction.  This seems like another good verb to have,
> > regardless of whether |set()| is or isn't overloaded, even if the
> > can't-insert-|undefined| behavior means it's not actually functionality
> > overloading right now.
> 
> Hi Jeff, I am open to this change of API though it is not my first choice. 
> [...]
> The reason I prefer the simpler API is it has a simpler semantics: A WeakMap is
> a total mapping from all objects to EcmaScript values. Only a finite number of
> these objects map to values other than undefined. With your suggested change,
> we'd then need to describe it as a partial mapping.

I was on the fence before, but this convinced me that Waldo is right. :-)

Programmers are very much used to Map meaning "partial mapping". Finite data structures are the norm.

Even from a mathematical purity standpoint it's hard to see what value there is in making a WeakMap "a total mapping from all objects to EcmaScript values where a finite number of these objects map to values other than undefined" rather than "a finite partial mapping from objects to EcmaScript values".

For almost every total mapping the programmer might actually want, a WeakMap is insufficient: you have to write a function. WeakMaps will be used as partial mappings in practice, because that's the job they can actually do. The only question is whether it's a partial mapping from objects to values (per Waldo), or a partial mapping from objects to values-except-undefined (per Mark). The former seems nicer to me.

Since programmers will use these as partial maps in practice, we might as well offer has() and remove() so that they can write what they mean when querying to see if an object has been added already or removing an object.
A few thoughts:

a) This discussion should perhaps be on es-discuss or with TC39.

b) I'm surprised at Mark's comments in this bug thread; my understanding was that one of the key benefits of not exposing the table size or whether an object is in the table is that it avoids exposing the unspecified garbage collection behavior. Being able to observe the GC behavior is a portability hazard, and may (insert standard IANA security expert disclaimer here) also create side channels for attackers to communicate information.

c) I agree with Jason that the total mapping idea is unintuitive. Infinite data structures don't have an intuitive space model, and weak maps are all about space efficiency.

d) I don't consider c) sufficient to argue against the currently spec'ed API, just insufficient to argue for it. :)

e) I disagree with Jason that just because our tests want to observe something, we should automatically expose it to everyone. If we want to test implementation details, we shouldn't automatically expose those details to everyone. It's easier to change our test suite when the implementation details change than to change the entire web.

f) Now that I think about it, though, it seems that even with the currently-spec'ed API, it's still always possible to observe that a value has disappeared from the table, by simply boxing all values you place in the map. Then if you get back undefined, you know the mapping has disappeared. Does this mean my understanding in b) was wrong all along, and that weak maps still do, in fact, expose the GC details?

Dave
> f) Now that I think about it, though, it seems that even with the
> currently-spec'ed API, it's still always possible to observe that a value has
> disappeared from the table, by simply boxing all values you place in the map.
> Then if you get back undefined, you know the mapping has disappeared. Does this
> mean my understanding in b) was wrong all along, and that weak maps still do,
> in fact, expose the GC details?

Ah. The reason it doesn't expose the GC details is that you have to already have a strong reference to the object in order to look it up. But the same is true for has(). So I am in favor of adding has().

We should *not* expose count, though. But Jason tells me he only meant we should expose count to the shell, not the web. I'm fine with that.

I will try to bring this up at the face-to-face next week, even though we have a full agenda. We should try to see if we can resolve this quickly, so that we can get it into Fx5. Harmonia ship-ienda est.

Dave
While I still mildly prefer the simpler API, I do recognize the logic of these counter-arguments. If we add has() we should also add delete(). Either way, no non-determinism is exposed. I'm rather happy with either API being the official one. Since I have not heard anyone else feel strongly in favor of the simpler API, I suspect we can indeed get quick agreement on the get/set/has/delete API.

Btw, please have it be "delete" rather than "remove".


> I'm surprised at Mark's comments in this bug thread; 

Which comments were those? 


> by simply boxing all values you place in the map.

At http://wiki.ecmascript.org/doku.php?id=harmony:weak_maps#explicit_soft_own_fields I show how to implement the get/set/has/delete abstraction out of the simpler get/set abstraction with no boxing overhead.
> Btw, please have it be "delete" rather than "remove".

I don't have strong feelings either way, but I'm curious why?

> > I'm surprised at Mark's comments in this bug thread; 
> 
> Which comments were those? 

Oh sorry, I retract my surprise. :) I was just temporarily confused about which operations expose non-determinism.

Dave
> > Btw, please have it be "delete" rather than "remove".

> I don't have strong feelings either way, but I'm curious why?

With the addition of has and delete, WeakMaps can not be used more directly in a soft-field-like way, modeling own[*] properties externally by a side table. The current ES5 APIs expose these four property manipulation operations as associated with "set", "get" (i.e., property descriptors of accessor properties), "has" ("hasOwnProperty"), and obviously "delete".


[*] this "own" would now be the only difference between WeakMap and SoftField, so they should have the same API. For genuine soft fields, the analogy argument for "delete" is even stronger.
> WeakMaps can not be used

Oops. Should be: "WeakMaps can now be used ...
I need a consensus for the API here. FF5 freezes in 2 weeks. I still like:

set(key, value)
get(key)

Its very rare that you want to preserve void values, and in that case you can easily wrap:

map = new WeakMap();
map.set = function(k,v) { WeakMap.prototype.set.call(this, k, { value: v }); }
map.contains = function(k) { return !!WeakMap.prototype.get.call(this, k); }
map.get = function(k) { return WeakMap.prototype.get.call(this, k).value; }

Again, I think this is a rather rare case so the overhead shouldn't be too bad.
Attached patch patch (obsolete) — Splinter Review
Attachment #519354 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Attachment #522883 - Attachment is obsolete: true
Small change of heart. Here's my suggested API:

map.set(k, v)
map.contains(k)
map.remove(k)
map.get(k[, v])
    - if k is in the map, it returns the found value and ignores v
    - if k is not in the map and v is callable, it calls v with k as the single argument and returns the result of that
    - if k is not in the map and v is not callable, it returns v

(I know Mark wanted it to be called delete, but programmers won't be able to use the keyword after the dot in legacy versions of JS, so they'll have to use map["delete"] instead, which is a loss for readability and convenience.)

Rationale for the get API:
    - you don't have to pass the default if you don't want, and you'll get undefined as a reasonable, if slightly ambiguous, sentinel value indicating "not there"
    - you can easily pass common default values such as null, undefined, 0, "", etc
    - this API doesn't distinguish between not passing the second argument and passing undefined, which is usually a good idea
    - this API never has to throw an error from within the engine
    - if you want the default value to be a function (which is probably rare), you can still do it by giving a function that returns the function you want -- a nuisance, certainly, but this is a less common case

I will propose this API to TC39. I can't promise it will be the final API, though.

Dave
print(map.contains(key));
print(map.set(key, 5));
print(map.get(key));
print(map.get({}));
print(map.get({}, null));
print(map.get({}, function(key) { print(key); print(this); return 42; }));

false
undefined
5
undefined
null
[object Object]
[object WeakMap]
42
Attached patch patch (obsolete) — Splinter Review
Patch with new API.
Attachment #522884 - Attachment is obsolete: true
Attachment #523058 - Flags: review?(jorendorff)
(In reply to comment #77)
> Small change of heart. Here's my suggested API:
> 
> map.set(k, v)
> map.contains(k)

I prefer "has". On a map-like abstraction, I find "contains" more suggestive of a test of values. YMMV. And JS's current use of "has" is "hasOwnProperty" which is already a key-like test, so it has precedent.

In any case, "has" is shorter. Even the clearest "hasKey" is shorter than "contains".


> map.get(k[, v])
>     - if k is in the map, it returns the found value and ignores v
>     - if k is not in the map and v is callable, it calls v with k as the single
> argument and returns the result of that
>     - if k is not in the map and v is not callable, it returns v

What about the case where k is not in the map and v is absent? The two obvious possibilities: return undefined or throw something.

I'm also not sure I like the third bullet above. If v is dynamically calculated or passed in from somewhere, and is usually a non-callable, then it simple code seem to work until v happens to be callable. If JS ends up with #-functions, then I'd get rid of the third bullet case as too accident prone in exchange for too little syntactic benefit. OTOH, without #-functions, perhaps the syntactic benefit becomes the more important consideration. (Which is an interesting lesson in syntactic costs leading to semantically worse APIs.)

> map.remove(k)

> (I know Mark wanted it to be called delete, but programmers won't be able to
> use the keyword after the dot in legacy versions of JS, so they'll have to use
> map["delete"] instead, which is a loss for readability and convenience.)

I don't understand this. Under what scenario would someone use the WeakMap API on an ES3 implementation? If the answer is essentially none, then I still favor "delete".


> 
> Rationale for the get API:
>     - you don't have to pass the default if you don't want, and you'll get
> undefined as a reasonable, if slightly ambiguous, sentinel value indicating
> "not there"
>     - you can easily pass common default values such as null, undefined, 0, "",
> etc
>     - this API doesn't distinguish between not passing the second argument and
> passing undefined, which is usually a good idea
>     - this API never has to throw an error from within the engine
>     - if you want the default value to be a function (which is probably rare),
> you can still do it by giving a function that returns the function you want --
> a nuisance, certainly, but this is a less common case

That clarifies why the fourth bullet was left out. But if we get rid of the third bullet, as I think we should, then we still need to distinguish between undefined and absence. OTOH, we could replace the third bullet by a narrower condition that allows null and undefined to be passed through as default values.

A final possibility, with some pros and cons, for missed lookup handling logic, is to pass an optional missed lookup handler to the constructor rather than the get function:

WeakMap(v_opt)

where the optional v has the same signature as the argument proposed for get above. I don't yet know what I think about this.


> 
> I will propose this API to TC39. I can't promise it will be the final API,
> though.

Indeed. I like the functionality. I do not like the method names.


> 
> Dave
> I prefer "has".

+1

> > map.get(k[, v])
> >     - if k is in the map, it returns the found value and ignores v
> >     - if k is not in the map and v is callable, it calls v with k as the single
> > argument and returns the result of that
> >     - if k is not in the map and v is not callable, it returns v
> 
> What about the case where k is not in the map and v is absent? The two obvious
> possibilities: return undefined or throw something.

I hinted at this in the rationale but wasn't clear about it, sorry; the idea is that leaving out the default value is identical to passing undefined -- i.e., if you leave it out, you get back undefined.

> I'm also not sure I like the third bullet above. If v is dynamically calculated
> or passed in from somewhere, and is usually a non-callable, then it simple code
> seem to work until v happens to be callable.

I hear you -- I was against this "overloading" at first, too. I'll admit that it was when I saw that Racket actually does the same overloading that I began to soften.

The rationale here is one of ergonomics; you want the default behavior of the common API to match the common case. But you also want to make it not too inconvenient to do the general case either. If, in particular, you want to be able to distinguish between maps-to-undefined and not-found, it's much more convenient to write

    map.get(key, orfail)

than to write

    map.has(key) ? map.get(key) : fail(key)

or to monkey-patch WeakMap with a new method to abstract the above. Another alternative would be to have two different API's for lookup, one with the convenient common case and one with the callback:

    map.get(key[, def=undefined]) // returns the value if present or def otherwise
    map.lookup(key, getdef) // returns the value if present or getdef(key) otherwise

IOW, break apart the overloading into two separate methods. I'm okay with this, although it does introduce an extra method into the API.

> I don't understand this. Under what scenario would someone use the WeakMap API
> on an ES3 implementation? If the answer is essentially none, then I still favor
> "delete".

I don't have a strong argument either way, but I think the rationale for "delete" is actually kind of thin; weak maps have plenty of use cases beyond soft fields, so I don't see a need to name the method for the one use case. And it's just that I have this nervous feeling that people will be worried about syntactic compatibility and end up writing map["delete"] instead of map.delete.

> That clarifies why the fourth bullet was left out. But if we get rid of the
> third bullet, as I think we should, then we still need to distinguish between
> undefined and absence. OTOH, we could replace the third bullet by a narrower
> condition that allows null and undefined to be passed through as default
> values.

IIUC, that would not be quite expressive enough. What if I have a map that can contain both null and undefined? I want the general ability to write custom behavior for the not-found case, including throwing.

> A final possibility, with some pros and cons, for missed lookup handling logic,
> is to pass an optional missed lookup handler to the constructor rather than the
> get function:
> 
> WeakMap(v_opt)

I don't think that's flexible enough. The creator of the map and the client of the map may not be the same code. The client should be the one to decide what they want to do for the not-found case.

Dave
(In reply to comment #81)
> Another
> alternative would be to have two different API's for lookup, one with the
> convenient common case and one with the callback:
> 
>     map.get(key[, def=undefined]) // returns the value if present or def
> otherwise
>     map.lookup(key, getdef) // returns the value if present or getdef(key)
> otherwise

One can always use a private unique object to implement the lookup on top of the getmethod  if really necessary. So the lookup version is only sugar. Given that sugar does not help the case when one wants to change the control flow in the caller of map.get depending on the absence of the key, I do not see the need to have that sugar in weak maps.
This shouldn't land without comprehensive tests. If you don't want to write 'em
yourself, call in a favor.

Give js::HashMap<JSObject *, Value> a typedef.

Rename "contains" to "has" per comment 81.

>+    if (!InstanceOf(cx, obj, &WeakMap::jsclass, vp ? vp + 2 : NULL))

vp is never null, so just `vp + 2`.

>+            if (!ExternalInvoke(cx, ObjectValue(*obj), *vp, 1, argv, vp))

Dherman changed his mind since then. Let's make it completely ignore the
second argument for now. We'll add in the final behavior later.

In WeakMap::set:
>+        if (value.isUndefined())
>+            return true;

These two lines have to go because of contains().

>+        void *mem = cx->malloc(sizeof(WeakMap));
>+        if (!mem)
>+            goto out_of_memory;
>+        table = js_new<WeakMap>(cx);

Bogus.

>+    if (!value.isUndefined())
>+        return table->map.put(key, value) != NULL;
>+    table->map.remove(key);
>+    return true;

Should just be:
  return table->map.put(key, value) != NULL;
> >+            if (!ExternalInvoke(cx, ObjectValue(*obj), *vp, 1, argv, vp))
> 
> Dherman changed his mind since then. Let's make it completely ignore the
> second argument for now. We'll add in the final behavior later.

If we're going to add the |lookup| function later, can we at least add the optional second argument that defaults to undefined and is always the return value when the property is not found? This should be trivial to implement and makes the API much more flexible at minimal implementation cost.

I still want the additional method when we can, though.

BTW, since Tuesday is coming fast and furious, I'm willing to be the one to flinch on the "remove vs. delete" game of chicken. I personally like the name remove better, but I concede that my concern over contextual keyword is probably not that big a deal, and we've already gone down that road with generator.throw anyway.

Dave
(In reply to comment #84)
> If we're going to add the |lookup| function later, can we at least add the
> optional second argument that defaults to undefined and is always the return
> value when the property is not found? This should be trivial to implement and
> makes the API much more flexible at minimal implementation cost.
> 
> I still want the additional method when we can, though.

+1.


> 
> BTW, since Tuesday is coming fast and furious, I'm willing to be the one to
> flinch on the "remove vs. delete" game of chicken.

Thanks David. I did not intend to turn this into a game of chicken, but my head has been elsewhere lately for various reasons and I didn't have time to engage on this issue. Since you had earlier written "I don't have a strong argument either way" (comment 81) I inferred that either would be acceptable enough to you. Thanks for being flexible on this.


> I personally like the name
> remove better, but I concede that my concern over contextual keyword is
> probably not that big a deal, and we've already gone down that road with
> generator.throw anyway.
> 
> Dave
> Thanks David. I did not intend to turn this into a game of chicken, but my head
> has been elsewhere lately for various reasons and I didn't have time to engage
> on this issue.

That came across more adversarial than I intended, sorry about that -- I just meant that with a Firefox deadline approaching, someone was going to have to give soon, and I'm willing to be it.

> Since you had earlier written "I don't have a strong argument
> either way" (comment 81) I inferred that either would be acceptable enough to
> you. Thanks for being flexible on this.

Sure. I'm not swayed by the argument that it should match the soft field use case, because weak maps are a general mechanism; the name should match the level of abstraction appropriate for the API it belongs to, not for a different layer of abstraction that's placed on top.

Anyway, all that said, I don't feel incredibly strongly on this issue. At the end of the day, they're pretty intuitive 6-letter/two-syllable words. :)

Dave
Alias: WeakMap
Attached patch patch (obsolete) — Splinter Review
New patch, please re-review and I will work on tests in the meantime.
Attachment #523058 - Attachment is obsolete: true
Attachment #523058 - Flags: review?(jorendorff)
Attachment #525194 - Flags: review?(jorendorff)
Attached patch patch (obsolete) — Splinter Review
Attachment #525194 - Attachment is obsolete: true
Attachment #525194 - Flags: review?(jorendorff)
Attachment #525204 - Flags: review?(jorendorff)
Comment on attachment 525204 [details] [diff] [review]
patch

Pardon the drive-by, but...

>+static JSObject *
>+NonNullObject(JSContext *cx, Value *vp)
>+{
>+    if (vp->isPrimitive()) {
>+        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_NOT_NONNULL_OBJECT);
>+        return NULL;
>+    }
>+    return &vp->toObject();
>+}

this is also defined in jsproxy.cpp. Can it move into some .h somewhere?

Dave
comment 89: done.
>+    if (!js_DefineNativeProperty(cx, proto, ATOM_TO_JSID(cx->runtime->atomState.lengthAtom),
>+                                 UndefinedValue(), NULL, NULL,
>+                                 JSPROP_READONLY | JSPROP_PERMANENT | JSPROP_SHARED, 0, 0,
>+                                 NULL)) {
>+        return NULL;
>+    }

Kill this.

Someone proposed a shell-only function that gets the number of entries in a WeakMap. I want it for tests.
Attached patch patchSplinter Review
Attachment #525204 - Attachment is obsolete: true
Attachment #525204 - Flags: review?(jorendorff)
Attachment #525298 - Flags: review?(jorendorff)
Attached file weakmaps.js
Here's the test. Patch passes with flying colors, except in debug builds where the bogus DEBUG-only .length property I mentioned in the previous comment causes it to fail.
Comment on attachment 525298 [details] [diff] [review]
patch

r=me with the test and with the .length property removed.

A shell-only primitive to get at the keys of a Weakmap (or just the number of keys) would be great, to test that we don't leak. However I'm fine with landing with only the attached test. (The most important thing is that we not crash.)
Attachment #525298 - Flags: review?(jorendorff) → review+
Attached file fuzzmaps.js
For what it's worth, a program to randomly make tangles of weakmaps, then let the GC get rid of them. This found nothing wrong. Probably because there aren't any bugs, right?
http://hg.mozilla.org/mozilla-central/rev/7dcd0d16cc08
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: fixed-in-tracemonkey
Backed out for causing jsreftest crashes on debug builds:
http://hg.mozilla.org/mozilla-central/rev/22d3f09a3f37
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #525299 - Attachment mime type: text/x-c → text/javascript
Attachment #525302 - Attachment mime type: text/x-c → text/javascript
Whiteboard: fixed-in-tracemonkey
I think we were not resetting gcWeakMapList after sweeping. I will reland with that fix.
comment #98 was a red herring. The original patch passed try. The crazy aurora landing must have been busted. I will re-land on TM.
http://hg.mozilla.org/tracemonkey/rev/242947d76f73
Whiteboard: fixed-in-tracemonkey
Blizzard: this will be one of the major new JS features for FF6, assuming it sticks this time around.
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
Depends on: 650753
It looks like this hasn't relanded on mozilla-central yet.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
http://hg.mozilla.org/mozilla-central/rev/242947d76f73
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla6
First, I'd like to add a stress test to WeakMaps:
-----
var wm = WeakMap();
var key = {}, value = {};

while(true){
  wm.set(key, value);
  key = value;
  value = {};
}
-----
At each loop turn, the reference of the "key" object is lost, so it's garbage collectable. After a second turn, the "value" object of this entry becomes garbage-collectable too. In all rigor, this program doesn't use much memory, because it doesn't keep references. However, naive garbage-collector may just "scratch the surface" to see what is garbage-collectable and not notice that since some key objects aren't accessible, values aren't either. These values being themselves keys, they make another object unreachable, etc.


Then, I'd like to write the doc on this feature. On the tests files, I've seen that WeakMaps have a "has" and "delete" methods natively, unlike the current state of the harmony proposal (which just contains "get" and "set"). Is it normal?
(In reply to comment #104)
> First, I'd like to add a stress test to WeakMaps:

There's no actual testing in this test. Also we try to avoid tests that run forever.

To turn this into a real test, we need two things:

  * A way for a test to run GC without conservative stack scanning.
    This can only be done safely when the stack is empty, so it would
    have to be something like:

        fullGCAndThen(function () { ... stuff to do after gc ... });

    I think this is probably easy to implement both for the browser
    (using setTimeout) and for the shell, but no one has done it.

  * A way to check and see if the WeakMap entries got collected or not,
    rather than waiting for the process to run out of memory. We kicked
    this idea around in this bug earlier, but no one did it.

Neither of these things should be exposed to the Web. They're just for testing.

I will gladly draw straws for these two tasks, or split up the work, if anyone else is willing.

> Then, I'd like to write the doc on this feature. On the tests files, I've
> seen that WeakMaps have a "has" and "delete" methods natively, unlike the
> current state of the harmony proposal (which just contains "get" and "set").
> Is it normal?

See comment 62 and subsequent discussion (comment 65, comment 67, etc.)

The wiki page hadn't been updated, last I looked. I hope it will be.
>   * A way for a test to run GC without conservative stack scanning.

OMG please yes.

It might be good for this API to take an optional compartment parameter, like the other GC APIs (after the patch in bug 650978).

>   * A way to check and see if the WeakMap entries got collected or not,
>     rather than waiting for the process to run out of memory. We kicked
>     this idea around in this bug earlier, but no one did it.

The API added in bug 597906 isn't specific to WeakMap, but I think it would work well enough to tell you whether a chain of WeakMaps got collected all the way.
(In reply to comment #105)
> (In reply to comment #104)
> > First, I'd like to add a stress test to WeakMaps:
> 
> There's no actual testing in this test. Also we try to avoid tests that run
> forever.
Ok.
I've just run this code on the last nightly (on Ubuntu 11.04, 32bits):
----
var wm = WeakMap();
var key = {}, value = {};
var N = Math.pow(2, 20);
while(N--){
  wm.set(key, value);
  key = value;
  value = {};
}
----
The first time lasts about a second. Second run takes several minutes (and tell me the page is not responding and ask if I want to kill the script at some point). Using 21 instead of 20 makes the first run long.

> (...)
>   * A way to check and see if the WeakMap entries got collected or not,
>     rather than waiting for the process to run out of memory. We kicked
>     this idea around in this bug earlier, but no one did it.
This last point might be the explanation of the second run being slow. Should I file a bug for the long script or will there be bugs for this point?


> Neither of these things should be exposed to the Web. They're just for
> testing.
Do you mean that it is not planned that WeakMaps reach Firefox 6?


> > Then, I'd like to write the doc on this feature. On the tests files, I've
> > seen that WeakMaps have a "has" and "delete" methods natively, unlike the
> > current state of the harmony proposal (which just contains "get" and "set").
> > Is it normal?
> 
> See comment 62 and subsequent discussion (comment 65, comment 67, etc.)
Ok, thanks.


> The wiki page hadn't been updated, last I looked. I hope it will be.
In July probably according to https://mail.mozilla.org/pipermail/es-discuss/2011-May/014101.html
(In reply to comment #107)
> (In reply to comment #105)
[...]
> > The wiki page hadn't been updated, last I looked. I hope it will be.
> In July probably according to
> https://mail.mozilla.org/pipermail/es-discuss/2011-May/014101.html

y.

AFAIK, there are only two differences between what Andreas implemented and what I'm planning to propose:

1) Dave Herman suggested (what became a separate) lookup method which is like get, but, if its first key argument isn't found, in returns the result of calling its second argument. (An earlier proposal had both behaviors in one get method, dynamically switched based on whether the 2nd argument was a function. I think this was too error prone.)

2) WeakMap.prototype.set(k,v) and WeakMap.prototype.delete(k) both mutate the WeakMap.prototype itself. Freezing it doesn't prevent this mutation since the mutable state is in internal properties, not freezable properties. Instead, we agreed that the WeakMap methods, when given a WeakMap.prototype object from any frame, should throw a TypeError rather than mutate it. 

At http://codereview.appspot.com/4249052/diff/37002/src/com/google/caja/ses/es5shim.js I monkey patch the current FF6.01a Nightly implementation to fix problem #2. Search for "PATCH_MUTABLE_FROZEN_WEAKMAP_PROTO" on that page. As you can see, getting this right cross frame by monkey patching is needlessly expensive.



Should I file a separate bug on #2, or should we reopen this bug and track it here?
We can fix both in one bug (but a new one). Both 1) and 2) are clearly bugs that should and will be fixed.
See https://bugzilla.mozilla.org/show_bug.cgi?id=656828(In reply to comment #109)
> We can fix both in one bug (but a new one). Both 1) and 2) are clearly bugs
> that should and will be fixed.

At https://bugzilla.mozilla.org/show_bug.cgi?id=656828
See Also: → 656828
Depends on: 656828
See Also: 656828
Depends on: 657264
Wrote https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/WeakMap during the last MDN doc sprint. Could I have a review on this page before changing keyword to dev-doc-complete?
Thanks.
Reviewed. Very nice work. Thanks, David.
Depends on: 668855
Depends on: 653248, 655297
Depends on: 673468
Depends on: 686114
Depends on: 688277
Blocks: es6
Depends on: 680937
Depends on: 707313
Depends on: 761620
Blocks: WeakSet
Depends on: 798678
Keywords: feature
Depends on: 804386
Depends on: 982561
Depends on: 819131
Depends on: 998355
Depends on: 1006099
Depends on: 1009349
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: