Closed Bug 368529 Opened 18 years ago Closed 7 years ago

the in operator on arrays is unintuitive; proposing __contains__ protocol

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED INVALID

People

(Reporter: stryker330, Unassigned)

Details

Using the |in| operator on array produces unintuitive results:

'a' in ['a','b','c'] // false

I understand perfectly that this is in accordance with ECMAScript edition 3. The |in| operator works upon objects and it's a "is key in object" operator, and the keys in the above array are the indices (0, 1, 2).

However, I think with the advent of JavaScript 1.7 and eventually JavaScript 2.0, we can fix this unintuitiveness. I can think of two ways to fix this:

1) Make a special case for arrays. In |x in y|, if y instanceof Array, return true if x is in one of the values of y.

Pros: Easy fix.

Cons: Backwards-incompatible, albeit I bet there are few (if any at all) scripts that rely on the current behavior.

2) Add a __contains__ protocol, inspired from Python. |x in y| is evaluated as follows: Find a __contains__ property (go down prototype chain, etc.). If found, return f.call(this, x) where f is the found property. If not found, current behavior.

Pros: This follows the example set by JavaScript 1.7 with its new __iterator__ protocol. Also Pythonic (which seems to be a good thing these days).

Cons: Harder to implement.

Note: I'm not sure if Object.prototype.__contains__ should exist. I've omitted it in the description above, since Object.prototype.__iterator__ doesn't exist for security reasons. I'm assuming that Object.prototype.__contains__ could likewise be insecure.
Forgot to mention another reason why __contains__ makes sense. Right now, we can use the new __iterator__ protocol to make the following do what's intuitive:

for (x in ['a', 'b', 'c']) print(x); // unintuitive: outputs 1, 2, 3
Array.prototype.__iterator__ = function() {
    for (let i = 0; i < this.length; i++)
        yield this[i];
};
for (x in ['a', 'b', 'c']) print(x); // intuitive: outputs 'a', 'b', 'c'

It would be nice to be able to fix the other use of the |in| operator in a "symmetric" way:

print('a' in ['a', 'b', 'c']); // unintuitive: outputs false
Array.prototype.__contains__ = function(x) {
    return this.indexOf(x) != -1;
};
print('a' in ['a', 'b', 'c']); // intuitive: outputs true
(In reply to comment #1)
> Forgot to mention another reason why __contains__ makes sense. Right now, we
> can use the new __iterator__ protocol to make the following do what's
> intuitive:
> 
> for (x in ['a', 'b', 'c']) print(x); // unintuitive: outputs 1, 2, 3

Outputs 0, 1, 2. Backward compatible, and you can use |for each (x in ...)| if you want to.

> Array.prototype.__iterator__ = function() {
>     for (let i = 0; i < this.length; i++)
>         yield this[i];
> };
> for (x in ['a', 'b', 'c']) print(x); // intuitive: outputs 'a', 'b', 'c'
> 
> It would be nice to be able to fix the other use of the |in| operator in a
> "symmetric" way:
> 
> print('a' in ['a', 'b', 'c']); // unintuitive: outputs false
> Array.prototype.__contains__ = function(x) {
>     return this.indexOf(x) != -1;
> };
> print('a' in ['a', 'b', 'c']); // intuitive: outputs true

I like the idea of a containment protocol to complement the iteration protocol, but this should be pitched at the es4discuss @ mozilla.org list.  Leaving this bug around in hope that ECMA TG1 does agree.

/be
(In reply to comment #2)
> I like the idea of a containment protocol to complement the iteration protocol,
> but this should be pitched at the es4discuss @ mozilla.org list.  Leaving this
> bug around in hope that ECMA TG1 does agree.

Note how with a protocol, users can opt into the new and incompatible world that looks more Pythonic.  This is the only way to go; ECMA TG1 won't go for an overt incompatibility.

The ES4 name for __iterator__ is iterator::get, so the analogue of __contains__ would want to be something else.  The http://developer.mozilla.org/es4/discussion/operators.html page talks about an |operators| namespace containing the ids for static methods overriding +, *, etc. So one could imagine ''operator::in'' being the protocol hook. Comments welcome here before taking to es4-discuss, to get our stories straight ;-).

/be
Perhaps better, since operator as a namespace is used for static functions named +, *, %, etc. -- but not for the in or instanceof relational operators -- would be to use iterator::in.  That would keep the matched extension-function pairs in the same namespace:

o.iterator::get = function () { /* function to get or make an iterator for o */}
o.iterator::in  = function (v) { /* "contains" test function for |v in o| */}

ES4 and JS1.7 allow keywords as names after ::, ., and .. of course.

The pair are not symmetric, in that itertor::get for an iterator returning values of type T has type function ():IteratorType.<T> where

type IteratorType = {
  next: function .<T>() : T
};

while iterator::in would have type function .<T>(T):boolean.

The delegation via iterator::get is desirable and Pythonic: __iter__ is Python's name for ES4's iterator::get (hence the namespace-free __iterator__ in JS1.7), and delegation has important properties such as making all iterators iterables, and allowing generator functions to be used as iterator-getters. But as sketched above, the "in" predicate would be a method of the iterable, not of its iterator.

This asymmetry feels wrong, but we do not want to add a second ("in" testing) method to IteratorType, which would overcomplicate all iterator implementations and break similarity with Python. Nor should we require code to get an iterator merely to test value containment or membership via the |in| operator. So the asymmetry is justified, I think.

Comments?

/be
(In reply to comment #2)
> (In reply to comment #1)
> > Forgot to mention another reason why __contains__ makes sense. Right now, we
> > can use the new __iterator__ protocol to make the following do what's
> > intuitive:
> > 
> > for (x in ['a', 'b', 'c']) print(x); // unintuitive: outputs 1, 2, 3
> 
> Outputs 0, 1, 2. Backward compatible, and you can use |for each (x in ...)| if
> you want to.

Oops, that's what I meant. I was just showing the current behavior of for-in on arrays.

(In reply to comment #4)
> Perhaps better, since operator as a namespace is used for static functions
> named +, *, %, etc. -- but not for the in or instanceof relational operators --
> would be to use iterator::in.  That would keep the matched extension-function
> pairs in the same namespace:
> 
> o.iterator::get = function () { /* function to get or make an iterator for o
> */}
> o.iterator::in  = function (v) { /* "contains" test function for |v in o| */}
> 
> ES4 and JS1.7 allow keywords as names after ::, ., and .. of course.

For the sake of backwards compatibility, would o.iterator::get and o.__iterator__ always refer to the same function, or would it search one property then the other?

Would both o.iterators::in and o.__contains__ be defined? Again, would they refer to the same function?

I'm rather indifferent on the exact syntax for specifying operators, though I will say that the namespace way looks more elegant.

> This asymmetry feels wrong, but we do not want to add a second ("in" testing)
> method to IteratorType, which would overcomplicate all iterator implementations
> and break similarity with Python. Nor should we require code to get an iterator
> merely to test value containment or membership via the |in| operator. So the
> asymmetry is justified, I think.
> 
> Comments?
> 
> /be
> 

By symmetry, do you mean forcing the __contains__ (or whatever it's called) to be called each iteration in an iterator? If so, I agree that's not worth the effort in both complexity and performance.
> (In reply to comment #4)
> > Perhaps better, since operator as a namespace is used for static functions
> > named +, *, %, etc. -- but not for the in or instanceof relational operators --
> > would be to use iterator::in.  That would keep the matched extension-function
> > pairs in the same namespace:
> > 
> > o.iterator::get = function () { /* function to get or make an iterator for o
> > */}
> > o.iterator::in  = function (v) { /* "contains" test function for |v in o| */}
> > 
> > ES4 and JS1.7 allow keywords as names after ::, ., and .. of course.
> 
> For the sake of backwards compatibility, would o.iterator::get and
> o.__iterator__ always refer to the same function, or would it search one
> property then the other?

Not sure -- we might all be better off, once ES4 is finalized and implemented in SpiderMonkey/Tamarin, with breaking JS1.7 compatibility here.  Depends on how much old code has to be grep'ed and revised, and whether anyone who can do the renaming on such code is around to do it.  But since JS1.7 is not "for the Web" (not a standard implemented by other browsers), it's conceivable that we should just deprecate __iterator__ in JS1.8 while supporting the new name, then remove it by JS2/ES4.

> Would both o.iterator::in and o.__contains__ be defined? Again, would they
> refer to the same function?

If we can use the E4X namespace support for iterator::get in JS1.8, then why bother with __contains__? We can jump straight to iterator::in.

> By symmetry, do you mean forcing the __contains__ (or whatever it's called) to
> be called each iteration in an iterator?

Actually I meant that the iterator::in method is on the iterable, whereas iterator::get while on the iterable too returns a distinct object, which must then be consistent with iterator::in.  But that's no great hardship; lots of related functions implementing methods on data structures must be consistent in various ways.

> If so, I agree that's not worth the
> effort in both complexity and performance.

Your point is a good one: |in| and |for-in| must be coherent, so if the former tests true for some v in o, the latter should visit v when iterating over o. This is the main point, and I've added it to the ES4 proposal in the wiki (not yet exported).

/be
(In reply to comment #6)
> Not sure -- we might all be better off, once ES4 is finalized and implemented
> in SpiderMonkey/Tamarin, with breaking JS1.7 compatibility here.  Depends on
> how much old code has to be grep'ed and revised, and whether anyone who can do
> the renaming on such code is around to do it.  But since JS1.7 is not "for the
> Web" (not a standard implemented by other browsers), it's conceivable that we
> should just deprecate __iterator__ in JS1.8 while supporting the new name, then
> remove it by JS2/ES4.
> 
> > Would both o.iterator::in and o.__contains__ be defined? Again, would they
> > refer to the same function?
> 
> If we can use the E4X namespace support for iterator::get in JS1.8, then why
> bother with __contains__? We can jump straight to iterator::in.

Make sense. If any code does rely on __iterator__ when JS2 comes out, the incompatibility could be fixed by a little script that defines setters and getters for __iterator__.

> > By symmetry, do you mean forcing the __contains__ (or whatever it's called) to
> > be called each iteration in an iterator?
> 
> Actually I meant that the iterator::in method is on the iterable, whereas
> iterator::get while on the iterable too returns a distinct object, which must
> then be consistent with iterator::in.  But that's no great hardship; lots of
> related functions implementing methods on data structures must be consistent in
> various ways.
> 
> > If so, I agree that's not worth the
> > effort in both complexity and performance.
> 
> Your point is a good one: |in| and |for-in| must be coherent, so if the former
> tests true for some v in o, the latter should visit v when iterating over o.
> This is the main point, and I've added it to the ES4 proposal in the wiki (not
> yet exported).
> 
> /be
> 

But how would you ensure that they are coherent? The only way I can think of doing that is to call iterator::in within the returned iterator for each iteration, but that's costly in terms of performance.
(In reply to comment #7)
> But how would you ensure that they are coherent? The only way I can think of
> doing that is to call iterator::in within the returned iterator for each
> iteration, but that's costly in terms of performance.

I agree we shouldn't do that. The rules for iterator::get and iterator::in are no different from those applying to one's own collection or container data structure and its iterator-getter and membership-test or contains methods. But the syntax is a lot nicer, and that's the point.

/be
I've thought about this a bit more, especially the array __iterator__ example above. I totally forgot about the for each (...) statement.

That complicates the meaning of |in|, giving it two possible meanings. Either |in| works with keys or |in| works with values.

Right now, __iterator__ works with both. |for (x in y)| calls y.__iterator__(true) and iterates over keys, while |for each (x in y)| calls y.__iterator__(false) and iterates over values (actually (key, value) pairs for objects). Perhaps we can come up with something analogous for the |in| operator, since we want the |in| operator to correspond as closely as possible with the |for (x in y)| statement.

|x in y| calls y.iterator::in(x, true)
|x <some syntax> y| calls y.iterator:in(x, false).

The question is what <some syntax> should be. We have |for each (x in y)|, so maybe |x each in y|? But that makes no sense literally. On the other hand, the |let x in y| in |for (let x in y)| also doesn't make much sense literally. I can't think of anything sensible at the moment.
Assignee: general → nobody
New features to JavaScript should now be proposed to TC39 [1] before being implemented in SpiderMonkey, therefore closing as Invalid.

[1] https://github.com/tc39/proposals/blob/master/CONTRIBUTING.md
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.