Closed Bug 843840 Opened 8 years ago Closed 7 years ago

document.documentElement.getElementsByTagName('select') finds select tag with id twice when enumerating the list

Categories

(Core :: DOM: Core & HTML, defect)

19 Branch
x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla31

People

(Reporter: email, Assigned: bzbarsky)

References

()

Details

(Keywords: compat, dev-doc-needed, site-compat)

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0
Build ID: 20130215130331

Steps to reproduce:

No, what did firefox do?
A function that worked normally up until today doesn't work anymore now. I tested in Chrome, Safari, Opera and IE after I saw that it did not work anymore. No problem there.
This is the function:
function addselectevent() {
  var selectarray=[];
  selectarray = document.documentElement.getElementsByTagName('select');
  for(var elem in selectarray) {
    var selectelem = selectarray[elem];
    if (selectelem.nodeType==1 && selectelem.className!="nostyle") {
       alert(selectarray[elem].id);
    }
  }
}


Actual results:

As from today (after an update of firefox) it returns twice the id of the only select tag that is inside the document. For an example, look here:
http://www.biggee.nl/test/test.html

I'll zip these to you too


Expected results:

This function should only return one id when there is only one select element inside the document.
Summary: document.documentElement.getElementsByTagName('select') finds select tag twice and more rubble → document.documentElement.getElementsByTagName('select') finds select tag twice
>  for(var elem in selectarray) {

This will enumerate all the enumerable properties of the HTMLCollection.   Per spec that's all the indexed properties and all the ids and names of elements in the collection.

We didn't use to do the latter, but we've fixed that bug.

WebKit claims to return a NodeList, not an HTMLCollection, which is also wrong per spec.

In general, if you want to enumerate the items of a list, you should be using a for loop from 0 to length, not a for/in loop: the for/in loop will enumerate all sorts of stuff you don't want (e.g. above you'll get "elem" set to "length" and "item" at the very least, which is why you had to add weird nodeType checks).
Blocks: 772869
Summary: document.documentElement.getElementsByTagName('select') finds select tag twice → document.documentElement.getElementsByTagName('select') finds select tag twice when enumerating the list
Summary: document.documentElement.getElementsByTagName('select') finds select tag twice when enumerating the list → document.documentElement.getElementsByTagName('select') finds select tag with id twice when enumerating the list
Well, you make a good point there but I am not sure if you understand my issue.
The function I showed searches on pages for select tags. So I can't know up front how much select tags are in the document. This is why I used the 'weird' nodeType checks (for my purpose I find it a lesser degree of weird).

Anyhow, it can't possibly not be a bug. The getElementsByTagName method should NEVER return the same element (nodeType 1) twice. Believe me, I was baffled too.
Have I already said that this was not the case for months in all browsers up until 21-02-2013 after an update of firefox. Now only firefox shows this erratic result of the getElementsByTagName method.
> So I can't know up front how much select tags are in the document.

Sure, but selectarray.length will tell you.

> Anyhow, it can't possibly not be a bug.

It sure can, depending on what happens with the spec here.

> The getElementsByTagName method should NEVER return the same element (nodeType 1) twice.

The node is in the list once.

You should really consider logging the values "elem" takes in your loop.  I think that will make it clear to you what's happening.

Note that with other nodelists (e.g. form.elements and various others) you'd get the same behavior you now see in Firefox in other browsers; we used to have broken behavior there too, but fixed it in bug 772869.
Well, I was surprised that suddenly a working function did not work anymore in firefox after an update. I guess anyone would be.

I will not use the for...in loop anymore for these jobs as it's easy to fix (with array.length as you patiently point out to me). But I am still curious about this kind of loop. When is it wise to use this loop instead of the for loop from 0 to length? Should I only avoid it in regards to HTMLCollection or is there more to look out for?

Anyway, thanks for your answers and this 'bug-report' can be closed.
> Well, I was surprised that suddenly a working function did not work anymore

Sure.  That part makes sense.  ;)

> When is it wise to use this loop instead of the for loop from 0 to length?

When you want to enumerate the properties of an object.  What that means depends on the object.  For example, if you have a key-value store, enumerating its properties might make sense.

Generally for list-like objects (HTMLCollection, NodeList, arrays, etc) a for/in loop doesn't make much sense.

> Anyway, thanks for your answers and this 'bug-report' can be closed.

Still trying to figure out whether we need to change the spec here somehow if there is widespread dependance on the old behavior...
So in spec terms...

Opera does the same thing we do for HTMLCollections, but has getElementsByTagName as a NodeList.

WebKit not only has it as a NodeList but never enumerates named properties on HTMLCollections.  Filed https://bugs.webkit.org/show_bug.cgi?id=110611 and https://bugs.webkit.org/show_bug.cgi?id=110612

IE9 does something really weird.  It enumerate the names of things in the list, but not the indices.  And if something doesn't have a name, IE makes up a name for it for enumeration purposes, which is why things sort of work in IE.
Oh, and in IE9 the list here is in fact an HTMLCollection.
Boris, 
 I think you read too much into the specification. There is nowhere that says that all possible "indices" of properties should be enumerated. Nor are all properties necessarily enumerable (that's an attribute).
 So, for you to say "Per spec that's all the indexed properties and all the ids and names of elements in the collection.", that is patently false.
 You should never return the same property twice in an enumeration. Neither Chrome nor IE nor FF18 do this, and it is blatantly incompatible with existing javascript.
 The ECMASCRIPT specification has the paragraph below. Note the LAST SENTENCE:

    "The mechanics and order of enumerating the properties (step 6.a in the first algorithm, step 7.a in the second) is not specified. Properties of the object being enumerated may be deleted during enumeration. If a property that has not yet been visited during enumeration is deleted, then it will not be visited. If new properties are added to the object being enumerated during enumeration, the newly added properties are not guaranteed to be visited in the active enumeration. A property name must not be visited more than once in any enumeration."

Since FF19 is visiting the same property name twice (once with a numeric index, and once with the name index), it is violating the ECMASCRIPT specification for the "for-in" iteration.

Why would you ever want to enumerate the same property twice? It's silly.
> There is nowhere that says that all possible "indices" of properties should be enumerated.

Sure there is, for this case.  It's right in http://dev.w3.org/2006/webapi/WebIDL/#indexed-and-named-properties and in particular in http://dev.w3.org/2006/webapi/WebIDL/#getownproperty which sets desc.[[Enumerable]] to true for the properties.  Plus of course the IDL for HTMLCollection at http://dom.spec.whatwg.org/#interface-htmlcollection which defines the indexed and named getters and the prose there that describes the set of supported property names.

> A property name must not be visited more than once in any enumeration."

Sure.

> Since FF19 is visiting the same property name twice

No.  It's visiting two different property names: "0" and "testselect".  They happen to have the same property _value_, but names and values are quite different things, you know.

> Why would you ever want to enumerate the same property twice? 

You wouldn't, and we don't.
IE, Chrome, and I disagree.

First, there is nothing in the DOM specification that says anything about "enumeration of indices".  The enumeration algorithm is defined in the ECMASCRIPT specification -- have you read that?

Second, FF19 IS visiting the same property name twice.  If my input field has a name of "FRED", then it may be referenced as collection[0] or collection["FRED"]. However, both of these properties have the SAME name (viz., collection["FRED"].name and collection[0].name are equal), and, hopefully, they are the same property/input field. The fact that they are indexed differently does not change the fact that they are the same property.

Third, FF19 is definitely enumerating the same property twice -- once as a named property and once as an indexed property.  If you cannot see this, then that explains why this is a bug in FF19 and not in other browsers.  

Finally, please explain to me why visiting the same input field twice in a "for-in" enumeration is something that a developer would want?  Assuming you are a developer, would you be happy if Java or C++ runtime started enumerating all Array elements twice? Even if your interpretation of the spec is POSSIBLE, why WOULD you interpret it that way when your colleagues at other browser labs do not? Is there any developer that has requested this odd "feature"?  Why would you break compatibility so badly?  Just to slavishly follow your (IMHO, wrong) interpretation of the spec? 

On behalf of javascript developers, I request that you rethink your position on this and make this bug go away.
> First, there is nothing in the DOM specification that says anything about "enumeration of
> indices".

Allan, please do read the links I provided in comment 9.

> have you read that?

Yes, several times.  And helped write parts of it...

> Second, FF19 IS visiting the same property name twice.

No, it isn't.  Just try logging the property names that are visited.

> However, both of these properties have the SAME name

No, they have the same VALUE.  They have the names "0" and "FRED".  You're confusing the DOM "name" property of form controls with the name of a property.

> please explain to me why visiting the same input field twice in a "for-in" enumeration
> is something that a developer would want?

You wouldn't, which is why you should never use for-in on list-like objects.

> would you be happy if Java or C++ runtime started enumerating all Array elements twice

If you want to enumerate arraylikes, you just use a loop up to length.  A for-in loop is a completely different beast that will enumerate your indices _and_ any other enumerable properties the object may have.  And the WebIDL spec clearly requires the named properties on these objects to be enumerable.

> Is there any developer that has requested this odd "feature"?

Yes, in fact, for WebIDL objects with named properties.

Note that other browsers have pretty serious bugs with named properties in general (e.g. Chrome isn't even consistent about whether the named properties are on the object at all).  That explains a large part of their behavior.

If you feel that the spec is wrong and these properties shouldn't be enumerable, do please raise a spec issue.  But using for-in loops on arraylikes is a footgun that will blow your foot off, often.  And you must already be working around the fact that it returns all sorts of non-nodes because "item" and "length" are enumerable, if you're using it at all.
But just to be clear, I'm sympathetic to the argument that this is a compat-breaking change.  What we'd need then is to figure out what the spec _should_ say such that browsers can converge on it.  Note that Chrome and IE have totally different behavior in terms of what property names the enumerate; the behavior currently in the spec was decided on after lengthy discussion involving representatives from Apple, Opera, Mozilla, Google, and Microsoft.  Furthermore, Opera already implements that behavior for HTMLCollections and has for a while (but of course getElementsByTagName doesn't return an HTMLCollection in Opera).

Seriously, if you have a concrete proposal for a behavior that all browsers should implement that differs from what's in the spec now, please mail public-script-coord@w3.org with it.  Please make sure to consider different kinds of objects as well as ES5 reflection facilities when defining the proposal...
Thanks Boris, for the frank discussion.

I will see how much time I can allocate to this going forward. We are currently in the process of eliminating all "for-in" iterations from our javascript because of the silly browser developer wars over this behavior.

I have worked on standards in the past, and we had similar problems.  The spec is apparently, as of now, not as unambiguous as it should be, and its backward compatibility with "ad hoc" standards like IE and FF18 is less than perfect.  And, your interpretation produces an un-useful enumeration that duplicates DOM objects without good cause to do so.

So, are you saying because MSFT agreed with your interpretation of the spec that IE will start duplicating fields in HTMLCollection enumerations?  I doubt that, and if they don't, it would indicate that your interpretation is not the same as your august group of representatives, wouldn't it?
Allan, the spec is actually very unambiguous.  It's just that the other browsers aren't implementing it yet (except Opera).

> So, are you saying because MSFT agreed with your interpretation of the spec

You can read for yourself.  All the discussion was public.  The relevant mail from Travis is at http://lists.w3.org/Archives/Public/public-script-coord/2011JulSep/0371.html and the rest of the thread is linked from there.

Note that if there _are_ significant compatibility reasons (and so far the number of problems reported has been very low, given the number of users who have been testing builds with this change) the spec may in fact get changed as the mail there notes.  The question is to what.
Boris,
 You may not see compatibility issues immediately, because it is often not visible or harmful (semantically) to iterate over the same underlying objects twice.  For example, a search loop that terminates with a found value will not care. Of our product usage, a small number of the "for-in" iterations caused major problems of all those that we used.
 That said, the fact that the double-iterations are occurring in javascript without being noticed is still an incompatibility, as it affects performance if nothing else.  
 I still think it is a bad idea, but the bad idea probably started a long time ago with the "named" properties, which were only implemented because javascript does not have HashMaps, as far as I can tell.
Duplicate of this bug: 918615
I changed the specification to not enumerate named properties.

http://dom.spec.whatwg.org/#htmlcollection
Keywords: dev-doc-needed
Yay spec changes...
Assignee: nobody → bzbarsky
Status: UNCONFIRMED → NEW
Ever confirmed: true
So there are two viable implementation strategies here.

One is to just fix getOwnPropertyDescriptor (e.g. by adding an IsEnumerable() method on the C++ object for the proxy to call) and then having enumerate() call keys() instead of getOwnPropertyNames(), and it should all just work.  The drawback is that enumerating an HTMLCollection will fetch all the names and then make sure they're all not enumerable, wasting lots of work.

Another option is to add a version of getOwnPropertyNames with a boolean arg for enumerability on DOM proxies and have both enumerate() and getOwnPropertyNames() call that.  We'd still need to fix getOwnPropertyDescriptor, of course.  The boolean would be passed to the C++ GetSupportedNames() callback.

Peter, preferences?  I sort of prefer the second option....
Flags: needinfo?(peterv)
Second sounds better.
Flags: needinfo?(peterv)
Ms2ger, note the imptests fixes (in both patches) to make them align with what the spec actually says now.
Flags: needinfo?(Ms2ger)
Keywords: compat
Whiteboard: [need review]
Comment on attachment 8395490 [details] [diff] [review]
part 2.  Add a way to ask DOM proxies for only their enumerable property names, and use that in the enumerate hook.

Review of attachment 8395490 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/html/content/src/HTMLFormControlsCollection.cpp
@@ +407,5 @@
>  {
> +  if (!(aFlags & JSITER_HIDDEN)) {
> +    return;
> +  }
> +  

ws

::: dom/bindings/DOMJSProxyHandler.cpp
@@ +253,5 @@
>  
>  bool
> +BaseDOMProxyHandler::getOwnPropertyNames(JSContext* cx,
> +                                         JS::Handle<JSObject*> proxy,
> +                                         JS::AutoIdVector &props)

& to the left, please (also nearby)

::: dom/bindings/DOMJSProxyHandler.h
@@ +71,5 @@
> +  // functionality.  The "flags" argument is either JSITER_OWNONLY (for keys())
> +  // or JSITER_OWNONLY | JSITER_HIDDEN (for getOwnPropertyNames()).
> +  virtual bool ownPropNames(JSContext* cx, JS::Handle<JSObject*> proxy,
> +                            unsigned flags,
> +                            JS::AutoIdVector &props) = 0;  

ws
Thanks for letting me know, I submitted <https://github.com/w3c/web-platform-tests/pull/798>. Should I also steal (part of) your new test?
Flags: needinfo?(Ms2ger)
Please do!
Comment on attachment 8395489 [details] [diff] [review]
part 1.  Add a way to ask DOM proxies with a named getter whether a property is enumerable or not and use that information in getOwnPropertyDescriptor.

Review of attachment 8395489 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/html/content/src/HTMLFormControlsCollection.cpp
@@ +387,5 @@
>  
> +bool
> +HTMLFormControlsCollection::NameIsEnumerable(const nsAString& aName)
> +{
> +  return false;

Could just use the one from the base class (nsIHTMLCollection).

::: content/html/content/src/HTMLOptionsCollection.h
@@ +135,5 @@
>    }
>    HTMLOptionElement* NamedGetter(const nsAString& aName, bool& aFound);
> +  bool NameIsEnumerable(const nsAString& aName)
> +  {
> +    return false;

Could just use the one from the base class (nsIHTMLCollection).
Attachment #8395489 - Flags: review?(peterv) → review+
Comment on attachment 8395490 [details] [diff] [review]
part 2.  Add a way to ask DOM proxies for only their enumerable property names, and use that in the enumerate hook.

Review of attachment 8395490 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/html/content/test/test_formelements.html
@@ +44,5 @@
> +is(names[8], "@@iterator", "Enum entry 9");
> +is(names[9], "length", "Enum entry 10");
> +
> +names = Object.getOwnPropertyNames(x);
> +is(names.length, 10, "Should have 9 items");

"10 items"?
Attachment #8395490 - Flags: review?(peterv) → review+
https://hg.mozilla.org/mozilla-central/rev/6ffe0fe626e0
https://hg.mozilla.org/mozilla-central/rev/f0057045ace5
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Keywords: site-compat
Depends on: 1019417
You need to log in before you can comment on or make changes to this bug.