Last Comment Bug 751331 - All jsids should be canonical
: All jsids should be canonical
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86 Mac OS X
: -- normal (vote)
: mozilla15
Assigned To: Brian Hackett (:bhackett)
:
Mentors:
Depends on: 752374
Blocks: 750733
  Show dependency treegraph
 
Reported: 2012-05-02 13:24 PDT by Brian Hackett (:bhackett)
Modified: 2012-05-19 12:35 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (07a4d4b0260c) (211.96 KB, patch)
2012-05-02 18:35 PDT, Brian Hackett (:bhackett)
luke: review+
Details | Diff | Review

Description Brian Hackett (:bhackett) 2012-05-02 13:24:05 PDT
The API permits the existence of non-canonical jsids, e.g. the atom "123".  This leads to various js_CheckForStringIndex calls scattered around the engine, which screws up attempts to use handles for the involved ids since handles are immutable.

This should be fixed by just not allowing non-canonical jsids to exist, killing js_CheckForStringIndex.  With the newish PropertyName class doing this conversion isn't too hard since PropertyNames can't be indexes, *except* that PropertyNames can be negative integers which are representable with a jsid.  This can be fixed by not allowing jsids to be negative numbers, and restricting them to the rang [0,2^31-1].
Comment 1 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-05-02 13:26:36 PDT
YES!
Comment 2 Brian Hackett (:bhackett) 2012-05-02 18:35:05 PDT
Created attachment 620547 [details] [diff] [review]
patch (07a4d4b0260c)

Pretty straightforward, mostly search and replace except for some gunk for handling negative short IDs (which are sometimes signed, sometimes unsigned.  blech).
Comment 3 Luke Wagner [:luke] 2012-05-02 18:54:44 PDT
While I look at the rest: I think several of the cases in the parser could be PropertyName* because variable names can't be intgers.

Waldo: is that right?  Can we change ParseNode::pn_u.name.atom to a ParseNode*?  (Presumably named 'name' :)
Comment 4 Jeff Walden [:Waldo] (remove +bmo to email) 2012-05-03 10:26:37 PDT
In theory yes, but there are definitely existing uses that could be numbers.  Going by ParseNode.h comments we have these (and I wouldn't be surprised to find others):

  * PNK_NAME has "name, string, or object atom" (I think there's something wrong
    in the string case, not certain tho)
  * PNK_XMLATTR, PNK_XMLCDATA, PNK_XMLCOMMENT look iffy
  * PNK_XMLTEXT maybe

Not impossible, would require some auditing, and/or more of the class-per-ParseNode-kind work I was doing awhile back.  (But then you run into the arity punning, and things get messy, but in theory there's nothing making this impossible...)
Comment 5 Brian Hackett (:bhackett) 2012-05-03 10:37:17 PDT
I thought about digging into the parser stuff when writing the patch, but it's all just so confusing and so many corner cases.  Can't destructuring variables have weird names?  Anyways, besides adding some much needed clarity to the code I don't think it would affect perf much, this patch does use an AtomToId which should be very fast in typical cases (inlined code checks first char for decimal before making a call).
Comment 6 Luke Wagner [:luke] 2012-05-03 10:40:26 PDT
I wasn't very concerned with perf, just asking if there was an easy type-y improvement that could be made, but it sounds like 'no'.
Comment 7 Luke Wagner [:luke] 2012-05-04 11:30:29 PDT
Comment on attachment 620547 [details] [diff] [review]
patch (07a4d4b0260c)

Review of attachment 620547 [details] [diff] [review]:
-----------------------------------------------------------------

Great stuff!

The big remaining wart is this InternNonIntElementId stuff.  E4X-removal should replace all that with standard ValueToId, I think.

::: js/src/frontend/BytecodeEmitter.cpp
@@ +767,5 @@
>          if (!(stmt->flags & SIF_SCOPE))
>              continue;
>  
>          StaticBlockObject &blockObj = *stmt->blockObj;
> +        const Shape *shape = blockObj.nativeLookup(tc->parser->context, AtomToId(atom));

I think you can NAME_TO_JSID(atom->asPropertyName()) here

::: js/src/frontend/Parser.cpp
@@ +2006,5 @@
>       * Define the let binding's property before storing pn in the the binding's
>       * slot indexed by blockCount off the class-reserved slot base.
>       */
>      bool redeclared;
> +    jsid id = AtomToId(atom);

NameToId(atom->asPropertyName()) or perhaps make BindLet take a PropertyName.

::: js/src/jsapi.cpp
@@ +1741,1 @@
>  StdNameToAtom(JSContext *cx, JSStdName *stdn)

StdNameToPropertyName?

@@ +2039,5 @@
>      return rida;
>  }
>  
>  static JSIdArray *
> +AddAtomToArray(JSContext *cx, PropertyName *name, JSIdArray *ida, int *ip)

AddNameToArray?

::: js/src/jsatom.h
@@ +71,2 @@
>  static JS_ALWAYS_INLINE jsid
> +NON_INTEGER_ATOM_TO_JSID(JSAtom *atom)

Could you explain a little bit more in the comment the unfortunate situation with jsid and PropertyName.  Something like:

Note: prefer NameToId or AtomToId over this function:

A PropertyName is an atom that does not contain an integer in the range [0, UINT32_MAX]. However, jsid can only hold an integer in the range [0, JSID_INT_MAX] (where JSID_INT_MAX == 2^31-1).  Thus, for the range of integers (JSID_INT_MAX, UINT32_MAX], to represent as a jsid 'id', it must be the case JSID_IS_ATOM(id) and !JSID_TO_ATOM(id)->isPropertyName().  In most cases when creating a jsid, code does not have to care about this corner case because:
 - when given an arbitrary JSAtom*, AtomToId must be used, which handles this case
 - when given a PropertyName*, NameToId can be used which which does not need to do any dynamic checks
Thus, it is only the rare third case which needs this function:
 - given an JSAtom* which is known not to contain an int in the range [0, JSID_INT_MAX)

::: js/src/jsdbgapi.cpp
@@ -320,5 @@
>  JS_ClearWatchPoint(JSContext *cx, JSObject *obj, jsid id,
>                     JSWatchPointHandler *handlerp, JSObject **closurep)
>  {
>      assertSameCompartment(cx, obj, id);
>  

Should you do a ValueToId(IdToValue(id)) normalization here too?

::: js/src/jsfriendapi.cpp
@@ +213,5 @@
>              return false;
>  
>          RootedVarFunction fun(cx);
>          fun = js_DefineFunction(cx, objRoot,
> +                                AtomToId(atom),

can you put AtomToId(atom) on the same line as objRoot?

::: js/src/jsgcmark.cpp
@@ +270,5 @@
>  static inline void
>  MarkIdInternal(JSTracer *trc, jsid *id)
>  {
>      JS_SET_TRACING_LOCATION(trc, (void *)id);
>      if (JSID_IS_STRING(*id)) {

JSID_IS_ATOM here?

::: js/src/vm/ScopeObject.cpp
@@ +956,5 @@
>                  return false;
>  
>              /* The empty string indicates an int id. */
>              jsid id = atom != cx->runtime->emptyString
> +                      ? AtomToId(atom)

NameToId(atom->asPropertyName())  (again, not b/c I think the perf is so important, but b/c it makes harder assertions and catches bugs earlier).

::: js/src/vm/String.h
@@ +839,5 @@
>      return static_cast<js::PropertyName *>(this);
>  }
>  
> +static JS_ALWAYS_INLINE jsid
> +NAME_TO_JSID(js::PropertyName *name)

Since you are touching this anyway, can we move more to NON_MACRO_NAMES by naming this js::NameToId?  (I'd leave NON_INT_ATOM_TO_JSID ugly though :)
Comment 8 Brian Hackett (:bhackett) 2012-05-06 13:45:57 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/86d28b6fa4fc
Comment 9 Tom Schuster [:evilpie] 2012-05-06 14:11:22 PDT
\o/ I like this, I also have a patch for fix some of the problems in the parser, just need to find it! ;)
Comment 10 :Ehsan Akhgari (busy, don't ask for review please) 2012-05-07 16:08:19 PDT
https://hg.mozilla.org/mozilla-central/rev/86d28b6fa4fc

Note You need to log in before you can comment on or make changes to this bug.