Closed Bug 433529 Opened 16 years ago Closed 12 years ago

Statically name anonymous JavaScript functions for the debugger and Error.stack

Categories

(Core :: JavaScript Engine, enhancement)

Other Branch
enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla17

People

(Reporter: jorendorff, Assigned: u443197)

References

(Blocks 1 open bug)

Details

Attachments

(11 files, 7 obsolete files)

4.79 KB, text/plain
Details
30.67 KB, patch
jimb
: review+
Details | Diff | Splinter Review
1.84 KB, patch
jimb
: review+
Details | Diff | Splinter Review
3.01 KB, patch
jimb
: review+
Details | Diff | Splinter Review
5.17 KB, patch
u443197
: review+
Details | Diff | Splinter Review
1.85 KB, patch
jimb
: review+
Details | Diff | Splinter Review
3.10 KB, patch
jimb
: review+
Details | Diff | Splinter Review
14.58 KB, patch
jimb
: review+
Details | Diff | Splinter Review
3.85 KB, patch
Details | Diff | Splinter Review
583 bytes, patch
Details | Diff | Splinter Review
2.05 KB, patch
peterv
: review+
Details | Diff | Splinter Review
This bug refers to two specific cases:

  <expr>.xyz = function() ...;

  ({xyz: function() ...})

In both cases, the function's name should be "xyz".

(There are several common cases that this doesn't help: setTimeout, forEach, etc.  Separate bug.)
To clarify: this change would be for diagnostic purposes only.
Severity: normal → enhancement
Summary: Improve name attribute for some anonymous functions → Improve .name property for some anonymous functions
But it would affect scripts that look at .name, right?
(In reply to comment #2)
> But it would affect scripts that look at .name, right?

I guess the idea is not to touch the name property of a function but rather improve the diagnostic messages that includes the function name.
No, I really am that dumb.  :)  I don't suppose we can change exc.stack either.

An alternative would be to add new APIs, perhaps fn.displayName, exc.displayStack, and JS_GetFunctionDisplayName.  And then fix up the various tools to use these.  That feels a lot less compelling to me though.  This is probably INVALID.
Blocks: js::dbg2
Perhaps changing .name isn't acceptable, but I definitely think something like this should be done. It's essential to producing a usable backtrace.
"Naming Anonymous JavaScript Functions" is the name of a paper about this (co-authored by jjb):

http://code.google.com/p/querypoint-debugging/downloads/detail?name=NamingJSFunctions.pdf

It's practically a spec. A lot of the necessary work can be done in the compiler; this bug is for that.

Changing the summary to be vague on the point of whether there should be an actual .displayName property on functions. (Probably not -- see the wiki.ecmascript.org link for some discussion.)
Summary: Improve .name property for some anonymous functions → Statically name anonymous JavaScript functions for the debugger and Error.stack
There are some cases where the compiler can't statically choose an ideal function name:

    obj[prop] = function () { ... };   // what do you call this?

If this turns out to be a pain in practice, we can file follow-up bugs to

  - assign good names to *stack frames* (rather than functions) in the debugger
    by sniffing around on the stack, maybe even examining the caller's bytecode
    to find the name the caller used to call it

  - impose a bit of runtime overhead on lambda creation in the few unusual
    cases like obj[prop]=, and set the function's displayName to the value
    of prop, rather than "prop".

etc. But let's tackle the easy part first.
Attached patch patch (obsolete) — Splinter Review
I implemented chunks of the algorithm lined out in the paper that jorendorff posted, and got some good names for anonymous functions. The paper did some stuff with functions as arguments to other things, but I thought those names were kind of obscure.

The names that I've implemented are:

  a.b - either "var a.b = ..." or "var a = {b: ...}"
  a   - any of "var a = ...", "function a ...", or
        "var a = function() { return function(){}; }()" where the inner function
        is named 'a'
  a/b - inner 'b' of "var a = function() { var b = function() {}; }"
  a<  - flags a "contributor" or basically some helper function which contributes
        to the function named 'a' by being anonymous inside it. This is kinda
        odd and is one I wouldn't mind dropping or enhancing.

Some function still go unnamed, but I'm seeing that number of functions as being drastically reduced and I looked at a few and unless there's weird names like "foo/bar/</a", then they seemed genuinely anonymous and weren't that relevant to the rest of the program.

I added a field pn_parent to ParseNode, and I ran this on a 10k line yui.js file with a 2% increase in memory (~7.75MB => ~7.95MB).

I've been working on this to get some better profiling information, and I'll post a few profiles I got after this one.
Assignee: general → acrichton
Attached file profiles
Profiles from a few websites with named functions instead of everything being called "<anonymous>"
Oh one other relevant fact is where I store the "pretty name." I didn't want to break any current tests or anything related to decompilation or the "name" property, but I still store the generated atom in the "atom" field of the JSFunction. In doing so, a flag is also set saying that the name was generated instead of given. The flag is then tested in the relevant locations to return an empty string instead of the generated atom.

I would have added a separate field to the JSFunction, but some static assertions about GC got angry at me, and otherwise it looks like I'd have to increase the size of the JSFunction unnecessarily. I thought this way tools which already read 'atom' would benefit, while those which cared about the semantic correctness would check explicitly to not return fancy names.
Attachment #644402 - Flags: review?(ejpbruel)
Comment on attachment 644402 [details] [diff] [review]
patch

Nick, could you review this one? Much of it is in the parser, and it adds a few field to parse nodes.
Attachment #644402 - Flags: review?(ejpbruel) → review?(n.nethercote)
(In reply to Alex Crichton [:acrichto] from comment #11)
> Oh one other relevant fact is where I store the "pretty name." I didn't want
> to break any current tests or anything related to decompilation or the
> "name" property, but I still store the generated atom in the "atom" field of
> the JSFunction. In doing so, a flag is also set saying that the name was
> generated instead of given. The flag is then tested in the relevant
> locations to return an empty string instead of the generated atom.
> 
> I would have added a separate field to the JSFunction, but some static
> assertions about GC got angry at me, and otherwise it looks like I'd have to
> increase the size of the JSFunction unnecessarily. I thought this way tools
> which already read 'atom' would benefit, while those which cared about the
> semantic correctness would check explicitly to not return fancy names.

Luke and I talked about this a bit. Right now, we assume that if JSFunction::atom is non-null, that means it's the function's name. For example:

    bool isNamedLambda()     const { return (flags & JSFUN_LAMBDA) && atom; }

Let's assume that the user usually prefers to see the names given in the source code over the ones we'd generate. So we only need to store generated names when atom would be NULL. This means we could have a bit in JSFunction::flags indicating whether the function is named. Then:

- Code which at present tests whether 'atom' is set could test the 'flags' bit instead.

- Code which wants a real identifier or nothing (for example, CallObjectLambdaName, which is going to stick the name on a scope object) could use an accessor member function that checks the 'flags' bit, returns NULL if it is clear, and 'atom' if it is set.

- Code which wants to get a display name, either as written in the source or as generated by us (for example, GetFunctionNameBytes, which is only used for producing error messages) would use a different accessor member function which returns 'atom' without checking the bit.
"when atom would, at present, be null"
(In reply to Alex Crichton [:acrichto] from comment #11)
> Oh one other relevant fact is where I store the "pretty name." I didn't want
> to break any current tests or anything related to decompilation or the
> "name" property, but I still store the generated atom in the "atom" field of
> the JSFunction. In doing so, a flag is also set saying that the name was
> generated instead of given. The flag is then tested in the relevant
> locations to return an empty string instead of the generated atom.

... which is exactly what I described. Sorry, I'm staying up too late. *sigh*
Attachment #644402 - Flags: review?(n.nethercote) → review?(jimb)
Adding a new field to ParseNode isn't great.  We easily have many 10s of MBs of them when loading sites, and although they're short-lived, they are usually a big factor in peak memory consumption, which is one of the most important measurements.  Is there any way to avoid adding the new field?
Attached patch patch2 (obsolete) — Splinter Review
Removed the pn_parent field in ParseNode and instead am traversing the AST looking for functions, keeping a history of nodes visited so the parent trail can be recreated.

I have a small test for these naming conventions, but I'm not sure how to expose this information to JS. It was mentioned previously that we shouldn't be using the displayName attribute, but is there perhaps another which is a good candidate? If not, I suppose I could make the test a C++ test and just test it from there as well.
Attachment #644402 - Attachment is obsolete: true
Attachment #644402 - Flags: review?(jimb)
Attachment #650385 - Flags: review?(jimb)
(In reply to Alex Crichton [:acrichto] from comment #18)
> 
> I have a small test for these naming conventions, but I'm not sure how to
> expose this information to JS. It was mentioned previously that we shouldn't
> be using the displayName attribute, but is there perhaps another which is a
> good candidate? If not, I suppose I could make the test a C++ test and just
> test it from there as well.

Adding something to js/src/builtin/TestingFunctions.cpp would be preferable to writing a C++ test. But would accessing it through the Debugger API work well enough?
Attached patch tests (obsolete) — Splinter Review
It looked like using the Debugger would be a pain because I'd have to do everything in separate compartments which would make all these different syntaxes ugly to look at.

The TestFunctions.cpp route was a really good idea though, so I did that and it worked out pretty well.
Comment on attachment 651472 [details] [diff] [review]
tests

Review of attachment 651472 [details] [diff] [review]:
-----------------------------------------------------------------

Wow. I'm not really happy with JJB's weird notation --- I think people are going to say "What the heck?" given "main/foo<" --- but it's vastly better than what we've got now, so let's take this as a basis for iteration.

Just an r+ for the tests; I'll look at the code tomorrow.

::: js/src/builtin/TestingFunctions.cpp
@@ +743,5 @@
> +DisplayName(JSContext *cx, unsigned argc, jsval *vp)
> +{
> +    CallArgs args = CallArgsFromVp(argc, vp);
> +    if (argc == 0 || !args[0].isObject() ||
> +        JS_GetClass(args[0].toObjectOrNull()) != Jsvalify(&FunctionClass))

I think you can simply say args[0].toObject().isFunction() here.

@@ +867,5 @@
>  
> +    JS_FN_HELP("displayName", DisplayName, 1, 0,
> +"displayName(fn)",
> +"  Gets the display name for a function, which can possibly be a guessed or\n"
> +"  inferred named based on where the function was defined. This can be\n"

"inferred name", right?
Attachment #651472 - Flags: review+
Have you measured the effect of this patch on parsing speed? We do have a parsing benchmark lying around somewhere.
All benchmarks came out with "Meh."
Comment on attachment 650385 [details] [diff] [review]
patch2

Review of attachment 650385 [details] [diff] [review]:
-----------------------------------------------------------------

Here are some comments for now; I'm going to have to come back to this later.

As I've said before, we desperately need something like this; calling these things 'anonymous' is just pathetic, and inferring names for functions added to prototypes and appearing in other sorts of structures is a huge win.

In that context, I really don't want to complain about details, but it is quite hard for me to understand exactly how we're producing these names in some of the more obscure cases. The code makes the relationship between the enclosing syntactic forms and the names hard to anticipate. That is, I don't trust myself to be able to distinguish between intended and incidental behavior. See, for example, my question about "skipping relevant context" below. I can't even tell if that's a bug or not.

Other thoughts:

This patch is fine as it is, but for the future, it's helpful to reviewers (well, to me, at least) to separate out large mechanical changes, like s/atom/atom()/ in this patch, into their own patches. Once the reviewer has gone through the mechanical part, they can set it aside, and don't have to worry about small but meaningful changes being lost in a flood of effect-free mechanical changes.

We should have a JSAPI function for getting at JSFunction::displayAtom. Please either add that in this patch, or file a follow-up bug.

You mentioned on IRC that there's some input form that causes NameResolver::resolve to hang; when you've debugged that, make sure that there's a test case in js/src/jit-test or js/src/tests that covers that input form.

::: js/src/frontend/SemanticAnalysis.cpp
@@ +132,5 @@
> +               !n->isKind(PNK_STATEMENTLIST) && !n->isKind(PNK_SEMI);
> +    }
> +
> +    /* Test whether a ParseNode represents a function invocation */
> +    bool call(ParseNode *pn) {

Couldn't this simply be pn->isKind(PNK_LP)?

@@ +224,5 @@
> +                return NULL;
> +            if (!buf.append("/"))
> +                return NULL;
> +            if (!buf.append(fun->displayAtom()))
> +                return NULL;

Possible style suggestion, if you like:

  if (!buf.append(prefix) ||
      !buf.append("/") ||
      !buf.append(fun->displayAtom()))
      return NULL;

@@ +226,5 @@
> +                return NULL;
> +            if (!buf.append(fun->displayAtom()))
> +                return NULL;
> +            return buf.finishAtom();
> +        } else if (pn == NULL || nparents == 0) {

SpiderMonkey style is to eschew 'else' after an 'if' block that ends in a 'return'. So this would just be:

  if (pn == NULL ...)

and there would be no braces necessary around the 'then' statement.

@@ +242,5 @@
> +         * and all nodes are recorded. The assignment is then inserted into the
> +         * StringBuffer instance, and then we traverse back down the tree (using
> +         * the history we built) appending to the StringBuffer.
> +         */
> +        ParseNode *nodes[20]; // might skip lots of nodes, so no using parents

Why not just make this the same size as parents? Then you'd know for a fact that you'd never overrun it, unless you walked off the top of the parents array.

@@ +249,5 @@
> +        ParseNode *node = pn;
> +        ParseNode *parent = parents[pos];
> +
> +        // initially walk up the AST because we can only append to the StringBuffer
> +        for (size = 0; size < 20 && expression(parent); size++) {

Perhaps it's just Friday afternoon, but I can't make head nor tail of this loop. What do 'node', 'parent', and 'nodes' mean, exactly, when it terminates?

@@ +270,5 @@
> +                parent = NULL;           // end of the line
> +            } else if (!parent->isKind(PNK_RETURN)) {
> +                parent = parents[--pos]; // use the normal parent
> +            } else {
> +                // have some fun traversing upwards, looking for a call

Mightn't this loop end up skipping relevant context? Say:

var x = f(function g() { return function() {}; });

if we start at the inner anonymous function, it seems like we'll skip up to the call to f, and find the assignment to x. But that anonymous function shouldn't have 'x' in its name, should it?

@@ +365,5 @@
> +        if (cur == NULL)
> +            return;
> +        if (cur->isKind(PNK_FUNCTION) && cur->pn_funbox) {
> +            prefix = resolveFun(cur->pn_funbox->function(), cur, prefix);
> +            if (nparents > 0 && call(parents[nparents - 1]))

It'd be nice to have a brief comment over this, perhaps:

/* Directly applied functions don't affect the prefix of nested functions. */

@@ +375,5 @@
> +
> +        switch (cur->getArity()) {
> +            case PN_NULLARY:
> +                break;
> +            case PN_NAMESET:

PN_NAMESET is gone now. On IRC you said that the latest rev of the patch has this case removed, but is otherwise the same, so that's fine.

@@ +383,5 @@
> +                switch (cur->getKind()) {
> +                    case PNK_DOT:
> +                    case PNK_NAME:
> +                    case PNK_COLON:
> +                        resolve(cur->pn_expr, prefix);

It seems to me we should be using cur->expr() here, not accessing cur->pn_expr directly ourselves.

::: js/src/jsapi.h
@@ +2523,5 @@
>  #define JSFUN_STUB_GSOPS      0x1000    /* use JS_PropertyStub getter/setter
>                                             instead of defaulting to class gsops
>                                             for property holding function */
> +#define JSFUN_HAS_GUESSED_ATOM  0x8000  /* function had no explicit name, but a
> +                                           name was guessed for it anyway */

Is it really appropriate for this to be part of the public API? Embeddings can't specify display names with JSPropertySpec or anything like that, so I would expect this flag to be defined is jnfun.h, and not visible here.

::: js/src/jsexn.cpp
@@ +280,5 @@
>              JSStackTraceStackElem &frame = frames.back();
>              if (i.isNonEvalFunctionFrame()) {
> +                JSAtom *atom = fp->fun()->atom();
> +                if (atom == NULL)
> +                    atom = cx->runtime->emptyString;

It seems like exception backtraces would be a fine place to use the display name. Does it cause problems?

::: js/src/jsfun.h
@@ +103,5 @@
>          this->flags |= JSFUN_HAS_DEFAULTS;
>      }
>  
> +    JSAtom *atom() { return hasGuessedAtom() ? NULL : atom_; }
> +    void atomInit(JSAtom *atom) { atom_.init(atom); }

Would "initAtom" be better? Dunno.

::: js/src/vm/Debugger.cpp
@@ +3672,4 @@
>          return true;
>      }
>  
> +    JSString *name = obj->toFunction()->displayAtom();

Actually, I think debuggerObject_getName should continue to return atom(), and a new debuggerObject_getDisplayName should return displayAtom().
Attachment #650385 - Attachment is obsolete: true
Attachment #650385 - Flags: review?(jimb)
Attachment #653138 - Flags: review?(jimb)
Carrying forward the r+ from the previous tests patch, just updated name mostly
Attachment #651472 - Attachment is obsolete: true
Attachment #653142 - Flags: review+
(In reply to Alex Crichton [:acrichto] from comment #24)
> All benchmarks came out with "Meh."

Parsemark is flawed;  although the tests themselves are great the driver script is 
terrible at detecting real differences;  for whatever reason the noise always is overwhelming.  I've had more luck using Cachegrind on the Parsemark tests.
Now I'm wondering why this wasn't detected automatically.  Are we not running jit_test.py with --valgrind on Linux?
(In reply to Nicholas Nethercote [:njn] from comment #33)
> Now I'm wondering why this wasn't detected automatically.  Are we not
> running jit_test.py with --valgrind on Linux?

Sorry, that comment was supposed to go on bug 783464.
Attachment #653139 - Flags: review?(jimb) → review+
Comment on attachment 653140 [details] [diff] [review]
Part 3 - Change error stack traces to use displayAtom() instead of atom()

Review of attachment 653140 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great! However, I wonder if we won't find more failures when we run the full browser test suite...
Attachment #653140 - Flags: review?(jimb) → review+
Attachment #653141 - Flags: review?(jimb) → review+
Comment on attachment 653143 [details] [diff] [review]
Part 6 - Add JS_GetFunctionDisplayId as a jsapi function for a JSFunction's displayAtom

When this lands, the JSAPI documentation on the wiki will need to be updated.
Attachment #653143 - Flags: review?(jimb) → review+
Comment on attachment 653141 [details] [diff] [review]
Part 4 - Add DebuggerObject_displayName for a function's displayAtom attribute

I'll take care of documenting this on the Debugger API docs.
Attachment #653142 - Attachment is patch: true
Draft docs for Debugger.Object#displayName:
https://github.com/jimblandy/DebuggerDocs/compare/displayName
Comment on attachment 653138 [details] [diff] [review]
Part 1 - Statically resolve names for anonymous functions

Review of attachment 653138 [details] [diff] [review]:
-----------------------------------------------------------------

Some more comments; still more to read.

::: js/src/frontend/SemanticAnalysis.cpp
@@ +66,5 @@
> +    ParseNode *parents[100]; // history of ParseNodes we've been looking at
> +    StringBuffer *buf;       // when resolving, buffer to append to
> +
> +    /* Test whether a ParseNode is an expression (not a statement) */
> +    bool expression(ParseNode *n) {

I don't think we can recognize expressions by listing non-expressions. As discussed on IRC, let's try recognizing the forms that allow us to assign names: assignments, declarations, object and array literals. We can scan upwards for those, using PNK_FUNCTION or the top of the tree as our backstops.

@@ +124,5 @@
> +
> +    /*
> +     * Resolve the name of a function. If the function already has a name
> +     * listed, then it is skipped. Otherwise an intelligent name is guessed to
> +     * assign do the function's displayAtom field

"assign to"

Also, this comment should explain that if the function already has its own real name, we won't override that, but we compute and return a display name for it anyway, to be used as a prefix for any functions contained within it. (I was confused for a bit: "Why are we bothering to compute this name that we're not going to use?")

@@ +126,5 @@
> +     * Resolve the name of a function. If the function already has a name
> +     * listed, then it is skipped. Otherwise an intelligent name is guessed to
> +     * assign do the function's displayAtom field
> +     */
> +    JSAtom* resolveFun(JSFunction *fun, ParseNode *pn, JSAtom *prefix) {

Nit: JSAtom *resolveFun(...)

@@ +163,5 @@
> +         * we don't have to remember which ones are skipped in the actual
> +         * history later on when we're traversing back down from finding a
> +         * statement.
> +         */
> +        ParseNode *nodes[mozilla::ArrayLength(parents)];

I hear the ArrayLength uses have become a static const; seems good.

@@ +194,5 @@
> +                parent = parents[--pos]; // use the normal parent
> +            } else {
> +                // have some fun traversing upwards, looking for a call
> +                for (int tmp = pos; tmp > 0; tmp--) {
> +                    if (call(parents[tmp])) {

We discussed tightening up the use of 'call' in 'resolve' to specifically match immediate applications; should the same be done for the use here?

@@ +299,5 @@
> +             * of the definition of the function is a call, then it shouldn't
> +             * contribute anything to the namespace, so don't bother updating
> +             * the prefix to whatever was returned.
> +             */
> +            if (nparents == 0 || !call(parents[nparents - 1]))

As discussed in IRC, this should check that cur is indeed the callee of the call.
Attachment #653138 - Attachment is obsolete: true
Attachment #653138 - Flags: review?(jimb)
Attachment #653598 - Flags: review?(jimb)
Comment on attachment 653598 [details] [diff] [review]
Part 1 - Statically resolve names for anonymous functions (v2)

Review of attachment 653598 [details] [diff] [review]:
-----------------------------------------------------------------

Fantastic. We're really close. A few issues, a bunch of minor comments.

::: js/src/frontend/SemanticAnalysis.cpp
@@ +61,5 @@
>  }
>  
> +class NameResolver
> +{
> +    static const unsigned MaxParents = 100;

Should be size_t, I think. Whenever we're representing the length of an array in memory, and we don't need negative values, size_t is the way to go.

@@ +64,5 @@
> +{
> +    static const unsigned MaxParents = 100;
> +
> +    JSContext *cx;
> +    uint32_t nparents;              /* number of parents in the parents array */

This should also be size_t, I think.

@@ +69,5 @@
> +    ParseNode *parents[MaxParents]; /* history of ParseNodes we've been looking at */
> +    StringBuffer *buf;              /* when resolving, buffer to append to */
> +
> +    /* Test whether a ParseNode is an expression (not a statement) */
> +    bool expression(ParseNode *n) {

It looks like this is no longer used (great!), so it should be deleted.

@@ +109,5 @@
> +                       nameExpression(n->pn_right) &&
> +                       buf->append("]");
> +
> +            case PNK_NUMBER: {
> +                char number[10];

If you bump this up to 30, this will cover all the IEEE values under %g, I think.

@@ +132,5 @@
> +     * tree to the function definition point.
> +     *
> +     * This function will walk up the parse tree, gathering relevant nodes used
> +     * for naming, and return the assignment node if there is one. The provided
> +     * array and size will be filled in, and the returned node could be NULL if

Mention "with inner nodes appearing before outer nodes", perhaps? I was surprised for a moment reading resolveFun, that the PNK_COLON case was going to assemble the pieces in the right order.

@@ +135,5 @@
> +     * for naming, and return the assignment node if there is one. The provided
> +     * array and size will be filled in, and the returned node could be NULL if
> +     * no assignment is found.
> +     */
> +    ParseNode *gatherNameable(ParseNode **nameable, unsigned *size) {

'size' should be size_t.

Thanks very much for pulling this out into its own function; I think it makes it a lot clearer.

@@ +144,5 @@
> +            if (cur->isAssignment())
> +                return cur;
> +
> +            switch (cur->getKind()) {
> +                case PNK_NAME:     return cur;  /* found the assignment */

This should be "found the initialized declaration", right? Because we've hit a PNK_NAME with a pn_expr, meaning it must be an initialized declaration in a PNK_VAR or PNK_CONST.

@@ +145,5 @@
> +                return cur;
> +
> +            switch (cur->getKind()) {
> +                case PNK_NAME:     return cur;  /* found the assignment */
> +                case PNK_FUNCTION: return NULL; /* won't find an assignment */

Consider "Won't find an assignment or declaration".

@@ +151,5 @@
> +
> +                /* These nodes are relevant to the naming process, so append */
> +                case PNK_COLON:
> +                case PNK_LP:
> +                case PNK_NEW:

If I understand correctly, the only effect of PNK_LP and PNK_NEW parents is to cause a '<' to appear after the function's inferred name. We don't process those node kinds specially in resolveFun, that I see. Now, technically, I can write garbage like:

x = +function(){}

or 

x = function(){}/4;

They yield NaNs, but they *are* valid JavaScript. And that function should be called "x<". So, would it be more correct to just include all parent nodes, regardless of kind, up to the next assignment/declaration/function?

Now, if I write:

x = c ? function() {} : function () {}

should those be called 'x' or 'x<'? If they should be 'x', then we may want to explicitly not include PNK_CONDITIONAL nodes in 'nameable'.

@@ +153,5 @@
> +                case PNK_COLON:
> +                case PNK_LP:
> +                case PNK_NEW:
> +                    nameable[(*size)++] = cur;
> +                    JS_ASSERT(*size < MaxParents);

Since you're doing this after the increment, it should be <=, for the case where every parent gets included in nameable. (I don't think that can actually happen, since parents will always include stuff we don't care about. But since size manipulations and checks are often security-sensitive, whenever possible they should make sense without detailed reasoning about the context, like what sorts of parse trees are possible.)

@@ +163,5 @@
> +             * sometimes with code like:
> +             *
> +             *    var foo = (function() { return function() {}; })();
> +             *
> +             * The outer function is just a helper to create a scope for the

Since this is a continuation of the sentence begun above, it shouldn't start with a capital.

@@ +170,5 @@
> +             * PNK_RETURN, and if there is a direct function call we skip to
> +             * that.
> +             */
> +            if (cur->isKind(PNK_RETURN)) {
> +                for (int tmp = pos; tmp > 0; tmp--) {

This could be tmp = pos - 1, right?

@@ +191,5 @@
> +     * Resolve the name of a function. If the function already has a name
> +     * listed, then it is skipped. Otherwise an intelligent name is guessed to
> +     * assign to the function's displayAtom field
> +     */
> +    JSAtom *resolveFun(JSFunction *fun, ParseNode *pn, JSAtom *prefix) {

I think it might be a little clearer for this to take only 'pn' as an argument, and then make 'fun' a local variable initialized at the top to cur->pn_funbox->function(). When you pass them as separate arguments, readers have to wonder what the required relationship between them is --- and then see that it's the obvious one.

@@ +192,5 @@
> +     * listed, then it is skipped. Otherwise an intelligent name is guessed to
> +     * assign to the function's displayAtom field
> +     */
> +    JSAtom *resolveFun(JSFunction *fun, ParseNode *pn, JSAtom *prefix) {
> +        if (pn == NULL || nparents == 0)

We don't need the 'pn == NULL' check, do we?

@@ +215,5 @@
> +            return NULL;
> +
> +        /* Gather all nodes relevant to naming */
> +        ParseNode *toName[MaxParents];
> +        unsigned size;

This should be size_t.

@@ +228,5 @@
> +        }
> +
> +        /*
> +         * Other than the actual assignment, other relevant nodes to naming are
> +         * those in hashes and then particular nodes marking a contribution.

Instead of "hashes", we should use the ECMAScript terminology, "object initializer".

@@ +234,5 @@
> +        for (int pos = size - 1; pos >= 0; pos--) {
> +            ParseNode *node = toName[pos];
> +
> +            if (node->isKind(PNK_COLON)) {
> +                if (!node->pn_left->isKind(PNK_NAME))

As a follow-up, we should add support for PNK_STRING here: { "a": <func> }

@@ +246,5 @@
> +                /* if this is truly an anonymous function, don't bother naming it */
> +                if (prefix == NULL && buf.empty())
> +                    return NULL;
> +                /* Don't have consecutive '<' characters */
> +                if (*(buf.end() - 1) != '<' && !buf.append("<"))

If prefix != NULL but buf is empty, won't this fetch off the beginning of buf?

@@ +256,5 @@
> +         * functions which are "genuinely anonymous" but are contained in some
> +         * other namespace are rather considered as "contributing" to the outer
> +         * function, so give them a contribution symbol here.
> +         */
> +        if (!buf.empty() && *(buf.end() - 1) == '/' && !buf.append("<"))

Here and above, you could write buf.end()[-1]!!! Or is that bad taste?

@@ +263,5 @@
> +        return buf.finishAtom();
> +    }
> +
> +    /*
> +     * Tests whether the current node is a function which is directly called.

I think you could just say:

Tests whether parents[pos] is a function call whose callee is cur.

and then drop the whole second paragraph.

@@ +318,5 @@
> +                /*
> +                 * Occasionally pn_left == pn_right for something like
> +                 * destructuring assignment in (function({foo}){}), so skip the
> +                 * duplicate here if this is the case because we want to
> +                 * traverse everything at most once.

Great comment: shows sample text that produces the parse tree being checked for; explains why it needs special care.
Attachment #653598 - Flags: review?(jimb)
Attachment #653598 - Attachment is obsolete: true
I imagine that the diffs of diffs are now out of wack because apparently SemanticAnalysis.cpp went away so I moved everything into a new NameFunctions.cpp file.

(In reply to Jim Blandy :jimb from comment #41)

> @@ +151,5 @@
> > +
> > +                /* These nodes are relevant to the naming process, so append */
> > +                case PNK_COLON:
> > +                case PNK_LP:
> > +                case PNK_NEW:
> 
> If I understand correctly, the only effect of PNK_LP and PNK_NEW parents is
> to cause a '<' to appear after the function's inferred name. We don't
> process those node kinds specially in resolveFun, that I see. Now,
> technically, I can write garbage like:
> 
> x = +function(){}
> 
> or 
> 
> x = function(){}/4;
> 
> They yield NaNs, but they *are* valid JavaScript. And that function should
> be called "x<". So, would it be more correct to just include all parent
> nodes, regardless of kind, up to the next assignment/declaration/function?
> 
> Now, if I write:
> 
> x = c ? function() {} : function () {}
> 
> should those be called 'x' or 'x<'? If they should be 'x', then we may want
> to explicitly not include PNK_CONDITIONAL nodes in 'nameable'.

It's true that those two are marked as the contributions, but the alternative here is kind of tricky. We still have terminations at PNK_NAME, isAssignment(), or PNK_FUNCTION, but if all others contribute to naming then we'll have to weed out a lot of assorted ones to make sure that spurious '<' don't appear. I'm going to try to play around with this and see if it's possible to make this turn out nicely. I ran into the problem again of 'what is a statement?' because if all nodes default to being nameable, knowing what to skip is then a problem.

> Since you're doing this after the increment, it should be <=, 

Thinking for a second, I prefer this being ahead of the store and I believe that's when '<' is more appropriate than '<=' anyway

> As a follow-up, we should add support for PNK_STRING here: { "a": <func> }

Part 7!

> If prefix != NULL but buf is empty, won't this fetch off the beginning of
> buf?

Turns out the NULL check was unnecessary, also see comment below.

> Here and above, you could write buf.end()[-1]!!! Or is that bad taste?

I don't think so? buf.end() points to one beyond the end of the buffer, so the -1 goes back to the actual last character in the buffer. This is what Vector::back() does, and both of the empty checks at the beginning of the if should prevent going out of bounds.
Comment on attachment 654227 [details] [diff] [review]
Part 1 - Statically resolve names for anonymous functions (v3)

Review of attachment 654227 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/NameFunctions.h
@@ +12,5 @@
> +
> +namespace js {
> +namespace frontend {
> +
> +struct Parser;

Do we need this any more?
Attachment #654227 - Flags: review?(jimb) → review+
Comment on attachment 654225 [details] [diff] [review]
Part 7 - Name functions whose field in an object literal is a PNK_STRING node

Review of attachment 654225 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!

::: js/src/frontend/NameFunctions.cpp
@@ +102,5 @@
>                  /* These nodes are relevant to the naming process, so append */
>                  case PNK_COLON:
>                  case PNK_LP:
>                  case PNK_NEW:
> +                case PNK_STRING:

Do we need this? As long as PNK_COLON is listed, it seems like we'll find the PNK_STRING when we look for it. And PNK_STRING nodes are leaf nodes, so you'll never even run into them in the parents array anyway.
Attachment #654225 - Flags: review?(jimb) → review+
Comment on attachment 654227 [details] [diff] [review]
Part 1 - Statically resolve names for anonymous functions (v3)

Review of attachment 654227 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/NameFunctions.cpp
@@ +190,5 @@
> +         */
> +        for (int pos = size - 1; pos >= 0; pos--) {
> +            ParseNode *node = toName[pos];
> +
> +            if (node->isKind(PNK_COLON)) {

I just realized --- not all PNK_COLON nodes are binary. Labeled statements like this:

foo: while (true) { break foo; }

also make PNK_COLON nodes that are PN_NAME, not PN_BINARY.
Here are a few more tests. The labeled statement causes resolveFun to access a PNK_COLON PN_NAME node as if it were a PN_BINARY. The other one causes resolveFun to give a function the wrong name.
I felt like our conversations on IRC were being stalled because it was hard to describe the change I had in mind. Here's something that seems to work; it passes the existing tests, plus the new ones I suggested.

This is kind of ridiculous, because it's really hard to imagine these functions ever escaping into the wild; I had to give Function.prototype a valueOf method. But I think this comes closer to matching the definition of 'contribution': if you're not a direct child of the thing that's giving you your name, then you're a contributor --- which means that any nodes on the path from the namer to the namee are relevant.

Oh --- wait!

var z = [function() {}];

That gets called 'z' when it should be called 'z<'. I'll update my test contribution.
Here's the updated update to the tests.
Attachment #654484 - Attachment is obsolete: true
This patch is r+; let's get it landed. If you like my suggestions, then we can deal with them in a follow-up.

Do you have commit privs?
This seems to break --enable-dtrace builds with:

js/src/jsprobes.cpp:166:15: error: reference to non-static member function must be called; did you mean to call it with no arguments?
    if (!fun->atom)
         ~~~~~^~~~
                  ()
js/src/jsprobes.cpp:166:10: error: member function 'atom' not viable: 'this' argument has type 'const JSFunction', but function is not marked const
    if (!fun->atom)
         ^~~
js/src/jsfun.h:112:13: note: 'atom' declared here
    JSAtom *atom() { return hasGuessedAtom() ? NULL : atom_.get(); }
            ^
js/src/jsprobes.cpp:168:35: error: reference to non-static member function must be called; did you mean to call it with no arguments?
    return bytes->encode(cx, fun->atom) ? bytes->ptr() : Probes::nullName;
                             ~~~~~^~~~
                                      ()
js/src/jsprobes.cpp:168:30: error: member function 'atom' not viable: 'this' argument has type 'const JSFunction', but function is not marked const
    return bytes->encode(cx, fun->atom) ? bytes->ptr() : Probes::nullName;
                             ^~~
js/src/jsfun.h:112:13: note: 'atom' declared here
    JSAtom *atom() { return hasGuessedAtom() ? NULL : atom_.get(); }
            ^
Attached patch Fix --enable-dtrace builds (obsolete) — Splinter Review
Sorry those got through the cracks, I thought I changed all instances of 'fun->atom'
Attachment #654520 - Flags: review?(peterv)
Attachment #654520 - Flags: review?(peterv) → review+
Comment on attachment 654520 [details] [diff] [review]
Fix --enable-dtrace builds

Actually, I think you'll have to make displayAtom() const.
Attachment #654520 - Flags: review+ → review-
Attachment #654520 - Attachment is obsolete: true
Attachment #654531 - Flags: review?(peterv)
Attachment #654531 - Flags: review?(peterv) → review+
Depends on: 785983
Blocks: 838265
(In reply to Jason Orendorff [:jorendorff] from comment #1)
> To clarify: this change would be for diagnostic purposes only.

No more: ES6 proposal, courtesy Brandon Benvie and approved at the last meeting, is to infer a name own-property of each function object, non-configurable non-enumerable but (this changed from Brandon's original proposal) writable, I think as this bug does.

Cc'ing Brandon and Allen.

/be
Hmm, the ES6 proposal is a bit simpler: no dotted paths, etc.

Did the extra parsetree walk show up in any compiler perf tests? cdleary had a big one, but I do not know whether it was automated.

Nit-pick alert -- I made a followup style fix to eliminate the over-indented switch cases:

https://hg.mozilla.org/integration/mozilla-inbound/rev/e1092f311f2d

/be
A writeup for the ES6 proposal is now available at http://wiki.ecmascript.org/doku.php?id=harmony:function_name_property
You need to log in before you can comment on or make changes to this bug.