Closed Bug 568142 Opened 14 years ago Closed 12 years ago

Expand source notes to carry column information

Categories

(Core :: JavaScript Engine, defect, P1)

Other Branch
defect

Tracking

()

RESOLVED FIXED
mozilla17

People

(Reporter: jimb, Assigned: u443197)

References

(Blocks 5 open bugs)

Details

Attachments

(2 files, 6 obsolete files)

Extend the source notes attached to JSScripts to carry both line and column information. This allows debugging of poorly-formatted code such as that produced by script compressors or obfuscators. The bytecode compiler already tracks column numbers; they're simply not recorded in the source notes.

Note that this need not imply any increase in the size of notes for normally formatted source code: the granularity of the features distinguished by the source annotations (that is, statements) need not change. Only if there were multiple statements or functions on the same line would column numbers be needed to distinguish them.
Feel free to revamp srcnotes as much as makes sense -- they're 15 years old and that means ripe for redesign and reimplementation.

/be
If the goal to debug the results of script compression it would be more useful to teach the debugger to use pretty-printed results of decompilation rather than the original sources. That is, it would be nice if a debugger would use the results or compressed_function.toSource() when debugging rather than the original source. For that to be practical the decompiler must be able to generate the source_line->bytecode mapping for the pretty-printed code, but that does not look too difficult.
(In reply to comment #2)
> If the goal to debug the results of script compression it would be more useful
> to teach the debugger to use pretty-printed results of decompilation rather
> than the original sources.

Specifying locations in terms of the pretty-printed code is a valuable thing to do (I believe Venkman can do this, but Firebug cannot), but column numbers are also a basic expansion of the expressiveness of the debugging API. Consider:

How can we set a breakpoint within g?

    function f() { return 1; } function g() { return 2; }

How can we set a breakpoint after the return of f() but before the assignment to y?

    x=1; f(); y = 2;

Without column information, there's no practical way to set these. Sure, you can provide an arbitrary bytecode offset to JS_SetTrap, but how are you going to find the right PC?
Java has a specification for how one should represent the correspondence between a source program and a translated program:
http://jcp.org/aboutJava/communityprocess/final/jsr045/index.html

That accomodates multiple levels of translation (imagine I implement a new language foo by translation to coffeescript: foo->coffeescript->javascript). It seems to me that the right architecture could support both that kind of stuff and pretty-printed/original source switching as well.
Depends on: 588061
Assignee: general → brendan
OS: Linux → All
Priority: -- → P1
Hardware: x86 → All
Target Milestone: --- → mozilla9
Attached patch working patch, needs tests (obsolete) — Splinter Review
Attachment #558661 - Flags: feedback?(jorendorff)
This will use more memory -- I need to measure how much more.

I have a more elaborate patch queue to compress js::TokenPtr to 32 bits, with added overhead in the form of a js::Vector<uint32> in the token stream for a compiler session. That is temporary savings, since TokenPtr members of TokenPos live in JSParseNode, so it won't counter-balance script-lifetime increases in source note lengths.

More tomorrow. Feedback and early testing (fitzgen) welcome!

/be
Comment on attachment 558661 [details] [diff] [review]
working patch, needs tests

Looks fine to me. It'll cost some memory.

A possible solution to that is to emit more sparingly, roughly once per statement, loop-condition, or operand of && || , ?: rather than every bytecode. That will require more code to implement and is orthogonal to this, so perhaps a separate bug.
Attachment #558661 - Flags: feedback?(jorendorff) → feedback+
(In reply to Jason Orendorff [:jorendorff] from comment #7)
> A possible solution to that is to emit more sparingly, roughly once per
> statement, loop-condition, or operand of && || , ?: rather than every
> bytecode.

Not every bytecode -- useful expressions generate at least one byte of bytecode, and this patch annotates only the first op for a given expression, at most once. That's what the last/currentColumn latching is all about.

> That will require more code to implement and is orthogonal to
> this, so perhaps a separate bug.

I'll instrument a bit (not for inclusion in the patch; njn convinced me private printfs and Unix-philosophy post-processing are the way to go). I am about to attach a more correct and complete patch and ask for review, so you don't have to wait for instrumentation results.

/be
Attached patch proposed fix, v1 (obsolete) — Splinter Review
Attachment #558661 - Attachment is obsolete: true
Attachment #558916 - Flags: review?(jorendorff)
This is on top of the main patch. It's just good for better column blame using fewer notes (no spurious negative colspans). There may be more assertions to add to jsreflect.cpp, but this is a big whack and represents a lot of edit/compile/debug cycling today. I'd like to qfold and land the two patches here, once they get review. Looking for feedback first.

/be
Attachment #559014 - Flags: feedback?(cdleary)
Thanks to dherman and whoever else helped for the awesome reflect-parse.js test!

/be
Attachment #559014 - Flags: feedback?(cdleary) → feedback+
Comment on attachment 559014 [details] [diff] [review]
first cut at fixing pn_pos so parent's encloses its kids'

Probably feedback+ promotes to review+ without much effort. The more important aspect is the debug-build test coverage -- spotting gaps there based on a read of tests/js1_8_5/extensions/reflect-parse.js would be killer.

/be
Attachment #559014 - Flags: review?(cdleary)
Attachment #559014 - Flags: review?(cdleary) → review+
Comment on attachment 558916 [details] [diff] [review]
proposed fix, v1

r=me with a heap of comments. Sorry for the slow review--this is bitrotted now.

In jsemit.cpp:
>+static bool
>+UpdateSourceCoordNotes(JSContext *cx, JSCodeGenerator *cg, const TokenPtr &coord)
> {
[...]
>+
>+    CG_CURRENT_COLUMN(cg) = coord.index;
>+    return true;
> }

I mentioned emitting column numbers less often, roughly per-statement,
to save memory on source notes.

I think you replied that the patch did that, but it looks like
UpdateSourceCoordNotes is called unconditionally at the top of EmitTree,
so CG_CURRENT_COLUMN would be updated at every subexpression, and source
notes would be emitted at the next call to EmitCheck (every bytecode
instruction). Did I misunderstand?

I'm fine either way as long as the memory usage is acceptable.

In jsemit.h, in enum JSSrcNoteType:
>-    SRC_EXTENDED    = 21,       /* extended source note, 32-159, in next byte */
>+    SRC_COLSPAN     = 21,       /* extended source note, 32-159, in next byte */

The comment is stale.

In jsexn.cpp, exn_resolve:
>         atom = cx->runtime->atomState.lineNumberAtom;
>         if (str == atom) {
>             prop = js_lineNumber_str;
>             v = INT_TO_JSVAL(priv->lineno);
>             attrs = JSPROP_ENUMERATE;
>             goto define;
>         }
> 
>+        atom = cx->runtime->atomState.columnIndexAtom;
>+        if (str == atom) {
>+            prop = js_columnIndex_str;
>+            v = INT_TO_JSVAL(priv->column);
>+            attrs = JSPROP_ENUMERATE;
>+            goto define;
>+        }

Are these two INT_TO_JSVAL calls safe?

I know column numbers are meant to range from 0 to 2^31-1, but Exception
does not enforce this. The same goes for line numbers. In both cases, the
value stored in the error report struct would be unsigned, but the property
value as defined by exn_resolve would be negative ("implementation-defined",
actually, per C++).

Maybe we just need a UINT32_TO_JSVAL that makes a double jsval if
needed.

In jsexn.cpp, Exception:
>+        if (argc > 3) {
>+            if (!ValueToECMAUint32(cx, argv[3], &column))
>+                return JS_FALSE;
>+        }

Should we really change the behavior of the Error constructors? ES5 does
ask implementors not to do this sort of extension. OTOH we already take
more arguments than ES5 15.11.1.1 specifies, so we are sort of already
in a state of sin.

I think we shouldn't do this. Scripts can just assign to .columnIndex if
they need to (and that will have the advantage of working in older
versions and other browsers).

If we do take this part of the patch, we need a test for toSource in
this case. Also I think you have to manually clamp the column number to
[0,2^31) here; see the comment about exn_resolve above.

In shell/js.cpp, SrcNotes:
>           case SRC_SWITCH: {
>-            JSOp op = js_GetOpcode(cx, script, script->code + offset);
>-            if (op == JSOP_GOTO || op == JSOP_GOTOX)
>-                break;

Was this code obsolete already?

In js/src/tests/ecma/extensions/errorcolumnblame.js:
>+/* Cover negative colspan case using for(;;) loop with error in update part. */
>+try {
>+    eval("function baz() { for (var i = 0; i < 10; i += foo()); assertEq(i !== i, true); }");
>+    baz();
>+} catch (e) {
>+    assertEq(e.columnIndex, 25);
>+}

I don't understand how this tests what it claims to be testing. The
error is thrown from foo, so it seems it would examine foo's source
notes, not baz's.
Attachment #558916 - Flags: review?(jorendorff) → review+
Will rebase to m-c and address comments, some time this week. Thanks -- I may need a re-review with enough change, but I hope to cut down on size of patch in response to the comments, so it could be the r+ carries over.

/be
Status: NEW → ASSIGNED
ping

fitzgen will be back in a few weeks and it would be nice to get column info in, in the event that Nick is going to be working on finishing up sourcemaps. (I don't know if that's part of the plan for Nick, but it wouldn't surprise me.)

Also, does this bug fix bug 481993 or does bug 481993 depend on this one?
(In reply to Kevin Dangoor from comment #15)
> ping
> 
> fitzgen will be back in a few weeks and it would be nice to get column info
> in, in the event that Nick is going to be working on finishing up
> sourcemaps. (I don't know if that's part of the plan for Nick, but it
> wouldn't surprise me.)

I'll see what I can do.

> Also, does this bug fix bug 481993 or does bug 481993 depend on this one?

Not sure. This bug is fairly cold, so it may require reinvestigation.
I have had < 0 time but I did start looking at unbitrotting the patch this week. Someone should look at the problem with fresh eyes, for sure. However, if I get something based on the rotted patch that passes try servers, I'm inclined to land.

/be
Unrotting, testing...

/be
Version: Trunk → Other Branch
Brendan, I'm close to finishing a patch in bug 747831 that converts TokenPtr to a uint32_t buffer index, and adds a table that maps buffer indices to line and column numbers.  I don't know how much it overlaps with this.
(In reply to Nicholas Nethercote [:njn] from comment #19)
> Brendan, I'm close to finishing a patch in bug 747831 that converts TokenPtr
> to a uint32_t buffer index, and adds a table that maps buffer indices to
> line and column numbers.  I don't know how much it overlaps with this.

I think it overlaps little to none, having started such a patch myself last year. Glad you are doing it. Will take a look in a bit at the patch for bug 747831.

/be
Blocks: 757188
Poke
Blocks: 762556
Blocks: 761253
Attached patch new fix (obsolete) — Splinter Review
Took Brendan's patch and rebased it on top of the current mozilla-inbound repo. I tried to take Jason's comments into account as well. Probably didn't catch all the spots that the column information should be updated.

I ran v8's check-crypto and got the amount of source notes generated:

  without this patch        - 2.4k
  with this patch (exactly) - 2.5k
  update column at EmitTree - 3.5k

Currently a jit-test is failing (basic/testBug756919.js), but the test says it's supposed to run out of memory, so not sure what's going on there.
Assignee: brendan → acrichton
Attachment #637330 - Flags: feedback?(jorendorff)
Attached patch new fix - column numbers (obsolete) — Splinter Review
Added more cases to the test, and updated column information in a few more spots.
Attachment #637330 - Attachment is obsolete: true
Attachment #637330 - Flags: feedback?(jorendorff)
Attachment #637735 - Flags: review?
Attachment #637736 - Flags: review?
Comment on attachment 637736 [details] [diff] [review]
new fix - enclosing pn_pos better

Rebased Brendan's second patch on current central, got it passing all tests again. Required some extra touch up when parsing for loops, but nothing major (from what I could tell)
Attachment #637736 - Flags: review? → review?(jorendorff)
Attachment #637735 - Flags: review? → review?(jorendorff)
Alex, thanks a ton for taking this over. I am not suitable reviewer but it looks good to me. Hope Jason can get to it soon.

/be
Comment on attachment 637736 [details] [diff] [review]
new fix - enclosing pn_pos better

Review of attachment 637736 [details] [diff] [review]:
-----------------------------------------------------------------

This seems independent from the other one.

r=me with a few comments.

::: js/src/frontend/ParseNode.h
@@ +920,5 @@
>      }
>  
>      void append(ParseNode *pn) {
>          JS_ASSERT(pn_arity == PN_LIST);
> +        JS_ASSERT(pn->pn_pos.begin >= pn_pos.begin);

Can you also JS_ASSERT(pn->pn_pos.end <= pn_pos.begin)? Why not?

::: js/src/frontend/Parser.cpp
@@ +3352,5 @@
>      forHead->pn_kid3 = pn3;
> +    if (pn1 || pn2 || pn3) {
> +        forHead->pn_pos.begin = (pn1 ? pn1 : pn2 ? pn2 : pn3)->pn_pos.begin;
> +        forHead->pn_pos.end = (pn3 ? pn3 : pn2 ? pn2 : pn1)->pn_pos.end;
> +    }

Instead, use the position of the '(' and ')' tokens.

I don't think we ever use the pn_pos field of this ParseNode, even for Reflect.parse, but we might as well be consistent.

::: js/src/jsreflect.cpp
@@ +2262,3 @@
>          Value label, stmt;
>  
>          return identifier(pn->pn_atom, NULL, &label) &&

Kind of funny that Reflect.parse("a: f();") doesn't produce location information for the label. Should it?
Attachment #637736 - Flags: review?(jorendorff) → review+
Comment on attachment 637735 [details] [diff] [review]
new fix - column numbers

Review of attachment 637735 [details] [diff] [review]:
-----------------------------------------------------------------

It seems like the code that emits SRC_{NEWLINE,SETLINE} should be on speaking terms with the code that emits SRC_COLSPAN. Really we should never emit one without at least considering emitting the other too.

Maybe I'm completely misreading the patch, but it seems like for e.g. expressions that cross multiple lines, we emit NEWLINE without COLSPAN.

Am I missing something? Are there cases where this can cause line numbers to get out of sync with column numbers? Can you change it to be more obviously correct?

Otherwise everything looks about right. Just a few comments. When you post a new patch, r?me and I'll review it right away.

::: js/src/frontend/Parser.cpp
@@ +3505,5 @@
>              MUST_MATCH_TOKEN(TOK_RC, JSMSG_CURLY_AFTER_CATCH);
>              PopStatement(tc);
>  
> +            pnblock->pn_pos.end = pn2->pn_pos.end = tokenStream.currentToken().pos.end;
> +

It looks like the changes to ParseNode.h and Parser.cpp belong in the other patch.

::: js/src/jsatom.tbl
@@ +42,5 @@
>  DEFINE_ATOM(call, "call")
>  DEFINE_ATOM(callee, "callee")
>  DEFINE_ATOM(caller, "caller")
>  DEFINE_ATOM(classPrototype, "prototype")
> +DEFINE_ATOM(columnIndex, "columnIndex")

Why not columnNumber? Because it's zero-based?

::: js/src/jsexn.cpp
@@ +734,5 @@
> +    if (!obj->getProperty(cx, cx->runtime->atomState.columnIndexAtom, &columnVal) ||
> +        !ToUint32(cx, columnVal, &column))
> +    {
> +        return false;
> +    }

If the column isn't used for anything, don't bother adding this.

::: js/src/jsscript.h
@@ +1008,5 @@
>  
>  namespace js {
>  
>  extern unsigned
> +PCToLineNumber(JSScript *script, jsbytecode *pc, unsigned *columnp = NULL);

I'd be all for a real API change here, like

  extern void
  PCToSourcePosition(JSScript *script, jsbytecode *pc, js::SourcePosition *pos);

But you'd have to fix up the callers. It's up to you.
Attachment #637735 - Flags: review?(jorendorff)
(In reply to Jason Orendorff [:jorendorff] from comment #27)

> Can you also JS_ASSERT(pn->pn_pos.end <= pn_pos.begin)? Why not?

If the statement being parsed is something like 'var a = b', then there'll be a PNK_VAR enclosing everything, and this is determined once the 'var' is seen, but it can only ben known that the original starting position is less than all the child starting positions. The end can't be known ahead of time, so it looks like if there was a JS_ASSERT that before every append there'd be an update to pn_pos.end anyway.

> Kind of funny that Reflect.parse("a: f();") doesn't produce location
> information for the label. Should it?

Without changing the representation of a label, I don't think so. Currently the ParseNode is a PNK_COLON, with a pn_atom and pn_expr. The pn_pos of the node has to enclose the entire node, so without changing to something like a binary node, there won't be any position information about just the pn_atom. I suppose that it could probably be inferred by the start + length of the atom. Should I change this to a binary node so we do get position information for the label?

(In reply to Jason Orendorff [:jorendorff] from comment #28)

> It seems like the code that emits SRC_{NEWLINE,SETLINE} should be on
> speaking terms with the code that emits SRC_COLSPAN. Really we should never
> emit one without at least considering emitting the other too.
> 
> Maybe I'm completely misreading the patch, but it seems like for e.g.
> expressions that cross multiple lines, we emit NEWLINE without COLSPAN.
> 
> Am I missing something? Are there cases where this can cause line numbers to
> get out of sync with column numbers? Can you change it to be more obviously
> correct?
> 
> Otherwise everything looks about right. Just a few comments. When you post a
> new patch, r?me and I'll review it right away.

I'm not sure I quite understand. Do you have a specific case in mind which wouldn't work as it is now? I tried this:
  try {                            
      for (var i = 0;              
    i < 10;                        
     i +=                          
   a                               
    .                              
     b);                           
  } catch (e) {                    
      assertEq(e.columnNumber, 1); 
  }                                

and it passes. Currently SRC_*LINE implies the column goes to 0, so why would an additional SRC_COLSPAN need to be emitted whenever the line notes are? If we emit a SRC_COLSPAN, it doesn't make sense to me to emit either of the SRC_*LINE notes because we're still on the same line. If we emit a SRC_*LINE note, then why would we want to emit a SRC_COLSPAN? We will as soon as a relevant expression is found, but I thought we wouldn't know what to emit beforehand?

> Why not columnNumber? Because it's zero-based?

Seeing how there's a lineNumber property, I don't mind renaming to columnNumber.

> I'd be all for a real API change here, like
> 
>   extern void
>   PCToSourcePosition(JSScript *script, jsbytecode *pc, js::SourcePosition
> *pos);
> 
> But you'd have to fix up the callers. It's up to you.

I looked around at the existing use cases of *PCToLineNumber and while definitely all used for debugging, it looked like replacing the functions might not be the best idea. A number of use cases were just having some debug output, but this would force to have an extra few lines to declare the SourcePosition and fill it out. From what I saw, it seems like replacing wouldn't be the best idea.
Updated to be cleaner, also realized I needed to update jit_test.py a bit to catch oom because the output now has the column number.
Attachment #558916 - Attachment is obsolete: true
Attachment #559014 - Attachment is obsolete: true
Attachment #637735 - Attachment is obsolete: true
Attachment #650226 - Flags: review?(jorendorff)
Comment on attachment 650226 [details] [diff] [review]
Part 1: Add column number notes and columns to error reports

Review of attachment 650226 [details] [diff] [review]:
-----------------------------------------------------------------

We discussed this on IRC and I also reviewed some separate fixes you posted in pastebin. I'll wait and review a final patch here.

::: js/src/shell/js.cpp
@@ +1579,5 @@
>          }
>          Sprint(sp, "%3u: %4u %5u [%4u] %-8s", unsigned(sn - notes), lineno, offset, delta, name);
>          switch (type) {
> +          case SRC_COLSPAN:
> +            Sprint(sp, "%d", js_GetSrcNoteOffset(sn, 0));

Subtract SN_COLSPAN_DOMAIN here if the offset is supposed to be negative?

::: js/src/tests/ecma/extensions/errorcolumnblame.js
@@ +3,5 @@
> + * http://creativecommons.org/licenses/publicdomain/
> + */
> +
> +var BUGNUMBER = 568142;
> +var summary = 'error reporting blames column as well as line';

Please also test a few other kinds of errors, besides ReferenceError. Stuff like calling a non-function, trying to get a property off of null, or calling something like
  Object.defineProperty();
which will throw because it requires at least one argument.

I think it's better to emit location information notes just before the code for each top-level expression, and maybe each function/method call; rather than before each opcode that we think might throw. Because they can all throw.
Attachment #650226 - Flags: review?(jorendorff)
Fixes and has tests for a few bugs jorendorff found with copying error reports and reporting for particular statements.

The errorcolumnblame.js test now expects columns to point at statement starts instead of the actual thing in question.

Haven't managed to get reliable column note measurements yet.
Attachment #650226 - Attachment is obsolete: true
Attachment #651068 - Flags: review?(jorendorff)
Blocks: 745678
Comment on attachment 651068 [details] [diff] [review]
Part 1: Add column number notes and columns to error reports (v2)

Review of attachment 651068 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. We discussed always calling UpdateSourceCoordNotes in the few places where you added UpdateColumnNumberNotes calls. That sounds better to me.

r=me with that change; it shouldn't affect the behavior except in any of these cases but it's more clearly sensible.

::: js/src/frontend/BytecodeEmitter.cpp
@@ +387,5 @@
>      return EmitJump(cx, bce, op, delta);
>  }
>  
>  /* A macro for inlining at the top of EmitTree (whence it came). */
>  #define UPDATE_LINE_NUMBER_NOTES(cx, bce, line)                               \

I probably mentioned this before. Feel free to rewrite this sucker as a static inline function.

@@ +5136,1 @@
>      /* Push a return value */

Style nit: Add a blank line before this comment not at the beginning of a block.
Attachment #651068 - Flags: review?(jorendorff) → review+
https://hg.mozilla.org/mozilla-central/rev/753d5e8c8064
https://hg.mozilla.org/mozilla-central/rev/a8601aeb1d1d
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Please make sure the target milestone is correct (if set) when landing on inbound, since many groups use it as a source-of-truth (AMO/MDN/release-drivers etc).

If the field is blank, the merge tool will populate it with the current trunk version - just make sure that if it isn't blank, that it is correct.
Target Milestone: mozilla9 → mozilla17
Depends on: 785175
Blocks: 723020
You need to log in before you can comment on or make changes to this bug.