Closed Bug 663934 Opened 14 years ago Closed 12 years ago

make filenames in @line comments work for runtime exceptions

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: djf, Assigned: djf)

References

Details

Attachments

(3 files)

Attached file my partial patch
The "atline" option (js -o atline or js -e 'options("atline")' -f) enables parsing of comments in the form: //@line 20 "foo.js" And these comments are used to alter the line number and filename of error reports. Apparently on the line number portion of the comments are documented. Specifying a filename in an @line comment works for compilation errors, but not for runtime errors. If you use an @line comment with a filename specified, any runtime errors will be reported as if they occured in the file specified by the last @line comment. I'd like to fix this and make @line comments work for runtime errors. My use case is that I've got thousands of lines of code that need to be inside of a single closure function. But I want to break those lines of code down into separate files for ease of management. I use a Makefile to automatically concatenate the files and add the function wrapper that makes them part of the same closure. But then for debugging, all errors get reported against the one big file rather than against the individual files. Not a hugely compelling use-case, but it seemed like something fun for me to tackle as my first foray into SpiderMonkey internals. Currently jsscan.cpp does not retain the contents of the @line comments. That is pretty easy to fix. One approach I tried was to define an AtLine struct to hold the data from each @line comment, and then create a JSCList of AtLines and store it in the TokenStream. My problem with that approach is getting the data out of the TokenStream when I need it. By the time a runtime error occurs, the TokenStream with its list of AtLine stucts is long gone. I was able to use the atline data in jsscript.cpp when creating a Script object. That means that code inside functions thinks it is in the file named by //@line comment nearest to the start of the function. It doesn't work for top-level code though, and it doesn't work for functions that are broken across files. I thought about copying all the AtLine data to the JSContext or something, but can't figure out the memory management issues. A second approach I've tried is to append atline data to the primary filename. So suppose the file foo.js looks like this: foo() //@line 1 "bar.js" bar() //@line 1 "baz.js" In jsscan.cpp, I'd change the TokenStream.filename to: foo.js\n\t2=1:bar.js\n\t4=1:baz.js The newlines and tabs make this an illegal filename, but make it print nicely if it is ever displayed raw. This filename gets stored like a regular filename and is accessible when a runtime error occurs, and is properly memory managed, like a regular filename. This store the atline data in the filename is kind of kudgy, but also seems like the most straightforward way to approch the problem. My problem now is to figure out when to extract the atline data from the filename. I've modified js_ErrorToException in jsexn.cpp to extract the correct filename and line number, but it turns out that this only sets the lineNumber and fileName property of the Error object, not the file and line that are printed by default, and not the stack trace. I think I just don't understand the SpiderMonkey error reporting infrastructure well enough. I'll attach my current code, using the one-big-filename approach. I'd be happy to attach my linked-list-of-AtLine-structs attempt, too, if anyone is interested. This is my first attempt at a SpiderMonkey patch, and my C is rusty and my C++ is pretty non-existant, so go easy on me...
Attached file simple test case
Assignee: general → dflanagan
I think it might be easier to extend the src notes to handle this. They are currently used to map a PC to the appropriate line number within a JSScript. They use two notes -- SRC_SETLINE and SRC_NEWLINE -- to control the line numbers. I suggest you add a SRC_SETFILE that pretty much directly corresponds to the @line directive (well, @line would map to a SRC_SETFILE,SRC_SETLINE pair). These will then get picked up automically by the runtime exception because it uses JS_FramePCToLineNumber.
Oh. Note that this would mean that script->filename and script->lineno would maintain their current values, and could be totally different from the filename and line numbers reported for exceptions or in stack traces or whatever. That actually makes more sense to me than changing JSScript's filename to take @line into account.
Attachment #541864 - Flags: review?
I've attached a draft patch that mostly works. I need someone to take a look, since I'm probably doing a lot of things wrong. This turned out to be quite a complicated first patch to take on. Here's a writeup of what I've done: Overview: Spidermonkey has partial support, when the "atline" option is on, for comments of the form //@line n "f". These indicate that the next source line should be treated as line n of file f. Currently the line number works but the filename only works for compile-time errors and in fact, specifying a filename will in general give the wrong file for runtime errots (the last file specified is used in all runtime error reports). My goal with this bug is to make @line comments work. Terminology: I use the term "physical file" and "physical line" (or pfile and pline) to refer to the actual file and line number that a source token appears at. I use the term "virtual file" and "virtual line" (or vfile and vline) to refer to the file and line information from @line comments. General Approach: One recommendation I received was to handle the file portion of @line comments as a SrcNote, since there are already SrcNotes for mapping bytecode PC values to line numbers. I tried to make this work, but there were some serious obstacles: 1) All the src note types are in use, and I would have had to use an extended type 2) src notes have 23-bit offsets associated with them, and I wasn't sure how to associate a filename as a char* or JSAtom * with the src note. I see now that I probably could have atomized the filename and added it to the JSScript's atom list and then used the offset of the atom in that list. Though I'm not sure if it would be appropriate to mix this kind of meta-information with the actual source code atoms on that list. 3) In situations where a for(;;) loop header was in one virtual file and the loop body was in another, I would have needed source notes to jump back and forth, since apparently the bytecodes for the increment clause of a for loop is output after its body. 4) I convinced myself (don't remember the reasoning) that this approach would require me to store the current vfile on every token produced by the TokenStream, and that seemed like too big a change. So, I took a different approach: as a file is scanned, I build up a table that maps phyical line numbers in that file to virtual file/line pairs. This table is passed to the JSScript (and is memory managed in the same way that the physical script filename is so it has the right lifetime). Then, when it is time to initialize the fileName, lineNumber and stack fields of an Error object, that table is used to map from the pfile/pline to vfile/vline. Details: The AtLine and AtLineArray structures are defined in jsscript.h because that seemed like the most central place to put them. jsscan.h defines some new fields. atFilename, atLineno and atLineOffset record the data from the most recently seen @line comment and are used in jsscan.cpp for generating static compilation errors. The atlines field in jsscan.h is an AtLineArray * that holds the table of all atline data for the physical file. The accessor getAtlines() returns the value of that private field so that jsscript.cpp can get at it. jsscan.cpp modifies the @line parsing code to initialize each of these new fields. (It also gets rid of the TSF_OWNFILENAME flag which is no longer needed). It does explicit memory management for atFilename, and modifies the syntax error reporting code to use the vfile and vline if there is one. Most importantly, though, jsscan.cpp allocates and reallocates the atlines table each time it sees an @line comment. The memory for the table itself is managed by the TokenStream class (JSScript makes a private copy) but the memory for the vfilenames is allocated here in jsscan.cpp and managed along with the pfilename in jsscript.cpp. jsscript.h defines AtLine and AtLineArray, as I already mentioned. jsscript.cpp modifies the SaveScriptFilename() and the hash table entries and memory management functions associated with it to save an AtLineArray along with the pfilename. This means that the atline table has the same lifetime as the pfilename, which is the right thing, I think. The other main thing is jsscript.cpp is js_GetAtLineAndFile(). Pass in a pfile and pline and it return a vline and a vfile. jsexn.cpp uses js_GetAtLineAndFile() to initialize the fileName, lineNumber and stack fields of Error objects using vfiles and vlines. jsobj.cpp, jsfun.cpp: these files are modified so that strings of evaluated text and strings passed to the Function() constructor are given pfilenames that differ from the pfilename that they appear in. If eval is called on line 3 of foo.js, for example, then the string that is evaled will be given the pfilename "(foo.js:3)", and an error on the second line of that evaluated string would be reported on "(foo.js:3):2". Note that unlike the other changes in this patch, this one is visible even without "-o atline". Without the change, the atline table for the evaluated strings overrwrites the atline table for the parent file This is not quite working yet, but I've only got two regressions right now, so I'm ready for some feedback. Am I barking up the completely wrong tree? I'm sure I'm violating all kinds of stylistic conventions and technical rules, since this is my first time working with the spidermonkey tree, so a code review would be very much appreciated. I know that I need to write some serious tests, too. REGRESSIONS js1_5/Exceptions/regress-257751.js js1_8_5/extensions/reflect-parse.js FAIL
Attachment #541864 - Flags: review?(sphink)
Attachment #541864 - Flags: review?(jwalden+bmo)
Attachment #541864 - Flags: review?
I've broken the changes to jsfun.cpp and jsobj.cpp into a separate bug and patch. See bug 667514
Depends on: 667514
I like the approach. I'm not sure if I like it better than the SRC_SETFILE approach, though. Your list of problems reads to me like a checklist of reasonable things to do. :) Well, not sure about #4. And the memory management is definitely hairy in either case, probably moreso for SRC_SETFILE. (I don't think you want to store an @filename more than once, so you want them stashed nearby the existing script->filename, and I still don't grok how those are managed.) With your approach (I've skimmed through the patch kind of quickly): - What are the remaining problems? - Is the atLineAndFile stuff keyed only off of the script->filename? What happens if you have a script->filename with different atStuff? (eg you reload a tab in a new window and the contents have changed, or you have a dummy script name, or whatever.) - What breaks if PCToLineNumber always returns the virtual location? - I noticed a change in the 3rd arg to JS_HashTableEnumerateEntries of rt -> NULL. Are you changing the association and lifetime of these filename entries?
(In reply to comment #7) > With your approach (I've skimmed through the patch kind of quickly): > - What are the remaining problems? - A couple of regressions I haven't tracked down yet. - no tests written yet - the general feeling that there must be lots of problems with my code :-) > - Is the atLineAndFile stuff keyed only off of the script->filename? What > happens if you have a script->filename with different atStuff? (eg you > reload a tab in a new window and the contents have changed, or you have a > dummy script name, or whatever.) Yes, it is just keyed off the filename, since that's all we've got. If the parser is invoked multiple times for the same filename, the most recent AtLineArray will overwrite and replace the less recent ones, and that will cause incorrectly reported error locations. I was aware of this problem for eval() when one eval at the same line was used to evaluate multiple strings (with @line comments), but it I hadn't thought about the multiple-tab scenario. This seems like a potentially major problem. The parser could attach a sequence number to the filename, I suppose, if it sees the same file more than once. Seems like a special-case solution, though. Bug 637572 proposes something more general. > - What breaks if PCToLineNumber always returns the virtual location? I'm not sure if that would break anything. But since that function only returns a line number, I'd have to do the same (pfile,pline)->(vfile,vline) lookup again to find the virtual filename. And it seemed more conservative to do the @line table lookup at the last minute rather than altering the behavior of existing functions. > - I noticed a change in the 3rd arg to JS_HashTableEnumerateEntries of rt > -> NULL. Are you changing the association and lifetime of these filename > entries? Oops. I thought the rt there was an old change of mine and I was trying to revert it back to NULL. (Originally I was using atoms for the vfilenames, and had to do some extra memory management to mark them to prevent GC)
Just heard about this: https://wiki.mozilla.org/DevTools/Features/SourceMap Sounds like fitzgen will be working on this RSN.
(In reply to comment #9) > Just heard about this: https://wiki.mozilla.org/DevTools/Features/SourceMap > > Sounds like fitzgen will be working on this RSN. But I don't think it would replace what you're doing here. It's a fancier out-of-line way to handle more than this.
Comment on attachment 541864 [details] [diff] [review] first draft of a patch Review of attachment 541864 [details] [diff] [review]: ----------------------------------------------------------------- I haven't really looked closely enough to say that the design here is the right one. I'm inclined to think source notes are the way, but I admit to little familiarity with them, so I could well be wrong. But in any case, I'm pretty sure the manual memory management should go. We have js::Vector to encapsulate that gunk, and it's much much nicer than having to double-check every bit of memory arithmetic for correctness. Since that change is a fair bit of the patch, I'll - for that, but the design issues are more important to be sure (in a sense). Also I commented a bit on SpiderMonkey style, noted in passing, even if the lines in question are unlikely to survive in current form given the manual memory management beef. ::: js/src/jsexn.cpp @@ +644,5 @@ > if (elem->filename) { > + const char *filename; > + uintN lineno = js_GetAtLineAndFile(cx, > + elem->filename, elem->ulineno, > + &filename); Line limit in SpiderMonkey code is 99 characters (historically 79, and still that for comments, but we're expanding existing lines as we go). This probably fits in fewer lines if it goes to 99. @@ +655,3 @@ > APPEND_CHAR_TO_STACK(*cp); > } > + else { // No filename so just use the pline we have } else { Generally this sort of comment would go inside the block, aligned with the block's statements. @@ +755,4 @@ > JSString *filename; > + uint32_t lineno; > + > + if (argc > 2) { // Explicitly specified line number You're changing the ordering of processing of arguments. That's observable with something like this: function t1() { throw 1; } function t2() { throw 2; } new Error({ toString: t1, valueOf: t1 }, { toString: t2, valueOf: t2 }); In general you should be very careful about changing order of processing. Probably there's little reliance on any particular ordering, but why risk it? And if the ordering is specified by a standard (not the case here, I believe), there's no leeway to change it. (Except to conform with the standard, but it's reasonably rare that our usually-historical ordering disagrees with the spec.) ::: js/src/jsobj.cpp @@ +1232,5 @@ > evalType == DIRECT_EVAL > ? CALLED_FROM_JSOP_EVAL > : NOT_CALLED_FROM_JSOP_EVAL); > + char quoted_filename[1024]; > + JS_snprintf(quoted_filename, 1024, "(%s:%d)", filename, lineno); I still think there must be something better here, but I don't know what it is. :-\ ::: js/src/jsscan.cpp @@ +482,5 @@ > > tp = pn ? &pn->pn_pos : &currentToken().pos; > + if (atFilename) { // If we've seen an //@line comment in this file > + report.filename = atFilename; > + report.lineno = tp->begin.lineno - atLineOffset + atLineno; This sort of thing makes me worry about integer overflow and such -- no idea if they exist, but I'm assuming they do, this smells. :-) I'm not sure this patch will necessarily go places in its current form. But if it does, you're going to want tests for cases where the @line is really far from the actual file line number, with respect to whatever the type of |tp->begin.lineno| is. And, since comments really shouldn't actually break code, you'll need to figure out some vague fallback strategy that doesn't involve a compile error. ::: js/src/jsscan.h @@ +50,4 @@ > #include "jsprvtd.h" > #include "jspubtd.h" > #include "jsvector.h" > +#include "jsscript.h" /* for AtLine and AtLineArray */ Better to just forward-declare them, since you don't actually use the definitions. @@ +637,5 @@ > JSPackedBool maybeStrSpecial[256];/* speeds up string scanning */ > JSVersion version; /* (i.e. to identify keywords) */ > bool xml; /* see JSOPTION_XML */ > + > + /* The most recent //@line comment is used for syntax errors */ Full-sentence comments take periods at the end (and elsewhere). Sometimes we do fragments as comments, but it's mostly only for describing member fields, and we don't capitalize the starts of them because they're not sentences. ::: js/src/jsscript.cpp @@ +1794,5 @@ > + atlines = (AtLineArray *)sfe->value; > + if (atlines) { > + uint32 i, n = atlines->length; > + for(i = 0; i < n; i++) { > + if (atlines->vector[i].pline > pline) break; Couple style comments: We don't brace single-line ifs, if-elses, if-else if-else, &c., or loops, generally. We do brace when the condition in an if spans multiple lines, or when either arm of an if-else (or more) spans multiple lines (bracing all parts of the if-else in that case), and the opening brace is placed on a new line aligning with the |if| and the closing brace. We do brace when the head of a for-loop spans multiple lines. We don't put returns, breaks, continues, etc. on the same line as a condition. @@ +1798,5 @@ > + if (atlines->vector[i].pline > pline) break; > + } > + > + if (i > 0) { > + AtLine atline = atlines->vector[i-1]; // last seen @line Spaces around binary operators: |i - 1|. @@ +1806,5 @@ > + } > + } > + > + if (vfile == NULL) { > + // If there were no at lines or none of them applied to this pline We generally frown on sentence fragments as comments. :-) ::: js/src/jsscript.h @@ +149,5 @@ > +/* > + * AtLine and AtLineArray structures represent the @line comments in a file. > + * They're created in jsscan.cpp, stored in jsscript.cpp and used in jsexn.cpp > + */ > +typedef struct AtLine { We're C++, so there's generally (jsapi.h and jspubtd.h are exceptions, as they're also C-compatible) no need to redundantly typedef this way. @@ +155,5 @@ > + uintN vline; /* ...a virtual line number and */ > + const char *vfilename; /* ...a virtual filename. */ > +} AtLine; > + > +typedef struct AtLineArray { Manual memory management is hard. And looking at other files here, I see an awful lot of code trying to deal with that. Use js::Vector instead.
Attachment #541864 - Flags: review?(jwalden+bmo) → review-
Attachment #541864 - Flags: review?(sphink)
Depends on: 842438
@line support has been removed in bug 842438.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: