Closed Bug 755821 Opened 13 years ago Closed 8 years ago

Function() should use the parser's argument parsing code

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: Benjamin, Assigned: arai)

References

(Blocks 1 open bug)

Details

(Whiteboard: [js:t])

Attachments

(2 files, 13 obsolete files)

1.07 KB, patch
Details | Diff | Splinter Review
80.75 KB, patch
shu
: review+
Details | Diff | Splinter Review
Function() currently uses the tokenizer to parse function parameters. This means there are two implementations of functionArgument parsing, and parameters defined with Function don't support argument destructuring. Parser should expose its argument parsing interface and Function should use it.
Whiteboard: [js:t]
JSFunction::toString could be simplified if the Function constructor prepended the argument list to the function's source code. Comments in the patch for bug 761723 seem to suggest that should be done as part of this work.
Has this been fixed elsewhere?
No. Function() still tokenizes the arguments itself.
Assignee: general → nobody
Not only destructuring doesn't work, parameter initializers ("default values") cause a SyntaxError as well (reported at http://stackoverflow.com/q/31429405/1048572).
Attached patch very raw WIP (obsolete) — Splinter Review
I am actually surprised I got this working at all, but it seems to work. I used Chrome's way of inserting /**/ after the parameter list to ensure that we really parse the parameters as parameters.
Depends on: 1212794
Attached patch rebased WIP (obsolete) — Splinter Review
I have no idea what I am doing here, so this might be better suited for somebody with some parser experience. The attached WIP makes this work `new Function("...a", "return a;")`.
Attachment #8670884 - Attachment is obsolete: true
For some reason this doesn't throw: `new Function("}}}}")`
(In reply to Tom Schuster [:evilpie] from comment #8) > For some reason this doesn't throw: `new Function("}}}}")` I fixed this. The only remaining test failures with the shell are related to the changed toString() representation. (Includes that /**/ sentinel)
Attached patch WIP v2 new new Function parsing (obsolete) — Splinter Review
So some things to note: - It seems like it's easier to have the parser start parsing after an imaginary `function anonymous(`, that seems rather weird. - I adopted Chrome's idea of adding "\n/**/" after the parameter list. I think this is supposed to prevent people from doing funny stuff with the parameter list. As in adding code that parses differently together with ") {" after the parameter list. This is however not totally foolproof, new Function("/*", "return 1"); should probably not parse, but passes in Chrome and my implementation. - Line numbers are now +1.
Attachment #8737919 - Flags: review?(shu)
How about this, as an alternative to adding "\n/**/": - When building the string, remember the offset of the end of the argument list. - Pass it through to Parser<>::functionArgsAndBodyGeneric (optional argument) - After functionArguments() succeeds, if an offset was passed, and the current pos() doesn't match it, throw a SyntaxError. Things to test: - can't comment out the implicit ") {" using either kind of comment - can't use a backtick to make the implicit ") {" part of a template string - can't fool the parser by including "){" in an argument, making the implicit ") {" part of the body of the function - can't fool the parser by passing an argument like "= function (", making the implicit ") {" part of a default argument expression
Attachment #8737919 - Attachment is obsolete: true
Attachment #8737919 - Flags: review?(shu)
Attached patch WIP using offsets (obsolete) — Splinter Review
Great idea Jason! Now the only difference in parsing seems to be that the debugger now observers a different source including the parameter list with onNewScript. Not sure what to do there.
Assignee: nobody → evilpies
Attachment #8729539 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8738151 - Flags: feedback?(shu)
How about putting "function anonymous(" in the string, for the sole purpose of including them in the Source? We can just manually step the tokenStream over those tokens before beginning to parse.
Comment on attachment 8738151 [details] [diff] [review] WIP using offsets Review of attachment 8738151 [details] [diff] [review]: ----------------------------------------------------------------- The general approach seems fine to me. I will look more carefully with r? As for the Debugger reporting a different source, do you mean different than what it is now, or different from some other method on the same function like toString()? If it's the former, I'm inclined to agree with jorendorff's suggestion of prepending "function anonymous(" for the sake of the source.
Attachment #8738151 - Flags: feedback?(shu) → feedback+
Following Jason's advice of adding "function anonymous(" I have to fix various things that expect the old format especially Function#toString.
Function#toString is what we were fixing. I hoped you'd be able to ditch the code to compute funCon in FunctionToString(). This would be good, because that code is gibberish. :-) Then you should be able to ditch all that buildBody stuff. I recommended it because the code *should* get a lot simpler! If that's not the case, ignore me!
I don't understand how I can add some bit of information to JSScript, JSFunction or something else so that we don't prefix the toString result with function + name. I don't fully grok the interaction with lazy scripts, cloning and such. If somebody just adds that flag for me :)
If you upload your latest patch and ni?me, I'll give it a shot, though I'm certain to feel stupid once I see what the problem is...
Flags: needinfo?(evilpies)
Attached patch WIP v2 Sketch a fix for toString (obsolete) — Splinter Review
Here you go. Thinking about this some more made me realize that putting this bit on JSScript is probably not good enough, because asm.js also implements their own toString. I think asm.js doesn't even use JSScripts so we need to store the wasFunctionConstructor bit somewhere else.
Flags: needinfo?(evilpies) → needinfo?(jorendorff)
Status: What I want is for FunctionToString to do almost nothing in most circumstances. It should just have to take a slice out of the source. That's all. It's not hard... but it's a bit of a pain to implement because the frontend code around function parsing is so complicated (lazy parsing, strict mode, asm.js, and lots of other little things happen at function boundaries). So I've been spending spare hours on this and it's getting close. If I can't get it done Monday or Tuesday, I'll just post whatever I've got and give up on it.
(In reply to Jason Orendorff [:jorendorff] from comment #20) > It's not hard... but it's a bit of a pain to implement because the frontend > code around function parsing is so complicated (lazy parsing, strict mode, > asm.js, and lots of other little things happen at function boundaries). *And* there are lots of different entry points into function parsing: function statements, function expressions, arrow functions, legacy generator-expressions, the new parseStandaloneFunction. To be clear, all I'm trying to achieve is correct source code coordinates for functions. A third complication is that these are stored in multiple places and even (in FunctionBox) in two different forms, senselessly.
This is still like my 8th priority but I've been making steady enough progress on this I decided to hold on to it for another week. If it stalls I'll give up, I promise :)
Assignee: evilpies → jorendorff
Attachment #8751973 - Attachment is obsolete: true
Attachment #8755031 - Attachment is obsolete: true
All right, this at least passes the JS tests and jit-tests, and it's -160 lines of code! Pretty sure there's still a bug or two, *and*: * The new argStart stuff is not great. It would be better for the lazy-parsing code in CompileLazyFunction to just be able to cope with the few syntax variations that can occur at the beginning of a lazy function. * After this change there are some offset fields in asmjs code that are no longer used; they should be dropped entirely.
Flags: needinfo?(jorendorff)
In case it wasn't clear I am dropping this with an audible thud --- good luck.
Assignee: jorendorff → evilpies
Computed names make it a bit difficult to get rid of argStart/argBegin. :-|
Attachment #8738151 - Attachment is obsolete: true
Blocks: 1288460
This still has at least one crash with basic/destructuring-default.js. Something about a HashTable from newVarScopeData(). Oh and we are back to the prefixing with "function[*] anonymous" problem: Not prefixing gives the right column numbers when reporting SyntaxErrors, but bad results for script-sources in the debugger. Anyway, I have no real idea if what I am doing in Parser.cpp is correct, I just kind of copied parts from standaloneFunctionBody and standaloneLazyFunction. It does seem a lot of stuff got simpler since the rewrite \o/
Attachment #8746988 - Attachment is obsolete: true
Hey shu, do you mind running jit-test/tests/basic/destructuring-default.js with the previous patch applied and see if you can spot some obvious bug? Most of the Parser::standaloneFunction code I just copy-pasted from other Praser code :/
Flags: needinfo?(shu)
Comment on attachment 8786442 [details] [diff] [review] WIP v3 Use normal parsing code fore new Function Review of attachment 8786442 [details] [diff] [review]: ----------------------------------------------------------------- I posted a patch that makes ParseContext::VarScope less of a footgun and actually null out pc->varScope_ on destruction. ::: js/src/frontend/Parser.cpp @@ +2151,5 @@ > + return null(); > + > + fn->pn_pos.end = pos().end; > + > + if (!finishFunction()) The bug is that finishFunction() expects there to be a valid varScope_ on ParseContext. For parsing functions, we make an additional VarScope if there are parameter expressions. Otherwise we re-use the function scope. In the case of destructuring with parameter exprs, we stack-allocated a new VarScope inside functionFormalParametersAndBody above. So functionFormalParametersAndBody already calls its own finishFunction(). The reason the old standaloneFunction called finishFunction separately is because it did *not* call functionFormalParametersAndBody since it parsed its own parameters.
Flags: needinfo?(shu)
Blocks: 1298778
I am still not finished rebasing, but I do have to say that part 2 is really a lot of changes across the whole codebase.
Attached patch Part 2 - (Incorrect) Rebase (obsolete) — Splinter Review
I am pretty sure I switched argsStart, begin and end somewhere, but I have to find it. I rebased this by hand :/. The Function constructor works though :). Running jit-test we run it a lot of "this token was previously looked up with a different modifier, potentially making tokenization non-deterministic" asserts.
Attachment #8786442 - Attachment is obsolete: true
Attachment #8798344 - Attachment is obsolete: true
It seems like for lazy function that are reparsed into real functions somehow don't get the right begin/end, but still parse? Maybe just something with source data. Test case: function g(x) { if (x) { function f() {} return f; } } print(g(1) + ""); There are some other random failures as well. JIT test failures: arrow-functions/syntax-errors.js asm.js/bug1268955-usestrict-semantics.js asm.js/bug1268955-usestrict-semantics.js asm.js/bug941877.js asm.js/testSource.js asm.js/testSource.js asm.js/testSource.js basic/bug1141154.js basic/destructuring-default.js basic/function-tosource-lambda.js basic/iterable-error-messages.js basic/testBug961969.js basic/withSourceHook.js debug/Debugger-debuggees-28.js debug/Frame-eval-21.js debug/Frame-implementation-01.js debug/Frame-onStep-resumption-05.js debug/Memory-takeCensus-08.js debug/Object-parameterNames.js debug/RematerializedFrame-retval.js debug/Script-gc-02.js debug/Script-gc-03.js debug/Script-sourceStart-02.js debug/bug1282741.js debug/resumption-08.js for-of/decompiler.js ion/bug1233343.js parser/modifier-arrow-rest.js parser/syntax-error-illegal-character.js parser/modifier-regexp-vs-div.js parser/modifier-yield-without-operand-1.js parser/modifier-yield-without-operand-2.js wasm/import-export.js wasm/import-export.js xdr/asm.js xdr/lazy.js
Okay success, we are pretty close. jstests passed and jit-tests has the same error 5 times plus the one token error described earlier. I must not be setting the correct end position somewhere. Jason can you take a look? I probably just forgot to adapt one line of your patch somewhere again. I also added a test. Shu should probably review this patch, because Jason basically wrote 90% of it. (I am going to adjust authorship accordingly) I changed the jsapi.cpp CompileFunction to set the name after parsing, because XUL seems to pass function names that aren't valid identifiers like "root-element_XBL_Constructor". There is another assert during browser startup (plus one about before <= end agin): 0x00007fffea0a6a49 in AssertScopeMatchesEnvironment (scope=0x7fffc1c4a040, originalEnv=0x7fffbcd94520) at /home/tom/projects/inbound/js/src/vm/Stack.cpp:145 145 MOZ_ASSERT(env->as<LexicalEnvironmentObject>().isGlobal()); Failures: arrow-functions/syntax-errors.js basic/destructuring-default.js basic/testBug961969.js debug/Memory-takeCensus-08.js wasm/import-export.js --wasm-always-baseline wasm/import-export.js
Attachment #8755032 - Attachment is obsolete: true
Attachment #8801489 - Attachment is obsolete: true
Attachment #8801513 - Flags: feedback?(shu)
Attachment #8801513 - Flags: feedback?(jorendorff)
Fixed the scope issue and disabled the assertion in newExprStatement and we get a working browser.
Attachment #8801513 - Attachment is obsolete: true
Attachment #8801513 - Flags: feedback?(shu)
Attachment #8801513 - Flags: feedback?(jorendorff)
Attachment #8801529 - Flags: feedback?(shu)
Attachment #8801529 - Flags: feedback?(jorendorff)
Comment on attachment 8801529 [details] [diff] [review] WIP v5 Use normal parsing code fore new Function Review of attachment 8801529 [details] [diff] [review]: ----------------------------------------------------------------- I'm reading through this still. Could you explain why argsStart needs to be separated from bufStart? I couldn't quite figure from the code how they're different.
Sorry for giving up :(
Assignee: evilpies → jorendorff
iiuc, [bufStart:bufEnd) contains whole function source, that is the result of fun.toString(). and [argsStart:bufEnd) contains the part that needs to be parsed when compiling lazy function. storing [bufStart:argsStart) range helps generating better .toString() result. before the patch, we generate that part depending on flags, but it means some details are not reflected to .toString() result, such as, comment, computed property, spaces.
(In reply to Tom Schuster [:evilpie] from comment #10) > - Line numbers are now +1. If we follow this proposal and use the generated code for compiling, column number of 1st line is also changed. https://tc39.github.io/Function-prototype-toString-revision/ before patch: Function("a()")() => error at Function:1:1 with WIP patch above: Function("a()")() => error at Function:2:1 with the proposal's syntax applied: Function("a()")() => error at Function:2:4 this breaks line/column numbering of resource://gre/modules/workers/require.js with the patch, line number doesn't match between error message and require'd file...
so, the question is, how many cases depend on the current line/column numbering of Function ctor.
FWIW, chrome returns 2:1 for Function("a()")()
Also, from the change to JS::CompileFunction, event handlers are also affected by line number +1,
taking over. I'm about to fix only Function constructor's parameter handling here, leaving toString, since it's touched by proposal, and currently it doesn't match to any implementation. we should wait for a while before touching the code. https://tc39.github.io/Function-prototype-toString-revision/ https://github.com/tc39/Function-prototype-toString-revision/issues/20 also, leaving that part to other bug will make the patch simpler :) about line numbering I'll use line number 0 as beginning of the function, so that function body starts from line 1.
Assignee: jorendorff → arai.unmht
So, here's simplified version of the previous patch, that only touches Function constructor's parameter handling and source code storage. Changes from previous patch: * Changed the following properties back to original meaning * ParseNode.pn_pos.begin for PNK_FUNCTION * FunctionBox.bufStart * AsmJSMetadata.srcStart * JSScript.sourceStart_ * Lazyscript.begin_ so that this patch doesn't affect existing code that touches them * Reverted all parts that touches Function#toString for function that is not Function constructor * Changed |BytecodeCompiler::compileFunctionBody| to receive the function source that starts with parameter, not "function". This is for reusing existing toString. and also removed |argsStart| since it's always the beginning * Stored |parameterListEnd| to |ScriptSource| Added |ScriptSource::functionBodyString| too get function body in debugger * Changed some places that calls |JS::CompileFunction| to use line = 0 instead of line = 1 in |CompileOptions|, to make the function body starts from line 1 * Renamed |lambdaParen| parameter of js::FunctionToString to |noAddParenToLambda|, just because |lambdaParen| sounds wrongs for purpose of the parameter Also, summary of other changes from previous patch: * Generate source in FunctionConstructor and JS::CompileFunction, from passed parameters and body * Check |parameterListEnd| position after parsing parameter, and throw if it doesn't match, to check strange syntax in Function ctor parameters * Changed function names not to include "Body" * BytecodeCompiler::compileFunctionBody => BytecodeCompiler::compileStandaloneFunction * frontend::CompileFunctionBody => frontend::CompileStandaloneFunction * frontend::CompileStarGeneratorBody => frontend::CompileStandaloneGenerator * frontend::CompileAsyncFunctionBody => frontend::CompileStandaloneAsyncFunction * Parser::standaloneFunctionBody => Parser::standaloneFunction * Removed the following properties/methods, since no more used: * BytecodeCompiler::setSourceArgumentsNotIncluded * BytecodeCompiler.sourceArgumentsNotIncluded * ScriptSource::argumentsNotIncluded * ScriptSource.argumentsNotIncluded_ * Updated tests to follow new behavior
Attachment #8801529 - Attachment is obsolete: true
Attachment #8801529 - Flags: feedback?(shu)
Attachment #8801529 - Flags: feedback?(jorendorff)
Attachment #8811051 - Flags: review?(shu)
remaining concern is that, since I used |line = 0| for the Function's first line, sometimes users and tools may see line 0 in error and debugger. it already happens in devtools/client/inspector/markup/test/browser_markup_events3.js but that is less impact than using |line = 1|, that causes all line numbering different, especially in |require()| and event handler.
Thanks for taking. I had been working through bugs apparently introduced by rebasing, but wasn't really getting anywhere. If V8 uses |line = 1|, I'm for that (a separate patch is fine, just in case).
to be clear, |require()| uses Function ctor, and event handler uses JS::CompileFunction. so, the situation is a bit different. we could apply some trick to JS::CompileFunction, to make body starts with line 1 for event handlers, or perhaps we could provide another API for event handlers and something that expects function body starts from line 1. but |require()| will be directly affected by |line = 1| change, https://dxr.mozilla.org/mozilla-central/rev/28e2a6dde76ab6ad4464a3662df1bd57af04398a/toolkit/components/workerloader/require.js#137-140 > let code = new Function("exports", "require", "module", > source + "\n//# sourceURL=" + uri + "\n" > ); > code(exports, require, module); if we add the API above, we might be able to use it there, but I'm not sure if it worth extra complicity around APIs.
Comment on attachment 8811051 [details] [diff] [review] Parse arguments of Function constructor properly. Review of attachment 8811051 [details] [diff] [review]: ----------------------------------------------------------------- Not yet finished, posting what I have as I'll finish the rest on a different computer. ::: dom/events/EventListenerManager.cpp @@ +1070,5 @@ > if (NS_WARN_IF(!GetOrCreateDOMReflector(cx, aElement, &v))) { > return NS_ERROR_FAILURE; > } > JS::CompileOptions options(cx); > + // Use line 0 to make the function body starts from line 1. Oof this worries me. I wonder how we can tell the extent (if any) of this breaking the web. ::: js/src/frontend/BytecodeCompiler.cpp @@ -676,5 @@ > return bce.emitFunctionScript(pn->pn_body); > } > > -// Compile a JS function body, which might appear as the value of an event > -// handler attribute in an HTML <INPUT> tag, or in a Function() constructor. How come this comment was removed? Is it no longer true? I'd like the comment to remain. ::: js/src/frontend/Parser.cpp @@ -2319,5 @@ > - for (uint32_t i = 0; i < formals.length(); i++) { > - if (!notePositionalFormalParameter(fn, formals[i], false, &duplicatedParam)) > - return null(); > - } > - funbox->hasDuplicateParameters = duplicatedParam; \o/ @@ +3354,5 @@ > bool > Parser<ParseHandler>::functionFormalParametersAndBody(InHandling inHandling, > YieldHandling yieldHandling, > + Node pn, FunctionSyntaxKind kind, > + uint32_t parameterListEnd) Could you use a Maybe<uint32_t> for parameterListEnd instead of 0 as a sentinel value to make the intention clear? @@ +3387,5 @@ > } > } > > + // When parsing something for new Function() we have to make sure to > + // only treat a certain part of the source as a paremeter list. Typo: parameter @@ +3448,5 @@ > if (!matched) { > error(JSMSG_CURLY_AFTER_BODY); > return false; > } > + funbox->bufEnd = pos().end; Is it the case that the last token is always '}', and so pos().end == pos().begin + 1? ::: js/src/jsapi.cpp @@ +4205,5 @@ > /* > * enclosingScope is a static enclosing scope, if any (e.g. a WithScope). If > * the enclosing scope is the global scope, this must be null. > * > * enclosingDynamicScope is a dynamic scope to use, if it's not the global. While you're here, could you change "static enclosing scope" to "scope" and "dynamic scope" to "environment"? Thanks! @@ +4251,5 @@ > +static MOZ_MUST_USE bool > +BuildFunctionString(unsigned nargs, const char* const* argnames, > + const SourceBufferHolder& srcBuf, StringBuffer* out, > + uint32_t* parameterListEnd) > +{ MOZ_ASSERT(parameterListEnd) and MOZ_ASSERT(out) please.
Comment on attachment 8811051 [details] [diff] [review] Parse arguments of Function constructor properly. Review of attachment 8811051 [details] [diff] [review]: ----------------------------------------------------------------- I'm still kind of worried about the line number changes for event handlers. Who knows what people depend on... I don't think we have a better way to find out breakage than to just try it, however. ::: js/src/jit-test/tests/debug/Script-gc-02.js @@ +9,5 @@ > > gc(); > > for (var i = 0; i < arr.length; i++) > + assertEq(arr[i].lineCount, 3); As an aside, it's kind of weird we have such a test. ::: js/src/jsfun.cpp @@ +1693,2 @@ > > + if (!sb.append(") {\n")) See note above for putting these constants into static consts. @@ +1717,5 @@ > * anonymous function in the top-level scope that its constructor inhabits. > * Thus 'var x = 42; f = new Function("return x"); print(f())' prints 42, > * and so would a call to f from another top-level's script or function. > */ > RootedAtom anonymousAtom(cx, cx->names().anonymous); Please double check that a |new Function|'d function cannot recursively call itself using "anonymous". I'm pretty sure this is already the case. @@ -1892,5 @@ > - } > - } > - > - if (hasRest) > - fun->setHasRest(); \o/ ::: js/src/jsscript.cpp @@ +1681,5 @@ > + MOZ_ASSERT(isFunctionBody()); > + > + // +4 for ") {\n" > + size_t start = parameterListEnd_ + 4; > + // -2 for "\n}" Please put these string constants ") {\n" and "\n}" into static consts somewhere and derive the lengths from those instead of hardcoding lengths and re-typing the constants.
Attachment #8811051 - Flags: review?(shu) → review+
Thank you for reviewing. (In reply to Shu-yu Guo [:shu] from comment #50) > @@ +3448,5 @@ > > if (!matched) { > > error(JSMSG_CURLY_AFTER_BODY); > > return false; > > } > > + funbox->bufEnd = pos().end; > > Is it the case that the last token is always '}', and so pos().end == > pos().begin + 1? Yes (In reply to Shu-yu Guo [:shu] from comment #51) > @@ +1717,5 @@ > > * anonymous function in the top-level scope that its constructor inhabits. > > * Thus 'var x = 42; f = new Function("return x"); print(f())' prints 42, > > * and so would a call to f from another top-level's script or function. > > */ > > RootedAtom anonymousAtom(cx, cx->names().anonymous); > > Please double check that a |new Function|'d function cannot recursively call > itself using "anonymous". I'm pretty sure this is already the case. This patch doesn't touch that part. I'll try fixing it in bug 636635.
See Also: → 636635
The remaining parts about toString should be fixed in bug 1317400.
See Also: → 1317400
I forgot to support single line comment in parameter, that's because we currently don't put newline after parameter list. I'll fix it here. maybe, by limiting the buffer length while parsing parameter.
Keywords: leave-open
discussed with shu on IRC, and the single line comment should be fixed at the same time as toString (bug 1317400)
Keywords: leave-open
See Also: → 1319638
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: