Closed Bug 866367 Opened 12 years ago Closed 10 years ago

new RegExp('\n').toString() generates invalid RegularExpressionLiteral format string

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: utatane.tea, Unassigned)

References

Details

Attachments

(1 file, 1 obsolete file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31 Steps to reproduce: Executes `new RegExp('\n').toString()` Actual results: The script returns `"/\n/"` Expected results: The script should return `"/\\n/"`
Previously I submitted the patch to JSC to solve the same problem and it is landed. https://bugs.webkit.org/show_bug.cgi?id=71572 So I'll implement the same logic to SpiderMonkey
Attached patch rev1 patch (obsolete) — Splinter Review
Attachment #742677 - Flags: review?(sstangl)
In section 15.10.6.4, > NOTE The returned String has the form of a RegularExpressionLiteral that evaluates to another RegExp object with the same behaviour as this object. http://ecma-international.org/ecma-262/5.1/#sec-15.10.6.4 So returning "/\n/" is a spec violation.
Hi Yusuke, thank you for fixing this bug in Firefox! It looks like it could be solved in a simpler manner, though, by modifying the RegExp source string at the time of object creation. There's already a function EscapeNakedForwardSlashes() in builtin/RegExp.cpp that performs a similar function -- would you be able to rewrite this function to similarly handle '\n' and the other LineTerminator characters?
Status: UNCONFIRMED → NEW
Ever confirmed: true
The issue of /\n/.source and /\n/.toString() semantics was brought up semi-recently on es-discuss: https://mail.mozilla.org/pipermail/es-discuss/2013-March/029278.html It'd be great if you spent the time to figure out steps/semantics that would always produce well-formed regular expression syntax, then tested them against implementations and proposed them there. ;-) If you don't have time to do this, that's understandable, but it'd definitely be helpful to have a clear description of the exact escaping behavior implemented, for reviewing purposes. And it'd be really nice if the spec had clear, non-vague steps for all this, so that all implementations can get on the same page.
(In reply to Sean Stangl [:sstangl] from comment #4) Thanks for your review! > It looks like it could be solved in a simpler manner, though, by modifying > the RegExp source string at the time of object creation. There's already a > function EscapeNakedForwardSlashes() in builtin/RegExp.cpp that performs a > similar function -- would you be able to rewrite this function to similarly > handle '\n' and the other LineTerminator characters? Oh, I've missed EscapeNakedForwardSlashes. Agreed. Rewriting EscapeNakedForwardSlashes is simpler. I'll fix it and attach a revised patch. (In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #5) > The issue of /\n/.source and /\n/.toString() semantics was brought up > semi-recently on es-discuss: > > https://mail.mozilla.org/pipermail/es-discuss/2013-March/029278.html > > It'd be great if you spent the time to figure out steps/semantics that would > always produce well-formed regular expression syntax, then tested them > against implementations and proposed them there. ;-) Agreed. Thanks for your suggestion. I'll investigate the steps/semantics and post it to es-discuss.
Attached patch rev2 patchSplinter Review
I've created rev2 patch. Moving logic to EscapeNakedForwardSlashes.
Attachment #742677 - Attachment is obsolete: true
Attachment #742677 - Flags: review?(sstangl)
Attachment #746028 - Flags: review?(sstangl)
And above this patch also fixes an issue in regression 248444 fix. RegExp("[^\/]+$").toString() was expected '/[^\\/]+$/'. But because '/' can be exists in brackets, '/[^/]+$/' is correct. This patch covers this backslash problem at the same time :)
Comment on attachment 746028 [details] [diff] [review] rev2 patch Review of attachment 746028 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for the patch! This looks good -- I just had some minor nits, and I did not understand the need for bracket-tracking logic. Adding an explanatory comment would be sufficient. ::: js/src/builtin/RegExp.cpp @@ +170,5 @@ > > return CreateRegExpMatchResult(cx, input, chars, length, matches, rval); > } > > +static bool isLineTerminator(jschar ch) { Static functions generally begin with a capital letter in our style: IsLineTerminator(). @@ +178,3 @@ > /* Note: returns the original if no escaping need be performed. */ > static JSAtom * > EscapeNakedForwardSlashes(JSContext *cx, HandleAtom unescaped) This function needs a rename. Maybe EscapeNakedCharacters()? @@ +191,5 @@ > > for (const jschar *it = oldChars; it < oldChars + oldLen; ++it) { > + const jschar ch = *it; > + if (!previousCharacterWasBackslash) { > + if (inBrackets) { I don't understand the need for logic surrounding the brackets. Why are forward slashes not escaped if they occur within brackets? A comment explaining the reasoning would be helpful for posterity (and to me!). @@ +205,5 @@ > + } > + if (!sb.append('\\')) > + return NULL; > + } else if (ch == '[') > + inBrackets = true; Nit: SpiderMonkey style requires brackets on all arms of an if{} structure if one arm has brackets. @@ +214,3 @@ > if (sb.empty()) { > /* This is the first one we've seen, copy everything up to this point. */ > if (!sb.reserve(oldLen + 1)) This isn't quite the right length to reserve in this case, but it may not matter. @@ +219,5 @@ > } > + > + if (!previousCharacterWasBackslash) > + if (!sb.append('\\')) > + return NULL; The code from sb.empty() to here is roughly repeated twice in this function. Could the logic be moved out to a JS_ALWAYS_INLINE helper function? @@ +221,5 @@ > + if (!previousCharacterWasBackslash) > + if (!sb.append('\\')) > + return NULL; > + > + switch (ch) { This switch statement could be better expressed by having cases like: > char *expansion = NULL; > switch(ch) { > case '\n': expansion = "n"; break; > ... > default: JS_NOT_REACHED("Unexpected line terminator"); > } > > if (!sb.append(expansion))... which would permit a single callsite for .append() and reduce vertical code size. @@ +245,5 @@ > return NULL; > } > > + if (previousCharacterWasBackslash) > + previousCharacterWasBackslash = false; What happens if the previous character was a backslash, and the current character is also a backslash? Based on the old code working, it seems like the situation can't arise. Could we do |JS_ASSERT_IF(previousCharacterWasBackslash, ch != '\\')|? @@ +247,5 @@ > > + if (previousCharacterWasBackslash) > + previousCharacterWasBackslash = false; > + else > + previousCharacterWasBackslash = ch == '\\'; As a very minor style nit, it would read more nicely with parentheses around "ch == '\\'".
Attachment #746028 - Flags: review?(sstangl)
(In reply to Sean Stangl [:sstangl] from comment #9) > I don't understand the need for logic surrounding the brackets. Why are forward slashes not escaped if they occur within brackets? A comment explaining the reasoning would be helpful for posterity (and to me!). See ES5 section 15.10.2.17: http://es5.github.io/#x15.10.2.17 RegExp character classes can contain any character unescaped except for a right square bracket (`]`).
(In reply to Sean Stangl [:sstangl] from comment #9) Thank you for your review. I'll attach a revised patch. > Comment on attachment 746028 [details] [diff] [review] > rev2 patch > > Review of attachment 746028 [details] [diff] [review]: > ----------------------------------------------------------------- > > Thanks for the patch! This looks good -- I just had some minor nits, and I > did not understand the need for bracket-tracking logic. Adding an > explanatory comment would be sufficient. > > ::: js/src/builtin/RegExp.cpp > @@ +170,5 @@ > > > > return CreateRegExpMatchResult(cx, input, chars, length, matches, rval); > > } > > > > +static bool isLineTerminator(jschar ch) { > > Static functions generally begin with a capital letter in our style: > IsLineTerminator(). OK, I've got it. I'll change it. Is there Coding style guide about SpiderMonkey? I want to read it. > @@ +178,3 @@ > > /* Note: returns the original if no escaping need be performed. */ > > static JSAtom * > > EscapeNakedForwardSlashes(JSContext *cx, HandleAtom unescaped) > > This function needs a rename. Maybe EscapeNakedCharacters()? Right. I'll rename it. > @@ +191,5 @@ > > > > for (const jschar *it = oldChars; it < oldChars + oldLen; ++it) { > > + const jschar ch = *it; > > + if (!previousCharacterWasBackslash) { > > + if (inBrackets) { > > I don't understand the need for logic surrounding the brackets. Why are > forward slashes not escaped if they occur within brackets? A comment > explaining the reasoning would be helpful for posterity (and to me!). As Michael wrote, this is because RegExp character classes allow any character except for ]. So "[/]" should not be escaped. This is acceptable RegExp literal. > @@ +205,5 @@ > > + } > > + if (!sb.append('\\')) > > + return NULL; > > + } else if (ch == '[') > > + inBrackets = true; > > Nit: SpiderMonkey style requires brackets on all arms of an if{} structure > if one arm has brackets. I see. I'll fix it. > @@ +214,3 @@ > > if (sb.empty()) { > > /* This is the first one we've seen, copy everything up to this point. */ > > if (!sb.reserve(oldLen + 1)) > > This isn't quite the right length to reserve in this case, but it may not > matter. > > @@ +219,5 @@ > > } > > + > > + if (!previousCharacterWasBackslash) > > + if (!sb.append('\\')) > > + return NULL; > > The code from sb.empty() to here is roughly repeated twice in this function. > Could the logic be moved out to a JS_ALWAYS_INLINE helper function? I've got it. I'll create a helper function and move these logic to it. > @@ +221,5 @@ > > + if (!previousCharacterWasBackslash) > > + if (!sb.append('\\')) > > + return NULL; > > + > > + switch (ch) { > > This switch statement could be better expressed by having cases like: > > > char *expansion = NULL; > > switch(ch) { > > case '\n': expansion = "n"; break; > > ... > > default: JS_NOT_REACHED("Unexpected line terminator"); > > } > > > > if (!sb.append(expansion))... > > which would permit a single callsite for .append() and reduce vertical code > size. You're right. I'll change it. > @@ +245,5 @@ > > return NULL; > > } > > > > + if (previousCharacterWasBackslash) > > + previousCharacterWasBackslash = false; > > What happens if the previous character was a backslash, and the current > character is also a backslash? Based on the old code working, it seems like > the situation can't arise. Could we do > |JS_ASSERT_IF(previousCharacterWasBackslash, ch != '\\')|? When `new RegExp('\\\\')` is executed (2 backslashes), previousCharacterWasBackslash = true & ch = '\\' at 2nd position. In current code, when it is executed, code produces '\\\\' (2 backslashes) string. (RegExp Literal, /\\/) > @@ +247,5 @@ > > > > + if (previousCharacterWasBackslash) > > + previousCharacterWasBackslash = false; > > + else > > + previousCharacterWasBackslash = ch == '\\'; > > As a very minor style nit, it would read more nicely with parentheses around > "ch == '\\'". OK, I'll rewirte it to `(ch == '\\')`
Assignee: general → nobody
Depends on: 1130860
Fixed by bug 1130860. I saw but somehow neglected this bug when filing bug 1130860. (In my defense, we had an almost correct patch for it in bug 1120169, so I didn't look too closely.)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: