Closed
Bug 866367
Opened 12 years ago
Closed 10 years ago
new RegExp('\n').toString() generates invalid RegularExpressionLiteral format string
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: utatane.tea, Unassigned)
References
Details
Attachments
(1 file, 1 obsolete file)
6.48 KB,
patch
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31
Steps to reproduce:
Executes `new RegExp('\n').toString()`
Actual results:
The script returns `"/\n/"`
Expected results:
The script should return `"/\\n/"`
Reporter | ||
Comment 1•12 years ago
|
||
Previously I submitted the patch to JSC to solve the same problem and it is landed.
https://bugs.webkit.org/show_bug.cgi?id=71572
So I'll implement the same logic to SpiderMonkey
Reporter | ||
Comment 2•12 years ago
|
||
Reporter | ||
Updated•12 years ago
|
Attachment #742677 -
Flags: review?(sstangl)
Reporter | ||
Comment 3•12 years ago
|
||
In section 15.10.6.4,
> NOTE The returned String has the form of a RegularExpressionLiteral that evaluates to another RegExp object with the same behaviour as this object.
http://ecma-international.org/ecma-262/5.1/#sec-15.10.6.4
So returning "/\n/" is a spec violation.
Comment 4•12 years ago
|
||
Hi Yusuke, thank you for fixing this bug in Firefox!
It looks like it could be solved in a simpler manner, though, by modifying the RegExp source string at the time of object creation. There's already a function EscapeNakedForwardSlashes() in builtin/RegExp.cpp that performs a similar function -- would you be able to rewrite this function to similarly handle '\n' and the other LineTerminator characters?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 5•12 years ago
|
||
The issue of /\n/.source and /\n/.toString() semantics was brought up semi-recently on es-discuss:
https://mail.mozilla.org/pipermail/es-discuss/2013-March/029278.html
It'd be great if you spent the time to figure out steps/semantics that would always produce well-formed regular expression syntax, then tested them against implementations and proposed them there. ;-) If you don't have time to do this, that's understandable, but it'd definitely be helpful to have a clear description of the exact escaping behavior implemented, for reviewing purposes. And it'd be really nice if the spec had clear, non-vague steps for all this, so that all implementations can get on the same page.
Reporter | ||
Comment 6•12 years ago
|
||
(In reply to Sean Stangl [:sstangl] from comment #4)
Thanks for your review!
> It looks like it could be solved in a simpler manner, though, by modifying
> the RegExp source string at the time of object creation. There's already a
> function EscapeNakedForwardSlashes() in builtin/RegExp.cpp that performs a
> similar function -- would you be able to rewrite this function to similarly
> handle '\n' and the other LineTerminator characters?
Oh, I've missed EscapeNakedForwardSlashes. Agreed. Rewriting EscapeNakedForwardSlashes is simpler.
I'll fix it and attach a revised patch.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #5)
> The issue of /\n/.source and /\n/.toString() semantics was brought up
> semi-recently on es-discuss:
>
> https://mail.mozilla.org/pipermail/es-discuss/2013-March/029278.html
>
> It'd be great if you spent the time to figure out steps/semantics that would
> always produce well-formed regular expression syntax, then tested them
> against implementations and proposed them there. ;-)
Agreed. Thanks for your suggestion. I'll investigate the steps/semantics and post it to es-discuss.
Reporter | ||
Comment 7•12 years ago
|
||
I've created rev2 patch.
Moving logic to EscapeNakedForwardSlashes.
Attachment #742677 -
Attachment is obsolete: true
Attachment #742677 -
Flags: review?(sstangl)
Attachment #746028 -
Flags: review?(sstangl)
Reporter | ||
Comment 8•12 years ago
|
||
And above this patch also fixes an issue in regression 248444 fix.
RegExp("[^\/]+$").toString() was expected '/[^\\/]+$/'. But because '/' can be exists in brackets, '/[^/]+$/' is correct.
This patch covers this backslash problem at the same time :)
Comment 9•12 years ago
|
||
Comment on attachment 746028 [details] [diff] [review]
rev2 patch
Review of attachment 746028 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks for the patch! This looks good -- I just had some minor nits, and I did not understand the need for bracket-tracking logic. Adding an explanatory comment would be sufficient.
::: js/src/builtin/RegExp.cpp
@@ +170,5 @@
>
> return CreateRegExpMatchResult(cx, input, chars, length, matches, rval);
> }
>
> +static bool isLineTerminator(jschar ch) {
Static functions generally begin with a capital letter in our style: IsLineTerminator().
@@ +178,3 @@
> /* Note: returns the original if no escaping need be performed. */
> static JSAtom *
> EscapeNakedForwardSlashes(JSContext *cx, HandleAtom unescaped)
This function needs a rename. Maybe EscapeNakedCharacters()?
@@ +191,5 @@
>
> for (const jschar *it = oldChars; it < oldChars + oldLen; ++it) {
> + const jschar ch = *it;
> + if (!previousCharacterWasBackslash) {
> + if (inBrackets) {
I don't understand the need for logic surrounding the brackets. Why are forward slashes not escaped if they occur within brackets? A comment explaining the reasoning would be helpful for posterity (and to me!).
@@ +205,5 @@
> + }
> + if (!sb.append('\\'))
> + return NULL;
> + } else if (ch == '[')
> + inBrackets = true;
Nit: SpiderMonkey style requires brackets on all arms of an if{} structure if one arm has brackets.
@@ +214,3 @@
> if (sb.empty()) {
> /* This is the first one we've seen, copy everything up to this point. */
> if (!sb.reserve(oldLen + 1))
This isn't quite the right length to reserve in this case, but it may not matter.
@@ +219,5 @@
> }
> +
> + if (!previousCharacterWasBackslash)
> + if (!sb.append('\\'))
> + return NULL;
The code from sb.empty() to here is roughly repeated twice in this function. Could the logic be moved out to a JS_ALWAYS_INLINE helper function?
@@ +221,5 @@
> + if (!previousCharacterWasBackslash)
> + if (!sb.append('\\'))
> + return NULL;
> +
> + switch (ch) {
This switch statement could be better expressed by having cases like:
> char *expansion = NULL;
> switch(ch) {
> case '\n': expansion = "n"; break;
> ...
> default: JS_NOT_REACHED("Unexpected line terminator");
> }
>
> if (!sb.append(expansion))...
which would permit a single callsite for .append() and reduce vertical code size.
@@ +245,5 @@
> return NULL;
> }
>
> + if (previousCharacterWasBackslash)
> + previousCharacterWasBackslash = false;
What happens if the previous character was a backslash, and the current character is also a backslash? Based on the old code working, it seems like the situation can't arise. Could we do |JS_ASSERT_IF(previousCharacterWasBackslash, ch != '\\')|?
@@ +247,5 @@
>
> + if (previousCharacterWasBackslash)
> + previousCharacterWasBackslash = false;
> + else
> + previousCharacterWasBackslash = ch == '\\';
As a very minor style nit, it would read more nicely with parentheses around "ch == '\\'".
Attachment #746028 -
Flags: review?(sstangl)
Comment 10•12 years ago
|
||
(In reply to Sean Stangl [:sstangl] from comment #9)
> I don't understand the need for logic surrounding the brackets. Why are forward slashes not escaped if they occur within brackets? A comment explaining the reasoning would be helpful for posterity (and to me!).
See ES5 section 15.10.2.17: http://es5.github.io/#x15.10.2.17
RegExp character classes can contain any character unescaped except for a right square bracket (`]`).
Reporter | ||
Comment 11•12 years ago
|
||
(In reply to Sean Stangl [:sstangl] from comment #9)
Thank you for your review. I'll attach a revised patch.
> Comment on attachment 746028 [details] [diff] [review]
> rev2 patch
>
> Review of attachment 746028 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Thanks for the patch! This looks good -- I just had some minor nits, and I
> did not understand the need for bracket-tracking logic. Adding an
> explanatory comment would be sufficient.
>
> ::: js/src/builtin/RegExp.cpp
> @@ +170,5 @@
> >
> > return CreateRegExpMatchResult(cx, input, chars, length, matches, rval);
> > }
> >
> > +static bool isLineTerminator(jschar ch) {
>
> Static functions generally begin with a capital letter in our style:
> IsLineTerminator().
OK, I've got it. I'll change it.
Is there Coding style guide about SpiderMonkey? I want to read it.
> @@ +178,3 @@
> > /* Note: returns the original if no escaping need be performed. */
> > static JSAtom *
> > EscapeNakedForwardSlashes(JSContext *cx, HandleAtom unescaped)
>
> This function needs a rename. Maybe EscapeNakedCharacters()?
Right. I'll rename it.
> @@ +191,5 @@
> >
> > for (const jschar *it = oldChars; it < oldChars + oldLen; ++it) {
> > + const jschar ch = *it;
> > + if (!previousCharacterWasBackslash) {
> > + if (inBrackets) {
>
> I don't understand the need for logic surrounding the brackets. Why are
> forward slashes not escaped if they occur within brackets? A comment
> explaining the reasoning would be helpful for posterity (and to me!).
As Michael wrote, this is because RegExp character classes allow any character except for ].
So "[/]" should not be escaped. This is acceptable RegExp literal.
> @@ +205,5 @@
> > + }
> > + if (!sb.append('\\'))
> > + return NULL;
> > + } else if (ch == '[')
> > + inBrackets = true;
>
> Nit: SpiderMonkey style requires brackets on all arms of an if{} structure
> if one arm has brackets.
I see. I'll fix it.
> @@ +214,3 @@
> > if (sb.empty()) {
> > /* This is the first one we've seen, copy everything up to this point. */
> > if (!sb.reserve(oldLen + 1))
>
> This isn't quite the right length to reserve in this case, but it may not
> matter.
>
> @@ +219,5 @@
> > }
> > +
> > + if (!previousCharacterWasBackslash)
> > + if (!sb.append('\\'))
> > + return NULL;
>
> The code from sb.empty() to here is roughly repeated twice in this function.
> Could the logic be moved out to a JS_ALWAYS_INLINE helper function?
I've got it. I'll create a helper function and move these logic to it.
> @@ +221,5 @@
> > + if (!previousCharacterWasBackslash)
> > + if (!sb.append('\\'))
> > + return NULL;
> > +
> > + switch (ch) {
>
> This switch statement could be better expressed by having cases like:
>
> > char *expansion = NULL;
> > switch(ch) {
> > case '\n': expansion = "n"; break;
> > ...
> > default: JS_NOT_REACHED("Unexpected line terminator");
> > }
> >
> > if (!sb.append(expansion))...
>
> which would permit a single callsite for .append() and reduce vertical code
> size.
You're right. I'll change it.
> @@ +245,5 @@
> > return NULL;
> > }
> >
> > + if (previousCharacterWasBackslash)
> > + previousCharacterWasBackslash = false;
>
> What happens if the previous character was a backslash, and the current
> character is also a backslash? Based on the old code working, it seems like
> the situation can't arise. Could we do
> |JS_ASSERT_IF(previousCharacterWasBackslash, ch != '\\')|?
When `new RegExp('\\\\')` is executed (2 backslashes), previousCharacterWasBackslash = true & ch = '\\' at 2nd position.
In current code, when it is executed, code produces '\\\\' (2 backslashes) string. (RegExp Literal, /\\/)
> @@ +247,5 @@
> >
> > + if (previousCharacterWasBackslash)
> > + previousCharacterWasBackslash = false;
> > + else
> > + previousCharacterWasBackslash = ch == '\\';
>
> As a very minor style nit, it would read more nicely with parentheses around
> "ch == '\\'".
OK, I'll rewirte it to `(ch == '\\')`
Assignee | ||
Updated•10 years ago
|
Assignee: general → nobody
Comment 12•10 years ago
|
||
Fixed by bug 1130860. I saw but somehow neglected this bug when filing bug 1130860. (In my defense, we had an almost correct patch for it in bug 1120169, so I didn't look too closely.)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•