Closed Bug 866367 Opened 11 years ago Closed 9 years ago

new RegExp('\n').toString() generates invalid RegularExpressionLiteral format string

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: utatane.tea, Unassigned)

References

Details

Attachments

(1 file, 1 obsolete file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31

Steps to reproduce:

Executes `new RegExp('\n').toString()`


Actual results:

The script returns `"/\n/"`


Expected results:

The script should return `"/\\n/"`
Previously I submitted the patch to JSC to solve the same problem and it is landed.
https://bugs.webkit.org/show_bug.cgi?id=71572
So I'll implement the same logic to SpiderMonkey
Attached patch rev1 patch (obsolete) — Splinter Review
Attachment #742677 - Flags: review?(sstangl)
In section 15.10.6.4,

> NOTE The returned String has the form of a RegularExpressionLiteral that evaluates to another RegExp object with the same behaviour as this object.
http://ecma-international.org/ecma-262/5.1/#sec-15.10.6.4

So returning "/\n/" is a spec violation.
Hi Yusuke, thank you for fixing this bug in Firefox!

It looks like it could be solved in a simpler manner, though, by modifying the RegExp source string at the time of object creation. There's already a function EscapeNakedForwardSlashes() in builtin/RegExp.cpp that performs a similar function -- would you be able to rewrite this function to similarly handle '\n' and the other LineTerminator characters?
Status: UNCONFIRMED → NEW
Ever confirmed: true
The issue of /\n/.source and /\n/.toString() semantics was brought up semi-recently on es-discuss:

https://mail.mozilla.org/pipermail/es-discuss/2013-March/029278.html

It'd be great if you spent the time to figure out steps/semantics that would always produce well-formed regular expression syntax, then tested them against implementations and proposed them there.  ;-)  If you don't have time to do this, that's understandable, but it'd definitely be helpful to have a clear description of the exact escaping behavior implemented, for reviewing purposes.  And it'd be really nice if the spec had clear, non-vague steps for all this, so that all implementations can get on the same page.
(In reply to Sean Stangl [:sstangl] from comment #4)

Thanks for your review!

> It looks like it could be solved in a simpler manner, though, by modifying
> the RegExp source string at the time of object creation. There's already a
> function EscapeNakedForwardSlashes() in builtin/RegExp.cpp that performs a
> similar function -- would you be able to rewrite this function to similarly
> handle '\n' and the other LineTerminator characters?

Oh, I've missed EscapeNakedForwardSlashes. Agreed. Rewriting EscapeNakedForwardSlashes is simpler.
I'll fix it and attach a revised patch.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #5)
> The issue of /\n/.source and /\n/.toString() semantics was brought up
> semi-recently on es-discuss:
> 
> https://mail.mozilla.org/pipermail/es-discuss/2013-March/029278.html
> 
> It'd be great if you spent the time to figure out steps/semantics that would
> always produce well-formed regular expression syntax, then tested them
> against implementations and proposed them there.  ;-) 

Agreed. Thanks for your suggestion. I'll investigate the steps/semantics and post it to es-discuss.
Attached patch rev2 patchSplinter Review
I've created rev2 patch.
Moving logic to EscapeNakedForwardSlashes.
Attachment #742677 - Attachment is obsolete: true
Attachment #742677 - Flags: review?(sstangl)
Attachment #746028 - Flags: review?(sstangl)
And above this patch also fixes an issue in regression 248444 fix.
RegExp("[^\/]+$").toString() was expected '/[^\\/]+$/'. But because '/' can be exists in brackets, '/[^/]+$/' is correct.
This patch covers this backslash problem at the same time :)
Comment on attachment 746028 [details] [diff] [review]
rev2 patch

Review of attachment 746028 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the patch! This looks good -- I just had some minor nits, and I did not understand the need for bracket-tracking logic. Adding an explanatory comment would be sufficient.

::: js/src/builtin/RegExp.cpp
@@ +170,5 @@
>  
>      return CreateRegExpMatchResult(cx, input, chars, length, matches, rval);
>  }
>  
> +static bool isLineTerminator(jschar ch) {

Static functions generally begin with a capital letter in our style: IsLineTerminator().

@@ +178,3 @@
>  /* Note: returns the original if no escaping need be performed. */
>  static JSAtom *
>  EscapeNakedForwardSlashes(JSContext *cx, HandleAtom unescaped)

This function needs a rename. Maybe EscapeNakedCharacters()?

@@ +191,5 @@
>  
>      for (const jschar *it = oldChars; it < oldChars + oldLen; ++it) {
> +        const jschar ch = *it;
> +        if (!previousCharacterWasBackslash) {
> +            if (inBrackets) {

I don't understand the need for logic surrounding the brackets. Why are forward slashes not escaped if they occur within brackets? A comment explaining the reasoning would be helpful for posterity (and to me!).

@@ +205,5 @@
> +                    }
> +                    if (!sb.append('\\'))
> +                        return NULL;
> +                } else if (ch == '[')
> +                    inBrackets = true;

Nit: SpiderMonkey style requires brackets on all arms of an if{} structure if one arm has brackets.

@@ +214,3 @@
>              if (sb.empty()) {
>                  /* This is the first one we've seen, copy everything up to this point. */
>                  if (!sb.reserve(oldLen + 1))

This isn't quite the right length to reserve in this case, but it may not matter.

@@ +219,5 @@
>              }
> +
> +            if (!previousCharacterWasBackslash)
> +                if (!sb.append('\\'))
> +                    return NULL;

The code from sb.empty() to here is roughly repeated twice in this function. Could the logic be moved out to a JS_ALWAYS_INLINE helper function?

@@ +221,5 @@
> +            if (!previousCharacterWasBackslash)
> +                if (!sb.append('\\'))
> +                    return NULL;
> +
> +            switch (ch) {

This switch statement could be better expressed by having cases like:

> char *expansion = NULL;
> switch(ch) {
>   case '\n': expansion = "n"; break;
>   ...
>   default: JS_NOT_REACHED("Unexpected line terminator");
> }
>
> if (!sb.append(expansion))...

which would permit a single callsite for .append() and reduce vertical code size.

@@ +245,5 @@
>                  return NULL;
>          }
>  
> +        if (previousCharacterWasBackslash)
> +            previousCharacterWasBackslash = false;

What happens if the previous character was a backslash, and the current character is also a backslash? Based on the old code working, it seems like the situation can't arise. Could we do |JS_ASSERT_IF(previousCharacterWasBackslash, ch != '\\')|?

@@ +247,5 @@
>  
> +        if (previousCharacterWasBackslash)
> +            previousCharacterWasBackslash = false;
> +        else
> +            previousCharacterWasBackslash = ch == '\\';

As a very minor style nit, it would read more nicely with parentheses around "ch == '\\'".
Attachment #746028 - Flags: review?(sstangl)
(In reply to Sean Stangl [:sstangl] from comment #9)

> I don't understand the need for logic surrounding the brackets. Why are forward slashes not escaped if they occur within brackets? A comment explaining the reasoning would be helpful for posterity (and to me!).

See ES5 section 15.10.2.17: http://es5.github.io/#x15.10.2.17

RegExp character classes can contain any character unescaped except for a right square bracket (`]`).
(In reply to Sean Stangl [:sstangl] from comment #9)

Thank you for your review. I'll attach a revised patch.

> Comment on attachment 746028 [details] [diff] [review]
> rev2 patch
> 
> Review of attachment 746028 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks for the patch! This looks good -- I just had some minor nits, and I
> did not understand the need for bracket-tracking logic. Adding an
> explanatory comment would be sufficient.
> 
> ::: js/src/builtin/RegExp.cpp
> @@ +170,5 @@
> >  
> >      return CreateRegExpMatchResult(cx, input, chars, length, matches, rval);
> >  }
> >  
> > +static bool isLineTerminator(jschar ch) {
> 
> Static functions generally begin with a capital letter in our style:
> IsLineTerminator().

OK, I've got it. I'll change it.
Is there Coding style guide about SpiderMonkey? I want to read it.

> @@ +178,3 @@
> >  /* Note: returns the original if no escaping need be performed. */
> >  static JSAtom *
> >  EscapeNakedForwardSlashes(JSContext *cx, HandleAtom unescaped)
> 
> This function needs a rename. Maybe EscapeNakedCharacters()?

Right. I'll rename it.

> @@ +191,5 @@
> >  
> >      for (const jschar *it = oldChars; it < oldChars + oldLen; ++it) {
> > +        const jschar ch = *it;
> > +        if (!previousCharacterWasBackslash) {
> > +            if (inBrackets) {
> 
> I don't understand the need for logic surrounding the brackets. Why are
> forward slashes not escaped if they occur within brackets? A comment
> explaining the reasoning would be helpful for posterity (and to me!).

As Michael wrote, this is because RegExp character classes allow any character except for ].
So "[/]" should not be escaped. This is acceptable RegExp literal.

> @@ +205,5 @@
> > +                    }
> > +                    if (!sb.append('\\'))
> > +                        return NULL;
> > +                } else if (ch == '[')
> > +                    inBrackets = true;
> 
> Nit: SpiderMonkey style requires brackets on all arms of an if{} structure
> if one arm has brackets.

I see. I'll fix it.

> @@ +214,3 @@
> >              if (sb.empty()) {
> >                  /* This is the first one we've seen, copy everything up to this point. */
> >                  if (!sb.reserve(oldLen + 1))
> 
> This isn't quite the right length to reserve in this case, but it may not
> matter.
> 
> @@ +219,5 @@
> >              }
> > +
> > +            if (!previousCharacterWasBackslash)
> > +                if (!sb.append('\\'))
> > +                    return NULL;
> 
> The code from sb.empty() to here is roughly repeated twice in this function.
> Could the logic be moved out to a JS_ALWAYS_INLINE helper function?

I've got it. I'll create a helper function and move these logic to it.

> @@ +221,5 @@
> > +            if (!previousCharacterWasBackslash)
> > +                if (!sb.append('\\'))
> > +                    return NULL;
> > +
> > +            switch (ch) {
> 
> This switch statement could be better expressed by having cases like:
> 
> > char *expansion = NULL;
> > switch(ch) {
> >   case '\n': expansion = "n"; break;
> >   ...
> >   default: JS_NOT_REACHED("Unexpected line terminator");
> > }
> >
> > if (!sb.append(expansion))...
> 
> which would permit a single callsite for .append() and reduce vertical code
> size.

You're right. I'll change it.

> @@ +245,5 @@
> >                  return NULL;
> >          }
> >  
> > +        if (previousCharacterWasBackslash)
> > +            previousCharacterWasBackslash = false;
> 
> What happens if the previous character was a backslash, and the current
> character is also a backslash? Based on the old code working, it seems like
> the situation can't arise. Could we do
> |JS_ASSERT_IF(previousCharacterWasBackslash, ch != '\\')|?

When `new RegExp('\\\\')` is executed (2 backslashes), previousCharacterWasBackslash = true & ch = '\\' at 2nd position.
In current code, when it is executed, code produces '\\\\' (2 backslashes) string. (RegExp Literal, /\\/)

> @@ +247,5 @@
> >  
> > +        if (previousCharacterWasBackslash)
> > +            previousCharacterWasBackslash = false;
> > +        else
> > +            previousCharacterWasBackslash = ch == '\\';
> 
> As a very minor style nit, it would read more nicely with parentheses around
> "ch == '\\'".

OK, I'll rewirte it to `(ch == '\\')`
Assignee: general → nobody
Depends on: 1130860
Fixed by bug 1130860. I saw but somehow neglected this bug when filing bug 1130860. (In my defense, we had an almost correct patch for it in bug 1120169, so I didn't look too closely.)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: