Closed Bug 1130860 Opened 9 years ago Closed 9 years ago

Implement all of EscapeRegExpPattern instead of just escaping forward slashes

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox38 --- fixed

People

(Reporter: till, Assigned: arai)

References

(Blocks 1 open bug, )

Details

(Keywords: dev-doc-complete)

Attachments

(1 file)

(In reply to Tooru Fujisawa [:arai] from bug 1120169 comment #3)
> Created attachment 8560944 [details] [diff] [review]
> Implement RegExp.prototype.{global, ignoreCase, multiline, source, sticky,
> unicode}.
> 
> Thank you for reviewing :)
> 
> This patch implements flags, and simplified source accessors without any
> change to its return value.
> 
> (In reply to Till Schneidereit [:till] from comment #2)
> > That third one, however, isn't required or even correct: we already escape
> > the result of .source in the right way. You can test this by doing something
> > like the following:
> > var re1 = new RegExp("\r\n\u2028\u2029");
> > var re2 = new RegExp(re1.source);
> > print(re1.source === re2.source);
> > 
> > This prints `true` with the current code, but would print `false` with your
> > change. Also, new RegExp(/\r\n\u2028\u2029/).source gives the result you
> > expect, with double-\\s.
> 
> With my last patch, it also prints true.
> 
> Then, in EscapeRegExpPattern's description
> > 2. The code points / or any LineTerminator occurring in the pattern shall be
> > escaped in S as necessary to ensure that the String value formed by
> > concatenating the Strings "/", S, "/", and F can be parsed (in an appropriate
> > lexical context) as a RegularExpressionLiteral that behaves identically to
> > the constructed regular expression.
> Sorry If I'm wrong, I think
> > concatenating the Strings "/", S, "/", and F
> means
>   "/" + re1.source + "/" + re1.flags
> and
> > can be parsed as a RegularExpressionLiteral
> means
>   eval("/" + re1.source + "/" + re1.flags)
> shouldn't throw SyntaxError, also, following should print true.
>   print(eval("/" + re1.source + "/" + re1.flags).source === re1.source)
> 
> Currently re1.source returns raw LineTerminators, and it cannot be written
> in RegularExpressionLiteral, am I correct?
> 
> I found that bug 866367 is related to this.

Yeah, you're completely right about all of this. The two things that threw me off are the way we print strings in the shell - escaped - and the fact that we already do escape forward slashes. Sorry for the confusion.

So, I think we should implement this not by changing what the .source getter does, but what CompileRegExpObject does. Specifically, rename EscapeNakedForwardSlashes to EscapeRegExpPattern and have it escape all the required tokens.

I assigned this bug to you, but obviously feel free to unassign yourself if you don't want to work on it.
Hmm, after reading [1] I'm not so sure about my suggested plan. Another option would be move all escaping into the .source getter instead, including the forward slash. There might be some subtleties around that somewhere, but I couldn't find any.

[1] https://code.google.com/p/v8/issues/detail?id=1813#c4
Blocks: es6
Keywords: dev-doc-needed
Renamed EscapeNakedForwardSlashes to EscapeRegExpPattern and moved it to source getter and RegExpObject::toString.

Also, as mentioned in bug 866367, avoid escaping slash inside bracket.

Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f06f8e6fb8e2
Attachment #8561222 - Flags: review?(till)
Comment on attachment 8561222 [details] [diff] [review]
Implement all of EscapeRegExpPattern instead of just escaping forward slashes.

Review of attachment 8561222 [details] [diff] [review]:
-----------------------------------------------------------------

Very nice, thanks!

r=me with nits addressed.

::: js/src/builtin/RegExp.cpp
@@ +448,2 @@
>      /* Step 7. */
> +    RootedAtom str(cx, EscapeRegExpPattern(cx, src, flags));

We don't need to atomize the string here anymore. That was required in the old implementation because the result was stored on the regexp, but here we just return it to script, so we can use a RootedString instead.

::: js/src/tests/ecma_6/RegExp/escape.js
@@ +1,1 @@
> +var BUGNUMBER = 866367;

Hum, I didn't look at the other bug closely enough. I think this should mention this bug number instead. Given that we have a landable patch here, bug 866367 is pretty much a dupe, but we can just resolve it as fixed once this one is fixed.

::: js/src/vm/RegExpObject.cpp
@@ +367,5 @@
>      self->setSticky(flags & StickyFlag);
>      return true;
>  }
>  
> +MOZ_ALWAYS_INLINE bool

All the new functions here except for EscapeRegExpPattern should be static.

@@ +427,5 @@
> +template <typename CharT>
> +MOZ_ALWAYS_INLINE bool
> +SetupBuffer(StringBuffer &sb, const CharT *oldChars, size_t oldLen, const CharT *it)
> +{
> +    // This is the first one we've seen, copy everything up to this point.

Move this comment to where `SetupBuffer` is called, and change it to "This is the first char we've seen that needs escaping, copy everything up to this point."

@@ +519,5 @@
> +    if (src->length() == 0)
> +        return cx->names().emptyRegExp;
> +
> +    // Steps 2-3.
> +    return EscapeNakedCharacters(cx, src);

Inline EscapeNakedCharacters into this function, and change the name of the remaining EscapeNakedCharacters to also be EscapeRegExpPattern.

@@ +531,5 @@
> +    RootedAtom src(cx, getSource());
> +    if (!src)
> +        return nullptr;
> +    RegExpFlag flags = getFlags();
> +    RootedAtom escapedSrc(cx, EscapeRegExpPattern(cx, src, flags));

No need to get the flags and pass them in until we actually use them. In fact, this will probably cause a compiler warning right now.
Attachment #8561222 - Flags: review?(till) → review+
https://hg.mozilla.org/mozilla-central/rev/18240dad751f
https://hg.mozilla.org/mozilla-central/rev/87d812111396
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
You need to log in before you can comment on or make changes to this bug.