Add a utility function to escape strings for use in regexes

RESOLVED FIXED

Status

Testing Graveyard
Mozmill
RESOLVED FIXED
6 years ago
8 months ago

People

(Reporter: geo, Assigned: lfresh, Mentored)

Tracking

Details

(Whiteboard: [lib][lang=js][good first bug][mozmill-2.1], URL)

Attachments

(1 attachment, 2 obsolete attachments)

Brought to mind when reviewing Bug 669728.

We should add the escape() function at the bottom of http://simonwillison.net/2006/Jan/20/escape/ to our utils library (or augment the RegExp object, as in the example below) for times when we want to include externally-generated strings in a regex.

RegExp.escape = function(text) {
    return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
}
What about a module for string operations? I don't think we should augment any global object (remember the discussion about Object), but a method would be kinda helpful. What do you think?
OS: Mac OS X → All
Hardware: x86 → All

Comment 2

6 years ago
A Pivotal Tracker story has been created for this Bug: https://www.pivotaltracker.com/story/show/15817983
Sure, makes sense to me.
Ok, should be able to get to it for the refactoring round of Milestone 2.
Blocks: 639545

Updated

6 years ago
Depends on: 686711
No longer depends on: 686711
Even with the refactoring project being dead, it might be good to get this added at some point.
QA Contact: hskupin
Whiteboard: [module-refactor] → [lib][mentor=whimboo][lang=js][good first bug]
Mentor: hskupin@gmail.com
Whiteboard: [lib][mentor=whimboo][lang=js][good first bug] → [lib][lang=js][good first bug]
(Assignee)

Comment 6

3 years ago
Created attachment 8496965 [details] [diff] [review]
Added regExpEscapeString helper to utils library

Attached a patch that adds regExpEscapeString helper to utils library

Exported the helper function to make escaping externally generated strings easier. I am a new contributor and this is my first patch submission. I look forward to any feedback.
Attachment #8496965 - Flags: review?(hskupin)
Hi Lisa. Thank you a lot for this patch. It's good to see that we got your very first submission. I will have a look at it in a bit. For now I will mark you as assignee to this bug, so everyone knows that you are working on it.
Assignee: nobody → bugz42
Status: NEW → ASSIGNED
So after the years I made an audit of the code, which we want to have here. Given one of the latest Mozmill releases we ship the assertions module enabled by default for tests now. See bug 999393. Originally we hit this regex problem in the findCallerFrame() method, which is part of the stack module. Both modules are part of Mozmill now, and we should better make use of it directly in the test framework. So I'm going to move this bug to the Mozmill component.

So what would have to be done here? In the following a guideline:

* Fix the nits, which I will mention in my review
* Add the method to the strings module at https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/stdlib/strings.js
* Update the findCallerFrame() method to make use of the new logic. Keep in mind that we have to retain '/(.*)-> /' because of the mangled paths by the securable-modules:
https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/stack.js#L22

When you setup Mozmill you can run the Mutt tests, especially this one which checks for the stack.

https://github.com/mozilla/mozmill/blob/master/mutt/mutt/tests/python/test_expect_stack.py

I'm still considering it as a good first bug. But if you have questions please let me know here or on IRC in the #automation channel. Thanks!
Component: Mozmill Tests → Mozmill
Product: Mozilla QA → Testing
Comment on attachment 8496965 [details] [diff] [review]
Added regExpEscapeString helper to utils library

Review of attachment 8496965 [details] [diff] [review]:
-----------------------------------------------------------------

It may have become more comments as I thought, but mostly those are nits. But the improvement we could do regarding performance, should be done.

Given that we have to move products, I flipped the review over to a feedback request. f=me.

::: lib/utils.js
@@ +424,5 @@
>  
>  /**
> + * Helper function for use when including externally generated strings
> + * in a regular expression.
> + *

nit: 'to use'

Maybe we should also add a @see link to the original post, which meanwhile has been move to http://blog.simonwillison.net/post/57956816139/escape.

@@ +426,5 @@
> + * Helper function for use when including externally generated strings
> + * in a regular expression.
> + *
> + * @param {String} aString
> + *        text to be escaped

`typeof "asd"` returns "string" as type. So please use a lowercase letter. Also the description should start with a capital letter.

@@ +429,5 @@
> + * @param {String} aString
> + *        text to be escaped
> + *
> + * @return {String}
> + *         Returns a new string where the specified values are escaped

I would say 'New string with metacharacters escaped'

@@ +431,5 @@
> + *
> + * @return {String}
> + *         Returns a new string where the specified values are escaped
> + */
> +function regExpEscapeString(aString) {

What about calling it simply ┬┤escape()┬┤?

@@ +432,5 @@
> + * @return {String}
> + *         Returns a new string where the specified values are escaped
> + */
> +function regExpEscapeString(aString) {
> +    return aString.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");

I cannot remember what the author of the post has used a couple of years ago, but checking the above URL I can see differences in the code. I kinda like the stashing of the temporary created RegEx instance on the function object. That way no new RegEx instance would have to be instantiated each time the method is used.
Attachment #8496965 - Flags: review?(hskupin) → feedback+
(Assignee)

Comment 10

3 years ago
In looking at the blog post mentioned in the previous comment, http://blog.simonwillison.net/post/57956816139/escape. The example makes use of arguments.callee and it looks like this attribute is going to be removed soon. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/arguments/callee. I'd like to use a traditional closure instead.
That would also work great. Good catch regarding the deprecation!
(Assignee)

Comment 12

3 years ago
Great! That being said, I am going to skip the "@see link" nit. 

I know we discussed this already but in reference to the function name escape(), is there the possibility of needing to escape any other kind of data (html, sql, etc,) in the future? If so then I think the name should be less general but if not then I will move forward with escape() as requested.
(Assignee)

Comment 13

3 years ago
> * Update the findCallerFrame() method to make use of the new logic. Keep in mind that we have to retain '/(.*)-> /' because of the mangled paths by the securable-modules:
> https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/stack.js#L22

I looked at this method and don't understand what in it would need to be escaped. Could you explain what needs escaping?
Sure. So what needs to be escaped is the string for the filename, which we prepare in lines 24 and 29. The securable module creates entries like "%path%/securable-modules.js -> %path%/modules/stack.js". To get the real file name we remove all content up to "-> ". So why this escape is necessary you can find in bug 669728 comment 2, which is referenced in comment 0. Means problems can occur with e.g. temp locations like:

"/var/folders/ur/urRSHOmhFJOHFVCrfZAVW++++TI/-Tmp-/tmpfgdil4.mozmill-tests
(Assignee)

Comment 15

3 years ago
When looking over bug 669728 I see that frame.filename.match was changed to frame.filename.indexOf. indexOf doesn't take a regex as an argument so if I escape the string for the filename I think it will break things.
Sure, so what we would need here is a separate line of code, which is doing the escaping. Or wrap the line into the escaping call if we don't exceed the 80 char limit.
(Assignee)

Comment 17

3 years ago
I don't know what I am supposed to be escaping in https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/stack.js#L22 with escapeRegEx() since there are no regular expressions passed in to this code anywhere.
May last answer was not that perfect. Sorry for that. So we can revert the change, which has been made on bug 669728. Now that we can escape the regex correctly, the match() function can be used again. Maybe it would be good to have a test, which tests such a temp path for getting the stack.
(Assignee)

Comment 19

3 years ago
Since we don't need to match any variable text, indexOf() is clearly the best function to use. I don't think bug 669728 should be reverted. If you are worried that the other function I wrote is no longer necessary, that's ok. I am more than happy to take on a different bug.
Please see comment 8 on this bug. I mentioned two locations where a regex can fail:

https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/stack.js#L24
https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/stack.js#L29

Indeed we want to keep indexOf() for the stripped filename path, but the call to replace() some lines earlier will fail in both cases when such a temporary path exists.

Comment 21

3 years ago
(In reply to Henrik Skupin (:whimboo) from comment #18)
> May last answer was not that perfect. Sorry for that.

Weelllll... technically, Lisa pointed out in comments 13, 15, and 17 that the bug was gone, but it doesn't seem that you got around to actually checking the code until comment 18. You apologized for phrasing something poorly; but what you actually need to apologize for is 1) ignoring what someone told you, correctly, directly and repeatedly; and 2) not recognizing back at comment 6 that the bug had been fixed since 2011. By you.

> So we can revert the
> change, which has been made on bug 669728. Now that we can escape the regex
> correctly, the match() function can be used again.

This doesn't make sense. Using indexOf is correct, simpler, and probably faster. (Was the intent "let's reintroduce the bug so Lisa can fix it"? Because that'd be pretty condescending.)

Ideally:

- You should admit that you misunderstood the code, despite Lisa repeatedly pointing out the correct reading to you, and apologize for ignoring what she was telling you.

- Close this bug WORKSFORME or DUPLICATE 669728 or whatever.

If you want the BLAH-> prefixes omitted from the stack trace, that should be a separate bug. That's not what findCallerFrame has ever done; it just returns a selected tail of the call stack.
Status: ASSIGNED → NEW
After my comment 20 Lisa and I had another chat on IRC, where we agreed on what you said above. None from us both actually added a comment on that bug. And I'm indeed sorry about that I wasn't looking closely enough at the code.

As what I mentioned, it would be nice to have the method added to the strings.js module in Mozmill, so we can make use of it when needed. But if that is not an option for the work as done by Lisa anymore, someone else might wanna finish it. 

> If you want the BLAH-> prefixes omitted from the stack trace, that should be
> a separate bug. That's not what findCallerFrame has ever done; it just
> returns a selected tail of the call stack.

Looks like another misunderstanding. I didn't expect to get this fixed by this bug. I will make sure to improve myself to give more clear answers.
(Assignee)

Comment 23

3 years ago
Created attachment 8503322 [details] [diff] [review]
671705_stringsjs.patch

Added escapeRegEx helper to strings.js
Attachment #8496965 - Attachment is obsolete: true
Attachment #8503322 - Flags: review?(hskupin)
Comment on attachment 8503322 [details] [diff] [review]
671705_stringsjs.patch

Review of attachment 8503322 [details] [diff] [review]:
-----------------------------------------------------------------

There are some nits which need to be fixed. Otherwise it looks great. Thanks Lisa. r=me with those changes done.

::: mozmill/mozmill/extension/resource/stdlib/strings.js
@@ +2,4 @@
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, you can obtain one at http://mozilla.org/MPL/2.0/. */
>  
> +var EXPORTED_SYMBOLS = ['trim', 'vslice', 'escapeRegEx'];

nit: Mind fixing the order by alphabet?

@@ +16,5 @@
>    return str.slice(sindex + 1, eindex);
>  }
> +
> +/**
> +* Helper function to use when including externally generated strings

nit: The stars starting from the 2nd line should have an indentation of 1. So they are below each other.

@@ +23,5 @@
> +* @param {string} aString
> +*        Text to be escaped.
> +*
> +* @return {String}
> +*         Returns a new string with metacharacters escaped.

For the format please see: https://code.google.com/p/jsdoc-toolkit/wiki/TagReturns. Also please use {string} with lowercase letters.

@@ +29,5 @@
> +
> +var sRE = /[-[\]{}()*+?.,\\^$|#\s]/g;
> +
> +function escapeRegEx(aString) {
> +    return aString.replace(sRE, "\\$&");

nit: there is still an indentation of 4 chars. Please reduce it to 2 and we are aligned.

I kinda like the change you did here, especially by using $&. I have never seen it before, so I had to look it up first. But it makes sense and saves us the extra parenthesis.
Attachment #8503322 - Flags: review?(hskupin) → review+
(Assignee)

Comment 25

3 years ago
Created attachment 8503879 [details] [diff] [review]
671705_stringsjs.patch

Fixed nits. Also moved var sRE.
Attachment #8503322 - Attachment is obsolete: true
Attachment #8503879 - Flags: review?(hskupin)
Comment on attachment 8503879 [details] [diff] [review]
671705_stringsjs.patch

Review of attachment 8503879 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine now! Thank you Lisa. I will get this landed in a bit.
Attachment #8503879 - Flags: review?(hskupin) → review+
Landed on master for the mozmill 2.1 release:
https://github.com/mozilla/mozmill/commit/ae8d7b65e64d68ba97e5f50ccf2093bc084edfeb

Lisa, you wanted to file the other bug regarding the filenames. Have you done this already? I ask, because I may have missed it.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Flags: needinfo?(bugz42)
Resolution: --- → FIXED
Whiteboard: [lib][lang=js][good first bug] → [lib][lang=js][good first bug][mozmill-2.1]
(Assignee)

Comment 28

3 years ago
(In reply to Henrik Skupin (:whimboo) from comment #27)
> Landed on master for the mozmill 2.1 release:
> https://github.com/mozilla/mozmill/commit/
> ae8d7b65e64d68ba97e5f50ccf2093bc084edfeb
> 
> Lisa, you wanted to file the other bug regarding the filenames. Have you
> done this already? I ask, because I may have missed it.

I haven't filed it yet. I was unsure of the wording.
Flags: needinfo?(bugz42)
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.