Last Comment Bug 671705 - Add a utility function to escape strings for use in regexes
: Add a utility function to escape strings for use in regexes
Status: RESOLVED FIXED
[lib][lang=js][good first bug][mozmil...
:
Product: Testing Graveyard
Classification: Graveyard
Component: Mozmill (show other bugs)
: unspecified
: All All
-- normal
: ---
Assigned To: Lisa Hewus Fresh [:lfresh]
: Henrik Skupin (:whimboo)
:
Mentors: Henrik Skupin (:whimboo)
https://wiki.mozilla.org/Auto-tools/P...
Depends on:
Blocks: 639545
  Show dependency treegraph
 
Reported: 2011-07-14 17:01 PDT by Geo Mealer [:geo] -- This account is inactive after 2015-07-07
Modified: 2016-08-24 09:02 PDT (History)
3 users (show)
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Added regExpEscapeString helper to utils library (1.01 KB, patch)
2014-09-29 10:51 PDT, Lisa Hewus Fresh [:lfresh]
hskupin: feedback+
Details | Diff | Splinter Review
671705_stringsjs.patch (1.12 KB, patch)
2014-10-10 11:14 PDT, Lisa Hewus Fresh [:lfresh]
hskupin: review+
Details | Diff | Splinter Review
671705_stringsjs.patch (1.04 KB, patch)
2014-10-12 20:28 PDT, Lisa Hewus Fresh [:lfresh]
hskupin: review+
Details | Diff | Splinter Review

Description User image Geo Mealer [:geo] -- This account is inactive after 2015-07-07 2011-07-14 17:01:39 PDT
Brought to mind when reviewing Bug 669728.

We should add the escape() function at the bottom of http://simonwillison.net/2006/Jan/20/escape/ to our utils library (or augment the RegExp object, as in the example below) for times when we want to include externally-generated strings in a regex.

RegExp.escape = function(text) {
    return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
}
Comment 1 User image Henrik Skupin (:whimboo) 2011-07-15 04:24:11 PDT
What about a module for string operations? I don't think we should augment any global object (remember the discussion about Object), but a method would be kinda helpful. What do you think?
Comment 2 User image u415141 2011-07-15 06:25:15 PDT
A Pivotal Tracker story has been created for this Bug: https://www.pivotaltracker.com/story/show/15817983
Comment 3 User image Geo Mealer [:geo] -- This account is inactive after 2015-07-07 2011-07-15 12:46:50 PDT
Sure, makes sense to me.
Comment 4 User image Henrik Skupin (:whimboo) 2011-07-18 03:03:38 PDT
Ok, should be able to get to it for the refactoring round of Milestone 2.
Comment 5 User image Henrik Skupin (:whimboo) 2013-06-04 21:24:58 PDT
Even with the refactoring project being dead, it might be good to get this added at some point.
Comment 6 User image Lisa Hewus Fresh [:lfresh] 2014-09-29 10:51:16 PDT
Created attachment 8496965 [details] [diff] [review]
Added regExpEscapeString helper to utils library

Attached a patch that adds regExpEscapeString helper to utils library

Exported the helper function to make escaping externally generated strings easier. I am a new contributor and this is my first patch submission. I look forward to any feedback.
Comment 7 User image Henrik Skupin (:whimboo) 2014-09-30 00:50:32 PDT
Hi Lisa. Thank you a lot for this patch. It's good to see that we got your very first submission. I will have a look at it in a bit. For now I will mark you as assignee to this bug, so everyone knows that you are working on it.
Comment 8 User image Henrik Skupin (:whimboo) 2014-09-30 01:08:11 PDT
So after the years I made an audit of the code, which we want to have here. Given one of the latest Mozmill releases we ship the assertions module enabled by default for tests now. See bug 999393. Originally we hit this regex problem in the findCallerFrame() method, which is part of the stack module. Both modules are part of Mozmill now, and we should better make use of it directly in the test framework. So I'm going to move this bug to the Mozmill component.

So what would have to be done here? In the following a guideline:

* Fix the nits, which I will mention in my review
* Add the method to the strings module at https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/stdlib/strings.js
* Update the findCallerFrame() method to make use of the new logic. Keep in mind that we have to retain '/(.*)-> /' because of the mangled paths by the securable-modules:
https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/stack.js#L22

When you setup Mozmill you can run the Mutt tests, especially this one which checks for the stack.

https://github.com/mozilla/mozmill/blob/master/mutt/mutt/tests/python/test_expect_stack.py

I'm still considering it as a good first bug. But if you have questions please let me know here or on IRC in the #automation channel. Thanks!
Comment 9 User image Henrik Skupin (:whimboo) 2014-09-30 01:24:45 PDT
Comment on attachment 8496965 [details] [diff] [review]
Added regExpEscapeString helper to utils library

Review of attachment 8496965 [details] [diff] [review]:
-----------------------------------------------------------------

It may have become more comments as I thought, but mostly those are nits. But the improvement we could do regarding performance, should be done.

Given that we have to move products, I flipped the review over to a feedback request. f=me.

::: lib/utils.js
@@ +424,5 @@
>  
>  /**
> + * Helper function for use when including externally generated strings
> + * in a regular expression.
> + *

nit: 'to use'

Maybe we should also add a @see link to the original post, which meanwhile has been move to http://blog.simonwillison.net/post/57956816139/escape.

@@ +426,5 @@
> + * Helper function for use when including externally generated strings
> + * in a regular expression.
> + *
> + * @param {String} aString
> + *        text to be escaped

`typeof "asd"` returns "string" as type. So please use a lowercase letter. Also the description should start with a capital letter.

@@ +429,5 @@
> + * @param {String} aString
> + *        text to be escaped
> + *
> + * @return {String}
> + *         Returns a new string where the specified values are escaped

I would say 'New string with metacharacters escaped'

@@ +431,5 @@
> + *
> + * @return {String}
> + *         Returns a new string where the specified values are escaped
> + */
> +function regExpEscapeString(aString) {

What about calling it simply ´escape()´?

@@ +432,5 @@
> + * @return {String}
> + *         Returns a new string where the specified values are escaped
> + */
> +function regExpEscapeString(aString) {
> +    return aString.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");

I cannot remember what the author of the post has used a couple of years ago, but checking the above URL I can see differences in the code. I kinda like the stashing of the temporary created RegEx instance on the function object. That way no new RegEx instance would have to be instantiated each time the method is used.
Comment 10 User image Lisa Hewus Fresh [:lfresh] 2014-10-01 11:34:09 PDT
In looking at the blog post mentioned in the previous comment, http://blog.simonwillison.net/post/57956816139/escape. The example makes use of arguments.callee and it looks like this attribute is going to be removed soon. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/arguments/callee. I'd like to use a traditional closure instead.
Comment 11 User image Henrik Skupin (:whimboo) 2014-10-01 13:22:29 PDT
That would also work great. Good catch regarding the deprecation!
Comment 12 User image Lisa Hewus Fresh [:lfresh] 2014-10-01 13:57:40 PDT
Great! That being said, I am going to skip the "@see link" nit. 

I know we discussed this already but in reference to the function name escape(), is there the possibility of needing to escape any other kind of data (html, sql, etc,) in the future? If so then I think the name should be less general but if not then I will move forward with escape() as requested.
Comment 13 User image Lisa Hewus Fresh [:lfresh] 2014-10-01 20:16:42 PDT
> * Update the findCallerFrame() method to make use of the new logic. Keep in mind that we have to retain '/(.*)-> /' because of the mangled paths by the securable-modules:
> https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/stack.js#L22

I looked at this method and don't understand what in it would need to be escaped. Could you explain what needs escaping?
Comment 14 User image Henrik Skupin (:whimboo) 2014-10-02 00:13:15 PDT
Sure. So what needs to be escaped is the string for the filename, which we prepare in lines 24 and 29. The securable module creates entries like "%path%/securable-modules.js -> %path%/modules/stack.js". To get the real file name we remove all content up to "-> ". So why this escape is necessary you can find in bug 669728 comment 2, which is referenced in comment 0. Means problems can occur with e.g. temp locations like:

"/var/folders/ur/urRSHOmhFJOHFVCrfZAVW++++TI/-Tmp-/tmpfgdil4.mozmill-tests
Comment 15 User image Lisa Hewus Fresh [:lfresh] 2014-10-02 13:04:50 PDT
When looking over bug 669728 I see that frame.filename.match was changed to frame.filename.indexOf. indexOf doesn't take a regex as an argument so if I escape the string for the filename I think it will break things.
Comment 16 User image Henrik Skupin (:whimboo) 2014-10-06 08:33:42 PDT
Sure, so what we would need here is a separate line of code, which is doing the escaping. Or wrap the line into the escaping call if we don't exceed the 80 char limit.
Comment 17 User image Lisa Hewus Fresh [:lfresh] 2014-10-06 14:32:30 PDT
I don't know what I am supposed to be escaping in https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/stack.js#L22 with escapeRegEx() since there are no regular expressions passed in to this code anywhere.
Comment 18 User image Henrik Skupin (:whimboo) 2014-10-06 14:49:55 PDT
May last answer was not that perfect. Sorry for that. So we can revert the change, which has been made on bug 669728. Now that we can escape the regex correctly, the match() function can be used again. Maybe it would be good to have a test, which tests such a temp path for getting the stack.
Comment 19 User image Lisa Hewus Fresh [:lfresh] 2014-10-06 15:28:36 PDT
Since we don't need to match any variable text, indexOf() is clearly the best function to use. I don't think bug 669728 should be reverted. If you are worried that the other function I wrote is no longer necessary, that's ok. I am more than happy to take on a different bug.
Comment 20 User image Henrik Skupin (:whimboo) 2014-10-06 15:58:09 PDT
Please see comment 8 on this bug. I mentioned two locations where a regex can fail:

https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/stack.js#L24
https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/stack.js#L29

Indeed we want to keep indexOf() for the stripped filename path, but the call to replace() some lines earlier will fail in both cases when such a temporary path exists.
Comment 21 User image Jim Blandy :jimb 2014-10-09 14:11:50 PDT
(In reply to Henrik Skupin (:whimboo) from comment #18)
> May last answer was not that perfect. Sorry for that.

Weelllll... technically, Lisa pointed out in comments 13, 15, and 17 that the bug was gone, but it doesn't seem that you got around to actually checking the code until comment 18. You apologized for phrasing something poorly; but what you actually need to apologize for is 1) ignoring what someone told you, correctly, directly and repeatedly; and 2) not recognizing back at comment 6 that the bug had been fixed since 2011. By you.

> So we can revert the
> change, which has been made on bug 669728. Now that we can escape the regex
> correctly, the match() function can be used again.

This doesn't make sense. Using indexOf is correct, simpler, and probably faster. (Was the intent "let's reintroduce the bug so Lisa can fix it"? Because that'd be pretty condescending.)

Ideally:

- You should admit that you misunderstood the code, despite Lisa repeatedly pointing out the correct reading to you, and apologize for ignoring what she was telling you.

- Close this bug WORKSFORME or DUPLICATE 669728 or whatever.

If you want the BLAH-> prefixes omitted from the stack trace, that should be a separate bug. That's not what findCallerFrame has ever done; it just returns a selected tail of the call stack.
Comment 22 User image Henrik Skupin (:whimboo) 2014-10-10 00:12:02 PDT
After my comment 20 Lisa and I had another chat on IRC, where we agreed on what you said above. None from us both actually added a comment on that bug. And I'm indeed sorry about that I wasn't looking closely enough at the code.

As what I mentioned, it would be nice to have the method added to the strings.js module in Mozmill, so we can make use of it when needed. But if that is not an option for the work as done by Lisa anymore, someone else might wanna finish it. 

> If you want the BLAH-> prefixes omitted from the stack trace, that should be
> a separate bug. That's not what findCallerFrame has ever done; it just
> returns a selected tail of the call stack.

Looks like another misunderstanding. I didn't expect to get this fixed by this bug. I will make sure to improve myself to give more clear answers.
Comment 23 User image Lisa Hewus Fresh [:lfresh] 2014-10-10 11:14:25 PDT
Created attachment 8503322 [details] [diff] [review]
671705_stringsjs.patch

Added escapeRegEx helper to strings.js
Comment 24 User image Henrik Skupin (:whimboo) 2014-10-12 18:55:03 PDT
Comment on attachment 8503322 [details] [diff] [review]
671705_stringsjs.patch

Review of attachment 8503322 [details] [diff] [review]:
-----------------------------------------------------------------

There are some nits which need to be fixed. Otherwise it looks great. Thanks Lisa. r=me with those changes done.

::: mozmill/mozmill/extension/resource/stdlib/strings.js
@@ +2,4 @@
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, you can obtain one at http://mozilla.org/MPL/2.0/. */
>  
> +var EXPORTED_SYMBOLS = ['trim', 'vslice', 'escapeRegEx'];

nit: Mind fixing the order by alphabet?

@@ +16,5 @@
>    return str.slice(sindex + 1, eindex);
>  }
> +
> +/**
> +* Helper function to use when including externally generated strings

nit: The stars starting from the 2nd line should have an indentation of 1. So they are below each other.

@@ +23,5 @@
> +* @param {string} aString
> +*        Text to be escaped.
> +*
> +* @return {String}
> +*         Returns a new string with metacharacters escaped.

For the format please see: https://code.google.com/p/jsdoc-toolkit/wiki/TagReturns. Also please use {string} with lowercase letters.

@@ +29,5 @@
> +
> +var sRE = /[-[\]{}()*+?.,\\^$|#\s]/g;
> +
> +function escapeRegEx(aString) {
> +    return aString.replace(sRE, "\\$&");

nit: there is still an indentation of 4 chars. Please reduce it to 2 and we are aligned.

I kinda like the change you did here, especially by using $&. I have never seen it before, so I had to look it up first. But it makes sense and saves us the extra parenthesis.
Comment 25 User image Lisa Hewus Fresh [:lfresh] 2014-10-12 20:28:10 PDT
Created attachment 8503879 [details] [diff] [review]
671705_stringsjs.patch

Fixed nits. Also moved var sRE.
Comment 26 User image Henrik Skupin (:whimboo) 2014-10-13 14:05:20 PDT
Comment on attachment 8503879 [details] [diff] [review]
671705_stringsjs.patch

Review of attachment 8503879 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine now! Thank you Lisa. I will get this landed in a bit.
Comment 27 User image Henrik Skupin (:whimboo) 2014-10-13 14:09:09 PDT
Landed on master for the mozmill 2.1 release:
https://github.com/mozilla/mozmill/commit/ae8d7b65e64d68ba97e5f50ccf2093bc084edfeb

Lisa, you wanted to file the other bug regarding the filenames. Have you done this already? I ask, because I may have missed it.
Comment 28 User image Lisa Hewus Fresh [:lfresh] 2014-10-13 14:37:01 PDT
(In reply to Henrik Skupin (:whimboo) from comment #27)
> Landed on master for the mozmill 2.1 release:
> https://github.com/mozilla/mozmill/commit/
> ae8d7b65e64d68ba97e5f50ccf2093bc084edfeb
> 
> Lisa, you wanted to file the other bug regarding the filenames. Have you
> done this already? I ask, because I may have missed it.

I haven't filed it yet. I was unsure of the wording.

Note You need to log in before you can comment on or make changes to this bug.