Bug 1588908 Comment 4 Edit History
Yeah, I quickly ran against both files on an indexer and we got the ["Invalid regexp literal"](https://github.com/mozsearch/mozsearch/blob/e97a1da2ad46733675cd25579dff30a50938cec8/tools/src/tokenize.rs#L430) error for both. I made the following mod to get context:

```diff
diff --git a/tools/src/tokenize.rs b/tools/src/tokenize.rs
index 6e78b34..e809309 100644
--- a/tools/src/tokenize.rs
+++ b/tools/src/tokenize.rs
@@ -427,7 +427,7 @@ pub fn tokenize_c_like(string: &str, spec: &LanguageSpec) -> Vec<Token> {
             } else if next == '\\' && peek_char() != '\n' {
                 get_char();
             } else if next == '\n' {
-                debug!("Invalid regexp literal");
+                debug!("Invalid regexp literal (pos {})", peek_pos());
                 return tokenize_plain(string);
             }
         }
```

Doing `head -c` with that shows it's indeed the arrow function that causes the error. Note that in test_json_updatecheck.js it's actually a slightly earlier line at https://searchfox.org/mozilla-central/rev/04d8e7629354bab9e6a285183e763410860c5006/toolkit/mozapps/extensions/test/xpcshell/test_json_updatecheck.js#311, but for comment 1 it is https://searchfox.org/mozilla-central/rev/04d8e7629354bab9e6a285183e763410860c5006/testing/mochitest/BrowserTestUtils/BrowserTestUtils.jsm#1370.

I assume that the logic at https://github.com/mozsearch/mozsearch/blob/e97a1da2ad46733675cd25579dff30a50938cec8/tools/src/tokenize.rs#L555 that sets `next_token_maybe_regexp_literal` is getting tricked because the list of okay characters does not include ">". I once nearly went mad on this problem space before (in order to tokenize JS you sort of need to parse JS), so my knee-jerk suggestion would be that maybe the "=" case wants lookahead to consume the ">"?

:kashav, do you want to try your hand at a fix (and risk madness? ;)? Emilio has a patch at https://github.com/mozsearch/mozsearch/pull/248 to use 18.04 and help avoid using VirtualBox, which might be helpful or not.
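To make the suggestion concrete, here is a minimal, hypothetical sketch (not the actual mozsearch code; the function name and shape are invented for illustration) of what "the `=` case wants lookahead to consume the `>`" could look like: when the tokenizer sees `=`, it peeks at the next character, and if that character is `>` it consumes it so the pair is treated as the arrow token `=>`, which does *not* put the tokenizer into the "a following `/` may start a regexp literal" state.

```rust
// Hypothetical sketch: decide whether a '/' following the token `prev`
// may start a regexp literal. `chars` is the lookahead stream positioned
// just after `prev`.
fn next_slash_may_be_regexp(
    prev: &str,
    chars: &mut std::iter::Peekable<impl Iterator<Item = char>>,
) -> bool {
    match prev {
        "=" => {
            // Lookahead: if the next char is '>', consume it so "=>"
            // is handled as a single arrow token. After an arrow, a
            // '/' should NOT be assumed to start a regexp literal,
            // which is what trips up the current heuristic on arrow
            // functions.
            if chars.peek() == Some(&'>') {
                chars.next();
                false
            } else {
                // Plain assignment: `x = /foo/` is a regexp literal.
                true
            }
        }
        // Other contexts where a regexp literal can follow, e.g.
        // `f(/re/)`, `[/re/]`, `return /re/;` (list abbreviated here).
        "(" | "," | "[" | ";" | "return" => true,
        _ => false,
    }
}
```

The point of doing the lookahead inside the `=` case (rather than adding `>` to the "okay characters" list) is that `>` on its own, as in `a > /b/`, is genuinely ambiguous, whereas the two-character sequence `=>` is unambiguous.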