Extending RegEx search ("regexp") support to lookaround assertions and the like (unsupported by the re2 engine that livegrep uses)
Categories
(Webtools :: Searchfox, enhancement)
Tracking
(Not tracked)
People
(Reporter: masterquestionable, Unassigned)
References
Details
/Mozilla(?!\w)/i:
https://searchfox.org/mozilla-central/search?regexp=true&q=Mozilla(?!%5Cw)
Result: "No results for current query.", which appears to be erroneous.
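For illustration, the semantics the query above is expected to have can be checked with an engine that does support lookahead. The sketch below uses Python's `re` module purely as a stand-in (it is not what Searchfox runs), and the sample lines are made up. It also suggests that, for this specific pattern, `(?!\w)` placed directly after a word character behaves like the word boundary `\b`, which the re2 syntax page lists as supported.

```python
import re

# Made-up sample lines; not taken from mozilla-central.
samples = ["Mozilla Firefox", "MozillaFoo", "the mozilla.org site", "unrelated"]

# re.ASCII keeps \w and \b ASCII-only, mirroring re2's definition of \w.
lookahead = re.compile(r"Mozilla(?!\w)", re.IGNORECASE | re.ASCII)
boundary = re.compile(r"Mozilla\b", re.IGNORECASE | re.ASCII)


def matching(regex, lines):
    """Return the lines in which the compiled regex matches somewhere."""
    return [line for line in lines if regex.search(line)]


lookahead_hits = matching(lookahead, samples)
boundary_hits = matching(boundary, samples)
print(lookahead_hits)
print(boundary_hits)
```

The `\b` rewrite only covers this one shape of pattern (an assertion about `\w` right after a word character); it is not a general replacement for lookaround.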
Comment 1 • 1 year ago
Fulltext search currently supports only what libre2 supports, as documented at https://github.com/google/re2/wiki/syntax and linked from the text "regular expression matching" in the (limited) docs at https://searchfox.org/ (with many thanks to a contributor for providing that link).
I've converted this into an enhancement request since it doesn't seem like we already had one tracking this, although it is a known limitation. I think I've also proposed in various places that we could use the "query" pipeline mechanism to convert the more expensive regexps that libre2 does not support into cheaper regexps it can support, then perform a post-filtering pass using a more full-featured regex library. (In particular, the regex crate says it does not support look-around or backreferences, so presumably we would need to identify a more powerful regex crate.)
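A minimal sketch of that two-stage idea, under loud assumptions: Python's `re` stands in for both the libre2-backed search and the fuller post-filter engine, and the helper names (`relax`, `search_lines`, and the two private helpers) are hypothetical, not Searchfox code. The relaxation here simply drops lookaround groups, which can only widen the set of matching lines for patterns without backreferences or conditionals. It does not handle a quantifier applied directly to a lookaround group.

```python
import re

LOOKAROUNDS = ("(?=", "(?!", "(?<=", "(?<!")


def _token_end(p, i):
    """Index just past the token at p[i]: escape pair, character class, or one char."""
    if p[i] == "\\":
        return min(i + 2, len(p))
    if p[i] == "[":
        j = i + 1
        if j < len(p) and p[j] == "^":
            j += 1
        if j < len(p) and p[j] == "]":  # a leading ']' is a literal
            j += 1
        while j < len(p) and p[j] != "]":
            j += 2 if p[j] == "\\" else 1
        return min(j + 1, len(p))
    return i + 1


def _group_end(p, i):
    """p[i] is '('; return the index just past its matching ')'."""
    depth = 0
    while i < len(p):
        if p[i] == "(":
            depth += 1
        elif p[i] == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i = _token_end(p, i)
    raise ValueError("unbalanced parentheses")


def relax(pattern):
    """Drop every lookaround group, leaving a pattern that matches a superset."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith(LOOKAROUNDS, i):
            i = _group_end(pattern, i)  # skip the whole assertion
        else:
            j = _token_end(pattern, i)
            out.append(pattern[i:j])
            i = j
    return "".join(out)


def search_lines(lines, pattern, flags=0):
    """Stage 1: cheap relaxed regex. Stage 2: post-filter with the full regex."""
    coarse = re.compile(relax(pattern), flags)  # stand-in for the libre2 query
    full = re.compile(pattern, flags)           # stand-in for a fuller engine
    candidates = [line for line in lines if coarse.search(line)]
    return [line for line in candidates if full.search(line)]


lines = ["Mozilla Firefox", "MozillaFoo", "see mozilla.", "nothing here"]
relaxed = relax(r"Mozilla(?!\w)")
hits = search_lines(lines, r"Mozilla(?!\w)", re.IGNORECASE)
print(relaxed, hits)
```

The post-filter cost is bounded by how selective the relaxed pattern is: a pattern that consists only of a lookaround relaxes to the empty regex, which matches every line.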
I suppose another stop-gap possibility would be to leverage client-side post-filtering. That might be an option for the "search" endpoint (router.py) which does not have any pipeline capabilities.
Comment 2 (Reporter) • 1 year ago
I originally thought Searchfox used an implementation similar to Bugzilla's (which uses Perl).
I didn't notice it's Rust-based.
Offloading the filtering to the client side would probably cause significant bandwidth load, probably even worse than the potential ReDoS.
Bugzilla already supports similar regexes though, so it is probably susceptible to the same problems.
Comment 3 (Reporter) • 1 year ago
Potential workaround:
https://github.com/curl/curl/discussions/12397#discussioncomment-9603190