Open Bug 1877439 Opened 1 year ago Updated 1 year ago

Extending RegEx search ("regexp") support for lookaround assertions alike (unsupported by livegrep's underlying use of the re2 engine)

Categories

(Webtools :: Searchfox, enhancement)

Tracking

(Not tracked)

People

(Reporter: masterquestionable, Unassigned)

    Query: /Mozilla(?!\w)/i
    https://searchfox.org/mozilla-central/search?regexp=true&q=Mozilla(?!%5Cw)

    Result: "No results for current query.", which appears to be erroneous.

Fulltext search currently only supports what libre2 supports as documented at https://github.com/google/re2/wiki/syntax and linked from the text "regular expression matching" from the (limited) docs at https://searchfox.org/ (with many thanks to a contributor for providing that link).
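As a possible workaround for this particular query, the negative lookahead `(?!\w)` after a word character can be expressed with `\b`, which the RE2 syntax page lists as an ASCII word boundary. The sketch below uses Python's `re` module (not RE2 itself) with `re.ASCII` as a stand-in to check that the two patterns agree on a few made-up sample lines; the sample strings and the helper name `patterns_agree` are illustrative assumptions, not anything from Searchfox.

```python
import re

# Python's `re` is used here only as a stand-in; re.ASCII approximates
# RE2's ASCII-only definition of \w and \b.
FLAGS = re.IGNORECASE | re.ASCII

lookahead = re.compile(r"Mozilla(?!\w)", FLAGS)  # not accepted by RE2
boundary = re.compile(r"Mozilla\b", FLAGS)       # uses only RE2-listed syntax

# Hypothetical sample lines, chosen to cover both outcomes.
SAMPLES = [
    "Mozilla Public License",
    "MozillaFoo",
    "see mozilla.org",
    "Mozilla_internal",
    "ends with Mozilla",
]


def patterns_agree(lines):
    """Return True if both patterns accept exactly the same lines."""
    return all(
        bool(lookahead.search(line)) == bool(boundary.search(line))
        for line in lines
    )


if __name__ == "__main__":
    print(patterns_agree(SAMPLES))
```

This rewrite only works because the text before the lookahead ends in a word character; general lookaround assertions have no such direct equivalent.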

I've converted this into an enhancement request since it doesn't seem like we had an enhancement request tracking this already, although it is a known limitation. I think I've also proposed in various places that we could use the "query" pipeline mechanism to convert the more expensive, unsupported-by-libre2 regexps into cheaper regexps it can support, then perform a post-filtering pass using a more full-featured regex library. (In particular, the Rust regex crate says it does not support look-around or backreferences, so presumably we would need to identify a more powerful regex crate.)
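The relax-then-post-filter idea could be sketched roughly as follows. This is a minimal illustration in Python rather than the actual Searchfox pipeline: `strip_lookarounds` and `search_lines` are hypothetical names, Python's `re` stands in for both the cheap engine and the full-featured one, and the scanner assumes a pattern without backreferences, conditionals, or a literal `]` at the start of a character class. Dropping a zero-width assertion can only loosen a pattern, so the relaxed regexp should match a superset of the lines the original matches.

```python
import re

LOOKAROUND_PREFIXES = ("(?=", "(?!", "(?<=", "(?<!")


def strip_lookarounds(pattern):
    """Return a relaxed pattern with every lookaround group removed."""
    out = []
    skip_depth = 0    # > 0 while inside a lookaround group being dropped
    in_class = False  # True while inside a [...] character class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        step = 2 if ch == "\\" else 1  # an escaped char travels with its backslash
        dropping = skip_depth > 0
        if ch == "\\":
            pass
        elif in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif skip_depth:
            if ch == "(":
                skip_depth += 1
            elif ch == ")":
                skip_depth -= 1
        elif pattern.startswith(LOOKAROUND_PREFIXES, i):
            skip_depth = 1
            dropping = True
        if not dropping:
            out.append(pattern[i:i + step])
        i += step
    return "".join(out)


def search_lines(pattern, lines, flags=0):
    """Cheap candidate pass with the relaxed pattern, then an exact post-filter."""
    relaxed = re.compile(strip_lookarounds(pattern), flags)  # stand-in for the re2 pass
    full = re.compile(pattern, flags)                        # full-featured engine
    candidates = [line for line in lines if relaxed.search(line)]
    return [line for line in candidates if full.search(line)]


if __name__ == "__main__":
    lines = ["Mozilla Public License", "MozillaFoo", "use mozilla.org", "unrelated"]
    print(search_lines(r"Mozilla(?!\w)", lines, re.IGNORECASE))
```

For the query from this bug, the relaxed pattern is just `Mozilla`, so the candidate pass could return many more lines than the final result; how expensive that is in practice would depend on the pattern.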

I suppose another stop-gap possibility would be to leverage client-side post-filtering. That might be an option for the "search" endpoint (router.py) which does not have any pipeline capabilities.

Status: UNCONFIRMED → NEW
Type: defect → enhancement
Ever confirmed: true
Summary: RegEx search ("regexp") not working with lookaround assertion → Consider supporting more advanced RegEx search ("regexp") like look-around assertions that are not supported by livegrep codesearch's underlying use of the re2 engine

    I originally thought Searchfox used an implementation similar to Bugzilla's (which uses Perl).
    I didn't notice it's Rust-based.

    Moving the filtering to the client side would probably cause significant bandwidth load, probably even worse than the potential ReDoS.
    Bugzilla already supports similar RegEx, though, so it is probably susceptible to the same problems.

Summary: Consider supporting more advanced RegEx search ("regexp") like look-around assertions that are not supported by livegrep codesearch's underlying use of the re2 engine → Extending RegEx search ("regexp") support for lookaround assertions alike (unsupported by livegrep's underlying use of the re2 engine)