Closed Bug 106590 Opened 18 years ago Closed 13 years ago

nsIRegExp and friends


(Core :: XPCOM, defect, P1)






(Reporter: dmose, Assigned: WeirdAl)


(Blocks 1 open bug, )



(1 file)

A bug to kick off the festivities...
First kick at the can:

interface nsIRegExpService
  /** match pattern against target (flags for case-insensitivity, etc.) */
  RESULT match(in AString pattern, in AString target, in PRUint32 flags);

  /** compile pattern for repeated use */
  nsIRegExpPattern compile(in AString pattern, in PRUint32 flags);

  /** match compiled pattern against target */
  RESULT matchPattern(in nsIRegExpPattern pattern, in AString target);
What should RESULT be?  Do we need JS-esque test and search methods for fast
boolean did-it-match? tests, versus the rarer and more expensive
what-did-it-match? tests?

Pattern seems wrong to me -- why not just nsIRegExp?

Are we sure we need XPCOM here?  Over what implementations are we abstracting
(other than JS's?)?  I'm just askin'....

I was planning to use the JS engine's regexp support and a little JS component
to implement this on the cheap, so XPCOM was my use-it-from-C++ solution.

brendan: what shaver said.  There are some number of places in the browser that
are likely to be interested in specifying certain things as regexps, and this
seems like the easiest way to get that support.
Nice to have for 1.1, but definitely not a 1.0 stopper.  (And where would this
interface and implementation live in the tree?  XPCOM doesn't want JS dependencies.)
Target Milestone: --- → mozilla1.1
xpfe/components would be one possibility, though not strictly accurate.
OK, here's another iteration of ifaces:

[scriptable, uuid(96f6a218-4ca7-4b60-94c6-11617819551b)]
interface nsIRegExp : nsISupports {

    boolean test(in AString aTarget);

    void exec(in AString aTarget, out unsigned long aCount,
	      [retval, array, size_is(aCount)] out wstring aValues);

[scriptable, uuid(ee6ce775-9bf8-4461-aa8a-3742cf7bd9ec)]
interface nsIRegExpService : nsISupports
    const long GLOBAL = 1 << 0;
    const long CASE_INSENSITIVE = 1 << 1;
    const long MULTI_LINE = 1 << 2;

     * match pattern against target (flags for case-insensitivity, etc.)
    boolean search(in AString aPattern, in AString aTarget, 
		   in PRUint32 aFlags);

     * match pattern against target (flags for case-insensitivity, etc.)
     * returns array YYY
    void match(in AString aPattern, in AString aTarget, in PRUint32 aFlags,
	       out unsigned long aCount,
	       [retval, array, size_is(aCount)] out wstring aValues);

     * compile pattern for repeated use
    nsIRegExp compile(in AString aPattern, in PRUint32 aFlags);
So the thing I'm currently interested in using regexps for is mail filters.  The
more I think about it, though, the more I'm wondering if it wouldn't be better
to simply include the flags in the string, and require the regexps to be
formatted like JS regexp literals (eg "/foo/gi").  This allows regexps to be
easily saved as single strings, and won't require clients to do extra parsing
work.  Thoughts?
*** Bug 48179 has been marked as a duplicate of this bug. ***
Also somewhat relevant is the patch here:

which just calls into the JS engine directly for some regexp work.

Also, I'd really like some reaction on my strawman iface and previous comments,
The interface looks OK, but I'd want to see some client code to make sure that
the clichés are well-expressed and all that.  The flags-in-regexp string stuff
makes some sense, though I wonder how often clients are going to be appending
"i" for programmatic control of case sensitivity.  (I just reviewed a filepicker
patch that, while not using regexes, does want to control case sensitivity in
such a way.)  Thyey'll live, I guess.

Also, people are going to want to do replacement, as soon as we give this to
them.  Is the global flag even useful without such replacements?

More after I sleep, you big bully.
Assignee: shaver → dmose
I buy the need for an XPCOM interface to JS regexps, but I'd like to see more
we-must-write-in-C++-rather-than-in-JS particulars for the clients of this new
interface.  Can someone give a run-down?

Target Milestone: mozilla1.1alpha → Future
Target Milestone: Future → mozilla1.1alpha
One example is xml schemas - strings can be restricted using regexps
This is going to be required for Web Forms 2.0.  Also note XForms already does this - it's time to resurrect this and get it into core code.  I'm willing to fix and implement as part of implementing <input type="text" pattern="foo"/>.
Blocks: 345512
Target Milestone: mozilla1.1alpha → ---
Attached patch patch, v1Splinter Review
This patch implements nsIRegExp and nsIRegExpService as shaver defined it, with a couple extra methods for string replaces thrown in for good measure.

brendan, shaver says he's quite busy and unable to review; would you, please?
Assignee: dmose → ajvincent
Attachment #230639 - Flags: superreview?(brendan)
Attachment #230639 - Flags: review?(brendan)
Comment on attachment 230639 [details] [diff] [review]
patch, v1

bz: this is a relatively small patch, but my placing it in xpconnect may not be appropriate.  I was just making a guess as to where it should go.

Would you be willing to review this?  It's very elementary IDL + JS.
Attachment #230639 - Flags: superreview?(bzbarsky)
Attachment #230639 - Flags: superreview?(brendan)
Attachment #230639 - Flags: review?(bzbarsky)
Attachment #230639 - Flags: review?(brendan)
Comment on attachment 230639 [details] [diff] [review]
patch, v1

I will look at this.  It seems ok to add it to xpconnect.

One pervasive nit: put a blank line between adjacent functions.

Attachment #230639 - Flags: superreview?(bzbarsky) → superreview?(brendan)
timeless has requested for compatibility with Java that I split the IDL file into two, one for nsIRegExp, and one for nsIRegExpService.  I'll do so in the next patch (which I won't create until I have a fair idea of any other problems in the patch).
This doesn't belong in XPConnect (which is probably why the bug isn't in that component).
Yeah, I'm not happy putting this into XPConnect either.  I'm not quite sure where it _should_ go, though...  Perhaps the best place is XPCOM?  I do think we want this in a fairly low-level place so that it's usable in a large part of the tree, but I'm not sure whether XPCOM already depends on JS...  Benjamin, ideas on where to put this?

As to review, I don't think I'd be a good reviewer here.  Someone who knows more about regular expressions in JS would be a better bet.  I do think that the comments on the interfaces probably need to be a lot more extensive, unless you expect everyone using them to 1) know that JS regexps are the way this is implemented and 2) know a lot about the details of how they work (the behavior of regexps with /g on repeated calls comes to mind here).

Oh, and is there a reason we're returning an array of wstring instead of an array of AString for a method that takes AString?  What's the null-termination behavior here in general?  Are embedded nulls allowed in the various strings involved?
(In reply to comment #21)
> Oh, and is there a reason we're returning an array of wstring instead of an
> array of AString for a method that takes AString?

The reason is I was following the IDL posted in this bug as written in comment 7.  If it's wrong, though, that's fixable too.
Priority: -- → P1
I dunno that it's wrong, but if we can work with AString throughout, that's probably preferable.

Thinking about location some more, the impl depends on XPConnect, right?  So we may want to put the interfaces in someplace like XPCOM and the impl in dom/ or something... Again, I'd like to hear what Benjamin thinks.
I certainly don't mind the interface living in xpcom/ds

The impl should IMO live in js/src/xpconnect, subject to shaver/dbradley agreement.

I don't think that you can have [array] AString methods in xpidl... try it.
One more question:  do you want the components to implement other interfaces, such as nsIClassInfo?
Probably no need to... what would be the benefit?
(In reply to comment #24)
> I don't think that you can have [array] AString methods in xpidl... try it.

js/src/xpconnect/idl/nsIRegExpService.idl:105: Error: [domstring], [utf8string], [cstring], [astring] types cannot be used in array parameters
Ah, well.  That's sad.  :(

In that case, just make sure to document whether null is allowed in the input, and if it is what happens.
Comment on attachment 230639 [details] [diff] [review]
patch, v1

Per bz's comment 21, I'm asking mrbkap (who's also familiar with the js engine) for review.  Between him and brendan, I think we've got it covered.

That said, there are a lot of small nits on this patch without the formal review.  Do I need to submit a new patch to expedite this?
Attachment #230639 - Flags: review?(bzbarsky) → review?(mrbkap)
Comment on attachment 230639 [details] [diff] [review]
patch, v1

This looks like a case of "when all you have is the XPCOM hammer, everything looks like your thumb." ;-)

I don't mind so much adding the implementation to an existing module that already depends on JS, with the interface in the same place, or a more primitive place.

But implementing a service to support JS regular expression usage in C++ is wrong.  And *implementing* that service in JS is also wrong if the object is fast C++ binding to the SpiderMonkey jsregexp.c code.

It's hard to do design review by reviewing a patch.  Can we back up one step and talk about the requirements on a C++ regexp API?

Attachment #230639 - Flags: superreview?(brendan) → superreview-
Comment on attachment 230639 [details] [diff] [review]
patch, v1

Okay.  I've started a wiki page at specifically for us to sketch out the design and requirements.
Attachment #230639 - Flags: review?(mrbkap)
(In reply to comment #32)
> std::tr1::regex!

IIRC people want C++ access to ECMA-262-conforming regexps, not the (POSIX?) C++ ones.  Anyone know better?

After careful consideration by #content, there just doesn't seem to be enough appropriate demand for a XPCOM-based service in the core code.  Instead, I've filed bug 348642 for implementing JS_*RegExp API.

If you disagree, please explain why, in detail.
No longer blocks: 345512
Closed: 13 years ago
Resolution: --- → WONTFIX
python/java/whatever other languages that plug into gecko should be able to get access to the same regexp engine that js uses so that they know exactly how js will behave when passed a given expression. (Note: I am aware of the function parameter to replace, i'm only interested in the matching of the regexp engine and not the fact that javascript can perform arbitrary actions in response to each match and that those actions can't be predicted).

another person just asked for this feature today.
Component: XP Miscellany → XPCOM
brendan: with Tamarin just-in-time compiling on the way, would it be appropriate to reconsider the WONTFIX status on this bug?  (Referencing comment 30.)
Why in the world would the XPCOM hammer be the right tool? Let Tamarin expose an API for calling trace-JITed regexp code on its own -- that API will not use XPCOM.

Some more food for thought:

C++0x (notably, TR1 itself) adds support for various levels ofregex into native C++, one of which is a partially-modified ECMAscript (the others are basic and extended, IIRC).
You need to log in before you can comment on or make changes to this bug.