Bug 19442 - Regular expressions in mail and news filters
Status: NEW
Whiteboard: [penelope_wants]
Keywords: helpwanted
Product: MailNews Core
Classification: Components
Component: Filters
Version: Trunk
Hardware: All All
Importance: -- enhancement with 113 votes
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
URL: https://addons.mozilla.org/en-US/thun...
Duplicates: 148430 184690 191261 198273 218298 261854 275988 304428 337229 358683 359238 404189 614533 1175818
Depends on: 213567
Blocks: 66423 eudora 66425
Reported: 1999-11-19 21:21 PST by Garth Wallace
Modified: 2016-04-15 17:21 PDT
CC List: 83 users
davida: blocking‑thunderbird3-
dmose: wanted‑thunderbird3-



Description Garth Wallace 1999-11-19 21:21:39 PST
The ability to filter email and news posts by regular expression matching in
headers would be nice. It could be used to filter out messages in ALL CAPS or
ending with a number, among other things.
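A minimal C++ sketch of the two examples above. It assumes the header value is available as a plain ASCII `std::string` and uses C++11 `std::regex` rather than any Mozilla API. `isAllCaps` and `endsWithNumber` are hypothetical names.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: "ALL CAPS" means at least one uppercase ASCII letter
// and no lowercase ones (digits, spaces and punctuation are allowed).
bool isAllCaps(const std::string& subject) {
    static const std::regex allCaps("^[^a-z]*[A-Z][^a-z]*$");
    return std::regex_match(subject, allCaps);
}

// Hypothetical helper: true if the subject ends with one or more digits,
// optionally followed by trailing whitespace.
bool endsWithNumber(const std::string& subject) {
    static const std::regex trailingNumber("[0-9]+\\s*$");
    return std::regex_search(subject, trailingNumber);
}
```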
Comment 1 Phil Peterson 1999-11-22 14:36:59 PST
Adding to [help wanted] list
Comment 2 Daniel Brooks [:db48x] 2002-01-01 20:16:23 PST
That and it'd be nice to have a filter action that would move messages to
folders like "bug $1", where the $1 gets substituted from the regex, creating a
new folder if one doesn't already exist.
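A sketch of the substitution idea using `std::regex` capture groups (C++17, for `std::optional`). The subject format `[Bug NNNN] ...` and the function name `folderFor` are assumptions for illustration. `std::match_results::format` expands `$1` to the text of the first captured group.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical sketch: capture the bug number from a subject such as
// "[Bug 19442] ..." and substitute it into the folder-name template "bug $1".
std::optional<std::string> folderFor(const std::string& subject) {
    static const std::regex bugNumber(R"(\[Bug (\d+)\])");
    std::smatch m;
    if (!std::regex_search(subject, m, bugNumber)) {
        return std::nullopt;  // no match: the filter action does not apply
    }
    // format() expands "$1" to the text captured by group 1.
    return m.format("bug $1");
}
```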
Comment 3 Garth Wallace 2002-01-02 16:37:51 PST
Daniel: one issue = one bug. Let's keep this one nice & simple. Using 
backreferences for folder creation would be a new RFE depending on this one.
Comment 4 dmitry 2002-04-07 08:31:18 PDT
At the very least, there must be a case sensitivity flag for the matching
strings. Now I have to enter a spam-filtering substring several times in all
case combinations if I want to filter by it.
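With a regex engine, a single case-insensitivity flag replaces typing every case combination. Below is a minimal sketch using `std::regex::icase`. `searchIgnoreCase` is a hypothetical helper, not an existing filter option.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: regex search with the case-insensitivity flag set,
// so the pattern "cheap" also finds "CHEAP", "Cheap", "cHeAp", ...
bool searchIgnoreCase(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(text, re);
}
```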
Comment 5 André Langhorst 2002-04-08 11:06:24 PDT
So I have an extensive set of pr0n and spam filters, and there are still two
types of mail slipping through:
1) Subject: bla blah blah blah        [many spaces here] 12451252 [AFXLKFJK
whatever]
2) Received: someone@anyhost.cn [or other strange domains]

1 could be satisfied by an additional [ends with] condition
2 and more sophisticated stuff could not

Also, I think running one regex is faster than going through my 20+ filters per account.

I know discussion does not belong here, but I needed to say that it's
impossible to filter *much* spam out without regexes (I still get 2-10 per
account a day, and I have a very good set of message filters so far AND I do
not subscribe to sex sites with my email :) ), and that regexes might
actually be faster in some cases.

nominating mozilla1.1
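A rough sketch of how the two cases above might be written as regexes. The thresholds (5+ spaces, 6+ digits) and the `.cn` rule are assumptions chosen to fit the examples. Both function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Case 1 (assumed thresholds): a run of 5+ spaces followed by 6+ digits,
// roughly the "bla blah      12451252 [AFXLKFJK whatever]" shape.
bool looksLikePaddedSubject(const std::string& subject) {
    static const std::regex padded(" {5,}[0-9]{6,}");
    return std::regex_search(subject, padded);
}

// Case 2 (assumed rule): a Received: header mentioning a host name that
// ends in ".cn"; \b stops ".cn" from matching inside e.g. "mail.cnn.com".
bool receivedFromCn(const std::string& receivedHeader) {
    static const std::regex cnHost(R"(\.cn\b)",
                                   std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(receivedHeader, cnHost);
}
```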
Comment 6 Nicolás Lichtmaier 2002-06-25 11:43:39 PDT
*** Bug 148430 has been marked as a duplicate of this bug. ***
Comment 7 Carsten Menke 2002-09-28 10:40:52 PDT
Yes, as the number of spam mails I get has dramatically increased (around 50 per
day), I also came to the conclusion that regex filtering is urgently needed. For
example, one could filter out mails with the subject "Gambling get over $100 and
100% guranteed" using gambling*$*.

I think that Perl regexes would be too overwhelming for the average user (though
I would like them).

Maybe bug 151226 is also of interest within this context.
Comment 8 André Langhorst 2002-09-28 11:13:30 PDT
I just want to second that the amount of spam is increasing dramatically, and
because of that this should work in the body too, but that is another bug, I guess...
Comment 9 André Langhorst 2002-09-28 11:15:31 PDT
updated keyword -> mozilla1.2
Comment 10 Valerio Messina 2002-11-09 14:05:06 PST
RegExp filtering would be great!!!
I have a file of 440 filters in WinEudora that look at the source IP of spam
mail, and with it I can cope with 25 spams a day.
I hope this filter can be applied to "all headers", "all Received: lines", "last
Received: line" or "Body" (not so useful for antispam efficacy).
Antispam steps:
1 - Look in the headers for the "real" source IP. It is not always the last Received: line.
2 - Whois this IP, and get back the admin's email and the provider's IP range.
3 - Write to the admin and attach the mail with all headers.
4 - Create RegExp filters based on the IP range that put the mail in a folder.

I wrote a C program that creates a RegExp from an IP range.
I'm writing a second C program that parses the headers to find the real source IP...
and that doesn't trust forged headers...  :-))

How can I participate in developing MozillaMail?

Bye, efa
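The C program mentioned above is not shown. The following is a hypothetical sketch of the same idea that builds a regex for an IPv4 CIDR block:

- Octets fully covered by the prefix become literals.
- The partially covered octet becomes an alternation of its allowed values.
- The remaining octets match any 1 to 3 digits.

`cidrToRegex` and `textMentionsBlock` are made-up names. The `\d{1,3}` parts do not check that an octet is at most 255.

```cpp
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical sketch: regex matching dotted-quad IPv4 addresses in the
// CIDR block base/prefix (base given as a 32-bit host-order integer).
std::string cidrToRegex(std::uint32_t base, int prefix) {
    std::string re = "\\b";
    for (int i = 0; i < 4; ++i) {
        const int octet = static_cast<int>((base >> (24 - 8 * i)) & 0xFFu);
        const int bits = prefix - 8 * i;  // prefix bits that fall in this octet
        if (i > 0) re += "\\.";
        if (bits >= 8) {
            re += std::to_string(octet);            // fixed by the prefix
        } else if (bits <= 0) {
            re += "\\d{1,3}";                       // any value (not range-checked)
        } else {
            const int mask = (0xFF << (8 - bits)) & 0xFF;
            const int lo = octet & mask;
            const int hi = lo | (~mask & 0xFF);
            re += "(?:";                            // alternation lo|lo+1|...|hi
            for (int v = lo; v <= hi; ++v) {
                if (v > lo) re += '|';
                re += std::to_string(v);
            }
            re += ')';
        }
    }
    return re + "\\b";
}

// Hypothetical helper: true if `text` contains an address inside the block.
bool textMentionsBlock(const std::string& text, std::uint32_t base, int prefix) {
    return std::regex_search(text, std::regex(cidrToRegex(base, prefix)));
}
```

For example, 61.128.0.0/10 yields an alternation of 128 through 191 in the second octet.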
Comment 11 Mike Fedyk 2002-12-17 14:52:03 PST
While regular expressions are a great tool for identifying patterns, if each
person takes the time to keep their regexes up to date with the latest spamming
tactics, there will be a very large collective waste of time. There are
projects like SpamAssassin that are probably a better use of your time. I
know; I spent a lot of time with procmail and its weighted scoring, and even
with that many spams got through and I had to keep updating it as time went along.

You might consider supporting the SpamAssassin (spamd) protocol and having Mozilla
filter each message through it, but if you already have a spamd server then you
can probably filter at the server level...
Comment 12 Ryan Grove 2002-12-17 15:24:16 PST
Last time I checked, mail filters weren't just for filtering spam.
Comment 13 Valerio Messina 2002-12-25 17:06:24 PST
I do not completely agree with Mike. Here's why:
At first I did not understand SpamAssassin, so I have studied it since. I see
that it is weighted scoring with a very big list of checks on the body and
headers of a mail. I also tried Spamnix (a EudoraWin32 plugin for SpamAssassin).
It works well. But these are programs to automatically identify a mail as spam.
It's a great tool. But limiting ourselves to receiving, identifying and moving
mail to a spam folder is really dangerous. If we do only that, the amount of
spam becomes truly enormous. In the end you receive 99% spam. Yes, you don't see
it, but internet traffic will be all spam. Don't think only of yourself.

Spam must be killed before it damages us!

My idea is to get spammers kicked off good providers (which are the vast majority).
Imagine having a SpamAssassin tool to identify spam, and then an automatic
tool that performs steps 1 to 3 from my previous comment. The spam would be
forwarded to the admin of the source server, who immediately kicks out the spammer.
I do that manually about once a day. Every day I receive responses from
admins that say: "We investigate. If we find that a customer is in violation of
our policies, we will take the necessary action to stop the activity in
question" or, better, "Thanks you for report.  I have just now terminated the
account responsible for the abuse."
This method works; it is only slow because for now it is mostly manual, and
maybe I'm the only one using it.
Step 4 is only for admins who don't reply or don't kick out the spammer, or for
providers that are themselves spammers (really few).
Regex filtering requires some time to keep the list of filters updated, but it
does not require keeping up with the latest spamming tactics, because those
stay the same in the headers.
Most mail originates from the last (bottom) Received IP address. Mail with
forged headers originates from the last IP that has real DNS in the header. My C
code for step 1 is really alpha, but in the future I think it can correctly
identify the real source IP.
My C code for step 4 is a CandidateRelease2, really stable and bug free.
Step 2 is Unix whois (it needs a port for Win32 systems).
Not much more is needed: some parsing of the whois report to extract
abuse@domain.tld and the IP range registered with IANA, ...

It seems to me that the trick to extinguishing spam is an automatic tool to
recognise spam (SpamAssassin or the new MozillaMail 1.3 filter), and then a
generalized automatic tool to stop the spammer.

And if such a tool becomes really widespread (with step 4), admins will pay more
attention to spammers, because most users can easily filter all the mail
originating from a provider.

In any case, RegEx filtering is a great tool for text matching in filters and
searching in mailboxes. I want it. :-))
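A minimal sketch of the simplest heuristic described above, which uses the bottom-most Received: line. It makes three assumptions:

- The Received: headers are already unfolded.
- They are passed in the order they appear in the message.
- The relay wrote the IP in square brackets. This is common but not guaranteed.

It does not attempt the forged-header checks. `bottomReceivedIp` is a hypothetical name.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical sketch: take the last (bottom-most) Received: header and
// return the first bracketed dotted-quad address found in it, if any.
std::optional<std::string> bottomReceivedIp(const std::vector<std::string>& receivedHeaders) {
    if (receivedHeaders.empty()) return std::nullopt;
    static const std::regex bracketedIp(R"(\[(\d{1,3}(?:\.\d{1,3}){3})\])");
    std::smatch m;
    if (std::regex_search(receivedHeaders.back(), m, bracketedIp)) {
        return m[1].str();  // group 1 is the address without the brackets
    }
    return std::nullopt;
}
```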
Comment 14 Garth Wallace 2002-12-25 21:00:56 PST
Valerio: that was entirely off-topic, except for the last line, which was a simple
"me too". Please don't comment unless you have something to contribute
to the bug under consideration.
Comment 15 R.K.Aa. 2003-01-29 22:40:25 PST
*** Bug 191261 has been marked as a duplicate of this bug. ***
Comment 16 Sander 2003-02-08 15:17:22 PST
*** Bug 184690 has been marked as a duplicate of this bug. ***
Comment 17 Dennis Daniels 2003-02-08 15:39:12 PST
I'd apparently posted a dup of this bug. I was informed:
> > I want to sift out all emails with the subject line containing "jhotdraw". So
> > I create a filter for subject = *jhotdraw*
>
> Why not simply do "subject" "contains" "jhotdraw" ?

I've been trying to do exactly that (simply "subject" "contains" "jhotdraw")
for as long as I've been running Mozilla. As it stands in Moz 1.3b, no newly
created filters run at all (not connected to this bug, I know).
Comment 18 Sander 2003-04-12 09:09:08 PDT
*** Bug 198273 has been marked as a duplicate of this bug. ***
Comment 19 Boris Zbarsky [:bz] (Out June 25-July 6) 2003-09-04 08:08:12 PDT
*** Bug 218298 has been marked as a duplicate of this bug. ***
Comment 20 John H. Miller 2003-09-04 08:28:34 PDT
There has been an increase in SPAM recently due to some very prolific viruses.
I think that if wildcards (* and ?) were added to "Message Filtering", many
of the SPAMs that I am receiving could be filtered out. Usually they are
shotgun spams that are emailed to a dozen people with email addresses similar to
mine. Wildcards would allow me to easily identify these shotgun spammers.
Comment 21 Sander Goudswaard 2003-11-14 07:46:51 PST
I need regex for matching SpamAssassin headers. Voting for this bug.
Comment 22 Ashley Bischoff (blog at handcoding.com) 2003-11-14 07:57:24 PST
Sander: See also bug 224318 - "Bayes filtering should be aware of X-Spam Headers".
Comment 23 Werner Warweg 2004-01-20 04:14:30 PST
I would like to make my Mozilla even more effective. I am looking for a
solution to the following problems:

With obvious (and surely intentional) misspellings, the filter is as leaky
as a sieve.
For example, I have blocked "Viagra", but the filter lets through
V;agra
V i a g r a
Via.gra

It would help if all whitespace and special characters were removed *before*
pattern matching.

HTML emails are even more problematic:
a text like
<br>
V<big>i</big>agra<br>
<br>
Via<span style="color: rgb(102, 51, 102);">g</span>ra<br>

is not recognized, even though a human can read it perfectly well!

Even funnier:
<br>
Vi<acd>agra<br>

where <acd> can be any combination of characters arbitrarily scattered in by
the spam senders.
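A sketch of the pre-processing suggested above, assuming ASCII text. It strips anything that looks like an HTML tag, keeps only letters and digits, lowercases the result, and then looks for the blocked word. `normalizeForMatching` and `containsBlockedWord` are hypothetical helpers.

```cpp
#include <cctype>
#include <regex>
#include <string>

// Hypothetical pre-processing: remove <...> tags, then keep only ASCII
// letters and digits, lowercased.
std::string normalizeForMatching(const std::string& text) {
    static const std::regex tag("<[^>]*>");
    const std::string noTags = std::regex_replace(text, tag, "");
    std::string out;
    for (unsigned char c : noTags) {
        if (std::isalnum(c)) out += static_cast<char>(std::tolower(c));
    }
    return out;
}

// `word` is expected in lowercase letters/digits only.
bool containsBlockedWord(const std::string& text, const std::string& word) {
    return normalizeForMatching(text).find(word) != std::string::npos;
}
```

This approach has limits. Character substitution such as `V;agra` still gets through, because it normalizes to `vagra`. Joining words can also cause false positives: `via gravel` becomes `viagravel`.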
Comment 24 Jo Hermans 2004-09-02 04:46:59 PDT
*** Bug 213567 has been marked as a duplicate of this bug. ***
Comment 25 Robert Guico 2004-11-10 08:04:46 PST
In all honesty, the concept of Perl regular expressions doesn't even have to be
implemented in its entirety... if I would like to send entire domains to a
certain folder, it'd be nice to do *@somedomain.com and have that be my one
filter. So a simple implementation of ? (single character match) or * (multiple
character matches) would do the job for me, and would also be better known
than full-blown regex.
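Wildcards can be implemented on top of a regex engine by translating them. In this hypothetical sketch, `*` becomes `.*` and `?` becomes `.`. Every other regex metacharacter is escaped, so the `$` in comment 7's `gambling*$*` is matched literally instead of acting as an end anchor. Case-insensitive matching is an assumption here, and both function names are made up.

```cpp
#include <regex>
#include <string>

// Hypothetical translation of a shell-style wildcard pattern into an
// anchored ECMAScript regex: '*' -> ".*", '?' -> ".", everything else literal.
std::string globToRegex(const std::string& glob) {
    static const std::string special = "\\^$.|+()[]{}";
    std::string re = "^";
    for (char c : glob) {
        if (c == '*') {
            re += ".*";
        } else if (c == '?') {
            re += '.';
        } else {
            if (special.find(c) != std::string::npos) re += '\\';
            re += c;
        }
    }
    return re + "$";
}

bool globMatches(const std::string& glob, const std::string& text) {
    return std::regex_match(text, std::regex(globToRegex(glob), std::regex::icase));
}
```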
Comment 26 Eyal Rozenberg 2004-11-10 08:09:16 PST
The fact is that it already _is_ implemented in its entirety in JavaScript
(AFAIK), so it's just a matter of using it. No sense in re-implementing a
subset of regular expressions, I think.
Comment 27 Valerio Messina 2004-11-10 09:49:38 PST
(In reply to comment #25)
> if I would like to send entire domains to a
> certain folder, it'd be nice to do *@somedomain.com
The problem is not with real source addresses; it is with fake source addresses, as in spam.
The spammer can easily forge the source address and domain, but cannot forge the
source IP address in the Received header lines. With regex you can match the whole
IANA-registered IP block of a known spam provider like chi....... or
hana......
Comment 28 Robert Guico 2004-11-10 10:08:15 PST
(In reply to comment #27)

While I agree with the general concepts, my comments don't relate specifically
to spam. I get plenty of mail that is legitimate AND needs to be filed away into
a single folder, BUT comes from sources that are just a little bit different in
different ways (for example, 61source_dev@domain.com, 61source_stg@domain.com,
81prod_srv2@domain.com). Filtering on subject lines would not be helpful as
there have been unpleasant side effects. :)

The ability to write a single expression for this would be helpful. I tested a
handful of regular expressions that might've worked... they didn't.

Eyal... if the JS filtering is there, it's undocumented. :-)

On a related note, the reason this comes up is because I don't have the option
to create a filter from a message in my inbox (much faster way to create
filters). That is a separate feature, however.
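For the senders listed above, a single pattern could look like this sketch. `matchesKnownSenders` is a hypothetical name, and `domain.com` is the placeholder from the comment. `regex_match` requires the whole address to match, so lookalike domains such as `domain.com.example` are rejected.

```cpp
#include <regex>
#include <string>

// Hypothetical filter term covering 61source_dev@, 61source_stg@ and
// 81prod_srv<N>@ at domain.com, compared case-insensitively.
bool matchesKnownSenders(const std::string& address) {
    static const std::regex senders(
        R"(^(?:61source_(?:dev|stg)|81prod_srv\d+)@domain\.com$)",
        std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(address, senders);
}
```

A looser alternative, `^\d+\w+@domain\.com$`, would accept any local part that starts with digits.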
Comment 29 Valerio Messina 2004-11-10 15:47:13 PST
(In reply to comment #25)
> if I would like to send entire domains to a
> certain folder, it'd be nice to do *@somedomain.com and have that be my one
> filter. So a simple implementation of ? (single character match) or * (multiple
> character matches) would do the job for me
I tried this just now with Mozilla Suite 1.7.3 and it works well.
Just create a filter with "Sender" "contains" "@domain.com".
Comment 30 Eyal Rozenberg 2004-11-10 23:24:23 PST
> 
> Eyal... if the JS filtering is there, it's undocumented. :-)
> 
Here's how we use regexps in BiDi Mail UI:

http://www.mozdev.org/source/browse/~checkout~/bidiui/source/suite/chrome/content/bidimailpack/bidimailpack-common.js?rev=1.3

Just define it between /..../ delimiters and then call myregexp.test(mystring). Pretty
straightforward.
Comment 31 Garth Wallace 2004-11-14 16:15:03 PST
(In reply to comment #28)
> 
> The ability to write a single expression for this would be helpful. I tested a
> handful of regular expressions that might've worked... they didn't.
> 
> Eyal... if the JS filtering is there, it's undocumented. :-)

It's not that regexp filtering has been implemented (if it was, this would be
RESOLVED FIXED), but that JavaScript already has support for regular expressions
so it's just a matter of modifying the filter code to use it. No need to
implement a new parser.
Comment 32 robert wegner 2004-12-15 14:20:48 PST
It's a pity nobody is working on this. It would be a killer feature for Thunderbird.
Comment 33 Frankie 2004-12-16 13:28:35 PST
*** Bug 261854 has been marked as a duplicate of this bug. ***
Comment 34 Alex 2004-12-17 04:28:59 PST
Free Eudora had regular expressions in filters a long time ago.  Couldn't
believe it when I discovered that Mozilla mail didn't.
Comment 35 Jo Hermans 2004-12-25 11:34:33 PST
*** Bug 275988 has been marked as a duplicate of this bug. ***
Comment 36 Jo Hermans 2005-08-12 06:24:00 PDT
*** Bug 304428 has been marked as a duplicate of this bug. ***
Comment 37 Jo Hermans 2006-05-09 02:47:29 PDT
*** Bug 337229 has been marked as a duplicate of this bug. ***
Comment 38 Sergio 2006-05-09 04:20:21 PDT
If Thunderbird had basic "wild card intelligence" (which dates back to the DOS age, when we wrote dir file.* and got the matching files...), it could, with not many more lines of code, be more efficient and less frustrating at finding INTELLIGENT MATCHES for the filter expression.

I know a little about C++ (it's not my best language) and I analyzed the Thunderbird source, so I can offer the following CODE SUGGESTION.

It just needs the diagram transformed into equivalent code. VERY STRAIGHTFORWARD and simple. Please take a look; I hope you SOON give us a Thunderbird UPDATE with "wild card intelligence" in the CUSTOM FILTERS...

That would be basic, but would help us A LOT.

The ideal features would include:
1- Possibility of mixing OR and AND statements in the same RULE WINDOW. Nowadays you can use ONLY one of the options...
2- For advanced users, Filters with Perl regexp. 
Ex: re:.+v.*i?a?.*g?.*r?.*a to match: Viagra Visagra Viagbbbrgra and so on...
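Below is a hypothetical sketch of how a filter term with the proposed `re:` prefix might be evaluated. `re:<pattern>` runs a case-insensitive ECMAScript regex search, and anything else stays a plain substring test. `termMatches` is a made-up name.

Two properties of the example pattern are worth knowing:

- Its leading `.+` requires at least one character before the `v`, so it does not match a subject that starts with "Viagra".
- Every letter between `v` and the final `a` is optional, so `v.*i?a?.*g?.*r?.*a` matches any text with a `v` followed later by an `a`, such as "Vista".

```cpp
#include <regex>
#include <string>

// Hypothetical evaluation of a filter term: "re:<pattern>" is a
// case-insensitive regex search, anything else a case-sensitive substring test.
bool termMatches(const std::string& term, const std::string& value) {
    const std::string prefix = "re:";
    if (term.compare(0, prefix.size(), prefix) == 0) {
        const std::regex re(term.substr(prefix.size()),
                            std::regex::ECMAScript | std::regex::icase);
        return std::regex_search(value, re);
    }
    return value.find(term) != std::string::npos;
}
```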

Here is the algorithm I suggest for adding "wild card intelligence" to Thunderbird filters. In the image I even indicate the cpp file and the function that needs to be improved.

I hope it helps somehow... 

http://img509.imageshack.us/img509/2923/thunderbirdsuggestion6pa.gif

Best Regards

Sergio Abreu
Comment 39 Magnus Melin 2006-11-04 06:08:27 PST
*** Bug 358683 has been marked as a duplicate of this bug. ***
Comment 40 John Sullivan 2007-04-07 14:11:40 PDT
I've been taking a look at this and have hacked up some code. It'll need some beating into shape before it is suitable for inclusion, for which I will need some advice from someone more familiar with the mozilla codebase.

(My current MUA is showing signs of age, and I'd *really* like to switch to Thunderbird. However the lack of regex support in the filters means I can't bring over my existing sorting/blacklisting filters which is an absolute showstopper for me.)

There are actually two existing regex implementations within the mozilla codebase:

1) directory/c-sdk/ldap/libraries/regex.c - this is a very limited implementation, far less capable than most people would expect in this age of ubiquitous PCRE support. It's also apparently extended from an old grep implementation in a slightly odd (non-standard) way.

2) js/src/jsregexp.c - this is the one mentioned above. This supports a much more powerful Perl-ish syntax - as I believe from the comments is specified for Javascript by ECMA. Unfortunately it is heavily tied in with both the JS engine memory allocation routines, and the JS engine's internal typedef range. Its symbols are not, without a lot of grotty hacking, even visible from the rest of the mozilla codebase. Even if it were visible, it requires a JSRuntime and JSContext object to operate, which are way too much overhead for a generic facility. It cannot be used directly.

3) (I know I said 2!) There's something in security/nss/lib/util/portreg.c *calling* itself a regexp, but it is clearly just a shell glob with maybe a little regex-like extension. There is almost identical code in both modules/libjar/nsWildCard.cpp and xpfe/components/filepicker/src/nsWildCard.cpp, which is more accurately named.

God knows whether (1) or (3) are still live code.

I've been playing around and am now at the stage where I have working POP3 filtering based on regexes. I've taken a copy of jsregexp.c, converted it to use PR_ memory allocation and PR typedefs and it seems to function fairly well. As a proof of concept it shows this bug (after 8 years!) could be addressed without too much pain.

At the moment the regex engine is namespaced out of the way and #included directly into mailnews/base/search/src/nsMsgSearchTerm.cpp. Quick-n-dirty with minimal impact, but clearly not a great solution.

I suggest that after suitable cleanups it ought to be made a core facility available across mozilla/. The problem is I lack the familiarity (and authority!) to know where it is appropriate to put it so that it is available anywhere it might be needed.

(I don't think it could be abstracted out of the JS engine - which means a parallel implementation. The JS version has to rigidly conform to ECMA, whereas a common internal facility I can see people wanting to extend in various ways. The different ways of handling memory management are also a block here. Not pretty, but necessary I think.)

The changes to filtering/UI to integrate it are relatively trivial by comparison (even if the mozilla codebase is a huge bloated monstrosity with no rhyme or reason to the location of any particular file). However, there are a number of places where it may be useful, and it might be nice to hit all of them in one go. Again, I lack enough familiarity to be able to enumerate all such places; my focus has so far been on filtering of POP3 mail.

I could do with someone who knows the codebase better to give me a few pointers here...

Cheers,

John
Comment 41 Peva 2007-06-26 06:03:42 PDT
I have a need for wildcards '*' and '?' in filters.  This would be a much-welcomed improvement.
Comment 42 Peva 2007-09-04 06:24:41 PDT
Can we please get the priority on this enhancement elevated to P1?  This is a huge hole in TB's filtering capability in my and 80 other people's opinions.
Comment 43 Eyal Rozenberg 2007-09-04 06:54:32 PDT
Peva: All TB development is very low-priority for MoFo/MoCo at the moment (and in general); IIRC there are only 2 full-time TB/mailnews developers at the moment, and Mitchell Baker wrote a blog entry basically rationalizing how TB development won't be getting any of the googlebucks. So changing settings in a bug page probably won't help much with getting feature added anytime soon... what you (us) need is to find someone to work on this.
Comment 44 Peva 2007-09-04 07:20:47 PDT
OK - thanks, Eyal.  I wonder if John Sullivan is still around (see comment #40 above).
Comment 45 Matt Dudziak 2007-09-06 14:45:47 PDT
(In reply to comment #43)
> Peva: All TB development is very low-priority for MoFo/MoCo at the moment (and
> in general); IIRC there are only 2 full-time TB/mailnews developers at the
> moment, and Mitchell Baker wrote a blog entry basically rationalizing how TB
> development won't be getting any of the googlebucks. So changing settings in a
> bug page probably won't help much with getting feature added anytime soon...
> what you (us) need is to find someone to work on this.
> 

Mozilla may only have 2 people working on Thunderbird, but Qualcomm has 4 additional people working on the same basic code. Granted those 4 are not _full-time_ on Thunderbird / Penelope / Eudora, but they are submitting changes....

The more people we can get involved the better, IMHO.

Matt
Comment 46 John Sullivan 2007-09-06 17:05:23 PDT
I am still around. I'd really like to get this done - it's what's stopping me from upgrading from my current MUA, which a spammer has unwittingly discovered how to crash with stupidly long subject lines. I have a workaround, but a local build of Thunderbird sorts it out completely.

Situation as above. If someone with commit rights wants to say where I ought to put generic library code in a way that won't interfere with other projects, I can make time.

I could I guess just make something up, request review and sort it out from there if any reviewer is listening and objects to a crude insertion.

I should point out that I can make time before the end of the year: I have leave I need to take and could use part of it for this. But I just don't feel up to using spare time during my normal working schedule, when I could be unwinding from existing machine stuff. I'm sure you understand.
Comment 47 Magnus Melin 2007-09-09 08:58:34 PDT
John: some related regexp interface design was done in bug 106590. (If you need that it might be worth arguing your case there to get it un-wontfixed.)
Comment 48 Wayne Mery (:wsmwk, NI for questions) 2007-09-25 14:53:03 PDT
(In reply to comment #47)
> ... (If you need that it might be worth arguing your case there to get it un-wontfixed.)

That seems highly unlikely without core (pun intended) support from those formerly involved in that bug and their strategic (and probably smart) direction of ECMA-262 regexp's / JS_*RegExp API noted in bug 348642.  John, Brian says in there "It's pretty easy".  Perhaps they'd be glad to have your help.

(plus the bugs dependent on bug 106593, bug 32641 and bug 80337, are quite dead so no help gonna come from there)
Comment 49 Steve 2007-09-26 06:13:48 PDT
I'm not sure what is involved to make this possible, but I would absolutely love this feature.  It would be a hard-core user thing I'm sure, but very powerful.  For the record, if someone can manage to build this, even as an extension/addon, I think I would be very happy (as would others)
Comment 50 Lance Haverkamp 2007-10-11 15:47:25 PDT
> Reported: 1999-11-19 

Does anyone else think it would be nice to get this fixed before it's been kicking around for an entire decade?
Comment 51 Tim Deaton 2007-10-14 18:36:50 PDT
I'd love to see this get done as well.  Since John Sullivan appears to be offering to work on it, it would be nice if someone with authority would speak up and give him what he needs to DO it.
Comment 52 Jo Hermans 2007-11-17 13:55:11 PST
*** Bug 404189 has been marked as a duplicate of this bug. ***
Comment 53 Joshua Cranmer [:jcranmer] 2008-01-01 09:52:01 PST
I want to interject one point into the debate: should we do full JS regexp or wildcard matching? Regexp matching is much more powerful, and I have one use case: the most recent infusion of MI5 spam in Usenet can be filtered out with the following regexp for the sender: ^[vief]+@.*$

Laying out other requirements:
* Support both wildcard and regexp? If not, which one?
* Matches, contains, or both?
* Strings? I like the idea `Sender' `matches regex' [==========]
* Needs to be implemented for all of IMAP, POP, NNTP

To John: If you need help with the filter code, I can easily provide it.
Comment 54 Karsten Düsterloh 2008-01-01 11:06:35 PST
(In reply to comment #53)
> full JS regexp or wildcard matching

If we can grab some low hanging wildcard fruit, fine. But as soon as "heavy" coding is involved, we'd probably shoot for RegExp - although not necessarily JS based (having a scriptable interface for the JS RegExp would be truly cool, but I doubt we'll see that).
(I've tinkered a bit with how to implement user-defined JS filter actions, but I'm not sure how slow such extensive XPCOM boundary crossing would get.)

> * Matches, contains, or both?

Not much of a difference with RegExp.

> * Needs to be implemented for all of IMAP, POP, NNTP

And "movemail" and "none" (local folders).
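In C++ terms, the "contains" vs "matches" question is the difference between `std::regex_search` and `std::regex_match`. Below is a minimal sketch using the sender pattern from comment 53. Once the pattern is anchored with `^...$`, both functions give the same answer; the difference only shows up for an unanchored pattern. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical "contains regex": the pattern may match anywhere in the value.
bool containsRegex(const std::string& value, const std::regex& re) {
    return std::regex_search(value, re);
}

// Hypothetical "matches regex": the pattern must match the entire value.
bool matchesRegex(const std::string& value, const std::regex& re) {
    return std::regex_match(value, re);
}
```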
Comment 55 Serge Gautherie (:sgautherie) 2008-06-20 10:33:14 PDT
Filter on "Nobody_NScomTLD_20080620"
Comment 56 Dan Mosedale (:dmose) 2008-08-26 09:13:30 PDT
Marking as wanted-, as per the revised driving rules <https://wiki.mozilla.org/Thunderbird:Release_Driving>.
Comment 57 Eyal Rozenberg 2008-08-26 11:29:56 PDT
It's not reasonable for the people who edit some Wiki page somewhere to decide that 'wanted' now only pertains to what "thunderbird-drivers think would be nice to have". The drivers control 'blocking', 'wanted' should be for users, QAers, extension devs, and non-driver devs. Please reconsider.
Comment 58 Rob Siklos [:robzilla] 2008-08-26 11:48:12 PDT
I am *not* a thunderbird driver, but I think the way they're doing things is fair.

If users, QAers, extension devs, and non-driver devs "want" a bug, they should vote for it.  Obviously by virtue of the feature bug existing in the first place, somebody wants it.   So unless the "wanted" flag is only for "special" people, it's pretty redundant.
Comment 59 Eyal Rozenberg 2008-08-26 12:00:09 PDT
Votes are not version-related. Plus, by now voting is pretty deprecated AFAICT since votes have been generally ignored.

Eh, what the hell, let'em do whatever they want(ed). Nobody seems to listen to what I/people like me say anyway.
Comment 60 Dan Mosedale (:dmose) 2008-08-26 12:09:30 PDT
blocking and wanted have always been part of a mechanism for thunderbird-drivers to help shepherd the highest-impact bugs/features into the tree; nothing has changed there.  The wiki page changes did not happen at random; I personally made them on behalf of thunderbird-drivers, to clarify policy changes we've made regarding use of that flag: we didn't think that the way it was being used was really sufficiently helpful w.r.t. helping get the highest impact bugs into the tree.

It sounds like you think there should be some separate mechanism for use by the broader community, but it's not clear to me how you envision that working.  That bug isn't really the best place for that discussion, I think, but you're welcome to post a proposal to m.d.a.thunderbird or m.d.planning.
Comment 61 Dan Mosedale (:dmose) 2008-08-26 12:10:08 PDT
Er, "This bug isn't really the best place...."
Comment 62 Wayne Mery (:wsmwk, NI for questions) 2008-08-26 13:11:33 PDT
(In reply to comment #59)
> Votes are not version-related. Plus, by now voting is pretty deprecated AFAICT
> since votes have been generally ignored.

To be fair, I don't think anyone *currently* associated with Thunderbird has discouraged voting by users (tho there are certainly detractors in the mozilla community).  And some of us do use votes. For example when attempting to differentiate within an overwhelming number of bugs looking for worthy nominations.

But voting's usefulness is limited, and it has its problems - for example it does not equate directly or well to severity nor need. And you can't productively rank bugs against each other, for example a 9 year old bug with 90 votes (like this bug) against a 1 year old bug with 20 votes. 

Getting back to this bug...

> Eh, what the hell, let'em do whatever they want(ed). Nobody seems to listen to
> what I/people like me say anyway.

I'm guessing your ultimate concern is for this bug to make progress, which is dependent more on the suggested blocker or someone taking interest (anyone touched base with John Sullivan?), and less so on its status per drivers. There is also a relevant bit in bug 106590 comment 37.
Comment 63 Joshua Cranmer [:jcranmer] 2008-08-26 13:58:45 PDT
(In reply to comment #62)
> But voting's usefulness is limited, and it has its problems - for example it
> does not equate directly or well to severity nor need. And you can't
> productively rank bugs against each other, for example a 9 year old bug with 90
> votes (like this bug) against a 1 year old bug with 20 votes. 

I counted: # of TB bugs > 50 votes in the last:
1 year:   0
2 years:  1
3 years:  2
4 years:  4
all time: 44

No bugs in the past 4 years have > 100 votes, the newest being just under 5 years. Votes have a strong bias towards older bugs, which means that it's a poor approximation to wanted features.

> I'm guessing your ultimate concern is for this bug to make progress, which is
> dependent more on the suggested blocker or someone taking interest (anyone
> touched base with John Sullivan?), and less so on its status per drivers. 
> relevant bit also in bug 106590 comment 37.

I've been reading C++0x recently, and as that adds support for regex natively, it might be worthwhile to ask NSPR to add support for the libraries or put it elsewhere. That's another can of worms entirely, though...
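For reference, C++11 (the final C++0x) ships `<regex>`, and `std::regex` defaults to an ECMAScript-based grammar. That grammar is close to, but not identical to, what JavaScript engines accept. Below is a rough equivalent of the `myregexp.test(mystring)` call from comment 30. `jsStyleTest` is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical rough equivalent of JavaScript's /pattern/.test(str), with an
// optional ignore-case flag standing in for /pattern/i.
bool jsStyleTest(const std::string& pattern, const std::string& str, bool ignoreCase) {
    const auto flags = ignoreCase ? (std::regex::ECMAScript | std::regex::icase)
                                  : std::regex::ECMAScript;
    return std::regex_search(str, std::regex(pattern, flags));
}
```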
Comment 64 Kent James (:rkent) 2009-06-01 14:50:33 PDT
I recently posted a patch in bug 495519 that implements the ability to add custom search terms to Thunderbird filters, and as a demonstration shows an extension that does a regular expression comparison to the message subject. I'm reasonably confident that a finished patch will be implemented in TB3, so the use of regular expression filter terms in extensions should be possible then.

That patch, and its cousin that adds custom filter actions, relies on calling javascript-implemented features from the C++ filter code. If we are willing to take that step, then it is not a big leap to do the same trick in the normal filter code to implement regular expressions. We could implement a javascript XPCOM object to execute a regular expression, and call it from the C++ search code when needed to do the regular expressions. This is not a big project. It also could be done in extensions instead of in the core code though.

I'm curious what people think of this approach, with its obvious possible performance issues. I'd probably be willing to do this work if drivers could add wanted+ to this bug. Otherwise I'll just leave it to the imagination of extension writers.
Comment 65 Dan Mosedale (:dmose) 2009-06-02 11:01:02 PDT
In general, it seems like an entirely reasonable approach.  Because of the potential perf issues, I suspect that the thing that makes the most sense is to code this up as an extension first and do some benchmarking before committing to accept this in the core.  As such, I don't think we can say for sure whether it's wanted+ at this point in time.
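Whatever engine is used, one way to limit the per-message cost discussed above is to compile each user pattern once when the filter is loaded and to report invalid patterns up front. Below is a hypothetical C++ sketch using `std::regex`. The class name `RegexTerm` is made up and is not the nsMsgSearchTerm API.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Hypothetical per-filter-term cache: compile once, match many messages.
class RegexTerm {
public:
    // Returns std::nullopt and fills `error` if the pattern is invalid,
    // instead of throwing later while messages are being filtered.
    static std::optional<RegexTerm> compile(const std::string& pattern, std::string& error) {
        try {
            return RegexTerm(std::regex(pattern, std::regex::ECMAScript | std::regex::icase));
        } catch (const std::regex_error& e) {
            error = e.what();
            return std::nullopt;
        }
    }

    bool matches(const std::string& value) const {
        return std::regex_search(value, re_);
    }

private:
    explicit RegexTerm(std::regex re) : re_(std::move(re)) {}
    std::regex re_;
};
```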
Comment 66 sergo 2009-07-27 20:10:02 PDT
Also it would be nice if messages could be searched for using regular
expression syntax.
Comment 67 Matt Dudziak 2010-01-27 14:59:04 PST
*** Bug 359238 has been marked as a duplicate of this bug. ***
Comment 68 Joshua Cranmer [:jcranmer] 2010-11-24 05:17:05 PST
*** Bug 614533 has been marked as a duplicate of this bug. ***
Comment 69 Wayne Mery (:wsmwk, NI for questions) 2013-03-01 01:40:38 PST
Kent's comment 64 is related to https://addons.mozilla.org/en-US/thunderbird/addon/filtaquilla/
Comment 70 Kent James (:rkent) 2013-03-01 10:55:32 PST
This is a bug that could easily be implemented using a JavaScript XPCOM component to do the regex processing, and adding an additional choice for text searching as suggested in https://bugzilla.mozilla.org/attachment.cgi?id=719922
Comment 71 ukrainianconsular 2014-02-12 08:21:42 PST
This is what Kent told me on his forum 2 years ago, and a year later he told me he doesn't have time for this. The man seems to have an ego and abuses people's trust by pretending. Even better: send him requests to add regex for the body from different email addresses, pretending someone else is asking, and he will ignore the request just the same, which shows he doesn't give 2 phux about specific requests when he's not in the mood for them at that point in his life.

Also, don't expect anything from Mozilla, as a wise man above pointed out:


"Lance Haverkamp 2007-10-11 15:47:25 PDT

> Reported: 1999-11-19 

Does anyone else think it would be nice to get this fixed before it's been kicking around for an entire decade?"
Comment 72 ukrainianconsular 2014-02-12 08:37:16 PST
Kent usually responds to requests with "it's easy" before ignoring you forever, and Mozilla seems to care less about regex in filters, something that exists in Postbox, via a Postbox add-on.

Let's wait one more decade and see if things evolve at Mozilla; I've had my dose of it.
Comment 73 Josiah Bruner [:JosiahOne] (needinfo for responses) 2014-02-12 09:48:44 PST
ukrainianconsular@gmail.com,

Personal attacks are not appreciated *at all* around here. Clearly you have nothing better to do with your time than to insult extremely helpful and knowledgeable contributors such as Kent.

Thunderbird is run ONLY by contributors now and Mozilla has very little to do with its development. If you care so badly about this being fixed, by all means go ahead and fix it yourself, but don't expect the rest of us to do whatever you want. We are all working in our spare time to improve TB (we actually have lives and jobs outside of this), but we can't get to every bug in a timely fashion. There are thousands of filed bugs.

Therefore, unless you have something useful to contribute to a development-only site, please just stop. Continuing to attack people may result in a ban to your account.

Thanks,
Josiah
Comment 74 ukrainianconsular 2014-02-12 11:54:53 PST Comment hidden (spam)
Comment 75 ukrainianconsular 2014-02-12 11:58:26 PST Comment hidden (spam)
Comment 76 ukrainianconsular 2014-02-12 12:02:13 PST
Also, I am pressing the Mozilla devs because rkent will not add regex filtering of the message body to his add-on; he thinks his refusal affects me and only me, after my complaint about his abuse of trust. He will not do it, and everyone here who has expected that function has been waiting forever.
Comment 77 Eyal Rozenberg 2014-02-12 12:09:54 PST
(In reply to ukrainianconsular from comment #74)

You cover your valid points with so much hyperbole, with curses, with YELLING, et cetera, that it's really difficult for people with a view similar to yours to sympathize with your message, or even to read through it.

(In reply to Josiah Bruner [:JosiahOne] from comment #73)

Josiah, 

- "Very little" is still some. And I'm sure people connected to the Mozilla Foundation/Corporation have a lot of say w.r.t. Thunderbird development.
- Thunderbird development work over the years sure seems to have had very strange priorities.
- Many users get extremely frustrated
- For some people, implementing a certain feature might mean 2 days of work, while for others, it might mean many months of agonizing to get to the point where they can add anything significant to the code. I really suggest people remove "so fix it yourself" from the lexicon.
- A 20-year-old mail client should have gotten regex support in message filters many years ago.

Having said that, ukrainianconsular's personal attacks are quite beyond the acceptable.
Comment 78 ukrainianconsular 2014-02-12 12:16:32 PST
It's just that 15 years of this thread, and the fact that another contributor ignores a basic function of his own add-on, drive me nuts, and that's factual, not hyperbole.

If my complaints are beyond the acceptable, I apologize for having bothered you and will make sure I never post any requests at mozilla.org.

It is certain that, after having started many threads and obtained no fix for any of them, even without complaining about that form of negligence, it's beyond acceptable to insist.

You are right, Eyal, I shouldn't keep expecting anything after seeing the date this thread was started, when other email clients are designed and released with regex in their very first version. To all of the people here who have been deeply offended, I send my most humble apologies.

Keep up the good work.
Comment 79 Eyal Rozenberg 2014-02-12 12:32:51 PST
(missing bit in my comment) 
- Many users get so frustrated after waiting many years to see some attention paid to significant issues they have reported or commented on that they end up losing their temper, like our friend ukrainianconsular. I'm sure he has more on his plate in life these days in the Ukraine than writing comments here... he does it because he cares about making Thunderbird better. Everyone here wants that, nothing else.
Comment 80 ukrainianconsular 2014-02-12 12:38:01 PST
You're right ((
Comment 81 Valerio Messina 2014-02-13 02:20:49 PST
I propose removing the vote feature from Bugzilla's Importance field, as votes are never looked at when assigning bugs or RFEs.
Comment 82 opera wang 2014-02-13 06:31:39 PST
If you check bug 868233, you can see Kent already has a patch that would help all add-on authors write filters that use the message body. The patch itself actually demonstrates how to use regular expressions for body matching.

Also, without the above patch, it's not possible to simply write an add-on that does a regular expression search against the body: getting the body is an async call, and the filter system needs a sync call.

Current Thunderbird development is a bit slow, that's true. I do see a few bugs that have patches which may wait months before getting reviewed. That frustrates both contributors and end users. There might be ways to improve this, but I agree with Josiah Bruner: personal attacks don't help.
Comment 83 robert wegner 2014-02-13 07:03:39 PST
Could you move this fruitless discussion to the forum, please? I really don't need this via email...
Comment 84 ukrainianconsular 2014-02-13 11:39:32 PST
Thank you so much, opera wang. I hope this patch addresses regex and not JavaScript, since I don't know the JavaScript language and have no computer knowledge. Could you please tell me how to install it? I have no coding knowledge and don't know where to put that content or how to name the file: https://bugzilla.mozilla.org/attachment.cgi?id=744907

Regex is already a bit complicated for me, but I use a magic tool called Regulazy that has helped me a lot: when I click on regex edit, select parts and right-click, I get many choices, including "exact", "letters", "numbers", "anything", etc.

This matter has frustrated me a lot because I've been fighting spam on a personal email address that was posted online by inbreds and collected by all kinds of extractors, companies and hired spammers, who send me a whole imaginary world of spam content. I decided to abandon the ordinary Bayesian filters that appeared on the web, which are useless for me: the Bayesian system sends false positives to a spam folder that I then have to deal with, and because of those false positives I can't use that system to delete messages outright. So I started creating complex filters over 20 years of my life, dealing with those spams almost every day. I have now reached almost 2500 complex filters, and many of them are very "cerebral" and well thought out, including this rule:

"Re:", "i" or "you" (depending on the linguistic expression and content)
are not present when I receive marketing-related words,
and "http" is present.

That personal technique makes up about 30% of my spam filter database.
Regex for the body was my only accurate option because:

If I use "doesn't contain" with "i " and there's a word ending with the letter "i" in the body, that word is counted as a match (a false positive), and I cannot allow myself such false positives, because my filters are radical and delete messages silently, without any attention or visibility.

This is a stupid limitation of all email clients that has never been fixed by an update.

I could use " i ",

but if the letter "i" is at the very start of the body, it will be ignored by TB's filters, because there's no space before it.

So we needed either a regex for the body or an "exclusively contains" option in TB's filters (this would be an evolution in email filtering), which would exclude any letter of the alphabet directly before or after a string/word/letter.

Using "exclusively contains" for "i", "hihi,i " should be recognized, but "oui" should not.

Spammers can even amuse themselves reading this message; I will defeat their mind-polluting unsolicited spam campaigns.

Even if they try to obfuscate "i", "you" or http links by playing with tags or colors, I have a solution for them too.

A link that arrives without any text should also be handled by a complex regex on the body.

I already exclude messages with more than a certain number of addresses in the To, Bcc or Cc fields.

This regex option for the body was very important to me because I have committed myself to a 20-year personal fight, and my system is now almost perfect for an address that receives 100 spam emails PER DAY: if I log in to my email server's account through the web panel, I see them all in the trash, where they're purged automatically by the server.

And I actually get about 1 spam email every 7 to 14 days.

I've built very complex filters, including ones for words obfuscated by splitting them up among tags, for example.

I've reviewed my spam box many times and have no false positives, but fighting spam is long, hard work. I also lost a dear member of my family, forgot the vital human side of my life, and experienced emotional misery from being so obsessed with it.

Today, folks, I can see the end of the tunnel: this body regex filtering option brings perfection to my filtering system and lets me handle tags too without ever having to deal with it again, when some online filtering engines don't even allow it.

I became schizophrenic and it became a psychosis; I ended up regretting having ignored that family member, who passed away before I could spend more time with them by having more freedom in my hands. And you can't just sign up for a new email address when you've shared that address with thousands of people over decades.

Spam can be very devastating in a life: it starts by slowly irritating you until it invades your life. Even spammers from Argentina are protected by their laws and spam with total impunity; that's why full spam reports will never stop some spammers.

All I'm missing now is a "my email is not present in the To or Cc field" condition that checks more than one identity, without my having to add each identity to the filters and remove it again if I ever get rid of an account. I hope aceman takes care of it.

Again, my sincere apologies to everyone who has felt harassed by my interventions here, but if Eyal hadn't tried to understand me, I probably wouldn't have taken the time to explain my situation and why I am so obsessed with the matter. Before you guys feel harassed, trust me, I've been through a lot on this matter.

Let's move on from this.

Thanks.
Comment 85 ukrainianconsular 2014-02-13 17:58:34 PST
News:

My private conversation with Opera Wang:

Opera:
The patch is C++ code: it requires you to download the TB source code, patch it, and then rebuild the whole of TB, which would be hard if you're not in IT. Also, the patch is about half a year old, which means it has bit-rotted (it can't be applied directly and needs some merge effort).
	
ukrainianconsular:
So if I use the patch, I should never update Thunderbird again and stay stuck on the current stable build forever, since I can't expect the devs at Mozilla to incorporate it for good. Could you please add it, since I have no coding knowledge? ((

Opera:
As I said, without the patch it's (almost) impossible to do a regex search in the body. I actually tried it once, with a lot of dirty code, and still couldn't make it work without the possibility of crashing TB.

ukrainianconsular to the devs of bugzilla:
So now I'm hopeless, and I believe rkent is blocked because TB needs a code modification before this regex search becomes possible. If you guys want to consider that issue, here's rkent's patch: https://bugzilla.mozilla.org/attachment.cgi?id=744907
Apparently, only you can do something about it.
Comment 86 ukrainianconsular 2014-03-16 18:05:44 PDT
Getting rid of FiltaQuilla, which is now obsolete for me, for (regex) filtering by:
- headers
- to
- from
- subject
- and the missing one: body

Many thanks to opera wang for introducing all those options in the add-on he maintains:
Expression Search / GMailUI 0.8.8 beta
http://www.sendspace.com/file/dsfhu4
Comment 87 Magnus Melin 2015-01-03 12:11:53 PST
*** Bug 1075276 has been marked as a duplicate of this bug. ***
Comment 88 :aceman 2015-06-18 00:22:31 PDT
*** Bug 1175818 has been marked as a duplicate of this bug. ***
Comment 89 Alastair Gordon 2015-09-09 14:26:29 PDT
I cannot believe that requests for this enhancement go back 16 years, and are still unfulfilled! So much for Open Source!

Regular Expressions, or even wildcards, in all fields in Message Filters would allow the creation of the most powerful spam filters on an individual basis. After all, one man's marketing communication is another man's spam. 

It cannot be difficult, can it?
Comment 90 Matěj Cepl 2015-09-10 06:18:16 PDT
(In reply to Alastair Gordon from comment #89)
> It cannot be difficult, can it?

Sure it isn't ... just attach a patch to fix it!
Comment 91 Alastair Gordon 2015-09-10 08:05:16 PDT
We seem to have agreement that RegEx/Wildcards would be an extremely valuable feature for all fields in Message Filters, and that there are no technical hurdles. So why has no one taken up the challenge in 16 years? If I had the skills, I would definitely do it.
Comment 92 Kent James (:rkent) 2015-09-13 23:37:15 PDT
See my comment 64. That approach is implemented in the addon FiltaQuilla, which has been available for years. The usage of that addon has never been that high (I believe around 10,000 users) and it includes many features in addition to RegExp, so the total actual demand for this feature is not as high as you would think. Because it is easy to do in an addon now, it may be appropriate to just leave it there.
Comment 93 Alastair Gordon 2015-09-14 05:16:57 PDT
I have installed FiltaQuilla, and to the best of my knowledge, it will not apply filters to the BODY of the message, making it essentially useless for filtering spam. If FiltaQuilla does, in fact, allow filtering on the body and is usable by anyone other than a hard-core nerd, then you are right that we already have a solution.

Simply supporting a wildcard character in the existing Thunderbird Message Filters would meet 90% of filtering needs. How about someone just adding wildcard capability to existing Message Filters?
Comment 94 Kent James (:rkent) 2015-09-14 11:00:50 PDT
Alastair: Right, body filters are much more difficult, since filters are sync by nature, and body access is async. There is now a technical solution to this that did not exist when FiltaQuilla was written, though it is still a bit of a kludge, and may lead to unstable behavior (filters are crashy enough by themselves). It would probably be possible to add RegEx body filters to an addon such as FiltaQuilla, but frankly I've put almost no effort into that in years, and am unlikely to do so in the foreseeable future. There are much more urgent issues to deal with in Thunderbird at the moment than filter improvements.

If someone wanted to attempt that I would be happy to point them in the correct direction.
Comment 95 Alastair Gordon 2015-09-15 06:59:16 PDT
Thank you for the explanation, Kent. I hope someone takes up your kind offer.

In fact, simply allowing wild cards ("*") in the existing Message Filters, which can filter on the contents of the BODY, would achieve most of what is needed. How tough would that be?
Comment 96 grzegorz.szyszlo 2015-09-28 03:34:20 PDT
An asterisk "*" is a simple workaround, but it isn't enough. Do you plan to support Unix shell or regexp wildcards? The shell accepts *, ? and [chars]; this is the simplest kind of pattern to write.

Another question: does the asterisk work when placed in the subject, sender, recipient, or anywhere else?
Comment 97 Rebeccah 2016-04-15 17:21:34 PDT
Sorry if I'm late to the party, here.  Kent, FiltaQuilla doesn't come up when I search for Thunderbird add-ons that do searching - and its description only includes a laundry list of features I don't actually care about (no offense).  I care far more about the ability to search e-mail bodies than the ability to use full regular expressions to search only subjects. That said, either regex or boolean search would seem to me to be much more than just a nice-to-have.

For me, this has nothing to do with spam: there are other means of filtering spam, and for now TB's "junk" designation is working OK for me.  I keep my e-mail until I run out of disk space, and I search old e-mails for specific content *in the body of the e-mail* on a regular basis.  In fact, only within the last month did I switch to using TB at home rather than my ancient Forte Agent 1.8, because it's easy to use and I haven't wanted to pay to upgrade Forte Agent to a more modern version that will handle the now-ubiquitous HTML e-mails.  But I was dismayed to see how e-mail search is implemented in TB.  I may go back to Forte Agent after all...  

I will try some add-ons first, but everything I'm reading seems to indicate that even those that will search other headers may not search bodies (and I understand about the technical difficulty).  And recent reviews of FiltaQuilla and of Wang Opera's Expression Search / GMailUI seem to indicate regex searching is currently broken in both of them.

Rebeccah
