Open Bug 453385 Opened 16 years ago Updated 2 years ago

Implement message filter to search raw message source (incl. matching "raw body" like html tags)

Categories

(MailNews Core :: Filters, enhancement)

x86
Windows XP
enhancement

Tracking

(Not tracked)

People

(Reporter: nelson, Unassigned)

Attachments

(1 file)

trunk SM Gecko/20080816002821

In order to combat spam like the attached one that arrives in my POP email
inbox, I created the following filter.

name="obfuscated viagra "
enabled="yes"
type="17"
action="Move to folder"
actionValue="mailbox://nelson%40bolyard.com@mail.bolyard.com/Junk"
condition="AND (body,contains,GENERATOR) AND (body,contains,TEXT-DECORATION)"

That filter should match the attached message, but it does not. 
I select the rule, and tell it to run the selected filter on the inbox 
folder, and nothing happens.  I tell it to run all filters on the inbox,
and nothing happens.
Blocks: 389006
Flags: blocking-thunderbird3?
Doesn't meet the "if it was the last bug still open, would we really hold the release for it?" criterion for blocking.
Flags: blocking-thunderbird3? → blocking-thunderbird3-
Nelson, is this now WFM on trunk or beta?
I just retested with TB3b3, same as before, still not caught by filter
> <META content=3D"MSHTML 6.00.2900.3395" name=3DGENERATOR></HEAD>
>     <TD noWrap align=3Dleft><A style=3D"TEXT-DECORATION: none;

Quick check result with Tb trunk(2009/8/01 build).
"body contains" doesn't match HEAD, META, BODY, TABLE, TR, TD, or cell, padding, noWrap, left etc.
HTML tags are probably excluded from the filtering target of "Body".
Note:
CSS text between the <style> and </style> tags appears to be treated as "Body". So "Body" is not the same as "HTML mail converted to text". Is "Body" the text nodes of the DOM?
HTML tags are stripped before mail data is passed to the Bayesian filter.
> Bug 231873 libmime should strip out html tags for bayesian spam filter
I don't know whether that applies to message filters or not.
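The observation above (tags and attributes are invisible to "Body", but CSS inside <style> is not) can be illustrated with a small Python sketch. This is only a model of a "text nodes only" search, not Thunderbird's actual filter code; the class name TextNodeCollector, the helper text_nodes, and the sample markup are hypothetical, made up for the illustration.

```python
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collects text nodes only; tag names and attributes are discarded."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags, including the content of <style>.
        self.chunks.append(data)


def text_nodes(html):
    """Return the concatenated text nodes of an HTML string."""
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.chunks)


# Hypothetical sample, loosely modeled on the quoted spam source
# (already decoded from quoted-printable).
SAMPLE_HTML = (
    '<HTML><HEAD><META content="MSHTML 6.00.2900.3395" name=GENERATOR>'
    "<STYLE>td { padding: 0 }</STYLE></HEAD>"
    "<BODY><TABLE><TR><TD noWrap align=left>"
    '<A style="TEXT-DECORATION: none">v</A><B></B>i</TD></TR></TABLE>'
    "</BODY></HTML>"
)

visible = text_nodes(SAMPLE_HTML)

# Attribute values never reach the text-node view...
print("GENERATOR" in visible)        # attribute value of <META>
print("TEXT-DECORATION" in visible)  # attribute value of <A>
# ...but CSS inside <style> does, and so does the raw source search.
print("padding" in visible)
print("GENERATOR" in SAMPLE_HTML)
```

Under this model, a search for GENERATOR or TEXT-DECORATION can only succeed against the raw source, which is consistent with the quick check result.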
The origin of this bug is that certain spammers send messages where every
letter of every word is separated from every other letter by LOTS of html
tags.  The spams are huge even though the displayed content is small. 
Since no two letters are actually adjacent, the spam filter cannot detect
the spam words in the messages.  But the predominant characteristic of these
messages is their TONS of html tags.  I don't get any legit mail with those
tags, just spam.  I really want to be able to filter messages on those
tags and get rid of this crap.  

Maybe we need "body" and "raw body" filter targets, where "raw body" doesn't
ignore any of the meta data, and considers all alternatives in multipart/ 
alternative messages.
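A minimal sketch of what such a "raw body" match could look like, written in Python with the standard email package rather than in Thunderbird's own code. Everything here is an assumption for illustration: the function name raw_body_contains, the sample message, the choice to undo the quoted-printable transfer encoding before matching, and the case-insensitive comparison.

```python
import email
from email import policy

# Hypothetical multipart/alternative message; the HTML part is
# quoted-printable encoded, like the source quoted earlier in this bug.
RAW_MESSAGE = """\
From: spammer@example.com
To: user@example.com
Subject: hello
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="us-ascii"

hi
--XYZ
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<HTML><HEAD><META content=3D"MSHTML 6.00.2900.3395" name=3DGENERATOR></HEAD>
<BODY><A style=3D"TEXT-DECORATION: none">hi</A></BODY></HTML>
--XYZ--
"""


def raw_body_contains(raw_message, needle):
    """Return True if any leaf part contains needle, markup included.

    Every alternative is searched, not only the one that would be
    displayed. Tags and attributes are kept; only the transfer
    encoding is decoded (an assumption of this sketch).
    """
    msg = email.message_from_string(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label
            text = payload.decode("utf-8", errors="replace")
        if needle.lower() in text.lower():
            return True
    return False


matches = all(
    raw_body_contains(RAW_MESSAGE, term)
    for term in ("GENERATOR", "TEXT-DECORATION")
)
print(matches)
```

With this approach, the two conditions of the filter above would both match the HTML alternative, because the tag attributes are searched rather than stripped.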
FYI, some related enhancement requests around "Body" search:
  Bug 229142 : New "All Headers" and "Entire Message"
  Bug 271222 : Entire Message = All headers + Body
Yes, I had been watching this.  What I would like to do is to look at the body filter code, and see how difficult it would be to add the requested option.  Perhaps with a small change to the core code, I could then add this "filter raw body" as a custom search term.
Summary: filter always fails to match on this matching message → filter on body does not match raw message source (like html tags)
AFAIK, the current "Body" filter was never designed to match the raw source, so this is an RFE.
Severity: normal → enhancement
Summary: filter on body does not match raw message source (like html tags) → Implement message filter to search raw message source (incl. "raw body" like html tags)
Summary: Implement message filter to search raw message source (incl. "raw body" like html tags) → Implement message filter to search raw message source (incl. matching "raw body" like html tags)
Severity: normal → S3