Created attachment 336592
spam that should match filter but never does

trunk SM Gecko/20080816002821

In order to combat spam like the attached one that arrives in my POP email inbox, I created the following filter.

name="obfuscated viagra "
enabled="yes"
type="17"
action="Move to folder"
actionValue="mailbox://firstname.lastname@example.org/Junk"
condition="AND (body,contains,GENERATOR) AND (body,contains,TEXT-DECORATION)"

That filter should match the attached message, but it does not. I select the rule, and tell it to run the selected filter on the inbox folder, and nothing happens. I tell it to run all filters on the inbox, and nothing happens.
Doesn't meet the "if it was the last bug still open, would we really hold the release for it?" criterion for blocking.
Nelson, is this now WFM on trunk or beta?
I just retested with TB3b3. Same as before: the message is still not caught by the filter.
> <META content=3D"MSHTML 6.00.2900.3395" name=3DGENERATOR></HEAD>
> <TD noWrap align=3Dleft><A style=3D"TEXT-DECORATION: none;

Quick check result with Tb trunk (2009/8/01 build): "body contains" doesn't hit for HEAD, META, BODY, TABLE, TR, TD, or for cell, padding, noWrap, left, etc. HTML tags are probably excluded from the filtering target of "Body".

Note: CSS text between the <style> tag and the </style> tag appears to be treated as "Body". So "Body" is not equal to "HTML mail converted to text". Is "Body" the text nodes of the DOM?
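To make the guess above concrete, here is a minimal Python sketch of the difference between searching the raw source and searching only text nodes. It is not Thunderbird's code; the class name `TextNodeCollector`, the helper `text_nodes`, and the sample fragment are made up for illustration, modelled on the two quoted lines. Python's `html.parser` happens to report the contents of a <style> element as character data, which mirrors the observation about CSS text.

```python
import quopri
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Hypothetical helper: keeps character data, drops tag names and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags, including the contents of <style>.
        self.chunks.append(data)


def text_nodes(html):
    """Return the concatenated text nodes of an HTML string."""
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.chunks)


# Made-up quoted-printable fragment resembling the attached spam.
RAW_QP = (
    b'<HTML><HEAD><META content=3D"MSHTML 6.00.2900.3395" name=3DGENERATOR>'
    b'<STYLE>P { margin: 0 }</STYLE></HEAD>'
    b'<BODY><A style=3D"TEXT-DECORATION: none" href=3D"#">hello</A></BODY></HTML>'
)

decoded = quopri.decodestring(RAW_QP).decode("ascii")  # =3D becomes =
visible = text_nodes(decoded)

# The raw source contains the tag attributes; the text nodes do not.
print("GENERATOR" in decoded, "GENERATOR" in visible)
# CSS inside <style> survives as text, as observed in the quick check.
print("margin" in visible)
```

Under this model, a filter on GENERATOR or TEXT-DECORATION can never match, because both strings live only inside tags.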
HTML tags are stripped before mail data is passed to the Bayesian filter.
> Bug 231873 libmime should strip out html tags for bayesian spam filter
I don't know whether that also applies to message filters or not.
The origin of this bug is that certain spammers send messages where every letter of every word is separated from every other letter by LOTS of html tags. The spams are huge even though the displayed content is small. Since no two letters are actually adjacent, the spam filter cannot detect the spam words in the messages.

But the predominant characteristic of these messages is their TONS of html tags. I don't get any legit mail with those tags, just spam. I really want to be able to filter messages on those tags and get rid of this crap.

Maybe we need "body" and "raw body" filter targets, where "raw body" doesn't ignore any of the metadata, and considers all alternatives in multipart/alternative messages.
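As a rough illustration of what a "raw body" target could mean, the Python sketch below searches every text part of a message, including all alternatives in multipart/alternative, after undoing the transfer encoding but without stripping any markup. This is only a sketch under assumptions: the function name `raw_body_contains` and the sample message are hypothetical, and nothing here reflects how Thunderbird's filter code is actually structured.

```python
from email import message_from_bytes, policy


def decode_part(part):
    """Decode one MIME part to text, undoing base64/quoted-printable only."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label: fall back to a lossless single-byte decoding.
        return payload.decode("latin-1")


def raw_body_contains(raw_message, needles):
    """True if every needle occurs in the unstripped text parts (AND semantics)."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    bodies = [
        decode_part(part)
        for part in msg.walk()
        if part.get_content_maintype() == "text"  # skips multipart containers
    ]
    haystack = "\n".join(bodies).lower()
    return all(needle.lower() in haystack for needle in needles)


# Made-up message with a plain-text and an HTML alternative.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--b1\r\n"
    b"Content-Type: text/html; charset=us-ascii\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<META content=3D"MSHTML" name=3DGENERATOR>\r\n'
    b'<A style=3D"TEXT-DECORATION: none">h</A>\r\n'
    b"--b1--\r\n"
)

print(raw_body_contains(SAMPLE, ["GENERATOR", "TEXT-DECORATION"]))
```

The two needles mirror the AND condition of the filter in the original report; with markup left in place, both are found in the HTML alternative.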
FYI. Some enhancement requests around "Body" search were found.
Bug 229142 : New "All Headers" and "Entire Message"
Bug 271222 : Entire Message = All headers + Body
Yes, I had been watching this. What I would like to do is to look at the body filter code, and see how difficult it would be to add the requested option. Perhaps with a small change to the core code, I could then add this "filter raw body" as a custom search term.
AFAIK, the current "Body" filter was never designed to match the raw source, so this is an RFE.