Message filter with "Body contains" does not ignore tag content when tag is broken across lines.

RESOLVED FIXED in Thunderbird 65.0

Status

RESOLVED FIXED
3 years ago
2 months ago

People

(Reporter: ski, Assigned: jorgk)

Tracking

(Blocks: 2 bugs)

38 Branch
Thunderbird 65.0
x86
Windows Vista
Dependency tree / graph

Thunderbird Tracking Flags

(thunderbird_esr6064+ fixed, thunderbird64 fixed, thunderbird65 fixed)

Details

Attachments

(2 attachments, 7 obsolete attachments)

(Reporter)

Description

3 years ago
User Agent: Mozilla/5.0 (Windows NT 6.0; rv:42.0) Gecko/20100101 Firefox/42.0
Build ID: 20151029151421

Steps to reproduce:

I can reproduce the bug always in the following way. It need s an POP-account in order to use the "Body contains" filter option.

1.) Create the message filter:
 In the Menu go to Tools/Message Filters.
 Create a new filter via the button [New...].
 Give it some name. I did not change the hooks, so the hook on "Manually Run" and on "Getting New Mail: Filter before Junk Classification" are set. The next two hooks are unset. I set the radio button to "Match any of the following", although this is not necessary for one entry.
 For the filter options I choose "Body" "contains" "spamerwebsjte".
 (The "j" is just to create a new word.)
 Under "Performs these actions:" choose "Move Message to" "Trash on GMX"
 Press [OK]

2.) Switch to sending pure HTML-mails without plain text in the following way:
 In the Menu go to Tools/Options/Composition/[Send Options...].
 In the drop-down-menu choose "Send the message in HTML anyway".
 Press [OK] [OK].

3.) Send the following HTML-mail to yourself:
  Create a new message, address it to any POP-account you can read. A senseful subject could be "Test with short line".
  Click the message body and then in the Menu choose Insert/HTML... . 
  Paste the following content:

<html>
  <body>
  Dear Reader,<br> 
  <a href="http://www.some.spamerwebsjte.tat.i.want.to.filter.out.com">Click Me !</a>
  </body>
</html>

3.) Click [Insert]. Before sending you might like to save the message as template for further tests.
 Send the message.

4.) Get new messages from the POP account you have send the mail to.


Actual results:

When receiving the message, it is not filtered i.e. moved to the Trash but it stays in the Inbox. In the message source there is no line break before the "href"-command:

[...]
Dear Reader,<br>
    <a href="http://www.some.spamerwebsjte.tat.i.want.to.filter.out.com">Click

      Me !</a>
[...]


Expected results:

The word "spamerwebsjte" should have been found by the filter and the message should have been moved to the trash.

But the filter works, if you make the line with the link a bit longer in the test mail: use this test message, where only one additional letter is introduced in "that" instead of "tat":
 (a senseful subject would be "Test with long line")

<html>
  <body>
  Dear Reader,<br> 
  <a href="http://www.some.spamerwebsjte.that.i.want.to.filter.out.com">Click Me !</a>
  </body>
</html>

After sending and receiving this message, it is moved to the trash and the longer line is broken in the source before the "href"-command:

[...]
Dear Reader,<br>
    <a
      href="http://www.some.spamerwebsjte.that.i.want.to.filter.out.com">Click

      Me !</a>
[...]

Obviously, this additional line break, that appears automatically, makes the filter to work correctly.
But the filter must find the search word in both test mails.
(Reporter)

Updated

3 years ago
OS: Unspecified → Windows Vista
Hardware: Unspecified → x86
(Assignee)

Comment 1

3 years ago
Confirmed.

Adding:
  <body>
  Dear Reader,<br> 
  <a href="http://www.some.spamerwebsjte.tat.i.want.to.filter.out.com">Click Me !</a>
  </body>
the link line is broken before the href and the message is filtered, regardless of filter configuration
(Filter before/after Junk Classification).

Adding:
  <body bgcolor="#FFFFFF" text="#000000">
    <p> Dear Reader,<br>
      <a href="http://www.some.spamerwebsjte">Click Me !</a> </p>
  </body>
results in the link line not being broken and the message is NOT filtered correctly.

I wonder whether filtering should happen only on visible text or on all text in the message.
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Assignee)

Updated

3 years ago
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1211128
(Assignee)

Comment 3

3 years ago
OK, after some discussion in bug 1211128 we concluded that text in HTML tags should be ignored in the search. That this doesn't happen when the tag is broken across lines is a bug.
Processing is here:
http://mxr.mozilla.org/comm-central/source/mailnews/base/search/src/nsMsgBodyHandler.cpp#310

So I am re-opening the bug for this and correcting the summary.

The discussion of whether to search inside the tags can happen in the other bug 1211128.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: Message filter with "Body contains" fails in some HTML-mails depending on line length → Message filter with "Body contains" does not ignore tag content when tag is broken across lines.
(Assignee)

Updated

3 years ago
Status: REOPENED → NEW
(Assignee)

Updated

3 years ago
Blocks: 519202
(Assignee)

Updated

3 months ago
Duplicate of this bug: 1498496
(Assignee)

Comment 5

3 months ago
Created attachment 9016862 [details] [diff] [review]
1230815-html-search.patch - WIP

Something like this might do it. Sadly causes test failures in
mach xpcshell-test comm/mailnews/base/test/unit/test_searchBody.js

So something isn't right yet. Also I'd need to add a test case where a HTML tag is split across lines.
Assignee: nobody → jorgk
(Assignee)

Comment 6

3 months ago
Created attachment 9016872 [details] [diff] [review]
1230815-html-search.patch (v2)

All tests in
mach xpcshell-test comm/mailnews/base/test/unit/test_searchBody.js
pass now. I'll add one for a tag broken between lines.
Attachment #9016862 - Attachment is obsolete: true
Attachment #9016872 - Flags: review?(acelists)
(Assignee)

Comment 7

3 months ago
Comment on attachment 9016872 [details] [diff] [review]
1230815-html-search.patch (v2)

Well, firstly it still doesn't quite work, as the test shows, and secondly, StripHtml() is a little simplistic since it doesn't look at strings, so

<tag attr1="xx>" attr2="xxx"> would only remove up to the first > :-(
Attachment #9016872 - Flags: review?(acelists)
(Assignee)

Comment 8

3 months ago
Created attachment 9016875 [details] [diff] [review]
1230815-html-search-test.patch - WIP

Sadly it does find the strings it shouldn't find :-(
(Assignee)

Comment 9

3 months ago
Created attachment 9016880 [details] [diff] [review]
1230815-html-search-test.patch (v2)

OK, the code works for a multipart message, but doesn't work for a single part message yet.
Attachment #9016875 - Attachment is obsolete: true
(Assignee)

Comment 10

3 months ago
Comment on attachment 9016880 [details] [diff] [review]
1230815-html-search-test.patch (v2)

The test is fine and passes with the next patch.
Attachment #9016880 - Flags: review?(acelists)
(Assignee)

Comment 11

3 months ago
Created attachment 9016884 [details] [diff] [review]
1230815-html-search.patch (v3)

This works now on simple and multipart messages.

We still might to improve StripHtml() while we're here.
Attachment #9016872 - Attachment is obsolete: true
Attachment #9016884 - Flags: review?(acelists)
(Assignee)

Comment 12

3 months ago
Created attachment 9016921 [details] [diff] [review]
1230815-html-search.patch (v3b)

I noticed that when accumulating lines of HTML I need to insert a space between them so the last word of a line and the first word of the next line aren't glued together producing false positives.
Attachment #9016884 - Attachment is obsolete: true
Attachment #9016884 - Flags: review?(acelists)
Attachment #9016921 - Flags: review?(acelists)
(Assignee)

Comment 13

3 months ago
Created attachment 9016922 [details] [diff] [review]
1230815-html-search.patch (v3c)

Oops, the m_stripHtml removal hunks were missing.
Attachment #9016921 - Attachment is obsolete: true
Attachment #9016921 - Flags: review?(acelists)
Attachment #9016922 - Flags: review?(acelists)
(Assignee)

Comment 14

3 months ago
Created attachment 9016925 [details] [diff] [review]
1230815-html-search-test.patch (v2b)

Added another test that we find stuff again once the tag is closed and that we produce false positives at the end of the line.
Attachment #9016880 - Attachment is obsolete: true
Attachment #9016880 - Flags: review?(acelists)
Attachment #9016925 - Flags: review?(acelists)

Comment 15

3 months ago
Comment on attachment 9016922 [details] [diff] [review]
1230815-html-search.patch (v3c)

Review of attachment 9016922 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, this seems to work for me. Strips the value in the tag attributes and even makes the split "Click   Me" findable via search.
Attachment #9016922 - Flags: review?(acelists) → review+

Comment 16

3 months ago
Comment on attachment 9016925 [details] [diff] [review]
1230815-html-search-test.patch (v2b)

Review of attachment 9016925 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/test/data/HTML-with-split-tag2.eml
@@ +24,5 @@
> +<body>
> +HTML part. Now comes the tag
> +
> +<tag attr1=ShouldNotFindThis
> +attr2=ShouldntFindThisEither>

Please quote one attribute so that we see proper syntax also works :)

And please make another tag that is properly closed to check also that case.

And also test searching the string inside tags that is wrapped, if we already do not have a test for it.
As in the report.
<a
href="xx">Click
Me</a>
... and search for "Click Me".
Attachment #9016925 - Flags: review?(acelists) → review+
(Assignee)

Comment 17

3 months ago
(In reply to :aceman from comment #16)
> Please quote one attribute so that we see proper syntax also works :)
attr=xxx is proper syntax, no quotes required, but I can add them.

> And please make another tag that is properly closed to check also that case.
I don't understand. The tag is both properly closed. What would you like me to do exactly?

> <a
> href="xx">Click
> Me</a>
> ... and search for "Click Me".
Can do.

Comment 18

3 months ago
(In reply to Jorg K (GMT+2) from comment #17)
> > And please make another tag that is properly closed to check also that case.
> I don't understand. The tag is both properly closed. What would you like me
> to do exactly?

Tag <tag> in your test is unknown so it is not sure a real HTML parser would guess if it has to look for it's closing tag </tag> or not. Anyway, if you add the <a href>text </a> case, and search within the 'href' and the 'text' (where only 'text' will be found), you should have this covered.
(Assignee)

Comment 19

3 months ago
Created attachment 9019281 [details] [diff] [review]
1230815-html-search-test.patch (v3)

More tests.
Attachment #9016925 - Attachment is obsolete: true
Attachment #9019281 - Flags: review+
Attachment #9019281 - Flags: approval-comm-esr60?
Attachment #9019281 - Flags: approval-comm-beta+

Comment 20

3 months ago
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/8072f03fc22d
remove unused SetStripHtml() and accumulate HTML part to correct tag stripping. r=aceman
https://hg.mozilla.org/comm-central/rev/617e07ac0498
remove unused SetStripHtml() and accumulate HTML part to correct tag stripping, test. r=aceman
Status: NEW → RESOLVED
Last Resolved: 3 years ago3 months ago
Resolution: --- → FIXED
(Assignee)

Updated

3 months ago
Attachment #9016922 - Flags: approval-comm-esr60?
Attachment #9016922 - Flags: approval-comm-beta+

Comment 21

3 months ago
Comment on attachment 9019281 [details] [diff] [review]
1230815-html-search-test.patch (v3)

Review of attachment 9019281 [details] [diff] [review]:
-----------------------------------------------------------------

Great, thank.
Attachment #9019281 - Flags: review+
(Assignee)

Updated

3 months ago
Blocks: 1501358
(Assignee)

Updated

3 months ago
Target Milestone: --- → Thunderbird 65.0
(Assignee)

Comment 22

3 months ago
Beta (TB 64):
https://hg.mozilla.org/releases/comm-beta/rev/033b51e4a841
https://hg.mozilla.org/releases/comm-beta/rev/91563a76b489
status-thunderbird64: --- → fixed
status-thunderbird65: --- → fixed
(Assignee)

Comment 23

2 months ago
Comment on attachment 9016922 [details] [diff] [review]
1230815-html-search.patch (v3c)

Good for 60.4.
Attachment #9016922 - Flags: approval-comm-esr60? → approval-comm-esr60+
(Assignee)

Updated

2 months ago
Attachment #9019281 - Flags: approval-comm-esr60? → approval-comm-esr60+
You need to log in before you can comment on or make changes to this bug.