Closed Bug 1420796 Opened 7 years ago Closed 7 years ago

Message Filter Body Contains matching "string" in inlined jpg data

Categories

(Thunderbird :: Untriaged, defect)

Version: 52 Branch
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1259534

People

(Reporter: rtresidd, Unassigned)

References

(Blocks 1 open bug)

Details

User Agent: Mozilla/5.0 (Windows NT 6.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0
Build ID: 20170125094131

Steps to reproduce:

I had set up a message filter to junk emails that contained "SEO" in the body. However, it also matched some emails that contained images embedded in the HTML body: "SEO" in any capitalization within the base64 data of the JPEG image was triggering the rule.

The image is embedded into the HTML via this tag within the HTML body part of the message:

**** html body section tag to place image:
<img border=3D0 width=3D623 =
height=3D175 id=3D"Picture_x0020_1" =
src=3D"cid:image001.jpg@01D36768.8B8C16F0" alt=3D"EMIT Email =
Footer">
****** end html body section

------=_NextPart_001_0002_01D3676C.BB70B110
Content-Type: image/jpeg; name="image001.jpg"
Content-Transfer-Encoding: base64
Content-ID: <image001.jpg@01D36768.8B8C16F0>

******** image data that happens to contain "SEO" in it somewhere ******

------=_NextPart_001_0002_01D3676C.BB70B110--

Actual results:

The email was junked even though the matched string was just part of the base64 blob for an image.

Expected results:

The filter shouldn't scan inside MIME parts that are effectively binary data; as it stands, this makes the "Body contains" filter impossible to use. It should only scan the actual text of the body.
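The expected behaviour amounts to: decode the message first, then match only against parts whose MIME type is text. The sketch below illustrates that idea with Python's standard `email` package. It is not Thunderbird's filter code; `body_contains` and the sample message `RAW` are hypothetical, and it assumes "text" means any `text/*` part (HTML tags are not stripped).

```python
from email import message_from_string
from email.message import Message

# Hypothetical sample message modelled on the report: an HTML part plus an
# inline JPEG whose base64 data happens to contain the letters "SEO".
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/related; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<p>Quarterly report attached.</p>
<img width=3D623 src=3D"cid:image001.jpg@01D36768.8B8C16F0">
--BOUNDARY
Content-Type: image/jpeg; name="image001.jpg"
Content-Transfer-Encoding: base64
Content-ID: <image001.jpg@01D36768.8B8C16F0>

/9j/SEOxAAAA
--BOUNDARY--
"""


def body_contains(msg: Message, needle: str) -> bool:
    """Hypothetical case-insensitive 'Body contains' over decoded text parts only."""
    needle = needle.lower()
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers carry no payload of their own
        if part.get_content_maintype() != "text":
            continue  # skip images and other binary parts entirely
        raw = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if raw is None:
            continue
        charset = part.get_content_charset() or "us-ascii"
        if needle in raw.decode(charset, errors="replace").lower():
            return True
    return False


msg = message_from_string(RAW)
print("seo" in RAW.lower())        # True: searching the raw source hits the image data
print(body_contains(msg, "SEO"))   # False: the image part is never searched
```

Searching the raw message source reproduces the false positive from this report, while walking the decoded text parts does not.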
Sadly, body search is pretty rotten; it's one of my pet hates. One day we might fix it.
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
I saw that one, but thought it was a somewhat different scenario. I guess it all comes down to the same issue, though: the message isn't decoded so that only the text part of the body gets analyzed. Cheers
It's the same problem. It searches the base64-encoded message and finds things it shouldn't find (this bug), and it doesn't find things it should find if the word is contained in a base64-encoded plain-text or HTML part. Sadly the architecture is wrong, so this bug is hard to fix. It's right at the top of the to-do list, but with so few people working on it, we can't give you an ETA.
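The second half of that comment, the missed match, can be reproduced in a few lines. This is a minimal Python sketch, not Thunderbird's code: the single-part message `RAW` and the helpers `naive_contains` and `decoded_contains` are hypothetical, and the body is assumed to be a base64-encoded `text/plain` part.

```python
import base64
from email import message_from_string

# Hypothetical single-part message whose plain-text body is base64-encoded
# (some senders encode text parts this way).
body_b64 = base64.b64encode(b"Cheap SEO services, call now!\n").decode("ascii")
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    f"{body_b64}\n"
)


def naive_contains(raw: str, needle: str) -> bool:
    # Searches the message source as stored: encoded data, not readable text.
    return needle.lower() in raw.lower()


def decoded_contains(raw: str, needle: str) -> bool:
    # Undoes the Content-Transfer-Encoding first, then searches the text.
    msg = message_from_string(raw)
    payload = msg.get_payload(decode=True)
    text = payload.decode(msg.get_content_charset() or "us-ascii", errors="replace")
    return needle.lower() in text.lower()


print(naive_contains(RAW, "SEO"))    # False: the word only exists in encoded form
print(decoded_contains(RAW, "SEO"))  # True
```

The naive search fails here for the same reason it misfires on the inline image: it operates on the transfer encoding rather than on the decoded content.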
OK, let's shuffle this back to where it came from. Bug 1259534 comment #22 contains an analysis of which MIME structures work and which don't. Let's dupe it back over there since we'll try to fix this together. I'm not 100% sure that filters and search use the same code though, so we'll revisit this if we ever fix bug 1259534.
I have confirmed that this is fixed by bug 1259534.