Closed Bug 1647119 Opened 4 years ago Closed 4 years ago

Body filter not working with greek characters

Categories

(Thunderbird :: Filters, defect)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1245532

People

(Reporter: callmejames, Unassigned)

Details

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 OPR/68.0.3618.165

Steps to reproduce:

I set the following conditions:

-Match all the following
-Body | Contains | μύκητες

-Perform these actions:
-Delete message

Then I clicked the button "Run now"

Actual results:

Absolutely nothing, so Thunderbird cannot filter spam messages!

Expected results:

Thunderbird should have found the word "μύκητες" on the body and deleted the annoying message.

Please attach the offending message as .eml

I sent myself messages in both plain text and HTML, with just a subject and with just a body, in UTF-8.
Body search fails with version 78 but works with 68. Subject search works.

Also fails with search on server.
Also fails using quick filter bar.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Body filter set for a single word NOT working! → Body filter not working with greek characters
Attached file body.eml
Attachment #9161142 - Attachment mime type: message/rfc822 → text/plain

Wayne, I tried the message you provided in a local folder using the QFB and "Search Messages" in both TB 68 and TB 78 and μύκητες is found in both cases. It's impossible that this is a bug in local folders, or, I believe, also synchronised IMAP folders, since we have totally extensive tests for this (which I wrote):
https://searchfox.org/comm-central/source/mailnews/base/test/unit/test_searchBody.js#51
Greek text is in one of the examples.

So what are the STR exactly?

Attached image body search.png

I set up a filter for adding a star to messages with μύκητες in the body, and that worked, too. On a local folder.

Yesterday's initial test mostly involved messages created on Mac and copied to other accounts. Also, I stopped using search and filters and simplified testing to use only the quick filter.

I've done a lot more testing - I'm mostly only going to report the last bits:

  • On Windows, with yesterday's messages, I have success with both 68 AND 78, but these were messages that had been copied to accounts or received in those accounts.
  • On Mac, quick filter fails on the account the message was sent from, vseerror.
  • On Mac, I copied the test messages to another account, then back to vseerror. Now quick filter on Mac succeeds in the vseerror account. Both accounts have "sync" enabled.

Next, today on Mac, I created a new plain text test message with μύκητες, sending from vseerror to vseerror. Quick filter fails. I compared the message source of Thursday's message to Friday's - they look the same.

Next, I tested global search; it finds both the message I sent today and Thursday's message that I copied between accounts. Conclusion - I can't believe I am saying this - are the sent and received messages stored differently on disk? Or is one coming from cache and the other from the synced folder?

For good measure, some final tests on Mac

  • message sent from vseerror to two other accounts - QF only fails in vseerror from where it was sent (works in other two receiving accounts)
  • message sent from luwsm to vseerror - QF only fails filtering luwsm sent folder (not vseerror inbox)
  • message sent from luwsm to luwsm - QF fails on both sent folder and Inbox
    All of the above are gmail enterprise accounts.

And now the nails in the coffin: (to eliminate possible gmail strangeness)

  • fastmail account, sent from wsmwk to wsmwk (both messages are in the Inbox because I have sent messages going to the Inbox) - QF finds ONLY the RECEIVED message - it should have found both the copy in the "sent folder" and the received one
  • newsgroup posting to mozilla.test - QF finds the copy in the sent folder (local folder) - conclusion: are sent messages handled differently for IMAP vs. local?

To narrow the issue down, can you please do this more systematically:

  1. On Windows and Mac, do body search in a local folder
  2. On Windows and Mac, do body search in a locally sync'ed IMAP folder
  3. Repeat for non-sync folder.

We need to know whether to look for the bug in local body search, which is subject to the test I mentioned, or whether we have an IMAP issue here.
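
The three steps above amount to a small test matrix. As a rough sketch (the platform and folder labels are taken from the list; the variable names are only illustrative), the cases can be enumerated so that each one gets an explicit pass/fail result:

```python
from itertools import product

# Labels taken from the numbered steps above; names are illustrative only.
platforms = ["Windows", "Mac"]
folder_kinds = ["local folder", "locally synced IMAP folder", "non-synced folder"]

# Every platform is paired with every folder kind: 2 x 3 = 6 cases.
cases = list(product(platforms, folder_kinds))

for platform, folder in cases:
    # Fill in "found" or "not found" by hand after each body search.
    print(f"{platform:8} | {folder:28} | result: ?")
```

Recording all six results separately makes it easier to see whether a failure follows the platform or the folder type.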

There is bug 1245532 with 7 duplicates and also bug 404255. I doubt that you're seeing a new issue here.

If you see differences in point 1 between Windows and Mac, I will comment further, just as a teaser:
https://stackoverflow.com/questions/7931204/what-is-normalized-utf-8-all-about
Quote: Unicode includes multiple ways to encode some characters, most notably accented characters.
... and it is known that Mac uses a different normalisation, so if the normalisation of text entered in the search box doesn't coincide with the normalisation of the text in the body, you have a problem.
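
To illustrate the normalisation point, here is a minimal Python sketch. It assumes the search term is in composed form (NFC) and the body text is in decomposed form (NFD); whether that is what actually happens in Thunderbird on Mac is not established in this bug. The helper names are hypothetical and are not Thunderbird code.

```python
import unicodedata

word = "μύκητες"

# Two canonically equivalent encodings of the same visible word.
nfc = unicodedata.normalize("NFC", word)  # "ύ" as one precomposed code point
nfd = unicodedata.normalize("NFD", word)  # "υ" followed by a combining accent

def naive_contains(body: str, term: str) -> bool:
    """Code-point comparison with no normalisation (hypothetical helper)."""
    return term in body

def normalized_contains(body: str, term: str, form: str = "NFC") -> bool:
    """Bring both sides to the same normalisation form before comparing."""
    return unicodedata.normalize(form, term) in unicodedata.normalize(form, body)

# Assumed scenario: body stored as NFD, search term typed as NFC.
body = "spam about " + nfd

print(len(nfc), len(nfd))              # the two forms differ in length
print(naive_contains(body, nfc))       # the naive comparison misses the word
print(normalized_contains(body, nfc))  # the normalised comparison finds it
```

If a mismatch like this were the cause, normalising both the search term and the body to the same form before comparing would make the search succeed.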

<<There is bug 1245532 with 7 duplicates and also bug 404255. I doubt that you're seeing a new issue here.>>
Out of curiosity, why aren't those bugs fixed?

<<Unicode includes multiple ways to encode some characters, most notably accented characters.
... and it is known that Mac uses a different normalisation, so if the normalisation of text entered in the search box doesn't coincide with the normalisation of the text in the body, you have a problem.>>

This is characteristic of today's era of incredible techno-idiocy, techno-sloppiness and greed, and that's definitely not an excuse in 21st century, I'd be ashamed to even use this as an excuse.

Meanwhile, lately I'm getting a storm of spam on my official email (shared only with companies where I'm a customer) ... I'm creating 20 F*** filters per day! I got the first spam batch after registering domains with godaddy, and I transferred all of my domains to another company after I realized that (they didn't even answer my complaint about the leak).

How about a filter that moves to trash ALL email, except those addresses on a white list?
That would solve the spam epidemic for good.

Looks like an IMAP issue.

<<How about a filter that moves to trash ALL email, except those addresses on a white list?>>

You can do that: From doesn't contain and From doesn't contain, etc. Not much fun to manage.
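
The logic of such a whitelist filter can be sketched in a few lines of Python. This is only an illustration of the idea, not how Thunderbird filters are implemented; the function name and the example addresses are made up.

```python
from email.utils import parseaddr

def should_trash(from_header: str, allowlist: set[str]) -> bool:
    """Return True when the sender address is NOT on the whitelist.

    Hypothetical helper: mirrors a chain of "From doesn't contain"
    conditions, but compares the parsed address exactly.
    """
    # parseaddr splits 'Name <addr>' into (name, addr).
    _, address = parseaddr(from_header)
    return address.lower() not in allowlist

# Example whitelist (made-up addresses).
allowlist = {"billing@example.com", "support@example.org"}

print(should_trash("Billing <billing@example.com>", allowlist))
print(should_trash("Cheap Pills <offers@spam.example>", allowlist))
```

Keeping the addresses in one set is the part that a chain of separate "From doesn't contain" conditions makes tedious to manage.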

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE