Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (searching unparsed HTML entities &uuml; etc. in text/html), but succeeds for same msg when received (as text/plain, charset=ISO-8859-1)

Status: NEW
Assignee: Unassigned
Product: Thunderbird
Component: Search
Reported: 9 years ago
Last modified: 2 months ago
Reporter: Thomas D. (currently busy elsewhere)
Blocks: 2 bugs
Hardware: x86
OS: Windows XP
Firefox Tracking Flags: Not tracked

Attachments (2 attachments)

Created attachment 405765 [details]
Testmail1, DRAFT with umlauts in body

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.5pre) Gecko/20091010 Shredder/3.0pre

STR (test mail attached)
1) compose mail containing words with umlauts (ä,ö,ü) in body text
2) save as draft
3) in draft folder, select "Message body filter" quicksearch and
4) filter for word with German umlauts (ä,ö,ü, e.g. Münster) that is in the body
5) do the same on msg after having received it (see second attachment)

expected
4) and 5): msg body filter should find the msg

actual
4) msg body filter does not find the draft msg
5) msg body filter finds the same msg after it was received
(Reporter)

Updated

9 years ago
Summary: Quick Search "Message body filter" does not find message text with umlauts in saved drafts (Character-encoding?) → Quick Search "Message body filter" does not find message text with umlauts in saved drafts messages (Character-encoding?)
(Reporter)

Comment 1

9 years ago
Created attachment 405766 [details]
Testmail2, RECEIVED (from same msg as testmail1)

This is basically the same message as testmail1, but after receiving it in inbox.

Comment 2

6 years ago
Thomas, does (ä,ö,ü) still fail?
Summary: Quick Search "Message body filter" does not find message text with umlauts in saved drafts messages (Character-encoding?) → Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (Character-encoding?)

Comment 3

6 years ago
it's WFM with current nightly

Updated

6 years ago
Blocks: 812827

Comment 4

4 years ago
(In reply to Wayne Mery (:wsmwk) from comment #2)
> Thomas, does (ä,ö,ü) still fail?
Flags: needinfo?(bugzilla2007)
(Reporter)

Comment 5

4 years ago
Yes, this still fails, both TB24 and Trunk (32.0a1 (2014-05-01))

This obviously depends on how the umlauts are saved in the draft, e.g. the word Münster:

In TB 24, composing new msg:

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    M&uuml;nster<br>

Quick filtering for ü fails, regardless of containing folder (after copying the draft into other folders).
Quick filtering for &uuml; (sic) succeeds.
Fwiw, that's on a German version of TB 24 sharing a profile with an English version of TB 24.
Not sure whether that can cause confusion in language settings.
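
The TB 24 result above is what one would expect if the query is compared against the stored HTML source without decoding character references first. A minimal Python sketch of that difference follows. It is not Thunderbird's code; `naive_body_match` and `decoded_body_match` are hypothetical names used only for illustration.

```python
import html

# Body line as stored in the TB 24 draft (taken from the source shown above).
raw_body = "M&uuml;nster<br>"

def naive_body_match(source: str, term: str) -> bool:
    """Search the stored source as-is, leaving entities unparsed."""
    return term.lower() in source.lower()

def decoded_body_match(source: str, term: str) -> bool:
    """Decode HTML character references before searching."""
    return term.lower() in html.unescape(source).lower()

print(naive_body_match(raw_body, "ü"))          # False
print(naive_body_match(raw_body, "&uuml;"))     # True
print(decoded_body_match(raw_body, "Münster"))  # True
```

The first two results mirror the observed behaviour (ü fails, &uuml; succeeds); the third shows what a search on decoded text would return.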

In Trunk, composing new msg:

<meta http-equiv="content-type" content="text/html; charset=utf-8">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Münster

Or at least that's what Ctrl+U msg source viewer shows, which is probably a bug in the source viewer.
If saved as .eml and then opened in Notepad++, the file shows as correctly UTF-8 encoded and contains the word "Münster". But search still fails, see below.
That's on English Daily, profile should be reasonably clean.

Quick filtering for ü fails, regardless of folder (after copying the draft around)
Quick filtering for ü fails.
Quick filtering for &uuml; succeeds when they are in source (not applicable here).
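
One possible explanation for the Trunk case, which nothing in this comment confirms, is that the search term and the stored message bytes are compared in different encodings. The Python sketch below only illustrates that assumption; the function names are hypothetical and it is not taken from Thunderbird's search code.

```python
# UTF-8 bytes of the body word, as they would sit in the saved .eml file.
raw_bytes = "Münster".encode("utf-8")  # b'M\xc3\xbcnster'

def raw_byte_match(data: bytes, term: str, term_charset: str) -> bool:
    """Compare the encoded search term against undecoded message bytes."""
    return term.encode(term_charset) in data

def decoded_match(data: bytes, term: str, body_charset: str) -> bool:
    """Decode the body with its declared charset, then compare as text."""
    return term in data.decode(body_charset)

# ü is the single byte 0xFC in ISO-8859-1 but the pair 0xC3 0xBC in UTF-8.
print(raw_byte_match(raw_bytes, "ü", "iso-8859-1"))  # False
print(raw_byte_match(raw_bytes, "ü", "utf-8"))       # True
print(decoded_match(raw_bytes, "ü", "utf-8"))        # True
```

Plain ASCII terms would match in every variant, since ASCII bytes are identical in both charsets.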
Flags: needinfo?(bugzilla2007)
(Reporter)

Comment 6

4 years ago
For later duping
See Also: → bug 344130

Updated

3 years ago
See Also: → bug 1042681

Comment 7

7 months ago
As bug 1427124 shows, this doesn't have anything to do with drafts, but with messages which have a plaintext and an HTML part at the same time, like all drafts.
See Also: → bug 1427124
Summary: Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (Character-encoding?) → Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (since they are multipart)
(Reporter)

Comment 8

7 months ago
(In reply to Jorg K (GMT+1) from comment #7)
> As bug 1427124 shows, this doesn't have anything to do with drafts, but with
> messages which have plaintext and HTML part at the same time, like all
> drafts.

???

I described correctly what I saw at the time, and the evidence is still attached. This had everything to do with drafts at the time of reporting, because the same msg failed when searching the saved draft, but succeeded when searching the received message. And I don't see any multipart in either test message, both have only one part, and both are MIME messages.
In a way this is another variation/symptom of the downgrading HTML to plain text saga, only this time plain text won for successful quick filtering, and HTML failed at the time of reporting.

My Comment 5 (almost 5 years later, so might not exactly apply to test cases from 8 years ago) correctly points to the most likely cause of this at the time, which is encoding (so I don't see why you removed that from the summary):
Draft = HTML -> &uuml; in source -> search for "ü" fails, but search for "&uuml;" succeeds -> searching raw text/HTML
Received = plaintext -> some other encoding (charset=ISO-8859-1) -> search succeeds for that text/plain encoding/charset.

That's essentially the same as what you're saying in bug 1427124, comment 5:
> Most likely the search is done on the raw UTF-8, so only ASCII text is found.

I don't see the link of that with multipart messages, can you enlighten me?
Pls don't just make me look wrong without reading the bug and testcases.
Summary: Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (since they are multipart) → Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (searching unparsed entities in text/html), but succeeds for same msg when received (as text/plain, charset=ISO-8859-1)
(Reporter)

Comment 9

7 months ago
So from here we need to revisit the testcases and see what we're doing today under the same circumstances.

Updated

7 months ago
Attachment #405765 - Attachment mime type: message/rfc822 → text/plain

Updated

7 months ago
Attachment #405766 - Attachment mime type: message/rfc822 → text/plain

Comment 10

7 months ago
Wow, you're right, I didn't look at the test cases from back then. And drafts aren't even multipart :-(
So I was all wrong. That said, I have no idea how you managed to get
  M&uuml;nster ist eine der sch&ouml;nsten St&auml;dte der Welt.
into the draft. But yes, that wouldn't be found.

Sorry about the confusion and my mistake.

The basic problem is another facet of bug 1259534: We search some raw data instead of converting it into un-escaped and decoded text first.
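
The "decode first, then search" idea can be sketched with Python's standard `email` package. This is an illustration only, not how Thunderbird's search is or would be implemented. The function name `searchable_body` and both sample messages are made up for this example: the sentence mirrors the test case, but the headers and the quoted-printable encoding of the received message are assumptions.

```python
import html
import re
from email import message_from_bytes, policy

def searchable_body(raw_message: bytes) -> str:
    """Return the body as decoded, un-escaped text suitable for searching."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    text = part.get_content()  # undoes transfer encoding and charset
    if part.get_content_type() == "text/html":
        text = re.sub(r"<[^>]+>", " ", text)  # crude tag stripping, sketch only
        text = html.unescape(text)            # &uuml; -> ü
    return text

# Hypothetical stand-in for the draft: text/html with character references.
draft = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=ISO-8859-1\r\n"
    b"Content-Transfer-Encoding: 7bit\r\n"
    b"\r\n"
    b"<html><body>M&uuml;nster ist eine der sch&ouml;nsten "
    b"St&auml;dte der Welt.<br></body></html>\r\n"
)

# Hypothetical stand-in for the received copy: text/plain, ISO-8859-1.
received = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=ISO-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"M=FCnster ist eine der sch=F6nsten St=E4dte der Welt.\r\n"
)

print("Münster" in searchable_body(draft))     # True
print("Münster" in searchable_body(received))  # True
print("Münster".encode("iso-8859-1") in draft) # False: raw bytes hold &uuml;
```

Entities are un-escaped only after the tags are stripped, so an escaped `&lt;` in the text is not mistaken for markup.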

Updated

7 months ago
Blocks: 519202
(Reporter)

Comment 11

7 months ago
(In reply to Jorg K (GMT+1) from comment #10)
> Wow, you're right, I didn't look at the test cases from back then. And
> drafts aren't even multipart :-(
> So I was all wrong. That said, I have no idea how you managed to get
>   M&uuml;nster ist eine der sch&ouml;nsten St&auml;dte der Welt.
> into the draft.

Wasn't me, it was Thunderbird (at the time, long back, but I was already there...)

> But yes, that wouldn't be found.

Even today, in 2017. Just tested. And then, it's not all that hard to get &uuml; in source when importing .eml messages not created by TB...

> Sorry about the confusion and my mistake.

No problem, thanks.

> The basic problem is another facet of bug 1259534: We search some raw data
> instead of converting it into un-escaped and decoded text first.

Yes. That's an ugly bug that should be terminated. I know it's a multipart (pun intended) hydra, but cutting off a head here and there might one day kill the beast. Alternatively, blast the whole thing away and start reassembling phoenix from the ashes... Ah well, just dreaming... :|
(Reporter)

Updated

2 months ago
Summary: Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (searching unparsed entities in text/html), but succeeds for same msg when received (as text/plain, charset=ISO-8859-1) → Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (searching unparsed HTML entities &uuml; etc. in text/html), but succeeds for same msg when received (as text/plain, charset=ISO-8859-1)