521649 - Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (searching unparsed HTML entities &uuml; etc. in text/html), but succeeds for same msg when received (as text/plain, charset=ISO-8859-1)

Reporter

Description

•

15 years ago

Attached file Testmail1, DRAFT with umlauts in body — Details

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.5pre) Gecko/20091010 Shredder/3.0pre

STR (test mail attached)
1) compose mail containing words with umlauts (ä,ö,ü) in body text
2) save as draft
3) in draft folder, select "Message body filter" quicksearch and
4) filter for word with German umlauts (ä,ö,ü, e.g. Münster) that is in the body
5) do the same on msg after having received it (see second attachment)

expected
4) and 5): msg body filter should find the msg

actual
4) msg body filter does not find the draft msg
5) msg body filter finds the same msg after it was received

Thomas D. (:thomas8)

Reporter

Updated

•

15 years ago

Summary: Quick Search "Message body filter" does not find message text with umlauts in saved drafts (Character-encoding?) → Quick Search "Message body filter" does not find message text with umlauts in saved drafts messages (Character-encoding?)

Thomas D. (:thomas8)

Reporter

Comment 1

•

15 years ago

Attached file Testmail2, RECEIVED (from same msg as testmail1) — Details

This is basically the same message as testmail1, but after receiving it in inbox.

Wayne Mery (:wsmwk)

Comment 2

•

12 years ago

Thomas, does (ä,ö,ü) still fail?

Summary: Quick Search "Message body filter" does not find message text with umlauts in saved drafts messages (Character-encoding?) → Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (Character-encoding?)

Wayne Mery (:wsmwk)

Comment 3

•

12 years ago

it's WFM with current nightly

David Lechner (:dlech)

Updated

•

12 years ago

Blocks: tb-drafts

Wayne Mery (:wsmwk)

Comment 4

•

10 years ago

(In reply to Wayne Mery (:wsmwk) from comment #2)
> Thomas, does (ä,ö,ü) still fail?

Flags: needinfo?(bugzilla2007)

Thomas D. (:thomas8)

Reporter

Comment 5

•

10 years ago

Yes, this still fails, both TB24 and Trunk (32.0a1 (2014-05-01))

This obviously depends on how the umlauts are saved in the draft, e.g. the word Münster:

In TB 24, composing new msg:

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    M&uuml;nster<br>

Quick filtering for ü fails, regardless of containing folder (after copying the draft into other folders).
Quick filtering for &uuml; (sic) succeeds.
Fwiw, that's on a German Version of TB 24 sharing Profile with English Version of TB 24.
Not sure if that can cause confusion in language settings?

In Trunk, composing new msg:

<meta http-equiv="content-type" content="text/html; charset=utf-8">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    MÃ¼nster

Or at least that's what Ctrl+U msg source viewer shows, which is probably a bug in the source viewer.
If saved as .eml and then opened with the Notepad++ text editor, it is shown correctly as UTF-8, with the word "Münster". But search still fails, see below.
That's on English Daily, profile should be reasonably clean.
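The "MÃ¼nster" rendering above is consistent with the UTF-8 bytes of "Münster" being displayed as if they were ISO-8859-1. A minimal Python sketch of that effect (illustration only, not Thunderbird code; the variable names are made up for this example):

```python
# Illustration only: how correctly stored UTF-8 can be displayed as "MÃ¼nster".
word = "Münster"

# In UTF-8, ü is stored as the two bytes 0xC3 0xBC.
utf8_bytes = word.encode("utf-8")

# Reading those bytes as ISO-8859-1 turns each byte into its own character.
misread = utf8_bytes.decode("iso-8859-1")

print(misread)                     # what a viewer using the wrong charset would show
print(utf8_bytes.decode("utf-8"))  # what a viewer using the right charset would show
```

If that is what happens here, the saved file itself is fine and only the display is wrong, which matches what the saved .eml shows in an external editor.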

Quick filtering for ü fails, regardless of folder (after copying the draft around)
Quick filtering for Ã¼ fails.
Quick filtering for &uuml; succeeds when they are in source (not applicable here).
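The TB 24 results above are what one would expect if the filter does a substring match on the raw HTML source. A minimal Python sketch of that behaviour, assuming a plain case-insensitive substring search (the helper names are hypothetical and this is not the actual Thunderbird search code):

```python
import html

# Body line as stored in the TB 24 draft source (taken from this comment).
raw_body = "M&uuml;nster<br>"

def matches_raw(term: str, body: str) -> bool:
    """Substring search on the stored source, entities left as-is."""
    return term.lower() in body.lower()

def matches_decoded(term: str, body: str) -> bool:
    """Substring search after HTML entities are turned into characters."""
    return term.lower() in html.unescape(body).lower()

print(matches_raw("ü", raw_body))        # searching the raw source for the umlaut
print(matches_raw("&uuml;", raw_body))   # searching the raw source for the entity
print(matches_decoded("ü", raw_body))    # searching the unescaped text
```

Under this assumption, the search for "ü" only matches once the entity has been unescaped.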

Flags: needinfo?(bugzilla2007)

Thomas D. (:thomas8)

Reporter

Comment 6

•

10 years ago

For later duping

Updated

•

9 years ago

Comment 7

•

6 years ago

As bug 1427124 shows, this doesn't have anything to do with drafts, but with messages which have plaintext and HTML part at the same time, like all drafts.

Comment 8

•

6 years ago

(In reply to Jorg K (GMT+1) from comment #7)
> As bug 1427124 shows, this doesn't have anything to do with drafts, but with
> messages which have plaintext and HTML part at the same time, like all
> drafts.

???

I described correctly what I saw at the time, and the evidence is still attached. This had everything to do with drafts at the time of reporting, because the same msg failed when searching the saved draft, but succeeded when searching the received message. And I don't see any multipart in either test message, both have only one part, and both are MIME messages.
In a way this is another variation/symptom of the downgrading HTML to plain text saga, only this time plain text won for successful quick filtering, and HTML failed at the time of reporting.

My Comment 5 (almost 5 years later, so might not exactly apply to test cases from 8 years ago) correctly points to the most likely cause of this at the time, which is encoding (so I don't see why you removed that from the summary):
Draft = HTML -> &uuml; in source -> search for "ü" fails, but search for "&uuml;" succeeds -> searching raw text/HTML
Received = plaintext -> some other encoding (charset=ISO-8859-1) -> search succeeds for that text/plain encoding/charset.

That's essentially the same as what you're saying in bug 1427124, comment 5:
> Most likely the search is done on the raw UTF-8, so only ASCII text is found.

I don't see the link of that with multipart messages, can you enlighten me?
Pls don't just make me look wrong without reading the bug and testcases.

Summary: Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (since they are multipart) → Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (searching unparsed entities in text/html), but succeeds for same msg when received (as text/plain, charset=ISO-8859-1)

Thomas D. (:thomas8)

Reporter

Comment 9

•

6 years ago

So from here we need to revisit the testcases and see what we're doing today under the same circumstances.

Jorg K (CEST = GMT+2)

Updated

•

6 years ago

Attachment #405765 - Attachment mime type: message/rfc822 → text/plain

Jorg K (CEST = GMT+2)

Updated

•

6 years ago

Attachment #405766 - Attachment mime type: message/rfc822 → text/plain

Jorg K (CEST = GMT+2)

Comment 10

•

6 years ago

Wow, you're right, I didn't look at the test cases from back then. And drafts aren't even multipart :-(
So I was all wrong. That said, I have no idea how you managed to get
  M&uuml;nster ist eine der sch&ouml;nsten St&auml;dte der Welt.
into the draft. But yes, that wouldn't be found.

Sorry about the confusion and my mistake.

The basic problem is another facet of bug 1259534: We search some raw data instead of converting it into un-escaped and decoded text first.
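A minimal Python sketch of "decode and un-escape first, then search", using the standard `email` and `html` modules. This only illustrates the idea and is not the Thunderbird implementation; `searchable_text` and `body_matches` are hypothetical names, and the tag stripping is deliberately naive:

```python
import html
import re
from email import message_from_bytes, policy

def searchable_text(raw_message: bytes) -> str:
    """Return the message body as decoded, un-escaped text."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    # get_content() undoes the transfer encoding and applies the declared charset.
    text = part.get_content()
    if part.get_content_subtype() == "html":
        text = re.sub(r"<[^>]+>", " ", text)  # naive tag removal, enough for a sketch
        text = html.unescape(text)            # &uuml; becomes ü
    return text

def body_matches(term: str, raw_message: bytes) -> bool:
    """Case-insensitive substring search on the decoded body text."""
    return term.lower() in searchable_text(raw_message).lower()
```

With this approach a text/html draft containing M&uuml;nster and a text/plain, charset=ISO-8859-1 copy of the same message would both match a search for "ü".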

Jorg K (CEST = GMT+2)

Updated

•

6 years ago

Blocks: qfasfailtracker

Thomas D. (:thomas8)

Reporter

Comment 11

•

6 years ago

(In reply to Jorg K (GMT+1) from comment #10)
> Wow, you're right, I didn't look at the test cases from back then. And
> drafts aren't even multipart :-(
> So I was all wrong. That said, I have no idea how you managed to get
>   M&uuml;nster ist eine der sch&ouml;nsten St&auml;dte der Welt.
> into the draft.

Wasn't me, it was Thunderbird (at the time, long back, but I was already there...)

> But yes, that wouldn't be found.

Even today, in 2017. Just tested. And then, it's not all that hard to get &uuml; in source when importing .eml messages not created by TB...

> Sorry about the confusion and my mistake.

No problem, thanks.

> The basic problem is another facet of bug 1259534: We search some raw data
> instead of converting it into un-escaped and decoded text first.

Yes. That's an ugly bug that should be terminated. I know it's a multipart (pun intended) hydra, but cutting off a head here and there might one day kill the beast. Alternatively, blast the whole thing away and start reassembling phoenix from the ashes... Ah well, just dreaming... :|

Thomas D. (:thomas8)

Reporter

Updated

•

6 years ago

Summary: Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (searching unparsed entities in text/html), but succeeds for same msg when received (as text/plain, charset=ISO-8859-1) → Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (searching unparsed HTML entities &uuml; etc. in text/html), but succeeds for same msg when received (as text/plain, charset=ISO-8859-1)

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

georg.weickelt

Comment 12

•

6 months ago

Bug 1855637 looks very similar to this

Wayne Mery (:wsmwk)

Updated

•

6 months ago

Duplicate of this bug: 1855637

Testmail1, DRAFT with umlauts in body 15 years ago Thomas D. (:thomas8) 1.14 KB, text/plain		Details
Testmail2, RECEIVED (from same msg as testmail1) 15 years ago Thomas D. (:thomas8) 973 bytes, text/plain		Details
