Investigate whether writing a meta tag in mimethsa.cpp is needed and whether writing lang=x-unicode in mimethtm.cpp makes any sense

RESOLVED FIXED in Thunderbird 62.0

Status

enhancement
RESOLVED FIXED
Last year
Last year

People

(Reporter: jorgk, Assigned: jorgk)

Tracking

Trunk
Thunderbird 62.0

Thunderbird Tracking Flags

(thunderbird_esr52 fixed, thunderbird_esr60 fixed, thunderbird60 fixed, thunderbird61 wontfix, thunderbird62 fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

Investigate whether writing a meta tag in mimethsa.cpp is needed and whether writing lang=x-unicode in mimethtm.cpp makes any sense.

Working of bug 1419417 we noticed a few strange things in mimethsa.cpp and mimethtm.cpp.

mimethsa.cpp originally wrote a Context-Type which was a no-op an corrected in bug 1462895. Maybe we can drop the meta tag for Content-Type.

mimethtm.cpp still writes 
<div class="moz-text-html" lang="x-western"></div> or
<div class="moz-text-html" lang="x-unicode"></div>
which appears wrong.
Please be careful. If you look at Firefox Preferences Font dialog, you see that it mixes languagues with "Unicode". If you look in browser/components/preferences/fonts.xul, it actually uses "x-unicode" for what's now called "Other languages" in the UI, and "x-western" also exists and is called "Latin" in the Firefox Font UI. So that code is actually perfectly correct for Gecko. And that's the point here, to pick the right font family for display. So, please don't be presumptuous in judging and esp. modifying that code. Like a grandfather, libmime is old code, and it's not perfect, but it does work. Be careful before dismissing it, you might really break stuff.
As far as I remember, this maps to fonts. And this is indeed used in the Firefox Font UI exactly like that. This was a bug fix. It's specifically made to let Gecko pick certain fonts that can be selected in the UI.

FYI, even if it's not strictly correct by the standard, this was written to work, and it does work. Or at least, it did work at the time of writing, and we have no contrary reports.
Sorry Ben, you're confused here. I'm quite aware of encodings and writing systems and how Gecko tries to derive one from the other. Further reading in bug 1228193 comment #10.

Unicode (UTF-8) and Western (windows-1252) in the |View > Text Encoding| menu are encodings. "x-unicode" (Other Languages) and "x-western" (Latin) are writing systems. Language is something different, but language maps to writing system.
https://dxr.mozilla.org/mozilla-central/source/intl/locale/langGroups.properties

Stuffing a writing system into the lang attribute seems 100% wrong, or can you find the code that would prove the opposite?

I removed the Content-Type and the lang attribute and things still seem to be working.
Like I wrote in the other bug the Context-Type typo is made that code never do anything, so it should be safe to kill it off.
I agree. Please refer to bug 1429491 comment #24 there I suggest the removal of the meta tag *and* the class=moz-text-html with lang="writing system". Sorry about the double-up, I just remembered the other bug when doing the MIME stuff today.
OK, let's see what breaks:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=eaadf5981024f27861934b1d66a761385880fdda
Assignee: nobody → jorgk
Status: NEW → ASSIGNED
That didn't work since the parse/serialise adds a meta tag internally, so that can't be removed from the tests. New try leaving the tests as they are:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=11ab3273df361ce0c527c4357765e29c8a4ba5d6
Attachment #8979362 - Attachment is obsolete: true
Attachment #8979375 - Flags: review?(mkmelin+mozilla)
Comment on attachment 8979375 [details] [diff] [review]
1429491-remove-MIME-junk.patch (v2)

Review of attachment 8979375 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM though I don't think we should uplift the mimethtm.cpp hunk to 52. r=mkmelin
Attachment #8979375 - Flags: review?(mkmelin+mozilla) → review+
I don't plan an uplift of any of this to TB 52.

Here some testing. I prepared a message like so (source of draft):
Content-Type: text/html; charset=ISO-2022-JP
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html;
      charset=ISO-2022-JP">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p><tt>$B$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#(B</tt></p>
    <p><b><tt>This is Jap.</tt></b><br>
    </p>
  </body>
</html>

Without the patch, the HTML when viewed is:
<html><head>
<title>Jap ISO-2022-JP</title>
<link rel="important stylesheet" href="chrome://messagebody/skin/messageBody.css">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-text-html" lang="ja">
<meta http-equiv="content-type" content="text/html; charset=ISO-2022-JP">
</div>
    <meta http-equiv="Content-Type" content="text/html; ">
    <p><tt>これは長い日本語のテキストですので、行が折れると思います。</tt></p>
    <p><b><tt>This is Jap.</tt></b><br>
    </p>
</body></html>

Note lang="ja" and charset=ISO-2022-JP".

With the patch we have this:
<html><head>
<title>Jap ISO-2022-JP</title>
<link rel="important stylesheet" href="chrome://messagebody/skin/messageBody.css">
</head>
<body text="#000000" bgcolor="#FFFFFF">
    <meta http-equiv="Content-Type" content="text/html; ">
    <p><tt>これは長い日本語のテキストですので、行が折れると思います。</tt></p>
    <p><b><tt>This is Jap.</tt></b><br>
    </p>
</body></html>

So we lost the 
<div class="moz-text-html" lang="ja">
<meta http-equiv="content-type" content="text/html; charset=ISO-2022-JP">
</div>

I think that's cool. So I'm going to land this.
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/18881dd127e3
Remove unnecessary MIME preludes. r=mkmelin
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 62.0
Attachment #8979375 - Flags: approval-comm-esr60+
Attachment #8979375 - Flags: approval-comm-beta+
> "x-unicode" (Other Languages) and "x-western" (Latin) are writing systems.

So, why do these exact strings appear in the Firefox font preferences? Of each of these map to a separate font selection.

> Stuffing a writing system into the lang attribute seems 100% wrong, or can you find the code that would prove the opposite?

Yes, the very code that you remove. It fixed a bug, and it worked. If you remove it, you regress bugs fixed decades ago.
followup in bug 1464391
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/c4f9db9e61ee
Partly backed out changeset 18881dd127e3 to restore <div class="moz-text-html" lang="...">. a=jorgk DONTBUILD
TB 60 beta 7 (BETA_60_CONTINUATION branch):
https://hg.mozilla.org/releases/comm-beta/rev/0d2a27eff60a0fb5a396cde53e4a41f2a8aa52fd
Partly backed out changeset b39d98d20e6d to restore <div class="moz-text-html" lang="...">. a=jorgk DONTBUILD
Component: General → MIME
You need to log in before you can comment on or make changes to this bug.