1713786 - Fix "Repair Text Encoding" menu item not doing anything

Assignee

Description

3 years ago

After bug 1704749 the "Repair Text Encoding" menu item will not do anything, since assigning the magic auto charset to the message window charset throws (this was already the case with the "Automatic" charset item before).

We can fix the display auto detection by skipping the message window parts. However, for optimal operation we should actually determine the detected charset and use that for all the operations. That way quoting the fixed message also uses the updated charset.

José M. Muñoz

Comment 1

3 years ago

this was already the case with the "Automatic" charset item before

... and the "Japanese" entry already in TB 78, see bug 1677104.

José M. Muñoz

Comment 2

3 years ago

Attached patch 1713786.patch - Solution idea (obsolete)

This works on a couple of plain text messages I tried. Maybe this is the way to go. It will need refinement for multipart messages depending on whether the plaintext or HTML part is displayed, see pref mailnews.display.html_as.

José M. Muñoz

Updated

3 years ago

Attachment #9224827 - Attachment is patch: true

Wayne Mery (:wsmwk)

Updated

3 years ago

Version: unspecified → Thunderbird 91

Richard Marti (:Paenglab)

Comment 3

3 years ago

Comment on attachment 9224827
1713786.patch - Solution idea

I'm adding Magnus for review to get action on the bug.

Attachment #9224827 - Flags: review?(mkmelin+mozilla)

Magnus Melin [:mkmelin]

Comment 4

3 years ago

Comment on attachment 9224827
1713786.patch - Solution idea

I'll let Martin check this.

Attachment #9224827 - Flags: review?(mkmelin+mozilla) → review?(martin)

José M. Muñoz

Comment 5

3 years ago

Comment on attachment 9224827
1713786.patch - Solution idea

This is not really ready for review, see comment #2. The patch also contains copied code which should be moved to a common location. Meanwhile I noticed this add-on:
https://addons.thunderbird.net/en-US/thunderbird/addon/charset-menu/ which reinstates the charset menu for those who want it.

Attachment #9224827 - Flags: review?(martin)

José M. Muñoz

Comment 6

3 years ago

It's possible (haven't tried yet) that the solution doesn't work any more after
https://hg.mozilla.org/comm-central/rev/9473904d2047fc7d42476625eb8f4cf8d5a6f11f#l12.408
See my question in bug 1713145 comment #15.

John Bieling (:TbSync)

Comment 7

3 years ago

The patch in bug 1713145 does not alter any core behavior. It fixes the newly introduced, jsmime-based MIME parser MimeParser.extractMimeMsg() to align better with Thunderbird's core libmime-based MIME parser. It is currently only used in the messages WebExtension API for NNTP messages.

José M. Muñoz

Comment 8

3 years ago

Attached patch 1713786.patch - Solution idea (refreshed) (obsolete)

The patch in bug 1713145 does not alter any core behavior.

Well, mailnews/mime/src/mimeParser.jsm appears to be a Thunderbird core module, and yes, its behaviour has changed to the point where the patch here doesn't work any more. I've fixed it now by explicitly requesting strformat: "binarystring".

Refer to comment #2 for more work to be done here.

Attachment #9224827 - Attachment is obsolete: true

John Bieling (:TbSync)

Comment 9

3 years ago

I was not aware (and did not check) that you actually use my emitter, since I added it just 2 weeks ago. With "does not alter any core behavior" I meant no core component is using it and therefore nothing could have changed.

You are only looking at the first part's body, so you do not need the full emitter; maybe it is better to use extractHeadersAndBody or to add a simple emitter for your use case. I am not a fan of making the strformat of my emitter configurable.

José M. Muñoz

Comment 10

3 years ago

Thanks for the further comment, extractHeadersAndBody() indeed sounds more suitable. I don't intend to fix this bug here. All I've done was to present a possible solution, which is basically:

   // get the relevant body part somehow, depends on what parts are present
   // and which part is displayed (pref mailnews.display.html_as)
+  let body = ...
+  let compUtils = Cc[
+    "@mozilla.org/messengercompose/computils;1"
+  ].createInstance(Ci.nsIMsgCompUtils);
+  let charset = compUtils.detectCharset(body);
+
+  msgWindow.mailCharacterSet = charset;

I hope someone will implement this correctly or come up with a different solution.

José M. Muñoz

Updated

3 years ago

tracking-thunderbird91: --- → ?

max

Comment 11

3 years ago

I realize that this particular bug is not the proper place for such complaints, but it seems I am too late and the more relevant bugs are already closed.

Is the charset menu completely removed from Thunderbird? In the pre-Unicode age, around the year 2000, legacy 8-bit encodings were widely used. There are several charsets per language, e.g. koi8-r and windows-1251 for Russian (and cp866, iso8859-5, and something rare for Macs). Mail from Windows users often did not have an explicit charset specified. Even on Unix/Linux it depended on whether localization was achieved with some hacks. I regularly received messages with some "assumed" encoding or even with a wrongly specified encoding. E.g. in pine it was necessary to set up special filters to read such messages. Tools intended to guess encoding were unreliable.

That is why I consider the charset menu important. It can reside deeper in some submenu, but it should be available for the case when it is necessary to read an old message. This is not common nowadays, so telemetry could easily show that it is rarely used, especially for Firefox, since many old sites are already dead. From a mail user agent I expect a more conservative feature set. Please consider restoring the ability to manually choose the message encoding.

Sorry if I missed something and this comment is a false alarm.

José M. Muñoz

Comment 12

3 years ago

Is the charset menu completely removed from Thunderbird?

Yes, in bug 1704749, currently only visible in TB Daily. Did you see comment #5 mentioning https://addons.thunderbird.net/en-US/thunderbird/addon/charset-menu/?

IMHO the removal of features without hard facts to justify such removal is questionable (see discussion at the end of bug 1704749). Replacing the charset menu (mostly working, see bug 1677104) with a 100%-currently-not-working automatic feature is rather unfortunate. Let's hope it gets fixed before TB 91 goes to beta and later becomes the new TB 91 ESR. Currently fixing a wrong charset is completely broken in TB Daily for both message display (this bug here) and message source display (bug 1716059).

José M. Muñoz

Comment 13

3 years ago

There is another good reason to restore the charset menu as discovered in bug 1716059. The "repair" is based on automatic detection of the charset and that only works if you feed it 8bit content. For messages with CTE quoted-printable there is now no way to correct the charset.

José M. Muñoz

Comment 14

3 years ago

The patch here won't work if the message has CTE quoted-printable. In this case the retrieved body is ASCII and nothing can be detected. So at least QP needs to be decoded before passing the body to charset detection.

José M. Muñoz

Comment 15

3 years ago

Apologies! Just ignore the last two comments. The patch in attachment 9226588 does work, even for QP messages. Looks like extractMimeMsg() already returns the QP-decoded result. So it's just a matter of addressing comment #2 and comment #9/comment #10.
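
To illustrate why the transfer encoding matters for detection, here is a minimal sketch of quoted-printable decoding in plain JavaScript. The helper name decodeQuotedPrintable is hypothetical and is not part of Thunderbird or of any patch on this bug; the sketch only shows that the encoded body is pure ASCII, while the decoded bytes are 8-bit data that a detector can work with.

```javascript
// Hypothetical helper, not Thunderbird code: decode a quoted-printable
// string into raw bytes so that a charset detector sees 8-bit data.
function decodeQuotedPrintable(qp) {
  const bytes = [];
  // Soft line breaks ("=" at the end of a line) are removed first.
  const text = qp.replace(/=\r?\n/g, "");
  for (let i = 0; i < text.length; i++) {
    const hex = text.slice(i + 1, i + 3);
    if (text[i] === "=" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      // "=XX" stands for the single byte 0xXX.
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      bytes.push(text.charCodeAt(i) & 0xff);
    }
  }
  return Uint8Array.from(bytes);
}

// The windows-1251 bytes of a short Russian word, QP-encoded.
const encoded = "=CF=F0=E8=E2=E5=F2";
const decoded = decodeQuotedPrintable(encoded);
const isAscii = (s) => /^[\x00-\x7f]*$/.test(s);
console.log(isAscii(encoded)); // true: nothing for a detector to look at
console.log(Array.from(decoded, (b) => b.toString(16)).join(" "));
```

Only the decoded byte array contains values above 0x7F, which is the input a byte-based detector needs.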

Henri Sivonen (:hsivonen)

Comment 16

3 years ago

(In reply to max from comment #11)

Tools intended to guess encoding were unreliable.

While chardetng isn't reliable for some Latin-script languages, notably Hungarian and Lithuanian, for email-length Russian inputs, it should be very, very accurate, though of course someone will find some counterexamples.

max

Comment 17

3 years ago

José, thank you for the details. I have not yet tested the quality of the automatic detection (I hope I can do so using the experiments API for WebExtensions). In the meantime, a couple of additional notes:

Bug #1687635 comment #54 mentions https://support.mozilla.org/en-US/kb/text-encoding-no-longer-available-firefox-menu It seems that in Firefox the encodings list was removed from the hamburger menu only and is still available from the (hidden by default) menu bar. I have not tried it, however.
Besides a body, a message may have attachments. By default, text attachments are not shown, but the configuration may be adjusted to display them. The encoding of some attachments may differ from the encoding of the body. By choosing an explicit encoding, it was possible to read a particular part even though all others became unreadable.

Henri Sivonen (:hsivonen)

Comment 18

3 years ago

(In reply to max from comment #17)

José, thank you for the details. I have not yet tested the quality of the automatic detection (I hope I can do so using the experiments API for WebExtensions). In the meantime, a couple of additional notes:

Bug #1687635 comment #54 mentions https://support.mozilla.org/en-US/kb/text-encoding-no-longer-available-firefox-menu It seems that in Firefox the encodings list was removed from the hamburger menu only and is still available from the (hidden by default) menu bar. I have not tried it, however.

The list was removed from the hamburger menu in 89. In the two remaining entry points, the list was replaced with the single action in 91.

Besides a body, a message may have attachments. By default, text attachments are not shown, but the configuration may be adjusted to display them. The encoding of some attachments may differ from the encoding of the body. By choosing an explicit encoding, it was possible to read a particular part even though all others became unreadable.

When the detector is invoked, it would make sense to run it on a per-message-part basis.

José M. Muñoz

Comment 19

3 years ago

Besides a body, a message may have attachments. ... By choosing an explicit encoding, it was possible to read a particular part ...

Yes, being able to force a particular charset was useful, but somehow more and more niche features are removed from Thunderbird. You can now use the add-on I mentioned to do it. It has the added geek bonus that the charset name is mentioned explicitly, so you don't have to search for "Greek ISO" if you're interested, for example. Working automatic detection would be good for people who don't want to use "trial and error".

When the detector is invoked, it would make sense to run it on a per-message-part basis.

Good idea, but that will make the implementation harder. Besides, the entire message is one DOM document with many (attachment) parts strung together, so there can only be one charset. How would you select an inline attachment to use that part for the detection? I'd say all this is too geeky, you can save the attachment and view it. It's important that you can fix the message body if it's not properly displayed.

Henri Sivonen (:hsivonen)

Comment 20

3 years ago

(In reply to José M. Muñoz from comment #19)

Besides, the entire message is one DOM document with many (attachment) parts strung together, so there can only be one charset.

Yes, mailnews converts to UTF-8 before handing the data to m-c code.

How would you select an inline attachment to use that part for the detection?

The detection needs to happen, as it, I believe, already does for message parts without charset, in the MIME engine after decoding QP or Base64 but before converting the parts to UTF-8 and joining them together.
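
The ordering described here can be sketched as follows. This is not the libmime implementation: renderParts, decodeTransferEncoding, the part object shape { cte, charset, data } and the detect callback are all hypothetical names introduced for illustration. The sketch uses the Node.js Buffer for base64 decoding and leaves out quoted-printable for brevity.

```javascript
// Sketch only: per-part charset handling in the order described above.
// The part shape { cte, charset, data } and the detect() callback are
// assumptions made for this example, not Thunderbird APIs.
function decodeTransferEncoding(part) {
  if (part.cte === "base64") {
    return Buffer.from(part.data, "base64");
  }
  // Anything else is treated as a binary string ("8bit"/"binary"),
  // one character per byte.
  return Buffer.from(part.data, "latin1");
}

function renderParts(parts, detect) {
  return parts
    .map((part) => {
      // 1. Undo the content transfer encoding.
      const bytes = decodeTransferEncoding(part);
      // 2. Use the declared charset, or ask the detector for this part only.
      const charset = part.charset || detect(bytes);
      // 3. Convert this part to Unicode on its own.
      return new TextDecoder(charset).decode(bytes);
    })
    // 4. Only now join the parts into one document.
    .join("\n");
}

const parts = [
  {
    cte: "8bit",
    charset: "utf-8",
    data: Buffer.from("héllo", "utf8").toString("latin1"),
  },
  {
    cte: "base64",
    charset: null, // unlabeled part: detection is needed
    data: Buffer.from([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]).toString("base64"),
  },
];

// Stand-in detector that always answers windows-1251.
const out = renderParts(parts, () => "windows-1251");
console.log(out);
```

Because each part is decoded with its own charset before joining, a wrong guess for one part does not make the others unreadable.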

max

Comment 21

3 years ago

(In reply to José M. Muñoz from comment #19)

Yes, being able to force a particular charset was useful, but somehow more and more niche features are removed from Thunderbird. You can now use the add-on I mentioned to do it.

I would say that the ability to deal with partially broken messages has real value for a mail user agent. It has rarely been required in recent years, but still...

Concerning the extension: for browsers, the WebExtensions API limits to some extent what an add-on can do. The charset menu in Thunderbird requires the experiments API, and so full access. As a consequence, the author should have an ideal reputation, e.g. it is necessary to be sure that he will not accidentally lose control, will not sell the extension to an untrusted third-party company, etc. This particular extension has no link to a source code repository and no license specified (though there is GPL-3 inside the .xpi). I do not think it is a comfortable situation when someone is in a hurry to read a message but has to install a suspicious add-on.

I have tried charset detection for several messages having KOI8-R and windows-1251 encodings. The charset is recognized correctly for these examples. KOI8-U instead of KOI8-R is not a real problem since messages do not have characters in the range where encodings differ from each other.

José M. Muñoz

Comment 22

3 years ago

Off-topic re. that add-on: Until TB 68 all add-ons were "privileged" (more than once breaking TB). The full source code is in the add-on. BTW, the author used to work for TB. For me the add-on works fine and I can't see anything rogue in the ~120 lines of JS.

I have tried charset detection for several messages having KOI8-R and windows-1251 encodings.

How did you do that since automatic detection is broken and has always been broken since it was never correctly implemented?

(In reply to Henri Sivonen (:hsivonen) from comment #20)

The detection needs to happen, as it, I believe, already does for message parts without charset, in the MIME engine after decoding QP or Base64 but before converting the parts to UTF-8 and joining them together.

As far as I can tell, there is no detection for messages without charset, this code
https://searchfox.org/comm-central/rev/4e5a1cfe83ae0b7eefdbe553f5cb56304b0e7d25/mailnews/mime/src/comi18n.cpp#37
isn't triggered during my debugging. In general, the MIME code is 20+ years old, there are few (if any) people on the TB team with a working understanding of it. As you can see, the bug has not received any "official" attention yet, that's why I suggested a hacky solution to get the body and detect its charset.

Martin Giger [:freaktechnik]

Assignee

Comment 23

3 years ago

(In reply to José M. Muñoz from comment #22)

the bug has not received any "official" attention yet

Its existence is "official" attention in the first place.

I agree that the solution of having the auto detection happen per mime part is probably the best way to do it.

José M. Muñoz

Comment 24

3 years ago

Sorry, slip of the keyboard, s/any/much/. Hard to understand how TB does its scheduling, it's been broken for two weeks now and nobody is assigned to the bug. No priority/severity set although this is a user-facing regression and loss of functionality.

Henri Sivonen (:hsivonen)

Comment 25

3 years ago

(In reply to max from comment #21)

KOI8-U instead of KOI8-R is not a real problem since messages do not have characters in the range where encodings differ from each other.

The detector never answers KOI8-R. That is, if the input looks like KOI8-R or KOI8-U, it answers KOI8-U without trying to distinguish between the two. Where the two encodings differ, it's 1) very unlikely that box drawing would be used on the Web or in email and 2) the failure mode of letters getting replaced with box segments is worse than the failure mode of box segments getting replaced with letters.
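
The difference between the two encodings can be checked with the standard TextDecoder. This is a small sketch for illustration only, assuming a runtime that supports the koi8-r and koi8-u labels (browsers, or Node.js built with full ICU); the variable names are made up for this example.

```javascript
// Decode the same bytes as KOI8-R and as KOI8-U.
const koi8r = new TextDecoder("koi8-r");
const koi8u = new TextDecoder("koi8-u");

// A short Russian word: only bytes on which both encodings agree.
const common = Uint8Array.from([0xf0, 0xd2, 0xc9, 0xd7, 0xc5, 0xd4]);
console.log(koi8r.decode(common) === koi8u.decode(common)); // true

// 0xA4 is one of the few bytes on which they differ: a box-drawing
// segment in KOI8-R, a Ukrainian letter in KOI8-U.
const differing = Uint8Array.from([0xa4]);
const asR = koi8r.decode(differing);
const asU = koi8u.decode(differing);
console.log(asR, asU);
```

For ordinary Russian text the answer KOI8-U therefore decodes to the same characters as KOI8-R would.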

(In reply to José M. Muñoz from comment #22)

As far as I can tell, there is no detection for messages without charset, this code
https://searchfox.org/comm-central/rev/4e5a1cfe83ae0b7eefdbe553f5cb56304b0e7d25/mailnews/mime/src/comi18n.cpp#37
isn't triggered during my debugging.

Interesting. It's rather unfortunate that there's code that gets updated for m-c changes without checking that it does what it's supposed to be doing.

José M. Muñoz

Comment 26

3 years ago

Interesting. It's rather unfortunate that there's code that gets updated for m-c changes without checking that it does what it's supposed to be doing.

The other call site does get exercised so we know that charset detection works:
https://searchfox.org/comm-central/rev/4e5a1cfe83ae0b7eefdbe553f5cb56304b0e7d25/mailnews/compose/src/nsMsgCompUtils.cpp#75
There's also a test: test_detectAttachmentCharset.js.

MIME_detect_charset and its callers are only called in mimetext.cpp, so only for plaintext mail:
https://searchfox.org/comm-central/search?q=MIME_detect_charset&redirect=false
No idea what that code does.

I tested with flowed and non-flowed plaintext mail and the code wasn't triggered, hence a missing charset wasn't replaced by a detected one, everything was displayed as if it were UTF-8.

A bit off-topic: Note that flowed plaintext mail is processed in mimetpfl.cpp (tpfl: t-ext p-lain fl-owed). FYI, MIME has that class model
https://searchfox.org/comm-central/rev/e5e75651c5fb70526ae298312d99bc37ffd1ad32/mailnews/mime/src/mimei.h#24
implemented in C, so every class implements its own methods or calls into the parent. All that's done by passing around function pointers. There was an attempt to straighten that out, see bug 1463289.

max

Comment 27

3 years ago

(In reply to José M. Muñoz from comment #22)

How did you do that since automatic detection is broken and has always been broken since it was never correctly implemented?

that's why I suggested a hacky solution to get the body and detect its charset.

I was afraid that the charset detector's behavior might be completely unreliable, so I created a small extension based on your "hacky" patch (by the way, const or let is missing before mimeMsg). I did not touch menu entries, etc. I tried several phrases and fragments of messages encoded as koi8-r and cp1251. I do not have a convincing example of the charset detector failing (however, I would still prefer to have the charset menu as a fallback). Since I use Thunderbird from Ubuntu packages, I am not too worried about version 91 (the update from 68 was not so long ago). I hope it will not be necessary to resort to iconv to read a message.

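// Imports as commonly used in Thunderbird 91-era experiment APIs
// (assumed here; the original comment did not show them).
var { ExtensionCommon } = ChromeUtils.import(
	"resource://gre/modules/ExtensionCommon.jsm"
);
var { MimeParser } = ChromeUtils.import("resource:///modules/mimeParser.jsm");

var tst = class extends ExtensionCommon.ExtensionAPI {
	getAPI(context) {
		return { tst: {
			// Parse the raw message and detect the charset of the first part's body.
			async charset(raw) {
				const mimeMsg = MimeParser.extractMimeMsg(raw, {
					includeAttachments: false,
					strformat: "binarystring",
				});
				let body = mimeMsg.parts[0].body;
				let compUtils = Cc[
					"@mozilla.org/messengercompose/computils;1"
				].createInstance(Ci.nsIMsgCompUtils);
				let charset = compUtils.detectCharset(body);
				return { body, charset };
			},
		} };
	}
};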

+ patched mimeParser.jsm

Geoff Lankow (:darktrojan)

Updated

3 years ago

Comment 28

3 years ago

Thank you all for the discussion here. We've completed José's idea and included it in our product. John, we've also followed your advice and added another emitter:
https://github.com/Betterbird/thunderbird-patches/blob/main/91/bugs/1713786-fix-repair-charset.patch

José M. Muñoz

Comment 29

3 years ago

Hmm, this comes as a surprise, I didn't know there's another product now. Anyway, you're welcome and it's all FOSS anyway. We have a saying that goes (translated): "Competition stimulates business".

Tom

Updated

3 years ago

status-thunderbird_esr91: --- → affected

tracking-thunderbird_esr91: --- → ?

Worcester12345

Comment 30

3 years ago

(In reply to José M. Muñoz from comment #12)

Is the charset menu completely removed from Thunderbird?

Yes, in bug 1704749, currently only visible in TB Daily. Did you see comment #5 mentioning https://addons.thunderbird.net/en-US/thunderbird/addon/charset-menu/?

IMHO the removal of features without hard facts to justify such removal is questionable (see discussion at the end of bug 1704749). Replacing the charset menu (mostly working, see bug 1677104) with a 100%-currently-not-working automatic feature is rather unfortunate. Let's hope it gets fixed before TB 91 goes to beta and later becomes the new TB 91 ESR. Currently fixing a wrong charset is completely broken in TB Daily for both message display (this bug here) and message source display (bug 1716059).

Looks like it didn't.

(In reply to José M. Muñoz from comment #24)

Sorry, slip of the keyboard, s/any/much/. Hard to understand how TB does its scheduling, it's been broken for two weeks now and nobody is assigned to the bug. No priority/severity set although this is a user-facing regression and loss of functionality.

So what is the solution? Back out bug 1704749?

Martin Giger [:freaktechnik]

Assignee

Comment 31

3 years ago

Attached file Bug 1713786 - Do mime detection instead of charset override. r=mkmelin

Phabricator Automation

Updated

3 years ago

Assignee: nobody → martin

Status: NEW → ASSIGNED

Martin Giger [:freaktechnik]

Assignee

Updated

3 years ago

Attachment #9226588 - Attachment is obsolete: true

Martin Giger [:freaktechnik]

Assignee

Updated

3 years ago

Keywords: checkin-needed-tb

Pulsebot

Comment 32

3 years ago

Pushed by alessandro@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/302c32860a1f
Do mime detection instead of charset override. r=mkmelin

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

Keywords: checkin-needed-tb

Resolution: --- → FIXED

Alessandro Castellani [:aleca]

Updated

3 years ago

Target Milestone: --- → 94 Branch

Rachel Martin

Comment 33

3 years ago

Attached file Russian-test-message.eml

nsIMessenger.setDocumentCharset() also corrected the subject display in the header pane (not the thread pane), your solution doesn't. Here's a test message from a Russian news group.

aliledudiable

Comment 34

3 years ago

I don't understand why the charset override has been removed, instead of detection being just added. Detection doesn't always work. If the user can't force the charset, that means that some messages will now be forever unreadable.

Magnus Melin [:mkmelin]

Comment 35

3 years ago

(This should eventually go uplift to 91, but best to let it bake on beta for the full cycle.)

tracking-thunderbird91: ? → ---

tracking-thunderbird_esr91: ? → +

Henri Sivonen (:hsivonen)

Updated

3 years ago

Comment 36

3 years ago

The linked bug 1738000 is, IMHO, an indication of why it is important to allow overriding the character set, and not rely on absolutely always getting the right charset.

Pulsebot

Comment 37

3 years ago

Pushed by mkmelin@iki.fi: https://hg.mozilla.org/comm-central/rev/04b7b6e88251 followup - clang-format. rs=clang-format

Martin Giger [:freaktechnik]

Assignee

Updated

3 years ago

Regressions: 1739609

Martin Giger [:freaktechnik]

Assignee

Comment 38

3 years ago

Attached patch bug1713786-esr91.diff (obsolete)

[Approval Request Comment]
Regression caused by (bug #): bug 1704749
User impact if declined: "Repair Text Encoding" menu item does nothing
Testing completed (on c-c, etc.): tested on beta, c-c, has a browser test
Risk to taking this patch (and alternatives if risky): Worst thing I've seen (in previous versions of the patch) is that some actions could fail once the encoding has been repaired. Given this has been on beta for an entire cycle this seems fairly safe.

Attachment #9249413 - Flags: approval-comm-esr91?

Artem

Comment 39

3 years ago

"Repair Text Encoding" doesn't recognise Windows-1251

Rachel Martin

Comment 40

3 years ago

Attached file windows-1251.eml

This is detected as windows-1251, isn't it?

Masatoshi Kimura [:emk]

Updated

3 years ago

Attachment #9249603 - Attachment mime type: text/plain → message/rfc822

Rachel Martin

Comment 41

3 years ago

Please don't change mail attachments to message/rfc822. BMO can't view those, see bug 1154521 comment #7.

Rachel Martin

Updated

3 years ago

Attachment #9249603 - Attachment mime type: message/rfc822 → text/plain

Artem

Comment 43

3 years ago

Attached file Inventory.eml

Example of a windows-1251 message that is not recognised correctly in 91.x

Artem

Comment 44

3 years ago

Attached image test1251.PNG

The test file is not displayed correctly.

Rachel Martin

Comment 45

3 years ago

TB 91.3 is still broken. It works in the latest beta, right?

Artem

Comment 46

3 years ago

(In reply to Rachel Martin from comment #40)

Created attachment 9249603
windows-1251.eml

This is detected as windows-1251, isn't it?

Yes, but it is not displayed correctly.

(In reply to Rachel Martin from comment #45)

TB 91.3 is still broken. It works in the latest beta, right?

Yes, in 95 it is correct. Waiting for the release update.

Henri Sivonen (:hsivonen)

Comment 47

3 years ago

(In reply to Artem from comment #43)

Created attachment 9249640
Inventory.eml

Example of a windows-1251 message that is not recognised correctly in 91.x

The detector detects this as windows-1251; the problem is 91 not running the detector.

Wayne Mery (:wsmwk)

Comment 48

3 years ago

Comment on attachment 9249413
bug1713786-esr91.diff

[Triage Comment]
Approved for esr91

And thanks for the test

Wayne Mery (:wsmwk)

Comment 49

3 years ago

Comment on attachment 9249413
bug1713786-esr91.diff

[Triage Comment]
(really) Approved for esr91

Attachment #9249413 - Flags: approval-comm-esr91? → approval-comm-esr91+

Rob Lemley [:rjl]

Comment 50

3 years ago

bugherder uplift

Thunderbird 91.3.1:
https://hg.mozilla.org/releases/comm-esr91/rev/87443dabfdc4

status-thunderbird_esr91: affected → fixed

Rob Lemley [:rjl]

Comment 51

3 years ago

Backout Thunderbird 91.3.1:
https://hg.mozilla.org/releases/comm-esr91/rev/de65d8f20698

There is C++ code in comm-esr91 that has been removed from comm-central that fails to build with these changes applied. There's probably more.

ERROR -  /builds/worker/checkouts/gecko/comm/mailnews/compose/src/nsMsgAttachmentHandler.cpp:579:52: error: cannot initialize a parameter of type 'bool' with an rvalue of type 'nullptr_t'
INFO -                                            nullptr, nullptr,
INFO -                                                     ^~~~~~~
INFO -  /builds/worker/workspace/obj-build/dist/include/nsIMsgMessageService.h:71:169: note: passing argument to parameter 'aCharsetOverride' here
INFO -    JS_HAZ_CAN_RUN_SCRIPT NS_IMETHOD DisplayMessage(const char * aMessageURI, nsISupports *aDisplayConsumer, nsIMsgWindow *aMsgWindow, nsIUrlListener *aUrlListener, bool aCharsetOverride, nsIURI **aURL) = 0;
INFO -                                                                                           ^
INFO -  1 error generated.
ERROR -  make[4]: *** [/builds/worker/checkouts/gecko/config/rules.mk:676: nsMsgAttachmentHandler.o] Error 1
INFO -  make[4]: Leaving directory '/builds/worker/workspace/obj-build/comm/mailnews/compose/src'
ERROR -  make[3]: *** [/builds/worker/checkouts/gecko/config/recurse.mk:72: comm/mailnews/compose/src/target-objects] Error 2
INFO -  make[3]: *** Waiting for unfinished jobs....

status-thunderbird_esr91: fixed → affected

Flags: needinfo?(martin)

Rob Lemley [:rjl]

Updated

3 years ago

Attachment #9249413 - Flags: approval-comm-esr91+

Comment hidden (off-topic)

Rachel Martin

Comment 53

3 years ago

Comment posted to the wrong bug? Timers? Rooms?

Rob Lemley [:rjl]

Comment 54

3 years ago

Attached patch Bug1713786-esr91_v2.patch

[Triage Comment]
Patch was previously approved for 91.3.1. The updated version fixes nsMsgAttachmentHandler.cpp.

Attachment #9249413 - Attachment is obsolete: true

Flags: needinfo?(martin)

Attachment #9250945 - Flags: approval-comm-esr91+

Rob Lemley [:rjl]

Comment 55

3 years ago

bugherder uplift

Thunderbird 91.3.2:
https://hg.mozilla.org/releases/comm-esr91/rev/c45170901b9f

status-thunderbird_esr91: affected → fixed

Ryan Ho

Comment 56

3 years ago

Attached file Example message in big5 encoding that "Repair Text Encoding" is not working in 91.3.2

This message has two parts: plain text and rich text (HTML).
Thunderbird displays the plain text part correctly, but not the rich text part.
"Repair Text Encoding" does not help.

Rich Text (html) part has "charset=big5" in <meta> tag but seems like Thunderbird ignored it.

Rachel Martin

Comment 57

3 years ago

Henri, could you please give us your opinion on this? The text/plain part is base64-encoded and has these headers:

Content-Type: text/plain;
	charset="big5"
Content-Transfer-Encoding: base64

The text/html part doesn't specify a charset. Opening the eml file in Notepad++, it detects Big5.

@Ryan: I don't think TB looks at the HTML content for charset detection. It would also be best to report this as a new bug.

Flags: needinfo?(hsivonen)

Rachel Martin

Comment 58

3 years ago

Actually, the message displays fine in TB 95 beta where this code is present. So detection of the charset must have worked:
https://hg.mozilla.org/comm-central/rev/d0af6cc5fe02#l3.18

This will eventually be in TB 91.x.

Flags: needinfo?(hsivonen)

Rachel Martin

Comment 59

3 years ago

Attached file big5-written-with-Notepad++.eml

Sorry, I got confused. The original message doesn't display correctly, what does display correctly is the file I imported after writing it back with Notepad++. When comparing the original to that file, the compare utility complains about encoding errors.

So back to Henri. Is there anything strange in 'Example message in big5 encoding that "Repair Text Encoding" is not working in 91.3.2'?

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Comment 60

3 years ago

(In reply to Rachel Martin from comment #59)

So back to Henri. Is there anything strange in 'Example message in big5 encoding that "Repair Text Encoding" is not working in 91.3.2'?

As far as I can tell, this is not a detector bug but an enveloping bug in whatever software generated the email. It appears that the HTML part has first been encoded as Big5 and then CRLF line breaks have been inserted every 998 bytes (so that every line with CRLF included is 1000 bytes).

These line breaks not only occur in places that are inappropriate for the HTML markup but also between the bytes of two-byte Big5 characters. As a result, the stream of bytes ends up not looking like Big5, since it has a Big5 lead byte followed by a carriage return even though there should be a proper Big5 trail byte.

Since I'm not aware of this failure mode occurring on the Web and this seems specific to bad email enveloping CRLF injection, I don't intend to change the detector due to this, but I leave it up to you if you wish to explore some hacks to remove CRLFs occurring at regular byte intervals in Content-Transfer-Encoding: 8bit before further processing.
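
A hack of the kind mentioned above could look roughly like this. It is a sketch under stated assumptions, not code from any patch on this bug: the function name stripInjectedCrlf and the lineLength parameter are hypothetical, the input is assumed to be a Uint8Array holding the 8bit part body, and the CRLFs are only removed when every line before the last has exactly the same length.

```javascript
// Sketch only: remove CRLFs that occur at perfectly regular byte intervals,
// assuming they were injected by the sending software.
function stripInjectedCrlf(bytes, lineLength = 998) {
  const CR = 0x0d;
  const LF = 0x0a;

  // Collect the positions of all CRLF pairs.
  const breaks = [];
  for (let i = 0; i + 1 < bytes.length; i++) {
    if (bytes[i] === CR && bytes[i + 1] === LF) {
      breaks.push(i);
      i++;
    }
  }
  if (breaks.length === 0) {
    return bytes;
  }

  // Every CRLF must sit exactly lineLength bytes after the start of its line.
  let lineStart = 0;
  for (const pos of breaks) {
    if (pos - lineStart !== lineLength) {
      return bytes; // Irregular lines: leave the data untouched.
    }
    lineStart = pos + 2;
  }

  // Copy everything except the CRLF pairs.
  const out = new Uint8Array(bytes.length - 2 * breaks.length);
  let offset = 0;
  let start = 0;
  for (const pos of breaks) {
    out.set(bytes.subarray(start, pos), offset);
    offset += pos - start;
    start = pos + 2;
  }
  out.set(bytes.subarray(start), offset);
  return out;
}

// Three two-byte characters, with a CRLF injected after every 3 bytes,
// i.e. in the middle of the second character.
const original = Uint8Array.from([0xa4, 0x40, 0xa4, 0x41, 0xa4, 0x42]);
const injected = Uint8Array.from([0xa4, 0x40, 0xa4, 0x0d, 0x0a, 0x41, 0xa4, 0x42]);
const repaired = stripInjectedCrlf(injected, 3);
console.log(Array.from(repaired, (b) => b.toString(16)).join(" "));
```

A real implementation would probably want further safeguards, for example requiring a minimum number of regular breaks, since a short message with a single line of exactly that length would otherwise lose a legitimate line break.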

Flags: needinfo?(hsivonen)

Rachel Martin

Comment 61

3 years ago

Thanks for the insight, Henri, much appreciated. So the charset detection is thrown off the rails by CRLF being injected into two-byte Big5 characters. Notepad++ detects Big5 and apparently removes the broken characters, so once the file is written back, it's all valid Big5 with some missing characters, in fact, Notepad++ saves 10 bytes less.

The message was produced by "Microsoft CDO for Windows 2000", well, the year 2000 was 21 years ago, perhaps this sort of message won't happen all that often any more. Pity the facility to explicitly set a charset was removed, otherwise the user could have set the correct charset and read most of the message, apart from the five broken characters.

Reporter, if you want to take the issue further, please file a new bug with the summary:
Charset detection doesn't work on Big5-encoded message where some two-byte Big5 characters were injected with CRLF.

Needless to say that the client which produced the message is in error, much like TB before version 45 and before bug 1225904 was fixed. As some may recall, TB regularly corrupted CJK messages. These days TB mostly uses base64 encoding to work around long lines.

aliledudiable

Comment 62

3 years ago

Reporter, if you want to take the issue further, please file a new bug etc.

Note that this situation is one of the reasons why a manual override of the encoding remains necessary for the foreseeable future. There are, and will remain, situations in which the auto-detection fails, or cannot succeed, for some reason.

Rachel Martin

Comment 63

3 years ago

Bug 1743059 was actually filed for the Big5 issue.

Comment hidden (off-topic)

1713786.patch - Solution idea (José M. Muñoz, 3 years ago; 2.58 KB, patch)
1713786.patch - Solution idea (refreshed) (José M. Muñoz, 3 years ago; 3.37 KB, patch)
Bug 1713786 - Do mime detection instead of charset override. r=mkmelin (Martin Giger [:freaktechnik], 3 years ago; 48 bytes, text/x-phabricator-request)
Russian-test-message.eml (Rachel Martin, 3 years ago; 1.65 KB, text/plain)
bug1713786-esr91.diff (Martin Giger [:freaktechnik], 3 years ago; 49.85 KB, patch)
windows-1251.eml (Rachel Martin, 3 years ago; 1.22 KB, text/plain)
Inventory.eml (Artem, 3 years ago; 1.19 KB, message/rfc822)
test1251.PNG (Artem, 3 years ago; 28.50 KB, image/png)
Bug1713786-esr91_v2.patch (Rob Lemley [:rjl], 3 years ago; 50.77 KB, patch; rjl: approval-comm-esr91+)
Example message in big5 encoding that "Repair Text Encoding" is not working in 91.3.2 (Ryan Ho, 3 years ago; 90.40 KB, message/rfc822)
big5-written-with-Notepad++.eml (Rachel Martin, 3 years ago; 90.39 KB, text/plain)