Closed Bug 1717523 Opened 2 years ago Closed 1 year ago

Port bug 1713627: 1) Replace setting charset to _autodetect_all 2) Remove setting nsIDocShell.charset

Categories

(Thunderbird :: Upstream Synchronization, defect)

Thunderbird 91
defect

Tracking

(thunderbird_esr78 unaffected)

RESOLVED FIXED
91 Branch
Tracking Status
thunderbird_esr78 --- unaffected

People

(Reporter: jose, Assigned: darktrojan)

References

Details

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36

Steps to reproduce:

Bug 1713627 needs porting since this doesn't work any more:
https://searchfox.org/comm-central/rev/84da1b60338f52ad99851bdc9d6512135909c8ec/common/src/viewSource.js#70

The other call site
https://searchfox.org/comm-central/rev/84da1b60338f52ad99851bdc9d6512135909c8ec/mail/base/content/mailWindow.js#1099
is covered in bug 1713786.

Depends on: 1713627

I'm going to land a patch to fix the build, some of the tests will break until I fix them too.

Assignee: nobody → geoff
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Keywords: leave-open
Target Milestone: --- → 91 Branch
Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/9941ab12f44c
Port bug 1713627 - Removal of nsDocShell::SetCharset. rs=bustage-fix
Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/5c4bb929e5d7
Stop setting docShell.charset in MozConversationBrowser to fix the tests. rs=bustage-fix

The one remaining test failure is mail/test/browser/content-policy/browser_viewSource.js. I've been looking at it, and it doesn't test what it was written for (that option no longer exists) and it's unclear to me if what it actually tests is correct (I performed the steps manually and got the outcome I expected yet the test fails). I think it's the only test of the view source feature though, so we should move it somewhere appropriate (it's not a content policy test any more) and make the intention clearer.

Sorry, I was focused on _autodetect_all and overlooked the second part. browser_viewSource.js seems to relate to encoding issues, quote: "Test that view-source content can be reloaded to change encoding", although it doesn't change the encoding. Sure, in content-policy/ it seems to be in the wrong place.

Summary: Port bug 1713627: Replace setting charset to _autodetect_all → Port bug 1713627: 1) Replace setting charset to _autodetect_all 2) Remove setting nsIDocShell.charset

Looking at the test again, it tries to repair the text encoding, and exactly that is broken, see comment #0:
https://searchfox.org/comm-central/rev/bc3bbc6f9a332c72ffe9d53c28cdd055af51d8fb/mail/test/browser/content-policy/browser_viewSource.js#129

This fixes the function and the related test failure. Somehow the behavior in the preview pane has changed, too, and it appears that manual testing and automatic testing give different results. At least on Windows, the test succeeds now. It's not surprising that UTF-8 två is displayed as tvÃ¥ when interpreted as ISO-8859-1.

Attachment #9228511 - Flags: review?(geoff)
Attached patch 1717523.patchSplinter Review

Reworked the test a bit. There is in fact no latin1 involved, it's just the UTF-8 multi-byte character displayed as if it were two ISO-8859-1 characters.

However, there is a problem which you can see when importing a message that has UTF-8 content declared as windows-1252/ISO-8859-1. On first display it's displayed "double-garbled", when you hide/redisplay the preview pane, it's "normal" garbled. So there is something else wrong this test picks up additionally.

Let me know if you have a better idea. One could of course remove that first check altogether, since the test is not about message display, but message source display, and those checks pass. I guess for the message display all this will need to be revisited in bug 1713786.

Surprisingly the failure in automation is "View source must contain the utf-8 text", so the first check of the message content doesn't even fail. So perhaps only the hunk in viewSource.js is needed.

Attachment #9228629 - Flags: feedback?(geoff)

Comment on attachment 9228511 [details] [diff] [review]
1717523.patch (1st version)

First version with minimal change. Looks like the "Message content must include the garbled latin1 text" doesn't fail in automation, but it fails locally. Please check whether perhaps the first hunk will be sufficient.

Attachment #9228511 - Attachment description: 1717523.patch → 1717523.patch (1st version)
Attachment #9228511 - Flags: review?(geoff)
Regressions: 1718119

Further to comment #8 I filed bug 1718119. "Good" messages show garbled upon first display, and "bad" message with incorrectly declared charset show "double-garbled". That's what the changes in browser_viewSource.js try to compensate.

No longer regressions: 1718119
Regressions: 1718119

Comment on attachment 9228629 [details] [diff] [review]
1717523.patch

Thanks, I've gone a bit further and given the test a much-needed overhaul.

Attachment #9228629 - Flags: feedback?(geoff) → feedback+

Repair Text Encoding in the main window is still broken and I'm not really sure what to do about it.

Nice rework. So you avoid running into the "double-garbled" I mentioned in comment #8 and and bug 1718119? The main window issue is covered in bug 1713786. I submitted a "proof of concept" patch there with a hacky idea. Looking at 1713786 comment #23 (quote: I agree that the solution of having the auto detection happen per mime part is probably the best way to do it.), it appears that they want a "proper" solution, but the bug has received no attention apart from being filed.

So you avoid running into the "double-garbled" I mentioned in comment #8 and bug 1718119?

I've just seen the workaround, quite ingenious ;-)

Would you mind changing ISO-8859-1 to windows-1252? The former is a label of the latter and it's sort of old fashioned to be using it still, see https://encoding.spec.whatwg.org/.

Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/dfc51d8110e4
Fix Repair Text Encoding in source viewer and overhaul browser_viewSource.js. r=mkmelin

(In reply to José M. Muñoz from comment #14)

Nice rework. So you avoid running into the "double-garbled" I mentioned in comment #8 and and bug 1718119?

The problem is (was) the test is wrong, after many previous attempts to keep it working. The string we called "latin1" actually contained the UTF-8 bytes for the character C3 A5 instead of E5 like it should have. No wonder it's even more garbled when bug 1718119 hit.

The main window issue is covered in bug 1713786.

It can be handled there.

(In reply to José M. Muñoz from comment #15)

Would you mind changing ISO-8859-1 to windows-1252? The former is a label of the latter and it's sort of old fashioned to be using it still, see https://encoding.spec.whatwg.org/.

That's kind of the point of the test, to prove we still handle outdated things correctly. So by leaving it as-is we check that the old label still works too.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Keywords: leave-open
Resolution: --- → FIXED
See Also: → 1713786

The string we called "latin1" actually contained the UTF-8 bytes ...

No. The JS string was an UTF-16 string and adding it to a folder using nsIMsgLocalMailFolder.addMessage() would have just converted UTF-16 in the latin1 range to raw bytes, so Ã¥ miraculously became C3 A5 (and not the four bytes it would be in raw UTF-8) which is the raw-UTF8 for å which you have now. What you have now is what I suggested in my second patch, it's nicer, and the four tests are much better then one, but the original test worked, before bug 1718119 hit (as you can see in the first patch I attached, that just worked around bug 1718119). That bug with its "double-garbling" managed to confuse us all (since there you suddenly saw those four bytes, but for a reason that still needs to be explored).

You could change "Testar, ett tv\xC3\xA5 tre." back to "Testar, ett tvÃ¥ tre." and "Testar, ett tv\xE5 tre." to "Testar, ett två tre." and the test should still work since the string parameter of addMessage() does that for you:
https://searchfox.org/comm-central/rev/70f25c94bbc4808effa7834ece2b2a499656da5a/mailnews/local/public/nsIMsgLocalMailFolder.idl#73
Of course that wasn't mentioned, and as we see, it was hard to work out.

Okay, let me rephrase what I was trying to say, which isn't quite what I said:
If you pause the test and look at the file we ended up with, it contained C3 A5 but should've contained E5.

Indeed, through XPCOM magic å ended up being C3 A5 and the variable name contentLatin1 was a misnomer. After the overhaul it's much clearer and better. So what are the plans for bug 1713786 and bug 1718119?

(In reply to José M. Muñoz from comment #20)

So what are the plans for bug 1713786 and bug 1718119?

I don't actually have one. TBH it's not really my area, I'm just here to fix the build so we could all get back to work. Did the view source test too because I've been meaning to do it for ages.

You need to log in before you can comment on or make changes to this bug.