Closed Bug 1202401 Opened 4 years ago Closed 4 years ago

Prepare for m-c removal of nsISaveAsCharset

Categories

(MailNews Core :: Internationalization, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 45.0

People

(Reporter: hsivonen, Assigned: mkmelin)

References

Details

Attachments

(2 files, 1 obsolete file)

nsMsgI18N.cpp is the only c-c caller of nsISaveAsCharset:
https://mxr.mozilla.org/comm-central/search?string=nsISaveAsCharset

Since transliteration was removed by bug 1003731, the first call is equivalent to using nsIUnicodeEncoder directly.

The second call also request entities, but it's unclear if that code path is in use anymore considering that Thunderbird is supposed to upgrade messages to UTF-8 if there are on encodable characters.

Please change these to calling nsIUnicodeEncoder directly (if this whole method isn't dead code). Otherwise, c-c might have to take over nsSaveAsCharset once there are no more m-c callers.
Thanks for reminding us before the removal. When do you think would m-c like to remove the nsISaveAsCharset code?
(In reply to Henri Sivonen (:hsivonen) from comment #0)
> The second call also request entities, but it's unclear if that code path is
> in use anymore considering that Thunderbird is supposed to upgrade messages
> to UTF-8 if there are on encodable characters.

This method appears to be called on that path, so it might be doing entity conversion for text/html but not text/plain.

In short, what this method appears to be used in effect for is "save as charset, using Windows-1252 instead of ISO-8859-1 if the euro character is requested." Outside of compose, this behavior really oughtn't be necessary (well, if nsMSGI18NFileSystemCharset is fixed to return Windows-1252 instead of ISO-8859-1, unfortunately it seems that lots of places in our code love to assume that ISO-8859-1 is intended, although charset canonicalization would probably override that anyways).
(In reply to :aceman from comment #1)
> Thanks for reminding us before the removal. When do you think would m-c like
> to remove the nsISaveAsCharset code?

I expect to remove it in Q4. By then I expect to have added m-c code that converts with unmappables falling back to decimal character references, but no fancy options for hex or whatever: bug 1202366.
Depends on: 1214619
Blocks: 1214619
No longer depends on: 1214619
Bug 1202366 is now on mozilla-inbound and introduced nsNCRFallbackEncoderWrapper which takes a Gecko-canonical encoding name (recognized by the Firefox part; c-c extension encodings not supported but those are not supposed to be used for outgoing email) and provides conversion from nsAString to nsACString with decimal numeric character reference fallback.

Again, it's unclear to me if mailnews core really needs that function considering the UTF-8 upgrade, and the first use to check if message can be encoded in a given encoding should be implemented using nsIUnicodeEncoder directly now that transliteration is gone anyway.

Bug 1214619 has been reviewed and is waiting on this bug to be fixed first as a courtesy to avoid burning c-c.
Yes, looks to me we can drop nsMsgI18NSaveAsCharset. 
We were only converting to entities for plaintext, which we don't really want to do. (The rest of the entity usage was already killed off in bug 302748).

This patch simplifies things a bit. We no longer go all out trying to not need to convert to utf-8. If it's at all needed, let's just do it - e.g. for latin1 with euro sign. I don't see why not.
Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED
Attachment #8679629 - Flags: review?(Pidgeot18)
Comment on attachment 8679629 [details] [diff] [review]
bug1202401_cc_nsISaveAsCharset_removal.patch

Review of attachment 8679629 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/base/src/nsMessenger.cpp
@@ +1854,2 @@
>      }
> +    // XXX: doing nothing for other cases?

Actually, the code is correct; no need for the XXX here.

What would look better is to assert that the output format is plain text. If it's HTML, m_doCharsetConversion shouldn't be set, and in that scenario, the code is emitted immediately in OnDataAvailable.

::: mailnews/base/util/nsMsgI18N.cpp
@@ +41,5 @@
>    if (inString.IsEmpty()) {
>      outString.Truncate();
>      return NS_OK;
>    }
> +  // Note: this will hide a possible error if the unicode contains more than one

Nit: capitalize 'this' and 'unicode'.

::: mailnews/base/util/nsMsgI18N.h
@@ +77,5 @@
>   * @param outString   [OUT] Converted output string.
> + * @param aIsCharsetCanonical  [IN] Whether the charset is canonical or not.
> + * @param aReportUencNoMapping [IN] Set encoder to report (instead of using
> + *                                  replacement char on errors). Set to true
> + *                                  to recive NS_ERROR_UENC_NOMAPPING when

Typo: receive

@@ +78,5 @@
> + * @param aIsCharsetCanonical  [IN] Whether the charset is canonical or not.
> + * @param aReportUencNoMapping [IN] Set encoder to report (instead of using
> + *                                  replacement char on errors). Set to true
> + *                                  to recive NS_ERROR_UENC_NOMAPPING when
> + *                                  that happens.

You should emphasize here that NS_ERROR_UENC_NOMAPPING is a success code, despite its name.

::: mailnews/compose/src/nsMsgCompose.cpp
@@ +1171,5 @@
>    if (!msgBody.IsEmpty())
>    {
>      // Convert body to mail charset
>      nsCString outCString;
>      nsCString fallbackCharset;

fallbackCharset is now dead, and thus so is the else if (!fallbackCharset.IsEmpty()) block a few lines down.

::: mailnews/compose/src/nsMsgSend.cpp
@@ +1613,4 @@
>        if (NS_SUCCEEDED(rv))
>        {
> +        origHTMLBody = (char16_t *) newBody.get();
> +        // XXX: re the cast, see the "whoa" a few lines down.

Hmm. I think this is a use-after-free, since the pointer of newBody.get() should be dead after newBody exits.

The reuse of origHTMLBody here is atrociously bad code. What you should just do is make the code basically be (deleting the if block below, the NS_ERROR_FAILURE in the else here meaning that we have to go through this code path):
if (origHTMLBody)
{
  nsCString newBody
  convert from unicode ...
  mOriginalHTMLBody = ToNewCString(newBody)
} else {
  mOriginalHTMLBody = ToNewCString(attachment1 [details] [diff] [review]_body)
}
Attachment #8679629 - Flags: review?(Pidgeot18) → feedback+
Should address review comments.
Attachment #8679629 - Attachment is obsolete: true
Attachment #8684698 - Flags: review?(Pidgeot18)
Attachment #8684698 - Flags: review?(Pidgeot18) → review+
https://hg.mozilla.org/comm-central/rev/15454eb97bbe -> FIXED
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 45.0
This caused a test failure
TEST-UNEXPECTED-FAIL | /builds/slave/test/build/tests/mozmill/composition/test-charset-edit.js | test-charset-edit.js::test_no_mojibake

Seems I've tracked it down though, just need to test some more.
For the test that fails we have

msgBody=ケツァルコアトル
to charset=windows-1252
rv ok, outCString=empty

I suppose that's really a bug in itself (in the encoder?), but let's fall back to utf-8, not ascii for such failures.

I also removed the m_editor check which I don't see the point of having there. Also the isAscii should apparently exclude stateful charsets - (that's the way the code was earlier)
Attachment #8687709 - Flags: review?(Pidgeot18)
Attachment #8687709 - Flags: review?(Pidgeot18) → review+
Depends on: 1251120
Depends on: 1271864
Depends on: 1444371
You need to log in before you can comment on or make changes to this bug.