Open Bug 688144 Opened 13 years ago Updated 2 years ago

Thunderbird and Gmail display differently the same character

Categories

(Thunderbird :: General, defect)

7 Branch
x86_64
Windows 7
defect

Tracking

(Not tracked)

People

(Reporter: o2627091, Unassigned)

References

Details

(Whiteboard: [Gmail's web end bug])

Attachments

(6 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
Build ID: 20110902133214

Steps to reproduce:

Send an email from Thunderbird with the '¿' symbol


Actual results:

In Thunderbird 7.0 beta 3 it is displayed correctly, but in Gmail it is displayed as 'ż'.
In Thunderbird 6 that doesn't happen.
I am using Thunderbird 7.0 beta 3 (English), but my Windows 7 is in Spanish (with a Spanish keyboard layout).
(In reply to Bastard from comment #1)
Is this issue being discussed ATM in bug 686519 ?
Can you save one of the messages for which it doesn't work and attach it to the bug so we can do further analysis?
Keywords: regression
Information about the message:
Steps to reproduce: send a message with Thunderbird, from and to Gmail accounts.
Encoding: ISO-8859-1
Subject: ¿Ticket recibió?
Body: hello, ¿your ticket recibió?
Reading the email from gmail also shows incorrect symbols:

Go to your Gmail account -> read the problematic email -> drop-down menu -> Show original.

Please check also related bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=686519
(In reply to Hashem Masoud from comment #2)
> (In reply to Bastard from comment #1)
> Is this issue being discussed ATM in bug 686519 ?

I don't know.
(In reply to Bastard from comment #0)
> Actual results:
> In thinderbird 7.0 beta 3 is displayed correctly but in gmail is displayed
> like 'ż'

Where exactly?
  Thread pane? Message header pane? Message pane? All of them?

> but in gmail is displayed like 'ż'

In the Gmail web interface? Or in the Gmail IMAP account's mbox accessed by Tb?
If the latter, what charset is selected at Folder Properties/General of the Gmail IMAP folder which is accessed by Tb?

Tested with the following crafted mail written in ISO-8859-1, held in a local mail folder, using Tb 7.0.1 on Win.
(Tb never generates such a Subject: header. Tb always encodes it correctly.)
> Subject: ¿Ticket recibió?
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Content-Transfer-Encoding: 8bit
>
> ¿Ticket recibió?

Folder's charset  Thread pane       Message header pane        Message pane
ISO-8859-1        ¿Ticket recibió?  subject: ¿Ticket recibió?  ¿Ticket recibió?
UTF-8             ¿Ticket recibió?  subject: �Ticket recibi�?  ¿Ticket recibió?
windows-1250      ¿Ticket recibió?  subject: żTicket recibió?  ¿Ticket recibió?
Note:
  "Apply default to..." is unchecked.
  No difference between Auto-Detect=Off and Auto-Detect=Universal.

(1) Gmail IMAP returns the header as-is in response to Tb's fetch
    body.headerfields(subject) request, because the charset is correctly
    specified in the Content-Type: header.
(2) Because Tb has a quirk for a not-encoded Subject: header,
    the charset in the Content-Type: header is applied to the thread pane display.
(3) There is no such quirk for display in the message header pane.
    The not-properly-encoded Subject: header data is used as-is, and the binary
    is placed as text in the HTML which is internally used for the message
    header pane. Because the charset of the Subject: header is unknown, the
    folder's charset is applied.
(4) Because the charset in the Content-Type: header is correct and is applied
    to the message pane display, the text is shown as expected.

> In thunderbird 6 that doesn't happen.

With the same profile, same mail folder, and same mail data as in the Tb 7 testing?

The code point 0xBF (¿, ż) seems special in many charsets: it maps to a different glyph depending on the charset or code page.
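
For illustration only, a minimal Python sketch of that point: the same byte 0xBF decoded under a few 8-bit charsets. The helper name `glyphs_for` and the charset list are my own choices, not anything Tb or Gmail uses.

```python
# Decode the single byte 0xBF under several 8-bit charsets.
RAW = bytes([0xBF])

def glyphs_for(raw, charsets=("iso-8859-1", "iso-8859-2", "iso-8859-15",
                              "cp1250", "cp1252")):
    """Return {charset: decoded text} for the given raw bytes (hypothetical helper)."""
    return {name: raw.decode(name) for name in charsets}

table = glyphs_for(RAW)
for name, glyph in table.items():
    # Print the charset, the Unicode code point, and the glyph itself.
    print(f"{name:12} U+{ord(glyph):04X} {glyph}")
```

The Western charsets map the byte to ¿ and the Central European ones map it to ż, which is the pair of glyphs seen in this bug.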
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to WADA from comment #7)
> (In reply to Bastard from comment #0)
> > Actual results:
> > In thinderbird 7.0 beta 3 is displayed correctly but in gmail is displayed
> > like 'ż'
> 
> At where?
>   Thread pane? Message header pane? Message pane? All of them?
> 
> > but in gmail is displayed like 'ż'
> 
> At Gmail Web Interface? Gmail IMAP account's mbox accessed by Tb?
> If latter, what charset is selected at Folder Properties/General of the
> Gmail IMAP folder which is accessed by Tb?
> 

See attachments. The charset is Western (ISO-8859-1)


> Tested with next crafted mail written in ISO-8859-1 held in local mail
> folder, using Tb 7.0.1 on Win.
> (Tb never generates such Subject: header. Tb always encodes it correctly) 
> > Subject: ¿Ticket recibió?
> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> > Content-Transfer-Encoding: 8bit
> >
> > ¿Ticket recibió?
> 

Yes but open the email with gmail web service, not IMAP (see attachments)
It's not a regression, with Tb 6.0.2 it happens exactly the same. Sorry for the misinformation.
> Tested with next crafted mail written in ISO-8859-1 held in local mail
> folder, using Tb 7.0.1 on Win.
> (Tb never generates such Subject: header. Tb always encodes it correctly) 
> > Subject: ¿Ticket recibió?
> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> > Content-Transfer-Encoding: 8bit
> >
> > ¿Ticket recibió?
> 
> Folder's charset  Thread pane       Message header pane        Message pane
> ISO-8859-1        ¿Ticket recibió?  subject: ¿Ticket recibió?  ¿Ticket recibió?
> UTF-8             ¿Ticket recibió?  subject: �Ticket recibi�?  ¿Ticket recibió?
> windows-1250      ¿Ticket recibió?  subject: żTicket recibió?  ¿Ticket recibió?

Note that the Windows equivalent of ISO-8859-1 (at least in Java) is Windows-1252, not Windows-1250.
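
A quick way to check this is to compare the code pages byte by byte over the range 0xA0-0xFF. This is only a sketch using Python's built-in codecs (`cp1252`, `cp1250`); `differing_bytes` is a hypothetical helper name.

```python
def differing_bytes(charset_a, charset_b, lo=0xA0, hi=0xFF):
    """List byte values in [lo, hi] that decode to different characters."""
    return [
        value
        for value in range(lo, hi + 1)
        if bytes([value]).decode(charset_a) != bytes([value]).decode(charset_b)
    ]

# ISO-8859-1 and Windows-1252 agree on the whole 0xA0-0xFF range.
same_as_1252 = differing_bytes("iso-8859-1", "cp1252")
# ISO-8859-1 and Windows-1250 disagree on many bytes, including 0xBF.
diff_from_1250 = differing_bytes("iso-8859-1", "cp1250")

print(same_as_1252)
print([hex(v) for v in diff_from_1250])
```

Note that 0xF3 (ó) is the same in both, which fits the "żTicket recibió?" row in the table: only the first character changes under windows-1250.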
Keywords: regression
Message sent with Thunderbird 7.0.1 (Gmail -> Show original), viewed with Mozilla Firefox 7.0.1:


Return-Path: <example@gmail.com>
Received: from [127.0.0.1] (IP)
        by mx.google.com with ESMTPS id id.0.2011.10.11.06.24.06
        (version=SSLv3 cipher=OTHER);
        Tue, 11 Oct 2011 06:24:07 -0700 (PDT)
Message-ID: <M.ID@gmail.com>
Date: Tue, 11 Oct 2011 15:24:07 +0200
From: Me <example@gmail.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
MIME-Version: 1.0
To: example@gmail.com
Subject: =?ISO-8859-1?Q?=BFTicket_recibi=F3=3F?=
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit

hello, �your ticket recibi�?
Note that Google Chrome doesn't show the � glyphs in this last case; it shows everything correctly.
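
For reference, the Subject: header in the source above is an RFC 2047 encoded-word, so it carries its own charset label. A small sketch with Python's standard `email.header` module shows how a receiver can decode it (this is not how Gmail or Tb implement it, just an illustration):

```python
from email.header import decode_header, make_header

# Subject header exactly as it appears in the message source above.
encoded = "=?ISO-8859-1?Q?=BFTicket_recibi=F3=3F?="

# decode_header yields (bytes, charset) pairs; make_header joins them
# into a Header object whose str() is the decoded Unicode text.
parts = decode_header(encoded)
subject = str(make_header(parts))

print(parts)
print(subject)
```

Because the charset travels inside the header, there is nothing for a receiver to guess here, unlike the 8bit body.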
Message sent with The Bat! Pro 5.0.6 (Gmail -> Show original), viewed with Mozilla Firefox 7.0.1. Google Chrome shows it exactly the same. In The Bat! it is displayed correctly everywhere, in all fields.


Return-Path: <example@gmail.com>
Received: from localhost (ip)
        by mx.google.com with ESMTPS id id.8.2011.10.12.08.56.41
        (version=TLSv1/SSLv3 cipher=OTHER);
        Wed, 12 Oct 2011 08:56:42 -0700 (PDT)
Date: Wed, 12 Oct 2011 17:56:42 +0200
From: Me <example@gmail.com>
X-Priority: 3 (Normal)
Message-ID: <id@gmail.com>
To: example@gmail.com
Subject: =?iso-8859-15?Q?=BFticket_recibi=F3=3F?=
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: quoted-printable

Hello, =BFyour ticket recibi=F3?
The same as comment 15, but with The Bat!'s encoding for new mails set to Western European (ISO) instead of its default Latin 9 (ISO), just to see how it behaves (everything is displayed perfectly everywhere):

Return-Path: <example@gmail.com>
Received: from localhost (ip)
        by mx.google.com with ESMTPS id id.10.2011.10.12.09.10.37
        (version=TLSv1/SSLv3 cipher=OTHER);
        Wed, 12 Oct 2011 09:10:38 -0700 (PDT)
Date: Wed, 12 Oct 2011 18:10:38 +0200
From: Me <example@gmail.com>
X-Priority: 3 (Normal)
Message-ID: <043r4jijjrklkrmelk@gmail.com>
To: example@gmail.com
Subject: =?iso-8859-1?Q?=BFticket_recibi=F3=3F?=
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Hello, =BFyour ticket recibi=F3? ISO Western
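
As an illustration of why this variant is safe, the quoted-printable body contains only 7-bit ASCII on the wire, and the 8-bit bytes reappear only after decoding. A minimal sketch using Python's standard `quopri` module (I left off the trailing "ISO Western" marker text):

```python
import quopri

# Body as transmitted (quoted-printable, pure ASCII).
body_qp = b"Hello, =BFyour ticket recibi=F3?"
is_7bit = all(byte < 0x80 for byte in body_qp)

# Undo the transfer encoding, then apply the charset from Content-Type.
raw = quopri.decodestring(body_qp)
text = raw.decode("iso-8859-1")

print(is_7bit)
print(raw)
print(text)
```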
(In reply to Bastard from comment #9)
> Created attachment 566543 [details]
> From GMail web interface (Mozilla Firefox 7.0.1)

The subject is shown by *Firefox* as follows:
  ¿Ticket recibió?
  This is shown by *Firefox, which is a browser*,
    based on data transferred from the *Gmail Web Interface* via *HTTP:*,
   (never by *Thunderbird, which is a mailer*,)
   (   based on data transferred from *Gmail IMAP* via *IMAP:*)
The body text is shown by *Firefox* as follows:
  hello, żyour ticket recibió?
  This is shown by *Firefox, which is a browser*,
    based on data transferred from the *Gmail Web Interface* via *HTTP:*,
   (never by *Thunderbird, which is a mailer*,)
   (   based on data transferred from *Gmail IMAP* via *IMAP:*)

How can Tb be relevant to a glyph shown by *Firefox, which is a browser*, based on data transferred from the *Gmail Web Interface* via *HTTP:*?
The whole phenomenon is up to the Gmail Web Interface and Firefox.
Are you requesting that Tb show the glyphs "hello, żyour ticket recibió?" (no quotes) as the message body, based on data passed from Gmail IMAP?

As seen in the following mail source generated by The Bat!,
> Subject: =?iso-8859-1?Q?=BFticket_recibi=F3=3F?=
> Content-Type: text/plain; charset=iso-8859-1
> Content-Transfer-Encoding: quoted-printable
>
> Hello, =BFyour ticket recibi=F3?
non-7-bit-ASCII data in a message header has to be encoded, and the charset has to be specified in the Content-Type: header if the message body contains non-7-bit-ASCII data.
Please note the following:
There is no correct display of malformed or corrupted mail anywhere on Earth.
Only correct mail has a correct display.
Quirks for malformed mail ("non-7-bit-ASCII data in a message header without encoding" and/or "no charset in the Content-Type: header even though the message body contains non-7-bit-ASCII data" in your case) depend on the software.
Gmail (web server) and Gmail IMAP (IMAP server) are entirely different servers/systems/software, even though they use the same software components and share mail data.
Firefox (browser) and Thunderbird (mailer, the IMAP client in this bug) are entirely different software accessing different servers, even though they share the same software components.
One comment to make here is that the Thunderbird email appears to be using an 8-bit encoding.

Doing some research, ż is the ISO 8859-2 character at 0xBF, while ¿ is the character at 0xBF in ISO 8859-1.

My guess is that, in the case of an 8-bit transfer encoding, Gmail naively passes through the character and it gets misencoded by some computer when processing for display. I haven't done any thorough test cases, but it appears that Gmail uses UTF-8 for output in the HTML page and directly emits the character (as opposed to using character entities). So the most likely interpretation is one of Gmail's internal processing servers is at fault.

Scratch that, I did some more testing. Gmail is definitely the one at fault here; they're not handling 8-bit email correctly.

However, there is a simple workaround: send all of your messages in UTF-8. When this is done, even Gmail's incorrect handling of 8-bit mail won't affect the outcome of the message.
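
To make the workaround concrete, here is a rough sketch of what such a message looks like when built with Python's standard `email.message.EmailMessage`. This only illustrates the headers involved; it is not Thunderbird's code, and the address is the placeholder used elsewhere in this bug.

```python
from email.message import EmailMessage

text = "hello, ¿your ticket recibió?"

msg = EmailMessage()
msg["From"] = "example@gmail.com"
msg["To"] = "example@gmail.com"
msg["Subject"] = "¿Ticket recibió?"
# Label the body as UTF-8 and keep the 8bit transfer encoding.
msg.set_content(text, charset="utf-8", cte="8bit")

charset = msg.get_content_charset()
cte = str(msg["Content-Transfer-Encoding"])
round_trip = msg.get_content().strip()

print(charset, cte)
print(round_trip)
```

With this labelling the body is still 8bit, but the bytes are UTF-8, which per the comment above is the case Gmail handles correctly.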
(In reply to Bastard from comment #13)
> Message sent with Thunderbird 7.0.1 (viewed from Gmail -> show original),
> using to view it Mozilla Firefox 7.0.1:
> 
> 
> Return-Path: <example@gmail.com>
> Received: from [127.0.0.1] (IP)
>         by mx.google.com with ESMTPS id id.0.2011.10.11.06.24.06
>         (version=SSLv3 cipher=OTHER);
>         Tue, 11 Oct 2011 06:24:07 -0700 (PDT)
> Message-ID: <M.ID@gmail.com>
> Date: Tue, 11 Oct 2011 15:24:07 +0200
> From: Me <example@gmail.com>
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20110929
> Thunderbird/7.0.1
> MIME-Version: 1.0
> To: example@gmail.com
> Subject: =?ISO-8859-1?Q?=BFTicket_recibi=F3=3F?=
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Content-Transfer-Encoding: 8bit
> 
> hello, �your ticket recibi�?

I've noticed that this � glyph appears because Firefox uses the UTF-8 charset by default for this page. After changing it via Menu -> Web Developer -> Character Encoding -> ISO-8859-1 / Windows-1252, it shows all glyphs properly (hello, ¿your ticket recibió?).
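
The same effect can be reproduced outside the browser. A minimal sketch, assuming the "Show original" page serves the raw ISO-8859-1 bytes unchanged: reading them as UTF-8 gives U+FFFD replacement characters, while reading them as ISO-8859-1 gives the intended text.

```python
# Raw body bytes as sent by Tb (charset=ISO-8859-1, 8bit).
raw = "hello, ¿your ticket recibió?".encode("iso-8859-1")

# 0xBF and 0xF3 are not valid UTF-8 on their own, so a UTF-8 reader
# substitutes U+FFFD (the replacement character) for each of them.
as_utf8 = raw.decode("utf-8", errors="replace")
as_latin1 = raw.decode("iso-8859-1")

print(as_utf8)
print(as_latin1)
```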

So, how to continue here? How to report it to Google?
I've already sent the bug report to Google.

I've done some research with Tb and yahoo.com accounts, sending an email with Tb via POP (IMAP now supported for Yahoo), and with Yahoo it is shown properly. See the next attachment.
For crafted mail with the following Subject header (glyphs of iso-8859-1), uploaded by Tb 7.0.1 via Gmail IMAP:
> Subject: ¿Ticket recibió?
Via Gmail IMAP, by Tb, Thread pane display(upload by copy, then Repair Folder);
  �Ticket recibi�?
  (as you saw in bug 686519, phenomenon of bug 513472)
Via Gmail, by SeaMonkey, with Language: Gmail display language: English(US);
  ¿Ticket recibió?

Gmail sends the page to the browser in UTF-8.
Gmail probably converts the "raw 8-bit 0xBF of unknown charset" to UTF-8 when sending the HTML data to the browser.
When converting, Gmail possibly converts the raw 8-bit binary from "the charset corresponding to the user's display language choice" to UTF-8.

Bastard, what language do you select in the Mail settings of Gmail?
> Language: Gmail display language: ?

Do you see your problem even on the following mails sent by Tb or The Bat!
(mail with correct encoding of the Subject and correct charset in Content-Type)?
Or only on mail with "no charset in Content-Type:"?
>(Sent by Tb)
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
> Subject: =?ISO-8859-1?Q?=BFTicket_recibi=F3=3F?=
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Content-Transfer-Encoding: 8bit
> 
> hello, ¿your ticket recibió?
>(Sent by The Bat!)
> Subject: =?iso-8859-15?Q?=BFticket_recibi=F3=3F?=
> Content-Type: text/plain; charset=iso-8859-15
> Content-Transfer-Encoding: quoted-printable
>
> Hello, =BFyour ticket recibi=F3?
(In reply to Bastard from comment #21)
> So, how to continue here? How to report it to Google?

It seems by now certain that the fault is with Google, and nothing to do with TB, since everything here is happening according to spec.

This does, however, beg the question as to what could or should be done by Thunderbird. The options, as I see them, are:
1. Ignore the issue.
2. Stop sending out 8-bit MIME messages anywhere, and always use either quoted-printable or base64.
3. Switch the default to UTF-8.
4. Only allow emitting 8-bit MIME messages if the charset is UTF-8.

Thoughts, bienvenu?
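
To make option 4 concrete, here is a rough sketch of the decision it implies. This is a hypothetical policy function for discussion, not existing Thunderbird code; `choose_cte` and its return values are my own naming.

```python
import codecs

def choose_cte(text, charset):
    """Pick a Content-Transfer-Encoding under option 4 (hypothetical policy).

    - pure ASCII payload       -> 7bit
    - non-ASCII, charset UTF-8 -> 8bit
    - non-ASCII, other charset -> quoted-printable
    """
    payload = text.encode(charset)
    if all(byte < 0x80 for byte in payload):
        return "7bit"
    # codecs.lookup normalizes spellings such as "UTF-8" or "utf8".
    if codecs.lookup(charset).name == "utf-8":
        return "8bit"
    return "quoted-printable"

body = "hello, ¿your ticket recibió?"
print(choose_cte("hello", "ISO-8859-1"))
print(choose_cte(body, "UTF-8"))
print(choose_cte(body, "ISO-8859-1"))
```

Option 2 would be the same function with the UTF-8 branch removed.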
To find out what happens, the following are required.
(1) Mail source held by Gmail.
    View the mail by Tb via Gmail IMAP, save as .eml.
(2) Information about page/iframe sent by Gmail to browser.
(2-1) charset of Page, by context menu.
    Cursor at any place of page, View Page Info,
      Encoding: ?
      Content-Type exists in meta data? If yes, utf-8?
(2-2) HTML source used by browser.
    By Firefox, select next part at shown page by Gmail,
      hello, żyour ticket recibió?
    then "View Selection Source" of context menu.
    HTML like next will be shown.
      <div id=":gy">hello, żyour ticket recibió?<br>
      </div>
(2-3) HTTP headers for HTML source of iframe
   With "hello, żyour ticket recibió?" selected, This Frame/View Frame Info,
   General, Address.
     Example: https://mail.google.com/mail/?ui=2&view=bsp&ver=ohhl4rw8mbn4
   Disk Cache entry for the Address.
     about:cache?device=disk, find the Address, click link.
       example:  about:cache-entry?client=HTTP&sb=1&key=
                 https://mail.google.com/mail/?ui=2&view=bsp&ver=ohhl4rw8mbn4
     response-head: HTTP/1.1 200 OK Content-Type: text/html; charset=UTF-8(snip)

If the raw 8-bit byte 0xBF is sent from Gmail to Firefox without charset information, the glyph shown in the iframe depends on the Firefox user's View/Character Encoding choice. Without character set information, no one can know whether the 0xBF is "0xBF of ISO-8859-1" or "0xBF of ISO-8859-2". For Auto-Detect (charset guessing) to work effectively, the HTML text must contain a byte sequence that exists in one charset but not in the other.

However, with SeaMonkey and Gmail I couldn't see "żticket recibió?" for "0xBF of unknown charset in mail", even with View/Character Encoding=ISO-8859-2. It is always shown as "¿Ticket recibió?".
I don't know whether View/Character Encoding is applied to all subsequent iframes or not. Because Google heavily utilizes JavaScript, it's very hard (nearly impossible for a user) to know what happens in the page after the HTTP GET.
I'm still guessing this is a phenomenon related to Gmail's Display Language, because the HTML for the iframe is sent by Gmail with Content-Type: text/html; charset=UTF-8.
>FYI. Content of it.
> <!DOCTYPE html><html><head></head><body><div></div></body></html>
(In reply to WADA from comment #26)
> To know what happens, next are required.

WADA, I've already debugged the problem. The problem is that Gmail does not properly transcode 8-bit MIME messages when displaying them if the source charset is not UTF-8. It is not a problem of the page displaying it incorrectly, it is a problem of the website emitting the wrong character. It is something that exists 100% on Gmail's end; the only open question now is if Thunderbird should take steps to avoid creating such problematic messages.
(In reply to Joshua Cranmer [:jcranmer] from comment #27)
> WADA, I've already debugged the problem. The problem is that Gmail does not
> properly transcode 8-bit MIME messages when displaying them if the source
> charset is not UTF-8.

We already know of such a problem in Gmail IMAP with "raw 8-bit data, 0xBF in this bug, of unknown charset in a not-encoded Subject:" from bug 513472. In that case, the binary is converted to U+FFFD by Gmail, and is passed to Tb by Gmail IMAP as the utf-8 binary for U+FFFD.
This phenomenon never occurs on a correctly encoded Subject: (i.e. when charset information is correctly specified by the mail sender).
This phenomenon never occurs if the "binary of unknown charset in Subject:" is utf-8 binary.
So we guessed that Gmail/Gmail IMAP assumes utf-8 for binary of unknown charset in some circumstances.

In contrast, the phenomenon of "glyph ż for message body text instead of glyph ¿" appears to occur when the message body is shown via the Gmail Web interface.
And it doesn't appear to occur on "not-encoded Subject: even with 8-bit binary" in the bug opener's case.
Further, "whether the problem occurs even when the charset of the message body text is correctly specified in the Content-Type: header or not" is still unclear from the reporter's comments.
   
> It is not a problem of the page displaying it incorrectly, it is a problem of the website emitting the wrong character.
> It is something that exists 100% on Gmail's end;

I'm not suspecting a "display problem of Firefox". I tried to find a workaround in Firefox if possible. That is because, if the raw 8-bit binary is sent as-is from Gmail to Firefox in the HTML text data, Firefox may be able to show the glyph the bug opener wants via a View/Character Encoding change.

My questions were meant to find out:
(A) Whether the problem which the bug opener saw occurs even when Subject: is
    correctly encoded and the charset of the message body is correctly
    specified in Content-Type, or not.
(B) Whether Gmail sends the original raw 0xBF to the browser, or always sends
    utf-8 binary (converted from 0xBF), as the HTML data passed to the browser.

If it is always converted to utf-8 before being sent to the browser, I can't imagine why the original 0xBF of unknown charset is converted to "ż" in the bug opener's environment but always converted to "¿" in my environment.
If it is always converted to utf-8 before being sent to the browser, I can't imagine why the original 0xBF of known charset is not converted to utf-8 correctly by Gmail.

Joshua Cranmer, can you answer my questions above?

> the only open question now is if Thunderbird should take steps to avoid creating such problematic messages.

Because Tb never generates a "not-encoded Subject even though non-7-bit-ASCII data exists", and because Tb always specifies the charset in the Content-Type: header, I thought your concern about Tb and your question to David imply that Gmail's problem occurs even when the charset is correctly specified by the mail sender, if the specified charset is not utf-8.
Needless to say, Tb may generate an incorrect mail data stream because of some special bug in Tb, but that is very exceptional and such bugs will be resolved by the developers.

Another question.
> WADA, I've already debugged the problem. The problem is that Gmail does not
> properly transcode 8-bit MIME messages when displaying them if the source
> charset is not UTF-8.

For 0xBF of unknown charset, what is the proper transcoding to utf-8?
If Gmail's bug is "not converting properly when the charset is unknown", how can Gmail determine whether "0xBF of unknown charset" is "0xBF of iso-8859-1" or "0xBF of iso-8859-2"?
Does Gmail fail to convert to utf-8 in some circumstances even when the charset is correctly specified in the encoded Subject: or in Content-Type: by the mail sender?
As you called it Gmail's bug, I thought Gmail's problem occurs even when the charset is correctly specified, if it's not utf-8.
Is that wrong?
Or do your "such problematic messages" include mail like the following?
  Content-Type: text/...; charset=xxx
    where xxx is a charset with 8-bit code points, other than UTF-8,
  Content-Transfer-Encoding: 8bit,
  with 8-bit binary in the message body
If so, Tb usually generates such mail, according to the user's charset choice for mail composition.

If so, is there no problem when the mail is sent in Quoted-Printable or Base64?
(No 8-bit data in the mail data stream, although 8-bit data appears after decoding)
If so, is there no problem with charset=utf-8, even when 8-bit binary exists in the data stream?
(In reply to Joshua Cranmer [:jcranmer] from comment #25)
> (In reply to Bastard from comment #21)
> > So, how to continue here? How to report it to Google?
> 
> It seems by now certain that the fault is with Google, and nothing to do
> with TB, since everything here is happening according to spec.
> 
> This does, however, beg the question as to what could or should be done by
> Thunderbird. The options, as I see them, are:
> 1. Ignore the issue.
> 2. Stop sending out 8-bit MIME messages anywhere, and always use either
> quoted-printable or base64.
> 3. Switch the default to UTF-8.
> 4. Only allow emitting 8-bit MIME messages if the charset is UTF-8.
> 
> Thoughts, bienvenu?

Are we the only mail client sending out non utf-8 8-bit mime messages? I.e., is gmail likely to fix this on their end? I'd lean to 3 or 4, and between those two, probably 4.
My biggest trepidation is that I recall hearing that Japan's mobile phones often lack support for UTF-8, which makes me leery of options 3 or 4.

I'm trying to see if I can track down people with other mail clients to see if they are using 8-bit MIME...
> Bastard, what language do you select at Mail settings of Gmaii?
> > Language: Gmail display language: ?
I've tried it in English and Spanish, and both the same.
(In reply to Bastard from comment #32)
> > Bastard, what language do you select at Mail settings of Gmaii?
> > > Language: Gmail display language: ?
> I've tried it in English and Spanish, and both the same.

I could finally reproduce your problem with exactly the same message body text as yours, with Display Language: English(US).
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Content-Transfer-Encoding: 8bit
> 
> hello, ¿your ticket recibió?
The reason I had been failing to reproduce the problem was:
  I tested with a message body text of "¿Ticket recibió?" (no quotes), which
  is the same string as the Subject:, in order to see the same string in the
  Thread pane, Message header pane, and Message pane.
Tb sends mail text in quoted-printable if mail.strictly_mime=true is set in prefs.
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Content-Transfer-Encoding: quoted-printable
>
> hello, =BFyour ticket recibi=F3?
So, if you are the mail sender, setting mail.strictly_mime=true avoids Gmail's problem, as Joshua Cranmer says.
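
For illustration, the transformation that the quoted-printable setting applies to the body can be reproduced with Python's standard `quopri` module. This is only a sketch of the encoding itself, not of Tb's implementation.

```python
import quopri

# Body bytes in the charset chosen for composition (ISO-8859-1).
raw = "hello, ¿your ticket recibió?".encode("iso-8859-1")

# Quoted-printable turns each 8-bit byte into "=XX", leaving pure ASCII.
encoded = quopri.encodestring(raw)
is_7bit = all(byte < 0x80 for byte in encoded)

# Decoding restores exactly the original bytes.
restored = quopri.decodestring(encoded)

print(encoded)
print(is_7bit, restored == raw)
```

Since no 8-bit bytes remain in the transmitted body, this is the case where, per the Gmail team's later comment, the stated charset is honored.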
Whiteboard: [Gmail's web end bug]
If there is no encoding in the Subject: header, the phenomenon is observed on the Subject: too.
Message source.
> Subject: hello, ¿your ticket recibió?
> Content-Type: text/plain
> Content-Transfer-Encoding: 8bit
>
> hello, ¿your ticket recibió?
Gmail's mail list of a Gmail folder(Gmail label, single mail per conversation).
> hello, żyour ticket recibió?‎ - hello, żyour ticket recibió
Subject shown by Show Detail of mail display.
> subject	hello, żyour ticket recibió?

I can't call this phenomenon on malformed mail "Gmail's bug".
It is apparently a bug on the mail sender's side.
If it is a problem of Gmail, I think the problem in this case is merely that the user has no way to see the wanted glyph, "¿" (when "ż" is shown) or "ż" (when "¿" is shown), for the original 0xBF of unknown charset. But is that a mandatory feature of a webmail system?
(In reply to WADA from comment #34)
> If it's problem of Gmail, I think problem in this case is merely that there
> is no chance for user to see wanting glyph of "¿"(when "ż" is shown) or
> "ż"(when "¿" is shown) for original 0xBF of unknown charset. But is it
> mandatory feature of Web Mail system?

Yes, there is a way: go to the problematic email via the web interface -> drop-down menu -> Show original. You will see the problematic glyphs because web browsers display the page as UTF-8 by default. Change the character encoding to ISO-8859-1 in your web browser (tools menu -> tools -> encoding in Chrome, and similar in Firefox). You will see the correct characters.

But from the webmail interface itself it doesn't work.
Gmail team here

The issue is indeed on our side, though I'm not sure what we can do to fix it.  We deliberately ignore a stated charset of iso-8859-1 on messages which are not also encoded into base64 or quoted-printable, and instead rely on our automatic converters to pick the correct encoding.

The reason is, we see a lot of mail which says iso-8859-1 when it isn't, especially when these messages are 8bit encoded (or more usually, the encoding isn't specified at all).

I'm guessing our auto-detectors aren't correctly identifying the charset for these messages, possibly because they're very short and hence hard to identify, or possibly due to other biases in them.  I would think that "¿" would be more likely than "ż", and so iso-8859-1 would be detected, but I guess not.

As a workaround, I would recommend not sending iso-8859-1 messages in 8bit encoding... or be fine with Gmail getting it wrong sometimes.  I can file a bug internally against the auto-detector, but I'm not sure when or if it would be fixed, and in any case, it is in the nature of such auto-detection to be wrong occasionally.
I think auto-detection should be only used in two cases:

1) when no encoding is specified, or
2) when an encoding is specified but turns out to be invalid under some trial-and-error validation, for example byte values that are invalid for that charset (the message says iso-8859-1 when it isn't).

Neither is the case in this bug, because Western is specified.
Yes, this is a less than great situation, all of our decoding goes through auto-detection, usually with the charset as a suggestion.  Unfortunately, the auto-detector with the hint of the charset sided with the charset in too many incorrect cases, and usually with much worse results than one or two wrong characters.  Instead, we'd end up with posts in Chinese that were completely unreadable.

So, we had the choice of two bad situations, and the fix was for the "more bad" case.

I'm not an expert in all of the encodings involved or in the auto-detector, so I don't know whether it would be possible to do a better job.  I do know the auto-detector is trained on web pages, which is a different corpus with larger text sizes, so it's not surprising to me that with only a single character to choose between a bunch of possible charsets we would be wrong.
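
To illustrate how little a detector has to work with here, a small sketch (not Gmail's detector, just Python's built-in codecs, with a hypothetical helper `decode_candidates`) decodes the body bytes from this bug under two candidate charsets. Both decodes succeed, and the results differ in exactly one character.

```python
# Body bytes as sent by Tb with charset=ISO-8859-1 and 8bit encoding.
raw = "hello, ¿your ticket recibió?".encode("iso-8859-1")

def decode_candidates(data, charsets=("iso-8859-1", "iso-8859-2")):
    """Return {charset: text} for every charset that decodes without error."""
    results = {}
    for name in charsets:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            pass  # this charset cannot represent the byte sequence
    return results

candidates = decode_candidates(raw)
western = candidates["iso-8859-1"]
central = candidates["iso-8859-2"]

# Positions where the two readings disagree.
differing = [i for i, (a, b) in enumerate(zip(western, central)) if a != b]

print(western)
print(central)
print(differing)
```

The bytes alone are valid in both charsets, so only the declared charset or a statistical guess can decide between them.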
So, could you submit a bug to google mail?
Severity: normal → S3