Closed
Bug 41706
Opened 25 years ago
Closed 24 years ago
Unicode messages that are quoted printable parts have "ðž" at beginning
Categories
(MailNews Core :: MIME, defect, P3)
Tracking
(Not tracked)
RESOLVED
INVALID
Future
People
(Reporter: pajs_1, Assigned: rhp)
Details
This bug is also in Netscape 4.7. Basically, if someone posts a MIME posting to
a newsgroup, you get a ÿþ< in the message, and if you click reply, that is all
of the message. In Netscape 4.7, you never even saw any of the posting other
than that.
This happens if the header has anything like this in it :
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_NextPart_000_004F_01BFD000.1228BBA0"
X-Priority: 3
(I would like to say it only happens if the header has X-Newsreader: Microsoft
Outlook Express 5.00.2919.6600 in it :-) .. pity)
Assignee
Comment 1•25 years ago
Can you point to an example...that would help.
- rhp
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Target Milestone: --- → M19
Reporter
Comment 2•25 years ago
Most of the postings this problem occurs in are in the newsgroup 0000000076, which
is not visible outside my ISP, I'm afraid. But I will put an offending message
under this.
In netscape 4.7 I see the message as "
ÿþ<
"
In Mozilla M15 , I see "
ÿþ
That's the chappie...cheers Tarz
NUKE
"
(Without the speech marks).
If I hit reply in either version I just get
"
"NUKE" wrote:
> ÿþ<
"
Page source below.
From: "NUKE" <me@here.com>
Newsgroups: 0000000076
References: <393ce7f2@news.server.worldonline.co.uk>
<393ceeae@news.server.worldonline.co.uk>
<393d44e8@news.server.worldonline.co.uk>
Subject: Re: YoYo
Date: Tue, 6 Jun 2000 20:33:27 +0100
Lines: 58
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_NextPart_000_0008_01BFCFF6.788A57A0"
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 5.00.2615.200
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200
NNTP-Posting-Host: 212.49.241.182
X-Original-NNTP-Posting-Host: 212.49.241.182
Message-ID: <393d5453@news.server.worldonline.co.uk>
X-Trace: 6 Jun 2000 19:43:15 GMT, 212.49.241.182
Path: news.server.worldonline.co.uk!212.49.241.182
Xref: news.server.worldonline.co.uk 0000000076:4498
This is a multi-part message in MIME format.
------=_NextPart_000_0008_01BFCFF6.788A57A0
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
That's the chappie...cheers Tarz
NUKE
------=_NextPart_000_0008_01BFCFF6.788A57A0
Content-Type: text/html;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
=FF=FE<=00!=00D=00O=00C=00T=00Y=00P=00E=00 =00H=00T=00M=00L=00 =
=00P=00U=00B=00L=00I=00C=00 =
=00"=00-=00/=00/=00W=003=00C=00/=00/=00D=00T=00D=00 =00H=00T=00M=00L=00 =
=004=00.=000=00 =
=00T=00r=00a=00n=00s=00i=00t=00i=00o=00n=00a=00l=00/=00/=00E=00N=00"=00>=00=
=0D=00=0A=
=00<=00H=00T=00M=00L=00>=00<=00H=00E=00A=00D=00>=00=0D=00=0A=
=00<=00M=00E=00T=00A=00 =
=00c=00o=00n=00t=00e=00n=00t=00=3D=00"=00t=00e=00x=00t=00/=00h=00t=00m=00=
l=00;=00 =
=00c=00h=00a=00r=00s=00e=00t=00=3D=00u=00n=00i=00c=00o=00d=00e=00"=00 =
=00h=00t=00t=00p=00-=00e=00q=00u=00i=00v=00=3D=00C=00o=00n=00t=00e=00n=00=
t=00-=00T=00y=00p=00e=00>=00=0D=00=0A=
=00<=00M=00E=00T=00A=00 =
=00c=00o=00n=00t=00e=00n=00t=00=3D=00"=00M=00S=00H=00T=00M=00L=00 =
=005=00.=000=000=00.=002=006=001=004=00.=003=005=000=000=00"=00 =
=00n=00a=00m=00e=00=3D=00G=00E=00N=00E=00R=00A=00T=00O=00R=00>=00=0D=00=0A=
=00<=00S=00T=00Y=00L=00E=00>=00<=00/=00S=00T=00Y=00L=00E=00>=00=0D=00=0A=
=00<=00/=00H=00E=00A=00D=00>=00=0D=00=0A=
=00<=00B=00O=00D=00Y=00 =
=00b=00g=00C=00o=00l=00o=00r=00=3D=00#=00f=00f=00f=00f=00f=00f=00>=00=0D=00=0A=
=00<=00D=00I=00V=00>=00<=00F=00O=00N=00T=00 =
=00f=00a=00c=00e=00=3D=00"=00C=00o=00m=00i=00c=00 =00S=00a=00n=00s=00 =
=00M=00S=00"=00 =
=00s=00i=00z=00e=00=3D=002=00>=00<=00S=00T=00R=00O=00N=00G=00>=00T=00h=00=
a=00t=00'=00s=00 =00t=00h=00e=00 =
=00c=00h=00a=00p=00p=00i=00e=00.=00.=00.=00c=00h=00e=00e=00r=00s=00 =
=00=0D=00=0A=
=00T=00a=00r=00z=00<=00/=00S=00T=00R=00O=00N=00G=00>=00<=00/=00F=00O=00N=00=
T=00>=00<=00/=00D=00I=00V=00>=00=0D=00=0A=
=00<=00D=00I=00V=00>=00&=00n=00b=00s=00p=00;=00<=00/=00D=00I=00V=00>=00=0D=
=00=0A=
=00<=00D=00I=00V=00>=00<=00F=00O=00N=00T=00 =
=00f=00a=00c=00e=00=3D=00"=00C=00o=00m=00i=00c=00 =00S=00a=00n=00s=00 =
=00M=00S=00"=00 =00=0D=00=0A=
=00s=00i=00z=00e=00=3D=002=00>=00<=00S=00T=00R=00O=00N=00G=00>=00N=00U=00=
K=00E=00<=00/=00S=00T=00R=00O=00N=00G=00>=00<=00/=00F=00O=00N=00T=00>=00<=
=00/=00D=00I=00V=00>=00<=00/=00B=00O=00D=00Y=00>=00<=00/=00H=00T=00M=00L=00=
>=00=0D=00=0A=
=00
------=_NextPart_000_0008_01BFCFF6.788A57A0--
There are hundreds of postings this occurs on if you need more feedback.
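For anyone following along, here is a minimal sketch of what the text/html part above actually contains once the quoted-printable layer is removed. It uses Python's standard `quopri` module on the first two lines of the part, plus the `=00` that opens the third line so the last character is complete. The variable names are only for illustration.

```python
import quopri

# First two lines of the text/html part, plus the leading "=00" of the
# third line so the final space character has its high byte.
qp_sample = (
    b"=FF=FE<=00!=00D=00O=00C=00T=00Y=00P=00E=00 =00H=00T=00M=00L=00 =\n"
    b"=00P=00U=00B=00L=00I=00C=00 =\n"
    b"=00"
)

# Undo the quoted-printable transfer encoding; soft line breaks ("=\n") vanish.
raw = quopri.decodestring(qp_sample)

# The payload starts with FF FE and every ASCII character is followed by NUL.
print(raw[:8])

# Decoded as UTF-16 (the BOM selects little-endian and is consumed),
# the bytes are ordinary HTML.
print(raw.decode("utf-16"))
```

So the part is labeled charset="iso-8859-1" but carries little-endian UTF-16 data.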
Assignee
Updated•25 years ago
Keywords: correctness,
nsbeta3
Assignee
Updated•24 years ago
Summary: mime messages in news appear as ÿþ< → Quoted printable messages have "ÿþ" at beginning
Assignee
Comment 3•24 years ago
Ok, the problem here is that the HTML part that is in quoted printable form is
actually in Unicode. libmime doesn't handle that very well. I'll have to see
what I can do here.
- rhp
Summary: Quoted printable messages have "ÿþ" at beginning → Unicode messages that are quoted printable parts have "ÿþ" at beginning
Assignee
Updated•24 years ago
Target Milestone: M19 → M18
Assignee
Comment 4•24 years ago
Hi Naoki,
I was wondering if there are any other bugs you've looked at that are similar
to this and if any of our overrides would work for this. Basically, this is an
HTML attachment that is in quoted printable encoding, and that HTML doc has a
charset=unicode META tag.
I've tried hacking around with it, but without much luck. Any ideas or
insights?
- rhp
Assignee
Comment 5•24 years ago
Kat,
Wondering if you had any ideas on this one either?
- rhp
Comment 6•24 years ago
Rich, this is my guess but the problem is something like this:
1. Someone uses OE5 to compose a rich text message. He/she uses the
Comic Sans font and types in text under ISO-8859-1.
2. For some reason (perhaps instead of typing in this text, he/she
copied it from a text file), a Byte Order Mark (BOM) u\FFFE gets into
this message body. Since MS text files can contain a BOM,
this is not unusual.
3. So this message goes out. It is really just a plain English message
without even any 8-bit characters, but because of the spurious
BOM at the beginning of the text, the QP mechanism kicks in and
QP-encodes the whole thing.
4. My guess is that we don't handle a BOM well in mail messages
and choke on this when we try to quote it.
"ÿþ<" would be simply FFFE + the beginning of an HTML file, which goes
like: <!DOCTYPE HTML ....
Recently bobj talked about supporting BOM in reading UTF-8
text files. Maybe he has some more idea about this.
I am curious as to how this type of BOM gets into otherwise
simple ASCII HTML file. My suspicion is copy/paste operation
which may leave something like this without the user
ever knowing about it.
I would have Naoki look at this definitely.
Comment 7•24 years ago
There is a factual error above. The BOM we find in this message is
the code point U+FEFF rather than U+FFFE; it shows up as the bytes
FF FE because the data is little-endian. So in the above comment,
substitute FEFF wherever I say "FFFE".
To add a bit more, Windows 2000/NT stores files in
Unicode. I understand that they add a little-endian BOM,
U+FEFF, to saved text files.
Comment 8•24 years ago
There is one more peculiar thing going on with this test message.
In fact I tried MS OE5 with the same type of message structure
and Mozilla has the same problem with every one of these messages
I created with OE5.
The problem is that we are not reading the Content charset correctly
with the multipart/alternative type messages produced by OE5.
My test messages contained no BOM (FEFF) and so they are
pure ISO-8859 (actually only ASCII) msgs.
In fact the test message contained here shows the same problem.
Just try this:
1. Set the Default Message View encoding (Edit | Prefs | Mail & Newsgroup | Languages | "Character Coding") to
something like Turkish.
2. Now display a message other than the test message and then come back to the test message.
3. Check View | Character Coding to see what the encoding is. It will say "Turkish".
4. Go back to step 1 and change the encoding to something else. Repeat steps 2 and 3; the value of
the encoding changes.
Steps 1-4 indicate that we are not picking up the charset parameter in the test message at all and
are falling back on the default viewing charset. If you reply, that will get engaged.
So we don't seem to be able to tell what the charset of this type of
multipart/alternative message is. All my other test messages of multipart/alternative
type from OE5 show the same problem. They all contain charset header info, however.
My test messages don't have the quotep problem, though.
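To make "the charset parameter" concrete, here is a small sketch that parses a message with the same multipart/alternative layout as the test message and lists the charset declared on each part. It uses Python's standard `email` package, not libmime, and the HTML body is a plain ASCII stand-in (an assumption for illustration), so it only shows where the information lives.

```python
from email import message_from_string

# Same header layout as the test message; the HTML body is an ASCII stand-in.
raw_message = (
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/alternative;\n"
    '\tboundary="----=_NextPart_000_0008_01BFCFF6.788A57A0"\n'
    "\n"
    "This is a multi-part message in MIME format.\n"
    "\n"
    "------=_NextPart_000_0008_01BFCFF6.788A57A0\n"
    "Content-Type: text/plain;\n"
    '\tcharset="iso-8859-1"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "That's the chappie...cheers Tarz\n"
    "\n"
    "------=_NextPart_000_0008_01BFCFF6.788A57A0\n"
    "Content-Type: text/html;\n"
    '\tcharset="iso-8859-1"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "<HTML><BODY>That's the chappie...cheers Tarz</BODY></HTML>\n"
    "\n"
    "------=_NextPart_000_0008_01BFCFF6.788A57A0--\n"
)

msg = message_from_string(raw_message)

# The container has no charset of its own; each leaf part declares one.
charsets = [(part.get_content_type(), part.get_content_charset())
            for part in msg.walk()]
for content_type, charset in charsets:
    print(content_type, charset)
```

The charset sits on each sub-part's Content-Type header rather than on the top-level multipart header, which is where a reader has to look for it.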
Summary: Unicode messages that are quoted printable parts have "ÿþ" at beginning → Unicode messages that are quoted printable parts have "ðž" at beginning
Comment 9•24 years ago
One more fact: we don't seem to have a problem with multipart/alternative msgs
from Communicator as far as picking up the charset info is concerned.
Comment 10•24 years ago
I tried a few more times to create a mail message with OE5 so that it contained nothing but
ASCII HTML text but still encoded in Unicode and QP-ed (with all the "=00" in front of
the ASCII characters) with a BOM at the beginning. I haven't been successful so far.
The original filer of this bug says that there are a bunch of messages like that. I'm curious as
to how they get created.
Comment 11•24 years ago
I think this is a send problem. Since the message is labeled as
charset="iso-8859-1", "=FF=FE" is displayed as "ÿþ".
Sender should not include BOM when it's sending as "ISO-8859-1".
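A quick way to check this point, sketched in Python: the first three bytes of the decoded HTML part, interpreted under the declared charset, give exactly the characters the reporter sees.

```python
# First three bytes of the HTML part after quoted-printable decoding.
payload_start = bytes([0xFF, 0xFE, 0x3C])

# Interpreted under the declared charset, each byte maps to one character.
shown = payload_start.decode("iso-8859-1")
print(shown)
```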
Assignee
Comment 12•24 years ago
So are we thinking this is an invalid formatted message? If so, I'll mark
invalid.
- rhp
Comment 13•24 years ago
Yes, since ISO-8859-1 includes 0xFE and 0xFF as valid code points, we cannot
simply ignore them when displaying the message.
Assignee
Comment 14•24 years ago
Naoki,
Well, I go back to my original question. Do you have any ideas on handling
this. I'm in Mountain View today so maybe we can talk. I tried a few hacks, but
it didn't seem to work...maybe I'm messing something up, but we can discuss.
- rhp
Comment 15•24 years ago
I think we need to know how common this type of mail is.
If we do a hack, we need to make sure it won't affect correct MIME mail or
performance.
Assignee
Comment 16•24 years ago
Very much agreed. Let me continue down my hacking road and if I get a workable
solution, I'll let you know.
- rhp
Comment 17•24 years ago
From the previous comments, it looks like we are handling it correctly as
nhotta commented. If a QP'd message is labeled iso-8859-1, then the correct
interpretation of "=FF=FE" is "ÿþ". So by "fixing" this we would break any
correct email that starts with "=FF=FE".
But since this is very unlikely, we have a choice:
- leave it, and tell users to file bugs against OE
- put a special hack (maybe only for email from OE) to ignore it
What happens if the OE email contains non-ASCII? Are they sent encoded in
ISO-8859-1 in this case?
What doesn't make sense is that if there is a BOM (FE FF or FF FE), then the
data should be in UTF-16 and all "ASCII" characters should have a leading null
byte. But that does not seem to be the case from the previous comments,
otherwise you'd have seen something like a square box before every character.
Kat,
For UTF-8 browsing, the bug about handling UTF-8 "BOM"s (I don't remember
the bug #) has been resolved as WORKSFORME. In the UTF-8 case, the "BOM"
is EF BB BF, not FE FF or FF FE. BOMs (Byte Order Marks) don't really
make sense for UTF-8 since it is a byte stream -- unless used as merely a
UTF-8 signature.
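For reference, the byte sequences mentioned here can be listed from the constants in Python's standard `codecs` module. This is only a lookup sketch; the labels in the dictionary are mine.

```python
import codecs

# Byte sequences that U+FEFF turns into under each encoding form.
boms = {
    "UTF-8 signature": codecs.BOM_UTF8,
    "UTF-16 big-endian": codecs.BOM_UTF16_BE,
    "UTF-16 little-endian": codecs.BOM_UTF16_LE,
}

for name, bom in boms.items():
    print(f"{name}: {bom.hex(' ').upper()}")

# All three are the same code point, U+FEFF, serialized differently.
print("\ufeff".encode("utf-16-le") == codecs.BOM_UTF16_LE)
```

The test message starts with FF FE, the little-endian form.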
Comment 18•24 years ago
bobj comments:
> What doesn't make sense is that if there is a BOM (FE FF or FF FE), then the
> data should be in UTF-16 and all "ASCII" characters should have a leading null
> byte
You're right. The original test data contain leading null bytes for all ASCII characters.
And so this is UTF-16/UCS-2 data.
My question really is how this data can be generated and how often this happens.
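As a rough illustration of the "special hack" floated in comment 17, the sketch below overrides the declared charset only when the decoded part starts with a UTF-16 BOM and the following bytes have the NUL-in-every-other-position pattern of ASCII text in UTF-16. The function name `sniff_utf16` and the 64-byte sample window are hypothetical choices of mine; this is not libmime code and says nothing about the performance concern raised in comment 15.

```python
import codecs


def sniff_utf16(payload: bytes, declared: str) -> str:
    """Return the charset to decode with (hypothetical heuristic).

    Falls back to the declared charset unless the payload looks like
    BOM-prefixed UTF-16 carrying mostly ASCII text. It is deliberately
    narrow and does not try to tell UTF-32 apart.
    """
    candidates = (
        (codecs.BOM_UTF16_LE, 1),  # little-endian: NUL is the second byte
        (codecs.BOM_UTF16_BE, 0),  # big-endian: NUL is the first byte
    )
    for bom, nul_offset in candidates:
        if payload.startswith(bom):
            body = payload[len(bom):]
            # Look at the bytes where NULs would sit in the first 64 bytes.
            sample = body[nul_offset:64:2]
            if sample and all(byte == 0 for byte in sample):
                return "utf-16"
    return declared


# A part like the one in this bug: labeled iso-8859-1, really UTF-16.
bogus = "\ufeff<!DOCTYPE HTML>".encode("utf-16-le")
print(sniff_utf16(bogus, "iso-8859-1"))

# A correct iso-8859-1 part that happens to start with the same two bytes.
legit = "\u00ff\u00fe hello".encode("iso-8859-1")
print(sniff_utf16(legit, "iso-8859-1"))
```

Requiring the NUL pattern as well as the BOM is what keeps a correct ISO-8859-1 message that starts with "=FF=FE" from being misread.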
Assignee
Comment 19•24 years ago
If this is an edge case that is not that common, I would MUCH rather mark
invalid and move on than muck with libmime at this stage of the game.
- rhp
Comment 20•24 years ago
Agreed. I'll keep an eye on the frequency issue now that we know
this kind of data exist.
However, I still need to file a separate bug on "multipart/alternative" mail by OE5 and
us not recognizing the charset info. It happens without the Unicode mail and
is more general.
Comment 21•24 years ago
If none of us have any idea how prevalent this may be, then we could just
wait and see what PR2 feedback we get or generate. Let's gather some
data before deciding to punt or fix...
Assignee
Comment 22•24 years ago
Until this is more pervasive, I'm going to pull back on the beta3 nomination.
- rhp
Keywords: correctness,
nsbeta3
Target Milestone: M18 → M20
Reporter
Comment 23•24 years ago
I just thought I would say I only really get messages with this problem
from one or two posters, but all follow-ups to those posters' messages then get the
problem. At a guess, I wouldn't say it's too widespread a problem, because
Netscape 4 has the same problem and I'm the only person I know who's encountered
it.
Assignee
Comment 24•24 years ago
Due to problem happening relatively infrequently, going to future this one.
- rhp
Target Milestone: M20 → Future
Comment 25•24 years ago
Steve Elmer wrote:
Jaime,
Did your team look at this bug during the triage? It's not even nominated, so
we're thinking that means it should be
FUTUREd. Let me know what you think.
Thanks,
Steve
Comment 26•24 years ago
Adding Frank and myself to cc: list.
What's the status of this one???
Comment 27•24 years ago
From the comments in this bug, it seems to me that NS6 is handling this
correctly and that the sent mail is incorrectly encoded.
The question is "Whether this was a common OE problem for which we need to
provide a workaround?" From the data in this bug report, the answer seems to
be "no", that this is a unusual case of a bogus email.
Comment 28•24 years ago
Frank - Based on the comments in the bug and the low frequency of occurrence, I
vote to nsbeta3- or future this one for now.
Comment 29•24 years ago
I'd vote to RESOLVE this bug as INVALID.
The email is bogus.
Comment 30•24 years ago
Frank, do you have any objections to marking as invalid? If not, please mark it as
invalid and let's get it off the radar. On to bigger, badder bugs!
Comment 31•24 years ago
marking INVALID. reopen if you disagree.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → INVALID
Updated•20 years ago
Product: MailNews → Core
Updated•16 years ago
Product: Core → MailNews Core