Mozilla will hang when you try to post to a utf-8 newsgroup name

VERIFIED FIXED

Status

MailNews Core
Networking: NNTP
--
critical
VERIFIED FIXED
16 years ago
9 years ago

People

(Reporter: Henrik Gemal, Assigned: nhottanscp)

Tracking

({hang})

Trunk
x86
Windows 2000

Details

Attachments

(1 attachment)

(Reporter)

Description

16 years ago
I just tried to post to a newsgroup with a utf-8 name:
dk.test.utf8-æøå
and got an error:
441 Nonexistent newsgroup(s)

The reason is that the header Mozilla tries to send is this:

Newsgroups: =?ISO-8859-1?Q?dk=2Etest=2Eutf8=2D=E6=F8=E5?=

he, he.. it encodes the newsgroup name.
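
As a rough illustration (Python, not taken from the Mozilla source), the header value above can be decoded to see what it stands for, and compared with what a server would match against. It is an assumption here that the server stores the group name as raw UTF-8 bytes and compares byte for byte.

```python
from email.header import decode_header

# Header value quoted in the report above.
sent_value = "=?ISO-8859-1?Q?dk=2Etest=2Eutf8=2D=E6=F8=E5?="

# Assumption: the server knows the group under its raw UTF-8 bytes.
name = "dk.test.utf8-\u00e6\u00f8\u00e5"  # dk.test.utf8-æøå
server_bytes = name.encode("utf-8")

# Decoding the RFC 2047 encoded-word recovers the intended name...
[(raw, charset)] = decode_header(sent_value)
decoded_name = raw.decode(charset)

# ...but a byte-for-byte comparison only sees the literal "=?ISO-8859-1?Q?..." text.
wire_bytes = sent_value.encode("ascii")
server_accepts = wire_bytes == server_bytes
```

With these assumptions `decoded_name` is the intended group name while `server_accepts` is false, which is consistent with the 441 response.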

build 20011018

Comment 1

16 years ago
Did you check to see if the Pref item:

Edit | Prefs | Mail & Newsgroups | Message Composition |
Composing Message | 8-bit character MIME encoding 

is not checked, i.e. OFF?

Please try that setting and see whether posting works.
(Reporter)

Comment 2

16 years ago
it is OFF. I think this is a known bug....

Comment 3

16 years ago
dup of bug 91112?
This seems like a DUP to me.  Henrik, please verify...

*** This bug has been marked as a duplicate of 91112 ***
Status: NEW → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → DUPLICATE
(Reporter)

Comment 5

16 years ago
dupe = ok...
Status: RESOLVED → VERIFIED

Comment 6

16 years ago
Can this bug be reopened ?
There's a major regression on this.
Mozilla since 0.9.6, and up to 2002010703, now _hangs_ when trying to send, send 
later or save a message to such a group.
Checked on NT4, W2000, RH Linux with several different releases between 0.9.6 and 
2002010703. It wasn't hanging on some pre-0.9.6 releases.

Secondly :
I don't believe this bug should really be handled as a dup of 91112.

91112 is about correct handling of utf-8 encoded newsgroup names.
This request is about being able to post to such a group.

Just fixing Mozilla so that it's possible to post to a group with a non-ASCII 
name, without taking care of displaying it properly, is much simpler than 
handling such names fully and correctly.
This can be done as a first step to satisfy some users.

Comment 7

16 years ago
> Just fixing Mozilla so that it's possible to post to a group 
> with a non-ASCII value without taking care of displaying 
> them properly is much simpler than correctly handling them fully.

J-M, are you saying that it WAS possible at some point to post
to a UTF-8 named newsgroup? 

One way to handle this bug is to make it dependent on
Bug 91112 so that when that bug gets fixed, this bug should 
be fixed at the same time. Is that satisfactory to you? or
do you want the regression fixed now?

Re-opening for re-consideration.

Status: VERIFIED → REOPENED
Resolution: DUPLICATE → ---

Comment 8

16 years ago
> J-M, are you saying that it WAS possible at some point to post
> to a UTF-8 named newsgroup? 

What I mean is that before it _just_ didn't work, you were getting an error, and 
that now it _blocks_ Mozilla.

Now if you try to post to such a group, you end up with an unusable Mozilla 
that's eating 100% CPU, you've got to kill it, and you lose all data you might 
have entered in other windows.

I don't believe bug 91112 will be fixed fast, I don't see any reason for this to 
be a priority as long as USEFOR is only a draft, and not an RFC, and just fixing 
the sending to groups would be way simpler I think.

And Mozilla hanging and eating 100% CPU is something that should never happen, 
even in unsupported cases.

Comment 9

16 years ago
Sounds really bad Jean-Marc - raising priority.
Severity: normal → critical
Keywords: hang
Summary: impossible to post to newsgroup that have utf-8 name → Mozilla will hang when you try to post to a utf-8 newsgroup name

Comment 10

16 years ago
About the severity, I don't pretend there's a large number of users impacted,
there's only one newsgroup with such a name, it's not very widely distributed
and is used only for testing.

I've decided to go and try to see if I can find the origin of the problem, I
downloaded the mozilla source, and recompiled it.

I'm browsing through the source and trying to find what could go wrong, I've
already found suspicious bits.

mime_generate_headers (nsMsgCompUtils.cpp:568) calls
GenerateNewsHeaderValsForPosting to handle the newsgroups header. This is
done only for that header, but it is done in every case, "send",
"send later", "save", and the hang occurs in each of these cases.
So I suspect it might very well be hanging there.

Also I've seen :
compFields->SetNewsgroups(NS_LossyConvertUCS2toASCII(newgroups).get());
being called two times inside QuotingOutputStreamListener::OnStopRequest
(nsMsgCompose.cpp).

This is not consistent. 
When reading the newsgroup content from the server, we are able to ask it for a
newsgroup whose name has non-ASCII characters, but when posting we refuse to send
non-ASCII and do a lossy conversion.
The thing to do would be to insert in the newsgroup header exactly the same
value as what the news server gave. 
This way even if Mozilla wasn't able to interpret correctly what was sent, it's
still possible to post.

To do that, the best would be to keep the exact string that the news server has sent. 
The other option, the only one available here without a complex rewrite, is to
exactly invert the conversion that has created the UCS2 newgroups value from
what the news server has sent.
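
To make the "exactly invert the conversion" idea concrete, here is a small Python sketch (an illustration only, not Mozilla code). It assumes the server sends UTF-8 bytes and that the client widens them byte for byte, as an ISO-8859-1 decode does; the lossy path is illustrated with character replacement, which may differ from what NS_LossyConvertUCS2toASCII actually does.

```python
# Bytes as a server might send them (assumed here to be UTF-8).
server_bytes = "dk.test.utf8-\u00e6\u00f8\u00e5".encode("utf-8")

# ISO-8859-1 maps every byte 0x00-0xFF to exactly one code point, so decoding
# with it is reversible even when the real encoding is something else.
internal = server_bytes.decode("iso-8859-1")
round_tripped = internal.encode("iso-8859-1")

# A lossy conversion to ASCII cannot be inverted: every non-ASCII character
# is replaced and the original bytes are gone.
lossy = internal.encode("ascii", errors="replace")
```

Here `round_tripped` equals `server_bytes`, so the value can be posted back unchanged, while `lossy` no longer names the group.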

Also, I found this (nsMsgCompose.cpp:1839):

mHeaders->ExtractHeader(HEADER_NEWSGROUPS, PR_FALSE, getter_Copies(outCString));
if (outCString)
{
    mimeConverter->DecodeMimeHeader(outCString, newgroups, charset);
}

This is of no use: one thing that is certain is that the Newsgroups header will
never be MIME RFC2047 encoded. 
MIME encoding it would never work.

Comment 11

16 years ago
OK, I've found what is happening.

The code in nsMsgCompUtils.cpp really lacks coherence.
It both tries to MIME encode the Newsgroups header, and blindly assumes the value
is ASCII.

nsMsgCompFields::GetNewsgroups and nsMsgCompFields::SetNewsgroups just assume
Newsgroups is ASCII, and never check if it's really ASCII.
So while the other nsMsgCompFields::Getxxx functions are guaranteed to return valid
UTF-8, nsMsgCompFields::GetNewsgroups will just return the value that has been
set by nsMsgCompFields::SetNewsgroups, whatever it is.

nsMsgCompUtils.cpp:518 calls nsMsgI18NEncodeMimePartIIStr, giving it pNewsGrp as
its first argument.

pNewsGrp in our case is the return value of nsMsgCompFields::GetNewsgroups, and
is an ISO-8859-1 value.
nsMsgI18NEncodeMimePartIIStr hangs trying to convert pNewsGrp from UTF-8 to the
local encoding.

I have a small and stupid patch in hand. 
Stop calling nsMsgI18NEncodeMimePartIIStr for the Newsgroups header.
The Newsgroups header should never be MIME encoded.

This is not sufficient to be able to post in dk.test.utf8-æøå, but at least we
don't hang anymore.
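
The mismatch described above (ISO-8859-1 bytes handed to a routine that expects UTF-8) can be shown in a few lines of Python. This is only a sketch of the kind of check a converter could make before trusting its input; `is_valid_utf8` and `to_unicode` are hypothetical helpers, not functions from the Mozilla tree.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_unicode(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode as UTF-8 when valid, otherwise fall back to a single-byte charset."""
    if is_valid_utf8(data):
        return data.decode("utf-8")
    return data.decode(fallback)


name = "dk.test.utf8-\u00e6\u00f8\u00e5"  # dk.test.utf8-æøå
latin1_value = name.encode("iso-8859-1")   # what GetNewsgroups returns in this report
utf8_value = name.encode("utf-8")          # what a UTF-8 converter expects
```

The ISO-8859-1 bytes E6 F8 E5 are not well-formed UTF-8, so a converter that assumes UTF-8 input needs some defined behaviour for them; the fallback shown is one possible choice.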

Comment 12

16 years ago
Created attachment 65089 [details] [diff] [review]
Patch that suppresses the call to MsgI18NEncodeMimePartIIStr that blocks

Comment 13

16 years ago
The reason why we still can't post to dk.test.utf8-æøå after this patch is
that the value in pNewsGrp is not the UTF-8 byte sequence the server uses for
dk.test.utf8-æøå, but the ISO-8859-1 string this value represents.

When Mozilla opens the message, it automatically detects that the Newsgroups
header is in UTF-8, and converts it to ISO-8859-1 before storing it.

It seems that if the newsgroup name displays properly in the message window,
that's because it has been recognized as UTF-8, but the conversion to
ISO-8859-1 is a bad thing.

Because of that we cannot send the correct value to the news server.
We would have to assume the header has been converted to ISO-8859-1, but this
works only for names that can be represented in ISO-8859-1, and I wouldn't be
surprised if the conversion is in fact to the local encoding.
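
A short Python sketch of why the conversion to ISO-8859-1 is a dead end. The Chinese group name is made up for the example, and `can_store_as_latin1` is a hypothetical helper.

```python
def can_store_as_latin1(name: str) -> bool:
    """True if every character of name exists in ISO-8859-1."""
    try:
        name.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


danish = "dk.test.utf8-\u00e6\u00f8\u00e5"   # dk.test.utf8-æøå
chinese = "cn.test.\u4e2d\u6587"             # hypothetical group name

# Even when the name fits in ISO-8859-1, the bytes posted back differ from
# the UTF-8 bytes the server is assumed to know the group by.
posted = danish.encode("iso-8859-1")
expected = danish.encode("utf-8")
```

The Danish name survives the conversion but is posted back as different bytes; the Chinese one cannot be stored at all.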

Comment 14

16 years ago
Nhotta, Seth, can you guys have a look at Jean-Marc's fix for the hang? It needs
your review.  Thanks.
(Assignee)

Comment 15

16 years ago
>The Newsgroups header should never be MIME encoded.
What should be the right format, any standard?

What if posting a message in UTF-8 to that newsgroup? Then no conversion will be
attempted from ISO-8859-1 to UTF-8. Not proposing that as a workaround though.
 

Comment 16

16 years ago
>> The Newsgroups header should never be MIME encoded.
> What should be the right format, any standard?

There is a proposed IETF draft for USEFOR. The following
document seems to be the latest:

http://search.ietf.org/internet-drafts/draft-ietf-usefor-article-06.txt

Note in particular "Section 4.4.1 Character Sets within Article 
Headers".

 "Where the use of non-ASCII characters, encoded in UTF-8, is permitted
   as above, they MAY also be encoded using the MIME mechanism defined
   in [RFC 2047], but this usage is deprecated within news articles
   (even though it is required in mail messages) since it is less
   legible in older reading agents which support neither it nor UTF-8.
   Nevertheless, reading agents SHOULD support this usage, but only in
   those contexts explicitly mentioned in [RFC 2047]."

Reading this passage and taking into account the fact that this
is still a draft, I would not rush into stopping MIME encoding of
subject headers. The language here is extremely vague, saying on
one hand that MIME encoding is deprecated but at the same time
saying that we may also use MIME encoding.
This recommendation is also very unrealistic for certain
languages like Japanese, where just about 99% of articles
posted have MIME-encoded headers, and software that cannot
handle MIME-encoded headers is extremely rare or not used.
I don't see that UTF-8 headers can be enforced for Japanese
news no matter what this IETF draft says. 
If some folks want to start encoding headers using UTF-8, I would
create an option not to do that and let the user choose. The draft
says that UTF-8 headers are also problematic because older
software does not support them. 



Comment 17

16 years ago
CC'ing taka also.

Comment 18

16 years ago
I think there are two separate points here:

One is MIME encoding or not of headers in general. Some people favor MIME
encoding, others prefer sending them in 8 bits in the local encoding, some will
push USEFOR ... This is a complex matter and hard to solve.

The other is the MIME encoding of the two specific headers "Newsgroups" and
"Followup-To".
As news servers work by binary comparing the string inside the header
"Newsgroups" to the name of the newsgroup to determine to which newsgroup the
message belongs, it just never, ever works to RFC2047 encode this header, and the
related header "Followup-To".
When the name is RFC2047 encoded, the result is not predictable, because it
depends on the encoding chosen, the way the encoder works, etc ...
 
So taking a decision about what to do for the other headers is difficult, but
for these two headers one thing is certain, there is no way RFC2047 encoding
them could work.

Therefore, the Mozilla code that specifically RFC2047 encodes these two headers
should go away, and there is _no_ risk that it will be missed by a single person
anywhere.
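
To illustrate the unpredictability, the following Python sketch encodes the same group name as an RFC2047 encoded-word with two different charsets, using the standard library encoder. The assumption, as in the comment above, is that the server compares the header bytes with its own raw name.

```python
from email.header import Header

name = "dk.test.utf8-\u00e6\u00f8\u00e5"  # dk.test.utf8-æøå

# Two clients (or one client with two settings) may pick different charsets.
as_latin1 = Header(name, "iso-8859-1").encode()
as_utf8 = Header(name, "utf-8").encode()

# Assumption: the server matches against the raw UTF-8 bytes of the name.
raw = name.encode("utf-8")
candidates = [as_latin1.encode("ascii"), as_utf8.encode("ascii")]
any_match = any(candidate == raw for candidate in candidates)
```

The two encoded forms differ from each other and neither equals the raw name, so a binary comparison on the server cannot succeed whichever one is sent.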

Comment 19

16 years ago
One more thing, Momoi's comment #16 applies perfectly well to the first point,
generic MIME encoding of headers, but not to the second point, MIME encoding of
"Newsgroups" and "Followup-To". 
Nobody in Japan works with MIME-encoded "Newsgroups" and "Followup-To" headers.
Depending on the MIME encoding the news client would choose for the name,
ISO-2022-JP, UTF-8, etc ..., the news server would not be able to find the
correct group, so it defeats the point of using tagged RFC2047 encoding.

I've been thinking about working on a patch so that the headers "Newsgroups" and
"Followup-To" are sent in the default character coding (the one in
Preference/Navigator/Languages).

If a user wants USEFOR conformance, he will have to select Unicode UTF-8 for
this value. If what he wants is to send the headers in his local encoding, he
just has to choose the encoding he wants there. So this patch will enable USEFOR
conformance, without imposing it.

This can be improved later by using instead an additional parameter that will be
the default encoding parameter for the news server. The user would be able to
choose it when adding a news server, and the default character coding would be
the default value. 
This could also be used as the default character coding for newsgroup folders
provided by this news server, when not manually overridden.

But I'd prefer to know if there's a chance this patch would be accepted before
working on it.
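
The per-server encoding idea could look roughly like this in Python. Everything here is hypothetical: the `NewsServer` class, its `header_charset` field, the UTF-8 default and the host name are invented for the sketch and do not correspond to existing Mozilla preferences.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NewsServer:
    host: str
    # Hypothetical per-server setting; UTF-8 would give USEFOR conformance.
    header_charset: str = "utf-8"


def newsgroups_header(server: NewsServer, groups: Sequence[str]) -> bytes:
    """Build the raw Newsgroups line in the server's charset, never RFC2047."""
    value = ",".join(groups)
    return b"Newsgroups: " + value.encode(server.header_charset)


name = "dk.test.utf8-\u00e6\u00f8\u00e5"  # dk.test.utf8-æøå
usefor_line = newsgroups_header(NewsServer("news.example.org"), [name])
local_line = newsgroups_header(NewsServer("news.example.org", "iso-8859-1"), [name])
```

A user who wants USEFOR behaviour keeps UTF-8; a user on a server with names in a local encoding sets that encoding instead.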
(Assignee)

Comment 20

16 years ago
nsMsgI18NEncodeMimePartIIStr, the last argument can specify the header to be
MIME encoded or not. If MIME encoding is not preferred for newsgroup names then
pass PR_FALSE instead of using the result of nsMsgMIMEGetConformToStandard().

Comment 21

16 years ago
The draft in comment #16 is meant to extend RFC-1036, which is to news what
RFC-822 (or 2822) is to mail.  According to RFC-977, which
defines the NNTP protocol as RFC-821 (or 2821) does for SMTP:

   2.2.  Character Codes

   Commands and replies are composed of characters from the ASCII
   character set.  When the transport service provides an 8-bit byte
   (octet) transmission channel, each 7-bit character is transmitted
   right justified in an octet with the high order bit cleared to     
   zero.

Therefore, implementations are not supposed to use raw UTF-8 to select a
newsgroup with the GROUP command, as far as standard conformance is concerned.

I know Brian in the messaging divlet used to work on an extended version of 
NNTP, but I'm not sure how it ended up.  My guess is, his draft 
couldn't attract much attention since nobody really wanted to use 
newsgroup names in their native language.

IMHO, the right direction for the Mozilla 1.0 release is to validate
the newsgroup name and give the user an error message when a non-ASCII group
name is used.
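
A minimal Python sketch of that validation, assuming the Newsgroups field is a comma-separated list; `non_ascii_groups` is a hypothetical helper, and how the error is reported to the user is left out.

```python
def non_ascii_groups(newsgroups_field: str) -> list:
    """Return the group names in the field that contain non-ASCII characters."""
    groups = [g.strip() for g in newsgroups_field.split(",") if g.strip()]
    return [g for g in groups if not g.isascii()]


rejected = non_ascii_groups("dk.test, dk.test.utf8-\u00e6\u00f8\u00e5")
accepted = non_ascii_groups("comp.lang.c, alt.test")
```

An empty result means the post can go ahead; otherwise the returned names can be shown in the error message.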

Comment 22

16 years ago
Any comments to my statement in the previous comment above?

By allowing a post to UTF-8 newsgroup, we will be violating RFC-977.
Using MIME encoding to newsgroup doesn't make sense either.

Comment 23

16 years ago
In bug 74055 , the problem of a crash when trying to subscribe to non 7-bit
groups on the server news.webking.com.cn was accepted and solved.

The existence of this server proves some people want non 7-bit names for
newsgroups (and don't simply accept that RFC977 tells them it's forbidden).
I cannot access this server right now, it either no longer has public access, or
restricts access based on geographic zone, I don't know.

The approach chosen in NS 4 and in bug 74055 was to systematically interpret
newsgroup names as being in ISO-8859-1, but the groups on this server clearly
are in Chinese (probably gb2312).

So this shows that:
- some users want support for non 7-bit newsgroups.
- even if Mozilla can't display the name properly, at least being able to read
and post messages to the group is useful, and is an acceptable first step.
- ideally, the user should have the choice of using the encoding he specifies
for communicating with the server. 
If he wants to respect USEFOR, it will be UTF-8, if he wants a local encoding,
like gb2312 for this server, he will just have to specify it.

In addition:
In the current state, only ISO-8859-1 can be handled properly.
So even if the newsgroup name is valid UTF-8, it should be displayed and stored
as ISO-8859-1 (that is, read byte for byte, without conversion).
Right now in this case, it's converted from UTF-8 to ISO-8859-1, and this later
makes it impossible to post back to the server.
(Assignee)

Comment 24

16 years ago
-> nhotta
Does it still hang?
Assignee: sspitzer → nhotta
Status: REOPENED → NEW
Keywords: nsbeta1

Comment 25

16 years ago
http://ietf.org/internet-drafts/draft-ietf-nntpext-base-15.txt

This is a proposed standard to enable UTF-8 newsgroup name at NNTP level.  We
shouldn't send non ASCII newsgroup name without implementing this protocol. 
Although reading non ASCII newsgroup with standard NNTP might be a good idea,
I'm not sure how much effort it implies and how much we gain from it.

Comment 26

16 years ago
Testing with 2002022408 :
Hey nhotta, just tell you've checked in something new.
It's a lot, lot better.

It hangs no more.
The content of the Newsgroups and Followup-To headers that are sent to the
newsserver is never Mime encoded, even when mail.strictly_mime_headers is true.
The value for other headers respects the setting of mail.strictly_mime_headers.

Sample values :
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:0.9.8+) Gecko/20020224
X-Accept-Language: en-us, en
MIME-Version: 1.0
Newsgroups: dk.test.utf8-æøå
Followup-To: dk.test.utf8-æøå
Subject: Re: test envoie message =?ISO-8859-1?Q?=E9=E9=E9?=

Sending messages to the group dk.test.utf8-æøå is still a bit problematic.

Its name in the list of groups is displayed as "dk.test.utf8-Ã¦Ã¸Ã¥".
When composing a new message, the name appears as "dk.test.utf8-Ã¦Ã¸Ã¥".
All the values in the headers sent to the newsserver are correct, it is possible
to post in the group using Mozilla.

When viewing a message in the group, inside the message panel the name is
displayed as "dk.test.utf8-æøå".
When replying to the message, the name in the composition window is 
"dk.test.utf8-æøå", this is the value that will be sent to the newsserver and
it will not be accepted, as the value the server expects is "dk.test.utf8-Ã¦Ã¸Ã¥",
which is the UTF8 encoding of "dk.test.utf8-æøå".

It's not great, but if you know what to do, you can post to the group in all cases. 
When replying, you have to overwrite the value "dk.test.utf8-æøå", which will
not work, with the value "dk.test.utf8-Ã¦Ã¸Ã¥", which works.

The basic functionality of being able to send messages to an 8-bit group now works.

IMO it's RESOLVED/FIXED, and what is left to be done (Proper displaying of the
group name) can be considered as a duplicate of bug 91112.
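
The difference between the name as a reader sees it and the byte sequence the server expects can be reproduced in Python. This is an illustration under the assumption that the server holds the name as UTF-8 and that the client sends whatever is in the field as ISO-8859-1.

```python
name = "dk.test.utf8-\u00e6\u00f8\u00e5"   # dk.test.utf8-æøå, as shown in the message pane
wire = name.encode("utf-8")                # bytes the server is assumed to expect

# The same bytes read as ISO-8859-1: each non-ASCII letter becomes two characters.
shown_as_latin1 = wire.decode("iso-8859-1")

# Reply path: the decoded name is sent as ISO-8859-1 and no longer matches.
reply_wire = name.encode("iso-8859-1")

# Workaround: typing the two-character form sends the right bytes.
workaround_wire = shown_as_latin1.encode("iso-8859-1")
```

Only `workaround_wire` equals `wire`, which is why overwriting the field by hand when replying gets the post accepted.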
(Assignee)

Comment 27

16 years ago
I think the behavior change was by taka's recent check in for MIME encoder.

Mark this as fixed. Please file a separate bug for the display issue.
Status: NEW → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED
(Assignee)

Comment 28

16 years ago
There is a bug 126453 about newsgroup name in Chinese.

Henrik - does this now work for you?  I don't have access to that UTF-8 test
group yet, thanks!
(Reporter)

Comment 30

16 years ago
in 20020305 it's impossible to test:
sending a posting to either "dk.test.utf8-æøå" or ""dk.test.utf8-æøå" fails.

Newsgroups: dk.test.utf8-æøå
Article Rejected -- No such groups

Newsgroups: dk.test.utf8-æøå 
Article Rejected -- Illegal newsgroup 'dk.test.utf8-æøå' in 'Newsgroups:' field

Comment 31

16 years ago
Henrik, I don't have write access to sunsite.dk, so I did the test with a local
news server, namely the Hamster program, on which I created a dk.test.utf8-æøå group.
With 2002030409, I can still post.

That said, I forgot to mention that I do see a bug. I didn't report it as I
considered it unrelated, but maybe it's not so unrelated.

Mozilla refused to post to the server it got the dk.test.utf8-æøå group from, it
tried another server I was subscribed to instead, and failed. 
From your problem description, it might be that you're having a problem of this
kind, "Illegal newsgroup 'dk.test.utf8-æøå' in 'Newsgroups:" does not seem to
be a Mozilla originated message.

As I have already had some issues with Mozilla selecting the wrong news server
when posting, I restarted the test with a fresh profile that has only one
server, the one that holds dk.test.utf8-æøå.

Then it worked.

Can you try that?

If the reason why Mozilla is not able to find the right server to post to is
that the group name is 8 bit, then this is a related bug.

I will check if there are open bugs about Mozilla selecting the wrong server to
post a news message.
(Reporter)

Comment 32

16 years ago
v 20020305

posted to both "dk.test.utf8-æøå" and ""dk.test.utf8-æøå" with success.

The reason why it didn't work before was that the wrong newsserver was selected.
I just created a new profile that only had one newsserver.

Please open a new bug on that and Cc me on it.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core