Closed Bug 91112 Opened 23 years ago Closed 16 years ago

Newsgroups header should always be in UTF8

Categories

(MailNews Core :: Networking: NNTP, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: John.planb, Unassigned)


Per the latest (and most of the previous) USEFOR drafts, 8 bit characters
in the Newsgroups header should always be sent as UTF8, not MIME encoded
or in the local charset.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I don't mind having an option to send UTF-8 headers but
currently there are other RFCs for news headers such as the one
for Japanese:

RFC 1468

For nearly a decade Japanese news has been using ISO-2022-JP headers. This
is something that is done in practice and I don't see that changing immediately. 

Thus, it is not practical in this and probably in other cases to stop
using local charsets unless there is a broad agreement. I will write to
Charles Lindsey, the author of the USEFOR proposal to request info on
this issue.

My suggestion is to create an option instead of using UTF-8 for all
cases. This is a draft proposal and we need not rush.
We should consider whether we can detect UTF-8 headers when reading news.  This
would be in line with our attempt to be lenient on what comes in, but strict
(e.g., adhere to RFCs) on what is sent out.
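
A minimal sketch of such a detection step, assuming the reader has the raw header bytes and a configured local charset. The function name and the ISO-8859-1 fallback are illustrative assumptions, not Mozilla code:

```python
def decode_header_value(raw: bytes, fallback_charset: str = "iso-8859-1") -> str:
    """Decode raw header bytes, preferring UTF-8 when they are valid UTF-8.

    Hypothetical helper: the fallback charset stands in for whatever
    local charset the client is configured to use.
    """
    try:
        # Strict decoding rejects byte sequences that are not well-formed UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback_charset, errors="replace")


utf8_name = b"dk.test.utf8-\xc3\xa6\xc3\xb8\xc3\xa5"   # group name sent as UTF-8
latin1_name = b"dk.test.utf8-\xe6\xf8\xe5"             # same name in ISO-8859-1

print(decode_header_value(utf8_name))
print(decode_header_value(latin1_name))
```

This check only separates UTF-8 from 8 bit single-byte charsets. A 7 bit encoding such as ISO-2022-JP passes the UTF-8 test unchanged, so it would need separate handling.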
Perhaps I wasn't clear.  I'm not referring to "newsgroups headers" but 
rather the header "Newsgroups:".  I know there are other encodings for 
*other* headers, but it's as close to a sure thing as you can get for 
something that hasn't happened yet, that "Newsgroups:" is going to be 
UTF8.
So, this bug is not about the subject line?
No, this isn't about the header "Subject" it is about the header 
"Newsgroups".  Which, regardless of any other setting, should be sent as 
raw 8 bit characters in UTF8 -- no MIME encoding.
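
To make the distinction concrete, here is a small sketch of the three forms under discussion for the same group name: raw UTF8, the local charset (ISO-8859-1 is assumed here for illustration), and a MIME encoded-word. It uses Python's standard email.header module only to produce the MIME form; it is not how Mozilla builds headers.

```python
from email.header import Header

name = "dk.test.utf8-æøå"

# Form requested in this bug: raw 8 bit UTF-8 bytes, no MIME encoding.
raw_utf8 = b"Newsgroups: " + name.encode("utf-8")

# Local charset form (ISO-8859-1 assumed): one byte per non-ASCII character.
local_charset = b"Newsgroups: " + name.encode("iso-8859-1")

# MIME encoded-word form: pure ASCII on the wire.
mime_encoded = "Newsgroups: " + Header(name, charset="iso-8859-1").encode()

print(raw_utf8)
print(local_charset)
print(mime_encoded)
```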
There is a test group with a UTF8 name, dk.test.utf8-æøå (that's C3A6 
C3B8 C3A5) for those interested in seeing how Mozilla currently works.

(May not be carried by all servers).
Current support for UTF-8 in the Newsgroups header in Mozilla:

UTF-8 in the Newsgroups header is a proposal in USEFOR, and is being tested with
the group dk.test.utf8-æøå.

Warning: This is the result of testing on a Windows platform with an OS version
where ISO-8859-1 is the native encoding. 
Results may differ with a different localization/OS.

In the newsgroup list panel and in the subscribe window, the name of the
newsgroup is displayed as if the encoding were ISO-8859-1.
When reading one of the messages in the group, the name of the group is correctly
interpreted and displayed in the message window. 
The UTF-8 encoding of the group name is recognized there.
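
The difference between the two displays can be reproduced outside Mozilla. The sketch below only assumes that the server sends the group name as UTF-8 bytes; the variable names are illustrative.

```python
# Bytes of the group name as sent by the server (UTF-8).
wire_bytes = "dk.test.utf8-æøå".encode("utf-8")

# Group list and subscribe window: bytes interpreted as ISO-8859-1.
shown_in_list = wire_bytes.decode("iso-8859-1")

# Message window: bytes interpreted as UTF-8.
shown_in_message = wire_bytes.decode("utf-8")

print(wire_bytes[-6:].hex())   # bytes of the three non-ASCII characters
print(shown_in_list)
print(shown_in_message)
```

Each non-ASCII character occupies two bytes in UTF-8, so the ISO-8859-1 reading shows two characters for each one (Ã¦Ã¸Ã¥ instead of æøå).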

When replying to a message in the group, the name will be correct in the
composition window, but in the content sent to the news server, the Newsgroups
header will be MIME encoded in ISO-8859-1. 

When creating a new message in the group, the name of the group in the
composition window will be the same as in the newsgroup list.

When user_pref("mail.strictly_mime_headers", false); is set to allow headers to
be in raw 8 bit (see bug 68394), in the content sent to the news server, the
Newsgroups header will stop at the first character that is not US-ASCII (that
is, the first character that was encoded in UTF-8).
Setting the encoding of the message to UTF-8 does not change this.
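
The observed output is consistent with the header value being cut at the first character outside US-ASCII. The sketch below is only a model of that observed behaviour, with a hypothetical function name; it is not taken from the Mozilla source.

```python
def truncate_at_first_non_ascii(value: str) -> str:
    """Return the part of value that precedes the first non-US-ASCII character.

    Hypothetical model of the truncation seen in the sent header.
    """
    for index, char in enumerate(value):
        if ord(char) > 127:
            return value[:index]
    return value


print("Newsgroups: " + truncate_at_first_non_ascii("dk.test.utf8-æøå"))
```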

Steps to reproduce:
These steps use the news server sunsite.dk in read-only mode; they do not require
you to have a write-enabled account on it. 
Access to this server is free in read-only mode.

- Create a news server account for the news server sunsite.dk
- download list of newsgroups
- subscribe to dk.test.utf8-æøå (this is a UTF-8 encoded name, improperly
displayed as ISO-8859-1)
- in the newsgroup list panel, the name dk.test.utf8-æøå appears.
- select this group
- click on the title of one of the messages to read it
- The content of the message appears in the message panel. 
  The name displayed for the group is dk.test.utf8-æøå as it should be.
- click reply.
- the name that appears for the group in the composition window is
  dk.test.utf8-æøå
- select send later in the file menu
- search for this message in the "unsent messages" mailbox.
- in the message view window, the name displayed for the group is 
 dk.test.utf8-
- do "view source"
- in the source of the message, the Newsgroups header is:
"Newsgroups: =?ISO-8859-1?Q?dk=2Etest=2Eutf8=2D?=,dk.test"

It has been improperly MIME encoded in ISO-8859-1.
(PS: in this MIME encoding, only "dk.test.utf8-" has been encoded; the UTF-8
characters have disappeared.)
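
The encoded-word from the message source can be decoded by hand to confirm this. A minimal sketch using Python's standard quopri module; the helper name and the regular expression are illustrative and cover only the Q encoding:

```python
import quopri
import re

# Matches a single Q-encoded word: =?charset?Q?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[Qq]\?([^?]*)\?=")


def decode_q_encoded_word(word: str) -> str:
    """Decode one Q-encoded MIME encoded-word into text (hypothetical helper)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not a Q-encoded word: " + word)
    charset, payload = match.groups()
    # header=True applies the header variant of quoted-printable ("_" means space).
    raw = quopri.decodestring(payload.encode("ascii"), header=True)
    return raw.decode(charset)


decoded = decode_q_encoded_word("=?ISO-8859-1?Q?dk=2Etest=2Eutf8=2D?=")
print(decoded)
```

The decoded text ends at the hyphen, so the three non-ASCII characters were dropped before the encoding step rather than mis-encoded.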

- close Mozilla 
- add user_pref("mail.strictly_mime_headers", false); to prefs.js in the user
profile.
- restart Mozilla
- click on one of the messages in dk.test.utf8-æøå, hit reply, do send later.
- go to the "unsent messages" mailbox and select the unsent message.
- in the message view window, the name displayed for the group is 
 dk.test.utf8-
- in the source of the message, the Newsgroups header is:
"Newsgroups: dk.test.utf8-"

- If I set the encoding of the message to UTF-8 before sending the message,
Mozilla will send raw UTF-8 in the subject,
"Subject: Re: æøå"

but will still send the same Newsgroups header that stops at the first non
US-ASCII character:
"Newsgroups: dk.test.utf8-"

On the whole, the behaviour is not very consistent.
I have the feeling the Newsgroups header is already treated differently from
other headers, so adding the special property that it will never be MIME
encoded would not be a big step.

So we have two bugs that should be separated:
1 - allow Mozilla to post to newsgroups with non US-ASCII characters in their
names. This is currently not possible, whatever the position on UTF-8 is.
It is possible with OE 4, even if the display of the newsgroup name is never
correct.
2 - have a setting that allows the default encoding for headers to be UTF-8. 
This is probably less urgent, but it would be nice to have it at least in a way
similar to the mail.strictly_mime_headers option. 
so this bug is the cause of bug 105710?
btw: I have full access (read/post) to the server that has the dk.test.utf8-æøå
newsgroup...
Yes, this bug is the cause of bug 105710.
*** Bug 105710 has been marked as a duplicate of this bug. ***
Product: MailNews → Core
sorry for the spam.  making bugzilla reflect reality as I'm not working on these bugs.  filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
Filter on "Nobody_NScomTLD_20080620"
QA Contact: stephend → networking.news
Anyone who has access to servers with i18nized names, does this bug still occur in TB 2.0.0.* or trunk TB builds? If no response in 3 weeks, I will close this bug as RESO INCO...
Whiteboard: [jcranmer:unconfirmed] closeme 2008-07-17
RESO INCO per last comment. If you feel this change was made in error, please respond with your comments why.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → INCOMPLETE
Whiteboard: [jcranmer:unconfirmed] closeme 2008-07-17
Product: Core → MailNews Core