Closed Bug 1389762 Opened 4 years ago Closed 9 months ago

Unable to post to non-ascii named newsgroups, NNTP error "No newsgroups matched"

Categories

(MailNews Core :: Networking: NNTP, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: tkgg, Unassigned)

References

Details

(Keywords: regression)

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36

Steps to reproduce:

I am trying to post to a non-ASCII newsgroup named "测试中文" on the NNTP server "news.newsfan.net", using Thunderbird release version 52.2.1 on Windows 10.


Actual results:

An error message pops up: "NNTP error... Feedrule: No newsgroups matched {æµè¯ä¸­æ}"
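
The garbled text inside the braces is consistent with the UTF-8 bytes of the group name being read back in a single-byte charset. A minimal sketch of that effect, assuming Latin-1 as the misreading charset (which charset the server or the error dialog actually applied is not known from this report):

```python
# Group name taken from the report above.
group = "测试中文"

# What a UTF-8 client would put on the wire.
utf8_bytes = group.encode("utf-8")

# Assumption: the receiving side interprets those bytes as Latin-1.
misread = utf8_bytes.decode("latin-1")

print(len(group), len(utf8_bytes))  # 4 characters become 12 bytes
print(repr(misread))                # begins with 'æµ' and contains unprintable C1 controls

# For comparison, the GBK encoding of the same name uses two bytes per character.
gbk_bytes = group.encode("gbk")
print(len(gbk_bytes))               # 8
```

The UTF-8 and GBK byte sequences have nothing in common, so a server that matches group names byte by byte in GBK cannot find a name sent in UTF-8.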


Expected results:

The old Thunderbird v24.8.1 is able to post; the current release v52.2.1 should also be able to post to non-ASCII named newsgroups.
Some encoding problem, I should take a look.

Have you tried TB 31, TB 38 and TB 45?
Component: Message Compose Window → Networking: NNTP
Keywords: regression
Product: Thunderbird → MailNews Core
Version: 52 Branch → 52
Alfred, would you be interested to look into this one?

I created an account with news.newsfan.net and in the subscribe dialogue everything already looks garbled, even in TB 24.
Flags: needinfo?(infofrommozilla)
As far as I know, there is still no standard for Unicode group names.
There is therefore no uniform regulation for this.

For UTF-8 encoded names it works out of the box.
See group "Test.UTF8.测试1" on that Server.

For the other groups, the server uses a different encoding.

So with:
Account Settings -> "news.newsfan.net" -> Server Settings -> Default Text Encoding: 'Chinese Simplified (GBK)'

it works for me.
Flags: needinfo?(infofrommozilla)
(In reply to Jorg K (GMT+2) from comment #1)
> Some encoding problem, I should take a look.
> 
> Have you tried TB 31, TB 38 and TB 45?

I tried TB v17.0.7, v24.4.0, v24.8.1 and v31.1.2; all of them work properly.

I didn't have v38 or v45 installed previously.
How about following the instructions from comment #3? Does that help?
(In reply to Jorg K (GMT+2) from comment #5)
> How about following the instructions from comment #3? Does that help?

No. Comment #3 is about how to display the group names (Chinese GBK) in the subscription window, I think.

I can see those group names in Chinese, subscribe to them, and download headers and bodies properly, but I cannot post to any of those groups, only to the ASCII-named groups.
Yes, I can see the newsgroups now, but when posting a follow-up, I get an error. Looks like we can't post to any non-UTF-8-named group.

Just mentioning that I'm seeing:
DEPRECATION WARNING: Encoding to non-UTF-8 values is obsolete
You may find more details about this deprecation at: http://bugzilla.mozilla.org/show_bug.cgi?id=790855
jar:file:///C:/Program%20Files/Mozilla%20Thunderbird%2057/omni.ja!/components/mimeJSComponents.js 443 encodeMimePartIIStr_UTF8
which typically happens when someone is trying to encode a header in anything but UTF-8.

I'd have to debug it, but currently I have more important things to do. Alfred, can you help out? You can get a JS stack by adding this debug:
for (var frame = Components.stack; frame; frame = frame.caller) {
  dump("== JS stack> " + frame.filename + " (" + frame.lineNumber + ")\n");
}
Flags: needinfo?(infofrommozilla)
Immediately - unfortunately not. My build environment is broken because of an HD crash.
If it can wait a few days, I could do that.
Looking into this a little: The "Feedrule: No newsgroups matched" (not localised) comes from the server.

"A News (NNTP) error occurred:" comes from here:
https://dxr.mozilla.org/comm-central/rev/20658ec6d032a2dff9c281d6431e9004d5a33968/mail/locales/en-US/chrome/messenger/news.properties#37

A little stack dump from when the error happens:
xul.dll!nsNNTPProtocol::AlertError(int errorCode, const char * text) Line 4798	C++
xul.dll!nsNNTPProtocol::PostDataResponse() Line 3485	C++
xul.dll!nsNNTPProtocol::ProcessProtocolState(nsIURI * url, nsIInputStream * inputStream, unsigned __int64 sourceOffset, unsigned int length) Line 4514	C++
xul.dll!nsMsgProtocol::OnDataAvailable(nsIRequest * request, nsISupports * ctxt, nsIInputStream * inStr, unsigned __int64 sourceOffset, unsigned int count) Line 298	C++

So we're in PostDataResponse() and the previous step was nsNNTPProtocol::PostData() which calls PostMessageInFile(filePath).

That calls nsMsgProtocol::PostMessage() and, looking in the loop, I see this data going past:
mozilla::detail::nsCStringRepr = {mData=0x000001e2b8855888 "Newsgroups: èŠå¤©çŒæ°´.情感" mLength=31 mDataFlags=TERMINATED | SHARED (5) ...}

So something has messed up the Newsgroups: header.

Further lines look OK, and I noted:
mozilla::detail::nsCStringRepr = {mData=0x000001e2b8855888 "Content-Type: text/plain; charset=gbk; format=flowed" mLength=52 mDataFlags=...}

So we're posting in GBK which is the encoding for that server.

I have the suspicion that this is another JSMime regression. Before JSMime, you could encode headers in all different encodings, but JSMime forces headers to UTF-8 now. So something has gone wrong for the Newsgroups header.

While the message was being processed I found the file being sent in my temp directory as nsemail.eml, UTF-8 encoded. Here is the content:

Subject: =?UTF-8?B?UmU6IOeUn+a0u+S4jei/h+WmguatpA==?=
Newsgroups: 聊天灌水.情感
References: <58c0e502$1_3@NFNewsServer4.NewsFan.NET>

As we see, the Newsgroups: header is UTF-8 encoded, something the server doesn't seem to like at all.
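
The byte counts back this up. A small sketch (Python is used here only for illustration, it is not part of the Thunderbird code) comparing the length of the header line in both encodings with the mLength=31 seen in the debugger:

```python
# Header line as it appears in nsemail.eml.
group = "聊天灌水.情感"
header = "Newsgroups: " + group

as_utf8 = header.encode("utf-8")  # what JSMime-era Thunderbird produces
as_gbk = header.encode("gbk")     # what a GBK-only server would expect

print(len(as_utf8))  # 31, the same as mLength in the debugger output
print(len(as_gbk))   # 25
```

So the string passing through nsMsgProtocol::PostMessage() has the length of the UTF-8 form, not the GBK form.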

I'll consult Joshua.
Flags: needinfo?(Pidgeot18)
I repeated the experiment with TB 24. There I get:

User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.8.1
MIME-Version: 1.0
Newsgroups: 聊天灌水.情感
Subject: Test please ignore
References: <58c239ff$1_5@NFNewsServer4.NewsFan.NET>
In-Reply-To: <58c239ff$1_5@NFNewsServer4.NewsFan.NET>

but GB2312 encoded (according to my editor). So the header is sent in the encoding of the server and not UTF-8.

As I said, JSMime can't encode headers in anything else but UTF-8 and also the only charset permissible for (e-mail) headers according to RFC 6532 (https://tools.ietf.org/html/rfc6532) is UTF-8.

So I don't know what the way forward here is. Of course Newsgroups: is not an e-mail header, but rather a token that identifies the newsgroup we want to post to.
Flags: needinfo?(infofrommozilla)
(In reply to Alfred Peters from comment #3)
> As far as I know, there is still no standard for Unicode group names.
> There is therefore no uniform regulation for this.

RFC 3977 states that commands and arguments are all UTF-8. Newsgroups are supposed to be pure ASCII, which is of course a lie in practice. My recollection is that it's intended that newsgroup names should be UTF-8, but there was resistance to enabling non-ASCII newsgroup names in RFC 5536 given that RFC 5322 still mandated ASCII headers. Given the trajectory of standards, if and when non-ASCII newsgroups are made spec-legal, they're going to be UTF-8. The only question is if RFC 2047-encoded variants of newsgroups would be legal in the messages themselves.

It's worth testing if a RFC 2047-encoded newsgroup header actually posts properly. Given that the news server appears to be custom, and doesn't support RFC 3977, I'm doubtful that it would work, but it's still worth testing.

> For UTF-8 encoded names it works out of the box.
> See group "Test.UTF8.测试1" on that Server.
> 
> For the other groups, the server uses a different encoding.

Per-group charset settings? That's a sign that the server is messed up and needs to be fixed, not the client.

(In reply to Jorg K (GMT+2) from comment #11)
> So I don't know that the way forward here is. Of course Newsgroups: is not
> an e-mail header, but rather a token that identifies the newsgroup we want
> to post to.

The Usenet message format is a slightly stricter variant of email messages (RFC 5536 is built on top of RFC 5322, as RFC 1036 was on RFC 822), so if it's not a legal email message, it's definitely not a legal Usenet message.
Flags: needinfo?(Pidgeot18)
(In reply to Joshua Cranmer [:jcranmer] from comment #12)
> It's worth testing if a RFC 2047-encoded newsgroup header actually posts
> properly.
And how would I do that exactly? Drill open the send logic? Somehow the code that prepared the message, supposedly nsMsgSend.cpp and friends, decided to do the Newsgroups: header in pure UTF-8 although other headers usually get encoded.

Or I telnet into the server and talk the protocol?

BTW, the group I tried posting to was: 聊天灌水.情感, so =?UTF-8?B?6IGK5aSp54GM5rC0LuaDheaEnw==?=
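
For reference, that encoded-word can be reproduced and round-tripped with a few lines. This is only a sketch of the RFC 2047 "B" encoding using Python's standard library, not what nsMsgSend.cpp does:

```python
import base64
from email.header import decode_header

group = "聊天灌水.情感"

# RFC 2047 "B" encoding: =?charset?B?base64-of-the-encoded-bytes?=
payload = base64.b64encode(group.encode("utf-8")).decode("ascii")
encoded_word = "=?UTF-8?B?" + payload + "?="
print(encoded_word)  # =?UTF-8?B?6IGK5aSp54GM5rC0LuaDheaEnw==?=

# Decoding gives back the raw bytes plus the declared charset.
raw, charset = decode_header(encoded_word)[0]
print(raw.decode(charset) == group)  # True
```

Whether a news server accepts this form in Newsgroups: is a separate question; the sketch only shows how the value is built.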
(In reply to Joshua Cranmer [:jcranmer] from comment #12)

>  Given the trajectory of
> standards, if and when non-ASCII newsgroups are made spec-legal, they're
> going to be UTF-8.

With newsgroup names in raw UTF-8, TB should work fine.
On 'news.trigofacile.com' (read-only) there are some test groups like 'trigofacile.test.ᾅ'.
The Newsgroups headers of the articles are also in raw UTF-8.

So INN actually handles them that way.

On my local server (Hamster), I can post successfully to such a UTF-8 group.

---
See also: Bug 126453
(In reply to Jorg K (GMT+2) from comment #13)
> (In reply to Joshua Cranmer [:jcranmer] from comment #12)
> > It's worth testing if a RFC 2047-encoded newsgroup header actually posts
> > properly.
> And how would I do that exactly?
Joshua suggested on IRC to paste the RFC 2047 encoded group name into the Newsgroups: field. I did and got:
Feedrule: No newsgroups matched {=?utf-8?b?6igk5asp54gm5rc0luadheaenw==?=}

So given the previous discussion and given that posting to a UTF-8 newsgroup on this server ("Test.UTF8.测试1") works, this is a WONTFIX?

Still a little disappointing given that TB 24 "worked" and given that you can set the server encoding to GBK. So shouldn't that be respected when posting? What is the setting good for if it doesn't work "full circle"?

Is the last paragraph of comment #12 saying that we'd need to compose an *invalid* message with |Newsgroups: 聊天灌水.情感| encoded in GBK in the header to make it work (as we did before) and we won't do that?
Flags: needinfo?(Pidgeot18)
(In reply to Jorg K (GMT+2) from comment #15)
> (In reply to Jorg K (GMT+2) from comment #13)
> > (In reply to Joshua Cranmer [:jcranmer] from comment #12)
> > > It's worth testing if a RFC 2047-encoded newsgroup header actually posts
> > > properly.
> > And how would I do that exactly?
> Joshua suggested on IRC to paste the RFC 2047 encoded group name into the
> Newsgroups: field. I did and got:
> Feedrule: No newsgroups matched {=?utf-8?b?6igk5asp54gm5rc0luadheaenw==?=}
> 
> So given the previous discussion and given that posting to a UTF-8 newsgroup
> on this server ("Test.UTF8.测试1") works, this is a WONTFIX?

I don't want to enable writing message headers as non-UTF-8 in JSMime. That's something that requires some rather intrusive API changes (to handle unencodable characters) for a very niche use case. Handling this would have to happen in the NNTP code, IMO (which reflects what it is: non-spec NNTP servers violating old specifications). I could see accepting a patch that would do transcoding of the headers when posting the NNTP message to the server charset, but I'm not entirely sure it's worth spending time to write that patch.

> Still a little disappointing given that TB 24 "worked" and given that you
> can set the server encoding to GBK. So shouldn't that be respected when
> posting? What is the setting good for if it doesn't work "full circle"?

The setting is used to mark the charset of the NNTP control protocol (although RFC 3977 does say that is to be UTF-8, we definitely can't rely on that being the case unless we know it supports RFC 3977).

> Is the last paragraph of comment #12 saying that we'd need to compose an
> *invalid* message with |Newsgroups: 聊天灌水.情感| encoded in GBK in the header to
> make it work (as we did before) and we won't do that?

Yes, this is an invalid message per RFC 1036 and RFC 5536. Breaking specifications is sometimes necessary for compatibility, but I feel that we should carefully consider when we do that (particularly for content we originate).
Flags: needinfo?(Pidgeot18)
(In reply to Joshua Cranmer [:jcranmer] from comment #16)
> I could see accepting a patch that
> would do transcoding of the headers when posting the NNTP message to the
> server charset,
That's what I've been thinking.

> but I'm not entirely sure it's worth spending time to write
> that patch.
Hmm, yes, other fires are burning hotter right now, but I might get back to this.
This can only be a couple of lines in the right spot where we process the Newsgroup: anyway:
- Get the server charset
- Encode the header in the server's charset (maybe hidden behind a pref)
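
The two steps above can be sketched as follows. This is an illustration in Python, not Thunderbird code: the function name transcode_newsgroups and its parameters are hypothetical, it assumes the message was composed in UTF-8 with CRLF line endings, and it ignores folded header lines.

```python
def transcode_newsgroups(message: bytes, server_charset: str) -> bytes:
    """Re-encode only the Newsgroups: header line into the server charset."""
    # Split the header section from the body at the first empty line.
    head, sep, body = message.partition(b"\r\n\r\n")
    lines = []
    for line in head.split(b"\r\n"):
        if line.lower().startswith(b"newsgroups:"):
            try:
                line = line.decode("utf-8").encode(server_charset)
            except UnicodeError:
                # Name not representable in the server charset: keep UTF-8.
                pass
        lines.append(line)
    return b"\r\n".join(lines) + sep + body


msg = "Subject: Test\r\nNewsgroups: 聊天灌水.情感\r\n\r\nbody".encode("utf-8")
out = transcode_newsgroups(msg, "gbk")
print("Newsgroups: 聊天灌水.情感".encode("gbk") in out)  # True
```

The fallback for unencodable names is one possible choice; a real patch would have to decide whether to fall back silently or refuse to send.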

Joshua, from your vastly better knowledge of the code, do you know where to put this tweak?
Flags: needinfo?(Pidgeot18)
Reading comment #12 about how newsgroup names are intended to be encoded, the usual consensus (unfortunately still not written in a standard) is to use UTF-8 with Unicode Normalization Form C (NFC).
MIME-encoded newsgroup names are not expected to be used because wide-spread news servers do byte-to-byte comparisons to handle newsgroups, distributions, message-IDs...  Just using NFC UTF-8 newsgroup names will then work out of the box.
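
To illustrate why the normalization form matters for byte-to-byte comparison, here is a small sketch using Python's unicodedata module. The group name "test.café" is a made-up example, not a real newsgroup:

```python
import unicodedata

# The same visible name written two ways (hypothetical group name).
composed = "test.caf\u00e9"     # 'é' as one code point (already NFC)
decomposed = "test.cafe\u0301"  # 'e' followed by a combining acute accent

# A byte-wise comparison, as a news server would do, sees two different names.
print(composed.encode("utf-8") == decomposed.encode("utf-8"))  # False

# Normalizing to NFC before encoding makes the bytes identical.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc.encode("utf-8") == composed.encode("utf-8"))  # True
```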
See also bug 79606 that maybe could be linked to this one?
While TB now behaves according to the standard, previous versions didn't and worked better in some cases. I think the suggestion in comment #18 is still worth implementing.
See Also: → 79606
Flags: needinfo?(Pidgeot18)

Doesn't really sound fixable. Doing comment #18 would give broken messages on the email side for cross mail-nntp use, say a posting with a mail cc.

Status: UNCONFIRMED → RESOLVED
Closed: 9 months ago
Resolution: --- → INVALID