Closed Bug 1130533 Opened 9 years ago Closed 9 years ago

latest versions of SeaMonkey dropped option to set KOI8-R charset in "Default Character Encoding" and "Fallback Character Encoding" for Mail & News

Categories

(SeaMonkey :: MailNews: General, defect)

SeaMonkey 2.32 Branch
x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: jagular, Unassigned)

Attachments

(4 files)

User Agent: Mozilla/5.0 (Windows NT 5.1; rv:34.0) Gecko/20100101 SeaMonkey/2.31
Build ID: 20141202220728

Steps to reproduce:

Edit - Preferences - Mail & Newsgroups - Character Encoding:
Tried to set "Fallback Character Encoding" and "Default Character Encoding" to KOI8-R (the standard Russian encoding for e-mail)


Actual results:

There is no option to choose KOI8-R. Choosing the "Cyrillic" option sets the charset to windows-1251, which is not the standard for e-mail. I had to manually set mailnews.send_default_charset and mailnews.view_default_charset to KOI8-R through the "about:config" menu.


Expected results:

KOI8-R coding should be returned to the list of available charsets.
This is the result of Bug 943252 (Remove nsCharsetMenu from m-c)
Looks like the C-C patch there <https://bugzilla.mozilla.org/attachment.cgi?id=8410848&action=edit> was never approved.
Status: UNCONFIRMED → NEW
Ever confirmed: true
See Also: → 943252
Why not just use UTF-8?
I think the plan is to eventually cut the list down way further, leaving UTF-8, ISO-2022-JP and maybe some more.
(In reply to Magnus Melin from comment #2)
> Why not just use UTF-8?

Lots of reasons like the compatibility with ALL even very ancient mail-clients, ability to read mail from any console terminal, correct coding as a trust-flag for spam-filter, passing through 8-bit servers... And inconvenience of explaining to clients how to set correct charset through "about:config" menu instead of common "Preferences". :-)
(In reply to jagular from comment #3)
> Lots of reasons like the compatibility with ALL even very ancient
> mail-clients, ability to read mail from any console terminal, 

Any console that's supported anymore can show UTF-8.

> correct coding
> as a trust-flag for spam-filter, passing through 8-bit servers... And

I doubt flagging UTF-8 mails as spam is a good idea. There's no problem having content be UTF-8 - transfer encodings are used when needed.

> inconvenience of explaining to clients how to set correct charset through
> "about:config" menu instead of common "Preferences". :-)

Unicode (UTF-8) is right there. No need for about:config
(In reply to Magnus Melin from comment #4)
Maybe, maybe. But KOI8-R is a standard for e-mails in russian language and other codings are not. So its removal just makes SM unusable for correct RU-mail (or at least makes it harder to customize).
BTW, what is the latest version of SM that supports setting KOI8 through ordinary "Preferences" menu?
(In reply to jagular from comment #5)
> (In reply to Magnus Melin from comment #4)
> Maybe, maybe. But KOI8-R is a standard for e-mails in russian language and
> other codings are not. So its removal just makes SM unusable for correct
> RU-mail (or at least makes it harder to customize).

How do you explain Russian Thunderbird getting away with defaulting to UTF-8 for outgoing email (for quite some time already)?
https://mxr.mozilla.org/l10n-central/source/ru/mail/chrome/messenger/messenger.properties#162
(In reply to Henri Sivonen (:hsivonen) from comment #6)
> (In reply to jagular from comment #5)
> > (In reply to Magnus Melin from comment #4)
> > Maybe, maybe. But KOI8-R is a standard for e-mails in russian language and
> > other codings are not. So its removal just makes SM unusable for correct
> > RU-mail (or at least makes it harder to customize).
> 
> How do you explain Russian Thunderbird getting away with defaulting to UTF-8
> for outgoing email (for quite some time already)?
> https://mxr.mozilla.org/l10n-central/source/ru/mail/chrome/messenger/
> messenger.properties#162

Also note how Russian Thunderbird by default assumes that unlabeled incoming email is windows-1251.
(In reply to Henri Sivonen (:hsivonen) from comment #7)
> > How do you explain Russian Thunderbird getting away with defaulting to UTF-8
> > for outgoing email (for quite some time already)?
> Also note how Russian Thunderbird by default assumes that unlabeled incoming
> email is windows-1251.

Can't comment TB. Maybe bug. I don't use it, I prefer SM.
There are few state-standards for russian coding (iron standards called GOST in russian, that are used in state-institutes and similar structures). Web-pages can be written in lots of OS'es, so any russian standard coding for such OS'es (cp1251,cp866,koi8-r,utf) is fine for web-pages. But the only official state standard for ru-mail is koi8-r.

Again, the bug in the latest versions of SM is not in support for KOI8-R or any other Russian encoding (all charsets work fine AFAIK), but in the "Preferences" menu, where this option was removed. The bug is in the interface, not in encoding support. I can't select the standard KOI8-R encoding with the normal GUI. I have to use unofficial about:config menu and have to explain to my clients and co-workers how to use it. That's very inconvenient. I think users should not have to bother with low-level tweaks.
I don't know what you mean by "ru-mail". And where is that documented?

I think you have so far not given any real reason for not using UTF-8 (or windows-1251), besides you personally not liking it. Since UTF-8 has been the default for quite some time, and there have been no other complaints about that, you can't just dismiss the evidence that it works with "maybe bug".
(In reply to Magnus Melin from comment #9)

> I think you have so far not given any real reason for not using UTF-8 (or
> windows-1251), besides you personally not liking it.

How a coding can be liked or not? :-) That was a funny one, really. :-) Ok, here is the story:
There are many standards for encoding the Russian language. To prevent a mess, every serious organization has a set of rules for document exchange. System administrators put their signatures to these documents. The rules of all scientific and state institutes are based on recommendations of the institutes that were the first internet providers in our country. Kurchatov's Institute was the base for the first RUnet provider, called RELCOM. Its recommendations are the base for all standards in Russia. Some simplified documents on the standards can be found on its open pages. KOI8-R, for example, is briefly described here: http://old.relcom.ru/Services/Infoline/TechSupport/Application/General/encode/ (page in Russian). KOI8-R is the standard for mail in the Russian language. Other encodings are fine for other purposes (e.g. cp1251, koi8-r and utf are acceptable for web pages, cp866 is used for technical documentation, etc.). The situation with Russian charsets, which is not obvious to foreigners, is the reason why software made in the former USSR is more popular in Russia. There are many examples of such problems:
"The BAT!" e-mail program (made in Moldavia) is one of the most popular, for example, because all coding problems are well-known to developers.
Netscape-3 was never used for mail in Russia because it required additional fonts for koi8-r support.
Google Chrome browser will never be used in serious institutes, not only because of security problems but also because it lacks cp866 support, which prevents it from accessing tons of documentation.
So, "wild" users can use any coding they want (there is no problem with that), but KOI8-R is necessary for users that work with e-mail. That's not a question of "like the coding" or "not like the coding". :-) That's a question of working by rules or not working at all. SeaMonkey mail works fine with standard for ru-mail KOI8-R coding, but removing it from GUI to tweaks-menu makes customizing SM harder. That's all.
(In reply to jagular from comment #8)
> (In reply to Henri Sivonen (:hsivonen) from comment #7)
> > > How do you explain Russian Thunderbird getting away with defaulting to UTF-8
> > > for outgoing email (for quite some time already)?
> > Also note how Russian Thunderbird by default assumes that unlabeled incoming
> > email is windows-1251.
> 
> Can't comment TB. Maybe bug.

These choices for the Russian localization were made by the localizers, BTW, so it's not like someone from outside the Russian context imposed them.

> There are few state-standards for russian coding (iron standards called GOST
> in russian, that are used in state-institutes and similar structures).

That's irrelevant. For outgoing mail, the question is "Can recipients handle UTF-8?" Thunderbird being able to default to UTF-8 suggests yes. After all, UTF-8 is over 20 years old now and UTF-8-enabled software has been widely available for over a decade now. (For incoming email, the question is "Which guess is the most successful one for non-compliant incoming email that lacks encoding information?" The Thunderbird localizers seem to think that windows-1251 is a reasonable guess.)

It's irrelevant to email, but relevant to how seriously you should take GOST's legacy specs on encodings in general: the gost.ru Web site itself now uses UTF-8, and before it started using UTF-8, it used windows-1251 instead of KOI8-R.

Furthermore, if you look at the *standards* for email, the pre-UTF-8 IETF standard first standardized ISO-8859 part 1 through 10, so by that account, the pre-UTF-8 *email standard* would be ISO-8859-5, but of course we shouldn't go against what works based on an outdated RFC now that UTF-8 has been standardized and implemented. Likewise, outdated GOST standards should have no bearing on how outgoing email works when UTF-8 exists and works.

> I have to use unofficial about:config menu and have
> to explain to my clients and co-workers how to use it.

Why do you need to explain anything? Is someone concretely unable to read messages that your clients and co-workers send if they stick to the defaults?
I guess inability to read and understand explanations means that localization bug will not be fixed. Thanks for nice talk, children. :-)
Marking INCOMPLETE due to mere appeal to outdated (pre-UTF-8) national standards without an explanation of what specific receiving MUAs have trouble with emails sent with the default settings.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE
That's actually a shame that features used by 5% users are thrown away for the sake of other 95% "who would never need those".

This is exactly what happened to Mozilla Suite when FF and TB became popular. Following the same logic, people would never need any SeaMonkey once FF and TB are available.

As a Russian speaking user, I can say lots of people still prefer their mail sent out as KOI8-R. Particularly, here're the some of reasons those 5% would produce:

 - NNTP (Russian-speaking groups) and FidoNet still define KOI8-R as their standard charset, this encoding being a part of the netiquette. If one sends a CP1251-encoded message to FidoNet, the least what happens is he becomes a laughingstock (but most probably he gets banned).

 - There're people who still have ru_RU.KOI8-R as their system locale. There's no need to switch to UTF-8 when KOI8-R suffices. So why do I need to store my mail in a charset different from my locale's charset?

 - If I store my mail locally, sometimes it's just convenient to use 'grep' against my /var/spool/mail/<username>, w/o guessing the charset for each individual message or performing a conversion with 'iconv'.

 - When you keep your mail archive for the last 15+ years on your hard drive, 1 byte in KOI8-R vs 2 or 3 bytes in UTF-8 becomes a significant increase in size. Yes storage is cheap nowadays, but why would I want my mail archive expanded in the first place? Multiple old IT companies in Russia and CIS still keep their relational data in KOI8-R, and database vendors have little success persuading those companies to migrate to Unicode.

 - All MUAs popular among UNIX users (sylpheed, claws-mail, kmail, evolution) support KOI8-R directly, w/o the need for an end user to master the poorly documented internals like about:config. Why should SeaMonkey be any different?

 - Finally, I don't see any reason why windows-1251 was chosen over KOI8-R as a legacy 8-bit encoding. I must admit CP866 and ISO-8859-5 are rare indeed nowadays, but KOI still preserves a significant share.
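
The storage-size point in the list above can be checked with a small sketch. It assumes Python 3 and an arbitrary sample phrase (not taken from this bug). Cyrillic letters occupy one byte each in KOI8-R and two bytes each in UTF-8, while ASCII characters take one byte in both, so the real growth of an archive depends on how much of it is Cyrillic text rather than headers or attachments.

```python
# Illustrative only: the sample phrase is an assumption, not data from this bug.
sample = "Привет, мир"  # 9 Cyrillic letters, 1 comma, 1 space

koi8_bytes = sample.encode("koi8_r")  # single-byte legacy encoding
utf8_bytes = sample.encode("utf-8")   # 2 bytes per Cyrillic letter, 1 per ASCII char

print(len(sample), len(koi8_bytes), len(utf8_bytes))  # prints: 11 11 20
```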
(In reply to Henri Sivonen (:hsivonen) from comment #13)
> Marking INCOMPLETE due to mere appeal to outdated (pre-UTF-8) national
> standards without an explanation of what specific receiving MUAs have
> trouble with emails sent with the default settings.

Henri, one last question.

I've set

 - intl.charset.fallback.override,
 - mailnews.send_default_charset and
 - mailnews.view_default_charset

to KOI8-R in my about:config. Now my preferences dialog looks like "no encoding is selected" at all (see attachment #8601597 [details] and attachment #8601598 [details]).

Provided you have no intention of fixing the original issue,
is there anything that could be done to at least make the charset selection combo boxes display the value set via about:config rather than have no value selected?
(In reply to Andrey ``Bass'' Shcheglov from comment #14)
>  - NNTP (Russian-speaking groups) and FidoNet still define KOI8-R as their
> standard charset, this encoding being a part of the netiquette. If one sends
> a CP1251-encoded message to FidoNet, the least what happens is he becomes a
> laughingstock (but most probably he gets banned).

I thought public FidoNet was using CP866.
(In reply to [:Aleksej] from comment #18)
> 
> I thought public FidoNet was using CP866.

Negative.
See the rules: http://www.fido7.ru/roadmap.html:

> If native Cyrillic codetable on your computer is not the one used on the Net (koi8-r, RFC-1489),
> then you need a way to convert text to your codetable (translation proxy server, translating NNTP server)
> or have koi8 fonts and koi8 keyboard map installed of your computer.
(In reply to Andrey ``Bass'' Shcheglov from comment #19)
> (In reply to [:Aleksej] from comment #18)
> > I thought public FidoNet was using CP866.
> 
> Negative.
> See the rules: http://www.fido7.ru/roadmap.html:

Those rules are for the gateway.  Rules of echos that mention an encoding require CP866 (at least one also mentions KOI8 for the gateway).
(In reply to Andrey ``Bass'' Shcheglov from comment #14)
>  - NNTP (Russian-speaking groups) and FidoNet still define KOI8-R as their
> standard charset, this encoding being a part of the netiquette. If one sends
> a CP1251-encoded message to FidoNet, the least what happens is he becomes a
> laughingstock (but most probably he gets banned).

Becoming a laughingstock is not really a description of a technical compatibility problem.

How do other people even notice if someone is posting something other than KOI8-R to FidoNet? Thunderbird and SeaMonkey always indicate what encoding they actually used, so as long as receiving software a) supports MIME and b) supports the encoding that was indicated, the user of the recipient software should not have any reason to notice what encoding was actually used.

Do you mean that software commonly used to read FidoNet posts is either defective in the sense that it doesn't support receiving UTF-8 or defective in the sense of ignoring the charset parameter in the Content-Type header? (I.e. Thunderbird/SeaMonkey would need to send outgoing messages in KOI8-R to paper over the defects in the receiving software.)

Or do you mean that software commonly used to post on FidoNet is defective in the sense that it doesn't label KOI8-R messages as being encoded in KOI8-R? (I.e. Thunderbird/SeaMonkey would need to assume KOI8-R for unlabeled incoming messages to paper over defects in the sending software.)

In other words, is there an actual technical compatibility problem? If there is, let's focus on the nature of that problem.
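
The labeling argument above can be illustrated with a minimal sketch that uses Python's standard email package rather than any Thunderbird or SeaMonkey code. The helper names build_message and read_message and the sample body are assumptions made for this example. The receiver reads the charset parameter from the Content-Type header and decodes accordingly, so it recovers the same text whichever encoding the sender picked.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_message(text: str, charset: str) -> bytes:
    """Serialize a text/plain message whose body is encoded in `charset`."""
    msg = EmailMessage()
    msg["Subject"] = "charset test"
    # base64 keeps the serialized message 7-bit clean regardless of charset.
    msg.set_content(text, charset=charset, cte="base64")
    return msg.as_bytes()


def read_message(raw: bytes):
    """Return (declared charset, decoded body) using only the MIME label."""
    parsed = message_from_bytes(raw, policy=policy.default)
    # set_content() appends a trailing newline to the body; strip it again.
    return parsed.get_content_charset(), parsed.get_content().rstrip("\n")


body = "Привет, мир"  # hypothetical sample text
for charset in ("koi8-r", "utf-8"):
    declared, decoded = read_message(build_message(body, charset))
    print(declared, decoded == body)
```

A receiver that ignores the label, or a sender that omits it, breaks this round trip; those are the two defect classes the questions above distinguish.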

>  - There're people who still have ru_RU.KOI8-R as their system locale.
> There's no need to switch to UTF-8 when KOI8-R suffices.

The above is a terribly hostile attitude towards software developers and QA. Saying something as unreasonable as the above doesn't help making people believe that you are voicing a legitimate concern about the encoding prefs. Considering complexity, the reasonable path forward is for *nix systems all around the world to use the same encoding: UTF-8. Red Hat started defaulting to UTF-8 in 2002 and Debian in 2007. Users who are running with a non-UTF-8 locale have had plenty of time to migrate. If someone has failed to migrate and this causes them a problem in 2015, they have themselves to blame. Refusal to migrate and expecting other people to accommodate basically boils down to asking others to bear the cost of your habit.

>  - All MUAs popular among UNIX users (sylpheed, claws-mail, kmail,
> evolution) support KOI8-R directly, w/o the need for an end user to master
> the poorly documented internals like about:config. Why should SeaMonkey be
> any different?

Thunderbird and SeaMonkey support receiving KOI8-R email that's labeled as KOI8-R. As for outgoing email, supporting non-UTF-8 output involves complexity and it would be good to get rid of that complexity eventually. The MUAs you mention appear (I didn't actually test) to be able to receive UTF-8 email labeled as UTF-8 just fine, so sending UTF-8 to them should not be a problem.

Note that the Russian localizations of Thunderbird and SeaMonkey had chosen to default to UTF-8 for outgoing email long ago (before the start of Mercurial history in 2008)--way before en-US started defaulting to UTF-8 for outgoing email and before the number of options in the pref menu got reduced. For this reason, it was considered safe not to include Cyrillic legacy encodings in the menu for the pref for outgoing email when the menu was pruned in preparation of moving to UTF-8 only eventually.

>  - Finally, I don't see any reason why windows-1251 was chosen over KOI8-R
> as a legacy 8-bit encoding. I must admit CP866 and ISO-8859-5 are rare
> indeed nowadays, but KOI still preserves a significant share.

Two reasons:

 1) Since time before the start of the Mercurial history in 2008 the Russian localizations of Thunderbird and SeaMonkey have defaulted to windows-1251 as the encoding used for decoding unlabeled incoming email.

 2) The analogous menu in the browser has windows-1251, so when the menu was ported over, there was no obvious reason to change it on the point of the Cyrillic encoding given point #1.

For this to change, we'd need to see data that it's more common for people who expect Russian email to receive *unlabeled* KOI8-R than for them to receive *unlabeled* windows-1251.

(In reply to Andrey ``Bass'' Shcheglov from comment #17)
> Provided you have no intention of fixing the original issue,
> is there anything that could be done to at least make the charset selection
> combo boxes display the value set via about:config rather than have no value
> selected?

I am not a SeaMonkey front-end developer, but I find it unlikely that such a change would be a good use of front-end developer time, because making such a change would be more complicated than adding KOI8-R to either menu.
(In reply to Henri Sivonen (:hsivonen) from comment #21)
> (In reply to Andrey ``Bass'' Shcheglov from comment #14)

> Do you mean that software commonly used to read FidoNet posts is either
> defective in the sense that it doesn't support receiving UTF-8 or defective
> in the sense of ignoring the charset parameter in the Content-Type header?

Some popular FidoNet editors do not support UTF-8, although they should support transcoding between single-byte encodings, probably unless misconfigured (CP866 is the default in Russian FidoNet, because it is used in DOS, Windows 9x console and OS/2), and at least some are capable of specifying the encoding.  A Web search shows recent efforts of testing UTF-8 and describing possible use of UTF-7.

But it is the Internet-FidoNet gateway software that needs to be able to distinguish encodings of incoming *e-mail* messages to transcode correctly to CP866.  The widely used ones are probably few enough for it to be their problem.
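
As a rough sketch of the gateway step described above (not the code of any real gateway), transcoding comes down to decoding by the declared charset and re-encoding to CP866. The function name transcode_for_fidonet is hypothetical, and the sketch assumes the incoming message carries a correct charset label.

```python
def transcode_for_fidonet(raw: bytes, declared_charset: str, target: str = "cp866") -> bytes:
    """Hypothetical helper: decode by the declared charset, re-encode for the target."""
    text = raw.decode(declared_charset)
    # errors="replace" substitutes "?" for characters the target cannot represent.
    return text.encode(target, errors="replace")


word = "Привет"  # hypothetical sample text
from_koi8 = transcode_for_fidonet(word.encode("koi8_r"), "koi8-r")
from_utf8 = transcode_for_fidonet(word.encode("utf-8"), "utf-8")
print(from_koi8 == from_utf8 == word.encode("cp866"))  # prints: True
```

With a correct label the source encoding does not matter to the gateway; only characters outside CP866's repertoire are lost.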
(In reply to Henri Sivonen (:hsivonen) from comment #21)
> (In reply to Andrey ``Bass'' Shcheglov from comment #14)
> >  - There're people who still have ru_RU.KOI8-R as their system locale.
> > There's no need to switch to UTF-8 when KOI8-R suffices.
> 
> The above is a terribly hostile attitude towards software developers and QA.
> Saying something as unreasonable as the above doesn't help making people
> believe that you are voicing a legitimate concern about the encoding prefs.
> Considering complexity, the reasonable path forward is for *nix systems all
> around the world to use the same encoding: UTF-8. Red Hat started defaulting
> to UTF-8 in 2002 and Debian in 2007. 

Henri, what you're saying is a terribly hostile attitude towards end users. Additionally, it is simply not true. Red Hat does default to UTF-8 indeed, but a user can define custom locale settings using localedef(1). Speaking of Debian (8.0 Jessie, released a couple of months ago), it allows setting ru_RU.KOI8-R during installation (attachment #8615884 [details]), and this locale is a first-class citizen in Debian Linux (attachment #8615885 [details]). Since you're supporting legacy 8-bit charsets like CP1251 and CP1252 anyway, I don't see what extra complexity it takes to support more 8-bit charsets.

> Users who are running with a non-UTF-8
> locale have had plenty of time to migrate. If someone has failed to migrate
> and this causes them a problem in 2015, they have themselves to blame.
> Refusal to migrate and expecting other people to accommodate basically boils
> down to asking others to bear the cost of your habit.

This is just BS, if you pardon my French. This is a UNIX world, we have software reliably running for years, with migration steps as transparent as possible. I have nothing against switching to UTF-8, but this should be a choice made by end user, *not* software vendor. In a couple of years some other SeaMonkey engineer comes up with an insane idea of UTF-32 being a "reasonable" upgrade path and drops support for UTF-8 (not completely impossible: look at Lennart Poettering, systemd and kdbus) -- and users are forced to run through the migration hell once again.

> >  - All MUAs popular among UNIX users (sylpheed, claws-mail, kmail,
> > evolution) support KOI8-R directly, w/o the need for an end user to master
> > the poorly documented internals like about:config. Why should SeaMonkey be
> > any different?
> 
> Thunderbird and SeaMonkey support receiving KOI8-R email that's labeled as
> KOI8-R. As for outgoing email, supporting non-UTF-8 output involves
> complexity and it would be good to get rid of that complexity eventually.

You keep mentioning "complexity" w/o getting into any details about the nature of this complexity. And I really can't understand your point. And believe me, as a S/W engineer dealing with both Unicode and multiple Cyrillic charsets, I do have some minimum experience sufficient to say your argument is pointless (except that your wish to make your life easier is completely natural and understandable). Thunderbird/SeaMonkey is going to be the only MUA simple enough to support nothing but Unicode. Or did I forget Apple Mail?

This discussion is pointless.
Thank you for your time.
I'll spend mine re-building SM with my own patches applied -- this seems a more productive approach.
Look at any of the Fidonet newsgroups (fido7.ru.*).  The posts are all in KOI8 with no Content-Type: headers.  The volume's not huge but it's clearly still in use.
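
Why the fallback setting matters for such unlabeled posts can be shown with a short sketch, assuming Python 3 and a made-up sample word. The same bytes decode without error under both KOI8-R and windows-1251, but only the matching guess yields the original text.

```python
original = "Привет"               # hypothetical sample text
raw = original.encode("koi8_r")   # what an unlabeled KOI8-R post puts on the wire

guess_koi8 = raw.decode("koi8_r")  # fallback matches the sender's encoding
guess_1251 = raw.decode("cp1251")  # fallback is windows-1251 instead

print(guess_koi8 == original)  # True
print(guess_1251 == original)  # False: same bytes, different letters
```

Because the wrong guess raises no error, the reader just sees scrambled Cyrillic, which is why the choice of fallback charset is visible to users.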