Closed Bug 272875 Opened 20 years ago Closed 20 years ago

Thunderbird systematically displays RSS feeds in UTF8 making them unreadable

Categories

(MailNews Core :: Feed Reader, defect)

Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pascalc, Assigned: mscott)

Details

(Keywords: fixed-aviary1.0, intl)

Attachments

(2 files)

Thunderbird 1.0RC1, Winxp, new profile

1 add this feed to the list of feeds : http://www.chevrel.org/fr/carnet/rss.php

Reproducible: always
Actual result: accented Latin characters are garbled (é and è are displayed as Ã© and Ã¨...).
Expected result: display them with the right charset.

The feed is valid and correctly served as Content-Type: text/xml; charset=ISO-8859-1
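The charset the server declares can be read straight from that header. Below is a minimal Python sketch that uses the standard library's `email.message.Message` as a convenient header parser. This is not how Thunderbird parses headers, and `charset_from_content_type` is a hypothetical helper name.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Hypothetical helper: return the charset parameter of a Content-Type
    header value, lowercased, or `default` if there is none."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() unquotes the parameter and lowercases it.
    return msg.get_content_charset(default)


print(charset_from_content_type("text/xml; charset=ISO-8859-1"))  # iso-8859-1
print(charset_from_content_type("text/xml"))                       # None
```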

The problem is systematic with all non-English feeds: Thunderbird always
displays them as UTF-8, whatever the actual encoding is. This makes RSS support in
Thunderbird unusable outside the English-speaking world.
Flags: blocking-aviary1.0?
Flags: blocking-aviary1.0-L10N?
It's usually a problem with the feed.

We fixed a bunch of bugs with Japanese feeds to make the charsets look right.

The problem is that some feeds lie about their charset.
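A feed that lies about its charset can sometimes be spotted mechanically. The following is a hypothetical heuristic, not Thunderbird's logic: if a feed is labeled ISO-8859-1 but its non-ASCII bytes also form valid UTF-8, the label is suspect. It can only raise suspicion, never prove a mislabel, because some genuine Latin-1 byte sequences (for example the two characters "Ã©") happen to be valid UTF-8 as well.

```python
LATIN1_ALIASES = {"iso-8859-1", "iso8859-1", "latin-1", "latin1"}


def looks_mislabeled_as_latin1(raw: bytes, declared_charset: str) -> bool:
    """Hypothetical heuristic: True if a feed declared as ISO-8859-1
    contains non-ASCII bytes that also decode cleanly as UTF-8."""
    if declared_charset.lower() not in LATIN1_ALIASES:
        return False
    if raw.isascii():
        # Pure ASCII reads the same under both charsets; nothing to flag.
        return False
    try:
        raw.decode("utf-8")  # strict mode raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True


print(looks_mislabeled_as_latin1("été".encode("utf-8"), "ISO-8859-1"))       # True
print(looks_mislabeled_as_latin1("été".encode("iso-8859-1"), "ISO-8859-1"))  # False
```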
I checked the feed in a feed validator and checked the charset sent by the server,
and everything says that it is sent as ISO-8859-1.

This feed is generated by Dotclear, the main "geek" blog system in France, used
by all Mozilla fans and several major Mozilla blogs (the Spanish version of
MozillaZine, Daniel Glazman, Tristan Nitot, blogzinet, me...). It will also be
used for the Mozilla Europe RSS feed.

This means the bug will be highly visible in the European Mozilla community
if we ship 1.0 with it. I have sent this bug report to the Dotclear developers
as well so that they can check whether the problem is on their end.
Something weird is definitely going on here. I'm still debugging...

Pascal, can you open the local mail folder called "Carnet web de pascal", which
contains the feed articles, in an editor? Look at some of the message bodies. Do
those characters look like UTF-8 or ISO-8859-1?

Please let me know as soon as you can. Thanks!
I can fix this for this French feed by getting rid of the last
ConvertFromUnicode call before we write the text to the mail folder, but then we
end up breaking all of the Japanese test feeds we use, which need this
call... Very strange.
I just looked in my text editor and the characters look like UTF-8.
I should add that if you download the file directly through your browser and
save it to your hard drive, the text is in ISO-8859-1, not UTF-8. It's
Thunderbird that first converts the file to UTF-8 for disk storage and then
tries to display it as ISO-8859-1.
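The two steps described above are enough to reproduce exactly the garbling in the original report. A small Python sketch of that sequence, illustrating the mechanism only (this is not Thunderbird code):

```python
original = "été, père"

# Step 1: the article text is written to disk as UTF-8 bytes.
stored = original.encode("utf-8")

# Step 2: those bytes are read back using the feed's declared charset.
displayed = stored.decode("iso-8859-1")
print(displayed)  # Ã©tÃ©, pÃ¨re

# Nothing is lost: reversing step 2 recovers the original text.
assert displayed.encode("iso-8859-1").decode("utf-8") == original
```

Each accented character becomes two bytes in UTF-8 (é is C3 A9), and ISO-8859-1 maps every byte to its own character, which is why each accent turns into a pair such as "Ã©".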

Logically, it should either not transform anything at all and display each feed
with its original charset, or transform everything to UTF-8 regardless of the
site, display it as UTF-8, and discard the original charset.
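Both options share one invariant: the bytes on disk and the charset used to read them must always travel together. A sketch of the two options with hypothetical helper names (`store_passthrough`, `store_normalized`, `display`), not Thunderbird's actual code:

```python
from __future__ import annotations


def store_passthrough(raw: bytes, declared_charset: str) -> tuple[bytes, str]:
    """Option 1: keep the feed's original bytes and its original charset label."""
    return raw, declared_charset


def store_normalized(raw: bytes, declared_charset: str) -> tuple[bytes, str]:
    """Option 2: transcode to UTF-8 and label the stored copy as UTF-8."""
    text = raw.decode(declared_charset)
    return text.encode("utf-8"), "utf-8"


def display(stored: bytes, label: str) -> str:
    """Always decode stored bytes with the label saved alongside them."""
    return stored.decode(label)


feed_bytes = "été".encode("iso-8859-1")
for store in (store_passthrough, store_normalized):
    print(store.__name__, display(*store(feed_bytes, "iso-8859-1")))

# Mixing the options (normalized bytes, original label) gives the
# behavior described in the previous comment.
mixed = display(store_normalized(feed_bytes, "iso-8859-1")[0], "iso-8859-1")
print(mixed)  # Ã©tÃ©
```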
I confirm that having the Mozilla-europe.org news feeds broken in our product is
not going to bring us any positive coverage :-)
Any patch I've come up with so far that fixes this one particular feed
ends up breaking all of my other I18N test cases, including UTF-8 encoded
Japanese feeds like Slashdot and my Central European test feeds like:

http://www.gazeta.pl/pub/rss/wiadomosci.xml

Tristan, does your comment imply that you are going to help us debug this? That's
awesome! Thanks for the offer to help.
OS: Windows XP → All
Hardware: PC → All
Attached patch: start of a fix
This fixes this particular feed without regressing my other European feed test
cases. 

Testing matrix coming up next.
This document outlines the test feeds I've been using to test the various
combinations. Things get really tricky because there are lots of variables in
the matrix: RSS 1.0, RSS 2.0 and Atom feeds. For each of these feeds you
could be viewing the article text, a web page, just a short article summary, or a
summary that is just the title.
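The number of cases grows multiplicatively with each variable. A quick Python enumeration of the matrix; the format and content axes come from the comment above, while the charset axis is an assumption built from charsets mentioned elsewhere in this bug:

```python
from itertools import product

# Axes named in the comment above.
FORMATS = ["RSS 1.0", "RSS 2.0", "Atom"]
CONTENT_KINDS = ["article text", "web page", "short summary", "title only"]
# Assumed axis: charsets that come up in this bug; extend as feeds are found.
CHARSETS = ["UTF-8", "ISO-8859-1", "EUC-JP"]

matrix = list(product(FORMATS, CONTENT_KINDS, CHARSETS))
print(len(matrix))  # 36
for fmt, kind, charset in matrix[:3]:
    print(f"{fmt:8} | {kind:14} | {charset}")
```

Even with only three charsets, that is 36 combinations, each needing a real test feed.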

That's a lot of combinations and I haven't found feeds to test all of them yet.


Good news: with this patch, this feed and several others start working. It
hasn't regressed any of the other testing feeds I've been using. 

What still doesn't work:
Feeds with Asian characters that are NOT encoded as UTF-8 don't work with this
patch, but they didn't work before this patch either. Feeds like JA Slashdot,
which are encoded as UTF-8, work both before and after this patch.
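To see why non-UTF-8 Asian feeds are a separate problem, take EUC-JP (the encoding named later in this bug) as a representative example. Its bytes are generally not valid UTF-8, so any step that assumes UTF-8 either fails or substitutes replacement characters. A minimal Python illustration using the standard `euc_jp` codec:

```python
text = "日本語"
euc_bytes = text.encode("euc_jp")

# Decoding with the right codec round-trips cleanly.
assert euc_bytes.decode("euc_jp") == text

# Decoding the same bytes as UTF-8 fails in strict mode...
try:
    euc_bytes.decode("utf-8")
    print("decoded as UTF-8 (unexpected)")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# ...and in lenient mode produces U+FFFD replacement characters instead.
print(euc_bytes.decode("utf-8", errors="replace"))
```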

I also need I18N Atom feeds to round out the testing of these changes. I don't
have any non-ASCII Atom feeds, so there could still be problems there.
If it helps, the RSS feed is also available in Atom format:
http://www.chevrel.org/fr/carnet/atom.php
Keywords: intl
Thanks. This particular Atom feed works fine without the patch.

Hopefully it will still work with the patch.
Atom feeds worked for ISO-8859-1 before and after this patch, so that's good.
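For context, when an XML parser is handed a feed as raw bytes, it decodes them according to the document's own encoding declaration. A minimal sketch with Python's standard `xml.etree.ElementTree` on a made-up, namespace-free Atom-like snippet; it illustrates XML decoding in general and makes no claim about how Thunderbird's Atom code path works:

```python
import xml.etree.ElementTree as ET

# A tiny Atom-like document stored as ISO-8859-1 bytes, as its
# declaration says ("é" is the single byte 0xE9 here).
raw = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<feed><title>été</title></feed>"
).encode("iso-8859-1")

# Given bytes, the parser uses the declaration to decode them,
# so the resulting string is correct Unicode.
root = ET.fromstring(raw)
title = root.find("title").text
print(title)  # été
```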
*** Bug 263998 has been marked as a duplicate of this bug. ***
Comment on attachment 167759 [details] [diff] [review]
start of a fix

Time to start getting some extra eyes on this.

Read my last few comments for more info.
Attachment #167759 - Flags: superreview?(bienvenu)
Attachment #167759 - Flags: superreview?(bienvenu) → superreview+
Status: NEW → ASSIGNED
Target Milestone: --- → Thunderbird1.0
Fixed on branch and trunk.

It would be nice if we could figure out the problem with EUC-JP encoded feeds, though...
Flags: blocking-aviary1.0?
Flags: blocking-aviary1.0-L10N?
Keywords: fixed-aviary1.0
Pascal, can you confirm that things are working now on your end?

ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-0.9/

Thanks!
It works now :)
The French feed in comment 0 and the Polish feed in comment 8 both look good
using 200412050x-0.9 on Linux FC2 and Mac OS X 10.3.6.
This may be a duplicate of bug 253807
Since the patch for this was checked into both aviary1.0 and trunk, marking
fixed. But do reopen if this still needs additional patching on the trunk.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Component: RSS → Feed Reader
Product: Thunderbird → MailNews Core
Target Milestone: Thunderbird1.0 → ---