Closed Bug 272875 Opened 20 years ago Closed 20 years ago

Thunderbird systematically displays RSS feeds in UTF8 making them unreadable

Categories

(MailNews Core :: Feed Reader, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pascalc, Assigned: mscott)

References

Details

(Keywords: fixed-aviary1.0, intl)

Attachments

(2 files)

Thunderbird 1.0RC1, WinXP, new profile.
1. Add this feed to the list of feeds: http://www.chevrel.org/fr/carnet/rss.php
Reproducible: always
Actual result: all Latin characters are displayed as é, è...
Expected result: display them with the right charset.
The feed is valid and correctly served as Content-Type: text/xml; charset=ISO-8859-1. The problem is systematic with all non-English feeds: Thunderbird always displays them as UTF-8, whatever the encoding really is. This makes RSS support in Thunderbird unusable outside of the English-speaking world.
Flags: blocking-aviary1.0?
Flags: blocking-aviary1.0-L10N?
It's usually a problem with the feed. We fixed a bunch of bugs with Japanese feeds to make the charsets look right. The problem is that some feeds lie about their charset.
I checked the feed in a feed validator and checked the charset sent by the server, and everything says that it is sent as ISO-8859-1. This feed is generated by Dotclear, the main "geek" blog system in France, used by all Mozilla fans and several major Mozilla blogs (Spanish version of Mozillazine, Daniel Glazman, Tristan Nitot, blogzinet, me...). It will also be used for the Mozilla Europe RSS feed. This means the bug will be highly visible in the European Mozilla community if we ship 1.0 with it. I am sending this bug report to the Dotclear developers as well so that they can see whether the problem is on their end.
Something weird is definitely going on here. I'm still debugging. Pascal, can you open the local mail folder called Carnet web de pascal, which contains the feed articles, in an editor? Look at some of the message bodies. Do those characters look like UTF-8 or ISO-8859-1? Please let me know as soon as you can. Thanks!
I can fix this for this French feed by getting rid of the last ConvertFromUnicode call before we write the text to the mail folder, but then we end up breaking all of the Japanese test feeds that we use, which need this call... Very strange.
I just looked in my text editor and the characters look like UTF-8.
I should add that if you download the file directly through your browser and save it on your hard drive, the text is in ISO-8859-1 and not UTF-8; it's Thunderbird that first transforms the file to UTF-8 for disk storage and then tries to display it as ISO-8859-1. Logically, it should either not transform anything at all and display each feed with its original charset, or transform and display everything as UTF-8 whatever the site and discard the original charset.
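To make the mechanism described above concrete, here is a minimal Python sketch of the same round trip. It is only an illustration of the encoding mismatch, not Thunderbird's code; the variable names are made up for this example, and the sample string is taken from the "actual result" in the original report.

```python
# Hypothetical illustration of the mismatch described in this bug;
# none of these names come from the Thunderbird source.
original = "é, è"

# What the server sends: ISO-8859-1 bytes, one byte per accented letter.
wire_bytes = original.encode("iso-8859-1")

# What ends up in the mail folder according to this thread:
# the same text re-encoded as UTF-8 (two bytes per accented letter).
stored_bytes = wire_bytes.decode("iso-8859-1").encode("utf-8")

# Reading the stored UTF-8 bytes with the feed's original charset label
# produces the garbled characters from the report.
garbled = stored_bytes.decode("iso-8859-1")

# Either consistent policy round-trips cleanly:
keep_original = wire_bytes.decode("iso-8859-1")  # never transform, keep the label
all_utf8 = stored_bytes.decode("utf-8")          # transform, and relabel as UTF-8

print(garbled)
print(keep_original == original, all_utf8 == original)
```

The point is that the bytes and the charset label have to change together: storing UTF-8 while still displaying as ISO-8859-1 is what turns each accented letter into two characters.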
I confirm that having the Mozilla-europe.org news feeds broken in our product is not going to bring us some positive coverage :-)
Any patch I've come up with so far that fixes this one particular feed ends up breaking all of my other I18N test cases, including UTF-8 encoded Japanese feeds like Slashdot and my Central European test feeds like: http://www.gazeta.pl/pub/rss/wiadomosci.xml Tristan, does your comment imply that you are going to help us debug this? That's awesome! Thanks for the offer to help.
OS: Windows XP → All
Hardware: PC → All
Attached patch: start of a fix
This fixes this particular feed without regressing my other European feed test cases. Testing matrix coming up next.
This document outlines the test feeds I've been using to test the various combinations. Things get really tricky because there are lots of variables in the matrix: RSS 1.0, RSS 2.0 and Atom feeds. For each one of these feeds you could be viewing: article text, a web page, just a short article summary, or a summary that is just the title. That's a lot of combinations and I haven't found feeds to test all of them yet. Good news: with this patch, this feed and several others start working. It hasn't regressed any of the other testing feeds I've been using. What still doesn't work: feeds with Asian characters that are NOT encoded as UTF-8 don't work with this patch. But they didn't work before this patch either. Feeds like JA Slashdot, which are encoded as UTF-8, work before and after this patch. I also need I18N Atom feeds to round out the testing of these changes. I don't have any non-ASCII Atom feeds, so there could still be problems there.
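For a sense of how quickly that matrix grows, here is a small Python sketch that enumerates the format and content-view combinations listed above. The list names are invented for this illustration, and the sketch leaves out the charset dimension (UTF-8, ISO-8859-1, and so on), which would multiply the count further.

```python
from itertools import product

# Hypothetical labels for the two axes named in the comment above.
feed_formats = ["RSS 1.0", "RSS 2.0", "Atom"]
content_kinds = ["article text", "web page", "short summary", "title-only summary"]

# Every feed format paired with every kind of content a user might view.
test_matrix = list(product(feed_formats, content_kinds))

for fmt, kind in test_matrix:
    print(f"{fmt:8} | {kind}")

print(len(test_matrix), "combinations before encodings are considered")
```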
If it can help, the RSS feed is also generated in the Atom format: http://www.chevrel.org/fr/carnet/atom.php
Keywords: intl
Thanks. This particular Atom feed works fine without the patch. Hopefully it will still work with the patch.
Atom feeds worked for ISO-8859-1 before and after this patch. So that's good.
*** Bug 263998 has been marked as a duplicate of this bug. ***
Comment on attachment 167759 (start of a fix): Time to start getting some extra eyes on this. Read my last few comments for more info.
Attachment #167759 - Flags: superreview?(bienvenu)
Attachment #167759 - Flags: superreview?(bienvenu) → superreview+
Status: NEW → ASSIGNED
Target Milestone: --- → Thunderbird1.0
Fixed on branch and trunk. It would be nice if we could figure out the problem with EUC-JP encoded feeds, though...
Flags: blocking-aviary1.0?
Flags: blocking-aviary1.0-L10N?
Keywords: fixed-aviary1.0
Pascal, can you confirm that things are working now on your end? ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-0.9/ Thanks!
It works now :)
The French feed in comment 0 and the Polish feed in comment 8 both look good using 200412050x-0.9 on Linux FC2 and Mac OS X 10.3.6.
This may be a duplicate of bug 253807
Since the patch for this was checked into both aviary1.0 and trunk, marking fixed. But do reopen if this still needs additional patching on the trunk.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Component: RSS → Feed Reader
Product: Thunderbird → MailNews Core
Target Milestone: Thunderbird1.0 → ---