Closed Bug 282908 Opened 20 years ago Closed 10 years ago

character encoding of linked RSS content always falls back to UTF-8

Categories

(MailNews Core :: Feed Reader, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: mkmelin, Unassigned)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

In a feed where RSS article content is linked, the linked content seems to always
show up UTF-8 encoded. You can test using
http://www.ficora.fi/suomi/tietoturva/rss/varoitukset.xml

An example link from this is
http://www.ficora.fi/suomi/tietoturva/varoitukset/varoitus-2005-17.htm. This
shows up correctly in Firefox (which recognizes it as ISO-8859-1). When the same
article is displayed inside Thunderbird the encoding is wrong. Tested with a
recent nightly as well as TB 1.0.

Bug 272875 is similar but seems to deal only with inline feed content, this one
is about linked content. 


Reproducible: Always

Steps to Reproduce:
1. Add feed http://www.ficora.fi/suomi/tietoturva/rss/varoitukset.xml
2. Click on an article
3. Open the link in Firefox and compare


Actual Results:  
The article is displayed in UTF-8.

Expected Results:  
Should be displayed in ISO-8859-1 (like in Firefox).
(In reply to comment #0)
> In a feed where RSS article content is linked, the linked content seems to
> always show up UTF-8 encoded.

By "linked content" I'm assuming you mean that, in the Subscription edit field, 
the checkbox "Show the article summary..." is turned off -- such that the page 
content is brought in when the item is displayed.


> An example link from this is
> http://www.ficora.fi/suomi/tietoturva/varoitukset/varoitus-2005-17.htm. This
> shows up correctly in Firefox (which recognizes it as ISO-8859-1). When the
> same article is displayed inside Thunderbird the encoding is wrong.

Yes, I see this.  The UTF-8 encoding being used by TB is implemented to apply to 
the "message" item that is stored in the feed.  Those "messages" are stored as 
UTF-8 as a matter of course.  If the subject info originally was encoded in 
ISO-8859-1, it gets converted to UTF-8 for storage in the message item's 
Subject header.

The "linked content" -- that is, the actual web page -- is brought in using an 
<iframe>.  The message-item's encoding does appear to be applied to the <iframe> 
-- if the served page does not specify any character set on its own, as the 
example site does not.
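
The fallback described above can be sketched as follows. This is only an illustration of the observed behaviour, not Thunderbird's actual code: `resolve_charset` and its parameters are hypothetical names, and `inherited` stands in for the message item's encoding (UTF-8 here).

```python
import re

# Hypothetical helper illustrating the observed fallback order;
# it is not taken from the Thunderbird source.
def resolve_charset(content_type, body, inherited):
    """Pick a charset: HTTP header first, then a <meta> tag, then the inherited one."""
    if content_type:
        m = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type, re.IGNORECASE)
        if m:
            return m.group(1).lower()
    # Only inspect the start of the document, where a <meta> declaration is expected.
    m = re.search(rb'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', body[:1024], re.IGNORECASE)
    if m:
        return m.group(1).decode("ascii").lower()
    # Nothing declared by the page: the embedding document's encoding wins.
    return inherited
```

With a page that declares nothing, as the example site does, `resolve_charset("text/html", page_bytes, "utf-8")` returns the inherited `"utf-8"`, which matches the wrong rendering reported here.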

Furthermore, changing the encoding via the menu does not change the display of 
the <iframe>'s content.

Seen with TB 1.0 and 1.0+20050218.

Note that some text of the example RSS feed -- e.g. the links in the left-hand 
sidebar -- is displayed correctly.  This is because the HTML source for 
that text uses entities for the diacriticals.
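
A small sketch of why the entity-encoded text survives while the raw bytes do not. The sample word is an assumption chosen for illustration, not text taken from the example site.

```python
import html

# Sample word (an assumption for illustration) encoded as an ISO-8859-1 page would send it.
latin1_bytes = "päivä".encode("iso-8859-1")   # b'p\xe4iv\xe4'

# Decoding those bytes as UTF-8 fails on every 0xE4 byte, giving replacement characters.
wrong = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the charset the page was written in gives the intended text.
right = latin1_bytes.decode("iso-8859-1")

# Entities are plain ASCII, so they decode identically under either charset.
entity = html.unescape(b"p&auml;iv&auml;".decode("utf-8"))
```

Here `wrong` contains U+FFFD in place of each "ä", while `right` and `entity` both equal "päivä".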


Compare to (for example) this feed:
  http://www.ch1webdesign.com/rss/
which serves the pages as ISO-8859-1.  The items in Thunderbird are still 
encoded in UTF-8, but the "linked content" displays correctly because those 
pages are served correctly.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Hardware: PC → All
Version: unspecified → Trunk
Summary: character encoding of linked RSS content is always UTF-8 → character encoding of linked RSS content always falls back to UTF-8
> The "linked content" -- that is, the actual web page -- is brought in using an 
> <iframe>.  The message-item's encoding does appear to be applied to the <iframe> 
> -- if the served page does not specify any character set on its own, as the 
> example site does not.

I can confirm that.
 
> Furthermore, changing the encoding via the menu does not change the display of 
> the <iframe>'s content.
> 
> Seen with TB 1.0 and 1.0+20050218.

Still present in TB 1.5.

> The items in Thunderbird are still 
> encoded in UTF-8, but the "linked content" displays correctly because those 
> pages are served correctly.

I can confirm that, too.

But I think there should be the possibility to change the encoding via the menu. 
*** Bug 357599 has been marked as a duplicate of this bug. ***
It certainly seems like a bug to me, because it ignores the user character encoding preferences.  If the iframe is presenting problems then perhaps this is yet more evidence that displaying all HTML content within an iframe is a design decision that should be revisited.
QA Contact: rss
I can confirm the bug in 2.0.0.9+20071031 and in the latest nightly build 3.0a1pre (2008020403).
Sample link doesn't show this anymore due to site redesign.
Assignee: mscott → nobody
This feed can be used as a reference as well:
http://aristo4bgu.bgu.ac.il/weboard/rss1.aspx?s=1jsm91r31qBV5TgxEwf1/w==
Component: RSS → Feed Reader
Product: Thunderbird → MailNews Core
Magnus, is this still a problem? The iframe has not been used to load the web page for a very long time. If the correct encoding is served, it should work fine; otherwise the default from the settings applies in the few outlier cases.
Haven't seen it, and seems we have no testcase anymore, so WFM.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME