Explicit unicode characters in RSS feed Subject aren't shown correctly

RESOLVED INVALID

Status

Product: MailNews Core
Component: Feed Reader
Priority: --
Severity: trivial
Status: RESOLVED INVALID
Reported: 12 years ago
Modified: 9 years ago

People

(Reporter: Ulrich Hobelmann, Assigned: Scott MacGregor)

Tracking

1.8 Branch

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

12 years ago
User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.1b2) Gecko/20060820 Camino/1.0+
Build Identifier: TB version 1.5.0.5 (20060719)

Ok, today's Slashdot RSS contains an article with the Subject containing "Poincaré", and it's written as "Poincar&#233;".  TB shows the latter.

Reproducible: Always

Steps to Reproduce:
1. Load an RSS whose Subject contains a unicode character as &#...;

Actual Results:  
The subject is shown verbatim, as &#...;

Expected Results:  
The Unicode character description should be decoded and displayed correctly.
(Reporter)

Updated

12 years ago
Version: unspecified → 1.5
Confirming. The slashdot post is not in the feed any more, but Mozilla Developer News currently has a post with the title "Quick note on status of 1.8 Branch…"
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Mac OS X 10.4 → All
Hardware: Macintosh → All

Comment 2

12 years ago
Dupe of bug 260078?
Created attachment 236353 [details]
Slashdot feed

Slashdot (Science) feed with the Poincaré article still in it.
Closer to the rotated inverse: that bug is about rendering entities as HTML when they should be displayed as plain text, this is about not rendering them.

However, this is invalid: if the feed had literally "&#233;" in it, then it would be the job of the XML parser to convert that to an actual character, and life would be good. Instead, they double-escape it as though it somehow made sense to use "&amp;#233;", so the XML parser converts it to "&#233;", and what happens next is up to your interpretation of the various RSS (under-)specs.

In the case of Atom, it's utterly unambiguous, and that's exactly why Atom exists. In the case of RSS 2.0, it's unspecified by ununderstanding, but the spec author has later said that he intended the content model for RSS 2.0 <title> elements to be plain text, so that <title>&amp;#233;</title> should display "&#233;" rather than an e with an acute. In the case of RSS 1.0, which is what Slashdot publishes, it's not quite as well specified as it should be, but <title> is described and specified exactly the same as <description>, and a primary author of the spec later specified <content:encoded> to allow for an alternative to <description> which *does* contain escaped HTML. That makes it a fair assumption that the intention was to specify both <description> and <title> as plain text.
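
To make the single-escaped versus double-escaped distinction concrete, here is a minimal sketch using Python's standard XML parser. It is not Thunderbird's code; the two sample feed fragments and the helper name title_text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragments, reduced to a single <item> each.
SINGLE_ESCAPED = "<item><title>Poincar&#233; proof</title></item>"
DOUBLE_ESCAPED = "<item><title>Poincar&amp;#233; proof</title></item>"


def title_text(xml_source: str) -> str:
    """Return the <title> text exactly as the XML parser hands it back."""
    return ET.fromstring(xml_source).findtext("title")


# The parser resolves exactly one level of escaping.
print(title_text(SINGLE_ESCAPED))  # character reference becomes the real character
print(title_text(DOUBLE_ESCAPED))  # &amp; becomes &, leaving the literal text "&#233;"
```

In the double-escaped case the parser has already done its one round of decoding; whether the remaining "&#233;" gets decoded again is the plain text versus HTML question.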

It's unfortunate that most of the earliest aggregators were web pages, written by people who didn't actually understand either HTML or XML, so they failed to re-escape the plain text coming out of their XML parsers before stuffing it into an HTML page. People then noticed that they could treat <title> as text/html rather than text/plain, which put RSS in the current state where you can't predict how many times to escape things to have your meaning make it through. But because Thunderbird, Firefox, and IE7 are all going to treat it as originally "specified", as text, there's at least some hope that people will stop pretending it's HTML. Or will switch to Atom, where they can say what they mean.
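
The two aggregator behaviours described above can be sketched as follows, assuming the title has already come out of the XML parser as the string "Poincar&#233;". The function names are hypothetical, and html.unescape is only a stand-in for what a browser would finally display.

```python
import html

# What the XML parser returns for a double-escaped title (assumed input).
parsed_title = "Poincar&#233;"


def embed_as_plain_text(title: str) -> str:
    """Treat the title as text/plain: re-escape it before putting it in HTML."""
    return html.escape(title)


def embed_as_html(title: str) -> str:
    """Treat the title as text/html: paste it into the page without re-escaping."""
    return title


def displayed(markup: str) -> str:
    """Rough stand-in for the browser decoding the markup once for display."""
    return html.unescape(markup)


# Plain-text reading: the user sees the literal "&#233;".
print(displayed(embed_as_plain_text(parsed_title)))
# HTML reading (the early web aggregators): the user sees the accented character.
print(displayed(embed_as_html(parsed_title)))
```

The same parsed string yields two different displays, which is why a publisher can't predict how many times to escape a title.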
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → INVALID
(Reporter)

Comment 5

12 years ago
(In reply to comment #4)
> if the feed had literally "&#233;" in it, then it
> would be the job of the XML parser to convert that to an actual character, and
> life would be good. Instead, they double-escape it as though it somehow made
> sense to use "&amp;#233;", so the XML parser converts it to "&#233;", and what
> happens next is up to your interpretation of the various RSS (under-)specs.

Look at the attachment: the feed literally *has* "&#233;" in it; there's no "&amp;#233;" escape in there.

So are you saying that Thunderbird is following the wrong/misunderstood interpretation of RSS?  That would still be a bug, in my book.
View - Page Source.

And no, I'm saying Thunderbird is following the right interpretation.

Updated

9 years ago
Component: RSS → Feed Reader
Product: Thunderbird → MailNews Core
Version: 1.5 → 1.8 Branch