Closed Bug 252147 Opened 20 years ago Closed 18 years ago

Livemark threads should render character entities

Categories

(Firefox :: Bookmarks & History, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED INVALID

People

(Reporter: glc_bugs, Assigned: vlad)

References

Details

(Keywords: polish)

Attachments

(3 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040715 Firefox/0.9.1+
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040715 Firefox/0.9.1+

Character entities are displayed expanded in Livemark threads.

Reproducible: Always
Steps to Reproduce:
1. Add a livemark (to Slashdot, for example, since it often has article titles
with ampersands).
2. In the Bookmarks menu, look at a thread title in the livemark containing an
ampersand.
Actual Results:  
The ampersand is spelled out as the character entity: &amp;

Expected Results:  
The ampersand should be rendered as an ampersand character: &
Attached file Slashdot RSS from 20040902 with an ampersand
Keywords: polish
This occurs for numeric entities like quote characters and dashes too (e.g.
&#8212;). Severity should be raised to at least minimal; this is very
distracting behaviour.
Severity: trivial → normal
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: blocking-aviary1.0?
Flags: blocking-aviary1.0? → blocking-aviary1.0+
Attached file Asa's blog 20040924
This also occurs for html tags, but I don't know if that should be supported.
Asa's blog used <s></s> in a title today.
Problem with this post over at djst's nest too:
http://weblogs.mozillazine.org/djst/archives/006473.html
I'll try to get to this for 1.0, but it shouldn't block 1.0.
Flags: blocking-aviary1.0+ → blocking-aviary1.0-
*** Bug 260356 has been marked as a duplicate of this bug. ***
*** Bug 263163 has been marked as a duplicate of this bug. ***
*** Bug 264359 has been marked as a duplicate of this bug. ***
Same problem occurs with bookmarks, it seems.

Visit http://www.yaml.org/spec/ and bookmark the page. It shows up as "YAML
Ain&#39;t Markup Language (YAML) 1.0" in the bookmarks, although the title
renders fine otherwise.
comment 9 WFM
Ignore comment 9, the problem does not occur in the current version of Firefox
(might have been an issue with earlier versions). Sorry for the interruption.
*** Bug 266244 has been marked as a duplicate of this bug. ***
I have 1.0PR and this bug is still happening in this version; a screenshot is
available if requested.
This also happens often with French titles containing accents (e.g. é, è, à,
ç...)

I have 1.0 and the bug is still there

Here is a good example:
http://express.xstreamsoft.com/dyn/rss/6561e0b0-a249-4246-9fff-b1a9a20224a3-fr.aspx
(In reply to comment #14)
> http://express.xstreamsoft.com/dyn/rss/6561e0b0-a249-4246-9fff-b1a9a20224a3-fr.aspx

It seems to me that the link and attachment (id=157724) are using
*double* entity references.

Invalid?
Still occurring for me in:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

while browsing the live bookmarks from slashdot.
The RSS files from Slashdot and mozillazine (djst's) are malformed.
IMHO, this bug is WFM or INVALID.
However, maybe livemarks should render, say, &amp;amp;#45;,
though I have no idea about e.g. &amp;amp;amp;amp;amp;amp;amp;amp;amp;

I wonder why they feed malformed data.
What we need now is RSS evangelism.
Attached file Test RDF (obsolete) —
Attached file RDF testcase version 2
Fixes the problem with the first testcase. Just add this live bookmark
manually.
Attachment #166937 - Attachment is obsolete: true
Attachment #166938 - Attachment is obsolete: true
Torisugari is right. My testcase renders perfectly fine as a live bookmark. The
question is: should we support poorly constructed feeds if they are so common?
Maybe evangelism to common feed generation tools would be a good route.

Another question: why does the XML source look correct when Firefox or IE does
the "pretty print" thing, but not when I download and view the raw source?
Evangelism is the way to go eventually, but that's an uphill struggle. For now,
given the fact that the code to convert entities into symbols is already there
somewhere in the codebase, I'd say it should be put in.
*** Bug 278158 has been marked as a duplicate of this bug. ***
*** Bug 284102 has been marked as a duplicate of this bug. ***
(In reply to comment #17)
> RSS files in slashdot and mozillazine(djst's) are malformed.
> IMHO, this bug is WFM or INVALID.

No, they aren't malformed, because there's no definition of how they should be
formed. This ambiguity is one of the primary reasons why Atom exists.

Consider one of the most famous examples, Tim Bray's post when he left the W3C
Technical Architecture Group, entitled "</TAG>". Because the spec strongly
implies that the output handed to you by an XML parser from the <description> is
HTML source as-is, some people believe the output from <title> is as well, and
thus will treat <title>&lt;/TAG></title> as an attempt to insert a </TAG> into
the output HTML of an aggregator, while others believe the output of <title> is
plain text (since it started that way, and was never explicitly changed), and if
they were outputting HTML would re-escape the < that their XML parser unescaped.
The former belief is more widely held, so usually when you encounter an
&amp;amp; in a <title>, it means that the author wanted an &amp; put into HTML
output, and an & displayed.
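To make the two readings above concrete, here is a minimal Python sketch. It is not Firefox code, and the helper names (parsed_title, emit_as_plain_text, emit_as_html) are made up for illustration. It parses a <title> with the standard library's XML parser, then shows what each camp would write into an aggregator's HTML output, and approximates what a browser would display by unescaping that HTML once.

```python
import html
import xml.etree.ElementTree as ET


def parsed_title(xml_source: str) -> str:
    """Return the string an XML parser hands back for a <title> element."""
    return ET.fromstring(xml_source).text or ""


def emit_as_plain_text(title: str) -> str:
    """'Title is plain text' reading: re-escape before writing into HTML."""
    return html.escape(title, quote=False)


def emit_as_html(title: str) -> str:
    """'Title is HTML source' reading: write the parsed string in as-is."""
    return title


# Tim Bray's title and a double-escaped ampersand, as they appear in the feed.
tag_title = parsed_title("<title>&lt;/TAG></title>")
amp_title = parsed_title("<title>AT&amp;amp;T</title>")

# What ends up in the aggregator's HTML under each reading.
tag_plain = emit_as_plain_text(tag_title)  # escaped text, shown literally
tag_html = emit_as_html(tag_title)         # a stray closing tag in the page

# Approximate the displayed text by decoding the HTML once.
amp_shown_plain = html.unescape(emit_as_plain_text(amp_title))
amp_shown_html = html.unescape(emit_as_html(amp_title))

print(repr(tag_title), repr(tag_plain), repr(tag_html))
print(repr(amp_title), repr(amp_shown_plain), repr(amp_shown_html))
```

Under the plain-text reading the double-escaped title is displayed with a visible "&amp;", which is what this bug reports; under the HTML reading it is displayed as a bare "&".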
*** Bug 284332 has been marked as a duplicate of this bug. ***
*** Bug 284740 has been marked as a duplicate of this bug. ***
At least in the RSS 2.0 spec, it is explicitly mentioned that specific characters
should be doubly escaped via entities: see
http://blogs.law.harvard.edu/tech/rss#hrelementsOfLtitemgt, and the associated
examples http://blogs.law.harvard.edu/tech/encodingDescriptions (especially 3
and 4).
The use of CDATA in example 4 should also be supported -- I haven't yet tested
if it does, though.
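For anyone who wants to check the CDATA case, here is a small Python sketch using the standard library's XML parser. It says nothing about how the livemark code itself behaves, and the sample titles are invented for illustration. It only shows what a conforming parser returns for escaped text versus a CDATA section.

```python
import xml.etree.ElementTree as ET

# Single-escaped text: the parser decodes the predefined entity.
escaped = ET.fromstring("<title>AT&amp;T</title>").text

# CDATA section: the content is taken literally, so a raw "&" is allowed.
cdata = ET.fromstring("<title><![CDATA[AT&T]]></title>").text

# Entity reference inside CDATA: NOT decoded, it stays as literal text.
cdata_with_entity = ET.fromstring("<title><![CDATA[AT&amp;T]]></title>").text

print(repr(escaped), repr(cdata), repr(cdata_with_entity))
```

So a feed that wraps already-escaped HTML in CDATA hands the consumer the same string as a feed that double-escapes it.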
*** Bug 312890 has been marked as a duplicate of this bug. ***
*** Bug 322785 has been marked as a duplicate of this bug. ***
*** Bug 325156 has been marked as a duplicate of this bug. ***
*** Bug 333360 has been marked as a duplicate of this bug. ***
The RSS 2.0 spec explicitly says to entity-escape HTML in the *description* element, not in the title. It says nothing about HTML in the title, going clear back to RSS 0.91, which says of everything "no HTML is allowed."

While the RSS 2.0 spec is frozen in its ambiguous state, the author of the spec has said that he thinks the correct way to interpret "description is entity-escaped HTML" without a corresponding "title is entity-escaped HTML" is that title is not entity-escaped HTML, and thus that the correct way to display |<title>AT&amp;amp;T</title>| is as "AT&amp;T". That is exactly how IE7 is going to display it, judging by the current preview.

That should help our evangelizing of either using Atom (which we will handle correctly in 2.0) or producing correct RSS: use real characters or numeric character references instead of double-escaped HTML character entity references, don't double-escape ampersands, and never use a less-than character in any form, because it's impossible to avoid the broken results in broken clients.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → INVALID
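As a rough illustration of the "producing correct RSS" advice in the comment above, here is a Python sketch of the generator side. It is only an example under assumptions: the title text is invented, and a real feed tool would build the whole document, not a lone <title>. The point is that letting an XML serializer escape the text exactly once yields a title that any conforming parser reads back unchanged.

```python
import xml.etree.ElementTree as ET

# The text the author actually wants readers to see.
wanted = "AT&T über alles"

title = ET.Element("title")
title.text = wanted  # assign the real characters; do not pre-escape

# Serialize as Unicode: the "&" is escaped once, "ü" stays a real character.
as_unicode = ET.tostring(title, encoding="unicode")

# Serialize with the default ASCII encoding: non-ASCII characters become
# numeric character references, which every XML parser understands.
as_ascii = ET.tostring(title)

# Either form parses back to exactly the intended text.
roundtrip_unicode = ET.fromstring(as_unicode).text
roundtrip_ascii = ET.fromstring(as_ascii).text

print(as_unicode)
print(as_ascii)
```

The double-escaping discussed in this bug typically comes from escaping the string by hand and then passing it through a serializer that escapes it again.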
Bollocks. &amp; is defined by XML, not HTML. Unless the entity is contained within a CDATA section, &amp; has to be dereferenced. &auml; et al. are HTML entities, yes, but &amp;, &lt; and &gt; are XML entities and thus should be treated accordingly in ALL XML dialects (including RDF) unless the specs override the corresponding XML standard.
(In reply to comment #34)
> Bollocks. &amp; is defined by XML, not HTML.

That would be a fine argument, if it had anything to do with this. If we had an XML parser that didn't know about XML's predefined entities, I'd be all over fixing it. This bug, however, is about the XML |<title>AT&amp;amp;T &amp;uuml;ber alles</title>|, which causes our parser to hand us the string "AT&amp;T &uuml;ber alles" which we correctly display as-is. Because some of the early implementations used a browser (as distinct from a browser's chrome) to display RSS, and sloppily failed to escape ampersands and in some cases angle brackets, people got the mistaken impression that they should use that XML when they want a display of "AT&T über alles" and that mistaken impression is what we are talking about correcting.
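The parser behaviour described in the comment above can be reproduced with any conforming XML parser. Here is a sketch in Python using the standard library's xml.etree.ElementTree as a stand-in for the browser's parser, which is an assumption made purely for illustration; try_parse is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_source: str) -> str:
    """Return the element text, or a short error note if parsing fails."""
    try:
        return ET.fromstring(xml_source).text or ""
    except ET.ParseError as exc:
        return f"ParseError: {exc}"


# XML's predefined entities are decoded by the parser itself.
predefined = try_parse("<title>AT&amp;T</title>")

# HTML-only entities are not defined in plain XML without a DTD,
# so a bare reference to one is a well-formedness error.
html_only = try_parse("<title>&auml;</title>")

# The double-escaped title from this bug: one level of escaping is removed,
# and what remains is literal text that merely looks like entity references.
double_escaped = try_parse("<title>AT&amp;amp;T &amp;uuml;ber alles</title>")

print(repr(predefined))
print(repr(html_only))
print(repr(double_escaped))
```

The third result is the string that gets displayed as-is, which is why the visible "&amp;" comes from the feed and not from a parser bug.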
Well, in that case there's no sane reason anything should be different then. "&amp;amp;" is clearly an escaped "&amp;", not an escaped "&".

Sounds more like a job for a standards evangelist than a Mozilla update to me.

I don't fully understand why anyone would escape entities like that, but in this case I agree that a view-source clearly shows that Mozilla isn't the software to blame for this not leading to the (absurdly) expected result.

So, once again, sorry for the interruption.
*** Bug 337074 has been marked as a duplicate of this bug. ***
*** Bug 340403 has been marked as a duplicate of this bug. ***
This is also in 1.5.0.3. If this is an issue with Atom and is resolved with RSS 2.0, that would be great in Firefox 1.0.x, as it gave you a choice of all available feeds. But in Firefox 1.5.x the default is Atom; if that is not available, then you get a choice of other feeds. Therefore the bug is now seen more often, when people sign up for podcast feeds, blog feeds, news feeds that have "" in them, etc.


sorry for bugspam, long-overdue mass reassign of ancient QA contact bugs, filter on "beltznerLovesGoats" to get rid of this mass change
QA Contact: mconnor → bookmarks