Closed
Bug 302749
Opened 19 years ago
Closed 19 years ago
Feedview: feedview removes html and thus garbles certain blogposts
Categories
(Firefox Graveyard :: RSS Discovery and Preview, defect)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: bugs.caleb, Assigned: myk)
Attachments
(2 files, 1 obsolete file)
186.86 KB, image/png
8.36 KB, patch
It seems that feedview removes HTML tags from the posts (for one reason or another), so the blog post ends up looking bad. Here are 2 examples:
http://weblogs.mozillazine.org/roc/atom.xml - roc's blog uses lists, and since they are stripped (look at the Gecko 1.9 post) it becomes unreadable.
http://weblogs.mozillazine.org/asa/atom.xml - Asa's blog uses <strong> and <p>, and the feedview ends up doing strange things to the <p> tags.
I'd like to get input from people who've tested the 2 blogs specified in comment 0 in Safari, Opera, IE7, and other browsers that support feedview-like functionality, to see how they handle HTML in blog posts.
Also see, http://kernel.org/kdist/rss.xml
Comment 3•19 years ago
Opera only generates a feedview for asa's feed, so I'll compare that. Nothing seems obviously wrong with how Firefox renders the feed. Opera does not do new lines for <p> elements, Firefox does. Neither Opera nor Firefox makes stuff inside <strong> bold.
Comment 4•19 years ago
Opera doesn't generate a feedview for Asa's feed, that's Asa's own stylesheet. They don't prettyprint at all. Probably more significant for our idea-stealing needs than either Opera or Safari (which as I understand it shows clicked-on feeds in its full feedreader, not a little quick feedviewer like ours) is IE7, which from the screenshot in https://blogs.msdn.com/rssteam/archive/2005/08/02/446882.aspx looks like it's going with full-item, full-HTML. There's no right answer: showing HTML pretty much means scrapping the item length widget, and while I'd rather read roc as full items with HTML, I'd rather read something like Freshmeat, where 90% of the items aren't interesting to me, but the titles aren't enough to tell me that in Live Bookmarks, with the current stripped and shortened descriptions. If someone died and put me in charge, I'd probably say "there's dozens of other ways to read the full content of feeds, and no other ways to glance at quickly scrollable cut off descriptions, wontfix," and then hope that someone someday will write a half-decent HTML-to-plaintext formatter that will at least not clobber block-level elements so badly, but so far they haven't. Died, that is.
Summary: Feedview: feedview removes html and thus garbes certain blogposts → Feedview: feedview removes html and thus garbles certain blogposts
Comment 5•19 years ago
What seems to be worse is that it is stripping content enclosed in < and > inside the CDATA sections as well, meaning you actually lose real content as well as HTML 'effects' (and links!).
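The thread does not show the stripping code itself, so the following is only a minimal sketch of how this kind of loss can happen, assuming the stripping behaves like a naive "remove everything between < and >" pass. The function name naiveStripTags is hypothetical.

```javascript
// Hypothetical illustration, not the actual feedview code:
// remove every run of text that starts with "<" and ends with ">".
function naiveStripTags(text) {
  return text.replace(/<[^>]*>/g, "");
}

// Markup is removed as intended.
console.log(naiveStripTags("<p>Use <strong>bold</strong></p>"));
// → "Use bold"

// Literal "<" and ">" in the post's text are treated as a tag,
// so the real content between them disappears as well.
console.log(naiveStripTags("if a < b and b > c then"));
// → "if a  c then"
```

The second call shows the failure mode described in this comment: text that merely contains angle brackets is indistinguishable from markup to a pattern-based stripper.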
Updated•19 years ago
Severity: normal → major
Flags: blocking1.8b4?
Hardware: PC → All
Comment 6•19 years ago
This screenshot shows that Safari handles HTML (both links and lists) beautifully in its feed view.
Updated•19 years ago
Component: General → RSS Discovery and Preview
Comment 7•19 years ago
Shouldn't we apply disable-output-escaping on content parts?
Comment 8•19 years ago
disable-output-escaping is not supported by Mozilla by design. See bug #98168
Comment 9•19 years ago
This seems dependent on the changes taking place in bug 303848.
Depends on: 303848
Comment 10•19 years ago
Here are a number of other RSS feeds that Feedview does not handle properly. The Sage extension does not have any problems handling any of these that were tested with it. For each of these, click on the RSS icon in the address bar, unless specified otherwise:
http://www.dslreports.com/
http://www.blogscanada.ca/
http://www.blogsforbush.com/ - here DP FeedView doesn't handle the Atom and RSS 1.0 feeds properly
http://www.blogsofwar.com/
http://www.javablogs.com/Welcome.action
http://msnbc.msn.com/id/3032105/ - DP tries to download it instead. If you click "Cancel" on the download dialog, you can no longer get a response by clicking the RSS icon in the address bar until you refresh the page.
http://www.microsoft.com/communities/blogs/PortalHome.mspx - You'll have to click on the RSS icon to the right of the title "Blogcasts" in the page.
http://blogsbyiranians.com/
Comment 11•19 years ago
With the checkin for bug 303848, feedview no longer removes HTML, but now it sometimes displays it inline rather than parsing it.
Updated•19 years ago
Comment 12•19 years ago
Since bug 303848, we no longer strip HTML. That's the right thing to do, IMHO, but what's wrong is that we display it inline for the many feeds that escape it within their RSS <description> or Atom <content> tags. We should instead unescape it for those feeds, and we should also decode Base64-encoded content, given that Atom provides for such content (and, in fact, forces certain media types to be so encoded).
This patch makes the feedview transformsheet tag content by media type and makes FeedView.init() call the new function FeedView._postProcessContent() to unescape HTML-escaped content once the feed has been transformed. The patch doesn't actually decode Base64-encoded content, but it does provide the framework for that to be added later.
It's a work-in-progress because XSLTProcessor.importStylesheet() currently dies with an unknown error when importing the transformsheet. Given my unfamiliarity with XSLT, it's probably a syntax error somewhere, but I'm still trying to figure out how to debug.
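To make the two decoding steps concrete, here is a minimal sketch, assuming the transform has already labelled each piece of content with how it was encoded. It is not the code from the attached patch: unescapeHtml, decodeBase64, decodeContent and the labels "escaped", "base64" and "none" are all hypothetical names chosen for illustration.

```javascript
// Hypothetical helpers; not FeedView._postProcessContent() from the patch.
const NAMED_ENTITIES = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };

// Undo one level of escaping in a single pass, so "&amp;lt;" becomes
// "&lt;" rather than "<". Unknown entities are left untouched.
function unescapeHtml(text) {
  return text.replace(/&(#[xX][0-9a-fA-F]+|#\d+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

// Decode Base64 to a UTF-8 string in Node or in a browser.
function decodeBase64(b64) {
  if (typeof Buffer !== "undefined") {
    return Buffer.from(b64, "base64").toString("utf8");
  }
  const bytes = Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

// Dispatch on the (assumed) encoding label attached by the transform.
function decodeContent(text, encoding) {
  if (encoding === "escaped") return unescapeHtml(text);
  if (encoding === "base64") return decodeBase64(text);
  return text; // "none" or anything unrecognised: pass through unchanged
}

console.log(decodeContent("&lt;p&gt;Hi &amp;amp; bye&lt;/p&gt;", "escaped"));
// → "<p>Hi &amp; bye</p>"
console.log(decodeContent("PHA+SGk8L3A+", "base64"));
// → "<p>Hi</p>"
```

The single-pass replacement is a deliberate choice in this sketch: chaining separate replacements for each entity can unescape the same text twice, which would turn a literal "&lt;" that the author wanted displayed into a real tag.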
Assignee: nobody → myk
Status: NEW → ASSIGNED
Comment 13•19 years ago
*** Bug 304584 has been marked as a duplicate of this bug. ***
Comment 14•19 years ago
Comment on attachment 192788 [details] [diff] [review]
work in progress: unescapes HTML
>Index: browser/base/content/feedview.xsl
>@@ -146,6 +146,16 @@
> </xsl:if>
> <h2><xsl:call-template name="a-element"/></h2>
> <span class="date"></span>
>+ <div class="article">
You want a class="content" there instead.
>+ <xsl:attribute name="type">html</xsl:attribute>
>+ </div>
> <xsl:value-of select="*[local-name()='description']" />
You want the </div> after the content, not before.
>+ <xsl:when test="atom03:content@type = 'text/plain'
That's your syntax error: those should all be atom03:content/@type with a slash.
That should get you close enough to see that you need either something else, or to do the same thing somewhere else, because that will successfully unescape escaped HTML right up to the first instance of the reason people stuff escaped HTML in feeds: because it's not well-formed XML.
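The last point can be illustrated with a small sketch. The checker below is a crude stand-in for an XML parser, written only for this example: it verifies that tags are balanced and nothing more, and checkBalancedTags is a hypothetical name. Ordinary HTML such as an unclosed <br> passes in an HTML renderer but fails this test, which is the kind of content an XML-based unescaping step would reject.

```javascript
// Crude stand-in for an XML parser (assumption for illustration only):
// it checks tag balance and ignores attributes, entities, comments, etc.
function checkBalancedTags(markup) {
  const tagPattern = /<(\/?)([A-Za-z][\w:-]*)[^>]*?(\/?)>/g;
  const open = [];
  let m;
  while ((m = tagPattern.exec(markup)) !== null) {
    const closing = m[1];
    const name = m[2];
    const selfClosing = m[3];
    if (selfClosing) continue; // <br/> opens and closes in one tag
    if (!closing) {
      open.push(name);
      continue;
    }
    const expected = open.pop();
    if (expected !== name) {
      throw new Error("mismatched </" + name + ">, innermost open tag: " + expected);
    }
  }
  if (open.length > 0) {
    throw new Error("unclosed <" + open[open.length - 1] + ">");
  }
  return true;
}

// XML-style markup: the line break is self-closed, so tags balance.
console.log(checkBalancedTags("<p>one<br/>two</p>")); // → true

// Typical HTML from a feed: <br> is never closed, so an XML-style
// check rejects it even though browsers render it without complaint.
try {
  checkBalancedTags("<p>one<br>two</p>");
} catch (err) {
  console.log("rejected: " + err.message);
}
```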
Updated•19 years ago
Flags: blocking1.8b4? → blocking1.8b4+
Comment 15•19 years ago
Here's another work in progress patch that fixes the bugs in the previous one. I also trap content parsing errors and stick the raw HTML back in place if the parser can't parse it, but we need to figure out a better solution for that. We also need better style for the content. The current style is pretty awkward (presumably the content is inheriting chrome style badly).
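The fallback described here can be sketched roughly as follows. This is not the code from the patch: parseOrKeepRaw is a hypothetical name, and the parser is passed in as a function so that the sketch does not assume any particular browser API. The only assumption is that the parser throws when it cannot handle its input.

```javascript
// Hypothetical sketch of "try to parse, otherwise put the raw HTML back".
// `parse` is any function that returns a parsed result or throws.
function parseOrKeepRaw(rawHtml, parse) {
  try {
    return { parsed: true, value: parse(rawHtml) };
  } catch (err) {
    // Keep the original text so the reader still sees the content,
    // and record why parsing failed for later debugging.
    return { parsed: false, value: rawHtml, reason: String(err.message || err) };
  }
}

// Usage with two stand-in parsers (both invented for this example).
const alwaysFails = () => { throw new Error("not well-formed"); };
const upperCases = (text) => text.toUpperCase();

console.log(parseOrKeepRaw("<p>one<br>two</p>", alwaysFails));
// → { parsed: false, value: "<p>one<br>two</p>", reason: "not well-formed" }
console.log(parseOrKeepRaw("<p>ok</p>", upperCases));
// → { parsed: true, value: "<P>OK</P>" }
```

Returning a flag alongside the value lets the caller decide how to present unparsed content, for example as plain text, which matters because the comment itself notes that showing raw HTML is not a satisfactory final answer.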
Attachment #192788 -
Attachment is obsolete: true
Comment 16•19 years ago
Hi -- I stumbled across this bug while trying to write a stylesheet that would transform RSS results (or more specifically, OpenSearch RSS results) into HTML in the browser. Are people certain that the proposed fix is the right approach? While support for the "disable-output-escaping" attribute is optional according to the XSL spec, it is probably still the easiest way to support encoded RSS description elements. Moreover, that feature seems to be supported by the other browsers, and it would make XSLT in general that much more viable as a cross-platform tool. (And it's not like this would be a case of copying someone that is doing something outside of spec -- disable-output-escaping is in the standard, and it is supported by offline XSLT processors as well.) Or at the very least, is there a compromise that will work with pre-1.5 browsers? The proposed fix is tied closely to the RSS preview mechanism, which isn't as open for reuse. (That's not to say that getting preview working isn't important -- I just wonder if re-addressing the disable-output-escaping question may kill two birds with one stone.)
Comment 17•19 years ago
minusing, we're backing away from this feature now, and we're going to reimplement in a much cleaner way for 2.0
Flags: blocking1.8b4+ → blocking1.8b4-
Comment 18•19 years ago
Feedview was backed out, cleaning out deps.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → INVALID
Updated•6 years ago
Product: Firefox → Firefox Graveyard