Closed Bug 302749 Opened 19 years ago Closed 19 years ago

Feedview: feedview removes html and thus garbles certain blogposts

Categories

(Firefox Graveyard :: RSS Discovery and Preview, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: bugs.caleb, Assigned: myk)

References

Details

Attachments

(2 files, 1 obsolete file)

It seems that feedview removes HTML tags from the posts (for one reason or
another) so the blogpost ends up looking bad.

Here are 2 examples:

http://weblogs.mozillazine.org/roc/atom.xml - roc's blog uses lists, and since
they are stripped (look at the Gecko 1.9 post) it becomes unreadable.

http://weblogs.mozillazine.org/asa/atom.xml - Asa's blog uses <strong> and <p>,
and the feedview ends up doing strange things to the <p> tags.
I'd like to get input from people who've tested the 2 blogs specified in
comment 0 in Safari, Opera, IE7, and other browsers that support feedview-like
functionality, to see how they handle HTML in blogposts.
Also see http://kernel.org/kdist/rss.xml
Opera only generates a feedview for asa's feed, so I'll compare that. Nothing
seems obviously wrong with how Firefox renders the feed. Opera does not start
new lines for <p> elements, Firefox does. Neither Opera nor Firefox makes stuff
inside <strong> bold.
Opera doesn't generate a feedview for Asa's feed, that's Asa's own stylesheet.
They don't prettyprint at all. Probably more significant for our idea-stealing
needs than either Opera or Safari (which as I understand it shows clicked-on
feeds in its full feedreader, not a little quick feedviewer like ours) is IE7,
which from the screenshot in
https://blogs.msdn.com/rssteam/archive/2005/08/02/446882.aspx looks like it's
going with full-item, full-HTML.

There's no right answer: showing HTML pretty much means scrapping the item
length widget, and while I'd rather read roc as full items with HTML, I'd rather
read something like Freshmeat, where 90% of the items aren't interesting to me,
but the titles aren't enough to tell me that in Live Bookmarks, with the current
stripped and shortened descriptions. If someone died and put me in charge, I'd
probably say "there's dozens of other ways to read the full content of feeds,
and no other ways to glance at quickly scrollable cut off descriptions,
wontfix," and then hope that someone someday will write a half-decent
HTML-to-plaintext formatter that will at least not clobber block-level elements
so badly, but so far they haven't. Died, that is.
Summary: Feedview: feedview removes html and thus garbes certain blogposts → Feedview: feedview removes html and thus garbles certain blogposts
What seems to be worse is that it is stripping content enclosed in &lt; and &gt;
inside the CDATA sections as well, meaning you actually lose real content as
well as HTML 'effects' (and links!).
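
A minimal sketch of how this kind of loss can happen, assuming the stripping is done with a simple "remove anything between < and >" pass after the text has been unescaped. That is an assumption for illustration only: feedview's actual stripping code is not shown in this bug, and naiveStripTags is a hypothetical name.

```javascript
// Hypothetical illustration, not feedview's real code.
// Removes every "<...>" run, whether it is markup or text the author
// deliberately wrote about a tag.
function naiveStripTags(text) {
  return text.replace(/<[^>]*>/g, "");
}

// The author escaped "<em>" so that readers would see it literally.
// Once the feed text has been unescaped, that intent is no longer visible.
const unescapedPost = "Wrap the word in <em> to emphasise it.";

// The literal "<em>" is dropped along with any real markup.
console.log(naiveStripTags(unescapedPost));
```

The point is that a stripper working on already-unescaped text cannot tell markup apart from content that merely looks like markup.
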
Severity: normal → major
Flags: blocking1.8b4?
Hardware: PC → All
Attached image roc's feed in Safari
This screenshot shows that Safari handles HTML (both links and lists)
beautifully in its feed view.
Component: General → RSS Discovery and Preview
Shouldn't we apply disable-output-escaping on content parts?
disable-output-escaping is not supported by Mozilla by design. See bug #98168
This seems dependent on the changes taking place in bug 303848.
Depends on: 303848
Here are a number of other RSS feeds that Feedview does not handle properly. 
The Sage extension does not have any problems handling any of these that were
tested with it.

For each of these, click on the RSS icon in the address bar, unless specified
otherwise:

http://www.dslreports.com/
http://www.blogscanada.ca/

http://www.blogsforbush.com/
- here DP FeedView doesn't handle the Atom and RSS 1.0 feeds properly

http://www.blogsofwar.com/

http://www.javablogs.com/Welcome.action
http://msnbc.msn.com/id/3032105/
- DP tries to download it instead.  If you click "Cancel" on the download
dialog, you can no longer get a response by clicking the RSS icon in the address
bar until you refresh the page.

http://www.microsoft.com/communities/blogs/PortalHome.mspx
- You'll have to click on the RSS icon to the right of the title "Blogcasts" in
the page.

http://blogsbyiranians.com/
With the checkin for bug 303848, feedview no longer removes HTML, but now it
sometimes displays it inline rather than parsing it.
Blocks: 303848
No longer depends on: 303848
Attached patch work in progress: unescapes HTML (obsolete)
Since bug 303848, we no longer strip HTML.  That's the right thing to do, IMHO,
but what's wrong is that we display it inline for the many feeds that escape it
within their RSS <description> or Atom <content> tags. We should instead
unescape it for those feeds, and we should also decode Base64-encoded content,
given that Atom provides for such content (and, in fact, forces certain media
types to be so encoded).

This patch makes the feedview transformsheet tag content by media type and
makes FeedView.init() call the new function FeedView._postProcessContent() to
unescape HTML-escaped content once the feed has been transformed.  The patch
doesn't actually decode Base64-encoded content, but it does provide the
framework for that to be added later.

It's a work-in-progress because XSLTProcessor.importStylesheet() currently
dies with an unknown error when importing the transformsheet.  Given my
unfamiliarity with XSLT, it's probably a syntax error somewhere, but I'm still
trying to figure out how to debug.
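
A rough sketch of the post-processing step described above, not the code from the attached patch. It assumes each item has already been tagged with how its content is encoded; the `encoding` field, its values, and the function names are hypothetical. Base64 decoding uses Node's Buffer so the sketch runs standalone; in a browser, atob() would be the likely substitute.

```javascript
// Hypothetical helpers for illustration; names are not from the patch.

// Decode the five predefined XML entities and numeric character references
// in a single pass, so "&amp;lt;" becomes "&lt;" rather than "<".
function unescapeHtml(text) {
  const named = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const code = body[1] === "x"
        ? parseInt(body.slice(2), 16)
        : parseInt(body.slice(1), 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown named entities are left as they were.
    return Object.prototype.hasOwnProperty.call(named, body) ? named[body] : match;
  });
}

// Dispatch on the (assumed) encoding tag attached to each item.
function postProcessContent(item) {
  switch (item.encoding) {
    case "escaped":
      return unescapeHtml(item.text);
    case "base64":
      return Buffer.from(item.text, "base64").toString("utf8");
    default:
      // Plain text or inline XML: nothing to undo.
      return item.text;
  }
}

console.log(postProcessContent({ encoding: "escaped", text: "&lt;p&gt;Hello&lt;/p&gt;" }));
console.log(postProcessContent({ encoding: "base64", text: "PGI+aGk8L2I+" }));
```

Keeping the dispatch separate from the decoders means a Base64 branch can be added later without touching the unescaping path, which matches the "framework" intent described above.
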
Assignee: nobody → myk
Status: NEW → ASSIGNED
*** Bug 304584 has been marked as a duplicate of this bug. ***
Comment on attachment 192788
work in progress: unescapes HTML

>Index: browser/base/content/feedview.xsl
>@@ -146,6 +146,16 @@
>       </xsl:if>
>       <h2><xsl:call-template name="a-element"/></h2>
>       <span class="date"></span>
>+      <div class="article">

You want a class="content" there instead.

>+        <xsl:attribute name="type">html</xsl:attribute>
>+      </div>
>       <xsl:value-of select="*[local-name()='description']" />

You want the </div> after the content, not before.

>+              <xsl:when test="atom03:content@type = 'text/plain'

That's your syntax error: those should all be atom03:content/@type with a
slash.

That should get you close enough to see that you need either something else, or
to do the same thing somewhere else, because that will successfully unescape
escaped HTML right up to the first instance of the reason people stuff escaped
HTML in feeds: because it's not well-formed XML.
Flags: blocking1.8b4? → blocking1.8b4+
Here's another work in progress patch that fixes the bugs in the previous one. 
I also trap content parsing errors and stick the raw HTML back in place if the
parser can't parse it, but we need to figure out a better solution for that.

We also need better style for the content.  The current style is pretty awkward
(presumably the content is inheriting chrome style badly).
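
The fallback described above can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: renderContent and toyStrictParse are hypothetical names, and toyStrictParse is only a stand-in for a real strict XML parser (it checks that tags are balanced and nothing more).

```javascript
// Stand-in for a strict XML parser: throws if tags are not balanced.
// Hypothetical and deliberately simplistic; a real parser checks far more.
function toyStrictParse(markup) {
  const stack = [];
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(\/?)>/g;
  let m;
  while ((m = tagPattern.exec(markup)) !== null) {
    const [, closing, name, selfClosing] = m;
    if (selfClosing) continue;              // <br/> needs no close tag
    if (!closing) {                         // opening tag
      stack.push(name);
      continue;
    }
    if (stack.pop() !== name) {             // closing tag must match last open
      throw new Error("mismatched </" + name + ">");
    }
  }
  if (stack.length > 0) {
    throw new Error("unclosed <" + stack[stack.length - 1] + ">");
  }
  return markup;
}

// Try to parse the content; if the parser rejects it, fall back to the
// raw text so the item is still shown (as source) instead of lost.
function renderContent(rawHtml, parse) {
  try {
    return { kind: "parsed", value: parse(rawHtml) };
  } catch (err) {
    return { kind: "raw", value: rawHtml };
  }
}

console.log(renderContent("<p>fine</p>", toyStrictParse).kind);
console.log(renderContent("<p>one<br>two</p>", toyStrictParse).kind);
```

The second call shows the weakness noted in the review comment: ordinary HTML such as an unclosed <br> is not well-formed XML, so a strict parser rejects it and the reader ends up looking at raw markup.
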
Attachment #192788 - Attachment is obsolete: true
Hi -- I stumbled across this bug while trying to write a stylesheet that would
transform RSS results (or more specifically, OpenSearch RSS results) into HTML
in the browser.  

Are people certain that the proposed fix is the right approach?  While support
for the "disable-output-escaping" attribute is optional according to the XSL
spec, it is probably still the easiest way to support encoded RSS description
elements.  Moreover, that feature seems to be supported by the other browsers,
and it would make XSLT in general that much more viable as a cross-platform
tool.  (And it's not like this would be a case of copying someone that is doing
something outside of spec -- disable-output-escaping is in the standard, and it
is supported by offline XSLT processors as well.)

Or at the very least, is there a compromise that will work with pre-1.5
browsers?  The proposed fix is tied closely to the RSS preview mechanism, which
isn't as open for reuse.

(That's not to say that getting preview working isn't important -- I just wonder
if re-addressing the disable-output-escaping question may kill two birds with
one stone.)
 
minusing, we're backing away from this feature now, and we're going to
reimplement in a much cleaner way for 2.0
Flags: blocking1.8b4+ → blocking1.8b4-
Feedview was backed out, cleaning out deps.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → INVALID
Resetting QA Contact to default.
QA Contact: general → rss.preview
Product: Firefox → Firefox Graveyard