Closed Bug 476641 Opened 12 years ago Closed 7 years ago

parsing RSS of Planet Python outputs a lot of "(no subject)" articles

Categories

(MailNews Core :: Feed Reader, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 27.0

People

(Reporter: bamanzi, Assigned: alta88)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.1b2) Gecko/20081201 Firefox/3.1b2 GTB5
Build Identifier: 2.0.0.19

When I subscribed to Planet Python's RSS feed (http://planet.python.org/rss10.xml), Thunderbird gave me a lot of articles without content.

Reproducible: Always

Steps to Reproduce:
0. Create a 'News & Blogs' account
1. In account settings, check 'show summary of article rather than original whole page by default' (I'm using the zh-CN version and don't know the exact English string for this)
2. Subscribe to http://planet.python.org/rss10.xml
3. TB starts parsing that RSS, and the summary window shows some (not all) entries labeled '(no subject)', with no content for any of those entries (without step 2, TB can correctly connect to the original URL of that entry)
Yep, that would be one of the reasons why RSS 1.0 is a bad idea: it's approximately RDF, but very nearly nobody other than Thunderbird parses it with an RDF parser, so people don't realize how horrible their RDF is.

The <items> rdf:Seq is like a table of contents for the feed: it lists the URIs for which the feed has an <item> whose rdf:about is the same URI. However, that Planet feed looks like it's assembling the Seq from the original item ids/guids, and the rdf:about on items from the original item links, which can be quite different. So the feed claims it includes items about http://jessenoller.com/?p=461 when it actually has items about http://feedproxy.google.com/~r/Jessenollercom/~3/p02Yjhv_hmU, and items about tag:blogger.com,1999:blog-496482.post-4627626801651497621 which are actually about http://holdenweb.blogspot.com/2009/02/on-take.html
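To make the mismatch concrete, here is a minimal Python sketch that compares the URIs listed in the rdf:Seq against the rdf:about values of the actual <item> elements. The sample feed, its example.org URLs, and the function name `seq_mismatch` are all made up for illustration; this is not Thunderbird's parser, just a plain-XML check of the same condition.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# Hypothetical, trimmed-down RSS 1.0 feed: the Seq lists a URI that no
# <item> is rdf:about, which is the situation described above.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <title>Example planet</title>
    <link>http://example.org/</link>
    <description>Sample</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/?p=1"/>
        <rdf:li rdf:resource="http://example.org/posts/2"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://proxy.example.net/~r/feed/~3/abc">
    <title>First post</title>
    <link>http://proxy.example.net/~r/feed/~3/abc</link>
  </item>
  <item rdf:about="http://example.org/posts/2">
    <title>Second post</title>
    <link>http://example.org/posts/2</link>
  </item>
</rdf:RDF>
"""


def seq_mismatch(xml_text):
    """Return (dangling, unlisted) URI lists for an RSS 1.0 document.

    dangling: URIs named in the rdf:Seq with no matching <item>.
    unlisted: <item> rdf:about URIs that the rdf:Seq never mentions.
    """
    root = ET.fromstring(xml_text)
    listed = [li.get(f"{{{RDF}}}resource") for li in root.iter(f"{{{RDF}}}li")]
    actual = [item.get(f"{{{RDF}}}about")
              for item in root.findall(f"{{{RSS}}}item")]
    dangling = [uri for uri in listed if uri not in actual]
    unlisted = [uri for uri in actual if uri not in listed]
    return dangling, unlisted


if __name__ == "__main__":
    dangling, unlisted = seq_mismatch(SAMPLE)
    print("listed but missing:", dangling)
    print("present but unlisted:", unlisted)
```

A reader that trusts the Seq goes looking for the dangling URIs and finds nothing to show for them, which is consistent with the empty "(no subject)" entries in the report.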

By far the most likely way this will get fixed is by bug 450543 switching us over to the Toolkit feed parser, which dropped using an RDF parser because it gets terrible results from terrible RDF for no benefit.

(And as far as just being able to read Planet Python, even though they don't advertise it they do have an RSS 2.0 feed which is likely to work out much better, at http://planet.python.org/rss20.xml)
Depends on: 450543
Component: RSS → Feed Reader
Product: Thunderbird → MailNews Core
this should be resolved by either

1. closing as invalid as it's up to the publisher to get the <items> list right, and other rdf publishers do it right.
2. ignoring the <items> list and getting all <item>s, like Fx does, and as Tb does if there isn't an <items> to begin with.
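As a rough illustration of option 2, the Python sketch below collects every <item> in the document and never consults the <items> list. It is not the attached patch (which is not shown in this bug) and the sample feed, its URLs, and the name `collect_items` are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"

# Hypothetical feed whose Seq points at a URI that no <item> uses.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <title>Example planet</title>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/?p=1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://proxy.example.net/~r/feed/~3/abc">
    <title>First post</title>
    <description>Body of the first post</description>
  </item>
</rdf:RDF>
"""


def collect_items(xml_text):
    """Return (rdf:about, title, description) for every <item> found.

    The <items>/rdf:Seq table of contents is deliberately not read, so a
    Seq that names the wrong URIs cannot hide or blank out any entry.
    """
    root = ET.fromstring(xml_text)
    found = []
    # "{ns}item" does not match "{ns}items", so the Seq container is skipped.
    for item in root.iter(f"{{{RSS_NS}}}item"):
        about = item.get(f"{{{RDF_NS}}}about")
        title = item.findtext(f"{{{RSS_NS}}}title", default="(no subject)")
        body = item.findtext(f"{{{RSS_NS}}}description", default="")
        found.append((about, title, body))
    return found


if __name__ == "__main__":
    for about, title, body in collect_items(FEED):
        print(about, "|", title, "|", body)
```

The trade-off is that the ordering information carried by the Seq is lost; items come out in document order instead.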

magnus, what do you think?
Flags: needinfo?(mkmelin+mozilla)
2 sounds like the more pragmatic thing to do.
Flags: needinfo?(mkmelin+mozilla)
Attached patch rss1rdf.patch
Assignee: nobody → alta88
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #819201 - Flags: review?(mkmelin+mozilla)
Comment on attachment 819201
rss1rdf.patch

Review of attachment 819201:
-----------------------------------------------------------------

Looks good, thx! r=mkmelin
Attachment #819201 - Flags: review?(mkmelin+mozilla) → review+
https://hg.mozilla.org/comm-central/rev/36ac7d6d04ed
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 27.0