Closed
Bug 476641
Opened 16 years ago
Closed 11 years ago
parsing RSS of Planet Python outputs a lot of "(no subject)" articles
Categories
(MailNews Core :: Feed Reader, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
Thunderbird 27.0
People
(Reporter: bamanzi, Assigned: alta88)
References
Details
Attachments
(1 file)
2.35 KB, patch | mkmelin: review+
User-Agent: Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.1b2) Gecko/20081201 Firefox/3.1b2 GTB5
Build Identifier: 2.0.0.19
When I subscribe to Planet Python's RSS feed (http://planet.python.org/rss10.xml), Thunderbird gives me a lot of articles without content.
Reproducible: Always
Steps to Reproduce:
0. Create a 'News & Blogs' account
1. In account settings, check 'show summary of article rather than original whole page by default' (I'm using the zh-CN version and don't know the exact English string for this)
2. Subscribe to http://planet.python.org/rss10.xml
3. TB starts parsing that RSS, and the summary window shows some (not all) entries labeled '(no subject)', with no content for each such entry (without step 2, TB can correctly connect to the original URL of that entry)
Comment 1•16 years ago
Yep, that would be one of the reasons why RSS 1.0 is a bad idea: it's approximately RDF, but very nearly nobody other than Thunderbird parses it with an RDF parser, so people don't realize how horrible their RDF is.
The <items> rdf:Seq is like a table of contents for the feed: it lists the URIs for which the feed contains an <item> whose rdf:about is that same URI. However, that Planet feed looks like it's assembling the Seq from the original item ids/guids, and the rdf:about on items from the original item links, which can be quite different. So the feed claims it includes items about http://jessenoller.com/?p=461 when it actually has items about http://feedproxy.google.com/~r/Jessenollercom/~3/p02Yjhv_hmU instead, and it claims items about tag:blogger.com,1999:blog-496482.post-4627626801651497621 which are actually about http://holdenweb.blogspot.com/2009/02/on-take.html
By far the most likely way this will get fixed is by bug 450543 switching us over to the Toolkit feed parser, which dropped using an RDF parser because it gets terrible results from terrible RDF for no benefit.
(And as far as just being able to read Planet Python, even though they don't advertise it they do have an RSS 2.0 feed which is likely to work out much better, at http://planet.python.org/rss20.xml)
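The mismatch described above can be checked mechanically. The following is a minimal sketch in Python, not the Thunderbird code: the feed text, its example.org URLs, and the helper names `seq_uris` and `item_uris` are all made up for illustration. It compares the URIs listed in the <items> rdf:Seq with the rdf:about values actually present on <item> elements.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# Hypothetical hand-written feed: the Seq lists one URI that no <item> is about.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <title>Example planet</title>
    <link>http://example.org/</link>
    <description>Aggregated posts</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/?p=1"/>
        <rdf:li rdf:resource="http://example.org/post-2"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://proxy.example.net/~r/blog/~3/abc">
    <title>First post</title>
    <link>http://proxy.example.net/~r/blog/~3/abc</link>
  </item>
  <item rdf:about="http://example.org/post-2">
    <title>Second post</title>
    <link>http://example.org/post-2</link>
  </item>
</rdf:RDF>"""


def seq_uris(root):
    """Return the URIs listed in the channel's <items><rdf:Seq>."""
    path = f"{{{RSS}}}channel/{{{RSS}}}items/{{{RDF}}}Seq/{{{RDF}}}li"
    return [li.get(f"{{{RDF}}}resource") for li in root.findall(path)]


def item_uris(root):
    """Return the rdf:about value of every top-level <item>."""
    return [item.get(f"{{{RDF}}}about") for item in root.findall(f"{{{RSS}}}item")]


root = ET.fromstring(SAMPLE)
listed = set(seq_uris(root))
present = set(item_uris(root))

dangling = sorted(listed - present)   # promised by the Seq, but no such <item>
unlisted = sorted(present - listed)   # real <item>s the Seq never mentions

print("dangling:", dangling)
print("unlisted:", unlisted)
```

A non-empty `dangling` list is the situation in this bug: the table of contents points at URIs for which there is no item data.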
Depends on: 450543
This should be resolved by either:
1. Closing as invalid, as it's up to the publisher to get the <items> list right, and other RDF publishers do it right.
2. Ignoring the <items> list and getting all <item>s, like Fx does, and as Tb does if there isn't an <items> to begin with.
Magnus, what do you think?
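To make the difference between the two behaviours concrete, here is a rough Python sketch. It is an illustration under assumptions, not the actual patch: the feed text, the example.org URLs, and the function names `items_via_seq` and `items_ignoring_seq` are hypothetical. The first function resolves each Seq entry to an <item> and ends up with an empty entry when nothing matches. The second follows option 2 and simply takes every <item> in the document.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# Hypothetical feed whose Seq and rdf:about values disagree for the first post.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <title>Example planet</title>
    <link>http://example.org/</link>
    <description>Aggregated posts</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/?p=1"/>
        <rdf:li rdf:resource="http://example.org/post-2"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://proxy.example.net/~r/blog/~3/abc">
    <title>First post</title>
    <link>http://proxy.example.net/~r/blog/~3/abc</link>
  </item>
  <item rdf:about="http://example.org/post-2">
    <title>Second post</title>
    <link>http://example.org/post-2</link>
  </item>
</rdf:RDF>"""

ITEM = f"{{{RSS}}}item"
TITLE = f"{{{RSS}}}title"
ABOUT = f"{{{RDF}}}about"
RESOURCE = f"{{{RDF}}}resource"
SEQ_LI = f"{{{RSS}}}channel/{{{RSS}}}items/{{{RDF}}}Seq/{{{RDF}}}li"


def items_via_seq(root):
    """Trust the Seq: look up each listed URI among the <item> elements."""
    by_about = {item.get(ABOUT): item for item in root.findall(ITEM)}
    entries = []
    for li in root.findall(SEQ_LI):
        uri = li.get(RESOURCE)
        item = by_about.get(uri)
        # No matching <item> means there is no title or content to show.
        title = item.findtext(TITLE) if item is not None else None
        entries.append((uri, title or "(no subject)"))
    return entries


def items_ignoring_seq(root):
    """Option 2: ignore the Seq and take every <item> in document order."""
    return [
        (item.get(ABOUT), item.findtext(TITLE) or "(no subject)")
        for item in root.findall(ITEM)
    ]


root = ET.fromstring(FEED)
via_seq = items_via_seq(root)
all_items = items_ignoring_seq(root)
print(via_seq)
print(all_items)
```

With this sample, the Seq-driven lookup produces one "(no subject)" entry, while the second approach recovers both titles. The trade-off is that item order then comes from document order rather than from the Seq.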
Flags: needinfo?(mkmelin+mozilla)
Comment 3•11 years ago
2 sounds like the more pragmatic thing to do.
Flags: needinfo?(mkmelin+mozilla)
Assignee: nobody → alta88
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #819201 - Flags: review?(mkmelin+mozilla)
Comment 5•11 years ago
Comment on attachment 819201
rss1rdf.patch
Review of attachment 819201:
Looks good, thx! r=mkmelin
Attachment #819201 - Flags: review?(mkmelin+mozilla) → review+
Updated•11 years ago
Keywords: checkin-needed
Comment 6•11 years ago
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 27.0