Closed Bug 260745 Opened 20 years ago Closed 19 years ago

only first item of many with same <link> URL shown; should use <guid> to compare items

Categories

(MailNews Core :: Feed Reader, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 264482

People

(Reporter: yuanyi21, Assigned: mscott)

Attachments

(1 file)

In FeedItem.js, several places check whether a feed item is already stored, but
they compare only the item's url, like this:
var itemResource = rdf.GetResource(this.url);
This check is not always correct. For example, I have a forum that generates a
new feed item for every new post, and the url is the same when the post is a
reply. According to the RSS 2.0 spec
(http://blogs.law.harvard.edu/tech/rss#ltguidgtSubelementOfLtitemgt), <link> is
not guaranteed to be unique, but <guid> is. So I think the correct way is to
compare by guid first and, if the guid is empty, use the url instead.
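
To illustrate the report, here is a minimal sketch of the collision. It is not the Thunderbird code: the item objects, the example.com URLs, and the storeAll helper are all hypothetical, and a Map stands in for the RDF datasource.

```javascript
// Hypothetical items from a forum feed: a post and its reply share one <link>.
const items = [
  { guid: "post-1", link: "http://example.com/thread/42", title: "Original post" },
  { guid: "post-2", link: "http://example.com/thread/42", title: "Re: Original post" },
];

// Store items under a key chosen by keyOf. An item whose key is already
// present is treated as "already stored" and skipped.
function storeAll(items, keyOf) {
  const stored = new Map();
  for (const item of items) {
    const key = keyOf(item);
    if (!stored.has(key)) {
      stored.set(key, item);
    }
  }
  return stored;
}

// Keyed by <link> only: the reply looks like a duplicate of the first post.
const byLink = storeAll(items, (item) => item.link);

// Keyed by <guid> first, falling back to <link> when the guid is empty.
const byGuid = storeAll(items, (item) => item.guid || item.link);

console.log(byLink.size); // 1
console.log(byGuid.size); // 2
```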
*** Bug 259947 has been marked as a duplicate of this bug. ***
OS: Windows XP → All
Summary: Shouldn't compare feeds only by their url → only first item of many with same <link> URL shown; should use <guid> to compare items
We have this same problem with some internal feeds.  There are several RSS feeds
which alert employees to new reports being published.  The <link> node is the
same for each <item> because there is one page on our intranet that displays all
of the correct reports to each user based on who they are.  The <guid> and
<description> change with each post providing the employees with a summary of
the update.

We currently use Thunderbird 0.7 for mail and a custom developed RSS reader for
these reports.  We plan to deploy Thunderbird 1.0 when it is available and hoped
to use it for mail, these current reports, and for future projects where RSS
would be appropriate, but we will have to keep our custom app around for the
feeds if Thunderbird continues to treat <link> as a unique identifier.

Instead of:
var itemResource = rdf.GetResource(this.url || ("urn:" + this.id));

would it be better to use:
var itemResource = rdf.GetResource(("urn:" + this.id) || this.url);
Sorry for bugspam -- committed comment #2 too soon :(

(Follow-up to comment #2)
> Instead of:
> var itemResource = rdf.GetResource(this.url || ("urn:" + this.id));
> 
> would it be better to use:
> var itemResource = rdf.GetResource(("urn:" + this.id) || this.url);

and give preference to "urn:" + this.id when storing the feed item in the first
place even if there is a url?
What does this mean?

    someFunction(a || b)

Does that call someFunction(a) and then someFunction(b) if the first one returns
null?

I've been trying the above fix in my thunderbird and it's made the situation
worse for me. But I'd love to understand what that code fragment means. Wouldn't
it be better to do:

    var foo = someFunction(a);
    if (!foo) {
        foo = someFunction(b);
    }
(In reply to comment #4)
> What does this mean?
> 
>     someFunction(a || b)
> 
> Does that call someFunction(a) and then someFunction(b) if the first one returns
> null?

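The double-pipe || in JavaScript is a logical OR: a || b evaluates to a if a is
truthy and to b otherwise, so someFunction is called only once, with whichever
value was chosen.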
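
A small sketch of that behavior, using a hypothetical someFunction that just records its argument:

```javascript
// Hypothetical stand-in that records each argument it is called with.
const calls = [];
function someFunction(arg) {
  calls.push(arg);
  return arg;
}

someFunction(null || "b"); // null is falsy, so the argument is "b"
someFunction("a" || "b");  // "a" is truthy, so "b" is never evaluated
someFunction("" || "b");   // the empty string is falsy too, so the argument is "b"

// Three calls were made in total, one per statement: "b", "a", "b".
console.log(calls.length);
```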

> I've been trying the above fix in my thunderbird and it's made the situation
> worse for me.

My apologies -- what I wrote wasn't really a fix, just a suggestion.  For it to
work, Thunderbird would have to use "'urn:' + this.id" as the resource identifier
when storing the feed item.  As it is now, "this.url" is used, which works fine
as long as the <link> is absent or the same as <guid>.  this.url uses the value
of <link> if it is present even though <link> doesn't uniquely identify the <item>.

The RSS 2.0 spec simply defines <guid> as "a string that uniquely identifies the
item" (http://blogs.law.harvard.edu/tech/rss#ltguidgtSubelementOfLtitemgt).

"A frequently asked question about <guid>s is how do they compare to <link>s.
Aren't they the same thing? Yes, in some content systems, and no in others. In
some systems, <link> is a permalink to a weblog item. However, in other systems,
each <item> is a synopsis of a longer article, <link> points to the article, and
<guid> is the permalink to the weblog entry. In all cases, it's recommended that
you provide the guid, and if possible make it a permalink. This enables
aggregators to not repeat items, even if there have been editing changes."
OK, so now that I realize that javascript's "||" is just like Java's, I see
what's going on. So the problem with this idea:

    var itemURI = ("urn:" + this.id) || this.url;

is that "urn:" + this.id is NEVER null, and so it's always used and this.url is
never used. This is fine unless this.id is null, which I gather is possible. So,
while this change has worked well for me, my buddy noticed that he was not
getting new articles in some of his feeds, and I decided it was because id was
null, and "urn:" + null => "urn:null" at least in Java. So you can see why it
would think there are no new articles if a feed doesn't have an id for all the
items.
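
The same holds in JavaScript, as this minimal sketch suggests. The itemURI helper and the example URL are hypothetical; only the expression inside it comes from the discussion above.

```javascript
// The expression under discussion, wrapped in a hypothetical helper.
function itemURI(id, url) {
  return ("urn:" + id) || url;
}

const url = "http://example.com/item";

// "urn:" + anything is a non-empty string, which is always truthy,
// so the url on the right-hand side is never reached.
console.log(itemURI("post-1", url));  // "urn:post-1"
console.log(itemURI(null, url));      // "urn:null"
console.log(itemURI(undefined, url)); // "urn:undefined"
console.log(itemURI("", url));        // "urn:"
```

Every item without an id therefore maps to the same resource, which is why later items in such a feed look as if they were already stored.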

So here's what I've done, and it seems to be working well (keep in mind, I know
nothing about javascript). This uses the id if possible OR if the url is null
(which I think is NOT possible):

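FeedItem.prototype.getURI = function() {
    // Prefer the guid-based URN; this branch is also taken when there is
    // no url to fall back on.
    if (this.id || !this.url) {
        return "urn:" + this.id;
    }
    // No guid, but a url is available.
    return this.url;
};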

and then

  var itemURI = this.getURI();

replaces this line and ones like it:

  var itemURI = ("urn:" + this.id) || this.url;

If somebody thinks this is right, please fix it. I am not a developer.
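
For reference, a sketch of how the accessor proposed above behaves in the three possible cases. The FeedItem constructor here is a hypothetical minimal stand-in, not the real class, and the URLs are made up.

```javascript
// Hypothetical minimal FeedItem; the real class has many more fields.
function FeedItem(id, url) {
  this.id = id;
  this.url = url;
}

// The accessor from the comment above, unchanged in behavior.
FeedItem.prototype.getURI = function () {
  if (this.id || !this.url) {
    return "urn:" + this.id;
  }
  return this.url;
};

const withGuid = new FeedItem("post-2", "http://example.com/thread/42");
const linkOnly = new FeedItem(null, "http://example.com/thread/42");
const neither = new FeedItem(null, null);

console.log(withGuid.getURI()); // "urn:post-2"
console.log(linkOnly.getURI()); // "http://example.com/thread/42"
console.log(neither.getURI());  // "urn:null"
```

The last case shows the remaining edge: an item with neither an id nor a url still ends up as "urn:null", so any two such items would collide.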
*** Bug 306732 has been marked as a duplicate of this bug. ***
Robert, your patch in bug 301964 included a comment that "RSS2 with GUIDs should do this as well", meaning storing the feed item by its guid rather than the <link>/url, but it doesn't seem to actually accomplish this for non-Atom feeds.

Could you provide any help here?
This patch works for me, both on the trunk and against the 1.8 branch.
Attachment #201095 - Flags: review?(mscott)
Even with this proposed patch, feeds such as the URL attached to duplicate bug 306732 (http://marblehead.com/schools/mhs/headlight/tools/rss-news.php) will fail if they have matching <link>s for each <item> and no <guid>s.  Feeds like this and http://www.benhammersley.com/tools/fedextrack.cgi?track=039813830750081 are also essentially useless given the way Thunderbird currently stores feed items.

Perhaps feed items should be stored by a hash of the <link>/url, <title>, and/or <pubDate> to ensure new feed <item>s with the same <link> will be properly recognized.
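
A rough sketch of that idea, not part of the attached patch. The compositeKey helper, the item field names, and the example.com URLs are hypothetical; it builds an unambiguous composite string that could be hashed afterwards if a shorter identifier is wanted.

```javascript
// Hypothetical helper: derive a storage key from several fields when
// a feed item has no <guid>.
function compositeKey(item) {
  if (item.guid) {
    return "urn:" + item.guid;
  }
  // JSON.stringify keeps the field boundaries unambiguous; missing
  // fields are recorded as null rather than being dropped.
  return JSON.stringify([
    item.link || null,
    item.title || null,
    item.pubDate || null,
  ]);
}

// Two items with the same <link> and no <guid>, as in the feeds above.
const first = { link: "http://example.com/news", title: "Monday update" };
const second = { link: "http://example.com/news", title: "Tuesday update" };

console.log(compositeKey(first) === compositeKey(second)); // false
console.log(compositeKey(first) === compositeKey({ ...first })); // true
```

One trade-off with this approach: an item whose title is edited after publication would be treated as new.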

Also, my apologies if the patch or review? flag were done wrong; this is my first patch ;)
(In reply to comment #8)
> Could you provide any help here?
> 

The approach you're suggesting is the right one, but that is not the approach your patch takes. It was easy to add this for Atom feeds because there is always an id, and the parser was new, so users have no previously stored entries. 

This is also a dupe of bug 264482, so I'm closing this.

*** This bug has been marked as a duplicate of 264482 ***
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → DUPLICATE
Attachment #201095 - Flags: review?(mscott)
Component: RSS → Feed Reader
Product: Thunderbird → MailNews Core