Closed Bug 1259635 Opened 10 years ago Closed 9 years ago

Accept duplicate guids in RSS Feedreader in the same account

Categories

(Thunderbird :: General, defect)

47 Branch
x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: mmueller.de1987, Unassigned)

Attachments

(1 file)

Attached image thunderbird.jpg
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0 Iceweasel/38.7.1
Build ID: 20160318181316

Steps to reproduce:

On the latest Debian stable (8.3), 64-bit Linux, I installed Thunderbird Earlybird 47.0a2. The same error occurs in the latest stable Thunderbird release, 38.7.1 x64, and in Icedove, the Debian-branded version of Thunderbird from the package manager.

I select "Blogs & News Feeds" and subscribe to 6 RSS news feeds. Each feed has 10 news items.

http://news.google.de/news?cf=all&hl=de&pz=1&ned=de&topic=w&output=rss
http://news.google.de/news?cf=all&hl=de&pz=1&ned=de&topic=n&output=rss
http://news.google.de/news?cf=all&hl=de&pz=1&ned=de&geo=Bremen&output=rss
http://news.google.de/news?cf=all&hl=de&pz=1&ned=de&output=rss
http://news.google.com/news?cf=all&hl=en&pz=1&ned=us&output=rss
http://news.google.com/news?cf=all&hl=en&pz=1&ned=us&topic=w&output=rss

Actual results:

The first three subscriptions work as intended: all 10 news items are shown. After adding the fourth and fifth RSS feeds, they show only 4, 5, or 7 news items instead of 10. Some items are missing.

Expected results:

Each feed has 10 news items, but Thunderbird sometimes shows only 4-7 of them. I checked this with another feed reader called "Liferea" (see my screenshot), where all 10 items from each of these 6 RSS feeds are shown. This is a bug; Thunderbird should load all the items. I tried deleting my Thunderbird profile (at ~/.thunderbird) and re-adding my RSS feeds, but got the same problem.
Component: Untriaged → General
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
I tried this with other news feed providers, such as 'http://www.golem.de/sonstiges/rss.html'. The same bug appears, regardless of whether the feed is RSS or Atom. BUT I found out how to work around this bug: when I put the fourth, fifth and sixth news feeds in a second feed account in Thunderbird, all 10 items of each feed are shown. This is a temporary solution. By the way: sorry for my English, as a native German speaker this is not quite easy for me ;)
It is entirely intentional design to prevent storing duplicate messages (in the same account, as you've discovered). A dupe is a message whose guid has already been downloaded, and the publisher here is incorrectly reusing guids across feeds. See the guide for more on dupes: https://support.mozilla.org/en-US/kb/how-subscribe-news-feeds-and-blogs. It's possible Liferea even has an open bug to not store dupes.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
You seem to have misunderstood "intentional design". Storing duplicate messages with the same guid would itself be a bug, undesirable behavior that someone would immediately flag. And it is certainly publisher error/abuse to reuse a guid, which means Globally Unique, across feeds. Do not reopen this.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago → 9 years ago
Resolution: --- → WORKSFORME
Please look at the RSS feeds I sent you. For example this one: http://rss.cnn.com/rss/edition.rss Every message there is unique. There are 25 message entries, and Thunderbird shows me only 17 of them. News items are missing. This can't be intentional. This is not normal. And this is not just for CNN News. It happens with almost every RSS feed I tried (and I tried many of them).
I just subscribed to the feed in comment 5 and it immediately downloaded 25 items. If you want to diagnose this, you need to find an item that you say is not being downloaded, in the feed file, then go to feeditems.rdf and see if it (its guid) is there. It will be, meaning the item has already been seen/downloaded and exists in an active other feed source, meaning that the publisher is reusing guids across feed sources.
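The check described above can also be run against the feed files directly, before looking in feeditems.rdf. The following is a minimal sketch, not Thunderbird code: it assumes plain RSS 2.0 documents (un-namespaced `<item>` and `<guid>` elements, so it will not match Atom entries), and the function names and feed labels are hypothetical. It reports every guid that appears in more than one feed source, which are the items an account-wide duplicate check would store only once.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def guids_in_feed(rss_xml: str) -> list[str]:
    """Return the guid text of every <item> in an RSS 2.0 document.

    Items without a <guid> element are skipped (a simplification of this sketch).
    """
    root = ET.fromstring(rss_xml)
    guids = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid and guid.strip():
            guids.append(guid.strip())
    return guids


def shared_guids(feeds: dict[str, str]) -> dict[str, list[str]]:
    """Map each guid found in more than one feed to the sorted feed labels using it.

    `feeds` maps a hypothetical feed label to that feed's XML text.
    """
    seen = defaultdict(set)
    for label, xml_text in feeds.items():
        for guid in guids_in_feed(xml_text):
            seen[guid].add(label)
    return {g: sorted(labels) for g, labels in seen.items() if len(labels) > 1}


if __name__ == "__main__":
    # Two tiny made-up feeds that reuse one guid.
    world = (
        "<rss><channel>"
        "<item><guid>id-1</guid></item>"
        "<item><guid>id-2</guid></item>"
        "</channel></rss>"
    )
    top = (
        "<rss><channel>"
        "<item><guid>id-2</guid></item>"
        "<item><guid>id-3</guid></item>"
        "</channel></rss>"
    )
    print(shared_guids({"world": world, "top": top}))
```

To use it on real subscriptions, save each feed's XML to a file and pass the file contents in the `feeds` dictionary; any guid it reports is a candidate for an item that appears to be "missing" from one of the folders.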
I understand, many publishers tend to reuse their own guids. Is there any way to set Thunderbird to ignore guids, so that it behaves like other feed readers? In this state Thunderbird is unusable as a feed reader for me, and I think for many others too. I can't subscribe to two sections (for example business and sports) from one publisher without running into this problem.
Your assertion "i think for many others too" is quite incorrect, people have a very high intolerance for dupes. No one has ever wanted them; I've closed hundreds of feed bugs. I suggest you use another reader or write (or get someone to write) an extension to hack the feed reader's dupe prevention code.
Resolution: WORKSFORME → WONTFIX
Summary: Missing News in RSS Feedreader → Accept duplicate guids in RSS Feedreader in the same account

The RSS 2.0 specification only says that the guid should uniquely identify the item, and nothing about deduplicating items. What's the reasoning to deduplicate identical items with identical guids (which is correct) across folders?

From the spec:

The guid enables an aggregator to detect when an item has been received previously and does not need to be presented to a user again

It appears in the Best Practices Profile, not in the spec.

Also, there is no clear statement of whether this applies to a single feed or to all feeds. In fact, both the spec and those best practices seem to omit any hints on how aggregators should handle several feeds, so I'd stick to the first variant.

Guids are obviously useful for detecting new items in a single feed: without them, an aggregator would need to compare the full content of items, which is a resource-heavy and unreliable operation, rather than comparing only guids. But comparing items across all the feeds looks like overkill.
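The difference between the two variants comes down to the key used for the "already seen" check. Here is a minimal sketch, assuming feeds are fetched in order and each item is represented only by its guid; the `deliver` function and its `scope` parameter are hypothetical and do not reflect Thunderbird's actual implementation.

```python
def deliver(feeds, scope):
    """Return the guids stored per feed under a given duplicate-detection scope.

    feeds: list of (feed_label, [guid, ...]) pairs, in fetch order.
    scope: "account" keys the seen-set on the guid alone (one item per guid
           across all feeds); "feed" keys it on (feed_label, guid).
    """
    if scope not in ("account", "feed"):
        raise ValueError("scope must be 'account' or 'feed'")
    seen = set()
    delivered = {}
    for feed_label, guids in feeds:
        kept = []
        for guid in guids:
            key = guid if scope == "account" else (feed_label, guid)
            if key in seen:
                continue  # treated as a duplicate, not stored again
            seen.add(key)
            kept.append(guid)
        delivered[feed_label] = kept
    return delivered


if __name__ == "__main__":
    feeds = [("world", ["a", "b", "c"]), ("top", ["b", "c", "d"])]
    print(deliver(feeds, "account"))  # "top" keeps only the guid not seen in "world"
    print(deliver(feeds, "feed"))     # each feed keeps all three of its items
```

With the account-wide key, which feed ends up holding a shared item depends only on fetch order; with the per-feed key, repeated fetches of the same feed are still de-duplicated while each feed keeps its full item list.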

Previously you mentioned that this is a publisher error. If a permalink is used as a guid, how are publishers supposed to make the same item with the same permalink appear independently in several feeds? Should they produce different permalinks pointing to the same page for each feed, or what?

From a user's perspective, it looks like Thunderbird just loses some items. I struggled a lot with manually re-fetching the feed, and only my habit of digging deep stopped me from reporting something like "Some RSS items are not fetched, here's an example" rather than figuring out what actually happens. The choice of which folder an item ends up in looks like undefined behaviour.

Should I be afraid that one day Thunderbird will delete messages in one of my IMAP accounts because copies exist in another IMAP account, just because the developers consider them "duplicates" too, and don't even expose a setting for their decision?
