Closed Bug 258465 Opened 20 years ago Closed 11 years ago

[tracking] Duplicate entries appear in feeds

Categories: MailNews Core :: Feed Reader, defect, P1
Tracking: Not tracked
Status: RESOLVED INCOMPLETE
Reporter: bugzilla1
Assignee: Unassigned
Keywords: meta
Whiteboard: [file specific bugs with specific URL - see comment 157][delight]
Attachments: 4 files

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.3) Gecko/20040908 Firefox/0.10
Build Identifier: version 0.8 (20040904)

Sometimes entries seem to get duplicated within TB.

Seen with multiple feeds including IEBlog, The Register and Planet Mozilla.

Planet Mozilla: mostly Daniel Glazman, David Tenser, Blogzilla, Axel Hecht,
W3C, Rumbling Edge and David Miller get duplicated. I haven't seen Mike
Pinkerton, Mitchell Baker, Gerv or MozillaZine duplicated.

Reproducible: Sometimes
Steps to Reproduce:
1. Subscribe to a feed.
2. Watch the duplicates build up as the feed is rechecked every hour.

I can verify this bug. However, the duplicates all seem to have different dates
associated with them. E.g., the entry "[Amusing] Underage teens turn profit on
old beer" from http://www.pluck.com/rss/fark.rss appears in my list with dates
2004-09-14T21:22, 2004-09-15T04:02, and 2004-09-15T09:42. I have set the refresh
rate to 10 minutes.

Other feeds in which duplicates occur:
http://www.aftenposten.no/eksport/rss-1_0/?seksjon=viten
http://www.ntnu.no/~engmark/arbeid/hjemmeside/beskjed/motd.xml

Feeds in which duplicates do NOT occur:
http://www.digi.no/phpf/feed/rss/digi.php
http://www.cooltechzone.com/index2.php?option=com_rss&no_html=1
http://www.aftenposten.no/eksport/rss-1_0/?seksjon=nyheter_iriks

An additional note: none of these feeds have contained duplicates when viewed
using the Sage Firefox extension.

One addition that might be interesting: my own RSS feed,
http://www.ntnu.no/~engmark/arbeid/hjemmeside/beskjed/motd.xml, has not changed
for weeks, but between yesterday and today it seems to have spawned 7 (!) new
(duplicate) messages. They all have the same date and time as the original.

I get this on all my RSS feeds with a 20041027 nightly build. I can reproduce
it very easily by repeatedly clicking Get Mail while any of my RSS folders is
selected.

Just click once, wait half a second, click again and again and again, ad infinitum,
and watch the duplicates grow and grow.
*** Bug 258848 has been marked as a duplicate of this bug. ***
The same problem was reported in bug 258848, giving the following URL: 
  http://www.computerbase.de/rss/news.xml
Have verified with a local test blog.
The memory of what is stored seems to get lost on restart.
In addition, items seem to be getting invalidated.
Note that if run from the command line, Thunderbird prints output about what is
going on; attaching a log.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Note: This is running on Gentoo.

I have this problem when I enable "Check for new articles at startup" in
the News Reader options. Every time I start Thunderbird 0.8 it duplicates all
available articles; close/open and it gets the articles AGAIN.

If I remove this, the problem goes away.

I use the following RSS Feeds:

Slashdot, Slashdot : BSD, Slashdot Linux, HackaDay,
http://closingsoon.corrupt.co.nz/

Regardless of the feeds, this problem happens.
(In reply to comment #1)
> However, the duplicates all seem to have different dates
> associated with them. E.g., the entry "[Amusing] Underage teens turn profit on
> old beer" from http://www.pluck.com/rss/fark.rss appears in my list with dates
> 2004-09-14T21:22, 2004-09-15T04:02, and 2004-09-15T09:42.

I have the same problem using Thunderbird 0.9 (20041103) under Win2k, but I
always get the duplicates with exactly the same date associated with them.

-----

My worst "blog-duplicate-enemies" are three posts by Ian Hickson (Keeping
busy/The Elevator/Some notes from Los Santos). I get the duplicates with "Planet
Mozilla" and also with "mozillaZine feedHouse".

-----

However, I've managed to get rid of the problem by doing the following:
(WARNING: do not edit any files in the profile directory by hand if you don't
know what you're doing!)

1) open TB

2) write down the URLs of all the duplicates (e.g. http://ln.hixie.ch/?start=1100471333&count=1)
and the folder in which they occur (e.g. mozillaZine feedHouse).

3) close TB

4) go to the "News & Blogs" folder in your profile directory
(*\Thunderbird\Profiles\xxxxx.default\Mail\News & Blogs\)

5) open feeditems.rdf and delete all entries for the URLs from 2):

  <RDF:Description RDF:about="http://ln.hixie.ch/?start=1100471333&amp;count=1"
                   fz:stored="true"
                   fz:valid="true">
    <fz:feed RDF:resource="http://feedhouse.mozillazine.org/rss20.xml"/>
  </RDF:Description>

6) close (and save) feeditems.rdf

7) open the mailbox file named after the folder in which the duplicates occur
(e.g. mozillaZine feedHouse) and delete all messages for the URLs from 2):

From - Sun, 14 Nov 2004 22:28:53 +0000
   ...
Subject: Ian Hickson: Keeping busy
   ...
Content-Base: http://ln.hixie.ch/?start=1100471333&count=1
   ...
    </iframe>

  </body>
</html>

8) close (and save) mozillaZine feedHouse

9) start TB and reclaim your inbox ;)
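Step 5 could also be scripted. The following is a minimal JavaScript sketch, not part of Thunderbird, and it assumes feeditems.rdf holds one RDF:Description element per item, laid out like the excerpt in step 5 and never self-closing or nested. The helper names (escapeAttr, escapeRegExp, removeFeedItems) and the second, kept URL are made up for illustration. Because the file stores the about attribute with & escaped as &amp;, the sketch escapes each URL from step 2 before matching it. It only transforms text in memory; back up the profile and save the result yourself with TB closed.

```javascript
// Escape a string the way it appears inside an XML attribute value.
function escapeAttr(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Escape regex metacharacters so a URL can be matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: drop every <RDF:Description RDF:about="url"> ...
// </RDF:Description> block (and its trailing newline) for the given URLs.
// Assumes descriptions are never nested or self-closing.
function removeFeedItems(rdfText, urls) {
  let out = rdfText;
  for (const url of urls) {
    const about = escapeRegExp(escapeAttr(url));
    const block = new RegExp(
      `[ \\t]*<RDF:Description RDF:about="${about}"[\\s\\S]*?</RDF:Description>\\r?\\n?`,
      "g"
    );
    out = out.replace(block, "");
  }
  return out;
}

// Example: the entry from step 5 plus a placeholder entry that must survive.
const sample = `<RDF:RDF>
  <RDF:Description RDF:about="http://ln.hixie.ch/?start=1100471333&amp;count=1"
                   fz:stored="true"
                   fz:valid="true">
    <fz:feed RDF:resource="http://feedhouse.mozillazine.org/rss20.xml"/>
  </RDF:Description>
  <RDF:Description RDF:about="http://example.org/keep-me"
                   fz:stored="true"
                   fz:valid="true">
    <fz:feed RDF:resource="http://feedhouse.mozillazine.org/rss20.xml"/>
  </RDF:Description>
</RDF:RDF>
`;

const cleaned = removeFeedItems(sample, ["http://ln.hixie.ch/?start=1100471333&count=1"]);
console.log(cleaned);
```

A regex is used here only because the file layout in step 5 is so regular; for anything less predictable a real XML parser would be the safer choice.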

(In reply to comment #10)

> My worst "blog-duplicate-enemies" are three posts by Ian Hickson (Keeping
> busy

I've seen quite a lot of that item lately. It seems Hixie's site was unavailable
(the link on the item yielded an error page of some sort) during the few days
that the duplicates appeared. Perhaps some kind of feed-serving failure could be
one cause of duplicates? (thunderbird aviary cvs)
OS: Windows XP → All
Summary: Duplicate entries appear when viewing feeds → Duplicate entries appear in feeds
Hardware: PC → All
Just noticed this myself in 0.9 with Wired's feed
(http://wired.com/news/feeds/rss2/0,2610,,00.xml)

Glad to see it's not just me.
Nominating, in case there might be a fix in the works for 1.0...
Flags: blocking-aviary1.0?
(In reply to comment #12)
> Just noticed this myself in 0.9 with Wired's feed
> (http://wired.com/news/feeds/rss2/0,2610,,00.xml)

Have you been reading Wired long? I have read it for months, and it started
spawning duplicates for the first time only a few days ago. However, now it's
rampant, and single articles are duplicated in the twenties...
It seems that this is how it starts. You get a single duplicate the first time
it happens. Then, when the next story shows up, you get a duplicate of both, and
they start multiplying.

Last Monday morning, I came into the office and there were 450 messages in the
Wired folder.  At that point, I gave up and deleted them. :-)
*** Bug 271640 has been marked as a duplicate of this bug. ***
This is happening because of a &amp; in hixie's feed article link and guid tags.

An item might have an ID of:

http://ln.hixie.ch/?start=1101165341&amp;count=1

but somehow that ends up getting stored in our RDF data source as:

http://ln.hixie.ch/?start=1101165341&amp;amp;count=1

note the 2nd "amp;"

as a result the next time we download the feed, we look in our data source to
see if we already have this feed item and we don't have a match.

What makes this even more confusing is that this doesn't happen every time. I
have plenty of entries in feeditems.rdf from hixie's blog that are perfectly
formed such as:

http://ln.hixie.ch/?start=1100471333&amp;count=1

I don't know why every now and then one of the urls gets corrupted. 



WOW! This bug has nothing to do with the RSS module at all. It's a bug way down
deep in the bowels of our RDFXML serializer which serializes our datasource to disk:

http://lxr.mozilla.org/aviarybranch/source/rdf/base/src/nsRDFXMLSerializer.cpp#557

nsRDFXMLSerializer::SerializeDescription

takes our RDF resource which has a value of: 

http://ln.hixie.ch/?start=1101165341&amp;count=1

and turns it into a URL then escapes it by calling:
rdf_EscapeAttributeValue
which turns it into:
http://ln.hixie.ch/?start=1101165341&amp;amp;count=1

note the double amp

This is the value that gets serialized to feeditems.rdf.

When we next load the data source from disk, it looks like the RDF/XML data
source doesn't account for this unescaping, so it ends up in memory with the
double ampersands!



So RDF serializes it to disk and escapes the url into:

http://ln.hixie.ch/?start=1101165341&amp;amp;count=1 which gets written to disk

When it parses the data source back in on startup, our escaped resource URI gets
turned into:
http://ln.hixie.ch/?start=1101602247&count=1

which is incorrect. We would expect to see:

http://ln.hixie.ch/?start=1101165341&amp;count=1
And thus the data source is broken for any resource you put into it that looks
like hixie's URLs.

The bug may not actually be in RDF but even deeper, down in the xmlparser that
RDF uses to build up the data source on startup.

Someone stop the pain...
Further clarification: the expat parser is properly parsing the input when
loading the data source, but it unescapes the attribute string before calling
into the RDF data source with information about the new value. So it gives RDF a
string that looks like:

http://ln.hixie.ch/?start=1101602247&amp;count=1

which is the unescaped version of how the string looks on disk in the .rdf file.

nsRDFContentSink::GetIdAboutAttribute takes this string, which has already been
unescaped, and calls nsRDFParserUtils::StripAndConvert, which effectively
unescapes it again, giving us just:

http://ln.hixie.ch/?start=1101602247&count=1

and that's what gets added to our data source...
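The round trip described above can be reproduced in a few lines. This is a simplified JavaScript model, not the actual nsRDFXMLSerializer, expat or nsRDFContentSink code: escapeAttr stands in for rdf_EscapeAttributeValue, and unescapeOnce stands in for one level of entity decoding, applied once by expat and a second time by StripAndConvert. Both helper names are hypothetical.

```javascript
// Stand-in for rdf_EscapeAttributeValue: escape once for an XML attribute.
function escapeAttr(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Stand-in for one level of entity decoding. A single regex pass means
// "&amp;amp;" decodes to "&amp;", not all the way down to "&".
const ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };
function unescapeOnce(s) {
  return s.replace(/&(amp|lt|gt|quot|apos);/g, (_, name) => ENTITIES[name]);
}

// The item ID as held in memory before the data source is written out.
const inMemory = "http://ln.hixie.ch/?start=1101165341&amp;count=1";

const onDisk = escapeAttr(inMemory);        // serializer escapes once
const afterExpat = unescapeOnce(onDisk);    // expat decodes once: still correct
const afterSink = unescapeOnce(afterExpat); // StripAndConvert decodes again

console.log(onDisk);     // http://ln.hixie.ch/?start=1101165341&amp;amp;count=1
console.log(afterExpat); // http://ln.hixie.ch/?start=1101165341&amp;count=1
console.log(afterSink);  // http://ln.hixie.ch/?start=1101165341&count=1
console.log(afterSink === inMemory); // false: the next lookup misses
```

Escaping once and decoding twice loses exactly one level, which is why only IDs that already contain an entity (like hixie's &amp;) are affected.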
Attached file attempt at a test feed
this is a hack to work around the escaping issue in RDF
Comment on attachment 167519 [details] [diff] [review]
total hack work around in feedItem.js

Hack to work around the RDF entity replacement bug.

in FeedItem::isStored, if we don't see the url resource in our data source,
then try to replace any entities (most importantly &amp; in the feed URI) with
their char equivalents and then try to look up that URI in the data source.

This means, if we fail to find:

http://ln.hixie.ch/?start=1101602247&amp;count=1

we'll end up also looking for:

http://ln.hixie.ch/?start=1101602247&count=1 which really is in the datasource.
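The fallback lookup described above can be sketched as follows. This is an illustrative JavaScript model, not the contents of attachment 167519: the data source is modelled as a plain Set of stored IDs, and isStored and convertEntities are hypothetical names.

```javascript
// Hypothetical model of the workaround: a Set plays the RDF data source.
const ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Replace entity references (most importantly &amp;) with their characters.
function convertEntities(s) {
  return s.replace(/&(amp|lt|gt|quot|apos);/g, (_, name) => ENTITIES[name]);
}

// Look up the ID as-is first; on a miss, retry with entities converted,
// since the reloaded data source holds the over-unescaped form.
function isStored(dataSource, itemId) {
  if (dataSource.has(itemId)) return true;
  const converted = convertEntities(itemId);
  return converted !== itemId && dataSource.has(converted);
}

// After a restart the data source holds the form with a bare "&".
const dataSource = new Set(["http://ln.hixie.ch/?start=1101602247&count=1"]);

console.log(isStored(dataSource, "http://ln.hixie.ch/?start=1101602247&amp;count=1")); // true
console.log(isStored(dataSource, "http://example.org/new-item")); // false
```

The trade-off of this kind of hack is that two genuinely different IDs that differ only by an entity would be treated as the same item; it papers over the RDF round-trip bug rather than fixing it.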
Attachment #167519 - Flags: superreview?(bienvenu)
I aggressively checked this hack workaround in so it could make the RC build in
the morning. I hope I won't get burned by doing so.
Flags: blocking-aviary1.0? → blocking-aviary1.0+
Keywords: fixed-aviary1.0
Target Milestone: --- → Thunderbird1.0
Comment on attachment 167519 [details] [diff] [review]
total hack work around in feedItem.js

great job, Scott. That explains why I was only seeing this after rebooting
tbird...
Attachment #167519 - Flags: superreview?(bienvenu) → superreview+
vrfy'd fixed on the branch: tested with 200412010x-0.9 on linux and mac, and the
problematic rss feeds don't display duplicates for me.
*** Bug 271206 has been marked as a duplicate of this bug. ***
*** Bug 271975 has been marked as a duplicate of this bug. ***
This workaround went into the branch and the trunk.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Is this fix in 1.0 RC1?

I'm using it and I am getting duplicate entries from Ian Hickson's blog via the Planet
Mozilla feed. Can attach a screenshot if necessary.
no, it didn't make rc1, but it's in the latest nightly .9 builds.
I've now got 1.0 final installed, and am still getting this on Hixie's feed.
Just now I got a duplicate of 'choo choo'.

I used to get duplicates of many articles at once; this is only a single article
duplicated. It is the most recent article.

I'm using the Planet Mozilla aggregate feed btw.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I am seeing similar things to comment 32. Retrieving the Planet Mozilla feed
(http://planet.mozilla.org/rss20.xml) with wget gives the following for the Choo
Choo entry:

<item>
        <title>Ian Hickson: Choo choo</title>
        <guid>http://ln.hixie.ch/?start=1102294283&amp;amp;count=1</guid>
        <link>http://ln.hixie.ch/?start=1102294283&amp;amp;count=1</link>
        <description>&lt;p&gt;Today I met some new people and played some games.
 First we played a game that took seven hours, namely &lt;a href=&quot;http://ww
< snip the rest of the description>
venting.&lt;/p&gt;</description>
        <pubDate>Mon, 06 Dec 2004 00:51:23 +0000</pubDate>
</item>

If you wget from Hixie's log directly (http://ln.hixie.ch/rss/html):
 <item rdf:about="http://ln.hixie.ch/?start=1102294283&amp;count=1">
  <title>Choo choo</title>
  <link>http://ln.hixie.ch/?start=1102294283&amp;count=1</link>
  <dc:date>2004-12-06T00:51:23+00:00</dc:date>
  <description>&lt;p&gt;Today I met some new people and played some games. First
 we played a game that took seven hours, namely &lt;a href=&quot;http://www.18xx
<snip>
&lt;p&gt;Now, back to writing up a draft of the card game rules Kam and I are in
venting.&lt;/p&gt;</description>
 </item>


i.e. Planet Mozilla:
<link>http://ln.hixie.ch/?start=1102294283&amp;amp;count=1</link>
Hixie:
<link>http://ln.hixie.ch/?start=1102294283&amp;count=1</link>

Perhaps this is a Planet Mozilla problem.
Or perhaps not. I subscribed to Hixie's feed directly and get duplicate entries
there too.
I have the same issue with the Planet Mozilla feed. Everyone else's blog entries
are fine, but each time it checks for new articles, if there are any new articles
in the Planet Mozilla feed, all of Hixie's articles get duplicated. If there
are no other new articles, then I don't get Hixie's posts again.
*** Bug 274882 has been marked as a duplicate of this bug. ***
*** Bug 274604 has been marked as a duplicate of this bug. ***
*** Bug 274583 has been marked as a duplicate of this bug. ***
Yesterday my TB 1.0 started to get duplicates every time I opened it. It
happened with every feed, also with feeds without "&" in the URL. I deleted all
the feeds, deleted some old entries in Mail\News & Blogs\ in my profile
directory which TB wasn't able to delete, and resubscribed to the feeds. It was
quite a lot of work because of the missing OPML export/import, but now everything
works again. I noticed that the namespace fz is now used in feeditems.rdf and
feeds.rdf instead of forumzilla, which was used before I did the whole thing...
I can't remember, but maybe I also deleted these .rdf files. I heard that sometimes
mails cannot be deleted due to corrupted .msf files; could corrupted .msf files
be a reason that TB didn't skip already downloaded feed items? It's a bit
strange that this bug occurred suddenly without any discernible reason.
Re comment #39, I ran into the same issue. In my case it appears it was
triggered by switching back and forth between branch and trunk builds. I don't
think this is the same issue as this bug. This bug is more about the issue that
if you subscribe to the Planet Mozilla feed, every time there are any new
articles in the feed, any current articles posted by Hixie get duplicated.  (BTW
since there are no current Hixie posts, this problem is now dormant as well). 
Anyway, I guess Hixie is even more special than we thought. :-)
BTW, despite the fixed-aviary1.0 keyword, the patch that was supposed to fix
it may have been checked in, but it never actually fixed the problem.
One blog that is generating as many duplicates as ever is this:
http://blog.sun.com/roller/rss/timf
Might be easier to track down with that, as it's fairly consistent (at least for me).
Have recently noticed this problem on a lot of atom feeds that I am responsible
for, eg:

http://www.iii.co.uk/atom/cotn:TAD.L.xml

Currently trying to track it down - seems to be more likely to occur when TB is
restarted or if the user adds new feeds.
(In reply to comment #43)
> Currently trying to track it down - seems to be more likely to occur when TB is
> restarted or if the user adds new feeds.

Just happened to me (again) in Thunderbird 1.0 (WinXP). I had not added any new
feeds. I don't recall if this is the first time checking the account since
restarting.
Flags: blocking-aviary1.1+
This has been a problem for me for months now, and there doesn't seem to be any
pattern to it.  I am running and have only ever installed the 1.0 release on
Windows XP (20041206).

Nearly every time I start Thunderbird (but not every time) many of my feeds (but
not all, and not always the same feeds) produce some number of duplicate
entries.  Sometimes a feed will duplicate only the most recent entry, or the
most recent three.  Other feeds seem to duplicate every item in the feed.  I
seem to receive duplicates on feeds where I delete most of the entries after
reading them, but also on feeds where I've never deleted a (non-duplicate)
entry.  I've tried to find commonalities, but it happens on RSS and Atom feeds
of different versions, on feeds that do and do not have datestamps.  The
duplicates usually appear when I start the program after several hours of not
running it (like when I turn my computer on each day.)  Restarting the program
within a short period of time doesn't produce duplicates.

I wish I could narrow it down better, but it seems like quite a nasty bug.  Here
are some observations:

The feed for alterslash.org never seems to make duplicates.  I delete nearly
every entry in it after reading.
http://www.alterslash.org/rss_full.xml

The feeds for MetaFilter, BoingBoing, Reason, and Slate seem to duplicate the
most recent half of the entries in the RSS feed nearly every time. I
delete most of these entries after viewing.
http://xml.metafilter.com/rss.xml
http://feeds.feedburner.com/boingboing/iBag
http://www.reason.com/hitandrun/index.xml
http://slate.msn.com/rss

Feeds made by many different blogging packages create duplicates in a similar
way, usually duplicating only the most recent 1-3 entries. For many of these I
never delete entries after viewing (except duplicates).
LiveJournal: http://www.livejournal.com/users/jwz/
WordPress: http://fontleech.com/
LogicWorks: http://www.hackaday.com/rss.xml
Blogger: http://colinjudge.blogspot.com/atom.xml
Movable Type: http://www.makezine.com/blog/atom.xml
CityDesk: http://www.joelonsoftware.com/rss.xml

I'm checking 23 feeds total.  I think all of them have produced duplicates
(except I can't remember Alterslash doing so), but never all of them at once.

I hope you find any of this information helpful, I'd really like to see this bug
squashed (and to keep using Thunderbird for my RSS feeds!)
Here's the story of my experience with this bug.  I've set up a bunch of RSS
feeds in Thunderbird. Most of them work just fine. Others, however, are very
problematic. Basically, they just keep pulling the same item off the RSS feed
over and over and over. I can end up with 30-40 copies of the same item.

Of course, it could be a problem with Thunderbird, or it could be a problem with
the RSS feed. Here's one of the RSS feeds that's troublesome:

http://www.technorati.com/watchlists/rss.html?wid=24991

If you're trying to replicate this potential bug, be sure to let it sit in your
folder for a while - it appears that the multiples appear one at a time, over
time.  One theory - that the RSS feed is being recreated from scratch each time
it's loaded, causing TB to be unable to identify repetitive elements.  Not sure.
 I'm no RSS guru.

Thanks!
I found a tool called Duplicate Message Remover 0.1 recently.  It is pretty good
at removing duplicate messages, even if you have the feed group by sort.  Can't
something like this be used to prevent duplicate messages from being put into
the folders in the first place?
I'm not sure whether this is relevant to the solution or not, but I've found
LonghornBlogs to be the biggest offender amongst my RSS subscriptions.  It takes
about two days to generate 1000 duplicate messages (that's about 50 dupes of 20
posts), no matter whether I delete them or not.
 
The URL for the Longhorn Blogs feed is http://fortes.com/work/feed
 
What may also be of relevance is that I experience this duplication problem even
when Thunderbird is left running (as I have it running almost all the time at
work).  I leave Thunderbird open for days on end.
 
I hope this helps in tracking down the precise cause of the problem.
See bug 267682, now fixed -- does that affect this bug?
afair other fixes for issues of this type in the feeds feature were made as
well, on trunk? I saw this all the time on 1.0.x; I haven't seen this since
migrating to trunk months ago. I read the comments through, and no one has
indicated this bug exists on trunk. So unless someone can reproduce this on
trunk, or it's deemed blocking-aviary1.0.4, it should be marked fixed (or wfm).
(In reply to comment #49)
> See bug 267682, now fixed -- does that affect this bug?

I'm hoping it does...

(In reply to comment #50)
> afair other fixes for issues of this type in the feeds feature were made as
> well, on trunk? I saw this all the time on 1.0.x; I haven't seen this since
> migrating to trunk months ago. I read the comments through, and no one has
> indicated this bug exists on trunk. So unless someone can reproduce this on
> trunk, or it's deemed blocking-aviary1.0.4, it should be marked fixed (or wfm).

Trunk builds still have this problem for me
Doug, have you nuked the feed related bits in your profile in the recent months?
I vaguely recall nuking mine sometime around when I switched to trunk, and the
recreated bits being ... um, different somehow. Can you give me an example feed
that has had this issue lately so I can test with it?
(In reply to comment #52)
> Doug, have you nuked the feed related bits in your profile in the recent months?
> I vaguely recall nuking mine sometime around when I switched to trunk, and the
> recreated bits being ... um, different somehow. Can you give me an example feed
> that has had this issue lately so I can test with it?

http://meyerweb.com/eric/thoughts/rss2/full
*** Bug 295188 has been marked as a duplicate of this bug. ***
Removing "fixed-aviary1.0" keyword, since it's not fixed on 1.0.x
Keywords: fixed-aviary1.0
Still happening with a fresh install of the latest release version of
Thunderbird, 1.0.2 (20050317).

It even happens with major feeds:
http://www.cbsnews.com/feeds/rss/main.rss
http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml
For me it started in version 1.1a2, while before (1.0, 1.01, 1.02, 1.06) it
never showed duplicate entries. Furthermore the duplicate entries cannot be
edited (the ones that are not duplicated can be edited).

BTW: didn't try 1.1a1.
*** Bug 303394 has been marked as a duplicate of this bug. ***
not blocking

I still haven't been able to reproduce the latest reports that caused this bug
to get re-opened. Fortunately, a lot fewer folks are seeing this issue than before. 
Flags: blocking-aviary1.5+
Perhaps you could try this to reproduce it:

1. Run Thunderbird 1.06.
2. Add some RSS feeds.
3. Remove some RSS feeds.
4. Re-add some RSS feeds that you removed in step 3.
5. Close TB 1.06
6. Start TB 1.1a2.

I believe that this is how it occurred to me.
The feed of Roger Ebert's movie reviews
(http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=RSS&mime=xml_)
are being horribly duplicated. It's not five or ten articles that are being
duplicated; it's 53 articles each time, and I got duplicates twice within
less than a week.
(In reply to comment #60) [Ben]
> Perhaps you could try this to reproduce it:
> 
> 1. Run Thunderbird 1.06.
> 2. Add some RSS feeds.
> 3. Remove some RSS feeds.
> 4. Re-add some RSS feeds that you removed in step 3.
> 5. Close TB 1.06
> 6. Start TB 1.1a2.

This is not the scenario that I had, when I saw the problem in 1.0.x builds; I'm 
not sure if it's the scenario for anyone other than you, at this point.


(In reply to comment #61)
> The feed of Roger Ebert's movie reviews [...]
> are being horribly duplicated.

Which build of the program are you using?  If (as I suspect) you're not using a 
build from the trunk since April (e.g. 1.1a, or 1.0+, etc.) then it's no 
surprise that you're seeing this symptom -- the potential fix cited at 
comment 49 is not (and will not be) part of the 1.0.x branch.
Whiteboard: [DO NOT COMMENT unless you're using a trunk or 1.8 branch build]
> > The feed of Roger Ebert's movie reviews [...]
> > are being horribly duplicated.
> 
> Which build of the program are you using?  If (as I suspect) you're not using a 
> build from the trunk since April (e.g. 1.1a, or 1.0+, etc.) then it's no 
> surprise that you're seeing this symptom -- the potential fix cited at 
> comment 49 is not (and will not be) part of the 1.0.x branch.

I'm pretty surprised myself, especially since I'm seeing it in nightly trunk builds.
I'm currently using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b4)
Gecko/20050821 Thunderbird/1.0+ ID:2005082116 but I ended up deleting the feed
three days ago (in another recent nightly trunk) because I got tired of clearing
out duplicates. I'll add the feed again and let you know if the problem persists.
(In reply to comment #60)

I misinterpreted the bug's title. Although I also see the problem listed in this
bug for many feeds, what is new in TB 1.1a2 is that the feeds themselves (not the
feed items) are duplicated. I'll search for another bug or open a new one.

I suggest changing this bug's description to: "Duplicate messages appear in RSS
feeds".

Sorry for the confusion.
(In reply to comment #59)
> not blocking
> 
> I still haven't been able to reproduce the latest reports that caused this bug
> to get re-opened. Fortunately, a lot fewer folks are seeing this issue than
> before.

The feed from A List Apart is going crazy for me right now
(http://www.alistapart.com/rss.xml); you may be able to reproduce with that.
No duplicates so far with both above mentioned feeds.
Months ago I had to remove the Reuters feeds because of unbearable duplication. I still get
duplicates with some feeds, from time to time.
Correct me if I'm wrong, but we're not talking about, e.g., an entry appearing twice
in a feed; it seems to me that sometimes, when a new item is available, all of
the feed's items are fetched (again).
I was assuming it could be someone messing with their feed generator, or a
server bug/reset, but it seems that a feed which doesn't work for you would work
for me and vice versa.
Sorry, forgot to say: It vaguely resembles bug 262408 doesn't it?
(In reply to comment #66)
> No duplicates so far with both above mentioned feeds.
> Months ago I had to remove Reuters feeds for unberable duplication. I still get
> duplicates with some feeds, from time to time..
> Correct me if I'm wrong, but we're not talking about eg an entry appearing twice
> in a feed; It seems to me that sometimes, when a new item is available, it's all
> the feed's items that are gotten (again).
> I was assuming it could be the guy messing with his or her generator, or a
> server bug/reset, but it seems that a feed which doesn't work for you would work
> for me et vice versa.

Actually with the ALA feed I mentioned, I'm getting combinations of the whole
feed again, or just the last 3 articles every time the feed tries to update.
This has only occurred since they moved to a new server...

With the IEBlog, duplication of the entire feed occurs for me a couple of
times a week.
I'm sad to say, this just happened to me on:
1.0+ (20050825)

Yup, it happened on the latest mozilla 1.8 branch. :-(

The feed:
http://sports.yahoo.com/nhl/rss.xml

It was not deleted and recreated. It was part of an OPML file (exported from
Sage) I imported a day after Scott mentioned "a lot fewer folks are seeing this"
almost a week ago. There were no other RSS accounts in the profile, before this
one. It only happened on one feed (I have many, including p.m.o, asa,
sportsline), and only 4 items were duplicated. They weren't even contiguous
either. The XML file contains 7 items, and I received duplicates of items 3,4,6,
and 7. (1=most recent, 7=oldest) There were no new items.

I've saved the XML file to my HD, and I'll make a copy of my profile folder. If
you'd like me to post any of those files, just say the word.
This bug has been open for almost a year now. What needs to be done to fix this?
It happens to me on a few Craigslist feeds I subscribe to, and while the
"remove duplicates" extension helps, it would be nice to see this bug gone.
After not having a problem with this for a while (the Roger Ebert feeds are
working fine now), this bug has bitten me again in recent days. I am
particularly noticing it on three feeds:

   1. Reuters Entertainment News (http://www.microsite.reuters.com/rss/Entertainment)
   2. Reuters Technology News (http://www.microsite.reuters.com/rss/technologyNews)
   3. Spread Firefox (http://spreadfirefox.com/community/?q=node/feed)

The two Reuters feeds in particular are getting duplicates from several days ago
(not even duplicating recent articles) and the SFX feed is simply duplicating
the last 15 articles. 

I am currently running Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b4)
Gecko/20050906 Thunderbird/1.4 ID:2005090608
Sorry, it's not the Reuters Entertainment feed that's duplicating; it's
Reuters Oddly Enough (http://www.microsite.reuters.com/rss/oddlyEnoughNews)
that's duplicating, and Tb just duplicated the 10 latest articles. Reuters Tech
just duplicated six of the last 13 articles.
(In reply to comment #72)
> Sorry, it's not the Reuters Entertainment feed that's duplicating, it the
> Reuters Oddly Enough (http://www.microsite.reuters.com/rss/oddlyEnoughNews)
> that's duplicating

OK, testing this with 1.5a1-0904, I saw duplication at Oddly Enough -- after 
deleting the articles and restarting, in my case, but that's probably not 
actually a contributing factor. (But see bug 297359.)

The Roger Ebert feed, the Yahoo Sports feed, and two Craig's List feeds I have 
set up have not been duplicating for me, while running various 1.5a1 and 1.6a1 
builds.  I used to get dupes of Craig's List (with 1.0.x builds) all the time.

I happened to speak with someone a couple weeks ago who is a bit of an RSS 
maven.  He said that duplication can be the fault of the feed side.  In 
particular, certain schemes of putting ads on the feed page can cause a feed 
item to be listed as "new" even when the content hasn't changed.
This bug frequently appears on the PHP news feed at http://www.php.net/news.rss

The "10 Years since PHP 1.0 was released!" item duplicates frequently (I think
at the moment it's 2 times a day) (Thunderbird 1.5 Alpha 2)
*** Bug 297359 has been marked as a duplicate of this bug. ***
Duplicate feeds occur randomly in subscriptions.  I'm currently using version
1.5 Beta 1 (20050908), but have noticed it on all versions of Thunderbird.  I
keep Thunderbird running constantly and update feeds every hour and periodically
this occurs.
A further data point:

I had this problem with my Windows XP version of Thunderbird (all versions up to
and including 1.5 Beta), but not on my Linux version 1.0.6.

HOWEVER, I recently copied some of my subscriptions from my profile folder on
Windows across to my profile folder on Linux and THOSE copied folders/feeds from
Windows experience the bug, but my original Linux feeds do not.
FYI bug 9413 has landed on the branch. Try setting mail.server.default.dup_action
to 2 and see if duplicates get shipped to trash. It might require a restart.
Basically I subscribed to a feed then unsubscribed then subscribed again, and
the second bunch of items made it directly to the trash folder.
See this post for more pref values:
http://forums.mozillazine.org/viewtopic.php?p=1773414#1773414
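For anyone wanting to try this, one way to set the pref is a user.js file in the profile directory, which Mozilla applications read at startup. This is a minimal fragment that uses only the pref name and the value 2 given in this comment; see the linked post for what the other values do.

```javascript
// user.js in the Thunderbird profile directory (read at startup).
// Value 2 is the setting suggested above: send detected duplicates to Trash.
user_pref("mail.server.default.dup_action", 2);
```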
(In reply to comment #78)
> FYI bug 9413 has landed to the branch. Try to set mail.server.default.dup_action
> to 2 and see if duplicates get shipped to trash. It might require a restart.

Nope. Tested in a new profile, created an RSS account, set the pref. I did that
the day you posted (2 days ago). Good ole planet.mozilla.org just gave me 7
duplicates. Verified by checking http://planet.mozilla.org/ .
(Using mozilla1.8 builds, updating via auto-update)
Haven't extensively tested the feature yet but I think it should detect
duplicates unless you have deleted the original items, which I agree is
sub-optimal if (like I) you delete most items after reading them.
Sounds like a "SELECT DISTINCT" request to me ;)
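The following is a minimal sketch of the difference described above, using hypothetical names and in-memory data. If incoming items are checked against the messages still in the folder, anything the user has already deleted is missed. If they are checked against a separate history of every item ID ever delivered (the "SELECT DISTINCT" idea), deleted items are still caught. This only models the behaviour reported here; it is not Thunderbird's dup_action code.

```javascript
// Simulates one item being fetched, read and deleted, then listed again by the feed.
// check = "folder":  is the item still in the folder?
// check = "history": was the item ever delivered before?
function simulate(check) {
  const history = new Set(); // every id ever delivered; never pruned by deletions
  let folder = [];           // messages currently in the feed folder
  let created = 0;

  const deliver = (id) => {
    const dup = check === "folder" ? folder.some((m) => m.id === id) : history.has(id);
    if (!dup) {
      folder.push({ id });
      history.add(id);
      created++;
    }
  };

  deliver("item-1"); // first fetch
  folder = [];       // user reads and deletes the message
  deliver("item-1"); // the feed still lists the item on the next fetch
  return created;
}

console.log(simulate("folder"));  // 2: the deleted item comes back
console.log(simulate("history")); // 1
```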
*** Bug 312785 has been marked as a duplicate of this bug. ***
Flags: blocking1.8.1?
*** Bug 280033 has been marked as a duplicate of this bug. ***
*** Bug 314309 has been marked as a duplicate of this bug. ***
I'm still getting this bug with "Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE; rv:1.8) Gecko/20051030 Thunderbird/1.5 Mnenhy/0.7.2.10015 ID:2005103006". All feeds from "CAcert.org NEWS!" at http://my.rsscache.com/www.cacert.org/rss.php get duplicated every day. I subscribed to it a week ago and have every feed 7 times now.
Assignee: mscott → sayrer
Status: REOPENED → NEW
(In reply to comment #74)
> This bug frequently appears on the PHP news feed at http://www.php.net/news.rss
> 
> The "10 Years sinse PHP 1.0 was released!" item duplicates freqently (I think
> atm its 2 times a day) (Thunderbird 1.5 Alpha 2)

I can confirm this. Looking into it...
Status: NEW → ASSIGNED
(In reply to comment #86)
> I can confirm this. Looking into it...

OK, so this problem is probably a regression that occurred when i18n issues with RSS1 feeds were fixed. Lines 229-230 of feed-parser.js show the issue. 

  // Prefer the value of the link tag to the item URI since the URI could be
  // a relative URN.
  var uri = itemResource.Value;
  var link = getRDFTargetValue(ds, itemResource, RSS_LINK);  
  item.url = link || uri;

"link" is returning doubly-escaped ampersands, while "uri" is properly escaped, with an actual ampersand. Not quite sure how to fix this yet.
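One possible workaround is sketched below in JavaScript. It assumes that a literal "&amp;" in a link is always an escaping artifact and never real text. The idea is to decode "&amp;" repeatedly until the string stops changing, then use that normalized value as the key. normalizeLink and itemKey are hypothetical helpers, not part of feed-parser.js.

```javascript
// Decode "&amp;" until nothing changes, so that "a&b", "a&amp;b" and
// "a&amp;amp;b" all normalize to "a&b".
function normalizeLink(link) {
  let previous;
  let current = link;
  do {
    previous = current;
    current = current.replace(/&amp;/g, "&");
  } while (current !== previous);
  return current;
}

// Same preference as `item.url = link || uri`, but on normalized values.
function itemKey(link, uri) {
  return normalizeLink(link || uri);
}

console.log(itemKey("http://example.com/page?id=10&amp;amp;comment=12", null));
// http://example.com/page?id=10&comment=12
```

The trade-off is that a URL whose query genuinely contains the text "&amp;" would also be altered.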
I'd continued to receive duplicate entries (esp. from http://www.wnyc.org/index.xml) until about a week ago, but the RSS function has worked perfectly since then.  I also upgraded from Thunderbird 1.0.6 to 1.0.7 about a week ago.

Did something in 1.0.7 resolve this bug?
I am having this issue and I am using the 1.8 branch nightly (20060222 v1.5.0.2)

The Message-IDs of the duplicate messages are identical. The only difference I can see is that the Content-Base is slightly different. (The URL has a numerical parameter which changes, but it has no effect on the content.)

If comparing Content-Base is standard practice, it would be nice to have an option to work around this.
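Since the Message-IDs already match, comparing on those would be enough here. If a client does need to compare Content-Base, one way to do what is asked for is to drop query parameters that are known to change between fetches before comparing. The sketch below shows this. The parameter name r is a made-up stand-in for the changing numerical parameter, and which parameters count as volatile would have to be configured per feed.

```javascript
// Canonicalize a Content-Base URL for comparison: remove query parameters
// that are known to change without changing the article, then sort the
// remaining parameters so their order does not matter.
function canonicalUrl(raw, volatileParams) {
  const url = new URL(raw);
  for (const name of volatileParams) {
    url.searchParams.delete(name);
  }
  url.searchParams.sort();
  return url.toString();
}

// "r" is a hypothetical volatile parameter.
const a = canonicalUrl("http://example.com/story?id=5&r=1234", ["r"]);
const b = canonicalUrl("http://example.com/story?r=9876&id=5", ["r"]);
console.log(a === b); // true
console.log(a);       // http://example.com/story?id=5
```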

In the meantime I am using the 'Remove Duplicate Messages' extension.
Bug 331851 lists a feed with two specific feedvalidator errors, where a single entry is duplicated, using TB 1.5.
Flags: blocking-thunderbird2?
1.5 years... I think it's time someone raised the priority for this; it's making Thunderbird completely unusable for RSS reading. :(
*** Bug 337743 has been marked as a duplicate of this bug. ***
*** Bug 336396 has been marked as a duplicate of this bug. ***
How are we doing on this issue? Just wondering if there has been any progress, as I'm using the latest version (just downloaded and installed 1.5.0.4 (20060516)) and I'm still getting the duplicates. I've read several forum posts and all of the posts here, but I'm not sure if anyone has got a grasp of the issue yet.

Thanks
*** Bug 341677 has been marked as a duplicate of this bug. ***
not going to block. Would gladly consider a patch.
Flags: blocking-thunderbird2? → blocking-thunderbird2-
Anyone want to try this feed for testing: http://www.vu.lt/RQ/testfeed.php ? It consists of three items, and should generate new duplicates every five minutes (so make sure you update often enough).

The catch here is: Every five minutes the feed changes its HTTP response code from 200 (OK) to Error 503 (server failure). When simulating a failure, it doesn't present the user with any entries at all, and this seems like enough to trigger the duplication when the entries are back after five minutes.

You can see the colorized source code of the file here: http://www.vu.lt/RQ/testfeed.phps
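If you want to run a similar test server locally, here is a rough JavaScript (Node.js) approximation of the behaviour described above. It is not a port of the PHP file. It assumes the feed switches state on fixed five-minute wall-clock slots, and the item titles and guids are made up.

```javascript
const PERIOD_MS = 5 * 60 * 1000; // five minutes, as described for the original test feed

const TITLES = ["First item", "Second item", "Third item"];

// Returns what the test feed would serve at time nowMs:
// even five-minute slots -> 200 with the same three items,
// odd slots -> 503 with no items at all.
function testFeedResponse(nowMs) {
  const slot = Math.floor(nowMs / PERIOD_MS);
  if (slot % 2 === 1) {
    return { status: 503, body: "" };
  }
  const items = TITLES.map(
    (title, i) => `<item><title>${title}</title><guid>urn:test:item-${i + 1}</guid></item>`
  ).join("");
  return {
    status: 200,
    body: `<?xml version="1.0"?><rss version="2.0"><channel><title>Test</title>${items}</channel></rss>`,
  };
}
```

To serve it, call testFeedResponse(Date.now()) inside an http.createServer handler and pass the result to res.writeHead(status) and res.end(body).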
I am able to duplicate the problem using the test feed at http://www.vu.lt/RQ/testfeed.php using Thunderbird 1.5.0.7.  I configured a new "RSS News & Blogs" account that checks for new articles every 2 minutes and contains only this test feed.
I also get duplicate items for that feed (using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1) Gecko/20060925 Thunderbird/2.0b1pre ID:2006092503)

On a side note, I added it to my Test RSS account, and after that all the feeds in my main RSS account loaded all items from all feeds -> a lot of duplicates! Not sure if it was related. Quite often going between RSS accounts will cause lots of dupes.
Target Milestone: Thunderbird1.0 → ---
(In reply to comment #102)
> I also get duplicate items for that feed (using Mozilla/5.0 (X11; U; Linux
> i686; en-US; rv:1.8.1) Gecko/20060925 Thunderbird/2.0b1pre ID:2006092503)

Same here with a slightly older beta.

> On a side note, I added it to my Test RSS account, and after that all the feeds
> in my main RSS account loaded all items from all feeds -> a lot of duplicates!
> Not sure if it was related. Quite often going between RSS accounts will cause
> lots of dupes.

This doesn't seem to happen for me.

BTW, serving failure has already been mentioned as a possible reason for duplicates in comment #11.
Depends on: 354345
*** Bug 362427 has been marked as a duplicate of this bug. ***
http://ghisler.ch/board/rss.php?&f=13
http://ghisler.ch/board/rss.php?&f=14
http://ghisler.ch/board/rss.php?&f=3
http://ghisler.ch/board/rss.php?&f=6
http://ghisler.ch/board/rss.php?&f=7

For all these feeds I get duplicate messages on a regular basis.

BTW, this bug has been open since 2004 o_O. Is anyone actually taking care of it? The latest sign from Robert Sayre dates back to 2005.
> For all these feeds I get duplicate messages on a regular basis.

A workaround that works for me is to "delete" and "undelete" the feeds. Select them, choose "Delete Folder". Now they are in the Trash. Restart Thunderbird. Now drag them out of the Trash by dropping them on the RSS account icon. Now they should work for a while (months) without duplicates.

Duplicate messages you can remove using the extension "Remove Duplicate Messages".
Wow - this bug goes back to 2004 and they still haven't BOTHERED to fix it..... 
Assignee: sayrer → mscott
Status: ASSIGNED → NEW
QA Contact: rss
RSS team: could this bug at least be fixed for the case of serving error (as indicated by comment #100 and comment #11)? I think such fix would certainly help a lot of people become less annoyed.
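A minimal sketch of what such a fix could look like, in JavaScript. It assumes that the client forgets items missing from the latest fetch. The comment #100 test case suggests this, but the bug does not confirm it. Under that assumption, the fix is to treat an error response or an empty item list as "no information" and skip the expiry step. The next successful fetch then does not re-deliver everything. The SeenItems class and its expiry window are hypothetical.

```javascript
// Hypothetical per-feed record of item ids. An id is forgotten only after it
// has been missing from successful fetches for longer than `expiryMs`.
class SeenItems {
  constructor(expiryMs) {
    this.expiryMs = expiryMs;
    this.lastSeen = new Map(); // id -> time of the last successful fetch listing it
  }

  // Returns the ids that should be delivered as new messages.
  processFetch(status, ids, nowMs) {
    if (status !== 200 || ids.length === 0) {
      // An error page or an empty feed says nothing about which items still
      // exist, so expire nothing and deliver nothing.
      return [];
    }
    const fresh = ids.filter((id) => !this.lastSeen.has(id));
    for (const id of ids) this.lastSeen.set(id, nowMs);
    for (const [id, seenAt] of this.lastSeen) {
      if (nowMs - seenAt > this.expiryMs) this.lastSeen.delete(id);
    }
    return fresh;
  }
}
```

A client that instead replaced its remembered set with whatever the last response contained would re-deliver all three items after every 503.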
If I may offer a solution / workaround: if Thunderbird is having trouble keeping track of which feed items it's already seen, why not compute an md5sum (or a less computationally-intensive hash; it doesn't need to be cryptographically secure) for each item and compare the hashes to incoming items?  The hashes could be discarded after a user-specified time interval (90 days by default?) to prevent them eating up resources.
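Here is a rough sketch of that proposal in JavaScript (Node.js). Each item is fingerprinted with MD5 over a few identifying fields, and the hash is stored with the time it was last seen. Hashes not seen for 90 days are forgotten. Which fields go into the hash is an assumption. If a feed changes any of them (for example, a re-dated entry), the item will still look new.

```javascript
const crypto = require("crypto");

const DAY_MS = 24 * 60 * 60 * 1000;

// Fingerprint built from fields assumed to identify an item. MD5 is acceptable
// here because this is deduplication, not security.
function itemHash(item) {
  return crypto
    .createHash("md5")
    .update([item.title, item.link, item.pubDate].join("\u0000"))
    .digest("hex");
}

class HashHistory {
  constructor(retentionMs = 90 * DAY_MS) {
    this.retentionMs = retentionMs;
    this.lastSeen = new Map(); // hash -> time the item was last seen in the feed
  }

  // Returns true if the item should become a new message.
  admit(item, nowMs) {
    for (const [hash, seenAt] of this.lastSeen) {
      if (nowMs - seenAt > this.retentionMs) this.lastSeen.delete(hash);
    }
    const hash = itemHash(item);
    const isNew = !this.lastSeen.has(hash);
    this.lastSeen.set(hash, nowMs); // refresh, so items still in the feed never expire
    return isNew;
  }
}
```

Refreshing the timestamp on every sighting, rather than keeping the first-seen time, prevents an item that stays in the feed for more than 90 days from reappearing.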
This started for me a few weeks ago, so I guess it's a regression. I am using the latest nightly trunk builds, and all of my feeds are redownloading the same duplicate messages every time I check for new items.
Flags: blocking-thunderbird3?
This originally wasn't happening for me (since moving from Eudora to the TB 2.0.0pr nightlies). I have stayed on this release (now called 2.0.0.4pre), and the only change I've made is adding more feeds (I have a lot of feeds).

There are a few feeds that are giving me doubles, but the most troublesome feed by far is http://www.ctrlaltdel-online.com/rss/rss.xml

I get two, three or more copies of the same entry, although the 'time' of each copy is different.
Oohh. I thought someone had fixed this extremely annoying bug in TB 2.0 but it is still there. This should be one of the most urgent bug fixes to be done in TB 3.0. 

My most troublesome feed: http://www.lifehacker.com/index.xml

If I don't delete the double articles with "Remove Duplicate Messages" I will find them three, four, five times in the feed. This really sucks.
I manage the feed at http://www.hbcfrankfort.org/feed.xml. I recently installed TB2.0 and encountered this problem for the first time. (Had been using the previous version for a long time.) The feed is valid RSS 2.0 according to feedvalidator.org, and uses a unique GUID for each entry. That being the case, eliminating dups seems like it should be straightforward. 
Here's a reproducible test case that's been causing me problems from 1.5 through last night's build of 2.0.0.4pre, no exceptions.  Link occasionally, mildly NSFW: http://www.skins.be/skins.xml

Feedvalidator reports valid RSS.
I have the described problem. I manage all my feeds in Thunderbird, except the ones where this duplication occurs (I have two feeds that do it); those I manage in Opera, which handles them correctly.
I consider this a major bug and would recommend fixing it ASAP. However, judging by the date of the original report, it seems that either some people don't consider it worth fixing or the fix isn't as easy as it looks.
Can anybody (preferably a developer working on bug fixing) tell us anything about the bug itself and what makes it difficult to solve?
I seem to be able to reproduce the dupes from http://www.skins.be/skins.xml using a trunk build.
Assignee: mscott → nobody
This is now happening to me daily (today it keeps happening every 5 seconds!) with the following feed:

https://addons.mozilla.org/en-US/thunderbird/browse/type:1/cat:all/sort:updated/format:rss

Surely this can be tracked down based on the mozilla feed?!?! I'm happy to assist with any debug logging or the like.

I'm using the 2.0.0.5 nightlies, currently 2.0.0.5pre (20070723)

Seriously, this screenshot covers just half a day (I removed all duplicates when I woke up); it's only 1:30pm here.

http://www.abednarz.net/tb-duplicates.png
I've observed that I can get duplicates when some file gets corrupted. In my case, this has happened several times when the disk got full. Thunderbird seems to trash the feed files when it tries to write them out and doesn't have any disk space. After I do some cleanup, I stop getting dupes on feeds that were doing it regularly. So I assume there can be multiple causes for the same apparent symptom.
Is there a reason this might happen in "waves"?  
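If a full disk is one of the causes, the usual defence is to never overwrite the item store in place. Below is a minimal Node.js sketch of the write-to-a-temporary-file-then-rename pattern. This is a general technique, not a description of how Thunderbird saves its feed files. If the write fails partway, the previous file is left untouched.

```javascript
const fs = require("fs");
const path = require("path");

// Write `data` to `file` without ever leaving a half-written `file` behind:
// write and flush a temporary file in the same directory, then rename it over
// the original (rename replaces the target atomically on POSIX file systems).
function writeFileAtomic(file, data) {
  const tmp = path.join(path.dirname(file), `.${path.basename(file)}.${process.pid}.tmp`);
  try {
    const fd = fs.openSync(tmp, "w");
    try {
      fs.writeFileSync(fd, data);
      fs.fsyncSync(fd);
    } finally {
      fs.closeSync(fd);
    }
    fs.renameSync(tmp, file);
  } catch (err) {
    // e.g. ENOSPC: drop the partial temp file; the old `file` is still intact.
    try {
      fs.unlinkSync(tmp);
    } catch (_) {
      // the temp file may not exist
    }
    throw err;
  }
}
```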

I go weeks with no problem with planet mozilla and then, several times a day, I'll get entries back for some that I deleted.  And not necessarily the most recent ones either - sometimes it's from several days back.
Version: unspecified → Trunk
Not sure why you put version:Trunk, we've been experiencing this since 2004...
FWIW some extensions out there relieve the pain a bit, e.g. https://addons.mozilla.org/en-US/thunderbird/addon/4654
Why not check for duplicate Message-IDs? If one is already in the list for that feed, then it is an old message.

The user could decide how many entries to save.
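A minimal sketch of that suggestion in JavaScript: keep the most recent N Message-IDs per feed, where N is the user-chosen limit, and treat any incoming item whose ID is in the list as old. The class name is hypothetical. One caveat applies: N must stay larger than the number of items the feed still lists. Otherwise, IDs that fall off the list will be delivered again.

```javascript
// Remembers the last `capacity` Message-IDs for one feed.
class RecentIds {
  constructor(capacity) {
    this.capacity = capacity;
    this.ids = new Set(); // Sets iterate in insertion order, oldest first
  }

  isOld(messageId) {
    return this.ids.has(messageId);
  }

  remember(messageId) {
    this.ids.delete(messageId); // move to the newest position if already present
    this.ids.add(messageId);
    while (this.ids.size > this.capacity) {
      const oldest = this.ids.values().next().value;
      this.ids.delete(oldest);
    }
  }
}
```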

Here is another feed which regularly sends duplicate messages:

http://www.dn.se/vetenskap-rss
(In Swedish)
This appears to be causing a crash for me.  Using 200708211545 build from here http://hourly-archive.localgho.st/, I have no crash.  Using the 200708211656 build from the same location, I get a crash when the suggestions start appearing from the search bar.  The only checkin during this interval is:  http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&date=explicit&mindate=1187736300&maxdate=1187740559

I get the crash in normal mode, and in safe mode.  However, I do not crash with a new profile.  

Does anyone want me to upload part of my profile so that we can see what might be causing this?
Ignore my last comment, https://bugzilla.mozilla.org/show_bug.cgi?id=258465#c126.  Bonsai listed 389503 instead of 389593.
I've noticed some very interesting behavior with a feed from addons.mozilla.org: https://addons.mozilla.org/en-US/thunderbird/browse/type:1/cat:all/sort:updated/format:rss

I have TB2.0.07pre on my laptop (this issue has been happening since before 2.0.0.0 was officially released however) When at home, I get *no* duplicates on the above feed. As soon as I get to work, I get *every* article duplicated, pretty much every time the feed is checked. The only difference is the network that I connect to.

Is there any logging I can turn on, or the like to help track down this one?
If I understand you correctly, you use the same thunderbird installation on the same computer (the laptop) on different connections and then get duplicated feeds?

Hm, this problem gets weirder and weirder...
That's correct, I turn my laptop on, connect to the work network, and slowly get duplicates (at 5:56pm (Melb Australia), right now, I'm still at work and have 480 unread messages in the above feed). As soon as I get home, turn my laptop on, connect to my network, I will stop getting duplicates. Tomorrow at work I will get another 480 or so duplicates, and so on..

Since I have a relatively 'stable' error-producing environment, I want to know what I can do to help get to the bottom of this weird issue.
On some feeds I get duplicates _almost_ every time Thunderbird checks for new items.

Sometimes these are old messages, sometimes new ones; it appears random.

Subject, Sender and Date are identical.


---------------------------


What I think causes this bug is a server-side error: the server returns a fixed number of items, and sometimes its XML response leaves out the newest ones, either because those records are locked or for some equivalent reason.

Say a server returns

1 ) message D
2 ) message E
3 ) message F

the first time

and on the next query returns

1 ) message E
2 ) message F
3 ) message G

everything runs as it is supposed to.

BUT

if the server returns 

1 ) message C
2 ) message D
3 ) message E

because message F is locked, message C gets duplicated.

When that happens, and the next XML response is D, E, F again, F gets duplicated.

On the client side:

The code that checks whether messages are new should not compare against the results of the previous XML query, but against all the messages currently in the list, preferably using the subject and date combination as the key (though that is not REALLY unique).

In other words, the assumption that the RSS XML only ever gets appended to is false.

Or, as a really dirty solution: add an extra check for whether a news item with the same subject and date is already present in the list, and if it is, do not add it. (I feel dirty after typing this.)

---------------------------------

PS: Thunderbird is not the only RSS client where I get these errors; I get the same from (for example) the Good News plugin for Trillian, to name just one.
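The scenario above can be simulated in a few lines of JavaScript. This is a sketch of the commenter's hypothesis, not of Thunderbird's actual code. It compares two checks: one against only the previous response, and one against everything already stored, using the suggested subject + date key. The replay starts from a response that still includes C.

```javascript
// Suggested composite key: subject + date. Not guaranteed unique, but stable
// across fetches for the same item.
function keyOf(msg) {
  return msg.subject + "\u0000" + msg.date;
}

// Keep only the items whose key is not among `known`.
function unseen(known, current) {
  const keys = new Set(known.map(keyOf));
  return current.filter((m) => !keys.has(keyOf(m)));
}

// Replays a fixed-size window that sometimes slides backwards.
// check = "previous": compare with the last response only;
// check = "stored":   compare with every message already in the folder.
function run(check) {
  const letters = "CDEFG";
  const msg = (s) => ({ subject: "message " + s, date: "2008-01-0" + (letters.indexOf(s) + 1) });
  const responses = [["C", "D", "E"], ["D", "E", "F"], ["C", "D", "E"], ["D", "E", "F"]];
  const stored = [];
  let previous = [];
  for (const response of responses) {
    const current = response.map(msg);
    const fresh = unseen(check === "previous" ? previous : stored, current);
    stored.push(...fresh);
    previous = current;
  }
  return stored.map((m) => m.subject.slice(-1)).join(" ");
}

console.log(run("previous")); // C D E F C F  (C and F duplicated)
console.log(run("stored"));   // C D E F
```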


I think (so far it is only a guess) that the problem with http://www.skins.be/skins.xml is caused by the newlines in the <guid> elements.

The feed parser uses the contents of <guid> and <atom:id> as lookup keys in the RDF data store. The RDF store requires that the keys are URIs. However, neither <guid> nor <atom:id> are necessarily URIs ( <guid> may contain any string, and <atom:id> should contain an IRI), so this causes problems when they are not wellformed URIs.
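If that guess is right, one possible fix is to derive the RDF resource from the guid instead of using it verbatim. Below is a minimal JavaScript sketch. It assumes that leading and trailing whitespace in <guid> is never significant. The urn:x-feed-guid: prefix is a made-up scheme for illustration. Changing the key format would also require migrating the keys already stored; otherwise every existing item would look new once.

```javascript
// Turn an arbitrary <guid> or <atom:id> string into a well-formed URI for use
// as a lookup key. Surrounding whitespace (such as the newlines some feeds put
// around the value) is dropped; everything else is percent-encoded.
function guidToResourceUri(guid) {
  return "urn:x-feed-guid:" + encodeURIComponent(guid.trim());
}

console.log(guidToResourceUri("\n  http://www.example.com/post?id=1\n"));
// urn:x-feed-guid:http%3A%2F%2Fwww.example.com%2Fpost%3Fid%3D1
```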
(In reply to comment #133)
> I think (so far it is only a guess) that the problem with
> http://www.skins.be/skins.xml is caused by the newlines in the <guid> elements.
> 
> The feed parser uses the contents of <guid> and <atom:id> as lookup keys in the
> RDF data store. The RDF store requires that the keys are URIs. However, neither
> <guid> nor <atom:id> are necessarily URIs ( <guid> may contain any string, and
> <atom:id> should contain an IRI), so this causes problems when they are not
> wellformed URIs.
> 

You're saying that the feed parser code is correct and only the feed (from the content provider) is malformed?
No, <guid> and <atom:id> are not required to contain valid URIs according to the RSS and Atom specs, so this problem may occur even with valid feeds.

People just often use URIs for <guid> and <atom:id>, so it works most of the time.
I created a separate bug for the specific problem with guids that are not valid URIs (bug 410842). I believe this is the cause of only some of the problems described in this bug.
I still get this on 3.0a1, albeit perhaps less often. One example is this post:
http://weblogs.mozillazine.org/weirdal/archives/019323.html
from this feed:
http://weblogs.mozillazine.org/weirdal/index.rdf

I can't see what's wrong with the feed that would cause this problem.

Gerv
My feeds entirely stopped working in TB2, so I installed the latest nightly of TB3 today (June 19, 2008).

Starting with 10 or so empty feed folders (deleted all messages in TB2), TB3 has downloaded a new copy of every message in every feed, every time it checks for new messages.  I have several hundred duplicate messages in feeds that have not even been updated today.

Since the feeds are from various sources (Wordpress blogs, Slashdot, hand-written, etc.) I'm doubting it's an error with the feed.

Will be trying other versions of TB3 tomorrow (a2 probably).


Reported: 2004-09-08
Well... this is so mozilla style.
Flags: blocking-thunderbird3?
Flags: blocking-thunderbird3.0b1?
Flags: blocking-thunderbird3+
Not a blocker, but it would be very nice to see this fixed for tb3.
Flags: wanted-thunderbird3+
Flags: blocking-thunderbird3.0b1?
Flags: blocking-thunderbird3.0b1-
definitely wanted, p1, but not a blocker
Flags: blocking-thunderbird3+ → blocking-thunderbird3-
Priority: -- → P1
the **** RSS ENGINE is BROKEN. it NEEDS FIXING. WTH is WRONG with you guys? THIS IS A BLOCKER.
Re: comment 143. Wow, that's good.

Anyway, could any of the developers take a look at comment #100, which describes how to clearly and reliably reproduce the problem (or at least one of its symptoms)?
It seems some people here don't see the issue. I'm being told to read the etiquette and calm down. The *entire* RSS feature is unusable, and it has been for almost FOUR YEARS, while Mozilla (or whoever maintains Thunderbird these days) keeps on versionwanking. The major version of a package should be increased when the community has been listened to, or when a big change has been committed. The only thing I see changing is the damn GUI. Can't a BROKEN FEATURE be fixed?

Here in the Netherlands, you're not allowed to drive a car which has a broken part. It doesn't matter which part. If your airconditioner is failing, you can't drive your car, not even in the middle of the winter. Okay, an e-mail client is not a car, but why can't this just be fixed? Put a guy on it for a week, and it's fixed. Heck, maybe a day. But no, four years have passed and nothing changed. And we seem to still be in the need of some new GUI or a new feature. Why can't you just fix what you have first, and then add new stuff?
Auke: feel free to take on fixin' it! For the record, I very rarely get duplicates nowadays. Not saying you don't, but since you are wondering why noone fixed it... If you can't code, at least give steps to reproduce (e.g. with a clean profile).

Rimas: the feed in comment #100 is empty atm. 
@#146: sorry, I am not the one who can fix this. I am in no way familiar with mozilla-style code, and it'd take me months. I don't know how to reproduce the problem either, because I'm not trained in decoding patterns. But it bothers me sooo much that a feature-wise great product like Thunderbird is apparently not going to be known for being bugfree. The maintainers should do something. And I'm trying to say they can. But currently, they're almost acting like microsoft: they keep changing the GUI and they keep saying they're changing the engine, but I'm not seeing this back in this bug. An RSS reader isn't hard to make, so in the worst case scenario the entire RSS part can be rewritten completely in a week or two. Maybe the swearing was misplaced, but can't anyone just do something about this bug?
I used to have the dupe entry problem with Thunderbird, but I don't anymore. Either the feeds fixed whatever was wrong, or Thunderbird just handles it better. I don't see this issue nearly as much as I used to. It's not that big of an issue for me. 
(In reply to comment #147)
All I'm saying is nobody can fix it if we can't reproduce.
(In reply to comment #148)
> It's not that big
> of an issue for me. 

The bug is there; the fact that you can't reproduce it, or that it doesn't annoy you, doesn't mean it isn't there.
This is not only P1 but a BLOCKER, please.
(In reply to comment #146)
> Rimas: the feed in comment #100 is empty atm. 

Magnus: the feed in comment #100 cycles between being empty (and returning Error 503) and having three (always same) entries (and returning Status code 200) every five minutes.

If you just subscribe to that feed, you'll soon find quite a lot of duplicates in its folder. This is clearly reproducible. If you change feed refresh time to 5 minutes in Thunderbird, you'll get three dupes every ten minutes!
Addition to comment #151:
Just for the record: I'm currently having 8556 entries instead of 3 in that feed's folder. That works out to 2852 copies of each entry. :)

I'm using Thunderbird 2.0.0.16.
(In reply to comment #149)
> All I'm saying is nobody can fix it if we can't reproduce.

Guys, from our (users') point of view, you've never even tried to reproduce it. It can be reproduced with any feed within one or two hours. Post the steps you used to try to reproduce it and we'll find the step you're missing, even with a feed you provide.
I think that this bug is a blanket bug for a number of unrelated bugs.  I was experiencing duplicate entries in RSS feeds, but my symptoms were different to what some people are describing.  I narrowed the problem down to only feeds that contained items that had a doubly escaped ampersand in the link.  Here are the steps to reproduce my problem:

1.Subscribe to a feed that contains a link that has a doubly escaped ampersand
2.Click get mail to download the feed
3.Click get mail again to download the feed again

Here is an example feed that can be used to reproduce the problem:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Example</title>
  <link rel="alternate" href="http://www.example.com" />
  <subtitle>Example subtitle</subtitle>
  <id>http://www.example.com</id>
  <entry>
    <title>Example Entry</title>
    <link rel="alternate"
        href="http://www.example.com/page?id=10&amp;amp;comment=12" />
    <author>
      <name>John Smith</name>
    </author>
    <published>2008-05-13T23:18:30Z</published>
    <summary type="html">
        Summary text
    </summary>
    <dc:creator>John Smith</dc:creator>
    <dc:date>2008-05-13T23:18:30Z</dc:date>
  </entry>
</feed>

For convenience, I have placed a copy of this feed on my own webserver, at http://jazzy.id.au/thunderbird-duplicates.xml

I have experienced this bug on Thunderbird 2.0.0.16 under Ubuntu Hardy Heron, as well as Mac OS X Leopard.

Obviously, there are two bugs here: one in Thunderbird with the duplicate feed items, and one in the application generating the RSS feeds, because it shouldn't be doubly escaping ampersands. However, the second bug seems to be common enough to cause quite some annoyance; indeed, if you read above, Mozilla's own RSS feeds at one point exhibited this bug. I came across it because Atlassian's Confluence exhibited the bug in its RSS feeds, but since I work for Atlassian, I was able to get the bug fixed in the next iteration, so it wasn't a problem for me anymore.

As a side note, another client, NetNewsWire, is not only able to handle these feeds, but it also doubly unescapes the links. I'm not sure whether this is intentional, but it means that this common bug causes no issues for NetNewsWire users.

Now, as I understand it, most Mozilla developers are volunteers, working in their own time. So to those of you whose tone in these comments has been quite rude: who do you think you are, being rude to people you are not paying, while not giving them enough information to reproduce the problem in their environment? Do you think they have the time, between their day jobs and looking after their children, to spend hours trying to guess your exact environment to reproduce the bug? Sure, you may be frustrated, but don't be rude to people you aren't paying.
I agree entirely with the comments that have gotten rather "heated".

This bug has been here since I've been running Thunderbird, it has NOT been addressed, there has been ZERO effort put into it, and it is trivially reproducible.

Set up a feed to look at any of the various Yahoo News RSS links, and it will appear within hours, along with a whole host of other places.

Now you've got someone who just gave you a "quick and easy" way to reproduce it, any time you'd like.

So what's the excuse now?

If the Thunderbird development team's response is basically "**** off", then the right thing to do is REMOVE THE RSS FEATURE ENTIRELY.

Of course that's not politically correct, is it? 

How do the developers justify shipping update after update with a KNOWN problem like this, simply ignoring it?
(In reply to comment #154)
> I think that this bug is a blanket bug for a number of unrelated bugs.

Yes. AFAIR there are 3 possible scenarios that reproduce message duplication. I myself stopped using Thunderbird as an RSS reader after 1.5 years of waiting for a fix.

> I narrowed the problem down to only feeds that contained items that had 
> a doubly escaped ampersand in the link.  

This is one case; I experienced it too. But I would generalize from "has a doubly escaped ampersand" to a broader range of semi-valid links. IMHO a single such message was enough to make Thunderbird duplicate all newer messages, including the broken one. Another possible scenario was described in comment #100: when the server hosting the feed becomes unavailable for some time, Thunderbird either starts duplicating or refuses to update the feed until restarted, even once the server is available again. I've also experienced what was described in comments 17 to 19, even in more recent versions of Thunderbird (the 2.0 branch). Plus, AFAIR, I got duplicates after feed folders grew to some thousands of messages (thousands of read messages, with tens of thousands deleted).

IMHO the Mozilla guys neither use Thunderbird themselves as an RSS reader nor put more than a minute of effort into trying to reproduce the problem, and they consider it non-reproducible just because they were not able to reproduce it in such a short time.
Hmm, I don't see an explicit or implied "piss off". I also have *a* problem, so I relate to those who are frustrated. However, it would be *helpful* to channel that frustration in productive directions. 

Thank you James for that productive comment. Unfortunately this bug has been left open too long and allowed to become a bucket for probably too many issues.  And we are now digressing to etiquette issues [1]  (yes, it doesn't help that the primary coder in this area is gone and that no one has attempted further patches since 12/2004). Which leads us to ...

Good practice in bugzilla-land is that when there are multiple causes and issues, each should have its own bug. This allows people who care, and coders who are interested in fixing well-defined problems, to better focus on *a* problem and achieve a solution.  

So for everyone who cares, has cc: and has voted, some suggestions:

1. Get bugs filed on each discrete RSS issue, including the ampersand issue (I don't see a bug on it even though it is described multiple times in this bug), so that when one gets fixed we don't have 20 people saying their issue is fixed and 30 saying it isn't. Filing a new bug is what helps.  

2. Get out there in bug-land and triage [2] the crap out of the RSS bugs [3]. Get rid of the ones that no longer exist, and improve the ones that do. In other words, go out and be a little selfish: help yourself by touching bugs others care about (and where you may benefit as well), and then maybe someone will return the favor and help with a bug you care about.

3. cc: on the bugs you do care about - get engaged

4. Have a look at the code and specs, even if you are not a coder [4]

There is still plenty of time to get some of these fixed before beta 2. And any that aren't fixed by then can be considered for what will eventually be 3.0 _branch_.


Shockingly (really) I think one person in 4 years has posted any workaround. So here's mine - the severity of the problem is greatly lessened, but not bullet proof
a) never or rarely manually "get new [RSS] messages"
b) set a high value for automatically getting new messages (I use 100)


[1] https://bugzilla.mozilla.org/page.cgi?id=etiquette.html
[2] https://wiki.mozilla.org/Thunderbird:Bug_Triage  
[3] http://tinyurl.com/6qrwlu
[4] http://mxr.mozilla.org/mozilla/source/mail/extensions/newsblog/
    http://en.wikipedia.org/wiki/RSS_(file_format)
Depends on: 451737
I re-filed the reproducible intermittent 503+noitems testcase as bug 451737.
(In reply to comment #158)
> I re-filed the reproducible intermittent 503+noitems testcase as bug 451737.

which brings up some points:
1. post numbers here of new bugs filed
2. new bugs don't have to be 100% reproducible, but should have clear steps to reproduce, including URL


(In reply to comment #157)
> 2. Get out there in bug-land and triage [2] the crap out of the RSS bugs [3].
> Get rid of the ones that no longer exist, and improve the ones that do.  In
> other words go out and be a little selfish - help yourself by touching bugs
> others care about (and where you may benifit as well), and then maybe someone
> will return the favor and help with a bug you care about.
> [2] https://wiki.mozilla.org/Thunderbird:Bug_Triage  
> [3] http://tinyurl.com/6qrwlu

join us on Thursday bugdays and you'll get free :) assistance in triaging bugs
https://wiki.mozilla.org/Thunderbird:QA_Days
Depends on: 451770
(In reply to comment #153)
> Guys, from our (user's) point of view you're never ever tried to reproduce it.
> It can be reproduced with any feed within 1 or 2 hours. Put here your steps how
> you've tried to reproduce and we'll find step you're missing. Even on feed
> you'll provide us.

Okay, here's a video of me setting up the feed in the latest Thunderbird trunk build on Windows XP, and checking it:
<http://ilias.ca/flashback/rssdupes.html>[4.2M]

Here's approx. two hours after setup: <http://ilias.ca/flashback/rssdupes-2hrslater.html>[1.0M]. Still no dupes.

And here's approx. four hours after setup: <http://ilias.ca/flashback/rssdupes-4hrslater.html>[1.4M]. Still no dupes.

What are the steps I'm missing? I'm not saying the bug does not exist. In fact, I blogged about it in 2005 <http://ilias.ca/blog/2005/03/my-1-thunderbird-bug/>. What I /am/ saying is that your steps to reproduce are too vague to be helpful in fixing this bug. The more details, the better. How Mike found out the circumstances for bug 433386, I'll never know. :-)

P.S. I'm not a coder. :-)
If nobody can reproduce it with <http://ghisler.ch/board/rss.php>, then I consider this bug fixed.
Note: it usually happens after TB restart. I've had it with TB 2.x and 3.x.
@#161: WHAT? Are you crazy? While only a few comments back we have deduced that this is a blanket bug?
(In reply to comment #161)
> If nobody can't reproduce it with <http://ghisler.ch/board/rss.php> then I
> consider this bug fixed.
> Note: it usually happens after TB restart. I've had it with TB 2.x and 3.x.

This is pure nonsense. The essence of a bug is not its reproducibility. The bug is there; just look at how many votes and CCs it has received. Again, thanks to everyone who is contributing freely, but do not simply deny the evidence just because you can't see it.
This MUST BE A BLOCKER
Please stop asking for this to be a blocker.  As a developer, I promise you that saying "this must be a blocker" only lowers the chances of someone actually looking at this bug.

A blocker must be something that prevents usage of the application, e.g. makes it impossible to develop or use it.  If, for example, Thunderbird locked up every time it checked a feed, this would be a blocker.

A critical bug is one that causes crashes (but can be avoided, at least enough for testing), causes loss of important data, and otherwise makes the application a "ticking time bomb."

A major bug (this bug's current severity) is one that causes a "major" (not trivial) loss in functionality.  For a comparison with Firefox, if bookmarks could not be created at all, it would be "major" (not critical or a blocker!)

And in any case, as a developer, I will also tell you that a bug is its reproducibility. Most development shops (to which you PAY several thousand dollars for software) make you sign agreements saying they won't do a thing unless you give them steps to reproduce.

There's a problem, but if I can't easily reproduce it, it might take me/someone hours and hours (possibly days) of testing to fix/figure it out.  Does that sound fun to anyone?  It sounds like a wild goose chase to me.

Anyway, some people are trying to help here and that is good - please, just stop making the helpful comments hard to find!

-[Unknown]
(In reply to comment #161)
> If nobody can reproduce it with <http://ghisler.ch/board/rss.php> then I
> consider this bug fixed.
> Note: it usually happens after TB restart. I've had it with TB 2.x and 3.x.

I ran this feed, checking on startup and at 20-minute intervals, in parallel on 2.0.0.6/16 and 20080819021808 3.0b1pre during the daytime since you mentioned it, including several restarts. I also ran the tbird3 build for a few hours, with about three restarts, on the profile where tbird2 was used to subscribe. I only saw tbird2 getting duplicates (usually a feedful, on startup). If you can still reproduce dups in a trunk (tbird3) build, file a bug on it and mark it as blocking this one.

(Also, don't let the overreactions to your mention of what part of this bug you happen to mostly care about drive you away or anything :)
(In reply to comment #165)
> 
> If you can still
> reproduce dups in a trunk (tbird3) build, file a bug on it and mark it 
> blocking this one.

Indeed, it looks fixed on my side with TB trunk.

> (Also, don't let the overreactions to your mention of what part of this bug 
> you happen to mostly care about drive you away or anything :)

Thanks.
Unfortunately it happened again this morning with <http://ghisler.ch/board/rss.php>.
Whiteboard: [DO NOT COMMENT unless you're using a trunk or 1.8 branch build] → [file specific bugs with specific URL - see comment 157][delight]
Not fixed. Bug is still there in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1a2pre) Gecko/2008080500 Shredder/3.0b1pre
Still there. Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b1pre) Gecko/20080927031346 Shredder/3.0b1pre
Hi, I have found a test case with a specific message that gets duplicated again and again, every day, each time you get messages from this RSS feed.

The RSS is:
http://www.ibm.com/developerworks/views/rss/customrssatom.jsp?type_by=Tutorials

And the problematic message is the one with Subject "CNN" from 7-Sept-2007.

As I said, each time you get messages from this RSS feed, you get a new copy of it even though you already have it in TB.

My build is version 2.0.0.17 (20080914) on Windows XP.

Hope this helps,

   Chemi.
Depends on: 461109
Depends on: 410842
Filed bug 461109 for comment 170.
Keywords: meta
Summary: Duplicate entries appear in feeds → [tracking] Duplicate entries appear in feeds
News?
(In reply to comment #173)
> News?

Tons of duplicates...
This bug has been here for years; any news on a fix? I've rigged a filter to cut out any messages more than 24h old, so I get a few fewer now, but it's still pretty bad.
Component: RSS → Feed Reader
Flags: blocking-thunderbird2-
Product: Thunderbird → MailNews Core
I have 6 RSS accounts and had been running Lanikai nightly builds continuously without this issue until a few days ago.
It then started on all RSS accounts simultaneously. I assumed at first it was a new bug in the nightly build (hence my https://bugzilla.mozilla.org/show_bug.cgi?id=608520), but on further testing with the same profile, it happens with several different TB versions going back to TB2, running on both i586 and x86_64.
This leaves my profile as the only common denominator, so can anyone think of what may have changed in the profile to trigger this? I had not changed any account settings manually before this started.

Barry
On reflection (just to clarify) I do recall getting single duplicates which were never a real problem to me - this rapid multiple spawning is new and has never happened until now. 

There may be two different bugs here.
6 years and counting - will this ever be fixed?
I am deleting 10,000+ duplicates every few days.
(In reply to comment #180)
> 6 years and counting - will this ever be fixed?
> I am deleting 10,000+ duplicates every few days.

File a new bug; this is a tracking bug :-)
Huh??
I did (608520) last October and it was just marked resolved as a duplicate of this.
So I don't see how reporting it again will achieve anything.
What is a [tracking] bug?
I was the one that marked it as a dupe, so it's probably my fault. Sorry. This bug morphed into a meta bug and I didn't know they wanted a separate bug for each URL.
Depends on: 608520
I'm closing this bug for the following reasons:
1) A massive renovation of feeds in Tb was concluded with the release of Tb17.  The codebase is not the same as when the dupe bugs were rampant. (Neither is usage, to be sure.)
2) There hasn't been a dupe report since Tb17.

This is not to say there can't be a true incorrect dupe.  But it would need its own testcase/example.  The most likely cause of a true dupe is a publisher reinserting a same item after removing it and Tb's dupe cache expiring (24hrs).

Apparent dupes, which are not true dupes, are more likely to be caused by the publisher than by an issue in Tb. For example, the current file in this feed contains identical items differing only by guid: http://www.facebook.com/feeds/page.php?id=204474822902925&format=rss20. Etc.
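The two dupe scenarios described in the closing comment can be illustrated with a minimal sketch. It assumes a dupe cache keyed only by an item's guid, with a 24-hour expiry that is refreshed each time the item is still present in the fetched feed. `FeedDupeCache` and `is_new` are hypothetical names; this is not Thunderbird's actual feed code, only a model of the behavior the comment describes.

```python
from datetime import datetime, timedelta


class FeedDupeCache:
    """Hypothetical guid-keyed dupe cache with a fixed expiry window (assumed 24h)."""

    def __init__(self, ttl=timedelta(hours=24)):
        self.ttl = ttl
        self._seen = {}  # guid -> time the item was last seen in a fetched feed

    def _expire(self, now):
        # Forget every guid that has not been seen within the expiry window.
        cutoff = now - self.ttl
        self._seen = {g: t for g, t in self._seen.items() if t >= cutoff}

    def is_new(self, guid, now):
        """Return True if the item would be stored as a new message."""
        self._expire(now)
        new = guid not in self._seen
        self._seen[guid] = now  # assumption: each sighting refreshes the entry
        return new


if __name__ == "__main__":
    t0 = datetime(2013, 1, 1, 12, 0)

    # Scenario 1: publisher removes an item, then re-inserts it after the cache expired.
    cache = FeedDupeCache()
    print(cache.is_new("item-1", t0))                       # True: first sighting
    print(cache.is_new("item-1", t0 + timedelta(hours=1)))  # False: still cached
    # item-1 drops out of the feed and reappears 25 hours after it was last seen
    print(cache.is_new("item-1", t0 + timedelta(hours=26))) # True: a "true" dupe

    # Scenario 2: identical content published under two different guids.
    cache2 = FeedDupeCache()
    print(cache2.is_new("post-42-a", t0))  # True
    print(cache2.is_new("post-42-b", t0))  # True: an apparent dupe caused by the publisher
```

Because the cache only compares guids, nothing in this model can tell that two items with different guids carry the same content; that is why such apparent dupes point back at the publisher rather than the reader.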
Status: NEW → RESOLVED
Closed: 20 years ago → 11 years ago
Resolution: --- → INCOMPLETE
Flags: needinfo?(saurabh.edelytics)
Flags: needinfo?(bugzilla1)