Closed Bug 264482 Opened 17 years ago Closed 7 years ago

RSS protocol (not Atom) feeds with no <guid> only, and identical <link> or no <link> and identical <title> are treated as dupes and not stored

Categories

(MailNews Core :: Feed Reader, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 28.0

People

(Reporter: u162355, Assigned: alta88)

References

Details

Attachments

(2 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20041004 Firefox/0.10.1
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20041004 Firefox/0.10.1

http://www.mirkx.com.nyud.net:8090/rss.php
- Only 1-2 items are shown; there should be many more.

http://www.baka-updates.com/rss.php
- The same problem.

They work in Straw and in Liferea.
(The first one is a little problematic because it uses chunked encoding, but
otherwise it works: I had to fetch that feed with curl and then pipe it into
Liferea. Mozilla seems to download the feed file fine, so the problem
shouldn't be there.)

Reproducible: Always
Steps to Reproduce:
1. Add one of the mentioned RSS feeds to the feed list in Thunderbird.

Actual Results:  
Only 1-2 entries are displayed.

Expected Results:  
Many more entries should be listed (as a manual check of the XML feed file shows).
With the baka-updates URL, Thunderbird loads only the newest item (the topmost
one in the XML file) when subscribing and does not load any new feed items
afterwards. A new item only shows up after deleting the feed and resubscribing.
The feed validates at feedvalidator.org with only a small notice about the MIME
type, which shouldn't be a problem for Thunderbird at all. Feedreader and all
the other programs I use to read the feed work fine with it and display every item.
(In reply to comment #1)
I can confirm this bug with the latest nightly (26-Nov-04) and the official 0.9
release.

Additional info:
After subscribing to the feed, only the latest item is downloaded, where in
fact about 30 should be. After that, no new items are downloaded at all,
whether new articles are fetched manually or automatically.
The latest nightly (07-Dec-2004) still has the bug. Tried with a fresh
Thunderbird folder and no remaining profiles.
These feeds fail for three reasons. 

1.) They don't have GUIDs.
2.) Some of the items contain invalid URIs in the link element. This prevents them from being stored, 
since the link element is used as an identifier in the absence of a GUID.
3.) Some of the items don't contain link or guid elements. Thunderbird fails on these, even though they 
are valid RSS (whatever that means...).

I have patches for these, but other bugs are preventing me from making a clean patch right now 
(bug 258102, bug 278560).
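
The three failure cases above can be sketched as a single identifier lookup. This is only an illustration, not the code from any patch on this bug: `isValidUrl` and `pickItemId` are hypothetical names, and the item is assumed to be a plain object with optional `guid`, `link`, `title` and `description` strings.

```javascript
// Hypothetical helpers for illustration; not taken from feed-parser.js.

// True when the value parses as an absolute URL.
function isValidUrl(value) {
  if (typeof value !== "string" || value.trim() === "") {
    return false;
  }
  try {
    new URL(value.trim());
    return true;
  } catch (e) {
    return false;
  }
}

// Pick an identifier for an item: <guid> first, then a usable <link>,
// then a key generated from whatever text the item carries.
function pickItemId(item) {
  if (item.guid) {
    return item.guid;
  }
  if (isValidUrl(item.link)) {
    return item.link.trim();
  }
  const text = [item.title, item.description].filter(Boolean).join("\n");
  return text ? "generated:" + text : null;
}
```

With a fallback of this kind, an item with an unparseable link or with no link at all still gets a key, so it is not dropped merely because it lacks a guid. An item with no guid, link, title or description yields `null` and would still have to be skipped.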
*** Bug 273834 has been marked as a duplicate of this bug. ***
As noted at the dupe, I am not having any problem with the baka-updates feed.

The nyud.net feed is being reported as "not found" at the moment.


The bugs referenced in comment 4 have both been fixed.  Robert Sayre, what is 
the status of the patches you mention?
> 
> The bugs referenced in comment 4 have both been fixed.  Robert Sayre, what is 
> the status of the patches you mention?

I guess I should get that patch together for the current codebase, huh? :)

Things have moved around quite a bit, so I'll have to rewrite it.
This bug has had no progress for half a year. Robert, what is the status of the
rewrite? Are there any difficulties keeping you from finishing it?

BTW, here is an extra URL for testing: http://mycgiserver.com/~lwchk/test1.xml
(In reply to comment #8)
> BTW, here is an extra URL for testing: http://mycgiserver.com/~lwchk/test1.xml

That feed is working for me at the moment (I got four articles when I added it).


Robert Sayre, is there still anything left to do regarding your comment 4?
Four is wrong. There should be a total of five articles. So you "confirmed" that
the bug still exists :-(
Assignee: mscott → sayrer
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
(In reply to comment #10)
> Four is wrong. There should be a total of five articles. So you "confirmed" that
> the bug still exists :-(

The bug still exists in Thunderbird 1.6a. I guess it is somehow related to the
( ) characters. The first item in the feed whose title contains a "(", in this
case "(DVD)", is included in the Thunderbird output; the second one directly
beneath it and all the others that also include "(DVD)" are no longer
displayed. The next item Thunderbird displays is the first one without "(DVD)"
in its title. So I assume it is somehow related to that.
The popular RDF feed at http://www.heise.de/newsticker/heise.rdf
stopped working in 1.5 beta 2. It works fine in 1.07
and validates without problems at http://www.w3.org/RDF/Validator/
Summary: 2 non-working feeds → 3 non-working feeds
I have now tried a nightly snapshot (the bug is still there) and got this backtrace after pressing Ctrl+C when it hung verifying the feed:

====================8<========================================
...
[New Thread 65541 (LWP 23853)]

Program received signal SIGINT, Interrupt.
[Switching to Thread 49156 (LWP 23851)]
0x401dab96 in nanosleep () from /lib/libpthread.so.0
(gdb) thread apply all bt

Thread 6 (Thread 65541 (LWP 23853)):
#0  0x401dab96 in nanosleep () from /lib/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0x401d51d5 in __pthread_timedsuspend_new () from /lib/libpthread.so.0
#3  0x401d21d7 in pthread_cond_timedwait@GLIBC_2.0 () from /lib/libpthread.so.0
#4  0x401a6b64 in PR_Unlock () from ./libnspr4.so
#5  0x401a6d5e in PR_WaitCondVar () from ./libnspr4.so
#6  0x08106066 in nsTHashtable<nsBaseHashtableET<nsDepCharHashKey, nsAutoPtr<nsINIParser::INIValue> > >::~nsTHashtable ()
#7  0x0810626f in nsTHashtable<nsBaseHashtableET<nsDepCharHashKey, nsAutoPtr<nsINIParser::INIValue> > >::~nsTHashtable ()
#8  0x401ac531 in PR_Select () from ./libnspr4.so
#9  0x401d2f4c in pthread_start_thread () from /lib/libpthread.so.0
#10 0x401d2fda in pthread_start_thread_event () from /lib/libpthread.so.0
#11 0x40acc8ea in clone () from /lib/libc.so.6

Thread 5 (Thread 49156 (LWP 23851)):
#0  0x401dab96 in nanosleep () from /lib/libpthread.so.0
#1  0x00000001 in ?? ()
#2  0x401d51d5 in __pthread_timedsuspend_new () from /lib/libpthread.so.0
#3  0x401d21d7 in pthread_cond_timedwait@GLIBC_2.0 () from /lib/libpthread.so.0
#4  0x401a6b64 in PR_Unlock () from ./libnspr4.so
#5  0x401a6d5e in PR_WaitCondVar () from ./libnspr4.so
#6  0x080e1f99 in nsTHashtable<nsBaseHashtableET<nsDepCharHashKey, nsAutoPtr<nsINIParser::INIValue> > >::~nsTHashtable ()
#7  0x401ac531 in PR_Select () from ./libnspr4.so
#8  0x401d2f4c in pthread_start_thread () from /lib/libpthread.so.0
#9  0x401d2fda in pthread_start_thread_event () from /lib/libpthread.so.0
#10 0x40acc8ea in clone () from /lib/libc.so.6

Thread 4 (Thread 32771 (LWP 23850)):
#0  0x401dab96 in nanosleep () from /lib/libpthread.so.0
#1  0x00000001 in ?? ()
#2  0x401d51d5 in __pthread_timedsuspend_new () from /lib/libpthread.so.0
#3  0x401d21d7 in pthread_cond_timedwait@GLIBC_2.0 () from /lib/libpthread.so.0
#4  0x401a6b64 in PR_Unlock () from ./libnspr4.so
#5  0x401a6d5e in PR_WaitCondVar () from ./libnspr4.so
#6  0x4014a67b in TimerThread::UpdateFilter () from ./libxpcom_core.so
#7  0x40147dfb in nsThread::Main () from ./libxpcom_core.so
#8  0x401ac531 in PR_Select () from ./libnspr4.so
#9  0x401d2f4c in pthread_start_thread () from /lib/libpthread.so.0
#10 0x401d2fda in pthread_start_thread_event () from /lib/libpthread.so.0
#11 0x40acc8ea in clone () from /lib/libc.so.6

Thread 3 (Thread 16386 (LWP 23849)):
#0  0x40ac3446 in poll () from /lib/libc.so.6
#1  0x401aaec9 in PR_OpenDir () from ./libnspr4.so
#2  0x080fbced in nsTHashtable<nsBaseHashtableET<nsDepCharHashKey, nsAutoPtr<nsINIParser::INIValue> > >::~nsTHashtable ()
#3  0x080fc286 in nsTHashtable<nsBaseHashtableET<nsDepCharHashKey, nsAutoPtr<nsINIParser::INIValue> > >::~nsTHashtable ()
#4  0x40147dfb in nsThread::Main () from ./libxpcom_core.so
#5  0x401ac531 in PR_Select () from ./libnspr4.so
#6  0x401d2f4c in pthread_start_thread () from /lib/libpthread.so.0
#7  0x401d2fda in pthread_start_thread_event () from /lib/libpthread.so.0
#8  0x40acc8ea in clone () from /lib/libc.so.6

Thread 2 (Thread 32769 (LWP 23848)):
#0  0x40ac3446 in poll () from /lib/libc.so.6
#1  0x401d3514 in __pthread_manager () from /lib/libpthread.so.0
#2  0x401d3e82 in __pthread_manager_event () from /lib/libpthread.so.0
#3  0x40acc8ea in clone () from /lib/libc.so.6

Thread 1 (Thread 16384 (LWP 23845)):
#0  0x40ac3446 in poll () from /lib/libc.so.6
#1  0x4063b57b in g_main_context_check () from /usr/lib/libglib-2.0.so.0
#2  0x4063bbd8 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#3  0x40335989 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
#4  0x08220038 in XmlInitUnknownEncodingNS ()
#5  0x086bd707 in nsXPTCVariant::Init ()
#6  0x0807cd4e in ?? ()
#7  0x08cfaa60 in ?? ()
#8  0x08a0d8c8 in _IO_stdin_used ()
#9  0x00000000 in ?? ()
#10 0x00000000 in ?? ()
#11 0x00000000 in ?? ()
#12 0x00000000 in ?? ()
#13 0x00000000 in ?? ()
#14 0x00000000 in ?? ()
#15 0x00000000 in ?? ()
#16 0x00000000 in ?? ()
#17 0x00000000 in ?? ()
#18 0x00000000 in ?? ()
#19 0x00000000 in ?? ()
#20 0x00000001 in ?? ()
#21 0x08a0ec60 in nsIFactory::GetIID()::iid ()
#22 0xbffff408 in ?? ()
#23 0x08a0ec60 in nsIFactory::GetIID()::iid ()
#24 0xbffff408 in ?? ()
#25 0x00000000 in ?? ()
#26 0x00000000 in ?? ()
#27 0x08a0ebc0 in nsIFactory::GetIID()::iid ()
#28 0xbffff404 in ?? ()
#29 0x08a0ebc0 in nsIFactory::GetIID()::iid ()
#30 0xbffff404 in ?? ()
#31 0x00000000 in ?? ()
#32 0x00000000 in ?? ()
#33 0x00000000 in ?? ()
#34 0x00000000 in ?? ()
#35 0x40b29a6c in __malloc_initialize_hook () from /lib/libc.so.6
#36 0x00000000 in ?? ()
#37 0x00000001 in ?? ()
#38 0x08c1c0d8 in ?? ()
#39 0x08ec5fd8 in ?? ()
#40 0x00008000 in ?? ()
#41 0x0000000d in ?? ()
#42 0xbffff440 in ?? ()
#43 0x08d58b88 in ?? ()
#44 0x401dcff4 in ?? () from /lib/libpthread.so.0
#45 0x40b29a20 in __malloc_initialize_hook () from /lib/libc.so.6
#46 0x63c1c3e0 in ?? ()
#47 0x08e195d0 in ?? ()
#48 0xbffff460 in ?? ()
#49 0x4000bfd9 in _dl_debug_state () from /lib/ld-linux.so.2
#50 0x08078a14 in ?? ()
#51 0x00000001 in ?? ()
#52 0xbffff684 in ?? ()
#53 0x08a0d940 in _IO_stdin_used ()
#54 0x08a0d895 in crmf_encoder_out ()
#55 0x40a28413 in __libc_start_main () from /lib/libc.so.6
#56 0x08078941 in ?? ()
#0  0x401dab96 in nanosleep () from /lib/libpthread.so.0
*** Bug 260745 has been marked as a duplicate of this bug. ***
*** Bug 316568 has been marked as a duplicate of this bug. ***
(In reply to comment #12)
> The popular RDF feed at http://www.heise.de/newsticker/heise.rdf
> stopped working in 1.5 beta 2. It works fine in 1.07
> and validates without problems at http://www.w3.org/RDF/Validator/
> 

OT, but Heise works with http://www.heise.de/newsticker/heise-atom.xml
(In reply to comment #12)
> The popular RDF feed at http://www.heise.de/newsticker/heise.rdf
> stopped working in 1.5 beta 2.

See bug 313422.
*** Bug 284249 has been marked as a duplicate of this bug. ***
QA Contact: rss
Assignee: sayrer → nobody
Status: ASSIGNED → NEW
I was asked to check if this bug is still present. I am using Thunderbird 2.x now, and it seems it no longer contains an RSS reader?
RSS is still present in 2.x. When starting afresh (with no existing profile), or from Tools->Account Settings->Add Account, select "RSS News and Blogs"
Interesting - it seems to be a problem with my particular version of Thunderbird: I am using Debian Icedove 2.0.04.

When using LANG=de_DE I get the German version and in the "new account" dialog I only see:
* E-Mail-Konto
* Newsgruppen-Konto
but nothing related to RSS.

When using no LANG I get the english version of Thunderbird and it shows the following choices:
* Email account
* GMail
* Newsgroups account
and no RSS either.

Anyway, the problem with the RDF 0.9 feed http://www.heise.de/newsticker/heise.rdf was fixed in the separate bug 313422
Component: RSS → Feed Reader
Product: Thunderbird → MailNews Core
Summary: 3 non-working feeds → RSS protocol (not Atom) feeds with no <guid> only, and identical <link> or no <link> and identical <title> are treated as dupes and not stored
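
The behaviour named in the new summary can be illustrated with a small sketch. It assumes, purely for illustration, a store that keys each guid-less item on its `<link>`, or on its `<title>` when there is no link; `storeWithoutGuid` is a hypothetical name and does not reflect how the feed reader is actually structured.

```javascript
// Hypothetical illustration of the duplicate detection described in the
// summary; not the feed reader's actual storage code.
function storeWithoutGuid(items) {
  const seen = new Set();
  const stored = [];
  for (const item of items) {
    // No <guid>: fall back to <link>, then to <title>.
    const key = item.link || item.title;
    if (seen.has(key)) {
      continue; // treated as a dupe and not stored
    }
    seen.add(key);
    stored.push(item);
  }
  return stored;
}

// Three distinct items that all point at the same page.
const sameLink = [
  { title: "Release 1", link: "http://example.org/news" },
  { title: "Release 2", link: "http://example.org/news" },
  { title: "Release 3", link: "http://example.org/news" },
];
const kept = storeWithoutGuid(sameLink);
```

Under that assumption only the first of the three items survives, which matches the "only the newest item is shown" symptom reported earlier in this bug.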
Duplicate of this bug: 610671
Duplicate of this bug: 297906
Attached patch badSpecRSS.patch (obsolete) — Splinter Review
the non-mandatory <guid> in the rss2.0 spec is just one of many reasons to avoid rss2.0 in favor of Atom.  nevertheless, this is a workaround.

many old urls in related bugs have fixed themselves or gone away, but this one is still alive,
http://www.ci.austin.tx.us/qact/qact_rss.cfm

this needs bug 930118 for non bitrottenness.
Assignee: nobody → alta88
Attachment #823399 - Flags: review?(mkmelin+mozilla)
Comment on attachment 823399 [details] [diff] [review]
badSpecRSS.patch

Review of attachment 823399 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thx! r=mkmelin

::: mailnews/extensions/newsblog/content/feed-parser.js
@@ +171,5 @@
> +      item.title = this.getNodeValue(tags ? tags[0] : null);
> +      if (!(item.title || item.description))
> +      {
> +        FeedUtils.log.info("FeedParser.parseAsRSS2: <item> missing mandatory " +
> +                           "element, both <title> and <description>; skipping");

you're testing either or, not both
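
For reference, the RSS 2.0 spec requires at least one of `<title>` or `<description>` on an item, and the quoted condition `!(item.title || item.description)` is true only when neither is present. The sketch below contrasts that with the "missing either one" form; `missingBoth` and `missingEither` are hypothetical names used only here, and the remark above may simply be about the wording of the log message.

```javascript
// Hypothetical helpers contrasting the two possible checks.

// True only when neither field is present (the condition in the hunk).
function missingBoth(item) {
  return !(item.title || item.description);
}

// True when at least one of the two fields is absent.
function missingEither(item) {
  return !item.title || !item.description;
}

const titleOnly = { title: "Only a title" };
const empty = { title: "", description: "" };
```

An item like `titleOnly` passes the first check and would be kept, while the second check would reject it even though the spec allows it.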

@@ +313,5 @@
> +                     (this.stripTags(item.description).substr(0, 150)) : null);
> +      if (!item.url || !item.title)
> +      {
> +        FeedUtils.log.info("FeedParser.parseAsRSS1: <item> missing mandatory " +
> +                           "element <item rdf:about> and <link>, or <title> and" +

space after and
Attachment #823399 - Flags: review?(mkmelin+mozilla) → review+
Attached patch badSpecRSS.patch
address comments.
Attachment #823399 - Attachment is obsolete: true
Attachment #823586 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/comm-central/rev/37035f4004c2
Status: NEW → RESOLVED
Closed: 7 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 28.0
Attached patch parser.patch
followup, need to add a null check.
Attachment #827538 - Flags: review?(mkmelin+mozilla)
Comment on attachment 827538 [details] [diff] [review]
parser.patch

Review of attachment 827538 [details] [diff] [review]:
-----------------------------------------------------------------

Sure, r=mkmelin
Attachment #827538 - Flags: review?(mkmelin+mozilla) → review+
checkin-needed for Attachment #827538 [details] [diff] followup.
Keywords: checkin-needed