Closed Bug 491720 Opened 15 years ago Closed 15 years ago

RSS feeds with non-2xx status should trigger onDownloadError and not be processed

Categories

(MailNews Core :: Feed Reader, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0b3

People

(Reporter: rimas, Assigned: rimas)

References

()

Details

Attachments

(1 file, 5 obsolete files)

This is a copy of Bug 451737 which has been repurposed, and is a specific case spun off bug 258465

--------8<--------

bug 258465 Comment #100 Rimas Kudelis  2006-09-25 05:31:07 PDT

Anyone want to try this feed for testing: http://www.vu.lt/RQ/testfeed.php ? It
consists of three items, and should generate new duplicates every five minutes
(so make sure you update often enough).

The catch here is: Every five minutes the feed changes its HTTP response code
from 200 (OK) to Error 503 (server failure). When simulating a failure, it
doesn't present the user with any entries at all, and this seems like enough to
trigger the duplication when the entries are back after five minutes.

You can see the colorized sourcecode of the file here:
http://www.vu.lt/RQ/testfeed.phps .

bug 258465 Comment #101 Dan Wing  2006-09-25 09:38:48 PDT

I am able to duplicate the problem using the test feed at
http://www.vu.lt/RQ/testfeed.php using Thunderbird 1.5.0.7.  I configured a new
"RSS News & Blogs" account that checks for new articles every 2 minutes and
contains only this test feed.

bug 258465 Comment #102 Magnus Melin  2006-09-25 11:29:22 PDT

I also get duplicate items for that feed (using Mozilla/5.0 (X11; U; Linux
i686; en-US; rv:1.8.1) Gecko/20060925 Thunderbird/2.0b1pre ID:2006092503)

bug 451737 Comment #3 From  Unknown W. Brackets   2008-08-24 14:58:34 PDT 

Well, this is because of invalidateItems().  You see, how it works is this,
step-by-step, as I understand:

1. Feed added to Thunderbird.

2. On first check:
   a. every item is parsed.
   b. one by one added to the datastore (with valid and stored arcs.)

3. On next and subsequent checks:
   a. "valid" is unasserted for each item in datastore.
   b. items are added back into the datastore (by setting valid.)
   c. duplicates are checked for by existance in datastore, with "stored" set.
   d. items that still don't have "valid" are removed.

So, what's happening (easier to reproduce with a feed that alternates between
503/empty and 200/not empty) is that it gets the empty list, clears all the
items out:

http://mxr.mozilla.org/thunderbird/source/mail/extensions/newsblog/content/feed-parser.js#117

And then flushes it (3d above.)  Next time it checks, its duplicate cache is
entirely empty, and so it assumes all items are new.

To fix this, I think the following might help:

1. AFAICT Feed seems to expect that a 503 will call onerror instead of onload,
but I'm sure it doesn't check the status code.  A good place might be in
onDownloaded or here:

http://mxr.mozilla.org/mozilla/source/mail/extensions/newsblog/content/Feed.js#357

2. The clearing I mentioned (invalidateItems call) happens even if the feed is
entirely empty.  I assume this is done to not store things indefinitely, so not
clearing it for an empty feed would help.

3. Feeds can very - they might come from load balanced servers, future items
may be removed (bringing back older ones), etc.  It would probably solve *a
lot* of duplication problems to simply do expiration by date (e.g. wait a day
or a week.)  I'm not sure what's best there, though... doing it immediately
seems clearly brittle to me.

I thought I'd make this comment in case it helps.  I'm not familiar with the
code really at all, but if anyone can confirm my understanding a bit (top
level) I'm happy to submit a patch (just have to figure out mercurial...)

I won't have time to test and write solution #3 above in the next week (I'm a
busy guy like most) but #1 and #2 are very easy.

-[Unknown]

bug 451737 Comment #10 From  Myk Melez [:myk]   2009-05-04 11:56:32 PDT
(In reply to comment #3)
> 1. AFAICT Feed seems to expect that a 503 will call onerror instead of onload,
> but I'm sure it doesn't check the status code.  A good place might be in
> onDownloaded or here:

That'd work for 503 errors, and we should probably do it, but it won't solve
the problem for other cases of intermittent feed items that don't register an
HTTP failure, which I've noticed in a variety of feeds (particularly Google
News ones).


> 2. The clearing I mentioned (invalidateItems call) happens even if the feed is
> entirely empty.  I assume this is done to not store things indefinitely, so not
> clearing it for an empty feed would help.

Hmm, that seems problematic given that feeds can be legitimately empty.

(In reply to comment #5)
> In other words, it never looks at the actual messages to determine uniqueness,

Indeed, and another solution to this problem might be to also check actual
messages, although that has its own problems, since users can delete messages
after reading them, and we don't want those messages to reappear.  Forumzilla
used to do an additional check like this, but I'm not sure it was worth it.

bug 451737 Comment #0 From  Tuukka Tolvanen (sp3000)   2008-08-22 10:46:06 PDT

Also reproducible in 20080821225349 Thunderbird/3.0b1pre. The two hints that
the feed fetch is bad are the error response and the feed content having 0
items; the latter may be a more generally useful hint that whatever index
flushing happens shouldn't happen (if that's roughly how it goes, haven't
looked at this part of the code).
Flags: wanted-thunderbird3?
adding everyone from the original bug to the CC list here.
I wonder – would fixing this be as easy as adding a check of this.request.status somewhere around Line 357 of Feed.js ( http://hg.mozilla.org/comm-central/annotate/a90f0b8c7a0c/mailnews/extensions/newsblog/content/Feed.js#l357 )? If so, what status codes are acceptable?
Thanks for filing this bug!

(In reply to comment #2)
> I wonder – would fixing this be as easy as adding a check of
> this.request.status somewhere around Line 357 of Feed.js (
> http://hg.mozilla.org/comm-central/annotate/a90f0b8c7a0c/mailnews/extensions/newsblog/content/Feed.js#l357
> )?

I would actually add the check in onDownloaded <http://hg.mozilla.org/comm-central/annotate/a90f0b8c7a0c/mailnews/extensions/newsblog/content/Feed.js#l177> and have it simply pass the event to onDownloadError if the status code indicates failure.


> If so, what status codes are acceptable?

Anything in the 200 range (i.e. 2xx, where xx are any two digits).
Attached patch Possible patch? (obsolete) — Splinter Review
I'm not a Mozilla source hacker, but maybe this would be sufficient?
Attachment #376066 - Flags: review?(myk)
whoops, used wrong function
Attachment #376066 - Attachment is obsolete: true
Attachment #376068 - Flags: review?(myk)
Attachment #376066 - Flags: review?(myk)
Attachment #376068 - Flags: review?(myk) → review-
Comment on attachment 376068 [details] [diff] [review]
use onDownloadError instead of onParseError

>--- a/Feed.js   2009-05-06 22:59:42.000000000 +0300
>+++ b/Feed.js   2009-05-06 23:21:06.000000000 +0300
>@@ -177,10 +177,15 @@
>   onDownloaded: function(aEvent)
>   {
>     var request = aEvent.target;
>     var url = request.channel.originalURI.spec;
>     debug(url + " downloaded");
>+    if(parseInt(request.status/100) != 2)
>+    {
>+      this.onDownloadError(this);
>+      return;
>+    }

Close!  onDownloadError takes an event parameter, so you need to pass aEvent to it, i.e. |this.onDownloadError(aEvent)|.
Attached patch pass aEvent instead of this (obsolete) — Splinter Review
there we go
Attachment #376068 - Attachment is obsolete: true
Attachment #376078 - Flags: review?(myk)
Comment on attachment 376078 [details] [diff] [review]
pass aEvent instead of this


Oof, upon testing this patch I realized that onDownloaded is actually being called as an independent callback function, even though it's defined as a property of the Feed prototype, which makes it look like a method of the Feed object (which, as the original author of this code, is totally my bad).

As a result, |this| doesn't mean what it looks like it would mean within onDownloaded, and |this.onDownloadError(aEvent)| fails.  Instead, onDownloaded has to call onDownloadError as |Feed.prototype.onDownloadError(aEvent)|.


Also, anticipating super-review, a couple nits:

>+    if(parseInt(request.status/100) != 2)

Put a space between if and the opening parenthesis.


>+    {
>+      this.onDownloadError(aEvent);
>+      return;
>+    }

This can simply return the result of the function call, making it a one liner that doesn't need braces, i.e.:

      return Feed.prototype.onDownloadError(aEvent);
Attachment #376078 - Flags: review?(myk) → review-
Shouldn't at least some 3xx codes be ok too ? Esp. "304 Not Modified"
Assignee: nobody → rq
(In reply to comment #9)
> Shouldn't at least some 3xx codes be ok too ? Esp. "304 Not Modified"

Per this test page <http://www.mnot.net/javascript/xmlhttprequest/>, Gecko's implementation of XMLHttpRequest handles 301, 302, 303, and 307 automatically.  Feed::onDownloadError itself handles 304, so if a 304 happens to trigger onDownloaded instead of onDownloadError, then this patch will actually fix our handling of it, since it'll make onDownloaded redirect processing of that response to onDownloadError.
Attached patch Comment #8 addressed (obsolete) — Splinter Review
Hope this one is better. :)
Attachment #376078 - Attachment is obsolete: true
Attachment #376190 - Flags: superreview?
Attachment #376190 - Flags: review?(myk)
Hm, I think this allows us to remove lines 231-233 and 236 since feeds with 304 won't reach the parser now. Am I correct?
Comment on attachment 376190 [details] [diff] [review]
Comment #8 addressed

This looks and works great! r=myk

(In reply to comment #12)
> Hm, I think this allows us to remove lines 231-233 and 236 since feeds with 304
> won't reach the parser now. Am I correct?

Yup, good catch!  Can you update your patch to include that change?
Attachment #376190 - Flags: review?(myk) → review+
OK, I refactored that part a little too. Myk, re-requesting your review just in case, and also asking for sr from bienvenu.
Attachment #376190 - Attachment is obsolete: true
Attachment #376300 - Flags: superreview?
Attachment #376300 - Flags: review?
Attachment #376190 - Flags: superreview?
Attachment #376300 - Flags: superreview?(bienvenu)
Attachment #376300 - Flags: superreview?
Attachment #376300 - Flags: review?(myk)
Attachment #376300 - Flags: review?
Attachment #376300 - Flags: review?(myk) → review+
Comment on attachment 376300 [details] [diff] [review]
Unnecessary lines removed from onParseError

This looks and works great. r=myk
I don't quite like that parseInt line, so here's an alternate version with two comparisons instead. Choose which one's better.
Attachment #376309 - Flags: superreview?(bienvenu)
Attachment #376309 - Flags: review?(myk)
Comment on attachment 376309 [details] [diff] [review]
patch with two comparisons instead of one parseInt

I like this version better, too.  It's easier to understand at a glance and is similar to the way we do HTTP status comparisons elsewhere in the code. r=myk
Attachment #376309 - Flags: review?(myk) → review+
Attachment #376300 - Attachment is obsolete: true
Attachment #376300 - Flags: superreview?(bienvenu)
Comment on attachment 376309 [details] [diff] [review]
patch with two comparisons instead of one parseInt


>+    if ((request.status < 200) || (request.status >= 300))

Note that you don't need parentheses around the individual components of this conditional.
Yeah, I wasn't sure. Fixed locally, will commit without them:

+    if (request.status < 200 || request.status >= 300)
+      return Feed.prototype.onDownloadError(aEvent);
Comment on attachment 376309 [details] [diff] [review]
patch with two comparisons instead of one parseInt

If I'm reading this correctly, you don't need the temp var error anymore:

+      var error = kNewsBlogInvalidFeed;
+      aFeed.mInvalidFeed = true;
 
       if (aFeed.downloadCallback)
         aFeed.downloadCallback.downloaded(aFeed, error);
 
       FeedCache.removeFeed(aFeed.url);

so you could simply pass in kNewsBlogInvalidFeed to downloaded.

sr=me with that fixed.
Attachment #376309 - Flags: superreview?(bienvenu) → superreview+
Thanks for the tip, David!

Committed as rev. cb36617b6a11 and merged as rev. 718d8ca1cdbe. Hope everything's fine.
Summary: feed that fails intermittently as empty feed with 503 causes duplicate items → RSS feeds with non-2xx status should trigger onDownloadError and not be processed
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
don't forget to set the taret milestone when fixing ...
Target Milestone: --- → Thunderbird 3.0b3
OK, thanks! :)
(In reply to comment #21)
> Committed as rev. cb36617b6a11 and merged as rev. 718d8ca1cdbe. 

It's usually preferred to avoid merge turds, so pull before committing.
Flags: wanted-thunderbird3? → wanted-thunderbird3+
Depends on: 490028
Blocks: 490028
No longer depends on: 490028
Reopening: see bug 490028 discussion.
Please, could someone help Rimas?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Version: unspecified → Trunk
No reason to reopen.
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
No longer blocks: 490028
Depends on: 490028
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: