Last Comment Bug 491720 - RSS feeds with non-2xx status should trigger onDownloadError and not be processed
: RSS feeds with non-2xx status should trigger onDownloadError and not be proce...
Status: RESOLVED FIXED
:
Product: MailNews Core
Classification: Components
Component: Feed Reader (show other bugs)
: Trunk
: All All
: -- normal (vote)
: Thunderbird 3.0b3
Assigned To: Rimas Kudelis
:
:
Mentors:
http://www.vu.lt/RQ/testfeed.php
Depends on: 490028
Blocks: 258465
  Show dependency treegraph
 
Reported: 2009-05-06 12:26 PDT by Rimas Kudelis
Modified: 2009-05-28 11:34 PDT (History)
12 users (show)
mkmelin+mozilla: wanted‑thunderbird3+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Possible patch? (634 bytes, patch)
2009-05-06 13:22 PDT, Rimas Kudelis
no flags Details | Diff | Splinter Review
use onDownloadError instead of onParseError (635 bytes, patch)
2009-05-06 13:25 PDT, Rimas Kudelis
myk: review-
Details | Diff | Splinter Review
pass aEvent instead of this (637 bytes, patch)
2009-05-06 14:38 PDT, Rimas Kudelis
myk: review-
Details | Diff | Splinter Review
Comment #8 addressed (623 bytes, patch)
2009-05-06 23:57 PDT, Rimas Kudelis
myk: review+
Details | Diff | Splinter Review
Unnecessary lines removed from onParseError (1.51 KB, patch)
2009-05-07 13:24 PDT, Rimas Kudelis
myk: review+
Details | Diff | Splinter Review
patch with two comparisons instead of one parseInt (1.53 KB, patch)
2009-05-07 13:57 PDT, Rimas Kudelis
myk: review+
mozilla: superreview+
Details | Diff | Splinter Review

Description Rimas Kudelis 2009-05-06 12:26:29 PDT
This is a copy of Bug 451737 which has been repurposed, and is a specific case spun off bug 258465

--------8<--------

bug 258465 Comment #100 Rimas Kudelis  2006-09-25 05:31:07 PDT

Anyone want to try this feed for testing: http://www.vu.lt/RQ/testfeed.php ? It
consists of three items, and should generate new duplicates every five minutes
(so make sure you update often enough).

The catch here is: Every five minutes the feed changes its HTTP response code
from 200 (OK) to Error 503 (server failure). When simulating a failure, it
doesn't present the user with any entries at all, and this seems like enough to
trigger the duplication when the entries are back after five minutes.

You can see the colorized sourcecode of the file here:
http://www.vu.lt/RQ/testfeed.phps .

bug 258465 Comment #101 Dan Wing  2006-09-25 09:38:48 PDT

I am able to duplicate the problem using the test feed at
http://www.vu.lt/RQ/testfeed.php using Thunderbird 1.5.0.7.  I configured a new
"RSS News & Blogs" account that checks for new articles every 2 minutes and
contains only this test feed.

bug 258465 Comment #102 Magnus Melin  2006-09-25 11:29:22 PDT

I also get duplicate items for that feed (using Mozilla/5.0 (X11; U; Linux
i686; en-US; rv:1.8.1) Gecko/20060925 Thunderbird/2.0b1pre ID:2006092503)

bug 451737 Comment #3 From  Unknown W. Brackets   2008-08-24 14:58:34 PDT 

Well, this is because of invalidateItems().  You see, how it works is this,
step-by-step, as I understand:

1. Feed added to Thunderbird.

2. On first check:
   a. every item is parsed.
   b. one by one added to the datastore (with valid and stored arcs.)

3. On next and subsequent checks:
   a. "valid" is unasserted for each item in datastore.
   b. items are added back into the datastore (by setting valid.)
   c. duplicates are checked for by existance in datastore, with "stored" set.
   d. items that still don't have "valid" are removed.

So, what's happening (easier to reproduce with a feed that alternates between
503/empty and 200/not empty) is that it gets the empty list, clears all the
items out:

http://mxr.mozilla.org/thunderbird/source/mail/extensions/newsblog/content/feed-parser.js#117

And then flushes it (3d above.)  Next time it checks, its duplicate cache is
entirely empty, and so it assumes all items are new.

To fix this, I think the following might help:

1. AFAICT Feed seems to expect that a 503 will call onerror instead of onload,
but I'm sure it doesn't check the status code.  A good place might be in
onDownloaded or here:

http://mxr.mozilla.org/mozilla/source/mail/extensions/newsblog/content/Feed.js#357

2. The clearing I mentioned (invalidateItems call) happens even if the feed is
entirely empty.  I assume this is done to not store things indefinitely, so not
clearing it for an empty feed would help.

3. Feeds can very - they might come from load balanced servers, future items
may be removed (bringing back older ones), etc.  It would probably solve *a
lot* of duplication problems to simply do expiration by date (e.g. wait a day
or a week.)  I'm not sure what's best there, though... doing it immediately
seems clearly brittle to me.

I thought I'd make this comment in case it helps.  I'm not familiar with the
code really at all, but if anyone can confirm my understanding a bit (top
level) I'm happy to submit a patch (just have to figure out mercurial...)

I won't have time to test and write solution #3 above in the next week (I'm a
busy guy like most) but #1 and #2 are very easy.

-[Unknown]

bug 451737 Comment #10 From  Myk Melez [:myk]   2009-05-04 11:56:32 PDT
(In reply to comment #3)
> 1. AFAICT Feed seems to expect that a 503 will call onerror instead of onload,
> but I'm sure it doesn't check the status code.  A good place might be in
> onDownloaded or here:

That'd work for 503 errors, and we should probably do it, but it won't solve
the problem for other cases of intermittent feed items that don't register an
HTTP failure, which I've noticed in a variety of feeds (particularly Google
News ones).


> 2. The clearing I mentioned (invalidateItems call) happens even if the feed is
> entirely empty.  I assume this is done to not store things indefinitely, so not
> clearing it for an empty feed would help.

Hmm, that seems problematic given that feeds can be legitimately empty.

(In reply to comment #5)
> In other words, it never looks at the actual messages to determine uniqueness,

Indeed, and another solution to this problem might be to also check actual
messages, although that has its own problems, since users can delete messages
after reading them, and we don't want those messages to reappear.  Forumzilla
used to do an additional check like this, but I'm not sure it was worth it.

bug 451737 Comment #0 From  Tuukka Tolvanen (sp3000)   2008-08-22 10:46:06 PDT

Also reproducible in 20080821225349 Thunderbird/3.0b1pre. The two hints that
the feed fetch is bad are the error response and the feed content having 0
items; the latter may be a more generally useful hint that whatever index
flushing happens shouldn't happen (if that's roughly how it goes, haven't
looked at this part of the code).
Comment 1 Rimas Kudelis 2009-05-06 12:33:15 PDT
adding everyone from the original bug to the CC list here.
Comment 2 Rimas Kudelis 2009-05-06 12:42:19 PDT
I wonder – would fixing this be as easy as adding a check of this.request.status somewhere around Line 357 of Feed.js ( http://hg.mozilla.org/comm-central/annotate/a90f0b8c7a0c/mailnews/extensions/newsblog/content/Feed.js#l357 )? If so, what status codes are acceptable?
Comment 3 Myk Melez [:myk] [@mykmelez] 2009-05-06 12:56:05 PDT
Thanks for filing this bug!

(In reply to comment #2)
> I wonder – would fixing this be as easy as adding a check of
> this.request.status somewhere around Line 357 of Feed.js (
> http://hg.mozilla.org/comm-central/annotate/a90f0b8c7a0c/mailnews/extensions/newsblog/content/Feed.js#l357
> )?

I would actually add the check in onDownloaded <http://hg.mozilla.org/comm-central/annotate/a90f0b8c7a0c/mailnews/extensions/newsblog/content/Feed.js#l177> and have it simply pass the event to onDownloadError if the status code indicates failure.


> If so, what status codes are acceptable?

Anything in the 200 range (i.e. 2xx, where xx are any two digits).
Comment 4 Rimas Kudelis 2009-05-06 13:22:16 PDT
Created attachment 376066 [details] [diff] [review]
Possible patch?

I'm not a Mozilla source hacker, but maybe this would be sufficient?
Comment 5 Rimas Kudelis 2009-05-06 13:25:28 PDT
Created attachment 376068 [details] [diff] [review]
use onDownloadError instead of onParseError

whoops, used wrong function
Comment 6 Myk Melez [:myk] [@mykmelez] 2009-05-06 14:16:38 PDT
Comment on attachment 376068 [details] [diff] [review]
use onDownloadError instead of onParseError

>--- a/Feed.js   2009-05-06 22:59:42.000000000 +0300
>+++ b/Feed.js   2009-05-06 23:21:06.000000000 +0300
>@@ -177,10 +177,15 @@
>   onDownloaded: function(aEvent)
>   {
>     var request = aEvent.target;
>     var url = request.channel.originalURI.spec;
>     debug(url + " downloaded");
>+    if(parseInt(request.status/100) != 2)
>+    {
>+      this.onDownloadError(this);
>+      return;
>+    }

Close!  onDownloadError takes an event parameter, so you need to pass aEvent to it, i.e. |this.onDownloadError(aEvent)|.
Comment 7 Rimas Kudelis 2009-05-06 14:38:13 PDT
Created attachment 376078 [details] [diff] [review]
pass aEvent instead of this

there we go
Comment 8 Myk Melez [:myk] [@mykmelez] 2009-05-06 15:32:07 PDT
Comment on attachment 376078 [details] [diff] [review]
pass aEvent instead of this


Oof, upon testing this patch I realized that onDownloaded is actually being called as an independent callback function, even though it's defined as a property of the Feed prototype, which makes it look like a method of the Feed object (which, as the original author of this code, is totally my bad).

As a result, |this| doesn't mean what it looks like it would mean within onDownloaded, and |this.onDownloadError(aEvent)| fails.  Instead, onDownloaded has to call onDownloadError as |Feed.prototype.onDownloadError(aEvent)|.


Also, anticipating super-review, a couple nits:

>+    if(parseInt(request.status/100) != 2)

Put a space between if and the opening parenthesis.


>+    {
>+      this.onDownloadError(aEvent);
>+      return;
>+    }

This can simply return the result of the function call, making it a one liner that doesn't need braces, i.e.:

      return Feed.prototype.onDownloadError(aEvent);
Comment 9 Magnus Melin 2009-05-06 22:28:24 PDT
Shouldn't at least some 3xx codes be ok too ? Esp. "304 Not Modified"
Comment 10 Myk Melez [:myk] [@mykmelez] 2009-05-06 22:39:27 PDT
(In reply to comment #9)
> Shouldn't at least some 3xx codes be ok too ? Esp. "304 Not Modified"

Per this test page <http://www.mnot.net/javascript/xmlhttprequest/>, Gecko's implementation of XMLHttpRequest handles 301, 302, 303, and 307 automatically.  Feed::onDownloadError itself handles 304, so if a 304 happens to trigger onDownloaded instead of onDownloadError, then this patch will actually fix our handling of it, since it'll make onDownloaded redirect processing of that response to onDownloadError.
Comment 11 Rimas Kudelis 2009-05-06 23:57:54 PDT
Created attachment 376190 [details] [diff] [review]
Comment #8 addressed

Hope this one is better. :)
Comment 12 Rimas Kudelis 2009-05-07 00:02:30 PDT
Hm, I think this allows us to remove lines 231-233 and 236 since feeds with 304 won't reach the parser now. Am I correct?
Comment 13 Myk Melez [:myk] [@mykmelez] 2009-05-07 10:54:37 PDT
Comment on attachment 376190 [details] [diff] [review]
Comment #8 addressed

This looks and works great! r=myk

(In reply to comment #12)
> Hm, I think this allows us to remove lines 231-233 and 236 since feeds with 304
> won't reach the parser now. Am I correct?

Yup, good catch!  Can you update your patch to include that change?
Comment 14 Rimas Kudelis 2009-05-07 13:24:43 PDT
Created attachment 376300 [details] [diff] [review]
Unnecessary lines removed from onParseError

OK, I refactored that part a little too. Myk, re-requesting your review just in case, and also asking for sr from bienvenu.
Comment 15 Myk Melez [:myk] [@mykmelez] 2009-05-07 13:56:46 PDT
Comment on attachment 376300 [details] [diff] [review]
Unnecessary lines removed from onParseError

This looks and works great. r=myk
Comment 16 Rimas Kudelis 2009-05-07 13:57:55 PDT
Created attachment 376309 [details] [diff] [review]
patch with two comparisons instead of one parseInt

I don't quite like that parseInt line, so here's an alternate version with two comparisons instead. Choose which one's better.
Comment 17 Myk Melez [:myk] [@mykmelez] 2009-05-07 14:03:32 PDT
Comment on attachment 376309 [details] [diff] [review]
patch with two comparisons instead of one parseInt

I like this version better, too.  It's easier to understand at a glance and is similar to the way we do HTTP status comparisons elsewhere in the code. r=myk
Comment 18 Myk Melez [:myk] [@mykmelez] 2009-05-07 14:06:11 PDT
Comment on attachment 376309 [details] [diff] [review]
patch with two comparisons instead of one parseInt


>+    if ((request.status < 200) || (request.status >= 300))

Note that you don't need parentheses around the individual components of this conditional.
Comment 19 Rimas Kudelis 2009-05-07 14:08:58 PDT
Yeah, I wasn't sure. Fixed locally, will commit without them:

+    if (request.status < 200 || request.status >= 300)
+      return Feed.prototype.onDownloadError(aEvent);
Comment 20 David :Bienvenu 2009-05-08 06:56:50 PDT
Comment on attachment 376309 [details] [diff] [review]
patch with two comparisons instead of one parseInt

If I'm reading this correctly, you don't need the temp var error anymore:

+      var error = kNewsBlogInvalidFeed;
+      aFeed.mInvalidFeed = true;
 
       if (aFeed.downloadCallback)
         aFeed.downloadCallback.downloaded(aFeed, error);
 
       FeedCache.removeFeed(aFeed.url);

so you could simply pass in kNewsBlogInvalidFeed to downloaded.

sr=me with that fixed.
Comment 21 Rimas Kudelis 2009-05-08 07:09:32 PDT
Thanks for the tip, David!

Committed as rev. cb36617b6a11 and merged as rev. 718d8ca1cdbe. Hope everything's fine.
Comment 22 Ludovic Hirlimann [:Usul] 2009-05-08 07:11:49 PDT
don't forget to set the taret milestone when fixing ...
Comment 23 Rimas Kudelis 2009-05-08 07:16:20 PDT
OK, thanks! :)
Comment 24 Magnus Melin 2009-05-08 12:19:45 PDT
(In reply to comment #21)
> Committed as rev. cb36617b6a11 and merged as rev. 718d8ca1cdbe. 

It's usually preferred to avoid merge turds, so pull before committing.
Comment 25 Serge Gautherie (:sgautherie) 2009-05-28 09:29:50 PDT
Reopening: see bug 490028 discussion.
Please, could someone help Rimas?
Comment 26 Magnus Melin 2009-05-28 10:48:30 PDT
No reason to reopen.

Note You need to log in before you can comment on or make changes to this bug.