Closed Bug 309085 Opened 19 years ago Closed 18 years ago

False positives from FeedHandler.harvestFeeds hide distinct feeds

Categories

(Firefox Graveyard :: RSS Discovery and Preview, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Firefox 3 alpha1

People

(Reporter: philor, Assigned: sayrer)

References

()

Details

Attachments

(1 file)

FeedHandler.harvestFeeds() tries too hard to hide "the same feed content in
different feed formats" listings, and as a result hides feeds with different
content, when the distribution of number of feeds in each type happens to be equal.

http://plasmastrum.org has autodiscovery links for two feeds:

<link rel="alternate" type="application/atom+xml" href="/syn/" title="Full text" />
<link rel="alternate" type="application/rss+xml"
href="http://del.icio.us/rss/ap/!log" title="Link log" />

Anyone trying to subscribe to the "Link log" feed will need to copy the URL out
of the source, open the Bookmark Manager, find the Add Live Bookmark item in the
File menu, and paste it in.

Suggested alternate algorithm:

1. strip (case-insensitively) "rss" and "atom" and all numbers from link titles
2. if all the titles in each type are also in the other types, return just the
first type (or, a preferred type if we actually have a reason to prefer one).

e.g., for

<link rel="alternate" type="application/rss+xml" title="Entries in RSS 2.0">
<link rel="alternate" type="application/atom+xml" title="Entries in Atom 1.0">
<link rel="alternate" type="application/rss+xml" title="Comments in RSS 2.0">
<link rel="alternate" type="application/atom+xml" title="Comments in Atom 1.0">

once the titles are stripped down to where
feedHash["application/rss+xml"][0].title is "Entries in ", every title in
feedHash["application/rss+xml"] is also in feedHash["application/atom+xml"], and
we can (more) safely assume they only differ in format.
Flags: blocking1.8b5?
Is this on trunk or on branch? I can't load the example URL, and on my weblog
(http://www.beltzner.ca/mike) the harvester reports ALL of the RSS feeds,
seemingly not stripping out any.
Trunk and branch. harvestFeeds bails on hiding feeds if there's a number
mismatch: you have two application/rss+xml and one application/atom+xml, so you
don't get hidden. I can upload a trivial testcase, if you want.
D'oh, no wonder you can't load it, I typoed it in comment 0 (the URL field's
correct).
Flags: blocking1.8b5? → blocking1.8b5+
Yeah, that sounds good to me. What do you think of also checking to ensure that
the path is the same between them? I've run across a few sites where no title
attribute is given (it's not required, I don't think) but the pathnames were
different. So ..:

<link rel="alternate" type="application/rss+xml" title="RSS Feed"
href="/news/rss.xml">
<link rel="alternate" type="application/rss+xml" title="RSS Feed"
href="/sports/rss.xml">

So:

1. Remove "RSS", "ATOM", "{0-9}" and "." from the title attribute
2. Remove everything up to the last "/" in the href attribute
3. If all the href attribute values are the same and all the title attributes
are the same, then optimize the feed list, selecting ATOM* if it exists or the
first available feed otherwise.
4. Else, return all feeds.
Flags: blocking1.8b5+ → blocking1.8b5?
oops - a bugzilla overwrite thingie changed the blocking request back to ? from
+ ... my bad! (mtschrep had granted blocking previous to my screwup.)
Flags: blocking1.8b5? → blocking1.8b5+
Unless we see a safe patch pretty quickly, this isn't gonna make it. If you do
get a patch, please request reviews and approval to land on the branch. Thanks.
Flags: blocking1.8b5+ → blocking1.8b5-
Target Milestone: --- → Firefox1.6-
(In reply to comment #4)
> Yeah, that sounds good to me. What do you think of also checking to ensure that
> the path is the same between them? I've run across a few sites where no title
> attribute is given (it's not required, I don't think) but the pathnames were
> different. So ..:
> 
> <link rel="alternate" type="application/rss+xml" title="RSS Feed"
> href="/news/rss.xml">
> <link rel="alternate" type="application/rss+xml" title="RSS Feed"
> href="/sports/rss.xml">
> 
> So:
> 
> 1. Remove "RSS", "ATOM", "{0-9}" and "." from the title attribute
> 2. Remove everything up to the last "/" in the href attribute
> 3. If all the href attribute values are the same and all the title attributes
> are the same, then optimize the feed list, selecting ATOM* if it exists or the
> first available feed otherwise.
> 4. Else, return all feeds.

But your suggest number 2 negates your suggestion to check the path. If you remove all the href up to the last "/" then in fact your only going to check the filename. What if there:

http://www.example.com/club/sports.xml
http://www.example.com/professional/sports.xml

Your step 2 would make these the same when they clearly aren't.
A test case I have used:

<html>
<head>
	<title>Page Title</title>
	<link rel="alternate" type="application/atom+xml" title="no.feed.here" href="http://no.feed.here"  />
	<link rel="alternate" type="application/rss+xml" title="this.is.a.test" href="http://this.is.a.test"  />
</head>
<body>Web Page</body></html>


shows that currently Firefox (and Minefield) don't seem to test the titles & hrefs correctly at all. The listed feeds have completely different text and hrefs yet only the atom one is shown. What also doesn't make sense is if you add a third feed.

e.g.

<html>
<head>
	<title>Page Title</title>
	<link rel="alternate" type="application/atom+xml" title="no.feed.here" href="http://no.feed.here"  />
	<link rel="alternate" type="application/rss+xml" title="this.is.a.test" href="http://this.is.a.test"  />
	<link rel="alternate" type="application/rss+xml" title="now.there.are.three" href="http://now.there.are.three"  />
</head>
<body>Web Page</body></html>

Then the feed that was being hidden is now shown again, along with the others. This suggests it is more dependent on finding even numbers of feeds where one type is rss and one type is atom than anything else. 

This can be further confirmed by adding a fourth:

<html>
<head>
	<title>Page Title</title>
	<link rel="alternate" type="application/atom+xml" title="no.feed.here" href="http://no.feed.here"  />
	<link rel="alternate" type="application/rss+xml" title="this.is.a.test" href="http://this.is.a.test"  />	
	<link rel="alternate" type="application/rss+xml" title="now.there.are.three" href="http://now.there.are.three"  />
	<link rel="alternate" type="application/atom+xml" title="oh.wait.four" href="http://oh.wait.four"  />
</head>
<body>Web Page</body></html>

which then causes both rss feeds to disappear. Then adding a fifth:

<html>
<head>
	<title>Page Title</title>
	<link rel="alternate" type="application/atom+xml" title="no.feed.here" href="http://no.feed.here"  />
	<link rel="alternate" type="application/rss+xml" title="this.is.a.test" href="http://this.is.a.test"  />	
	<link rel="alternate" type="application/rss+xml" title="now.there.are.three" href="http://now.there.are.three"  />
	<link rel="alternate" type="application/atom+xml" title="oh.wait.four" href="http://oh.wait.four"  />
	<link rel="alternate" type="application/rss+xml" title="man.this.is.a.lot.of.feeds" href="http://man.this.is.a.lot.of.feeds"  />
</head>
<body>Web Page</body></html>

which shows all of them again.
QA Contact: nobody → rss.preview
Target Milestone: Firefox 2 → ---
This "AI" is causing silent dataloss all over the place. IE7 will show two discovered links when the page contains two link elements, even if they are exactly the same in every respect (including URI). We should delete the code that does this.
Assignee: nobody → sayrer
Case 1: Two link elements, different formats, different titles, different URIs
----------------------------------------------
http://www.franklinmint.fm/2006/12/09/1.html

IE7: two feeds
Fx2: one feed


Case 2: Three link elements, different titles, different URIs, 1 Atom, 2 RSS
----------------------------------------------
http://www.franklinmint.fm/2006/12/09/2.html

IE7: three feeds
Fx2: three feeds

Case 3: Four feeds, 2 Atom, 2 RSS, different URLs, different titles
----------------------------------------------
http://www.franklinmint.fm/2006/12/09/3.html

IE7: four feeds
Fx2: two feeds

Case 4: Five feeds, 3 RSS, 2 Atom, different titles, different URLs
----------------------------------------------
http://www.franklinmint.fm/2006/12/09/4.html

IE7: five feeds
Fx2: five feeds

Case 5: 2 feeds, different titles, same URL, different formats
----------------------------------------------
http://www.franklinmint.fm/2006/12/09/5.html

IE7: 2 feeds
Fx2: 1 feed


Case 6: http://plasmasturm.org case
----------------------------------------------
http://www.franklinmint.fm/2006/12/09/6.html

IE7: 2 feeds
Fx2: 1 feed

Case 7: two feeds, identical in all respects
----------------------------------------------
http://www.franklinmint.fm/2006/12/09/7.html

IE7: 2 feeds
Fx2: 1 feed
this will show multiple feeds in cases where we didn't used to, but it gives IE compat and no prospect of endless AI bugs.
Attachment #248124 - Flags: ui-review?(beltzner)
Attachment #248124 - Flags: review?(gavin.sharp)
Attachment #248124 - Flags: review?(gavin.sharp) → review+
Comment on attachment 248124 [details] [diff] [review]
remove cleverness

So, this is ui-r=beltzner assuming that there will be a follow-up bug that tries to restore some type of duplicate stripping. While I agree that dataloss is bad, and would rather show duplicates than make it look like feeds are missing, we also shouldn't be showing users a bunch of the same thing and asking them to choose between meaningless data formats.
Attachment #248124 - Flags: ui-review?(beltzner) → ui-review+
Target Milestone: --- → Firefox 3 alpha1
Checking in browser.js;
/cvsroot/mozilla/browser/base/content/browser.js,v  <--  browser.js
new revision: 1.746; previous revision: 1.745
done
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
file bug 363968.
Product: Firefox → Firefox Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: