Closed Bug 782285 Opened 12 years ago Closed 12 years ago

Reader Mode is offered for inappropriate websites

Categories

(Firefox for Android Graveyard :: Reader View, defect)

ARM
Android
defect
Not set
normal

Tracking

(firefox16 verified, firefox17 fixed)

VERIFIED FIXED
Firefox 17
Tracking Status
firefox16 --- verified
firefox17 --- fixed

People

(Reporter: xti, Unassigned)

References

Details

Attachments

(2 files, 2 obsolete files)

Attached image screenshot
Firefox 17.0a1 (2012-08-13)
Device: Galaxy Nexus
OS: Android 4.1.1

Steps to reproduce:
1. Open Fennec
2. Go to http://www.youtube.com/watch?v=hu2rRqFazuA (YouTube desktop site)
3. Tap on the video to play it
4. Long tap on the video
5. Select "Pop up" from the context menu
6. Check if Reader Mode icon is displayed for the new opened tab

Expected result:
The Reader Mode is not offered for inappropriate pages.

Actual result:
The Reader Mode icon is displayed in URL Bar for the tab opened at step 5 as you can see in the attached screenshot.
Blocks: 779796
Blocks: reader
No longer blocks: 779796
Attachment #653757 - Flags: review?(mark.finkle)
Attachment #653757 - Flags: review?
Attachment #653757 - Flags: feedback?(bnicholson)
This patch greatly improves the readability check results for pages that are not articles. Pages like nytimes.com/technology or http://edition.cnn.com/WORLD should not have reader enabled. Here's a test build:

http://dl.dropbox.com/u/1187037/better-reader-check.apk
Comment on attachment 653757 [details] [diff] [review]
Don't offer reader in pages with too much reading competition

+        for (let t = 0; t < this.N_TOP_CANDIDATES; t++) {
+          let aTopCandidate = topCandidates[t];
+
+          if (!aTopCandidate || candidateScore > aTopCandidate.readability.contentScore) {
+            if (t + 1 < this.N_TOP_CANDIDATES && aTopCandidate)
+              topCandidates[t + 1] = aTopCandidate;
+
+            topCandidates[t] = candidate;
+            break;
+          }

Wouldn't it be better to shift the list of top candidates rather than just replacing the next candidate? Otherwise, if you have a list of scores [20,15,3,2,1], and you try to add a candidate with score 25, you'd end up with [25,20,3,2,1], whereas I think you'd want [25,20,15,3,2]. Or if you read a page of candidates 1,2,3,4,5, you'd end up with [5,4], but I think you'd want [5,4,3,2,1].


It looks like this breaks some pages that should be articles, e.g.:
* http://en.m.wikipedia.org/wiki/Mozilla
* http://mobile.slate.com/posts/2012/06/22/new_law_threatens_mississippi_s_last_abortion_clinic.html

It also doesn't filter out Google search queries, which I would have expected to fail, e.g.:
http://www.google.com/search?q=query&ie=utf-8&oe=utf-8&channel=fs

For readability, I think it's better to err on the side of having more false positives as opposed to more false negatives. Showing the icon for unreadable pages is annoying, but not showing the icon for readable pages reduces defeats the purpose of reader mode. Having a competition algorithm is a good idea, but it's probably better to tweak it more first to make sure we have as few false negatives as possible.
Attachment #653757 - Flags: feedback?(bnicholson) → feedback-
(In reply to Brian Nicholson (:bnicholson) from comment #3)
> Comment on attachment 653757 [details] [diff] [review]
> Don't offer reader in pages with too much reading competition
> 
> +        for (let t = 0; t < this.N_TOP_CANDIDATES; t++) {
> +          let aTopCandidate = topCandidates[t];
> +
> +          if (!aTopCandidate || candidateScore >
> aTopCandidate.readability.contentScore) {
> +            if (t + 1 < this.N_TOP_CANDIDATES && aTopCandidate)
> +              topCandidates[t + 1] = aTopCandidate;
> +
> +            topCandidates[t] = candidate;
> +            break;
> +          }
> 
> Wouldn't it be better to shift the list of top candidates rather than just
> replacing the next candidate? Otherwise, if you have a list of scores
> [20,15,3,2,1], and you try to add a candidate with score 25, you'd end up
> with [25,20,3,2,1], whereas I think you'd want [25,20,15,3,2]. Or if you
> read a page of candidates 1,2,3,4,5, you'd end up with [5,4], but I think
> you'd want [5,4,3,2,1].
> 
> 
> It looks like this breaks some pages that should be articles, e.g.:
> * http://en.m.wikipedia.org/wiki/Mozilla

Known issue but we're not doing the right thing with Wikipedia pages anyway (see bug 782423). Wikipedia is very unique kind of page. We'll have to sort out the best way to handle it in that separate bug.

I personally prefer to not offer reader mode at all instead of offering and showing something totally unexpected. So, in a way, this is an improvement :-)

> http://mobile.slate.com/posts/2012/06/22/
> new_law_threatens_mississippi_s_last_abortion_clinic.html

This will be fixed once we properly convert DIVs to Ps. This is probably unrelated to this change. I'll double-check.

> It also doesn't filter out Google search queries, which I would have
> expected to fail, e.g.:
> http://www.google.com/search?q=query&ie=utf-8&oe=utf-8&channel=fs

Interesting. I tested it on a few Google search queries and it worked fine. Might be a mobile site thing. Will have a look.

> For readability, I think it's better to err on the side of having more false
> positives as opposed to more false negatives. Showing the icon for
> unreadable pages is annoying, but not showing the icon for readable pages
> reduces defeats the purpose of reader mode. Having a competition algorithm
> is a good idea, but it's probably better to tweak it more first to make sure
> we have as few false negatives as possible.

Agree that we should err on the false positives and I tweaked the maximum contrast ratio to do exactly that. There's no 100% accurate way of doing this. From my tests on a long list of websites (nytimes, cnn, arstechnica, latimes, tons of blogs, and much more) the results have always err'd on the false positive side.

Anyway, I posted the test build exactly for that: to get an idea of where it fails when it really shouldn't. I wouldn't block on things like google.com case though (i.e. this is not a regression). We should be incrementally tweaking the readability code to improve its accuracy.
Attachment #653757 - Attachment is obsolete: true
Attachment #653757 - Flags: review?(mark.finkle)
Attachment #653850 - Flags: review?(mark.finkle)
Attachment #653850 - Flags: review?
Attachment #653850 - Flags: feedback?(bnicholson)
I'm still worried about http://mobile.slate.com/posts/2012/06/22/new_law_threatens_mississippi_s_last_abortion_clinic.html; it looks like it parses fine before this patch, but doesn't work with it applied. Can you confirm the DIV->P patch fixes it?
(In reply to Brian Nicholson (:bnicholson) from comment #6)
> I'm still worried about
> http://mobile.slate.com/posts/2012/06/22/
> new_law_threatens_mississippi_s_last_abortion_clinic.html; it looks like it
> parses fine before this patch, but doesn't work with it applied. Can you
> confirm the DIV->P patch fixes it?

Yes, just confirmed that it fixes it. The reason is simple: without the DIV->P patch, the DIVs-with-P elements in mobile.slate.com were being considered candidates on their own right.
Attachment #654199 - Flags: review?(bnicholson)
(In reply to Lucas Rocha (:lucasr) from comment #8)
> Created attachment 654199 [details] [diff] [review]
> Break line one log messages in Readability

Is this just for the Readability XULRunner test? I didn't think this happened with Readability.js on Android.
Attachment #653850 - Flags: feedback?(bnicholson) → feedback+
Comment on attachment 653850 [details] [diff] [review]
Don't offer reader in pages with too much reading competition  * *

nit: aTopCandidate looks a bit odd since we use "a" + "Something" as a function param. Maybe just use "topCandidate" ?
Attachment #653850 - Flags: review?(mark.finkle) → review+
(In reply to Mark Finkle (:mfinkle) from comment #10)
> Comment on attachment 653850 [details] [diff] [review]
> Don't offer reader in pages with too much reading competition  * *
> 
> nit: aTopCandidate looks a bit odd since we use "a" + "Something" as a
> function param. Maybe just use "topCandidate" ?

I didn't use "topCandidate" there to avoid confusion with the important variable called topCandidate below.
Comment on attachment 653850 [details] [diff] [review]
Don't offer reader in pages with too much reading competition  * *

[Approval Request Comment]
User impact if declined: We're offering reader mode on pages which are more like lists of articles instead of actual readable content. We want to avoid these false positive cases as much as possible in reader.
Testing completed (on m-c, etc.): Tested locally with a long list of major and minor websites and blogs. The change didn't cause any false negative in my tests. There are still some remaining false positives which I intent to cover in a follow-up bugs though.
Risk to taking this patch (and alternatives if risky): Medium. There's no 100% accurate way of predicting how actually readable a page should be. This patch sets a higher bar for pages to be considered reader mode material. There's a risk of adding false negative cases but the best way to find out about those is to have this patch in the wild.
String or UUID changes made by this patch: None.
Attachment #653850 - Flags: approval-mozilla-aurora?
Comment on attachment 654199 [details] [diff] [review]
Break line one log messages in Readability

No big deal, dropping this one.
Attachment #654199 - Attachment is obsolete: true
Attachment #654199 - Flags: review?(bnicholson)
https://hg.mozilla.org/mozilla-central/rev/55c4a3f3a6a9
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 17
(In reply to Lucas Rocha (:lucasr) from comment #13)
> Comment on attachment 653850 [details] [diff] [review]
> Don't offer reader in pages with too much reading competition  * *
> 
> [Approval Request Comment]
> User impact if declined: We're offering reader mode on pages which are more
> like lists of articles instead of actual readable content. We want to avoid
> these false positive cases as much as possible in reader.

^ Given the user impact here

> Risk to taking this patch (and alternatives if risky): Medium. There's no
> 100% accurate way of predicting how actually readable a page should be. This
> patch sets a higher bar for pages to be considered reader mode material.
> There's a risk of adding false negative cases but the best way to find out
> about those is to have this patch in the wild.

and the fallout being more false negatives (which wouldn't feel like a regression as it's a new feature), let's get this onto Aurora as soon as possible.
Attachment #653850 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Status: RESOLVED → VERIFIED
What should I do if reader mode icon never shows up even on news sites (e.g. CNN)?
(In reply to henryfhchan from comment #18)
> What should I do if reader mode icon never shows up even on news sites (e.g.
> CNN)?

Depending on your Android device you might be yielding the behaviour added in bug 828124 where we've set a memory threshold for using reader-mode
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: