Last Comment Bug 782285 - Reader Mode is offered for inappropriate websites
: Reader Mode is offered for inappropriate websites
Status: VERIFIED FIXED
:
Product: Firefox for Android
Classification: Client Software
Component: Reader View (show other bugs)
: Trunk
: ARM Android
: -- normal (vote)
: Firefox 17
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
Depends on:
Blocks: reader
  Show dependency treegraph
 
Reported: 2012-08-13 08:16 PDT by Cristian Nicolae (:xti)
Modified: 2016-07-29 14:29 PDT (History)
12 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
verified
fixed


Attachments
screenshot (330.88 KB, image/png)
2012-08-13 08:16 PDT, Cristian Nicolae (:xti)
no flags Details
Don't offer reader in pages with too much reading competition (5.75 KB, patch)
2012-08-21 08:00 PDT, Lucas Rocha (:lucasr)
bnicholson: feedback-
Details | Diff | Splinter Review
Don't offer reader in pages with too much reading competition * * (5.75 KB, patch)
2012-08-21 11:03 PDT, Lucas Rocha (:lucasr)
mark.finkle: review+
bnicholson: feedback+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
Break line one log messages in Readability (984 bytes, patch)
2012-08-22 07:32 PDT, Lucas Rocha (:lucasr)
no flags Details | Diff | Splinter Review

Description Cristian Nicolae (:xti) 2012-08-13 08:16:12 PDT
Created attachment 651396 [details]
screenshot

Firefox 17.0a1 (2012-08-13)
Device: Galaxy Nexus
OS: Android 4.1.1

Steps to reproduce:
1. Open Fennec
2. Go to http://www.youtube.com/watch?v=hu2rRqFazuA (YouTube desktop site)
3. Tap on the video to play it
4. Long tap on the video
5. Select "Pop up" from the context menu
6. Check if Reader Mode icon is displayed for the new opened tab

Expected result:
The Reader Mode is not offered for inappropriate pages.

Actual result:
The Reader Mode icon is displayed in URL Bar for the tab opened at step 5 as you can see in the attached screenshot.
Comment 1 Lucas Rocha (:lucasr) 2012-08-21 08:00:14 PDT
Created attachment 653757 [details] [diff] [review]
Don't offer reader in pages with too much reading competition
Comment 2 Lucas Rocha (:lucasr) 2012-08-21 09:38:59 PDT
This patch greatly improves the readability check results for pages that are not articles. Pages like nytimes.com/technology or http://edition.cnn.com/WORLD should not have reader enabled. Here's a test build:

http://dl.dropbox.com/u/1187037/better-reader-check.apk
Comment 3 Brian Nicholson (:bnicholson) 2012-08-21 09:50:36 PDT
Comment on attachment 653757 [details] [diff] [review]
Don't offer reader in pages with too much reading competition

+        for (let t = 0; t < this.N_TOP_CANDIDATES; t++) {
+          let aTopCandidate = topCandidates[t];
+
+          if (!aTopCandidate || candidateScore > aTopCandidate.readability.contentScore) {
+            if (t + 1 < this.N_TOP_CANDIDATES && aTopCandidate)
+              topCandidates[t + 1] = aTopCandidate;
+
+            topCandidates[t] = candidate;
+            break;
+          }

Wouldn't it be better to shift the list of top candidates rather than just replacing the next candidate? Otherwise, if you have a list of scores [20,15,3,2,1], and you try to add a candidate with score 25, you'd end up with [25,20,3,2,1], whereas I think you'd want [25,20,15,3,2]. Or if you read a page of candidates 1,2,3,4,5, you'd end up with [5,4], but I think you'd want [5,4,3,2,1].


It looks like this breaks some pages that should be articles, e.g.:
* http://en.m.wikipedia.org/wiki/Mozilla
* http://mobile.slate.com/posts/2012/06/22/new_law_threatens_mississippi_s_last_abortion_clinic.html

It also doesn't filter out Google search queries, which I would have expected to fail, e.g.:
http://www.google.com/search?q=query&ie=utf-8&oe=utf-8&channel=fs

For readability, I think it's better to err on the side of having more false positives as opposed to more false negatives. Showing the icon for unreadable pages is annoying, but not showing the icon for readable pages reduces defeats the purpose of reader mode. Having a competition algorithm is a good idea, but it's probably better to tweak it more first to make sure we have as few false negatives as possible.
Comment 4 Lucas Rocha (:lucasr) 2012-08-21 10:08:18 PDT
(In reply to Brian Nicholson (:bnicholson) from comment #3)
> Comment on attachment 653757 [details] [diff] [review]
> Don't offer reader in pages with too much reading competition
> 
> +        for (let t = 0; t < this.N_TOP_CANDIDATES; t++) {
> +          let aTopCandidate = topCandidates[t];
> +
> +          if (!aTopCandidate || candidateScore >
> aTopCandidate.readability.contentScore) {
> +            if (t + 1 < this.N_TOP_CANDIDATES && aTopCandidate)
> +              topCandidates[t + 1] = aTopCandidate;
> +
> +            topCandidates[t] = candidate;
> +            break;
> +          }
> 
> Wouldn't it be better to shift the list of top candidates rather than just
> replacing the next candidate? Otherwise, if you have a list of scores
> [20,15,3,2,1], and you try to add a candidate with score 25, you'd end up
> with [25,20,3,2,1], whereas I think you'd want [25,20,15,3,2]. Or if you
> read a page of candidates 1,2,3,4,5, you'd end up with [5,4], but I think
> you'd want [5,4,3,2,1].
> 
> 
> It looks like this breaks some pages that should be articles, e.g.:
> * http://en.m.wikipedia.org/wiki/Mozilla

Known issue but we're not doing the right thing with Wikipedia pages anyway (see bug 782423). Wikipedia is very unique kind of page. We'll have to sort out the best way to handle it in that separate bug.

I personally prefer to not offer reader mode at all instead of offering and showing something totally unexpected. So, in a way, this is an improvement :-)

> http://mobile.slate.com/posts/2012/06/22/
> new_law_threatens_mississippi_s_last_abortion_clinic.html

This will be fixed once we properly convert DIVs to Ps. This is probably unrelated to this change. I'll double-check.

> It also doesn't filter out Google search queries, which I would have
> expected to fail, e.g.:
> http://www.google.com/search?q=query&ie=utf-8&oe=utf-8&channel=fs

Interesting. I tested it on a few Google search queries and it worked fine. Might be a mobile site thing. Will have a look.

> For readability, I think it's better to err on the side of having more false
> positives as opposed to more false negatives. Showing the icon for
> unreadable pages is annoying, but not showing the icon for readable pages
> reduces defeats the purpose of reader mode. Having a competition algorithm
> is a good idea, but it's probably better to tweak it more first to make sure
> we have as few false negatives as possible.

Agree that we should err on the false positives and I tweaked the maximum contrast ratio to do exactly that. There's no 100% accurate way of doing this. From my tests on a long list of websites (nytimes, cnn, arstechnica, latimes, tons of blogs, and much more) the results have always err'd on the false positive side.

Anyway, I posted the test build exactly for that: to get an idea of where it fails when it really shouldn't. I wouldn't block on things like google.com case though (i.e. this is not a regression). We should be incrementally tweaking the readability code to improve its accuracy.
Comment 5 Lucas Rocha (:lucasr) 2012-08-21 11:03:33 PDT
Created attachment 653850 [details] [diff] [review]
Don't offer reader in pages with too much reading competition  * *
Comment 6 Brian Nicholson (:bnicholson) 2012-08-21 11:48:09 PDT
I'm still worried about http://mobile.slate.com/posts/2012/06/22/new_law_threatens_mississippi_s_last_abortion_clinic.html; it looks like it parses fine before this patch, but doesn't work with it applied. Can you confirm the DIV->P patch fixes it?
Comment 7 Lucas Rocha (:lucasr) 2012-08-22 03:20:45 PDT
(In reply to Brian Nicholson (:bnicholson) from comment #6)
> I'm still worried about
> http://mobile.slate.com/posts/2012/06/22/
> new_law_threatens_mississippi_s_last_abortion_clinic.html; it looks like it
> parses fine before this patch, but doesn't work with it applied. Can you
> confirm the DIV->P patch fixes it?

Yes, just confirmed that it fixes it. The reason is simple: without the DIV->P patch, the DIVs-with-P elements in mobile.slate.com were being considered candidates on their own right.
Comment 8 Lucas Rocha (:lucasr) 2012-08-22 07:32:27 PDT
Created attachment 654199 [details] [diff] [review]
Break line one log messages in Readability
Comment 9 Brian Nicholson (:bnicholson) 2012-08-22 10:47:40 PDT
(In reply to Lucas Rocha (:lucasr) from comment #8)
> Created attachment 654199 [details] [diff] [review]
> Break line one log messages in Readability

Is this just for the Readability XULRunner test? I didn't think this happened with Readability.js on Android.
Comment 10 Mark Finkle (:mfinkle) (use needinfo?) 2012-08-22 23:18:07 PDT
Comment on attachment 653850 [details] [diff] [review]
Don't offer reader in pages with too much reading competition  * *

nit: aTopCandidate looks a bit odd since we use "a" + "Something" as a function param. Maybe just use "topCandidate" ?
Comment 11 Lucas Rocha (:lucasr) 2012-08-23 09:06:25 PDT
(In reply to Mark Finkle (:mfinkle) from comment #10)
> Comment on attachment 653850 [details] [diff] [review]
> Don't offer reader in pages with too much reading competition  * *
> 
> nit: aTopCandidate looks a bit odd since we use "a" + "Something" as a
> function param. Maybe just use "topCandidate" ?

I didn't use "topCandidate" there to avoid confusion with the important variable called topCandidate below.
Comment 12 Lucas Rocha (:lucasr) 2012-08-23 09:16:13 PDT
Pushed: https://hg.mozilla.org/integration/mozilla-inbound/rev/55c4a3f3a6a9
Comment 13 Lucas Rocha (:lucasr) 2012-08-23 09:20:23 PDT
Comment on attachment 653850 [details] [diff] [review]
Don't offer reader in pages with too much reading competition  * *

[Approval Request Comment]
User impact if declined: We're offering reader mode on pages which are more like lists of articles instead of actual readable content. We want to avoid these false positive cases as much as possible in reader.
Testing completed (on m-c, etc.): Tested locally with a long list of major and minor websites and blogs. The change didn't cause any false negative in my tests. There are still some remaining false positives which I intent to cover in a follow-up bugs though.
Risk to taking this patch (and alternatives if risky): Medium. There's no 100% accurate way of predicting how actually readable a page should be. This patch sets a higher bar for pages to be considered reader mode material. There's a risk of adding false negative cases but the best way to find out about those is to have this patch in the wild.
String or UUID changes made by this patch: None.
Comment 14 Lucas Rocha (:lucasr) 2012-08-23 09:21:28 PDT
Comment on attachment 654199 [details] [diff] [review]
Break line one log messages in Readability

No big deal, dropping this one.
Comment 15 Ryan VanderMeulen [:RyanVM] 2012-08-23 19:20:08 PDT
https://hg.mozilla.org/mozilla-central/rev/55c4a3f3a6a9
Comment 16 Alex Keybl [:akeybl] 2012-08-24 15:54:20 PDT
(In reply to Lucas Rocha (:lucasr) from comment #13)
> Comment on attachment 653850 [details] [diff] [review]
> Don't offer reader in pages with too much reading competition  * *
> 
> [Approval Request Comment]
> User impact if declined: We're offering reader mode on pages which are more
> like lists of articles instead of actual readable content. We want to avoid
> these false positive cases as much as possible in reader.

^ Given the user impact here

> Risk to taking this patch (and alternatives if risky): Medium. There's no
> 100% accurate way of predicting how actually readable a page should be. This
> patch sets a higher bar for pages to be considered reader mode material.
> There's a risk of adding false negative cases but the best way to find out
> about those is to have this patch in the wild.

and the fallout being more false negatives (which wouldn't feel like a regression as it's a new feature), let's get this onto Aurora as soon as possible.
Comment 17 Lucas Rocha (:lucasr) 2012-08-24 16:29:06 PDT
Pushed to aurora: https://hg.mozilla.org/releases/mozilla-aurora/rev/e978f8eda190
Comment 18 henryfhchan 2013-04-17 08:55:00 PDT
What should I do if reader mode icon never shows up even on news sites (e.g. CNN)?
Comment 19 Aaron Train [:aaronmt] 2013-04-17 09:07:18 PDT
(In reply to henryfhchan from comment #18)
> What should I do if reader mode icon never shows up even on news sites (e.g.
> CNN)?

Depending on your Android device you might be yielding the behaviour added in bug 828124 where we've set a memory threshold for using reader-mode

Note You need to log in before you can comment on or make changes to this bug.