isProbablyReaderable too aggressive, many valid sites fail to qualify

RESOLVED FIXED in Firefox 38

Status

()

defect
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: markh, Assigned: markh)

Tracking

(Blocks 1 bug)

unspecified
mozilla40
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox38 fixed, firefox39 fixed, firefox40 fixed)

Details

Attachments

(1 attachment)

(Assignee)

Description

4 years ago
eg:

http://www.abc.net.au/news/2015-04-01/local-surfboard-makers-are-being-wiped-out-foreign-drop-ins/6362240
http://www.theage.com.au/digital-life/games/surprise-pacman-is-munching-away-on-google-maps-this-april-fools-day-20150331-1mcfz6.html

Both fail to offer reader view.  Instrumenting isProbablyReaderable to dump when the node is too short shows:

node too short: 50
node too short: 123
node too short: 160
node too short: 84
node too short: 115
node too short: 131
node too short: 72
node too short: 0
node too short: 32
node too short: 180
node too short: 169
node too short: 96
node too short: 79
node too short: 88
node too short: 94
node too short: 176
node too short: 90
node too short: 0
node too short: 133
node too short: 61
node too short: 186
node too short: 135
node too short: 137
node too short: 183
node too short: 175
node too short: 59
node too short: 55
node too short: 29
node too short: 0
node too short: 69
node too short: 15
node too short: 84
node too short: 119
node too short: 146
node too short: 39
node too short: 94
node too short: 161
node too short: 100

So there are plenty of paragraphs - just not enough over 200 chars.  Tweaking the code to always allow the page to enter reader mode also shows the article looks fine.

So I propose we drop the limit from 200 to 100.
Attachment #8586505 - Flags: review?(margaret.leibovic)
(Assignee)

Comment 1

4 years ago
alternatively, instead of wanting 5 paras over a particular length, we just look for a total length and ignore the number of nodes - eg, 20 <p> elements with 90 chars each is probably still readable.

Updated

4 years ago
Blocks: 1147626
Comment on attachment 8586505 [details] [diff] [review]
0009-Bug-XXXXXXX-readermode-now-considers-paragraphs-with.patch

Review of attachment 8586505 [details] [diff] [review]:
-----------------------------------------------------------------

Looking at Readability.js, it seems like we only throw away paragraphs that have fewer than 25 characters:
https://github.com/mozilla/readability/blob/master/Readability.js#L581

So the approach here sounds reasonable to me.

I feel like Gijs has spent more time than me in this candidate logic, so it would be good to get his opinion as well.

We just need to remember that the goal here is to show the button only in cases where Readability.js will actually succeed. So are there cases where we do have the requisite number of these short paragraphs but Readability won't return an article? Maybe if there's a lot of links in them? I would also be in support of continuing to test this with real testcases to converge on something that works reasonably well in the real world.
Attachment #8586505 - Flags: review?(margaret.leibovic)
Attachment #8586505 - Flags: review+
Attachment #8586505 - Flags: feedback?(gijskruitbosch+bugs)

Comment 3

4 years ago
Comment on attachment 8586505 [details] [diff] [review]
0009-Bug-XXXXXXX-readermode-now-considers-paragraphs-with.patch

Review of attachment 8586505 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, we should probably build a corpus of things that we use to decide reader-ability... e.g. it'd be sad if we now went and offered reader mode on youtube or $SEARCH_PROVIDER search results again, or something.
Attachment #8586505 - Flags: feedback?(gijskruitbosch+bugs) → feedback+
(Assignee)

Comment 4

4 years ago
(In reply to :Gijs Kruitbosch from comment #3)
> e.g. it'd be sad if we now went and offered reader mode on
> youtube or $SEARCH_PROVIDER search results again, or something.

FTR, this doesn't cause youtube, nor google or yahoo search results pages from offering reader mode, so I think I'll go ahead and push this later today.
https://hg.mozilla.org/mozilla-central/rev/2c7c65e46790
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40

Updated

4 years ago
Depends on: 1151087

Updated

4 years ago
Assignee: nobody → mhammond
You need to log in before you can comment on or make changes to this bug.