Closed Bug 1149859 Opened 9 years ago Closed 9 years ago

isProbablyReaderable too aggressive, many valid sites fail to qualify

Categories

(Toolkit :: Reader Mode, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox38 --- fixed
firefox39 --- fixed
firefox40 --- fixed

People

(Reporter: markh, Assigned: markh)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

eg:

http://www.abc.net.au/news/2015-04-01/local-surfboard-makers-are-being-wiped-out-foreign-drop-ins/6362240
http://www.theage.com.au/digital-life/games/surprise-pacman-is-munching-away-on-google-maps-this-april-fools-day-20150331-1mcfz6.html

Both fail to offer reader view.  Instrumenting isProbablyReaderable to dump when the node is too short shows:

node too short: 50
node too short: 123
node too short: 160
node too short: 84
node too short: 115
node too short: 131
node too short: 72
node too short: 0
node too short: 32
node too short: 180
node too short: 169
node too short: 96
node too short: 79
node too short: 88
node too short: 94
node too short: 176
node too short: 90
node too short: 0
node too short: 133
node too short: 61
node too short: 186
node too short: 135
node too short: 137
node too short: 183
node too short: 175
node too short: 59
node too short: 55
node too short: 29
node too short: 0
node too short: 69
node too short: 15
node too short: 84
node too short: 119
node too short: 146
node too short: 39
node too short: 94
node too short: 161
node too short: 100

So there are plenty of paragraphs - just not enough over 200 chars.  Tweaking the code to always allow the page to enter reader mode also shows the article looks fine.

So I propose we drop the limit from 200 to 100.
Attachment #8586505 - Flags: review?(margaret.leibovic)
alternatively, instead of wanting 5 paras over a particular length, we just look for a total length and ignore the number of nodes - eg, 20 <p> elements with 90 chars each is probably still readable.
Blocks: 1147626
Comment on attachment 8586505 [details] [diff] [review]
0009-Bug-XXXXXXX-readermode-now-considers-paragraphs-with.patch

Review of attachment 8586505 [details] [diff] [review]:
-----------------------------------------------------------------

Looking at Readability.js, it seems like we only throw away paragraphs that have fewer than 25 characters:
https://github.com/mozilla/readability/blob/master/Readability.js#L581

So the approach here sounds reasonable to me.

I feel like Gijs has spent more time than me in this candidate logic, so it would be good to get his opinion as well.

We just need to remember that the goal here is to show the button only in cases where Readability.js will actually succeed. So are there cases where we do have the requisite number of these short paragraphs but Readability won't return an article? Maybe if there's a lot of links in them? I would also be in support of continuing to test this with real testcases to converge on something that works reasonably well in the real world.
Attachment #8586505 - Flags: review?(margaret.leibovic)
Attachment #8586505 - Flags: review+
Attachment #8586505 - Flags: feedback?(gijskruitbosch+bugs)
Comment on attachment 8586505 [details] [diff] [review]
0009-Bug-XXXXXXX-readermode-now-considers-paragraphs-with.patch

Review of attachment 8586505 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, we should probably build a corpus of things that we use to decide reader-ability... e.g. it'd be sad if we now went and offered reader mode on youtube or $SEARCH_PROVIDER search results again, or something.
Attachment #8586505 - Flags: feedback?(gijskruitbosch+bugs) → feedback+
(In reply to :Gijs Kruitbosch from comment #3)
> e.g. it'd be sad if we now went and offered reader mode on
> youtube or $SEARCH_PROVIDER search results again, or something.

FTR, this doesn't cause youtube, nor google or yahoo search results pages from offering reader mode, so I think I'll go ahead and push this later today.
https://hg.mozilla.org/mozilla-central/rev/2c7c65e46790
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Depends on: 1151087
Assignee: nobody → mhammond
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: