Closed Bug 1217007 Opened 6 years ago Closed 5 years ago

using reader mode on medium.com can omit leading paragraphs/sections

Categories

(Toolkit :: Reader Mode, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: froydnj, Assigned: evanxd)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [reader-mode-readability-algorithm])

STR:

1. Visit https://medium.com/@johncwelch/samantha-and-the-great-big-lie-d146a92473a1#.tpqxbw99y
2. Observe that the first sentence starts off "How to get shanked doing what people say they want"
3. Click the reader mode icon in the URL bar

Expected result:

The first sentence starts off as before.

Actual result:

The first sentence starts off "The response, from all quarters...", which is the first sentence of the paragraph that begins the third "section" of the article.  The previous two sections have disappeared completely.

This is on Nightly 44.  Same thing happens on Beta 42.
Priority: -- → P3
Whiteboard: [reader-mode-readability-algorithm]
Blocks: 1286221
I have another example but from the New York Times with 48.0.1. 

1. Visit http://www.nytimes.com/2016/09/08/technology/whats-really-missing-from-the-new-iphone-dazzle.html 
2. Observe that the first sentence starts off "Forget about the headphone jack for a second."
3. Click the reader mode icon in the URL bar, and the first sentence starts off "It’s the same story for Apple Music" that is the beginning of the 9th paragraph instead.
Blocks: 1324630
Assignee: nobody → evan
Hi giacecco,

The New York Times issue is already fixed by Bug 1300697.
Root cause is found. Same issue, the algorithm finds a wrong top candidate. The `<div class="section-content"></div>` node which belongs to `<div class="section-3">` node is found as the top candidate by the algorithm. But the `<div class="postArticle-content">` node is the correct top candidate.

Maybe we could try to remove the empty `<div class="section-divider"></div>` nodes and the logic[1] added in Bug 1300697 can find the correct top candidate. Start to write a patch to fix it.

```
<div class="postArticle-content">
  <div class="section-1">
    <div class="section-divider"></div>
    <div class="section-content"></div>
  </div>
  <div class="section-2">
    <div class="section-divider"></div>
    <div class="section-content"></div>
  </div>
  <div class="section-3">
    <div class="section-divider"></div>
    <div class="section-content"></div>
  </div>
  <div class="section-4">
    <div class="section-divider"></div>
    <div class="section-content"></div>
  </div>
  <div class="section-5">
    <div class="section-divider"></div>
    <div class="section-content"></div>
  </div>
</div>
```

[1]: https://github.com/mozilla/readability/blob/master/Readability.js#L925-L931
Sent a pull request[1] to fix the issue and the CI is good.

[1]: https://github.com/mozilla/readability/pull/354/commits/f983444fb676b35bb0283efbe9a3fe41b3a8d781
and tried updated the patch to do some experiments for review comments.
Will land the patch in m-c in the mozreview request[1].

[1]: https://reviewboard.mozilla.org/r/114842/diff/1#index_header
https://hg.mozilla.org/mozilla-central/rev/e04079f3f386
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Blocks: 1329358
You need to log in before you can comment on or make changes to this bug.