Correctly mark Ps created from text-only DIVs for scoring

VERIFIED FIXED in Firefox 17

Status

()

P1
normal
VERIFIED FIXED
6 years ago
6 years ago

People

(Reporter: lucasr, Assigned: lucasr)

Tracking

unspecified
Firefox 17
All
Android
Points:
---

Firefox Tracking Flags

(firefox16 unaffected, firefox17 verified, firefox18 verified)

Details

Attachments

(1 attachment)

(Assignee)

Description

6 years ago
Right now it seems we're doing it wrong add queuing a replaced node for scoring:

  http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/Readability.js#465
(Assignee)

Comment 1

6 years ago
Created attachment 654740 [details] [diff] [review]
Correctly mark Ps created from text-only DIVs for scoring
Attachment #654740 - Flags: review?(bnicholson)
(Assignee)

Updated

6 years ago
Blocks: 774914
Comment on attachment 654740 [details] [diff] [review]
Correctly mark Ps created from text-only DIVs for scoring

This change causes this page to fail:
http://articles.latimes.com/2012/mar/27/health/la-he-green-coffee-weight-loss-20120328

Even though the node is no longer in the document, maybe adding it to nodesToScore was intentional? Maybe the former node's parent is still useful for the algorithm.

Also, nodeIndex is decremented here. This means the index is checked again, which would cause newNode to be looked at twice. Perhaps this is contributing to the failure for the article I linked to? Maybe you could try removing the "nodeIndex -= 1;" line, but we might just want to leave this alone unless we have good reason to change it.
Attachment #654740 - Flags: review?(bnicholson) → review-
(Assignee)

Comment 3

6 years ago
(In reply to Brian Nicholson (:bnicholson) from comment #2)
> Comment on attachment 654740 [details] [diff] [review]
> Correctly mark Ps created from text-only DIVs for scoring
> 
> This change causes this page to fail:
> http://articles.latimes.com/2012/mar/27/health/la-he-green-coffee-weight-
> loss-20120328

Interesting, it's not failing for me. Are you testing this on your reader test app or on device?

> Even though the node is no longer in the document, maybe adding it to
> nodesToScore was intentional? Maybe the former node's parent is still useful
> for the algorithm.

It doesn't make any sense to add the replaced to node to nodesToScore array. Once we call node.parentNode.replaceChild(newNode, node), the DIV that got replaced by the P will have a null parentNode pointer. This means this node will be simply discarded once through all the nodesToScore items. See:

http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/Readability.js#497

Note that the point of converting these DIVs to Ps is because the whole algorithm is driven by how much content each node has. This why we convert double BRs to Ps and why we convert certain DIVs to Ps. Paragraph nodes increment score for their parent and grandparent nodes:

http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/Readability.js#530

Hence, considering a parentless node for scoring is pointless for the algorithm.

> Also, nodeIndex is decremented here. This means the index is checked again,
> which would cause newNode to be looked at twice. Perhaps this is
> contributing to the failure for the article I linked to? Maybe you could try
> removing the "nodeIndex -= 1;" line, but we might just want to leave this
> alone unless we have good reason to change it.

Good catch. This is intentional. Once the loop visits the same index again, it will now be a P and the newNode will be added to nodesToScore array (mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/Readability.js#530). With this change, we don't need to do this anymore as we're adding the newNode straight away.

I think I'll combine the patch in this bug with the patch in bug 774914 as they're dealing with the same part of the algorithm. Better get these changes all at once.
(Assignee)

Updated

6 years ago
Priority: -- → P1
(Assignee)

Updated

6 years ago
No longer blocks: 774914
https://hg.mozilla.org/mozilla-central/rev/5e2b2c9c4f67
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 17

Updated

6 years ago
status-firefox17: --- → fixed
http://articles.latimes.com/2012/mar/27/health/la-he-green-coffee-weight-loss-20120328 doesn't fail anymore on the latest Nightly, Aurora and Beta builds. It seems that the issue was fixed. Closing bug as verified fixed.

--
Device: Galaxy Note
OS: Android 4.0.4
Status: RESOLVED → VERIFIED
status-firefox16: --- → verified
status-firefox17: fixed → verified
status-firefox18: --- → verified

Comment 8

6 years ago
I guess it was a regression in 17.0.
status-firefox16: verified → unaffected
You need to log in before you can comment on or make changes to this bug.