Closed Bug 774914 Opened 12 years ago Closed 12 years ago

Reader Mode: Convert divs with only a <p> element child into plain <p> elements

Categories

(Firefox for Android Graveyard :: Reader View, defect, P2)

ARM
Android
defect

Tracking

(firefox16- verified, firefox17 verified, firefox18 verified)

VERIFIED FIXED
Firefox 17
Tracking Status
firefox16 - verified
firefox17 --- verified
firefox18 --- verified

People

(Reporter: aryx, Assigned: lucasr)

References

Details

Attachments

(2 files, 3 obsolete files)

Firefox for Android trunk nightly 2012-07-17, Android 4.0.4 (stock), Google Nexus S

Opening http://www.computerbase.de/forum/showthread.php?t=1086596 and clicking the reader mode shows only one post of the thread displayed at the normal page, it's one in the middle: http://www.computerbase.de/forum/showthread.php?p=12377579
Blocks: reader
(In reply to Archaeopteryx [:aryx] from comment #0)
> Firefox for Android trunk nightly 2012-07-17, Android 4.0.4 (stock), Google
> Nexus S
> 
> Opening http://www.computerbase.de/forum/showthread.php?t=1086596 and
> clicking the reader mode shows only one post of the thread displayed at the
> normal page, it's one in the middle:
> http://www.computerbase.de/forum/showthread.php?p=12377579

This is probably because the readability algorithm is picking the post that has more text inside. To be honest, I wouldn't even offer reader mode in a page showing a list of posts. For the same reason that we shouldn't show reader mode option for pages like the Technology section at nytimes.com/technology
(In reply to Brian Nicholson (:bnicholson) from comment #1)
> I encountered the same thing for this article:
> 
> http://mobile.slate.com/posts/2012/06/22/
> new_law_threatens_mississippi_s_last_abortion_clinic.html

This is a different case. Slate encloses each paragraph with <div class="text"> element which makes the readability algorithm think it has to pick only the div with more text.

We should probably tweak Readability.js to make it convert divs with only a <p> element inside into plain <p> elements.
Summary: Reader mode shows only one post of a forum thread → Reader Mode: Convert divs with only a <p> element child into plain <p> elements
Assignee: nobody → lucasr.at.mozilla
Attachment #651904 - Flags: review?(bnicholson)
this doesn't look like a release blocker, and since there's already a patch forthcoming please just nominate for uplift when it's been on central and verified.
Comment on attachment 651904 [details] [diff] [review]
Convert divs with only a <p> element child into plain <p> elements

+    if (e.tagName !== "DIV")
+      return false;

We already check to see if we're in a div before calling this method, so do we need this?


+    if (childNodes.length !== 1)

If a paragraph looks like this:

<div>
  <p>[paragraph></p>
</div>

there will actually be 3 child nodes: the first whitespace Text node, the <p> element, and the second whitespace Text node. But I think we would still want to convert this to a single <p>.


+    return !this._getInnerText(e) &&

I don't understand - why check to see if the <p> has no text? I thought the problem is that slate uses paragraphs like this:

<div><p>[paragraph]</p></div>

and since the paragraphs are wrapped in <div>s, it's only using the <div> with the most text. If only empty <p>s are converted, these paragraphs would remain untouched, and the problem would still exist, right?
(In reply to Brian Nicholson (:bnicholson) from comment #6)
> Comment on attachment 651904 [details] [diff] [review]
> Convert divs with only a <p> element child into plain <p> elements
> 
> +    if (e.tagName !== "DIV")
> +      return false;
> 
> We already check to see if we're in a div before calling this method, so do
> we need this?

This is just to make it explicit that we should only cover divs in this method. Just in case we start using this method in a different scope for some reason in the future. Not a big deal though, I can remove it (to avoid the overhead).
 
> +    if (childNodes.length !== 1)
> 
> If a paragraph looks like this:
> 
> <div>
>   <p>[paragraph></p>
> </div>
> 
> there will actually be 3 child nodes: the first whitespace Text node, the
> <p> element, and the second whitespace Text node. But I think we would still
> want to convert this to a single <p>.

Nice catch, I forgot to cover the whitespace case here.

> 
> +    return !this._getInnerText(e) &&
> 
> I don't understand - why check to see if the <p> has no text? I thought the
> problem is that slate uses paragraphs like this:
> 
> <div><p>[paragraph]</p></div>
> 
> and since the paragraphs are wrapped in <div>s, it's only using the <div>
> with the most text. If only empty <p>s are converted, these paragraphs would
> remain untouched, and the problem would still exist, right?

"e" is the DIV, not the P. We don't want to convert the DIV into a P if we find things like this:

<div>text text text <p>paragraph</p></div>

I'll send a new patch to cover the review comments.
Comment on attachment 651904 [details] [diff] [review]
Convert divs with only a <p> element child into plain <p> elements

r- for new patch
Attachment #651904 - Flags: review?(bnicholson) → review-
Attachment #654155 - Flags: review?(bnicholson)
Attachment #651904 - Attachment is obsolete: true
Comment on attachment 654155 [details] [diff] [review]
Convert divs with only a <p> element child into plain <p> elements

             let newNode = doc.createElement('p');
-            newNode.innerHTML = node.innerHTML;
+            newNode.innerHTML = (pIndex >= 0 ? node.childNodes[pIndex].innerHTML : node.innerHTML);

If we already have a <p> inside the <div>, can we just use that <p> as-is without creating a new one and avoiding the expensive innerHTML copying? Basically, just set newNode to the <p> if we're doing the replacement; otherwise, do doc.createElement() like we do now.

Other than that, this looks good. r- for suggestion above.
Attachment #654155 - Flags: review?(bnicholson) → review-
Depends on: 785145
Attachment #654741 - Flags: review?(bnicholson)
Attachment #654155 - Attachment is obsolete: true
FYI: this patch needs the fix for bug 785145 in order to work properly.
Comment on attachment 654741 [details] [diff] [review]
Convert divs with only a <p> element child into plain <p> elements

This patch looks good, but can you explain the dependency on bug 785145? I tried this patch by itself and didn't notice any problems without bug 785145 applied.
Attachment #654741 - Flags: review?(bnicholson) → review+
Attachment #654741 - Attachment is obsolete: true
Attachment #654841 - Flags: review?(bnicholson) → review?(mark.finkle)
Attachment #654842 - Flags: review?(bnicholson) → review?(mark.finkle)
Attachment #654842 - Flags: review?(mark.finkle) → review+
Attachment #654841 - Flags: review?(mark.finkle) → review+
Comment on attachment 654842 [details] [diff] [review]
/785145 - Convert divs with only a <p> element child into plain <p> elements

[Approval Request Comment]
User impact if declined: Pages following the same pattern than mobile.slate.com (enclose all Ps inside DIVs) will show incomplete content in Reader Mode.
Testing completed (on m-c, etc.): Local tests in a long list of websites, no regressions found.
Risk to taking this patch (and alternatives if risky): Very low as it only applies to websites following this specific pattern.
String or UUID changes made by this patch: None.
Attachment #654842 - Flags: approval-mozilla-aurora?
Comment on attachment 654841 [details] [diff] [review]
Ensure newNode is orphan before replacing a node in JSDOMParser

[Approval Request Comment]
User impact if declined: Pages following the same pattern than mobile.slate.com (enclose all Ps inside DIVs) will show incomplete content in Reader Mode.
Testing completed (on m-c, etc.): Local tests in a long list of websites, no regressions found.
Risk to taking this patch (and alternatives if risky): Very low as it only applies to websites following this specific pattern.
String or UUID changes made by this patch: None.
Attachment #654841 - Flags: approval-mozilla-aurora?
No longer blocks: 777557
No longer depends on: 785145
Priority: -- → P2
https://hg.mozilla.org/mozilla-central/rev/8182bfe539f8
https://hg.mozilla.org/mozilla-central/rev/5e2b2c9c4f67
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 17
Comment on attachment 654841 [details] [diff] [review]
Ensure newNode is orphan before replacing a node in JSDOMParser

[Triage Comment]
Low risk, mobile only, approving for Beta.
Attachment #654841 - Flags: approval-mozilla-aurora? → approval-mozilla-beta+
Attachment #654842 - Flags: approval-mozilla-aurora? → approval-mozilla-beta+
I cannot reproduce this issue for 
http://mobile.slate.com/posts/2012/06/22/new_law_threatens_mississippi_s_last_abortion_clinic.html on the latest Nightly, Aurora and Beta builds. Closing bug as verified fixed.

--
Device: Galaxy Note
OS: Android 4.0.4
Status: RESOLVED → VERIFIED
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: