Closed Bug 760554 Opened 12 years ago Closed 2 years ago

Reader Mode: Support multi-page articles

Categories

(Toolkit :: Reader Mode, enhancement, P3)

enhancement

Tracking

()

RESOLVED WONTFIX

People

(Reporter: lucasr, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [reader-mode-firefox-integration])

Attachments

(2 obsolete files)

If the article in the site is split into multiple pages, we have to incrementally load the rest of the content and append to the initially loaded article content.

The readability script does support it but I've disabled it to be able to land the first reader mode patches.
Component: General → Reader Mode
Multi-page articles are quite common on some major websites (e.g. nytimes, wired, ...).
Priority: -- → P2
Oftentimes the printable version of the article contains the full text as well.  That's consistent enough among major websites it might be useful?

Some also provide a full-text/one-page button, but my feeling is that the nomenclature isn't very consistent.
Attachment #715159 - Flags: feedback?(lucasr.at.mozilla)
Comment on attachment 715159 [details] [diff] [review]
first patch for multipage articles in reader mode

Review of attachment 715159 [details] [diff] [review]:
-----------------------------------------------------------------

Looking promising but needs work. Readability.js contains the code to fetch the next page link for you (with _findNextPageLink() and all). Why did you need to hardcode specific URLs?

::: mobile/android/chrome/content/Readability.js
@@ +194,5 @@
>     * @return void
>     **/
> +  _prepDocument: function(document) {
> +    //let doc = this._doc;
> +    let doc = document;

How's this change related to the main point of the patch?

@@ +873,5 @@
>    _findBaseUrl: function() {
>      let uri = this._uri;
> +    //let noUrlParams = uri.path.split("?")[0];
> +    let noUrlParams = uri.spec.split("?")[0];
> +    noUrlParams = noUrlParams.replace(uri.prePath, "");

Some context about this change would be nice :-) What's this about?

@@ +921,5 @@
>          cleanedSegments.push(segment);
>      }
>  
>      // This is our final, cleaned, base article URL.
> +    //return uri.scheme + "://" + uri.host + cleanedSegments.reverse().join("/");

Cleanup the commented code?

@@ +956,5 @@
> +       if(!linkHref.match(/www\..+\.com/)) {
> +
> +        linkHref = uri.prePath + linkHref;
> +       }
> +      }

What's this about? Hardcoding pages here won't scale well. Doesn't the original algorithm cover these things?

@@ +1072,5 @@
> +      // if the URl contins things like ad,ads,adn,adt, for sure is not a next page link, give a huge decrease.
> +      // example : "http://www.nytimes.com/adx/bin/adx_click.html?type=goto&opzn&page=www.nytimes.com/yr/mo/day/world/africa&pos=Box1&sn2=5be1d228/22bac537&sn1=4db0b17/d3b11b59&camp=IHT2012-Mktg-336x280-US_ROS-Intl-Mktg_module&ad=28Jan13_Opinion_IndiaInk_336x280&goto=http://global.nytimes.com/"
> +      //also for things like slideshow(for some reason a link for some slideshow images got a bigger score then the NextPage link so we penalize it)
> +      if (linkHref.match(/\/ad\w{1,4}|slideshow/i))
> +       linkObj.score -= 100;

Where's this code coming from?
Attachment #715159 - Flags: feedback?(lucasr.at.mozilla) → feedback-
I tested this patch mainly on www.nytimes.com (and a little on www.phoronix.com, here I run in to another problem...) 
1."Why did you need to hardcode specific URLs?"
I had to hardcode specific URLs because on www.nytimes.com the next page link from the <a> tag does not contain the whole URL for the next page but only the path 
example: <a class="next" href="/2013/03/01/us/politics/house-republicans-cheer-boehners-refusal-to-negotiate-on-cuts.html?pagewanted=2&hp" title="Next Page" onclick="s_code_linktrack('Article-MultiPage-Next');"></a>
so because of this code: 
http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/Readability.js#956
next page link was ignored

2."::: mobile/android/chrome/content/Readability.js
  @@ +194,5 @@"
Now it is not related to this patch (I made this change at the beginning of my attempt of solving this bug)

3."@@ +873,5 @@"
uri.path gave me undefined for some reason so I had to make an workaround.
Also if I recall correctly uri.scheme gave me undefined too.

4."@@ +921,5 @@"
ok :-)

5."@@ +956,5 @@"
this is just for www.nytimes.com and for www.phoronix.com for the reason explained in point 1.

6."@@ +1072,5 @@"
This code is coming from me :-). I added it because for some unknown reason some links(add ons, slide show) still got a higher score then the next page link.
Attachment #715159 - Attachment is obsolete: true
Attachment #721181 - Flags: review?(lucasr.at.mozilla)
Blocks: readerv2
No longer blocks: reader
Comment on attachment 721181 [details] [diff] [review]
second patch of multipage articles in reader mode

Thanks a lot for the patch, Alex. Apologies for the huge delay here :-/ The reader mode has changed a lot since you submitted this patch. Let me know if you'd like to discuss how to complete this work. Clearing the review flag for now.
Attachment #721181 - Flags: review?(lucasr.at.mozilla)
Just a friendly ping to see how things are going in this bug? :)
Attachment #721181 - Attachment is obsolete: true
Blocks: fix-readability
No longer blocks: readerv2
Component: Reader Mode → Readability
Component: Readability → Reader Mode
Product: Firefox for Android → Toolkit
Out of scope for v1 work.
OS: Android → All
Priority: P2 → P3
Whiteboard: [readinglist-v2]
Whiteboard: [readinglist-v2] → [reader-mode-firefox-integration]
Type: defect → enhancement
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: