Reader Mode: Support multi-page articles

NEW
Unassigned

Status

()

Toolkit
Reader Mode
P3
normal
6 years ago
2 years ago

People

(Reporter: lucasr, Unassigned)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [reader-mode-firefox-integration])

Attachments

(2 obsolete attachments)

(Reporter)

Description

6 years ago
If the article in the site is split into multiple pages, we have to incrementally load the rest of the content and append to the initially loaded article content.

The readability script does support it but I've disabled it to be able to land the first reader mode patches.
(Reporter)

Updated

6 years ago
Component: General → Reader Mode
(Reporter)

Comment 1

6 years ago
Multi-page articles are quite common on some major websites (e.g. nytimes, wired, ...).
Priority: -- → P2

Comment 2

6 years ago
Oftentimes the printable version of the article contains the full text as well.  That's consistent enough among major websites it might be useful?

Some also provide a full-text/one-page button, but my feeling is that the nomenclature isn't very consistent.

Comment 3

5 years ago
Created attachment 715159 [details] [diff] [review]
first patch for multipage articles in reader mode
Attachment #715159 - Flags: feedback?(lucasr.at.mozilla)
(Reporter)

Comment 4

5 years ago
Comment on attachment 715159 [details] [diff] [review]
first patch for multipage articles in reader mode

Review of attachment 715159 [details] [diff] [review]:
-----------------------------------------------------------------

Looking promising but needs work. Readability.js contains the code to fetch the next page link for you (with _findNextPageLink() and all). Why did you need to hardcode specific URLs?

::: mobile/android/chrome/content/Readability.js
@@ +194,5 @@
>     * @return void
>     **/
> +  _prepDocument: function(document) {
> +    //let doc = this._doc;
> +    let doc = document;

How's this change related to the main point of the patch?

@@ +873,5 @@
>    _findBaseUrl: function() {
>      let uri = this._uri;
> +    //let noUrlParams = uri.path.split("?")[0];
> +    let noUrlParams = uri.spec.split("?")[0];
> +    noUrlParams = noUrlParams.replace(uri.prePath, "");

Some context about this change would be nice :-) What's this about?

@@ +921,5 @@
>          cleanedSegments.push(segment);
>      }
>  
>      // This is our final, cleaned, base article URL.
> +    //return uri.scheme + "://" + uri.host + cleanedSegments.reverse().join("/");

Cleanup the commented code?

@@ +956,5 @@
> +       if(!linkHref.match(/www\..+\.com/)) {
> +
> +        linkHref = uri.prePath + linkHref;
> +       }
> +      }

What's this about? Hardcoding pages here won't scale well. Doesn't the original algorithm cover these things?

@@ +1072,5 @@
> +      // if the URl contins things like ad,ads,adn,adt, for sure is not a next page link, give a huge decrease.
> +      // example : "http://www.nytimes.com/adx/bin/adx_click.html?type=goto&opzn&page=www.nytimes.com/yr/mo/day/world/africa&pos=Box1&sn2=5be1d228/22bac537&sn1=4db0b17/d3b11b59&camp=IHT2012-Mktg-336x280-US_ROS-Intl-Mktg_module&ad=28Jan13_Opinion_IndiaInk_336x280&goto=http://global.nytimes.com/"
> +      //also for things like slideshow(for some reason a link for some slideshow images got a bigger score then the NextPage link so we penalize it)
> +      if (linkHref.match(/\/ad\w{1,4}|slideshow/i))
> +       linkObj.score -= 100;

Where's this code coming from?
Attachment #715159 - Flags: feedback?(lucasr.at.mozilla) → feedback-

Comment 5

5 years ago
I tested this patch mainly on www.nytimes.com (and a little on www.phoronix.com, here I run in to another problem...) 
1."Why did you need to hardcode specific URLs?"
I had to hardcode specific URLs because on www.nytimes.com the next page link from the <a> tag does not contain the whole URL for the next page but only the path 
example: <a class="next" href="/2013/03/01/us/politics/house-republicans-cheer-boehners-refusal-to-negotiate-on-cuts.html?pagewanted=2&hp" title="Next Page" onclick="s_code_linktrack('Article-MultiPage-Next');"></a>
so because of this code: 
http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/Readability.js#956
next page link was ignored

2."::: mobile/android/chrome/content/Readability.js
  @@ +194,5 @@"
Now it is not related to this patch (I made this change at the beginning of my attempt of solving this bug)

3."@@ +873,5 @@"
uri.path gave me undefined for some reason so I had to make an workaround.
Also if I recall correctly uri.scheme gave me undefined too.

4."@@ +921,5 @@"
ok :-)

5."@@ +956,5 @@"
this is just for www.nytimes.com and for www.phoronix.com for the reason explained in point 1.

6."@@ +1072,5 @@"
This code is coming from me :-). I added it because for some unknown reason some links(add ons, slide show) still got a higher score then the next page link.

Comment 6

5 years ago
Created attachment 721181 [details] [diff] [review]
second patch of multipage articles in reader mode
Attachment #715159 - Attachment is obsolete: true
Attachment #721181 - Flags: review?(lucasr.at.mozilla)
(Reporter)

Updated

5 years ago
Blocks: 917884
No longer blocks: 696921
(Reporter)

Comment 7

5 years ago
Comment on attachment 721181 [details] [diff] [review]
second patch of multipage articles in reader mode

Thanks a lot for the patch, Alex. Apologies for the huge delay here :-/ The reader mode has changed a lot since you submitted this patch. Let me know if you'd like to discuss how to complete this work. Clearing the review flag for now.
Attachment #721181 - Flags: review?(lucasr.at.mozilla)
Just a friendly ping to see how things are going in this bug? :)

Updated

4 years ago
Attachment #721181 - Attachment is obsolete: true

Updated

4 years ago
Blocks: 1102450
No longer blocks: 917884

Updated

4 years ago
Component: Reader Mode → Readability

Updated

4 years ago
Component: Readability → Reader Mode
Product: Firefox for Android → Toolkit
Out of scope for v1 work.
OS: Android → All
Priority: P2 → P3
Whiteboard: [readinglist-v2]

Updated

2 years ago
Whiteboard: [readinglist-v2] → [reader-mode-firefox-integration]
You need to log in before you can comment on or make changes to this bug.