Closed Bug 1152219 Opened 4 years ago Closed 4 years ago

Raise (remove?) the MAX_ELEMS_TO_PARSE limit

Categories

(Toolkit :: Reader Mode, defect)

38 Branch
x86
macOS
defect
Not set
Points:
1

Tracking

()

RESOLVED FIXED
mozilla40
Iteration:
40.2 - 27 Apr
Tracking Status
firefox38 --- fixed
firefox39 --- fixed
firefox40 --- fixed

People

(Reporter: NiKo, Assigned: Gijs)

Details

Attachments

(3 files)

Trying to render a readable version of https://wiki.debian.org/WhyTheName, we get a failure, reader page displays "Failed to load article from page", while debug log says:

> Aborting parse for https://wiki.debian.org/WhyTheName; 6593 elements found

The current limit set for MAX_ELEMS_TO_PARSE is 3000 elements, which seems to me to be an highly empirically defined number (though I might be wrong; any input appreciated here).

We could either:

- increase that number (question is then how to determine it accurately)
- remove the limit entirely (question is then how this could possibly impact UX)
- improve the error message (we should do this *anyway* imho)

Thoughts?
If we do this, we'll have to be careful about memory use/performance on mobile. Perhaps we can set this as a pref, and have a different value for desktop/mobile (or also change it depending on the available memory on mobile).

However, I would definitely want to fix bug 1150174 before increasing this value at all on mobile.
Component: Reading List → Reader Mode
Product: Firefox → Toolkit
(In reply to :Margaret Leibovic from comment #1)
> Perhaps we can set this as a pref

Sounds good. We could add a third `options` argument to the Readability() constructor:

> new Readability(uri, document, {
>   debug: true,
>   maxElemsToParse: 5000,
>   // etc.
> });

I've just filed https://github.com/mozilla/readability/issues/128 about this.
Woops, just realized that MAX_ELEMS_TO_PARSE is chrome only, not Readability. Browser pref it is, then.
Actually, we could totally delegate the task of counting the number of nodes and aborting the parsing process to Readability itself, that wouldn't change chrome-side code too much while bringing the ability to add unit tests to Readability. Thoughts?
PR https://github.com/mozilla/readability/pull/129 is an early attempt at doing just that, feedback welcome.
Comment on attachment 8590180 [details] [diff] [review]
Part 1 - Update Readability and JSDOMParser to latest version.

Review of attachment 8590180 [details] [diff] [review]:
-----------------------------------------------------------------

Gijs and I have just been updating Readability.js/JSDOMParser.js directly in fx-team without bugs/patches. This seems fine to me, but I think we should probably keep the warnings at the tops of the files until we come up with a different solution for making sure people don't try to patch these files directly.

::: toolkit/components/reader/JSDOMParser.js
@@ -2,5 @@
> - * DO NOT MODIFY THIS FILE DIRECTLY!
> - *
> - * This is a shared library that is maintained in an external repo:
> - * https://github.com/mozilla/readability
> - */

Gijs and I have been maintaining these headers, but it does make it annoying to update these files... maybe we should just get rid of them and count on reviewers knowing not to update these files directly? Or maybe we should move these to some /thirdparty directory.
Attachment #8590180 - Flags: review?(margaret.leibovic) → review+
Comment on attachment 8590182 [details] [diff] [review]
Part 2 - Use the new maxNbElemsToParse option from Readability.

Review of attachment 8590182 [details] [diff] [review]:
-----------------------------------------------------------------

r- because this has the potential to regress page load performance on Android.

::: toolkit/components/reader/ReaderMode.jsm
@@ -230,5 @@
> -    let numTags = doc.getElementsByTagName("*").length;
> -    if (numTags > this.MAX_ELEMS_TO_PARSE) {
> -      this.log("Aborting parse for " + uri.spec + "; " + numTags + " elements found");
> -      return null;
> -    }

Once nice thing about having this check here is that we avoid doing the DOM serialization/parsing in this case as well.

As this function is currently run on every page load on Fennec, I'd be worried about adding additional computational complexity right now. I think we should wait on bug 1150174 before moving forward with this approach. If we want to go ahead with this behavior for desktop, we could create a pref in all.js/firefox.js that changes this.
Attachment #8590182 - Flags: review?(margaret.leibovic) → review-
Comment on attachment 8590180 [details] [diff] [review]
Part 1 - Update Readability and JSDOMParser to latest version.

I just went for it and updated these files to match the lastest readability repo master:
https://hg.mozilla.org/integration/fx-team/rev/e4242ba4cc35
I'd like to land this and uplift ASAP because it's impacting testing of real sites (e.g. long wikipedia pages) and so it'd be helpful to have that part of the noise removed (at least on desktop). We can easily turn the pref to 0 for mobile as well if/when it's ready, even quite late - at least like this we have something in place for desktop. I'd like to leave at least the facility to change this in here for the initial release; if it turns out we really don't ever need it anymore, we can remove it post 38.1. Sound like a plan?
Attachment #8593408 - Flags: review?(margaret.leibovic)
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Iteration: --- → 40.2 - 27 Apr
Flags: qe-verify?
Flags: firefox-backlog+
Comment on attachment 8593408 [details] [diff] [review]
make reader mode node limit a pref, turn off entirely for desktop because of isProbablyReaderable,

Review of attachment 8593408 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, thanks.
Attachment #8593408 - Flags: review?(margaret.leibovic) → review+
Hi Gijs, can you provide a point value.
Flags: needinfo?(gijskruitbosch+bugs)
Points: --- → 1
Flags: qe-verify?
Flags: qe-verify-
Flags: needinfo?(gijskruitbosch+bugs)
https://hg.mozilla.org/mozilla-central/rev/1b47d308e51c
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Comment on attachment 8593408 [details] [diff] [review]
make reader mode node limit a pref, turn off entirely for desktop because of isProbablyReaderable,

Approval Request Comment
[Feature/regressing bug #]: reader view - this causes e.g. bug 1149418, and probably others
[User impact if declined]: we offer reader view but it is broken
[Describe test coverage new/current, TreeHerder]: limited on m-c infra, sadly
[Risks and why]: low, this basically converts a code constant into a pref, and sets the pref to 0 for desktop, causing there to no longer be a limit to the number of nodes we process.
[String/UUID change made/needed]: none
Attachment #8593408 - Flags: approval-mozilla-beta?
Attachment #8593408 - Flags: approval-mozilla-aurora?
Comment on attachment 8593408 [details] [diff] [review]
make reader mode node limit a pref, turn off entirely for desktop because of isProbablyReaderable,

Should be in 38 beta 6.
Attachment #8593408 - Flags: approval-mozilla-beta?
Attachment #8593408 - Flags: approval-mozilla-beta+
Attachment #8593408 - Flags: approval-mozilla-aurora?
Attachment #8593408 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.