Last Comment Bug 765930 - Reader Mode: Optimize readability check
: Reader Mode: Optimize readability check
Status: RESOLVED FIXED
[sps]
: perf
Product: Firefox for Android
Classification: Client Software
Component: General (show other bugs)
: Trunk
: All Android
: -- normal (vote)
: Firefox 16
Assigned To: Mark Finkle (:mfinkle) (use needinfo?)
:
: Sebastian Kaspari (:sebastian)
Mentors:
Depends on:
Blocks: reader check2
  Show dependency treegraph
 
Reported: 2012-06-18 15:35 PDT by Lucas Rocha (:lucasr)
Modified: 2016-07-29 14:25 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed


Attachments
Log while readability heuristics are running (85.26 KB, text/plain)
2012-07-03 08:54 PDT, away[Nov24,Dec5) Kartikaya Gupta (email:kats@mozilla.com)
no flags Details
part 1: skip DOM manipulations on check (2.21 KB, patch)
2012-07-12 07:14 PDT, Mark Finkle (:mfinkle) (use needinfo?)
bnicholson: review+
Details | Diff | Splinter Review
part 2: wait until after page load to perform check (2.29 KB, patch)
2012-07-12 07:16 PDT, Mark Finkle (:mfinkle) (use needinfo?)
bnicholson: review+
Details | Diff | Splinter Review
part 3: make browsertoolbar smarter about page actions (5.72 KB, patch)
2012-07-12 15:08 PDT, Mark Finkle (:mfinkle) (use needinfo?)
sriram.mozilla: review+
Details | Diff | Splinter Review

Description Lucas Rocha (:lucasr) 2012-06-18 15:35:05 PDT
The readability check introduced in bug 750683 involves cloning the DOM document to find the top candidate node with the article content inside the page. We need to optimize this readability check as much as possible and avoid the document node cloning altogether.

Ideas:
- Avoid removing any node from document to process the page
- Look for more steps in the parsing process that are unnecessary for the readability check (we already skip prepDocument and removeScripts, there might be more)
Comment 1 Peter Retzer (:pretzer) 2012-06-23 11:08:46 PDT
Does this include optimizing current false positive cases? For example I'm seeing the reader icon on Bugzilla bug lists, which doesn't really make sense. In-scope or new bug?
Comment 2 Lucas Rocha (:lucasr) 2012-06-25 02:38:38 PDT
(In reply to pretzer from comment #1)
> Does this include optimizing current false positive cases? For example I'm
> seeing the reader icon on Bugzilla bug lists, which doesn't really make
> sense. In-scope or new bug?

Please file a separate bug for false positives. The readability detection code definitely has to be improved. This bug is about the efficiency of the readability checking code.
Comment 3 Peter Retzer (:pretzer) 2012-06-25 06:11:41 PDT
Filed bug 767956
Comment 4 away[Nov24,Dec5) Kartikaya Gupta (email:kats@mozilla.com) 2012-07-03 08:54:42 PDT
Created attachment 638754 [details]
Log while readability heuristics are running

+1 on optimizing the readability check. If you load a page like https://crash-stats.mozilla.com/report/index/bp-2b7c0808-37df-40f0-a4d2-a3ed42120703 and then immediately zoom in, the page remains blurry for ~30 seconds before it repaints (on a Galaxy Tab 10.1). Logcat (attached) indicates that it is running the readability detection heuristics during this time. In the log I did the pinch at around 11:47:16 and the page didn't render sharply until around 11:47:48 which is a ridiculously long time.
Comment 5 Benoit Girard (:BenWa) 2012-07-11 13:46:15 PDT
This script is showing up as a major source of main thread contention on mobile. Readability.js blocks for 1.7 seconds on cnn.com page load.

Profile:
http://people.mozilla.com/~bgirard/cleopatra/?report=AMIfv94gvUJfBU9O1bx1V8cewSf-Qz01_PLNizR0f4SKW39Cp0AXMKsIMNB3fCW4zwYS7ZnGoxLPBBfyfYWi3AuZaua2WXG5bVBTB6_ZM5GP4ZFEv5BxH04d7oHGjKDgtceS1GPFS3x0KJfq4I6xxjL0afF0O9SkOQ

I think we need to escalate this.
Comment 6 Mark Finkle (:mfinkle) (use needinfo?) 2012-07-12 07:14:45 PDT
Created attachment 641447 [details] [diff] [review]
part 1: skip DOM manipulations on check

This patch tries to skip any DOM manipulations when performing a check. It also fixes a typo "doc" -> "node".

This part seems to reduce the time to check on nytimes by half
Comment 7 Mark Finkle (:mfinkle) (use needinfo?) 2012-07-12 07:16:49 PDT
Created attachment 641448 [details] [diff] [review]
part 2: wait until after page load to perform check

We are currently doing the reader check on "DOMContentLoaded" which is too early. The page scripts have not run and the network isn't even really finished yet. This makes it seem to take longer for the page to load.

This patch moves to "pageshow", which is also called when going back/forward in session history too.
Comment 8 Mark Finkle (:mfinkle) (use needinfo?) 2012-07-12 07:19:19 PDT
Part 2 introduces a bug. It seems that the Java BrowserToolbar code is designed in a way that assumes the reader check is done _before_ the network progress is complete. Therefore, the reader icon is not displayed unless you change the orientation and the toolbar is refreshed. Patch coming up...
Comment 9 Mark Finkle (:mfinkle) (use needinfo?) 2012-07-12 15:08:04 PDT
Created attachment 641620 [details] [diff] [review]
part 3: make browsertoolbar smarter about page actions

BrowserToolbar assumes that the security mode and reader mode will be set beofre the network progress finishes. That is not a good idea.

This patch:
* Renames setReaderVisiblilty to setReaderMode, since it does not actually force the icon to be visible.
* Renames setStopVisibility to setPageActionVisibility since it's about more than "stop" now.
* setReaderMode and setSecurityMode both now call setPageActionVisibility in case the icon needs to be shown now.

The only real hack I had was using mStop.getVisibility() as a param to setPageActionVisibility, but it works well enough for now.
Comment 10 Sriram Ramasubramanian [:sriram] 2012-07-12 15:21:01 PDT
Comment on attachment 641620 [details] [diff] [review]
part 3: make browsertoolbar smarter about page actions

Review of attachment 641620 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me.
Comment 11 Brian Nicholson (:bnicholson) 2012-07-13 01:08:47 PDT
Comment on attachment 641447 [details] [diff] [review]
part 1: skip DOM manipulations on check

Review of attachment 641447 [details] [diff] [review]:
-----------------------------------------------------------------

These changes look pretty harmless, and they didn't affect my readability test cases.

::: mobile/android/chrome/content/Readability.js
@@ -1300,5 @@
>  
>    check: function() {
> -    // Set proper flags for parsing document in readability check mode
> -    this._flags = this.FLAG_READABILITY_CHECK |
> -                  this.FLAG_STRIP_UNLIKELYS;

I wonder if this change might actually hurt performance. It avoids removing them, which should help, but that means we leave more nodes to be processed when parsing - especially if these nodes have lots of children.
Comment 12 Brian Nicholson (:bnicholson) 2012-07-13 01:18:52 PDT
Comment on attachment 641448 [details] [diff] [review]
part 2: wait until after page load to perform check

Review of attachment 641448 [details] [diff] [review]:
-----------------------------------------------------------------

Hopefully this will fix bug 767959.

Note You need to log in before you can comment on or make changes to this bug.