Closed Bug 784386 Opened 12 years ago Closed 12 years ago

Big memory spikes on readability check

Categories

(Firefox for Android Graveyard :: Reader View, defect, P1)

All / Android

Tracking

(firefox16 fixed, firefox17 fixed)

RESOLVED FIXED
Firefox 18

People

(Reporter: lucasr, Assigned: bnicholson)

References

Details

Attachments

(12 files, 7 obsolete files)

4.84 KB, patch
lucasr: review+
1.43 KB, patch
lucasr: review+
1.51 KB, patch
lucasr: review+
1.62 KB, patch
lucasr: review+
3.23 KB, patch
lucasr: review+
2.03 KB, patch
lucasr: review+
2.34 KB, patch
lucasr: review+
3.82 KB, patch
lucasr: review+
1.71 KB, patch
lucasr: review+
3.88 KB, patch
lucasr: review+
1.09 KB, patch
lucasr: review+
20.23 KB, patch
Some users have reported big memory spikes while Firefox is running the readability check on pages. We need to gather data on this issue and figure out how to fix it.
Priority: -- → P1
Assigning to Brian as he's working on it.
Assignee: nobody → bnicholson
We're building lots of huge string objects as the strings get incrementally larger with each concatenation. For example, concatenating "aaa" + "bbb" + "ccc" would look like this:

str = "aaa"  // strings in heap: "aaa"
str += "bbb" // strings in heap: "aaa", "aaabbb"
str += "ccc" // strings in heap: "aaa", "aaabbb", "aaabbbccc"

whereas with this patch, str is created in one go:

str = ["aaa", "bbb", "ccc"].join("") // strings in heap: "aaabbbccc" 

In practice, there are many, many more strings, and the resulting strings will be much larger (the size of the document), so we should avoid creating tons of huge intermediate strings.
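
For instance, a helper that serializes a list of nodes would look roughly like this (illustrative sketch only, not the exact Readability.js code):

  function serializeText(nodes) {
    let parts = [];
    for (let i = 0; i < nodes.length; i++)
      parts[parts.length] = nodes[i].textContent;

    // One final string allocation instead of one per concatenation.
    return parts.join("");
  }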
Attachment #656713 - Flags: review?(lucasr.at.mozilla)
Attachment #656714 - Flags: review?(lucasr.at.mozilla)
Attachment #656716 - Flags: review?(lucasr.at.mozilla)
Attachment #656717 - Flags: review?(lucasr.at.mozilla)
Attachment #656718 - Flags: review?(lucasr.at.mozilla)
Attachment #656720 - Flags: review?(lucasr.at.mozilla)
There's still more to come, so I'll keep adding patches as they get done. Sorry for so many files, but I figured it would be easier to review each change separately than as a single monolithic patch.
Moved _setNodeTag() from patch 4 to this one.
Attachment #656714 - Attachment is obsolete: true
Attachment #656714 - Flags: review?(lucasr.at.mozilla)
Attachment #656722 - Flags: review?(lucasr.at.mozilla)
Moved _setNodeTag() to patch 2.
Attachment #656716 - Attachment is obsolete: true
Attachment #656716 - Flags: review?(lucasr.at.mozilla)
Attachment #656723 - Flags: review?(lucasr.at.mozilla)
Attachment #656723 - Attachment description: Part 4: Replace innerHTML call for <div> to <p> conversion → Part 4: Replace innerHTML call for <div> to <p> conversion, v2
Fixed return value for hasDivToPElement().
Attachment #656720 - Attachment is obsolete: true
Attachment #656720 - Flags: review?(lucasr.at.mozilla)
Attachment #656726 - Flags: review?(lucasr.at.mozilla)
What is the effect on memory usage with these patches?
Comment on attachment 656713 [details] [diff] [review]
Part 1: Replace string concatenations with Array.join()

Review of attachment 656713 [details] [diff] [review]:
-----------------------------------------------------------------

Please add a comment explaining why the use of Array.join() is relevant here, for future reference.
Attachment #656713 - Flags: review?(lucasr.at.mozilla) → review+
Attachment #656715 - Flags: review?(lucasr.at.mozilla) → review+
Comment on attachment 656717 [details] [diff] [review]
Part 5: Replace innerHTML call for topCandidate

Review of attachment 656717 [details] [diff] [review]:
-----------------------------------------------------------------

This patch is fine but I wonder if you could simply do "topCandidate = page" if topCandidate is null. We don't care about the 'page' variable after topCandidate is set and we simply reset it (page.innerHTML = pageCacheHtml) if we need another loop iteration.
Attachment #656717 - Flags: review?(lucasr.at.mozilla) → review+
Attachment #656722 - Flags: review?(lucasr.at.mozilla) → review+
Comment on attachment 656723 [details] [diff] [review]
Part 4: Replace innerHTML call for <div> to <p> conversion, v2

Review of attachment 656723 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to see a new version of this patch with the suggested fixes.

::: mobile/android/chrome/content/Readability.js
@@ +490,5 @@
>            // algorithm with DIVs with are, in practice, paragraphs.
>            let pIndex = this._getSinglePIndexInsideDiv(node);
>  
>            if (node.innerHTML.search(this.REGEXPS.divToPElements) === -1 || pIndex >= 0) {
>              let newNode;

No need for this variable anymore, right?

@@ +499,2 @@
>              } else {
> +              newNode = node;

You can just use "node" directly here, no need to use newNode.

@@ +499,3 @@
>              } else {
> +              newNode = node;
> +              this._setNodeTag(newNode, "P");

I'd prefer if you keep the "nodesToScore[nodesToScore.length] = node;" here instead of decrementing nodeIndex (which is much less obvious).
Attachment #656723 - Flags: review?(lucasr.at.mozilla) → review-
Comment on attachment 656718 [details] [diff] [review]
Part 6: Replace innerHTML call for putting sibling in <div>

Review of attachment 656718 [details] [diff] [review]:
-----------------------------------------------------------------

Just want the question answered before giving a r+.

::: mobile/android/chrome/content/Readability.js
@@ +666,5 @@
>  
>          if (append) {
>            this.log("Appending node: " + siblingNode);
>  
> +          let nodeToAppend = siblingNode;

No real need to declare this nodeToAppend variable, right?

@@ +668,5 @@
>            this.log("Appending node: " + siblingNode);
>  
> +          let nodeToAppend = siblingNode;
> +          s -= 1;
> +          sl -= 1;

Not caused by your patch but I think this decrement only makes sense if we were using a live node list, no? For instance, where do we remove the appended siblingNode from the siblingNodes array? What am I missing?
Attachment #656718 - Flags: review?(lucasr.at.mozilla) → review-
Comment on attachment 656719 [details] [diff] [review]
Part 7: Replace innerHTML call for readability-page-1 <div>

Review of attachment 656719 [details] [diff] [review]:
-----------------------------------------------------------------

This could probably just be commented out for now as we don't really support multi-page articles yet.

::: mobile/android/chrome/content/Readability.js
@@ +691,2 @@
>  
>        yield { msg: "grabArticle 7" };

I'm assuming you'll remove these yield calls before pushing btw :-)

@@ +695,5 @@
>        this._prepArticle(articleContent);
>  
>        yield { msg: "grabArticle 8" };
>  
> +      if (this._curPageNum === 1) {

This could probably just be commented out for now by the way.

@@ +698,5 @@
>  
> +      if (this._curPageNum === 1) {
> +        let div = doc.createElement("DIV");
> +        div.id = "readability-page-1";
> +        div.className = "page";

nit: add empty line here.
Attachment #656719 - Flags: review?(lucasr.at.mozilla) → review+
Comment on attachment 656726 [details] [diff] [review]
Part 8: Replace innerHTML call for divToPElements search, v2

Review of attachment 656726 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to see a cleaned up version of this patch before giving r+.

::: mobile/android/chrome/content/Readability.js
@@ +412,5 @@
>     * @param page a document to run upon. Needs to be a full document, complete with body.
>     * @return Element
>    **/
>    _grabArticle: function (page) {
> +    function hasDivToPElement(node) {

Make this a top level method just like _getSinglePIndexInsideDiv().

@@ +417,5 @@
> +      let length = node.childNodes.length;
> +      for (let i = 0; i < length; i++) {
> +        let child = node.childNodes[i];
> +        if (child.nodeType != 1)
> +          continue;

nit: add empty line here.

@@ -489,5 @@
>            // safely converted into plain P elements to avoid confusing the scoring
>            // algorithm with DIVs with are, in practice, paragraphs.
>            let pIndex = this._getSinglePIndexInsideDiv(node);
>  
> -          if (node.innerHTML.search(this.REGEXPS.divToPElements) === -1 || pIndex >= 0) {

I guess you can remove the divToPElements regex now?
Attachment #656726 - Flags: review?(lucasr.at.mozilla) → review-
Good stuff, Brian! I'd be interested to know if you have any (rough) estimates of how much these patches impact performance. Does the readability check take longer or does it get faster? Also, same question as mfinkle: what kind of memory usage improvements are we talking about here?

These patches are fairly risky. We'll probably need to let them bake for a long enough period in Nightly before uplifting.
Could you please run the reader test app with this file and confirm we're not regressing?

  http://dl.dropbox.com/u/1187037/sites.json

Just a list of sites I've been testing on. Feel free to merge these into the git repo.
(In reply to Mark Finkle (:mfinkle) from comment #14)
> What is the effect on memory usage with these patches?

This isn't done yet, but here's some before/after numbers I saw with these patches:

Before:
non-function objects: 106.8MB
strings: 95.5MB
string-chars: 20.7MB
gc-heap-chunk-dirty-unused: 46.1MB

After:
non-function objects: 45.6MB
strings: 28.4MB
string-chars: 15MB
gc-heap-chunk-dirty-unused: 27.7MB

These represent the maximum memory usage I observed for each category during a parse. Overall, this appears to drop the maximum memory usage by about 50%. As an added bonus, removing these innerHTML calls considerably improves parsing speed (I'll give more numbers once all patches are done).
(In reply to Lucas Rocha (:lucasr) from comment #22)
> Could you please run the reader test app with this file and confirm we're
> not regressing?
> 
>   http://dl.dropbox.com/u/1187037/sites.json
> 
> Just a list of sites I've been testing on. Feel free to merge these into the
> git repo.

All these sites (and the ones in the repo) pass the same before and after these patches.

I'd actually like to think these patches are fairly low risk - it's relatively straightforward to replace these innerHTML calls with their non-innerHTML equivalents, and in several cases, the process is the same. And we have a pretty thorough test suite to make sure we aren't regressing along the way.
(In reply to Lucas Rocha (:lucasr) from comment #18)
> Comment on attachment 656718 [details] [diff] [review]
> Part 6: Replace innerHTML call for putting sibling in <div>
> 
> Review of attachment 656718 [details] [diff] [review]:
> -----------------------------------------------------------------
> @@ +668,5 @@
> >            this.log("Appending node: " + siblingNode);
> >  
> > +          let nodeToAppend = siblingNode;
> > +          s -= 1;
> > +          sl -= 1;
> 
> Not caused by your patch but I think this decrement only makes sense if we
> were using a live node list, no? For instance, where do we remove the
> appended siblingNode from the siblingNodes array? What am I missing?

appendChild() calls removeChild(), which does this:

  childNodes.splice(childIndex, 1)[0];

siblingNodes is just a reference to the childNodes array.
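
A self-contained illustration of why the index/length decrement is needed (browser DOM with hypothetical markup; in the worker, siblingNodes aliases the parent's childNodes array in the same way):

  let parent = document.createElement("div");
  parent.innerHTML = "<p>a</p><p>b</p><p>c</p>";
  let articleContent = document.createElement("div");
  let siblingNodes = parent.childNodes; // shrinks as children are moved away

  for (let s = 0, sl = siblingNodes.length; s < sl; s += 1) {
    let siblingNode = siblingNodes[s];
    let append = true; // in Readability.js this is decided by scoring
    if (append) {
      articleContent.appendChild(siblingNode); // removes it from siblingNodes
      s -= 1;  // the next sibling has slid into the current slot
      sl -= 1; // the list is one element shorter
    }
  }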
I changed the textContent getter to use a single array for the entire operation. In the previous patch, each level of recursion resulted in an Array.join(), which was less efficient and resulted in more strings on the heap.
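
A rough sketch of what the single-array version looks like (hypothetical helper shown for illustration; the actual getter lives in JSDOMParser):

  function getTextContent(node) {
    let parts = [];

    (function collect(n) {
      for (let i = 0; i < n.childNodes.length; i++) {
        let child = n.childNodes[i];
        if (child.nodeType === 3)              // Text node
          parts[parts.length] = child.nodeValue;
        else
          collect(child);                      // recurse, but don't join per level
      }
    })(node);

    return parts.join("");                     // single join for the whole subtree
  }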
Attachment #656713 - Attachment is obsolete: true
Attachment #657162 - Flags: review?(lucasr.at.mozilla)
Attachment #656723 - Attachment is obsolete: true
Attachment #657166 - Flags: review?(lucasr.at.mozilla)
Attachment #656718 - Attachment is obsolete: true
Attachment #657168 - Flags: review?(lucasr.at.mozilla)
Attachment #656726 - Attachment is obsolete: true
Attachment #657169 - Flags: review?(lucasr.at.mozilla)
This avoids getting the innerHTML with each loop iteration (which should always give the same thing).
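
Roughly, the change hoists the serialization out of the retry loop (sketch with a hypothetical loop skeleton; pageCacheHtml is the same variable already used to reset the page between iterations):

  let page = document.body;
  let pageCacheHtml = page.innerHTML;  // serialize once, before the loop
  for (let attempt = 0; attempt < 2; attempt++) {
    // ... scoring work that may strip nodes out of `page` ...
    page.innerHTML = pageCacheHtml;    // restore from the cached copy for a retry
  }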
Attachment #657170 - Flags: review?(lucasr.at.mozilla)
killBreaks() is a pretty expensive operation, and I'm not sure we even need it. It appears to do two things:
1) Removes whitespace after <br>s, and
2) Removes multiple adjacent <br>s (optionally separated by whitespace)

#1 shouldn't really affect the output - there's extra space all over HTML that doesn't cause any harm. #2 is, I think, already done by replaceBrs(). Looking through all sites in the reader test, I haven't noticed any differences in output with this patch applied, so let me know if you notice anything.
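
For reference, the two operations correspond roughly to regex replacements like these (illustrative only; the exact expressions used by killBreaks() may differ):

  let html = "text<br>   <br>\n<br>more text";
  // 1) drop whitespace that follows a <br>
  html = html.replace(/(<br[^>]*>)\s+/g, "$1");
  // 2) collapse runs of adjacent <br>s (optionally separated by whitespace)
  html = html.replace(/(<br[^>]*>\s*){2,}/g, "<br />");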

Also, this change somehow triggered the following:

      let contentScore = (typeof tagsList[i].readability !== 'undefined') ? tagsList[i].this._contentScore : 0;

where tagsList[i].readability is true somewhere. This gave an error that "tagsList[i].this" was undefined. Looking at this code, neither the "this" property nor the "_contentScore" property seem to be set anywhere, so I removed them.
Attachment #657174 - Flags: review?(lucasr.at.mozilla)
Setting innerHTML is more expensive than setting textContent since it tries to parse the string. We might as well just use textContent here since the text is just coming from a TEXT_NODE.
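
A minimal illustration of the substitution (not the actual patch; element and textNode here are stand-ins for the nodes in question):

  let element = document.createElement("p");
  let textNode = document.createTextNode("some article text");
  // Before: element.innerHTML = textNode.nodeValue;  // forces an HTML parse
  element.textContent = textNode.nodeValue;           // plain text, no parsing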
Attachment #657175 - Flags: review?(lucasr.at.mozilla)
Thanks for reviewing all these. This should pretty much cover all of the obvious changes I could find; it dropped the max memory usage from one test site (http://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji) from 170MB to 70MB. Obviously, 70MB is still too much, so we should also have a check that makes sure the page is a reasonable size before we even try to parse it (e.g., do an element count via getElementsByTagName()).

I'll post more numbers tomorrow morning regarding the memory/performance gains from these patches. Here's a build if you'd like to test it: http://dl.dropbox.com/u/35559547/fennec-worker-mem.apk
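
Such a guard could be as simple as the sketch below (hypothetical helper and threshold, not something currently in Readability.js):

  const MAX_ELEMS_TO_PARSE = 20000; // arbitrary cutoff, would need tuning
  function isProbablyTooBig(doc) {
    return doc.getElementsByTagName("*").length > MAX_ELEMS_TO_PARSE;
  }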
Attachment #657162 - Flags: review?(lucasr.at.mozilla) → review+
Attachment #657166 - Flags: review?(lucasr.at.mozilla) → review+
Attachment #657168 - Flags: review?(lucasr.at.mozilla) → review+
Attachment #657169 - Flags: review?(lucasr.at.mozilla) → review+
Comment on attachment 657170 [details] [diff] [review]
Part 9: Move pageCacheHtml outside of loop

Review of attachment 657170 [details] [diff] [review]:
-----------------------------------------------------------------

Nice catch.
Attachment #657170 - Flags: review?(lucasr.at.mozilla) → review+
Comment on attachment 657174 [details] [diff] [review]
Part 10: Remove killBreaks()

Review of attachment 657174 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense.
Attachment #657174 - Flags: review?(lucasr.at.mozilla) → review+
Attachment #657175 - Flags: review?(lucasr.at.mozilla) → review+
(In reply to Brian Nicholson (:bnicholson) from comment #33)
> Thanks for reviewing all these. This should pretty much cover all of the
> obvious changes I could find; it dropped the max memory usage from one test
> site (http://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji) from 170MB
> to 70MB. Obviously, 70MB is still too much, so we should also have a check
> that makes sure the page is a reasonable size before we even try to parse it
> (e.g., do an element count via getElementsByTagName()).

What do you mean exactly by these 170MB and 70MB? Memory used only by the web worker? Global memory usage increase? 70MB still feels insanely high for just a web parsing worker. But I guess this is for huge pages like the Wikipedia article you mentioned? What's the memory usage for things like nytimes.com articles? Just wondering.

Just thinking out loud here: maybe a safer way to check this is to define max number of children on elements like tables and lists as this is likely where we can detect if the page is really about just a huge list of "things", which we don't care about in Reader Mode.

It would be nice if we could simply discard (while parsing) stuff from the page that we won't use anyway: style attributes and tags, script tags, link tags in head, comments, form element tags, etc.
(In reply to Lucas Rocha (:lucasr) from comment #36)
> What do you mean exactly by these 170MB and 70MB? Memory used only by the
> web worker? Global memory usage increase? 70MB still feels insanely high for
> just a web parsing worker. But I guess this is for huge pages like the
> Wikipedia article you mentioned? What's the memory usage for things like
> nytimes.com articles? Just wondering.

This is just for the reader worker. These are the numbers reported by the memory reporter (I see similar numbers if I go to about:memory and refresh while the worker is running). I agree the numbers feel high, though I'm not familiar enough with JS memory analysis to know what to expect.

Some things I've noticed:
* The worker itself consumes ~9MB (before the JSDOMParser is even created)
* On that page, there are 80518 total nodes (including Element nodes, Text nodes, etc.)
* Once the JSDOMParser has run, the memory usage jumps up to 36MB:
  * ~17MB is from non-function js objects
  * ~8MB from strings/string headers
* When the worker hits 76MB at the end (right before returning):
  * ~25MB from non-function js objects
  * ~30MB from strings
* Right before the innerHTML call at the end (which is used to serialize the entire document so the worker can pass it back), memory usage is at 66MB. That means the final innerHTML call adds 10MB.

> Just thinking out loud here: maybe a safer way to check this is to define
> max number of children on elements like tables and lists as this is likely
> where we can detect if the page is really about just a huge list of
> "things", which we don't care about in Reader Mode.

Yeah, we can tweak the algorithm to be more accurate if a simple element count is too exclusive. Thankfully, the reader test app we're using should make this easy to detect.

> It would be nice if we could simply discard (while parsing) stuff from the
> page that we won't use anyway: style attributes and tags, script tags, link
> tags in head, comments, form element tags, etc.

We should definitely at least clean these elements out once the parse is done (bug 782344); I'd be concerned about removing them before then because it could affect the algorithm.
(In reply to Brian Nicholson (:bnicholson) from comment #2)
> Created attachment 656713 [details] [diff] [review]
> Part 1: Replace string concatenations with Array.join()
> 
> We're building lots of huge string objects as the strings get incrementally
> larger with each concatenation. For example, concatenating "aaa" + "bbb" +
> "ccc" would look like this:
> 
> str = "aaa"  // strings in heap: "aaa"
> str += "bbb" // strings in heap: "aaa", "aaabbb"
> str += "ccc" // strings in heap: "aaa", "aaabbb", "aaabbbccc"
> 
> whereas with this patch, str is created in one go:
> 
> str = ["aaa", "bbb", "ccc"].join("") // strings in heap: "aaabbbccc" 
> 
> In practice, there are many, many more strings, and the resulting strings
> will be much larger (the size of the document), so we should avoid creating
> tons of huge intermediate strings.

Just so you know, the JS engine actually does the assignment lazily: http://blog.cdleary.com/2012/01/string-representation-in-spidermonkey/#ropes
(In reply to :Ms2ger from comment #39)
> Just so you know, the JS engine actually does the assignment lazily:
> http://blog.cdleary.com/2012/01/string-representation-in-spidermonkey/#ropes

Thanks for the correction. I assumed the implementation in comment 2 because I noticed performance/memory improvements when switching from string concatenation to Array.join(). But considering how common an operation string concatenation is, I should have known that we wouldn't use the naive implementation I described (and it would have resulted in much higher memory consumption than I was observing).

But as I said, I did see a noticeable improvement when switching from concatenation to Array.join(). Running the JSDOMParser/Readability scripts on a PC for http://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji, I noticed a ~100ms decrease in parse time and ~5MB less in memory usage. Is this just overhead from building/traversing the rope?
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 779796
User impact if declined: reader parsing will consume more memory, which can close background apps, slow things down, potentially crash the browser, etc.
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): Medium risk of affecting reader mode output. This changes lots of code in Readability.js, so there's a greater chance of breaking something. But these patches have been tested against our reader mode test suite, which runs the code on 50+ pages, to make sure nothing has changed.
String or UUID changes made by this patch: none
Attachment #658159 - Flags: approval-mozilla-beta?
Attachment #658159 - Flags: approval-mozilla-aurora?
Comment on attachment 658159 [details] [diff] [review]
roll-up patch for aurora and beta

With 4 more betas left in this cycle we should be able to shake out any major bugs in this new feature, so approving for branches.
Attachment #658159 - Flags: approval-mozilla-beta?
Attachment #658159 - Flags: approval-mozilla-beta+
Attachment #658159 - Flags: approval-mozilla-aurora?
Attachment #658159 - Flags: approval-mozilla-aurora+
> But as I said, I did see a noticeable improvement when switching from
> concatenation to Array.join(). Running the JSDOMParser/Readability scripts
> on a PC for http://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji, I
> noticed a ~100ms decrease in parse time and ~5MB less in memory usage. Is
> this just overhead from building/traversing the rope?

Yes.  With ropes you end up building a binary tree.  The internal nodes are Rope nodes and the leaf nodes contain actual strings.  With Array.join you avoid creating all those internal nodes -- you instead have an array element per string but an array element is smaller than a Rope node.
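
A quick way to see the difference on desktop is a micro-benchmark along these lines (illustrative; absolute numbers will vary by engine and machine):

  let pieces = [];
  for (let i = 0; i < 100000; i++)
    pieces[i] = "chunk" + i;

  let t0 = Date.now();
  let viaConcat = "";
  for (let i = 0; i < pieces.length; i++)
    viaConcat += pieces[i];            // builds a rope with internal nodes
  let t1 = Date.now();

  let viaJoin = pieces.join("");       // one flat string, no rope nodes
  let t2 = Date.now();

  console.log("concat: " + (t1 - t0) + "ms, join: " + (t2 - t1) + "ms");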
Product: Firefox for Android → Firefox for Android Graveyard