Last Comment Bug 784386 - Big memory spikes on readability check
: Big memory spikes on readability check
Status: RESOLVED FIXED
:
Product: Firefox for Android
Classification: Client Software
Component: Reader View (show other bugs)
: unspecified
: All Android
: P1 normal (vote)
: Firefox 18
Assigned To: Brian Nicholson (:bnicholson)
:
Mentors:
Depends on:
Blocks: reader
  Show dependency treegraph
 
Reported: 2012-08-21 09:34 PDT by Lucas Rocha (:lucasr)
Modified: 2012-09-04 18:26 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed
fixed


Attachments
Part 1: Replace string concatenations with Array.join() (2.84 KB, patch)
2012-08-29 19:15 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review+
Details | Diff | Splinter Review
Part 2: Replace innerHTML call for <font> to <span> conversion (1.08 KB, patch)
2012-08-29 19:17 PDT, Brian Nicholson (:bnicholson)
no flags Details | Diff | Splinter Review
Part 3: Replace innerHTML call for <br><p> to <p> conversion (4.84 KB, patch)
2012-08-29 19:18 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review+
Details | Diff | Splinter Review
Part 4: Replace innerHTML call for <div> to <p> conversion (2.31 KB, patch)
2012-08-29 19:18 PDT, Brian Nicholson (:bnicholson)
no flags Details | Diff | Splinter Review
Part 5: Replace innerHTML call for topCandidate (1.43 KB, patch)
2012-08-29 19:19 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review+
Details | Diff | Splinter Review
Part 6: Replace innerHTML call for putting sibling in <div> (1.69 KB, patch)
2012-08-29 19:19 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review-
Details | Diff | Splinter Review
Part 7: Replace innerHTML call for readability-page-1 <div> (1.51 KB, patch)
2012-08-29 19:20 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review+
Details | Diff | Splinter Review
Part 8: Replace innerHTML call for divToPElements search (2.53 KB, patch)
2012-08-29 19:21 PDT, Brian Nicholson (:bnicholson)
no flags Details | Diff | Splinter Review
Part 2: Replace innerHTML call for <font> to <span> conversion, v2 (1.62 KB, patch)
2012-08-29 19:35 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review+
Details | Diff | Splinter Review
Part 4: Replace innerHTML call for <div> to <p> conversion, v2 (1.78 KB, patch)
2012-08-29 19:36 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review-
Details | Diff | Splinter Review
Part 8: Replace innerHTML call for divToPElements search, v2 (2.52 KB, patch)
2012-08-29 19:43 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review-
Details | Diff | Splinter Review
Part 1: Replace string concatenations with Array.join(), v2 (3.23 KB, patch)
2012-08-30 23:00 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review+
Details | Diff | Splinter Review
Part 4: Replace innerHTML call for <div> to <p> conversion, v3 (2.03 KB, patch)
2012-08-30 23:04 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review+
Details | Diff | Splinter Review
Part 6: Replace innerHTML call for putting sibling in <div>, v2 (2.34 KB, patch)
2012-08-30 23:13 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review+
Details | Diff | Splinter Review
Part 8: Replace innerHTML call for divToPElements search, v3 (3.82 KB, patch)
2012-08-30 23:15 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review+
Details | Diff | Splinter Review
Part 9: Move pageCacheHtml outside of loop (1.71 KB, patch)
2012-08-30 23:17 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review+
Details | Diff | Splinter Review
Part 10: Remove killBreaks() (3.88 KB, patch)
2012-08-30 23:26 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review+
Details | Diff | Splinter Review
Part 11: Replace setting innerHTML with textContent for text (1.09 KB, patch)
2012-08-30 23:29 PDT, Brian Nicholson (:bnicholson)
lucasr.at.mozilla: review+
Details | Diff | Splinter Review
roll-up patch for aurora and beta (20.23 KB, patch)
2012-09-04 11:17 PDT, Brian Nicholson (:bnicholson)
lukasblakk+bugs: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Lucas Rocha (:lucasr) 2012-08-21 09:34:25 PDT
Some users have reported big memory spikes while firefox is running the readability check on pages. We have to gather data on this issue and figure out how to fix it.
Comment 1 Lucas Rocha (:lucasr) 2012-08-29 09:32:31 PDT
Assigning to Brian as he's working on it.
Comment 2 Brian Nicholson (:bnicholson) 2012-08-29 19:15:20 PDT
Created attachment 656713 [details] [diff] [review]
Part 1: Replace string concatenations with Array.join()

We're building lots of huge string objects as the strings get incrementally larger with each concatenation. For example, concatenating "aaa" + "bbb" + "ccc" would look like this:

str = "aaa"  // strings in heap: "aaa"
str += "bbb" // strings in heap: "aaa", "aaabbb"
str += "ccc" // strings in heap: "aaa", "aaabbb", "aaabbbccc"

whereas with this patch, str is created in one go:

str = ["aaa", "bbb", "ccc"].join("") // strings in heap: "aaabbbccc" 

In practice, there are many, many more strings, and the resulting strings will be much larger (the size of the document), so we should avoid creating tons of huge intermediate strings.
Comment 3 Brian Nicholson (:bnicholson) 2012-08-29 19:17:52 PDT
Created attachment 656714 [details] [diff] [review]
Part 2: Replace innerHTML call for <font> to <span> conversion
Comment 4 Brian Nicholson (:bnicholson) 2012-08-29 19:18:25 PDT
Created attachment 656715 [details] [diff] [review]
Part 3: Replace innerHTML call for <br><p> to <p> conversion
Comment 5 Brian Nicholson (:bnicholson) 2012-08-29 19:18:53 PDT
Created attachment 656716 [details] [diff] [review]
Part 4: Replace innerHTML call for <div> to <p> conversion
Comment 6 Brian Nicholson (:bnicholson) 2012-08-29 19:19:26 PDT
Created attachment 656717 [details] [diff] [review]
Part 5: Replace innerHTML call for topCandidate
Comment 7 Brian Nicholson (:bnicholson) 2012-08-29 19:19:50 PDT
Created attachment 656718 [details] [diff] [review]
Part 6: Replace innerHTML call for putting sibling in <div>
Comment 8 Brian Nicholson (:bnicholson) 2012-08-29 19:20:49 PDT
Created attachment 656719 [details] [diff] [review]
Part 7: Replace innerHTML call for readability-page-1 <div>
Comment 9 Brian Nicholson (:bnicholson) 2012-08-29 19:21:17 PDT
Created attachment 656720 [details] [diff] [review]
Part 8: Replace innerHTML call for divToPElements search
Comment 10 Brian Nicholson (:bnicholson) 2012-08-29 19:24:07 PDT
There's still more to come, so I'll keep adding patches as they get done. Sorry for so many files, but I figured they'd be easier to review each separate change than if they were all combined into a single monolithic patch.
Comment 11 Brian Nicholson (:bnicholson) 2012-08-29 19:35:44 PDT
Created attachment 656722 [details] [diff] [review]
Part 2: Replace innerHTML call for <font> to <span> conversion, v2

Moved _setNodeTag() from patch 4 to this one.
Comment 12 Brian Nicholson (:bnicholson) 2012-08-29 19:36:30 PDT
Created attachment 656723 [details] [diff] [review]
Part 4: Replace innerHTML call for <div> to <p> conversion, v2

Moved _setNodeTag() to patch 2.
Comment 13 Brian Nicholson (:bnicholson) 2012-08-29 19:43:05 PDT
Created attachment 656726 [details] [diff] [review]
Part 8: Replace innerHTML call for divToPElements search, v2

Fixed return value for hasDivToPElement().
Comment 14 Mark Finkle (:mfinkle) (use needinfo?) 2012-08-29 22:30:51 PDT
What is the affect on memory usage with these patches?
Comment 15 Lucas Rocha (:lucasr) 2012-08-30 02:52:36 PDT
Comment on attachment 656713 [details] [diff] [review]
Part 1: Replace string concatenations with Array.join()

Review of attachment 656713 [details] [diff] [review]:
-----------------------------------------------------------------

Please a comment explaining why the use of Array.join() is relevant here for future reference.
Comment 16 Lucas Rocha (:lucasr) 2012-08-30 03:06:22 PDT
Comment on attachment 656717 [details] [diff] [review]
Part 5: Replace innerHTML call for topCandidate

Review of attachment 656717 [details] [diff] [review]:
-----------------------------------------------------------------

This patch is fine but I wonder if you could simply do "topCandidate = page" if topCandidate is null. We don't care about the 'page' variable after topCandidate is set and we simply reset it (page.innerHTML = pageCacheHtml) if we need another loop iteration.
Comment 17 Lucas Rocha (:lucasr) 2012-08-30 03:17:58 PDT
Comment on attachment 656723 [details] [diff] [review]
Part 4: Replace innerHTML call for <div> to <p> conversion, v2

Review of attachment 656723 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to see a new version of this patch with suggested fixes.

::: mobile/android/chrome/content/Readability.js
@@ +490,5 @@
>            // algorithm with DIVs with are, in practice, paragraphs.
>            let pIndex = this._getSinglePIndexInsideDiv(node);
>  
>            if (node.innerHTML.search(this.REGEXPS.divToPElements) === -1 || pIndex >= 0) {
>              let newNode;

No need for this variable anymore, right?

@@ +499,2 @@
>              } else {
> +              newNode = node;

You can just use "node" directly here, no need to use newNode.

@@ +499,3 @@
>              } else {
> +              newNode = node;
> +              this._setNodeTag(newNode, "P");

I'd prefer if you keep the "nodesToScore[nodesToScore.length] = node;" here instead of decrementing nodeIndex (which much less obvious).
Comment 18 Lucas Rocha (:lucasr) 2012-08-30 03:28:04 PDT
Comment on attachment 656718 [details] [diff] [review]
Part 6: Replace innerHTML call for putting sibling in <div>

Review of attachment 656718 [details] [diff] [review]:
-----------------------------------------------------------------

Just want the question answered before giving a r+.

::: mobile/android/chrome/content/Readability.js
@@ +666,5 @@
>  
>          if (append) {
>            this.log("Appending node: " + siblingNode);
>  
> +          let nodeToAppend = siblingNode;

No real need to declatre this nodeToAppend variable, right?

@@ +668,5 @@
>            this.log("Appending node: " + siblingNode);
>  
> +          let nodeToAppend = siblingNode;
> +          s -= 1;
> +          sl -= 1;

Not caused by your patch but I think this decrement only makes sense if we were using a live node list, no? For instance, where do we remove the appended siblingNode fro the siblingNodes array? What am I missing?
Comment 19 Lucas Rocha (:lucasr) 2012-08-30 03:34:37 PDT
Comment on attachment 656719 [details] [diff] [review]
Part 7: Replace innerHTML call for readability-page-1 <div>

Review of attachment 656719 [details] [diff] [review]:
-----------------------------------------------------------------

This could probably just be commented out for now as we don't really support multi-page articles yet.

::: mobile/android/chrome/content/Readability.js
@@ +691,2 @@
>  
>        yield { msg: "grabArticle 7" };

I'm assuming you'll remove these yield calls before pushing btw :-)

@@ +695,5 @@
>        this._prepArticle(articleContent);
>  
>        yield { msg: "grabArticle 8" };
>  
> +      if (this._curPageNum === 1) {

This could probably just be commented out for now by the way.

@@ +698,5 @@
>  
> +      if (this._curPageNum === 1) {
> +        let div = doc.createElement("DIV");
> +        div.id = "readability-page-1";
> +        div.className = "page";

nit: add empty line here.
Comment 20 Lucas Rocha (:lucasr) 2012-08-30 03:56:01 PDT
Comment on attachment 656726 [details] [diff] [review]
Part 8: Replace innerHTML call for divToPElements search, v2

Review of attachment 656726 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to see a cleaned up version of this patch before giving r+.

::: mobile/android/chrome/content/Readability.js
@@ +412,5 @@
>     * @param page a document to run upon. Needs to be a full document, complete with body.
>     * @return Element
>    **/
>    _grabArticle: function (page) {
> +    function hasDivToPElement(node) {

Make this a top level method just like _getSinglePIndexInsideDiv().

@@ +417,5 @@
> +      let length = node.childNodes.length;
> +      for (let i = 0; i < length; i++) {
> +        let child = node.childNodes[i];
> +        if (child.nodeType != 1)
> +          continue;

nit: add empty line here.

@@ -489,5 @@
>            // safely converted into plain P elements to avoid confusing the scoring
>            // algorithm with DIVs with are, in practice, paragraphs.
>            let pIndex = this._getSinglePIndexInsideDiv(node);
>  
> -          if (node.innerHTML.search(this.REGEXPS.divToPElements) === -1 || pIndex >= 0) {

I guess you can remove the divToPElements regex now?
Comment 21 Lucas Rocha (:lucasr) 2012-08-30 04:00:06 PDT
Good stuff, Brian! I'd be interested to know if you have any (rough) estimates of how much these patches impact performance? Does the readability check take longer or does it get faster? Also, same question than mfinkle: what kind of memory usage improvements are we talking about here?

These patches a fairly risky. We'll probably need to let them bake for a long enough period in Nightly before uplifting.
Comment 22 Lucas Rocha (:lucasr) 2012-08-30 04:19:13 PDT
Could you please run the reader test app with this file and confirm we're not regressing?

  http://dl.dropbox.com/u/1187037/sites.json

Just a list of sites I've been testing on. Feel free to merge these into the git repo.
Comment 23 Brian Nicholson (:bnicholson) 2012-08-30 11:52:37 PDT
(In reply to Mark Finkle (:mfinkle) from comment #14)
> What is the affect on memory usage with these patches?

This isn't done yet, but here's some before/after numbers I saw with these patches:

Before:
non-function objects: 106.8MB
strings: 95.5MB
string-chars: 20.7MB
gc-heap-chunk-dirty-unused: 46.1

After:
non-function objects: 45.6MB
strings: 28.4MB
string-chars: 15MB
gc-heap-chunk-dirty-unused: 27.7

These represent the maximum memory usage I observed for each category during a parse. Overall, this appears to drop the maximum memory usage by about 50%. As an added bonus, removing these innerHTML calls considerably improves parsing speed (I'll give more numbers once all patches are done).
Comment 24 Brian Nicholson (:bnicholson) 2012-08-30 12:03:00 PDT
(In reply to Lucas Rocha (:lucasr) from comment #22)
> Could you please run the reader test app with this file and confirm we're
> not regressing?
> 
>   http://dl.dropbox.com/u/1187037/sites.json
> 
> Just a list of sites I've been testing on. Feel free to merge these into the
> git repo.

All these sites (and the ones in the repo) pass the same before/and after these patches.

I'd actually like to think these patches are fairly low risk - it's relatively straightforward to replace these innerHTML calls to the non-innerHTML equivalents, and in several cases, the process is the same. And we have a pretty thorough test suite to make sure we aren't regressing along the way.
Comment 25 Brian Nicholson (:bnicholson) 2012-08-30 21:52:50 PDT
(In reply to Lucas Rocha (:lucasr) from comment #18)
> Comment on attachment 656718 [details] [diff] [review]
> Part 6: Replace innerHTML call for putting sibling in <div>
> 
> Review of attachment 656718 [details] [diff] [review]:
> -----------------------------------------------------------------
> @@ +668,5 @@
> >            this.log("Appending node: " + siblingNode);
> >  
> > +          let nodeToAppend = siblingNode;
> > +          s -= 1;
> > +          sl -= 1;
> 
> Not caused by your patch but I think this decrement only makes sense if we
> were using a live node list, no? For instance, where do we remove the
> appended siblingNode fro the siblingNodes array? What am I missing?

appendChild() calls removeChild(), which does this:

  childNodes.splice(childIndex, 1)[0];

siblingNodes is just a reference to the childNodes array.
Comment 26 Brian Nicholson (:bnicholson) 2012-08-30 23:00:01 PDT
Created attachment 657162 [details] [diff] [review]
Part 1: Replace string concatenations with Array.join(), v2

I changed the textContent getter to use a single array for the entire operation. In the previous patch, each level of recursion resulted in an Array.join(), which was less efficient and resulted in more strings on the heap.
Comment 27 Brian Nicholson (:bnicholson) 2012-08-30 23:04:34 PDT
Created attachment 657166 [details] [diff] [review]
Part 4: Replace innerHTML call for <div> to <p> conversion, v3
Comment 28 Brian Nicholson (:bnicholson) 2012-08-30 23:13:49 PDT
Created attachment 657168 [details] [diff] [review]
Part 6: Replace innerHTML call for putting sibling in <div>, v2
Comment 29 Brian Nicholson (:bnicholson) 2012-08-30 23:15:18 PDT
Created attachment 657169 [details] [diff] [review]
Part 8: Replace innerHTML call for divToPElements search, v3
Comment 30 Brian Nicholson (:bnicholson) 2012-08-30 23:17:50 PDT
Created attachment 657170 [details] [diff] [review]
Part 9: Move pageCacheHtml outside of loop

This avoids getting the innerHTML with each loop iteration (which should always give the same thing).
Comment 31 Brian Nicholson (:bnicholson) 2012-08-30 23:26:50 PDT
Created attachment 657174 [details] [diff] [review]
Part 10: Remove killBreaks()

killBreaks() is a pretty expensive operation, and I'm not sure we even need it. It appears to do two things:
1) Removes whitespace after <brs>, and
2) Removes multiple adjacent <br>s (optionally separated by whitespace)

#1 shouldn't really affect the output - there's extra space all over HTML that doesn't cause any harm. #2 is, I think, already done by replaceBrs(). Looking through all sites in the reader test, I haven't noticed any differences in output with this patch applied, so let me know if you notice anything.

Also, this change somehow triggered the following:

      let contentScore = (typeof tagsList[i].readability !== 'undefined') ? tagsList[i].this._contentScore : 0;

where tagsList[i].readability is true somewhere. This gave an error that "tagsList[i].this" was undefined. Looking at this code, neither the "this" property nor the "_contentScore" property seem to be set anywhere, so I removed them.
Comment 32 Brian Nicholson (:bnicholson) 2012-08-30 23:29:45 PDT
Created attachment 657175 [details] [diff] [review]
Part 11: Replace setting innerHTML with textContent for text

Setting innerHTML is more expensive than setting textContent since it tries to parse the string. We might as well just use textContent here since the text is just coming from a TEXT_NODE.
Comment 33 Brian Nicholson (:bnicholson) 2012-08-30 23:44:48 PDT
Thanks for reviewing all these. This should pretty much cover all of the obvious changes I could find; it dropped the max memory usage from one test site (http://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji) from 170MB to 70MB. Obviously, 70MB is still too much, so we should also have a check that makes sure the page is a reasonable size before we even try to parse it (e.g., do an element count via getElementsByTagName()).

I'll post more numbers tomorrow morning regarding the memory/performance gains from these patches. Here's a build if you'd like to test it: http://dl.dropbox.com/u/35559547/fennec-worker-mem.apk
Comment 34 Lucas Rocha (:lucasr) 2012-08-31 05:39:44 PDT
Comment on attachment 657170 [details] [diff] [review]
Part 9: Move pageCacheHtml outside of loop

Review of attachment 657170 [details] [diff] [review]:
-----------------------------------------------------------------

Nice catch.
Comment 35 Lucas Rocha (:lucasr) 2012-08-31 05:41:03 PDT
Comment on attachment 657174 [details] [diff] [review]
Part 10: Remove killBreaks()

Review of attachment 657174 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense.
Comment 36 Lucas Rocha (:lucasr) 2012-08-31 05:54:12 PDT
(In reply to Brian Nicholson (:bnicholson) from comment #33)
> Thanks for reviewing all these. This should pretty much cover all of the
> obvious changes I could find; it dropped the max memory usage from one test
> site (http://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji) from 170MB
> to 70MB. Obviously, 70MB is still too much, so we should also have a check
> that makes sure the page is a reasonable size before we even try to parse it
> (e.g., do an element count via getElementsByTagName()).

What do you mean exactly with these 170MB and 70MB? Memory used only by the web worker? Global memory usage increase? 70MB still feels insanely high for just a web parsing worker. But I guess this is for huge pages like the Wikipedia article you mentioned? What's the memory usage for things like nytimes.com articles? Just wondering.

Just thinking out loud here: maybe a safer way to check this is to define max number of children on elements like tables and lists as this is likely where we can detect if the page is really about just a huge list of "things", which we don't care about in Reader Mode.

It would be nice if we could simply discard (while parsing) stuff from the page that we won't use anyway: style attributes and tags, script tags, link tags in head, comments, form element tags, etc.
Comment 37 Brian Nicholson (:bnicholson) 2012-08-31 13:47:26 PDT
(In reply to Lucas Rocha (:lucasr) from comment #36)
> What do you mean exactly with these 170MB and 70MB? Memory used only by the
> web worker? Global memory usage increase? 70MB still feels insanely high for
> just a web parsing worker. But I guess this is for huge pages like the
> Wikipedia article you mentioned? What's the memory usage for things like
> nytimes.com articles? Just wondering.

This is just for the reader worker. These are the numbers reported by the memory reporter (I see similar numbers if go to about:memory and refresh while the worker is running). I agree the numbers feel high, though I'm not familiar enough with JS memory analysis to know what to expect.

Some things I've noticed:
* The worker itself consumes ~9MB (before the JSDOMParser is even created)
* In that page, there are 80518 total nodes (including Element nodes, Text nodes, etc)
* Once the JSDOMParser has run, the memory usage jumps up to 36MB:
  * ~17MB is from non-function js objects
  * ~8MB from strings/string headers
* When the worker hits 76MB at the end (right before returning):
  * ~25MB from non-function js objets
  * ~30MB from strings
* Right before the innerHTML call at the end (which is used to serialize the entire document so the worker can pass it back), memory usage is at 66MB. That means the final innerHTML call adds 10MB.

> Just thinking out loud here: maybe a safer way to check this is to define
> max number of children on elements like tables and lists as this is likely
> where we can detect if the page is really about just a huge list of
> "things", which we don't care about in Reader Mode.

Yeah, we can tweak the algorithm to be more accurate if a simple element count is too exclusive. Thankfully, the reader test app we're using should make this easy to detect.

> It would be nice if we could simply discard (while parsing) stuff from the
> page that we won't use anyway: style attributes and tags, script tags, link
> tags in head, comments, form element tags, etc.

We should definitely at least clean these elements out once the parse is done (bug 782344); I'd be concerned about removing them before then because it could affect the algorithm.
Comment 39 :Ms2ger (⌚ UTC+1/+2) 2012-09-01 04:09:28 PDT
(In reply to Brian Nicholson (:bnicholson) from comment #2)
> Created attachment 656713 [details] [diff] [review]
> Part 1: Replace string concatenations with Array.join()
> 
> We're building lots of huge string objects as the strings get incrementally
> larger with each concatenation. For example, concatenating "aaa" + "bbb" +
> "ccc" would look like this:
> 
> str = "aaa"  // strings in heap: "aaa"
> str += "bbb" // strings in heap: "aaa", "aaabbb"
> str += "ccc" // strings in heap: "aaa", "aaabbb", "aaabbbccc"
> 
> whereas with this patch, str is created in one go:
> 
> str = ["aaa", "bbb", "ccc"].join("") // strings in heap: "aaabbbccc" 
> 
> In practice, there are many, many more strings, and the resulting strings
> will be much larger (the size of the document), so we should avoid creating
> tons of huge intermediate strings.

Just so you know, the JS engine actually does the assignment lazily: http://blog.cdleary.com/2012/01/string-representation-in-spidermonkey/#ropes
Comment 41 Brian Nicholson (:bnicholson) 2012-09-01 12:29:37 PDT
(In reply to :Ms2ger from comment #39)
> Just so you know, the JS engine actually does the assignment lazily:
> http://blog.cdleary.com/2012/01/string-representation-in-spidermonkey/#ropes

Thanks for the correction. I assumed the implementation in comment 2 because I noticed performance/memory improvements when switching from string concatenation to Array.join(). But considering how common an operation string concatenation is, I should have known that we wouldn't use the naive implementation I described (and it would have resulted in a much larger memory consumption than what I was observing).

But as I said, I did see a noticeable improvement when switching from concatenation to Array.join(). Running the JSDOMParser/Readability scripts on a PC for http://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji, I noticed a ~100ms decrease in parse time and ~5MB less in memory usage. Is this just overhead from building/traversing the rope?
Comment 42 Brian Nicholson (:bnicholson) 2012-09-04 11:17:34 PDT
Created attachment 658159 [details] [diff] [review]
roll-up patch for aurora and beta

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 779796
User impact if declined: reader parsing will consume more memory, which will close background apps, slow things down, potentially crash browser, etc
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): Medium risk for affecting reader mode output. This changes lots of code in Readability.js, so there's a greater chance of breaking something. But these patches have been tested on our reader mode test case to make sure nothing has changed, which runs the code on 50+ pages.
String or UUID changes made by this patch: none
Comment 43 Lukas Blakk [:lsblakk] use ?needinfo 2012-09-04 16:31:01 PDT
Comment on attachment 658159 [details] [diff] [review]
roll-up patch for aurora and beta

With 4 more betas left in this cycle we should be able to shake out any major bugs in this new feature, so approving for branches.
Comment 44 Brian Nicholson (:bnicholson) 2012-09-04 16:42:58 PDT
http://hg.mozilla.org/releases/mozilla-aurora/rev/bff0d574ab71
Comment 45 Brian Nicholson (:bnicholson) 2012-09-04 16:43:22 PDT
http://hg.mozilla.org/releases/mozilla-beta/rev/3e246c296259
Comment 46 Nicholas Nethercote [:njn] 2012-09-04 18:26:28 PDT
> But as I said, I did see a noticeable improvement when switching from
> concatenation to Array.join(). Running the JSDOMParser/Readability scripts
> on a PC for http://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji, I
> noticed a ~100ms decrease in parse time and ~5MB less in memory usage. Is
> this just overhead from building/traversing the rope?

Yes.  With ropes you end up building a binary tree.  The internal nodes are Rope nodes and the leaf nodes contain actual strings.  With Array.join you avoid creating all those internal nodes -- you instead have an array element per string but an array element is smaller than a Rope node.

Note You need to log in before you can comment on or make changes to this bug.