Last Comment Bug 837845 - Multi-GB memory spike when using regular expressions
: Multi-GB memory spike when using regular expressions
Status: VERIFIED FIXED
[MemShrink:P1]
: regression
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: 19 Branch
: x86_64 Windows 7
: -- normal (vote)
: mozilla21
Assigned To: Terrence Cole [:terrence]
: Anthony Hughes (:ashughes) [GFX][QA][Mentor]
Mentors:
Depends on:
Blocks: 798624 848577
  Show dependency treegraph
 
Reported: 2013-02-04 13:06 PST by Steffen Weihrauch
Modified: 2013-04-17 07:42 PDT (History)
21 users (show)
jsmith: in‑qa‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
unaffected
+
verified
+
verified
+
verified


Attachments
firefox-memleak.htm testcase (1.38 KB, text/html)
2013-02-04 13:06 PST, Steffen Weihrauch
no flags Details
v0 (49.45 KB, patch)
2013-02-05 11:55 PST, Terrence Cole [:terrence]
jwalden+bmo: review+
Details | Diff | Splinter Review
v1: review comment addressed (49.40 KB, patch)
2013-02-05 14:42 PST, Terrence Cole [:terrence]
terrence.d.cole: review+
akeybl: approval‑mozilla‑beta+
terrence.d.cole: checkin+
Details | Diff | Splinter Review
v0: Ported patch forward to aurora, carrying r+. (50.99 KB, patch)
2013-02-06 14:45 PST, Terrence Cole [:terrence]
terrence.d.cole: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
terrence.d.cole: checkin+
Details | Diff | Splinter Review

Description Steffen Weihrauch 2013-02-04 13:06:39 PST
Created attachment 709849 [details]
firefox-memleak.htm testcase

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0
Build ID: 20130130080006

Steps to reproduce:

Using regular expressions with Firefox 19.0 (beta).


Actual results:

Mega Memory Leak. Have tested on two different computers with windows 7, same result. On Firefox 18.0.1 there are all ok (see attachment for testcase).


Expected results:

Same behaviour as firefox 18.0.1 and all other browsers => no memeory leak
Comment 1 David Anderson [:dvander] 2013-02-04 16:08:14 PST
How are you measuring that it's a memory leak, and how big is the leak in exact terms?
Comment 2 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2013-02-04 16:45:09 PST
> Firefox 18.0.1
RAM marginally increases, alerts "done" almost immediately

> Firefox 20.0a2
RAM balloons to 3.4GB almost immediately, does not alert "done"
Navigating away from the page drops memory back to normal levels

> Firefox 19.0b4
Ballooning memory while testcase open and browser hangs (needs to be force closed)

> Firefox 19.0a1 2012-11-19 (last mozilla-central-19)
Ballooning memory while testcase open.
Memory is NOT released when navigating away.

The way I see it we have a couple of different issues in Firefox 19 here. 
1. Memory ballooning with the testcase (introduced in Firefox 19)
2. Memory leak after closing testcase ("fixed" in Firefox 20)

I think the priority should be trying to fix the ballooning memory since I think that's the catalyst for this bug. I'll try to find a regression range for this.
Comment 3 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2013-02-04 17:08:11 PST
Regression range for the memory ballooning to 3.4GB:
* Last Good: 2012-10-11 (99898ec9976a)
* First Bad: 2012-10-12 (90857937b601)
* Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=99898ec9976a&tochange=90857937b601

Unfortunately there is three PGO merges totalling 234 changes in this pushlog and I can't seem to find hourlies going back to this date range.
Comment 4 Boris Zbarsky [:bz] (TPAC) 2013-02-04 17:17:40 PST
Hmm.  In that range, bug 798624 seems like it might be related.

Anthony, would you be willing to complain to the IT folks about the lack of hourlies older than 30 days?  We've been trying to get them to fix that for months with no luck.  :(  Maybe if they heard from someone not on the platform team...
Comment 5 Alice0775 White 2013-02-04 17:29:29 PST
Regression window(m-c)
Good:
http://hg.mozilla.org/mozilla-central/rev/99898ec9976a
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Firefox/19.0 ID:20121010133903
Bad:
http://hg.mozilla.org/mozilla-central/rev/2fae8bd461da
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Firefox/19.0 ID:20121011064702
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=99898ec9976a&tochange=2fae8bd461da


Regression window(cached m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/88bb0e2f0373
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Firefox/19.0 ID:20121009171503
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/0761bc637081
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Firefox/19.0 ID:20121009174802
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=88bb0e2f0373&tochange=0761bc637081

Triggered by:
2c08d52e521d	Terrence Cole — Bug 798624 - Specialize low-level character access to JSStableString; r=luke, rs=Waldo Implements JSStableString::chars() and pushes StableCharPtr into the interface at several key choke-points. This will solidly enforce the requirement that we have uninlined jschar*s in places that we may GC with a jschar* on the stack.
Comment 6 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2013-02-04 17:58:04 PST
Thanks a lot for your help, Alice!
Comment 7 Terrence Cole [:terrence] 2013-02-05 10:26:17 PST
In today's aurora, after clicking the button the console shows that an out of memory exception was raised.

Timestamp: 2/5/2013 10:21:09 AM
Error: Uncaught asynchronous error: [Exception... "Out of Memory"  nsresult: "0x8007000e (NS_ERROR_OUT_OF_MEMORY)"  location: "JS frame :: resource://gre/modules/sessionstore/_SessionFile.jsm :: task :: line 207"  data: no] at
undefined
Source File: resource://gre/modules/sessionstore/_SessionFile.jsm
Line: 108

Timestamp: 2/5/2013 10:24:31 AM
Error: out of memory
Source File: resource:///modules/sessionstore/SessionStore.jsm
Line: 4169

Refreshing the page afterwards reliably crashes the browser. This should be fun.
Comment 8 Terrence Cole [:terrence] 2013-02-05 11:55:24 PST
Created attachment 710328 [details] [diff] [review]
v0

Manually backout the API and RegExp aspects of the regressing bug.

Try run is at:
https://tbpl.mozilla.org/?tree=Try&rev=ca79ec726062
Comment 9 Jeff Walden [:Waldo] (remove +bmo to email) 2013-02-05 14:11:31 PST
Comment on attachment 710328 [details] [diff] [review]
v0

Review of attachment 710328 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +2465,1 @@
>              return false;

Let's get rid of this if(!) test, because it's going to look crazy if it stands.
Comment 10 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2013-02-05 14:34:05 PST
(In reply to Terrence Cole [:terrence] from comment #8)
> Created attachment 710328 [details] [diff] [review]
> v0
> 
> Manually backout the API and RegExp aspects of the regressing bug.
> 
> Try run is at:
> https://tbpl.mozilla.org/?tree=Try&rev=ca79ec726062

I've confirmed with the win32-opt build that the 3.4GB memory spike no longer occurs and the testcase alerts "done" almost immediately. This should be safe to land on the branches.
Comment 11 Terrence Cole [:terrence] 2013-02-05 14:42:26 PST
Created attachment 710389 [details] [diff] [review]
v1: review comment addressed

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 798624
User impact if declined: Memory spikes under string heavy workloads that cross the JSAPI.
Testing completed (on m-c, etc.): Local shell tests pass.
Risk to taking this patch (and alternatives if risky): Low.
String or UUID changes made by this patch: None.

The Try run for this patch is fairly orange, but this appears to be because I based the patch off of mozilla-beta and we've changed the build infrastructure since then. I'm running another try run of a bare mozilla-beta tree to test  (https://tbpl.mozilla.org/?tree=Try&rev=0c64734b0437), although #releng tells me that orange and worse should be expected here. I'm currently whipping up a similar patch against aurora to Try, which will hopefully work better.

The good news is that this patch is very unlikely to have any adverse impact: if a shell or browser builds and runs at all then it will be exercising the changed code. Since that works fine locally, I think it is most likely fine.
Comment 12 Terrence Cole [:terrence] 2013-02-05 14:51:24 PST
Try push for Aurora patch, with branding set:
https://tbpl.mozilla.org/?tree=Try&rev=a838077c9603
Comment 13 Alex Keybl [:akeybl] 2013-02-05 14:55:39 PST
Comment on attachment 710389 [details] [diff] [review]
v1: review comment addressed

Let's land on Beta and quickly back out if we see new orange, rather than waiting on further builds/tests. Waiting to find out whether we regress anything first may push a fix into beta 6, which is super close to the release.
Comment 14 Terrence Cole [:terrence] 2013-02-05 15:09:31 PST
https://hg.mozilla.org/releases/mozilla-beta/rev/f22252f3041f
Comment 15 Ryan VanderMeulen [:RyanVM] 2013-02-05 15:19:54 PST
(I guess disabled is the right flag for when the change is backed out)
Comment 16 Terrence Cole [:terrence] 2013-02-05 17:03:03 PST
https://hg.mozilla.org/releases/mozilla-beta/rev/f0edb3dc1896
Comment 17 Terrence Cole [:terrence] 2013-02-06 10:09:21 PST
I think this stuck.
Comment 18 Terrence Cole [:terrence] 2013-02-06 14:45:15 PST
Created attachment 710980 [details] [diff] [review]
v0: Ported patch forward to aurora, carrying r+.

[Approval Request Comment]
See request for Beta.

Testing completed (on m-c, etc.): One day on mozilla-beta.
Comment 19 Virgil Dicu [:virgil] [QA] 2013-02-07 05:42:15 PST
Verified the memory leak on Firefox 19.0 beta 4 following the steps from comment 0.

There is no memory leak or hang for Firefox 19.0 beta 5. 
After loading the testcase with large amount of data and click "test" the pop-up with "done" appears immediately.

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0
Build ID: 20130206083616
Comment 20 Steffen Weihrauch 2013-02-08 12:21:38 PST
Just great, thanks to all. Seems to be stable at the moment with version 19.0.0.4875 (sorry i don't know how to get the exact version).
Comment 21 Lukas Blakk [:lsblakk] use ?needinfo 2013-02-08 16:34:19 PST
Comment on attachment 710980 [details] [diff] [review]
v0: Ported patch forward to aurora, carrying r+.

I know we're doing it a little backwards on this bug what with landing to Beta first, but now that we know it's working as intended let's get this to Aurora and Trunk pronto.
Comment 22 Terrence Cole [:terrence] 2013-02-08 17:16:31 PST
https://hg.mozilla.org/releases/mozilla-aurora/rev/1939e049e60d
Comment 23 Terrence Cole [:terrence] 2013-02-12 09:20:57 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/3b6cf8fe73e0
Comment 24 Ryan VanderMeulen [:RyanVM] 2013-02-12 18:29:13 PST
https://hg.mozilla.org/mozilla-central/rev/3b6cf8fe73e0
Comment 25 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2013-02-14 10:24:41 PST
Dave, this seems like it might be a good candidate for a Mozmill Endurance test.
Comment 26 Dave Hunt (:davehunt) 2013-02-14 15:35:08 PST
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #25)
> Dave, this seems like it might be a good candidate for a Mozmill Endurance
> test.

The endurance tests are for typically for repeating a snippet of user interaction (open a window/tab/bookmark) over and over whilst gathering metrics. That said, I'd be happy to help out if someone wants to work on an endurance test for this.
Comment 27 Virgil Dicu [:virgil] [QA] 2013-02-21 01:08:16 PST
No memory leak or hang for Firefox 20.0 beta 1. 
After loading the testcase with large amount of data and click "test" the pop-up with "done" appears immediately and memory usage remains at the same level.

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0 (20130220104816)
Comment 28 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2013-03-06 15:05:11 PST
(In reply to Dave Hunt (:davehunt) from comment #26)
> I'd be happy to help out if someone wants to work on an
> endurance test for this.

Thanks Dave. I've gone ahead and filed bug 848577. I'm not sure if you have any whiteboard tags for mentorship or the like.
Comment 29 Jason Smith [:jsmith] 2013-04-16 05:29:17 PDT
Question - Why is a Mozmill test appropriate here? Why not use a TBPL-based automation framework here?
Comment 30 Dave Hunt (:davehunt) 2013-04-16 05:38:19 PDT
(In reply to Jason Smith [:jsmith] from comment #29)
> Question - Why is a Mozmill test appropriate here? Why not use a TBPL-based
> automation framework here?

You're right, we should only use Mozmill if there is no suitable alternative. Did you have one in mind Jason? As far as I know talos and AWSY only measures metrics for page loads.
Comment 31 Jason Smith [:jsmith] 2013-04-16 05:44:19 PDT
(In reply to Dave Hunt (:davehunt) from comment #30)
> (In reply to Jason Smith [:jsmith] from comment #29)
> > Question - Why is a Mozmill test appropriate here? Why not use a TBPL-based
> > automation framework here?
> 
> You're right, we should only use Mozmill if there is no suitable
> alternative. Did you have one in mind Jason? As far as I know talos and AWSY
> only measures metrics for page loads.

Based on what I'm reading on this bug, this is a memory spike due to a memory leak in the JS engine with regular expressions, right?

If it's just a leak with the existing test case attached as the point of verification, I know a crashtest would be a potential option to use here, as that runs in TBPL and does memory leak checking automatically. That would be really simple to implement as well in alignment with the attached test case given here.
Comment 32 Jason Smith [:jsmith] 2013-04-16 05:47:04 PDT
Moving discussion to bug 845577 on test framework choice.
Comment 33 Terrence Cole [:terrence] 2013-04-16 10:52:20 PDT
(In reply to Jason Smith [:jsmith] from comment #32)
> Moving discussion to bug 845577 on test framework choice.

I'm not sure that's the right bug you linked.

In any case, this bug wasn't a leak, it was an expected increase in memory usage; it just happened to be much more drastic than desired in this particular case. Given that the regressing code is gone now (and it was a very specific piece of code), I don't think a test is really warranted here.
Comment 34 Jason Smith [:jsmith] 2013-04-16 10:55:53 PDT
(In reply to Terrence Cole [:terrence] from comment #33)
> (In reply to Jason Smith [:jsmith] from comment #32)
> > Moving discussion to bug 845577 on test framework choice.
> 
> I'm not sure that's the right bug you linked.

Ack. Meant to say bug 848577.

> 
> In any case, this bug wasn't a leak, it was an expected increase in memory
> usage; it just happened to be much more drastic than desired in this
> particular case. Given that the regressing code is gone now (and it was a
> very specific piece of code), I don't think a test is really warranted here.

Fair enough. Minusing then on that basis.
Comment 35 Bogdan Maris, QA [:bogdan_maris] 2013-04-17 07:42:36 PDT
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0

No memory leak or hang for Firefox 21.0 beta 3 (buildID: 20130416200523).
After loading the source of the forum in the box and test is clicked, I see a 10 MB memory increase in Task Manager, I think that is reasonable since there is so much code loading.

Note You need to log in before you can comment on or make changes to this bug.