Closed Bug 837845 — Opened 11 years ago — Closed 11 years ago

Multi-GB memory spike when using regular expressions

Categories

(Core :: JavaScript Engine, defect)

Version: 19 Branch
Platform: x86_64
OS: Windows 7
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: VERIFIED FIXED
Target Milestone: mozilla21
Tracking Status
firefox18 --- unaffected
firefox19 + verified
firefox20 + verified
firefox21 + verified

People

(Reporter: steffen.weihrauch, Assigned: terrence)

References

Details

(Keywords: regression, Whiteboard: [MemShrink:P1])

Attachments

(3 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0
Build ID: 20130130080006

Steps to reproduce:

Using regular expressions with Firefox 19.0 (beta).


Actual results:

Huge memory leak. Tested on two different computers running Windows 7, same result. On Firefox 18.0.1 everything is fine (see attachment for testcase).


Expected results:

Same behaviour as Firefox 18.0.1 and all other browsers => no memory leak
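
(A minimal sketch of the kind of testcase described above; the actual attachment is the authoritative testcase, and the input size, text, and regular expression below are purely illustrative:)

// Hypothetical reduced testcase: build a multi-megabyte string, run a
// global regular expression over it, then alert "done". On an affected
// build this kind of workload made memory balloon instead of finishing
// almost immediately.
function runTest() {
  var chunk = "some text with numbers 12345 and more text ";
  var parts = [];
  for (var i = 0; i < 100000; i++) {   // roughly 4-5 MB of input
    parts.push(chunk);
  }
  var input = parts.join("");

  var re = /(\d+)\s+and/g;
  var count = 0;
  while (re.exec(input) !== null) {
    count++;
  }
  alert("done (" + count + " matches)");
}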
Group: core-security
Summary: Memory leak → Memory leak when using regular expressions
Attachment #709849 - Attachment mime type: text/plain → text/html
QA Contact: anthony.s.hughes
How are you measuring that it's a memory leak, and how big is the leak in exact terms?
> Firefox 18.0.1
RAM marginally increases, alerts "done" almost immediately

> Firefox 20.0a2
RAM balloons to 3.4GB almost immediately, does not alert "done"
Navigating away from the page drops memory back to normal levels

> Firefox 19.0b4
Ballooning memory while testcase open and browser hangs (needs to be force closed)

> Firefox 19.0a1 2012-11-19 (last mozilla-central-19)
Ballooning memory while testcase open.
Memory is NOT released when navigating away.

The way I see it we have a couple of different issues in Firefox 19 here. 
1. Memory ballooning with the testcase (introduced in Firefox 19)
2. Memory leak after closing testcase ("fixed" in Firefox 20)

I think the priority should be fixing the ballooning memory, since that seems to be the catalyst for this bug. I'll try to find a regression range for this.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Regression range for the memory ballooning to 3.4GB:
* Last Good: 2012-10-11 (99898ec9976a)
* First Bad: 2012-10-12 (90857937b601)
* Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=99898ec9976a&tochange=90857937b601

Unfortunately there are three PGO merges totalling 234 changes in this pushlog, and I can't seem to find hourlies going back to this date range.
Hmm.  In that range, bug 798624 seems like it might be related.

Anthony, would you be willing to complain to the IT folks about the lack of hourlies older than 30 days?  We've been trying to get them to fix that for months with no luck.  :(  Maybe if they heard from someone not on the platform team...
Regression window(m-c)
Good:
http://hg.mozilla.org/mozilla-central/rev/99898ec9976a
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Firefox/19.0 ID:20121010133903
Bad:
http://hg.mozilla.org/mozilla-central/rev/2fae8bd461da
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Firefox/19.0 ID:20121011064702
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=99898ec9976a&tochange=2fae8bd461da


Regression window(cached m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/88bb0e2f0373
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Firefox/19.0 ID:20121009171503
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/0761bc637081
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Firefox/19.0 ID:20121009174802
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=88bb0e2f0373&tochange=0761bc637081

Triggered by:
2c08d52e521d	Terrence Cole — Bug 798624 - Specialize low-level character access to JSStableString; r=luke, rs=Waldo Implements JSStableString::chars() and pushes StableCharPtr into the interface at several key choke-points. This will solidly enforce the requirement that we have uninlined jschar*s in places that we may GC with a jschar* on the stack.
Blocks: 798624
Thanks a lot for your help, Alice!
Assignee: general → terrence
Whiteboard: [MemShrink]
In today's Aurora, after clicking the button, the console shows that an out-of-memory exception was raised.

Timestamp: 2/5/2013 10:21:09 AM
Error: Uncaught asynchronous error: [Exception... "Out of Memory"  nsresult: "0x8007000e (NS_ERROR_OUT_OF_MEMORY)"  location: "JS frame :: resource://gre/modules/sessionstore/_SessionFile.jsm :: task :: line 207"  data: no] at
undefined
Source File: resource://gre/modules/sessionstore/_SessionFile.jsm
Line: 108

Timestamp: 2/5/2013 10:24:31 AM
Error: out of memory
Source File: resource:///modules/sessionstore/SessionStore.jsm
Line: 4169

Refreshing the page afterwards reliably crashes the browser. This should be fun.
Attached patch v0 (obsolete) — Splinter Review
Manually back out the API and RegExp aspects of the regressing bug.

Try run is at:
https://tbpl.mozilla.org/?tree=Try&rev=ca79ec726062
Summary: Memory leak when using regular expressions → Multi-GB memory spike when using regular expressions
Attachment #710328 - Flags: review?(jwalden+bmo)
Comment on attachment 710328 [details] [diff] [review]
v0

Review of attachment 710328 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +2465,1 @@
>              return false;

Let's get rid of this if(!) test, because it's going to look crazy if it stands.
Attachment #710328 - Flags: review?(jwalden+bmo) → review+
(In reply to Terrence Cole [:terrence] from comment #8)
> Created attachment 710328 [details] [diff] [review]
> v0
> 
> Manually backout the API and RegExp aspects of the regressing bug.
> 
> Try run is at:
> https://tbpl.mozilla.org/?tree=Try&rev=ca79ec726062

I've confirmed with the win32-opt build that the 3.4GB memory spike no longer occurs and the testcase alerts "done" almost immediately. This should be safe to land on the branches.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 798624
User impact if declined: Memory spikes under string-heavy workloads that cross the JSAPI.
Testing completed (on m-c, etc.): Local shell tests pass.
Risk to taking this patch (and alternatives if risky): Low.
String or UUID changes made by this patch: None.

The Try run for this patch is fairly orange, but this appears to be because I based the patch off of mozilla-beta and we've changed the build infrastructure since then. I'm running another Try run on a bare mozilla-beta tree to check (https://tbpl.mozilla.org/?tree=Try&rev=0c64734b0437), although #releng tells me that orange and worse should be expected there. I'm currently whipping up a similar patch against Aurora to push to Try, which will hopefully work better.

The good news is that this patch is very unlikely to have any adverse impact: if a shell or browser builds and runs at all then it will be exercising the changed code. Since that works fine locally, I think it is most likely fine.
Attachment #710389 - Flags: review+
Attachment #710389 - Flags: approval-mozilla-beta?
Try push for Aurora patch, with branding set:
https://tbpl.mozilla.org/?tree=Try&rev=a838077c9603
Comment on attachment 710389 [details] [diff] [review]
v1: review comment addressed

Let's land on Beta and quickly back out if we see new orange, rather than waiting on further builds/tests. Waiting to find out whether we regress anything first may push a fix into beta 6, which is super close to the release.
Attachment #710389 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(I guess disabled is the right flag for when the change is backed out)
I think this stuck.
Whiteboard: [MemShrink] → [MemShrink:P1]
Attachment #710389 - Flags: checkin+
Attachment #710328 - Attachment is obsolete: true
[Approval Request Comment]
See request for Beta.

Testing completed (on m-c, etc.): One day on mozilla-beta.
Attachment #710980 - Flags: review+
Attachment #710980 - Flags: approval-mozilla-aurora?
Verified the memory leak on Firefox 19.0 beta 4 following the steps from comment 0.

There is no memory leak or hang for Firefox 19.0 beta 5. 
After loading the testcase with a large amount of data and clicking "test", the pop-up with "done" appears immediately.

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0
Build ID: 20130206083616
Just great, thanks to all. Seems to be stable at the moment with version 19.0.0.4875 (sorry, I don't know how to get the exact version).
Comment on attachment 710980 [details] [diff] [review]
v0: Ported patch forward to aurora, carrying r+.

I know we're doing it a little backwards on this bug what with landing to Beta first, but now that we know it's working as intended let's get this to Aurora and Trunk pronto.
Attachment #710980 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #710980 - Flags: checkin+
https://hg.mozilla.org/mozilla-central/rev/3b6cf8fe73e0
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Dave, this seems like it might be a good candidate for a Mozmill Endurance test.
Flags: in-qa-testsuite?(dave.hunt)
Keywords: verifyme
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #25)
> Dave, this seems like it might be a good candidate for a Mozmill Endurance
> test.

The endurance tests are typically for repeating a snippet of user interaction (open a window/tab/bookmark) over and over whilst gathering metrics. That said, I'd be happy to help out if someone wants to work on an endurance test for this.
No memory leak or hang for Firefox 20.0 beta 1. 
After loading the testcase with a large amount of data and clicking "test", the pop-up with "done" appears immediately and memory usage remains at the same level.

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0 (20130220104816)
Flags: in-qa-testsuite?(dave.hunt) → in-qa-testsuite?
(In reply to Dave Hunt (:davehunt) from comment #26)
> I'd be happy to help out if someone wants to work on an
> endurance test for this.

Thanks Dave. I've gone ahead and filed bug 848577. I'm not sure if you have any whiteboard tags for mentorship or the like.
Blocks: 848577
Flags: in-qa-testsuite? → in-qa-testsuite?(hskupin)
Question - Why is a Mozmill test appropriate here? Why not use a TBPL-based automation framework here?
(In reply to Jason Smith [:jsmith] from comment #29)
> Question - Why is a Mozmill test appropriate here? Why not use a TBPL-based
> automation framework here?

You're right, we should only use Mozmill if there is no suitable alternative. Did you have one in mind, Jason? As far as I know, Talos and AWSY only measure metrics for page loads.
(In reply to Dave Hunt (:davehunt) from comment #30)
> (In reply to Jason Smith [:jsmith] from comment #29)
> > Question - Why is a Mozmill test appropriate here? Why not use a TBPL-based
> > automation framework here?
> 
> You're right, we should only use Mozmill if there is no suitable
> alternative. Did you have one in mind Jason? As far as I know talos and AWSY
> only measures metrics for page loads.

Based on what I'm reading on this bug, this is a memory spike due to a memory leak in the JS engine with regular expressions, right?

If it's just a leak, with the existing test case attached as the point of verification, I know a crashtest would be a potential option here, as crashtests run on TBPL and do memory leak checking automatically. That would also be simple to implement, given the test case attached here.
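
(A rough sketch of what such a crashtest could look like; the file name, manifest line, reduced input, and regular expression below are assumptions for illustration, not taken from this bug or its attachment:)

<!-- Hypothetical crashtest: regexp-memory-spike.html, listed in a
     crashtests.list manifest as:  load regexp-memory-spike.html
     The harness only requires that the page loads without crashing,
     asserting, or leaking. -->
<!DOCTYPE html>
<html>
<script>
  // Build a large input string and run a global regexp to completion.
  var parts = [];
  for (var i = 0; i < 100000; i++)
    parts.push("filler text 12345 and more ");
  var input = parts.join("");
  var re = /(\d+)\s+and/g;
  while (re.exec(input) !== null) {
    // keep matching to the end of the string
  }
</script>
</html>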
Moving discussion to bug 845577 on test framework choice.
(In reply to Jason Smith [:jsmith] from comment #32)
> Moving discussion to bug 845577 on test framework choice.

I'm not sure that's the right bug you linked.

In any case, this bug wasn't a leak, it was an expected increase in memory usage; it just happened to be much more drastic than desired in this particular case. Given that the regressing code is gone now (and it was a very specific piece of code), I don't think a test is really warranted here.
(In reply to Terrence Cole [:terrence] from comment #33)
> (In reply to Jason Smith [:jsmith] from comment #32)
> > Moving discussion to bug 845577 on test framework choice.
> 
> I'm not sure that's the right bug you linked.

Ack. Meant to say bug 848577.

> 
> In any case, this bug wasn't a leak, it was an expected increase in memory
> usage; it just happened to be much more drastic than desired in this
> particular case. Given that the regressing code is gone now (and it was a
> very specific piece of code), I don't think a test is really warranted here.

Fair enough. Minusing then on that basis.
Flags: in-qa-testsuite?(hskupin) → in-qa-testsuite-
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0

No memory leak or hang for Firefox 21.0 beta 3 (buildID: 20130416200523).
After loading the source of the forum into the box and clicking "test", I see a 10 MB memory increase in Task Manager; I think that is reasonable given how much code is being loaded.
Status: RESOLVED → VERIFIED
Keywords: verifyme