Closed Bug 660734 Opened 13 years ago Closed 13 years ago

Strings are not counted by GC heuristics (high memory use with regexps on large strings)


(Core :: JavaScript Engine, defect)

Windows 7
Not set





(Reporter: jarben, Assigned: luke)



(Keywords: memory-footprint, testcase, Whiteboard: [MemShrink:P2][fixed-in-tracemonkey])


(3 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Build Identifier: 4.0.1

Try attached example and see memory usage: 

FF4 uses 109MB - 560MB 
FF3 uses 73MB - 196MB
Chrome 11 uses 55MB - 60MB

The other (and major) problem is that it leaks when embedded as in iframe but I haven't been able to reproduce a simple test case outside of our product, will try and report it as another bug. 

Interesting that Chrome can do the same operation with 5MB, maybe would be worth to overview this as it might improve performance in general. 

Reproducible: Always
Attachment #536215 - Attachment is obsolete: true
Attachment #536218 - Attachment mime type: text/plain → text/html
Attachment #536215 - Attachment mime type: text/plain → text/html
I am seeing pretty high memory usage, it seems to go from roughly 180mb to 630 before garbage collection kicks it and it goes back down. But the number it goes to is always higher than the number is started at the last cycle (eg, 180-630 GC 140-630GC)
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Ever confirmed: true
Keywords: testcase
Version: unspecified → 2.0 Branch
Luke, could this have anything to do with ropes?
The rope optimization only applies when we can do "flat" matches (e.g., str.replace('a', 'b')), not when we are given a regexp as in the example.

Basically, the testcase just takes a huge string and does 6 regexp replace's on it (each producing a copy), and does that from setTimeout every 10ms.  Each string becomes garbage, so it should all get collected on each GC (which about:memory and ps seems to show is the case).  I watched total usage for several minutes and saw no increase in the peaks or valleys of the sawtooth curve.  I believe the reason we get so high compared to Chrome is b/c we wait so long for a GC whereas Chrome (with its generational gc) is going to collect much more often and thus collect all the garbage being created.  Our own GC work should help this.
Luke, any thoughts why FF3 memory behaviour is better? Is there anything we could do, would WeakMap help in this case?
3.6 is a lot more aggressive about running gc than 4.0 is.  That means more time is taken running GC, though, which hurts throughput.
Blocks: mlk-fx4
Keywords: footprint
Whiteboard: [MemShrink:P2]
I also noticed this on my very low end netbook where it causes serious memory pressure problems as I told Luke yesterday.
I looked at a shark profile and noticed that most memory is allocated via our vectors and vector reallocs don't account for our GC triggers. So we could probably run into OOM without running the GC.
Luke does this sound reasonable?
Could you point me to the particular Vectors that are doing this?  So, recently, igor changed ContextAllocPolicy to not do gc-malloc-bytes accounting since, in theory, if you can use ContextAllocPolicy, you must be short-lived (else you can get a dangling JSContext, as you know :), and hence you are "temporary" and thats not what gcMallocBytes is measuring.  However, here you have a case where a Vector is long-lived enough to cause real GC pressure.  So it seems we just really need a new AllocPolicy.  There have been a couple of recent examples that our AllocPolicy situation is weird and needs to be refactored; I'll file a bug once you point me to the offending Vector so I can check out the code.
Aha, when ContextAllocPolicy stopped accounting gcMallocBytes, StringBuffer's internal vector stopped accounting for its allcoations.  Since the StringBuffer's buffer is ultimately extracted to be the long-lived buffer of a string, this is totally a hole in the GC heuristic!  No wonder the GC sawtooth gets to high.  The fix is simply to give StringBuffer's Vector a policy that does do accounting.
Luke, I'll assign this to you, hope this is ok.
Assignee: general → luke
Version: 2.0 Branch → unspecified
Sure, I can take it.

What I'd like to do is throw out all these crazy "Context" "System" "Runtime" names and make a new set whose names correspond the pairs:
  { Reporting, NonReporting } x { Accounting, NonAccounting }
(In the process auditing our existing policy use.)
I need to put back the old ContextAllocPolicy in the next patch (the one that calls cx->malloc_ instead of js_malloc+report), so rename the thing currently (mis)named as ContextAllocPolicy.  I'd rather not debate the merits of the name "TempAllocPolicy" since all these *AllocPolicy names need to be changed after we settle on the underlying allocator names in bug 647103.
Attachment #539926 - Flags: review?(igor)
Incidentally, even before Igor's changes, StringBuffer had accounting problems since it called realloc() without passing oldBytes (which means realloc() wouldn't do accounting).  Fortunately, we just added an oldBytes param to AllocPolicy (like, yesterday).
Attachment #539928 - Flags: review?(igor)
With these two patches (applied to TM tip) I see memory usage drop from FF4 levels to FF3 levels (as reported by comment 0).
(In reply to comment #15)
> With these two patches (applied to TM tip) I see memory usage drop from FF4
> levels to FF3 levels (as reported by comment 0).

Great!  Thanks, Luke.
Comment on attachment 539928 [details] [diff] [review]
add old ContextAllocPolicy, use in StringBuffer

Review of attachment 539928 [details] [diff] [review]:
Attachment #539928 - Flags: review?(igor) → review+
Igor: there is one other patch needing your r+ (purely syntactic).
Comment on attachment 539926 [details] [diff] [review]

Review of attachment 539926 [details] [diff] [review]:
Attachment #539926 - Flags: review?(igor) → review+
Whiteboard: [MemShrink:P2] → [MemShrink:P2][fixed-in-tracemonkey]
My netbook says DANKE! :)
Closed: 13 years ago
Resolution: --- → FIXED
Summary: Firefox 4: Huge memory usage when performing regular expressions on large strings → Strings are not counted by GC heuristics (high memory use with regexps on large strings)
You need to log in before you can comment on or make changes to this bug.