Severe memory leak problems in dehydra

RESOLVED FIXED

Status

--
critical
RESOLVED FIXED
9 years ago
8 months ago

People

(Reporter: dnovillo, Assigned: taras.mozilla)

Tracking

Trunk
x86
Linux

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

9 years ago
We are running into memory problems with dehydra and it's not
clear where the leaks are.  This is the description of the
problem from one of our users:

===================================================================
I'm trying to run the script on a bigger set of files, and it runs
fine, except when I need to run in DEBUG mode.  In that mode, I
start using a lot of memory (>3.5G), and eventually crash the
computer.

I've tracked it down to a memory leak in dehydra's print function
-- or, if not a leak, at least a holding-on to memory.

There's one leak for sure: in dehydra_builtins.c, Print() is
defined with this line:

      fprintf(out, "%s", JS_GetStringBytes(str));

This is a memory leak, since we need to call JS_Free() on the
output of JS_GetStringBytes().  I figured this out by looking at
another(!) copy of Print, defined in js/js.c:

---
       bytes = JS_EncodeString(cx, str);
       if (!bytes)
           return JS_FALSE;
       fprintf(gOutFile, "%s%s", i ? " " : "", bytes);
       JS_free(cx, bytes);
---

But even when I fixed that, I still see usage spike to 3.5G.  In
fact, the memory still spikes even if I don't print() at all -- I
wrote my own print routine in javascript, which just recursed
through the struct and built up the string, but never printed it
--, which leads me to believe that javascript is just caching the
strings that I create when printing, and never garbage collects
them.

While bad.cc is small enough I can get debug info out of it
without crashing my computer, most files in our codebase are not.  I
can't really debug 'in the wild' while this is going on.
===================================================================

I managed to create the same problem with this simple script 

function process_decl(t)
{ 
  print(t);
}

I executed it over a relatively large .ii file (~640Kb) and it
caused cc1plus to grow to 3Gb before I killed it.

I patched various calls to JS_GetStringBytes (below) to free the memory
they allocate, but that's clearly insufficient as I still manage
to make the compiler grow to 3Gb.

Is there any way to trace memory allocation in dehydra and/or
SpiderMonkey?

diff -r 0cb4e99fb036 dehydra_builtins.c
--- a/dehydra_builtins.c        Tue Sep 08 12:58:36 2009 -0700
+++ b/dehydra_builtins.c        Tue Sep 08 18:56:10 2009 -0700
@@ -171,7 +171,9 @@
     JSString *str = JS_ValueToString(cx, argv[i]);
     if (!str)
       return JS_FALSE;
-    fprintf(out, "%s", JS_GetStringBytes(str));
+    char *s = JS_GetStringBytes(str);
+    fprintf(out, "%s", s);
+    JS_free(cx, s);
   }
   fprintf(out, "\n");
   return JS_TRUE;
@@ -281,6 +283,7 @@
           break;
         }
       } while (*str);
+      JS_free(cx, str);
     }
   }
   fflush(stderr);
@@ -319,8 +322,10 @@
                    filename, strerror(errno));
     return JS_FALSE;
   }
-  fwrite (JS_GetStringBytes(str), 1, JS_GetStringLength(str), f);
+  char *s = JS_GetStringBytes(str);
+  fwrite (s, 1, JS_GetStringLength(str), f);
   fclose (f);
+  JS_free(cx, s);
   return JS_TRUE;
 }
(Assignee)

Comment 1

9 years ago
Have you tried calling js_GC on every process_decl(or some other repetitive point?), you'd need to add a binding for it or call it from C. When I wrote dehydra I didn't realize spidermonkey expected me to manually call GC, and I haven't run into a clear testcase like you are describing to test a correct gc strategy on.
(Reporter)

Comment 2

9 years ago
(In reply to comment #1)
> Have you tried calling js_GC on every process_decl(or some other repetitive
> point?), you'd need to add a binding for it or call it from C. When I wrote
> dehydra I didn't realize spidermonkey expected me to manually call GC, and I
> haven't run into a clear testcase like you are describing to test a correct gc
> strategy on.

Thanks Taras.  I'm using the attached patch.  It calls js_GC every 50 invocations from process_{decl,type,function}.  Seems like a good compromise and the testcase that used to take 3Gb now takes ~500Mb.  Running js_GC every time reduces memory consumption to 80Mb, but doubles run times.

Is this acceptable?
(Reporter)

Comment 3

9 years ago
Created attachment 399393 [details] [diff] [review]
Call js_GC() every so often
(Assignee)

Comment 4

9 years ago
I'd rather use a memory-sized-based heuristic. Something like GC every time memory usage doubles?
(Reporter)

Comment 5

9 years ago
(In reply to comment #4)
> I'd rather use a memory-sized-based heuristic. Something like GC every time
> memory usage doubles?

Sure, but I'll need more guidance here.  I tried doing that initially, but I don't know what to call to determine the size of the GC pool.
(Assignee)

Comment 6

9 years ago
try JS_GetGCParameter(rt, JSGC_BYTES)
JS_MaybeGC(cx) is the "standard" heuristic-GC API, give it a try maybe? It has been over-tuned a bit for Firefox, but it could be ok for your embedding.

/be

Comment 8

9 years ago
Awesome - I've been noticing 2GB+ consumption when running over the mozilla tree, too, which makes parallel builds impractical. Would love to see a patch here... if we add an API for calling {GC,MaybeGC} I can update the docs with it.
I hope you can find a few good places to call JS_MaybeGC or JS_GC in dehydra itself, and not let loose the attractive nuisance of a scriptable GC() or maybeGC() function. Java did this and it was abused endlessly.

/be
(Assignee)

Comment 10

9 years ago
so JS_GetStringBytes doesn't leak, it just needs gc to run
(Reporter)

Comment 11

9 years ago
(In reply to comment #9)
> I hope you can find a few good places to call JS_MaybeGC or JS_GC in dehydra
> itself, and not let loose the attractive nuisance of a scriptable GC() or
> maybeGC() function. Java did this and it was abused endlessly.

Agreed.  Doing this inside dehydra itself seems much saner.


Diego.

Comment 12

9 years ago
Note that I already added a JS_MaybeGC to treehydra (it gets called after every treehydra_call_js). We should add something similar after every dehydra callback, I think.
(Reporter)

Comment 13

9 years ago
(In reply to comment #12)

> We should add something similar after every dehydra
> callback, I think.

I'm attaching a patch to do just that.  It limits cc1plus memory consumption to <400Mb on the test case I have and it doesn't seem to have a big impact on runtime.

Is this patch OK?
(Reporter)

Comment 14

9 years ago
Created attachment 399521 [details] [diff] [review]
Call JS_MaybeGC() after every process_{function,decl,type}
(Assignee)

Comment 15

9 years ago
Comment on attachment 399521 [details] [diff] [review]
Call JS_MaybeGC() after every process_{function,decl,type}

looks good to me
Attachment #399521 - Flags: review+
(Assignee)

Comment 16

9 years ago
http://hg.mozilla.org/rewriting-and-analysis/dehydra/rev/882c9c5fe700
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
Comment on attachment 399521 [details] [diff] [review]
Call JS_MaybeGC() after every process_{function,decl,type}

Don't include jsgc.h, it's library-private or "friends only" (heh). It does not contain any JS_MaybeGC or similar API prototypes, you get those via jsapi.h as usual.

/be
(Assignee)

Comment 18

9 years ago
(In reply to comment #17)
> (From update of attachment 399521 [details] [diff] [review])
> Don't include jsgc.h, it's library-private or "friends only" (heh). It does not
> contain any JS_MaybeGC or similar API prototypes, you get those via jsapi.h as
> usual.
> 

thanks
http://hg.mozilla.org/rewriting-and-analysis/dehydra/rev/9f9bb511ad62
(Reporter)

Comment 19

9 years ago
(In reply to comment #18)

> thanks
> http://hg.mozilla.org/rewriting-and-analysis/dehydra/rev/9f9bb511ad62

I had included jsgc.h in dehydra_plugin.c as well.

Comment 21

9 years ago
i'd just like to point out that we actually have documentation at
https://developer.mozilla.org/en/SpiderMonkey/JSAPI_Reference/JS_GetStringBytes and it is very clear that callers don't own the string. I wish we'd consider an API break to add 'const' to its return type :(

That page also indicates that JS_EncodeString is different :)

Updated

8 months ago
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.