LLVM-to-JS converter causing SM to use lots of memory

ASSIGNED
Unassigned

Status

()

Core
JavaScript Engine
ASSIGNED
7 years ago
3 years ago

People

(Reporter: azakai, Unassigned)

Tracking

(Depends on: 2 bugs)

Other Branch
x86
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 -)

Details

Attachments

(4 attachments)

(Reporter)

Description

7 years ago
Created attachment 472229 [details]
Test code archive

Attached is code that when run in SpiderMonkey, takes over 10X the memory that v8 uses. SpiderMonkey (a recent build of JaegerMonkey) goes up to around 500MB for a few seconds, then stabilizes at 411MB. v8 takes less than 40MB.

Instructions:

1. Unpack the attachment into a directory.
2. In that directory, run
     PATH-TO-JS-ENGINE/js < src.cpp.o.llvm

Other details: The attached code is emscripten (llvm->js compiler) run on code adapted from cube 2 (FOSS game engine) and compiled to llvm. Tested on Ubuntu 32bit, recent Jaegermonkey build, recent v8 build.
(Reporter)

Comment 1

7 years ago
Some more info:

1. The initial memory bump mentioned before happens in the Tokenizer, parser.js:360. (That function is called once per line of llvm, so over 14,000 times.)

2. Enabling relooping (RELOOPING=1 in settings.js, this leads to parser.js:1172 being run) makes things worse. v8 goes to 70MB, while SpiderMonkey takes more memory than my laptop can handle, killed it at somewhere over 700MB.
(Reporter)

Comment 2

7 years ago
Sorry, that command should have been

  PATH-TO-JS-ENGINE/js parser.js < src.cpp.o.llvm
(In reply to comment #0)
>
> 1. Unpack the attachment into a directory.
> 2. In that directory, run
>      PATH-TO-JS-ENGINE/js < src.cpp.o.llvm

AFAICT that should be:

       PATH-TO-JS-ENGINE/js jsparser.cpp < src.cpp.o.llvm

When I run that it spews a lot of output and then halts with

  utility.js:13: out of memory

after about 10s.  This is on my desktop linux machine which has 4GiB of memory.  Running 'top' at the same time the virtual size exceeds 4GiB and the resident (physical) size exceeds 3GiB.

Line 13 in utility.js is at the bottom of the function lineify():

function dump(item) {
  function lineify(text) {
    var ret = '';
    while (text.length > 0) {
      if (text.length < 80) {
        ret += text;
        return ret;
      }
      var subText = text.substring(60, 80);
      var index = 61+Math.max(subText.indexOf(','), subText.indexOf(']'), subTex
t.indexOf('}'), 21);
      ret += text.substr(0,index) + '\n';
      text = '// ' + text.substr(index);      // LINE 13
    }
  }

I'm about to attach an abridged heap profile from Massif.  At termination almost 2 GiB of heap memory had been allocated (it aborted quicker under Massif than it would natively because Valgrind/Massif has to track heaps of extra stuff).  It appears the culprit is line 397 of jsstr.cpp (in TM 52847:857ab0bfeb22) which is in js_ConcatStrings() -- over 99% of the allocated memory is from that line.  I suspect we're doing something silly if V8 shows much smaller memory usage.

(For those unfamiliar with Massif's output, http://www.valgrind.org/docs/manual/ms-manual.html#ms-manual.theoutputgraph explains how to read it.)
Alan Pierce implemented ropes, but I can't work out what his bugzilla email is.  If someone else knows can they please CC him?
Summary: Example code that takes an extreme amount of memory → string concatenation using huge amounts of memory, much more than V8
Created attachment 472312 [details]
Massif profile
Nominating for blocking2.0 because string concatenation is important.
blocking2.0: --- → ?
Actually, here's what's going on:

- If you run 'js' or 'js -m', it runs to completion successfully.

- If you run 'js -j', it aborts early with an OOM on utility.js:13.

- If you run 'js -j -m', it aborts a little later with this:

    uncaught exception: // failzzzzzzzzzzzzzz undefined ::: {"text":"i32"}


So the OOM is a red herring;  there's a trace JIT bug that is independent of azakai's original complaint.
blocking2.0: ? → ---
Summary: string concatenation using huge amounts of memory, much more than V8 → problems with LLVM-to-JS converter and the trace JIT
I opened bug Bug 596730 to deal with the tracer bug.  This bug can be retasked to cover its original issue, ie. that it's using lots of memory.  Apologies for any confusion.  Anyone looking at this bug would be advised to use 'js -m' (ie. the methodjit only) until bug 596730 is addressed.
Summary: problems with LLVM-to-JS converter and the trace JIT → LLVM-to-JS converter causing SM to use lots of memory
(Reporter)

Comment 9

7 years ago
Created attachment 483896 [details]
Better testcase

This is a better testcase. It reaches the problem much faster - just a few seconds, and it takes over 1.5GB of RAM (at which point my laptop runs out of memory). Runs ok in v8, using very little memory.

Run it with

  ..../js/src/js compiler.js < libLinearMath.ll 

The same behavior happens with -m, -j, -m -j and with no parameters.

Note that an exception is thrown in the program. That is what should happen.

Stack trace taken when memory usage surges (when run without -m or -j):

#0  0xb753d9b1 in memcpy () from /lib/tls/i686/cmov/libc.so.6
#1  0x081c9b4c in JSString::flatten (this=0xb598a1a0) at /usr/include/bits/string3.h:52
#2  0x081caedd in js_NewDependentString (cx=0x9909860, base=0xb598a1a0, start=60, length=20) at jsstr.h:247
#3  0x081ce50b in str_substring (cx=0x9909860, argc=<value optimized out>, vp=0xb6f942e0) at jsstr.cpp:967
#4  0x0834438a in js::Interpret (cx=0x9909860, entryFrame=0xb6f941b0, inlineCallCount=2, interpFlags=0) at jscntxtinlines.h:652
#5  0x0810c239 in js::RunScript (cx=0x9909860, script=0x992c6b0, fp=0xb6f941b0) at jsinterp.cpp:638
#6  0x08110e9c in js::Invoke (cx=0x9909860, argsRef=@0xbf959238, flags=<value optimized out>) at jsinterp.cpp:747
#7  0x0808e288 in js::InvokeSessionGuard::invoke (this=0xbf959238, cx=0x9909860) at jsinterpinlines.h:553
#8  0x080892b8 in array_extra (cx=0x9909860, mode=FOREACH, argc=<value optimized out>, vp=0xb6f94158) at jsarray.cpp:2721
#9  0x0834438a in js::Interpret (cx=0x9909860, entryFrame=0xb6f94030, inlineCallCount=2, interpFlags=0) at jscntxtinlines.h:652
#10 0x0810c239 in js::RunScript (cx=0x9909860, script=0x991d188, fp=0xb6f94030) at jsinterp.cpp:638
#11 0x0810f286 in js::Execute (cx=0x9909860, chain=0xb6d02028, script=0x991d188, prev=0x0, flags=0, result=0x0) at jsinterp.cpp:998
#12 0x0806dc63 in JS_ExecuteScript (cx=0x9909860, obj=0xb6d02028, script=0x991d188, rval=0x0) at jsapi.cpp:4844
#13 0x0805245d in Process (cx=0xb6d02028, obj=0x0, filename=0xbf95b701 "compiler.js", forceTTY=0) at js.cpp:445
#14 0x08052be9 in Shell (cx=0x8052e50, argc=1, argv=0xbf959bb8, envp=0xbf959bb8) at js.cpp:879
#15 0x08052e50 in main (argc=138334220, argv=0x8184ca0, envp=0xbd28630f) at js.cpp:5381
(Reporter)

Comment 10

7 years ago
Created attachment 485665 [details]
additional testcase, different but related error

A related testcase - newer version of emscripten, running on a much larger input file (the bullet physics engine).

Run with

  ../tracemonkey/js/src/js -m compiler.js < bulletTest.ll

(-m/-j etc. don't make a difference here). The output is something like

  // LLVM_STYLE: old
  // Solving Intertyper...
  // Cycle start, items: 
  // Processing using LineSplitter: 1
  // New results: 0 out of 0
  // Working on Intertyper, so far 4.456 seconds.
  // Processing using Tokenizer: 176757
  intertyper.js:175: out of memory

'out of memory' is thrown, however not a lot of memory is actually used.

The 'Tokenizer' component in which the problem occurs does a lot of intensive operations on strings. It was rewritten to use regexes in this version.

For comparison, V8 runs the script properly, finishing the Tokenizer and all other passes.
blocking2.0: --- → ?
(In reply to comment #3)

> stuff).  It appears the culprit is line 397 of jsstr.cpp (in TM
> 52847:857ab0bfeb22) which is in js_ConcatStrings() -- over 99% of the allocated
> memory is from that line.  I suspect we're doing something silly if V8 shows
> much smaller memory usage.

Yeah.  It's the 4th clause of js_ConcatStrings:

    } else {
        /* Neither child is a rope: need to make a new buffer. */
        printf("neither is a rope: %u + %u -> %u\n", leftLen, rightLen, length);
        size_t capacity;
        size_t allocSize = RopeAllocSize(length, &capacity);
        buf = (JSRopeBufferInfo *) cx->malloc(allocSize);
        if (!buf)
            return NULL;
        buf->capacity = capacity;
    }

The printf produces a whole bunch of these, using the example in
comment 9:

  neither is a rope: 3 + 425723 -> 425726
  neither is a rope: 3 + 425644 -> 425647
  neither is a rope: 3 + 425565 -> 425568
  neither is a rope: 3 + 425486 -> 425489
  neither is a rope: 3 + 425407 -> 425410
  neither is a rope: 3 + 425328 -> 425331

Not sure what that signifies.  Repeated concatenation of small strings
onto a big string, and the big string never gets ropified?
Duplicate of this bug: 596730
(In reply to comment #12)
> *** Bug 596730 has been marked as a duplicate of this bug. ***

That was a mistake, please ignore.

Updated

7 years ago
blocking2.0: ? → betaN+
I'll take this one.
Assignee: general → nnethercote
Status: NEW → ASSIGNED
Yuk.  This is cause by a nasty interaction between two string optimizations
-- dependent strings (used in substr()) and ropes (use in concatenation).

The important function is in utility.js:

  function lineify(text) {
    var ret = '';
    while (text.length > 0) {
      if (text.length < 80) {
        ret += text;
        return ret;
      } 
      var subText = text.substring(60, 80);
      var index = 61+Math.max(subText.indexOf(','), subText.indexOf(']'), subTex
t.indexOf('}'), 21);
      ret += text.substr(0,index) + '\n';       // OUCH 1
      text = '// ' + text.substr(index);        // OUCH 2
    } 
  }   
    
'text' is a very long string, eg. 500,000 chars.  The aim of lineify() is to
create a new string that is the same as 'text' but with line breaks every
~80 chars.  (It's a bit buggy in where it chooses to split, but that doesn't
matter for the purposes of this bug.)

Consider the line marked "OUCH 2".  The substr() call creates a new
dependent string.  Then the concatenation creates a rope that has two leaf
nodes:  the '// ' and the dependent string.  This requires allocating a big
buffer.  The next time around the loop, the rope has to be flattened before
we can do the substr() call.

So we allocate a big buffer for the first loop iteration, then a slightly
smaller one for the next iteration, etc, etc.  At first I thought that bug
609440 might fix this, but all it would do is delay each allocation from the
concat to the flattening in the subsequent loop iteration.

But "OUCH 2" is clobbering 'text' each time around -- why isn't GC cleaning
this up?  It's because on line "OUCH 1" the text.substr() call is creating a
(small) dependent string, keeping that particular (large) 'text' alive.
Furthermore, all these substrings are concatenated lazily (via ropes)
keeping them alive longer than if we concatenated eagerly.

So we end up creating all these huge flat strings and holding onto a short
substring from each one, thus keeping the huge string alive, and avoid
combining them for as long as possible.

As for a fix... maybe js_ConcatStrings() should avoid being lazy if
one of the operands is a dependent string that keeps a large string alive.
Or maybe we shouldn't bother trying to fix?  It's an unusual code pattern.

Comment 16

7 years ago
Incredible.  Great job uncovering the reason!

Just to check my understanding, if you threw in "text.indexOf('')" at the end of the loop, this won't use so much memory, yes?

Instead of coming up with a heuristic for when to eagerly flatten (which itself could lead do different pathological cases), an alternative is to soup up substr and indexOf to not flatten.  This fits in with general plans mentioned in bug 609440 to avoid calling flatten so much in the JS engine.  indexOf would be easy (extraordinarily easy -- we already have RopeMatch!), substr would require something to the effect of "dependent rope nodes" which would complicate JSString::flatten a bit.  All doable, though.

Putting in the "text.indexOf('')", do we use memory comparable to V8 and run in comparable time?  If they have rope versions of indexOf and substr, they will be much much faster.  (They could also be faster due to generational gc keeping them in cache :-/ )

Comment 17

7 years ago
I had one more idea: during the marking phase for a rope, where we already traverse all the nodes (and already test for dependent string), we could keep a running total of how much total capacity this rope is keeping pinned down and if (1) that capacity is > a big constant and (2) the lengthe of the rope is < a fixed percentage of that, we could flatten.
(In reply to comment #16)
> 
> Just to check my understanding, if you threw in "text.indexOf('')" at the end
> of the loop, this won't use so much memory, yes?

No, that doesn't help.  That causes text to be flattened at the end of the loop, but without that addition it'll be flattened at the start of the next loop anyway due to the |text.substring(60, 80)|.

If you add a |ret.indexOf('')| it avoids the memory blow-up, because it flattens |ret|.  It's still pretty slow, though, because it still allocates lots of big buffers.


> Instead of coming up with a heuristic for when to eagerly flatten (which itself
> could lead do different pathological cases), an alternative is to soup up
> substr and indexOf to not flatten.  This fits in with general plans mentioned
> in bug 609440 to avoid calling flatten so much in the JS engine.  indexOf would
> be easy (extraordinarily easy -- we already have RopeMatch!), substr would
> require something to the effect of "dependent rope nodes" which would
> complicate JSString::flatten a bit.  All doable, though.

Mmm, but it's more string variants, and we already have plenty.  An alternative might be to make substring() create a flat string by traversing the rope without flattening it?


> Putting in the "text.indexOf('')", do we use memory comparable to V8 and run in
> comparable time?  If they have rope versions of indexOf and substr, they will
> be much much faster.

I haven't tried V8 on it.  Sounds like you know for a fact that V8 has ropes?

Comment 19

7 years ago
(In reply to comment #18)
> If you add a |ret.indexOf('')| it avoids the memory blow-up

Oops, yes, I meant 'ret'.

> Mmm, but it's more string variants, and we already have plenty.  An alternative
> might be to make substring() create a flat string by traversing the rope
> without flattening it?

I generally fight complexity, but of the global/pervasive variety.  As long as all the variations on strings are encapsulated behind a simple interface (both C++ and from generated jit code) and the increase in complexity is restricted to js_ConcatStrings/JSString::flatten (which are way simpler after bug 609440, btw), I don't see any problem.  Having substring() not flatten its input would save 1 giant flattening, but it still leaves one giant copy per iteration that a fully rope-ified substring() would avoid.

> I haven't tried V8 on it.  Sounds like you know for a fact that V8 has ropes?

I believe they have a segmented string data structure; I don't know how much it resembles our own.
Thinking some more, there are two problems here.  First, calling substr() on a rope requires it to be flattened, causing extra allocations and copying.  Second, dependent strings can cause large amounts of dead string memory to be held onto.

In this test case, fixing the first problem (as Luke proposes) means the second problem is avoided.  That's worth doing, and I'll give some thought how best to proceed.

But the second problem can strike independent of the first.  Eg. it'll happen any time a program gets a small substring from a large string.  I don't know how pervasive it is in practice.  I wonder if it's possible during GC to detect this case and convert the substring into a standalone flat string and reclaim the large string.

Comment 21

7 years ago
Did you, by chance, see comment 17?
Depends on: 609440
Depends on: 615280
Depends on: 616910

Comment 22

7 years ago
(In reply to comment #20)
> Thinking some more, there are two problems here.  First, calling substr() on a
> rope requires it to be flattened,

This suggests to use short strings for rope substrings and avoid rope flattening. As the short string can hold up to 11 chars this should cover many practical cases. I filed bug 616941 about it.
At this point, I don't think we can block Fx4 on a 500MB memory blowup on a highly advanced JS application.
blocking2.0: betaN+ → -
(In reply to comment #21)
> Did you, by chance, see comment 17?

Definitely -- a great idea we discussed back when dependent strings first went in (9 years ago? something like that).

This seems worth filing separately as a blocker for this bug, unless it fits here.

/be
(In reply to comment #24)
> 
> This seems worth filing separately as a blocker for this bug, unless it fits
> here.

Already done -- bug 616910.
Assignee: n.nethercote → nobody
You need to log in before you can comment on or make changes to this bug.