Closed Bug 120977 Opened 23 years ago Closed 13 years ago

Investigate reference counting for JS strings

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: bratell, Unassigned)

References

Details

(Keywords: memory-footprint, perf)

As shown in bug 116711 we can sometimes allocate way too much memory for strings
because we don't know that we can reuse the same buffer over and over again.
This is not so much a memory issue as a performance issue. Allocating and
freeing memory to/from the heap is very expensive compared to many of the small
operations usually connected to the allocations making us suck in the test cases
when compared that MSIE (that must be reusing buffers much better than us).

Changing to reference counting for instance strings could make it much easier to
reuse memory but at the same time it adds a lot of addrefs and releases
that could slow some things down so it's not obvious what's best in real world
applications.

If I understood Brendan correctly, this would need a lot of work that would have
to be done post 1.0. Just filing this bug as a reminder.
That should be bug 117611. 
Blocks: 117611
Keywords: perf
Giving to khanson.

/be
Assignee: rogerl → khanson
Summary: Investigate reference counting for some js objects → Investigate reference counting for JS strings
As my test that brute-force freed the buffer resulting from + before it could be
used shows (see third paragraph in
http://bugzilla.mozilla.org/show_bug.cgi?id=117611#c23), a lot of the cost is in
growing the malloc heap (process data segment), not in lack of "buffer reuse".

I think that the heap growth penalty also accounts for the program being slower
on subsequent runs.  At least on Linux, I believe the VM system (and maybe the
scheduler) is penalizing the runs after the first, which naively grows its data
segment to allocate all the buffers needed for the (useless) operator+ results.
 Shaver may know more.

/be
Target Milestone: --- → Future
*** Bug 314890 has been marked as a duplicate of this bug. ***
Assignee: khanson → general
QA Contact: pschwartau → general
Igor you want to take this :-)?
Flags: blocking1.9?
OS: Windows 2000 → All
Hardware: PC → All
Target Milestone: Future → ---
Keywords: footprint
http://www.yafla.com/dforbes/String_Concatenation_and_Immutable_Strings_Speeding_Spidermonkey#a412

Has some interesting notes.  Likely not to make the complete change in this bug for B2 - but if we can do something that'd be great.
schrep:  Were you aiming that at bug 157334?
(In reply to comment #8)
> schrep:  Were you aiming that at bug 157334?
> 

Kinda - was suggesting smaller optimization than this one for b3..
It seems like the somewhat naive overallocation approach would cost us (maybe heavily) in footprint, though.
Comment 8 is right, let's not mix issues.

Re: comment 10 -- I commented on that blog. The code shown there has a couple of bugs, and we would not want to round up to powers of two past a certain point. We often go linear at a threshold: 2, 4, 8, ... 1024, 2048, 3072, 4096, ... although of course you can measure O(n^2) compounding of the linear cost function if you try hard enough.

/be
Taking discussion of string-growth to bug 157334.
Given comment 12 moving off blocking list
Flags: blocking1.9? → blocking1.9-
This bug pre-dates dependent strings, which allow concatenation (a = b + c) to turn b into a dependent (prefix) string based on the new buffer for a. This still requires allocating a new header (JSString) for a, leaving the mutated b header alive. But it is winning as an improvement over independent flat strings. See bug 56940.

Ref-counting would allow cases where b's header is referenced only by b, and the b variable dies or is killed soon, to recycle the memory for b's header quickly, compared to GC. For a case like x += y, where the pre-concatenate header denoted by x has a ref-count of 1, ref-counted string runtimes can mutate the header and realloc its buffer. If your malloc isn't good at it, you can over-allocate too.

Slicing strings can result in dependent strings which, with ref-counting can inherit their base string's buffer, freeing the base header. Patterns such as

  s = "hello, world";
  t = s.slice(2,5);
  s = null;
  t = "ho" + t + "w land";

with ref-counting can free s's header and give its buffer to t when the last assignment is evaluated, and even the resulting "hollow land" in the same space. Our current approach has to allocate a new GC thing for every concatenation result, although shaver will soon fix the last line to do an N-way JSOP_CONCAT.

Ref-counting adds pervasive small cost to the runtime, and requires C++ smart pointers.

/be
Assignee: general → brendan
I'm not flush with time for this, and others may have time, so throwing back into the pool. Not sure it wins vs. GC.

Luke and I talked about a static analysis that finds linearity -- can tell when the only ref to a string is killed by assignment and can claim the buffer and header. That seems more promising than ref-counting all the time, especially as we aim (post Fx4) for fast mostly-copying (due to our safety-first conservative stack scanning), single-threaded generational GC.

/be
Assignee: brendan → general
Agreed :)
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.