Last Comment Bug 708873 - Write unescape to match spec and document StringBuffer API's memory usage characteristics
: Write unescape to match spec and document StringBuffer API's memory usage cha...
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla11
Assigned To: Chris Leary [:cdleary] (not checking bugmail)
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on: CVE-2012-4204
Blocks: 710883
  Show dependency treegraph
 
Reported: 2011-12-08 15:23 PST by Chris Leary [:cdleary] (not checking bugmail)
Modified: 2012-08-01 10:43 PDT (History)
2 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Unescape with copy avoidance. (8.68 KB, patch)
2011-12-08 15:23 PST, Chris Leary [:cdleary] (not checking bugmail)
jwalden+bmo: review+
Details | Diff | Splinter Review

Description Chris Leary [:cdleary] (not checking bugmail) 2011-12-08 15:23:02 PST
Created attachment 580212 [details] [diff] [review]
Unescape with copy avoidance.

This rewrites the unescape algorithm to match the spec, stepwise. The optimization I've added halves the speed of this microbenchmark on my machine:

(function bench() {
    var start = new Date();
    var array = [];
    for (var i = 0; i < 1000000; ++i) {
        array[i] = unescape("Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc lectus urna, tempus ac pulvinar at, scelerisque non tortor. Vivamus quis elit id elit tempor bibendum eget eget mi. Curabitur enim lacus, euismod sit amet facilisis nec, congue ut felis. Proin posuere blandit dignissim. Donec facilisis lacinia lobortis. Aenean in nulla ut ante rhoncus sagittis. Donec non lectus massa, molestie interdum felis. Mauris turpis diam, ultrices a posuere nec, semper ac sem. Fusce egestas, arcu ac volutpat aliquam, eros nisi feugiat neque, quis viverra magna elit non nibh. Donec ac purus in lorem auctor mattis. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc vitae enim velit, et condimentum quam. Duis consequat gravida feugiat. Sed interdum sagittis ultrices. Duis sit amet congue est. Integer quis velit libero.")
    }
    var end = new Date();
    print(end - start);
})();

Extra documentation per bug 707125 comment 10.
Comment 1 Chris Leary [:cdleary] (not checking bugmail) 2011-12-08 15:23:57 PST
Er, halves the execution time, doubles the speed. Not enough coffee yet. ;-)
Comment 2 Chris Leary [:cdleary] (not checking bugmail) 2011-12-08 15:47:07 PST
Review note to self: should undef ENSURE_BUILDING to keep it scoped to that particular function.
Comment 3 Jeff Walden [:Waldo] (remove +bmo to email) 2011-12-08 16:16:29 PST
Comment on attachment 580212 [details] [diff] [review]
Unescape with copy avoidance.

Review of attachment 580212 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +320,5 @@
> +        jschar c = chars[k];
> +
> +        /* Step 7. */
> +        if (c != '%')
> +            goto step_18;

I am not sure whether this is a really awesome way to write this or a really awful way to write this.  Both?

@@ +3142,5 @@
>          return NULL;
>  
>      /* For medium/big buffers, avoid wasting more than 1/4 of the memory. */
>      JS_ASSERT(capacity >= length);
> +    if (length > CharBuffer::sMaxInlineStorage && capacity - length > (length >> 2)) {

Make that |length / 4| -- reads better, compilers are smart.
Comment 4 Chris Leary [:cdleary] (not checking bugmail) 2011-12-08 16:32:47 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb342b356f76
Comment 5 Ed Morley [:emorley] 2011-12-09 06:45:39 PST
https://hg.mozilla.org/mozilla-central/rev/bb342b356f76
Comment 6 Nicholas Nethercote [:njn] 2011-12-12 18:58:58 PST
Comment on attachment 580212 [details] [diff] [review]
Unescape with copy avoidance.

Review of attachment 580212 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstrinlines.h
@@ +57,5 @@
> + * exception report on the context and results in a failed return value.
> + *
> + * Well-sized extractions (which waste no more than 1/4 of their char
> + * buffer space) are guaranteed for strings built by this interface.
> + * See |extractWellSized|.

What does "their char buffer space" mean?  Is it the requested buffer size, or the actual buffer size?  I hope it's the latter.

Let's be pessimistic and assume it's the former.  If we requested 130 bytes, jemalloc will round up to 256 bytes.  If str_unescape shrinks the string down to 100 bytes, that's less than 1/4 of the *requested* space wasted, but more than 1/2 of the *actual* space wasted, which will trigger the assertion failure I fixed in bug 707125.

(This kind of subtlety is exactly why I avoided using a StringBuffer in bug 707125 -- even after you added documentation and passed review, it's still not clear to me whether my requirements are met.)
Comment 7 Chris Leary [:cdleary] (not checking bugmail) 2011-12-12 20:27:27 PST
(In reply to Nicholas Nethercote [:njn] from comment #6)

Thanks for clarifying. We're basing it on the requested buffer size, because we weren't assuming any knowledge of the underlying memory allocator's rounding policies. I'll create a patch to fix that in a moment.

On the plus side, by consolidating this into the implementation of StringBuffer we've obviated all other uses of StringBuffer that may have triggered the assertion that you mention.
Comment 8 Nicholas Nethercote [:njn] 2011-12-12 20:38:38 PST
> Thanks for clarifying. We're basing it on the requested buffer size, because
> we weren't assuming any knowledge of the underlying memory allocator's
> rounding policies. I'll create a patch to fix that in a moment.

That'll be tricky because moz_malloc_usable_size isn't usable in the JS engine.

I suggest that you instead add a flag to StringBuffer::finishString() that lets you request that the final allocation be as small as possible, i.e. the request matches the string length exactly, without the up-to-25% waste.
Comment 9 Chris Leary [:cdleary] (not checking bugmail) 2011-12-12 20:47:34 PST
(In reply to Nicholas Nethercote [:njn] from comment #8)
> I suggest that you instead add a flag to StringBuffer::finishString() that
> lets you request that the final allocation be as small as possible, i.e. the
> request matches the string length exactly, without the up-to-25% waste.

I don't fully understand the context for the assertion you're describing (bug 707125 doesn't seem to describe where the assertion is coming from or have a test that I can try to run), but I would think that this would have to be the default. You said:

assertion in the "string-chars" memory reporter's sanity checking of |usable| and |computedSize|.

StringBuffer is used to create a bunch of other strings in the engine as well -- why would they be exempt from this assertion? (From your comment I'm guessing that moz_malloc_usable_size is more complicated than rounding up the capacity to the nearest power of two, involving various bits of jemalloc arena size information and such.)
Comment 10 Nicholas Nethercote [:njn] 2011-12-12 21:05:20 PST
Here's the checking code:
http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsIMemoryReporter.idl#383

It's used for every single memory reporter.  Basically, it assumes that no allocator will ever hand back more than twice the requested size.  This assumption is violated in cases like this one where you can end up deliberately over-allocating... but such cases are rare, and the checking has found numerous bugs in existing code (both in memory reporters and in other code).  So I'd strongly prefer to avoid tolerating this kind of over-allocation.


> StringBuffer is used to create a bunch of other strings in the engine as
> well -- why would they be exempt from this assertion?

Good point, having no wasted space will have to be the default, because it's conceivable that any string created with a StringBuffer will end up being measured (and thus checked) by a memory reporter.


> (From your comment I'm
> guessing that moz_malloc_usable_size is more complicated than rounding up
> the capacity to the nearest power of two, involving various bits of jemalloc
> arena size information and such.)

It is more complicated.  See http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c#47 for details.  (Even that comment is inaccurate because we've made changes to our copy of jemalloc.)

(Oh, my example before with 130 bytes is bogus because that gets rounded up to 144.  But there are other sizes where the same logic applies, e.g. 514 bytes gets rounded up by jemalloc to 1024.)
Comment 11 Chris Leary [:cdleary] (not checking bugmail) 2011-12-12 23:58:13 PST
Hmm, I'm a worried that will lead to a lot more realloc calls. I'll instrument my browser and check out whether that worry is at all founded tomorrow.

Note You need to log in before you can comment on or make changes to this bug.