The default bug view has changed. See this FAQ.

Write unescape to match spec and document StringBuffer API's memory usage characteristics

RESOLVED FIXED in mozilla11

Status

()

Core
JavaScript Engine
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: cdleary, Assigned: cdleary)

Tracking

unspecified
mozilla11
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

Created attachment 580212 [details] [diff] [review]
Unescape with copy avoidance.

This rewrites the unescape algorithm to match the spec, stepwise. The optimization I've added halves the speed of this microbenchmark on my machine:

(function bench() {
    var start = new Date();
    var array = [];
    for (var i = 0; i < 1000000; ++i) {
        array[i] = unescape("Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc lectus urna, tempus ac pulvinar at, scelerisque non tortor. Vivamus quis elit id elit tempor bibendum eget eget mi. Curabitur enim lacus, euismod sit amet facilisis nec, congue ut felis. Proin posuere blandit dignissim. Donec facilisis lacinia lobortis. Aenean in nulla ut ante rhoncus sagittis. Donec non lectus massa, molestie interdum felis. Mauris turpis diam, ultrices a posuere nec, semper ac sem. Fusce egestas, arcu ac volutpat aliquam, eros nisi feugiat neque, quis viverra magna elit non nibh. Donec ac purus in lorem auctor mattis. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc vitae enim velit, et condimentum quam. Duis consequat gravida feugiat. Sed interdum sagittis ultrices. Duis sit amet congue est. Integer quis velit libero.")
    }
    var end = new Date();
    print(end - start);
})();

Extra documentation per bug 707125 comment 10.
Attachment #580212 - Flags: review?(jwalden+bmo)
Er, halves the execution time, doubles the speed. Not enough coffee yet. ;-)
Review note to self: should undef ENSURE_BUILDING to keep it scoped to that particular function.
Comment on attachment 580212 [details] [diff] [review]
Unescape with copy avoidance.

Review of attachment 580212 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +320,5 @@
> +        jschar c = chars[k];
> +
> +        /* Step 7. */
> +        if (c != '%')
> +            goto step_18;

I am not sure whether this is a really awesome way to write this or a really awful way to write this.  Both?

@@ +3142,5 @@
>          return NULL;
>  
>      /* For medium/big buffers, avoid wasting more than 1/4 of the memory. */
>      JS_ASSERT(capacity >= length);
> +    if (length > CharBuffer::sMaxInlineStorage && capacity - length > (length >> 2)) {

Make that |length / 4| -- reads better, compilers are smart.
Attachment #580212 - Flags: review?(jwalden+bmo) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb342b356f76
Target Milestone: --- → mozilla11
https://hg.mozilla.org/mozilla-central/rev/bb342b356f76
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Comment on attachment 580212 [details] [diff] [review]
Unescape with copy avoidance.

Review of attachment 580212 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstrinlines.h
@@ +57,5 @@
> + * exception report on the context and results in a failed return value.
> + *
> + * Well-sized extractions (which waste no more than 1/4 of their char
> + * buffer space) are guaranteed for strings built by this interface.
> + * See |extractWellSized|.

What does "their char buffer space" mean?  Is it the requested buffer size, or the actual buffer size?  I hope it's the latter.

Let's be pessimistic and assume it's the former.  If we requested 130 bytes, jemalloc will round up to 256 bytes.  If str_unescape shrinks the string down to 100 bytes, that's less than 1/4 of the *requested* space wasted, but more than 1/2 of the *actual* space wasted, which will trigger the assertion failure I fixed in bug 707125.

(This kind of subtlety is exactly why I avoided using a StringBuffer in bug 707125 -- even after you added documentation and passed review, it's still not clear to me whether my requirements are met.)
(In reply to Nicholas Nethercote [:njn] from comment #6)

Thanks for clarifying. We're basing it on the requested buffer size, because we weren't assuming any knowledge of the underlying memory allocator's rounding policies. I'll create a patch to fix that in a moment.

On the plus side, by consolidating this into the implementation of StringBuffer we've obviated all other uses of StringBuffer that may have triggered the assertion that you mention.
> Thanks for clarifying. We're basing it on the requested buffer size, because
> we weren't assuming any knowledge of the underlying memory allocator's
> rounding policies. I'll create a patch to fix that in a moment.

That'll be tricky because moz_malloc_usable_size isn't usable in the JS engine.

I suggest that you instead add a flag to StringBuffer::finishString() that lets you request that the final allocation be as small as possible, i.e. the request matches the string length exactly, without the up-to-25% waste.
(In reply to Nicholas Nethercote [:njn] from comment #8)
> I suggest that you instead add a flag to StringBuffer::finishString() that
> lets you request that the final allocation be as small as possible, i.e. the
> request matches the string length exactly, without the up-to-25% waste.

I don't fully understand the context for the assertion you're describing (bug 707125 doesn't seem to describe where the assertion is coming from or have a test that I can try to run), but I would think that this would have to be the default. You said:

assertion in the "string-chars" memory reporter's sanity checking of |usable| and |computedSize|.

StringBuffer is used to create a bunch of other strings in the engine as well -- why would they be exempt from this assertion? (From your comment I'm guessing that moz_malloc_usable_size is more complicated than rounding up the capacity to the nearest power of two, involving various bits of jemalloc arena size information and such.)
Here's the checking code:
http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsIMemoryReporter.idl#383

It's used for every single memory reporter.  Basically, it assumes that no allocator will ever hand back more than twice the requested size.  This assumption is violated in cases like this one where you can end up deliberately over-allocating... but such cases are rare, and the checking has found numerous bugs in existing code (both in memory reporters and in other code).  So I'd strongly prefer to avoid tolerating this kind of over-allocation.


> StringBuffer is used to create a bunch of other strings in the engine as
> well -- why would they be exempt from this assertion?

Good point, having no wasted space will have to be the default, because it's conceivable that any string created with a StringBuffer will end up being measured (and thus checked) by a memory reporter.


> (From your comment I'm
> guessing that moz_malloc_usable_size is more complicated than rounding up
> the capacity to the nearest power of two, involving various bits of jemalloc
> arena size information and such.)

It is more complicated.  See http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c#47 for details.  (Even that comment is inaccurate because we've made changes to our copy of jemalloc.)

(Oh, my example before with 130 bytes is bogus because that gets rounded up to 144.  But there are other sizes where the same logic applies, e.g. 514 bytes gets rounded up by jemalloc to 1024.)
Hmm, I'm a worried that will lead to a lot more realloc calls. I'll instrument my browser and check out whether that worry is at all founded tomorrow.
Blocks: 710883

Updated

5 years ago
Depends on: 778603
You need to log in before you can comment on or make changes to this bug.