measure/optimize unnecessary string temporaries created then passed to js_AtomizeString

RESOLVED FIXED in mozilla26

Status

()

defect
RESOLVED FIXED
8 years ago
6 years ago

People

(Reporter: luke, Assigned: yaron.tausky)

Tracking

unspecified
mozilla26
All
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [good first bug][mentor=luke@mozilla.com])

Attachments

(3 attachments, 6 obsolete attachments)

Reporter

Description

8 years ago
Grepping for js_AtomizeString in js/src show several places where we create a string and then pass it directly into js_AtomizeString.  Since js_AtomizeString never atomizes its argument (even if it needs to create a new atom string!), this is a potentially large source of unnecessary garbage and wasted time.  I think in most cases its not completely trivial to just call js_AtomizeChars instead (which skips the intermediate string temporary), thus step 1 of this bug would be to add a simple counter to each site where we do this and see how much we are wasting in practice by browsing around and running benchmarks.  If any hotspots are found, we should try to replace them with js_AtomizeChars.
I'd like to poke at this, how should I get started?

Comment 2

7 years ago
Posted patch patch draft (obsolete) — Splinter Review
It appears most of the unnecessary allocations are done through the js_ValueToAtom call, which for non strings (i.e. int32 and number types) will perform the conversion to JSString (with the obligatory heap allocations) and then atomize.

This is a first attempt at changing this (plus some other changes that are more straightforward).

There is some (unnecessary?) code duplication in the js_IntToAtom and js_NumberToAtom methods pertaining to the cached values.
Attachment #600722 - Flags: review?(luke)
Ioannis, have you done any measurements to see how many allocations this avoids?  How many and what proportion of js_AtomizeString calls are affected when loading some popular web sites would be a useful metric.
Reporter

Comment 4

7 years ago
I looked over the patch quickly and, at a micro-level, it looks like good code.  At a high-level, though, I am a bit worried about the code duplication.  One thing I was thinking: what if js_AtomizeString atomized the given string in-place.  That would require duplicating some logic from AtomizeInline (to make a version that took a JSString), but with all the recent refactorings, the new version should be fairly simple.  This would also help some cases that the patch doesn't help (like JSFlatString::toPropertyName). 

With js_AtomizeString in place, there are still cases where we'd create a temporary string wastefully (when there was already an atom), but perhaps, as njn suggested, we should measure and see how often that happens.  If so, then adding a ToAtom path would make sense like you've essentially done in the patch.

How does that sound?
Reporter

Comment 5

7 years ago
Comment on attachment 600722 [details] [diff] [review]
patch draft

Canceling review until further comment.
Attachment #600722 - Flags: review?(luke)
Assignee

Comment 6

6 years ago
Since atomizing in-place turned out to be impractical (because atoms have to be in the atoms zone), back to the previous approach it is.

I tested two use cases and counted hits only when the string wasn't an atom to begin with. The first one, loading the NYT front page and scrolling to the bottom of the page, gave the following hit counts:
    893 js/src/jsclone.cpp:1204
    674 js/src/jsatom.cpp:464
    402 js/src/builtin/RegExp.cpp:277
     63 js/src/jsstr.cpp:1711
     61 js/src/frontend/FoldConstants.cpp:601
     59 js/src/frontend/FoldConstants.cpp:624
The second, logging in to Facebook, posting something, sending some messages and finally logging out, gave these results:
  22268 js/src/jsatom.cpp:464
   3606 js/src/builtin/RegExp.cpp:277
    466 js/src/jsstr.cpp:1711
    456 js/src/jsstr.cpp:2175
    204 js/src/jsclone.cpp:1204
     77 js/src/frontend/FoldConstants.cpp:601
     59 js/src/frontend/FoldConstants.cpp:624
Assignee

Comment 7

6 years ago
I ran the Facebook case again, printing the type information for values that get coerced into temporary strings, and got these results:
  13663 js/src/jsatom.cpp:463 Int32
   3715 js/src/builtin/RegExp.cpp:276 String
   1085 js/src/jsatom.cpp:463 Object
    896 js/src/jsatom.cpp:463 Double
    482 js/src/jsstr.cpp:2189
    238 js/src/jsstr.cpp:1710 String
    170 js/src/jsclone.cpp:1204
     77 js/src/frontend/FoldConstants.cpp:601
     59 js/src/frontend/FoldConstants.cpp:624

The hottest spot is in js::ToAtom and it seems that it would benefit from a dedicated path. There's nothing to do for RegExp.cpp since it only gets strings as parameters. Likewise for the first occurrence in jsstr.cpp, but it would be possible to spare the creation of dependent strings in the second one (with 482 hits). The rest seem insignificant, even though they're relatively simple (and in Ioannis's patch already).
I'm not sure about the significance of these numbers in practice, though, so I'd be glad to hear some opinions about it.
>   13663 js/src/jsatom.cpp:463 Int32
>    3715 js/src/builtin/RegExp.cpp:276 String
>    1085 js/src/jsatom.cpp:463 Object

These are counts, right?  It would be interesting to know how big these strings are.  Perhaps you could increment the counter by the length each time instead of incrementing by one, and then report both the count and the average length?  If some of these strings are big that makes it more interesting.
Assignee

Comment 9

6 years ago
Okay, here's the new measurement:
  13016 js/src/jsatom.cpp:463 Int32 3.0
   5278 js/src/builtin/RegExp.cpp:276 String 12.8
   1868 js/src/jsatom.cpp:463 Object 6.9
    915 js/src/jsatom.cpp:463 Double 6.3
    886 js/src/jsstr.cpp:2189 5.8
    325 js/src/jsstr.cpp:1710 String 30.5
    187 js/src/jsclone.cpp:1203 6.0
     88 js/src/frontend/FoldConstants.cpp:624 83.0
     82 js/src/frontend/FoldConstants.cpp:601 329.0
The number in the front is the hit count and the number at the back is the average length for strings that went through that path.
Assignee

Comment 10

6 years ago
Posted patch optimise_toatom v1 (obsolete) — Splinter Review
I wrote a patch to address the worst offender, ToAtom. I'm not sure how it affects actual performance, since the strings that go through that path are quite short. I tried to minimise code duplication in {Int32, Number}ToString but it's still a bit hairy.
Assignee: general → yaron.tausky
Attachment #796326 - Flags: review?(luke)
Comment on attachment 796326 [details] [diff] [review]
optimise_toatom v1

(Please click on the "patch" checkbox when submitting a patch;  that enables Bugzilla's patch viewing tools.  I've done that for you this time.  Thanks!)
Attachment #796326 - Attachment is patch: true
Comment on attachment 796326 [details] [diff] [review]
optimise_toatom v1

Review of attachment 796326 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsatom.cpp
@@ +458,5 @@
> +static JSAtom *
> +ToAtomSlow(ExclusiveContext *cx, typename MaybeRooted<Value, allowGC>::HandleType arg)
> +{
> +    JS_ASSERT(!arg.isString());
> +    

Nit: extra whitespace

::: js/src/jsnum.cpp
@@ +541,5 @@
> +LookupDtoaCache(ThreadSafeContext *cx, double d)
> +{
> +    if (!cx->isExclusiveContext())
> +        return NULL;
> +    

Nit: extra whitespace here and in (almost?) all added empty lines below.

::: js/src/jsnum.h
@@ +55,5 @@
>  js_NumberToString(js::ThreadSafeContext *cx, double d);
>  
> +template <js::AllowGC allowGC>
> +extern JSAtom *
> +js_NumberToAtom(js::ExclusiveContext *cx, double d);

This should go into the js namespace and lose the prefix. Yes, that's inconsistent with js_NumberToString, but I think we shouldn't introduce more pseudo-namespaced functions.
Reporter

Comment 13

6 years ago
Comment on attachment 796326 [details] [diff] [review]
optimise_toatom v1

Review of attachment 796326 [details] [diff] [review]:
-----------------------------------------------------------------

Great patch!  Nice job factoring out the shared parts.  Just a few stylistic nits and one non-style request.

If you aren't already familiar with it, I'd recommend pushing the patch to try
  https://wiki.mozilla.org/ReleaseEngineering/TryServer#How_to_push_to_try

::: js/src/jsatom.cpp
@@ +470,5 @@
> +        v = v2;
> +    }
> +
> +    JSAtom *atom;
> +    if (v.isString()) {

No need for { } around all these then/else bodies since they are all single-statement.  Also, I think you could write:

if (v.isString())
    return AtomizeString...
if (v.isInt32())
    return ...
...
return cx->names().undefined;

::: js/src/jsnum.cpp
@@ +541,5 @@
> +LookupDtoaCache(ThreadSafeContext *cx, double d)
> +{
> +    if (!cx->isExclusiveContext())
> +        return NULL;
> +    

To add to what Till said: this dangling whitespace shows up as red in bugzilla's diff viewer.  If you add

[color]
diff.trailingwhitespace = bold red_background

it'll show up red in 'hg diff' as well.

@@ +558,5 @@
> +    if (!cx->isExclusiveContext())
> +        return;
> +    
> +    if (JSCompartment *comp = cx->asExclusiveContext()->compartment()) {
> +        comp->dtoaCache.cache(10, d, str);

No { } around single-statement then branch.

@@ +609,3 @@
>  
> +    str->resetLength(length);
> +    PodCopy(buffer, start, length + 1);

PodCopy calls memcpy which is not valid to call if the source and destination ranges overlap which I think they do in this case.  That means you need to use memmove, which isn't as efficient.  Perhaps you can put this back to using a local char buffer and copying to that?

@@ +1408,5 @@
> +    if (mozilla::DoubleIsInt32(d, &si))
> +        return Int32ToAtom<allowGC>(cx, si);
> +
> +    if (JSFlatString *str = LookupDtoaCache(cx, d))
> +        return AtomizeString<allowGC>(cx, str);

Same question here as above.

::: js/src/jsnum.h
@@ +55,5 @@
>  js_NumberToString(js::ThreadSafeContext *cx, double d);
>  
> +template <js::AllowGC allowGC>
> +extern JSAtom *
> +js_NumberToAtom(js::ExclusiveContext *cx, double d);

To expand a bit on what Till was saying: SM was C code until 4 years ago and the convention was that the js_ prefix was for functions used between different internal SM .cpps.  With C++, we started the 'js' namespace and we've been incrementally moving all js_ prefixed functions into 'js'.  Please feel free to rename any you see yourself.
Attachment #796326 - Flags: review?(luke) → review+
Assignee

Comment 14

6 years ago
Posted patch optimise_toatom v2 (obsolete) — Splinter Review
I've addressed all the issues above now. I can't use the try server yet since I don't have sufficient access and the process will probably take several days to complete; I'll wait with the checkin-needed keyword until I do it.

Would you say that there's sense in optimising any of the other occurrences?
Attachment #796326 - Attachment is obsolete: true
Reporter

Comment 15

6 years ago
(In reply to Yaron Tausky from comment #14)
> I've addressed all the issues above now. I can't use the try server yet
> since I don't have sufficient access and the process will probably take
> several days to complete; I'll wait with the checkin-needed keyword until I
> do it.

Great, if you need anyone to vouch, just CC me.

> Would you say that there's sense in optimising any of the other occurrences?

The RegExp.cpp:276 case in your measurements seems trivial to fix with ToAtom.  Perhaps you could give a quick scan of the rest to see if there is anything that benefits from what you just added.  Otherwise, we'll just let this bug get resolved FIXED when your patches land and get merged to mozilla-central.
Assignee

Comment 16

6 years ago
Posted patch Move to ToAtom (obsolete) — Splinter Review
I went through the list again and changed all the ~one-liners to the new path.
Attachment #797771 - Flags: review?(luke)
Assignee

Comment 17

6 years ago
Posted patch Rename js_NumberToString (obsolete) — Splinter Review
And finally, rename js_NumberToString to js::NumberToString to keep it consistent with js::NumberToAtom.
Attachment #797781 - Flags: review?(luke)
Reporter

Comment 18

6 years ago
Comment on attachment 797771 [details] [diff] [review]
Move to ToAtom

Review of attachment 797771 [details] [diff] [review]:
-----------------------------------------------------------------

Great!
Attachment #797771 - Flags: review?(luke) → review+
Reporter

Comment 19

6 years ago
Comment on attachment 797781 [details] [diff] [review]
Rename js_NumberToString

Review of attachment 797781 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #797781 - Flags: review?(luke) → review+
Comment on attachment 796872 [details] [diff] [review]
optimise_toatom v2

Review of attachment 796872 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsnum.cpp
@@ +576,5 @@
> +JS_ALWAYS_INLINE
> +static T *
> +BackfillInt32InBuffer(int32_t si, T *buffer, size_t size, size_t *length)
> +{
> +    uint32_t ui = si < 0 ? uint32_t(-si) : si;

mozilla::Abs?
Assignee

Comment 21

6 years ago
Posted patch optimise_toatom v3 (obsolete) — Splinter Review
The default template parameter in js::Atomize broke the build on XP, so I turned it into a wrapper for js::AtomizeMaybeGC<CanGC>. I also used mozilla::Abs instead of the manual way.
Attachment #796872 - Attachment is obsolete: true
Assignee

Comment 22

6 years ago
https://tbpl.mozilla.org/?tree=Try&rev=16c4a1f06ade

I rebased the patches; carrying over the r+'s.
Attachment #800743 - Attachment is obsolete: true
Attachment #801266 - Flags: review+
Assignee

Comment 23

6 years ago
Attachment #797771 - Attachment is obsolete: true
Attachment #801267 - Flags: review+
Assignee

Comment 24

6 years ago
Attachment #797781 - Attachment is obsolete: true
Attachment #801268 - Flags: review+
Assignee

Updated

6 years ago
Keywords: checkin-needed
Attachment #600722 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.