Closed Bug 1038099 Opened 10 years ago Closed 6 years ago

Consider creating Latin1 inline strings in JS_NewExternalString if the string is short

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla60
Performance Impact high
Tracking Status
firefox60 --- fixed

People

(Reporter: jandem, Assigned: Waldo)

References

Details

(Keywords: perf)

Attachments

(2 files)

Gecko passes two-byte strings to JS_NewExternalString, which is a bit unfortunate because concatenating such strings will result in more two-byte strings.

For short strings though, JS_NewExternalString could just try to deflate and create a (fat)inline Latin1 string. In the non-fat inline string case, this would use no extra memory at all.

This could also be nice for nursery-allocated strings (bug 903519), because we don't have to call the external string finalizer.

We should instrument the browser to get some data on external string length, count and encoding.
What about one-byte strings that XPConnect currently inflates to two-byte strings?
Ah, those are already latin1, I didn't follow the template well enough.
Note that this would need some sort of API change to tell the consumer that we ignored the passed-in pointer, right?
(In reply to Boris Zbarsky [:bz] from comment #3)
> Note that this would need some sort of API change to tell the consumer that
> we ignored the passed-in pointer, right?

Either that or JS_NewExternalString could just call the finalizer when it creates a non-external string.
That may not be safe unless the finalizer call happens async.  For example, consider the code in NonVoidStringToJsval which looks like this once you inline the StringBufferToJSVal stuff:

  nsStringBuffer* buf = str.StringBuffer();
  JSString* str = JS_NewExternalString(cx, buf->Data(), length, &sDOMStringFinalizer);
  if (str) {
    buf->AddRef();
  }

this could inadvertently delete the buffer if it starts out with a refcount of 1 and the finalizer is called before JS_NewExternalString returns.
(In reply to Boris Zbarsky [:bz] from comment #5)

Ah, fair enough. In that case we'd probably want to update the API, yeah...
(In reply to Boris Zbarsky [:bz] from comment #5)
That's a rather dangerous piece of code, as it relies on a GC not occurring before the AddRef.  It seems like the AddRef() should be moved above the NewExternalString with a Release() in the failure case (or, in the more modern style, nsAutoRef with a Release() in the success case).  There appears to be only two uses of JS_NewExternalString, so perhaps we can just change it in place.
> as it relies on a GC not occurring before the AddRef.

That's true.  In practice this assumption holds.

> It seems like the AddRef() should be moved above the NewExternalString with a Release()
> in the failure case

We don't know ahead of time whether we'll call JS_NewExternalString, it's very common in hot code to not actually end up calling it, and in the hot case when we don't call it (the cache hit on the string cache) we'd rather not pay the cost of two useless atomic operations...

> There appears to be only two uses of JS_NewExternalString

In Gecko code.
(In reply to Boris Zbarsky [:bz] from comment #8)
> We don't know ahead of time whether we'll call JS_NewExternalString, it's
> very common in hot code to not actually end up calling it, and in the hot
> case when we don't call it (the cache hit on the string cache) we'd rather
> not pay the cost of two useless atomic operations...

I was thinking the AddRef() would immediately precede the NewExternalString.

> > There appears to be only two uses of JS_NewExternalString
> 
> In Gecko code.

Don't we hide JSAPI symbols in various builds now (bug 920731) with the intention to hide them completely once we can fold JS back into libxul (bug 609976, but I think there are more recent attempts met with PGO bustage)?  Anyhow, if that was a realistic concern, then yes we should probably rename.
> I was thinking the AddRef() would immediately precede the NewExternalString.

The problem is that the code that is doing the NewExternalString doesn't know whether the addref is needed and the code that knows that doesn't know ahead of time whether we'll do NewExternalString.  This arises from having too many string types, with too many different storage strategies, and needing them all tolerably fast.  :(

> Don't we hide JSAPI symbols in various builds now

I'm thinking about other SpiderMonkey embedders, not use in Firefox.
(In reply to Jan de Mooij)
> Gecko passes two-byte strings to JS_NewExternalString, which is a bit
> unfortunate because concatenating such strings will result in more two-byte
> strings.
> 
> For short strings though, JS_NewExternalString could just try to deflate and
> create a (fat)inline Latin1 string. In the non-fat inline string case, this
> would use no extra memory at all.
> 
> This could also be nice for nursery-allocated strings (bug 903519), because
> we don't have to call the external string finalizer.
> 
> We should instrument the browser to get some data on external string length,
> count and encoding.

I can't tell you about length yet, I've only just found out how long is a piece of string.

A quick startup/shutdown test with my instrumentation gives the following results:
97% of UTF16 strings passed from C++ to JS created external strings.
20% of UTF16 strings passed from JS to XPConnect were external strings where XPConnect was able to retrieve the original strings. (I don't think WebIDL tries to do this.)
78% of Latin1 strings passsed from XPConnect to JS could have been external strings. (I don't know if WebIDL ever has need to pass Latin1 strings to JS.)
5% of Latin1 strings passed from XPConnect to JS could have been external strings.
> I don't think WebIDL tries to do this.

That's correct.  If we have a fast inline way to do it, we might...

> I don't know if WebIDL ever has need to pass Latin1 strings to JS.

It's pretty rare but does happen.  Mostly for XMLHttpRequest.

I assume the description of either the 78% number or the 5% number is wrong?
(In reply to Boris Zbarsky from comment #12)
> I assume the description of either the 78% number or the 5% number is wrong?

Yeah, the 5% was supposed to be strings passed back from JS to XPConnect.
(In reply to neil@parkwaycc.co.uk from comment #11)
> A quick startup/shutdown test with my instrumentation gives the following
> results:

Thanks for measuring!

Do you know if, at least in some cases, we can use Latin1 external strings? Right now all external strings are TwoByte but it'd be pretty trivial to fix this on the JS side...
(In reply to Jan de Mooij from comment #14)
> Do you know if, at least in some cases, we can use Latin1 external strings?
> Right now all external strings are TwoByte but it'd be pretty trivial to fix
> this on the JS side...
(In reply to Boris Zbarsky from comment #12)
> > I don't think WebIDL tries to do this.
> 
> That's correct.  If we have a fast inline way to do it, we might...

Doesn't seem very useful; in my follow-up test fewer than 1% of DOM strings converted back from JS turned out to be external.

Also my statistic of 78% of DOM strings being turned into external JS strings is incorrect because of various codepaths.
(In reply to from comment #16)
> Also my statistic of 78% of DOM strings being turned into external JS
> strings is incorrect because of various codepaths.

Well, my actual statistic of 97% turned out to be correct after all since the codepaths I was worried about weren't actually used (mostly converting a dictionary to a jsval or something).
(In reply to comment #17)
> Well, my actual statistic of 97% turned out to be correct

Inasmuch as my instrumentation is correct, startup/shutdown really does use that many external strings, but when I start using the browser, the figure drops to 69%.

I also looked at string literals but at best (i.e. if they're wide literals) they only account for 1.5% of strings passed into JS.
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p1]
Whiteboard: [qf:p1] → [qf:p2]
Keywords: perf
Whiteboard: [qf:p2] → [perf:p2]
Whiteboard: [perf:p2] → [qf:p2]
Jeff can you take a look at what is left here?
Flags: needinfo?(jwalden+bmo)
Whiteboard: [qf:p2] → [qf:p1]
Whiteboard: [qf:p1] → [qf:i60][qf:p1]
Whiteboard: [qf:i60][qf:p1] → [qf:f60][qf:p1]
Whiteboard: [qf:f60][qf:p1] → [qf:f61][qf:p1]
In the time since this was filed, it looks like we've picked up JS_NewMaybeExternalString which is an "update" to the API as mooted in comment 6.  So all that's necessary here, then, is just to augment js::NewMaybeExternalString to return inline strings when appropriate.  (And maybe not solely for Latin-1 -- no reason it couldn't happen for two-byte strings as well.)

So, next step: figure out when is "appropriate".  I'll instrument a build and get some length stats.
This is not all that representative, but from a random browsing session by me, with some Facebook visiting and other stuff, a few stats-dumps with the attached (obviously hackish) patch.  I didn't try to visit any non-English sites, and I'm not sure what "representative" browsing would be that takes into account what a median browsing session incorporating some non-English sites would look like.  But these stats-dumps (none from the same process -- I scrolled through scrollback to grab copies from runtimes in the different processes) do suggest substantial room to optimize into thin and maybe fat inline strings:

NOTE:
  Thin:
    max Latin-1:  15
    max two-byte: 7
  Fat:
    max Latin-1:  23
    max two-byte: 11

NewMaybeExternalString observed sizes
 N  |  Latin-1   |  Two-byte
----------------------------
 0 |       0    |       0  
 1 |       0    |       0  
 2 |       0    |       0  
 3 |   46946    |       0  
 4 |    4019    |       0  
 5 |    4631    |       0  
 6 |     978    |       0  
 7 |    3983    |       0  
 8 |    1732    |       0  
 9 |    1372    |       0  
10 |    4122    |       0  
11 |    3232    |       0  
12 |     137    |       0  
13 |    1186    |       0  
14 |    1598    |       0  
15 |     698    |       0  
16 |     970    |       0  
17 |    1595    |       0  
18 |      46    |       0  
19 |     498    |       0  
20 |     570    |       0  
21 |      29    |       0  
22 |      86    |       0  
23+ |       0    |   17574  

NewMaybeExternalString observed sizes
 N  |  Latin-1   |  Two-byte
----------------------------
 0 |       0    |       0  
 1 |       0    |       0  
 2 |       2    |       0  
 3 |     413    |       0  
 4 |    2091    |       0  
 5 |    2448    |       0  
 6 |    2197    |       0  
 7 |    1033    |       0  
 8 |     552    |       0  
 9 |     235    |       0  
10 |     447    |       0  
11 |     431    |       1  
12 |    3004    |       0  
13 |     593    |       0  
14 |     181    |       0  
15 |     220    |       0  
16 |     162    |       1  
17 |     357    |       0  
18 |     140    |       1  
19 |     105    |       0  
20 |     104    |       0  
21 |     138    |       0  
22 |      66    |       0  
23+ |       0    |    8247  

NewMaybeExternalString observed sizes
 N  |  Latin-1   |  Two-byte
----------------------------
 0 |       0    |       0  
 1 |       0    |       0  
 2 |       0    |       0  
 3 |   46497    |       0  
 4 |    4630    |       0  
 5 |    4878    |       0  
 6 |    1549    |       0  
 7 |    2589    |       0  
 8 |    1504    |       0  
 9 |    1551    |       0  
10 |    3127    |       0  
11 |    2885    |       0  
12 |     506    |       0  
13 |     914    |       0  
14 |    1413    |       0  
15 |     685    |       0  
16 |     885    |       0  
17 |    1392    |       0  
18 |      77    |       0  
19 |     520    |       0  
20 |     605    |       0  
21 |      58    |       0  
22 |      75    |       0  
23+ |       0    |   18336

There remains some question just how much effort we should go to to use inline strings, but with these sorts of numbers, it seems at least worth *some* effort.  I'll write up a patch that does only thin inline strings, then a patch extending it to fat inline strings, and we can review/land the parts incrementally and see what the numbers look like in the real world.
Assignee: nobody → jwalden+bmo
Status: NEW → ASSIGNED
Flags: needinfo?(jwalden+bmo)
Talos comparison results here:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=0f7d159b26e2b2cb5582bf6a80730cb96ba52051&newProject=try&newRevision=6f211a77e5cd74ce8637fd0e458220651779bf77&framework=1

As noted on IRC, this does not seem a particularly easy patch to benchmark.  You slow down the process of creating a string using this function, but you gain in not having to call a finalizer...eventually.  And a finalizer that maybe has a well-predicted indirect call?  *shrug*  Data just doesn't seem like it's going to be clear here, unfortunately.

However, the basic idea seems pretty obviously like one that would pay off, so I wouldn't sweat trying to get a real measurement and do thin inlines anyway.  Fat is maybe different -- your strings get bigger -- so at the worst, hold off on it til someone comes up with a good way to prove its necessity.
Attachment #8952599 - Flags: review?(sphink)
Comment on attachment 8952599 [details] [diff] [review]
Make JS_NewMaybeExternalString return thin (but not fat) inline Latin-1 strings when the provided chars/length will fit within one

Review of attachment 8952599 [details] [diff] [review]:
-----------------------------------------------------------------

The talos results don't have anything that looks scary to me, and I agree that this intuitively seems worthwhile.
Attachment #8952599 - Flags: review?(sphink) → review+
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0852d89a06a1
Make JS_NewMaybeExternalString return thin (but not fat) inline Latin-1 strings when the provided chars/length will fit within one.  r=sfink
https://hg.mozilla.org/mozilla-central/rev/0852d89a06a1
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Performance Impact: --- → P1
Whiteboard: [qf:f61][qf:p1]
You need to log in before you can comment on or make changes to this bug.