1038099 - Consider creating Latin1 inline strings in JS_NewExternalString if the string is short

Reporter

Description

•

11 years ago

Gecko passes two-byte strings to JS_NewExternalString, which is a bit unfortunate because concatenating such strings will result in more two-byte strings. For short strings though, JS_NewExternalString could just try to deflate and create a (fat)inline Latin1 string. In the non-fat inline string case, this would use no extra memory at all. This could also be nice for nursery-allocated strings (bug 903519), because we don't have to call the external string finalizer. We should instrument the browser to get some data on external string length, count and encoding.

neil@parkwaycc.co.uk

Comment 1

•

11 years ago

What about one-byte strings that XPConnect currently inflates to two-byte strings?

neil@parkwaycc.co.uk

Comment 2

•

11 years ago

Ah, those are already latin1, I didn't follow the template well enough.

Boris Zbarsky [:bzbarsky]

Comment 3

•

11 years ago

Note that this would need some sort of API change to tell the consumer that we ignored the passed-in pointer, right?

Jan de Mooij [:jandem]

Reporter

Comment 4

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #3) > Note that this would need some sort of API change to tell the consumer that > we ignored the passed-in pointer, right? Either that or JS_NewExternalString could just call the finalizer when it creates a non-external string.

Boris Zbarsky [:bzbarsky]

Comment 5

•

11 years ago

That may not be safe unless the finalizer call happens async. For example, consider the code in NonVoidStringToJsval which looks like this once you inline the StringBufferToJSVal stuff: nsStringBuffer* buf = str.StringBuffer(); JSString* str = JS_NewExternalString(cx, buf->Data(), length, &sDOMStringFinalizer); if (str) { buf->AddRef(); } this could inadvertently delete the buffer if it starts out with a refcount of 1 and the finalizer is called before JS_NewExternalString returns.

Jan de Mooij [:jandem]

Reporter

Comment 6

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #5) Ah, fair enough. In that case we'd probably want to update the API, yeah...

Luke Wagner [:luke]

Comment 7

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #5) That's a rather dangerous piece of code, as it relies on a GC not occurring before the AddRef. It seems like the AddRef() should be moved above the NewExternalString with a Release() in the failure case (or, in the more modern style, nsAutoRef with a Release() in the success case). There appears to be only two uses of JS_NewExternalString, so perhaps we can just change it in place.

Boris Zbarsky [:bzbarsky]

Comment 8

•

11 years ago

> as it relies on a GC not occurring before the AddRef. That's true. In practice this assumption holds. > It seems like the AddRef() should be moved above the NewExternalString with a Release() > in the failure case We don't know ahead of time whether we'll call JS_NewExternalString, it's very common in hot code to not actually end up calling it, and in the hot case when we don't call it (the cache hit on the string cache) we'd rather not pay the cost of two useless atomic operations... > There appears to be only two uses of JS_NewExternalString In Gecko code.

Luke Wagner [:luke]

Comment 9

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #8) > We don't know ahead of time whether we'll call JS_NewExternalString, it's > very common in hot code to not actually end up calling it, and in the hot > case when we don't call it (the cache hit on the string cache) we'd rather > not pay the cost of two useless atomic operations... I was thinking the AddRef() would immediately precede the NewExternalString. > > There appears to be only two uses of JS_NewExternalString > > In Gecko code. Don't we hide JSAPI symbols in various builds now (bug 920731) with the intention to hide them completely once we can fold JS back into libxul (bug 609976, but I think there are more recent attempts met with PGO bustage)? Anyhow, if that was a realistic concern, then yes we should probably rename.

Boris Zbarsky [:bzbarsky]

Comment 10

•

11 years ago

> I was thinking the AddRef() would immediately precede the NewExternalString. The problem is that the code that is doing the NewExternalString doesn't know whether the addref is needed and the code that knows that doesn't know ahead of time whether we'll do NewExternalString. This arises from having too many string types, with too many different storage strategies, and needing them all tolerably fast. :( > Don't we hide JSAPI symbols in various builds now I'm thinking about other SpiderMonkey embedders, not use in Firefox.

neil@parkwaycc.co.uk

Comment 11

•

11 years ago

(In reply to Jan de Mooij) > Gecko passes two-byte strings to JS_NewExternalString, which is a bit > unfortunate because concatenating such strings will result in more two-byte > strings. > > For short strings though, JS_NewExternalString could just try to deflate and > create a (fat)inline Latin1 string. In the non-fat inline string case, this > would use no extra memory at all. > > This could also be nice for nursery-allocated strings (bug 903519), because > we don't have to call the external string finalizer. > > We should instrument the browser to get some data on external string length, > count and encoding. I can't tell you about length yet, I've only just found out how long is a piece of string. A quick startup/shutdown test with my instrumentation gives the following results: 97% of UTF16 strings passed from C++ to JS created external strings. 20% of UTF16 strings passed from JS to XPConnect were external strings where XPConnect was able to retrieve the original strings. (I don't think WebIDL tries to do this.) 78% of Latin1 strings passsed from XPConnect to JS could have been external strings. (I don't know if WebIDL ever has need to pass Latin1 strings to JS.) 5% of Latin1 strings passed from XPConnect to JS could have been external strings.

Boris Zbarsky [:bzbarsky]

Comment 12

•

11 years ago

> I don't think WebIDL tries to do this. That's correct. If we have a fast inline way to do it, we might... > I don't know if WebIDL ever has need to pass Latin1 strings to JS. It's pretty rare but does happen. Mostly for XMLHttpRequest. I assume the description of either the 78% number or the 5% number is wrong?

neil@parkwaycc.co.uk

Comment 13

•

11 years ago

(In reply to Boris Zbarsky from comment #12) > I assume the description of either the 78% number or the 5% number is wrong? Yeah, the 5% was supposed to be strings passed back from JS to XPConnect.

Jan de Mooij [:jandem]

Reporter

Comment 14

•

11 years ago

(In reply to neil@parkwaycc.co.uk from comment #11) > A quick startup/shutdown test with my instrumentation gives the following > results: Thanks for measuring! Do you know if, at least in some cases, we can use Latin1 external strings? Right now all external strings are TwoByte but it'd be pretty trivial to fix this on the JS side...

neil@parkwaycc.co.uk

Comment 15

•

11 years ago

(In reply to Jan de Mooij from comment #14) > Do you know if, at least in some cases, we can use Latin1 external strings? > Right now all external strings are TwoByte but it'd be pretty trivial to fix > this on the JS side...

neil@parkwaycc.co.uk

Comment 16

•

11 years ago

(In reply to Boris Zbarsky from comment #12) > > I don't think WebIDL tries to do this. > > That's correct. If we have a fast inline way to do it, we might... Doesn't seem very useful; in my follow-up test fewer than 1% of DOM strings converted back from JS turned out to be external. Also my statistic of 78% of DOM strings being turned into external JS strings is incorrect because of various codepaths.

neil@parkwaycc.co.uk

Comment 17

•

11 years ago

(In reply to from comment #16) > Also my statistic of 78% of DOM strings being turned into external JS > strings is incorrect because of various codepaths. Well, my actual statistic of 97% turned out to be correct after all since the codepaths I was worried about weren't actually used (mostly converting a dictionary to a jsval or something).

neil@parkwaycc.co.uk

Comment 18

•

11 years ago

(In reply to comment #17) > Well, my actual statistic of 97% turned out to be correct Inasmuch as my instrumentation is correct, startup/shutdown really does use that many external strings, but when I start using the browser, the figure drops to 69%. I also looked at string literals but at best (i.e. if they're wide literals) they only account for 1.5% of strings passed into JS.

Naveed Ihsanullah [:naveed]

Updated

•

8 years ago

Whiteboard: [qf]

(no longer active)

Updated

•

8 years ago

Whiteboard: [qf] → [qf:p1]

Shu-yu Guo [:shu]

Updated

•

8 years ago

Whiteboard: [qf:p1] → [qf:p2]

Firefox Bug Husbandry Bot

Updated

•

8 years ago

Keywords: perf

Naveed Ihsanullah [:naveed]

Updated

•

8 years ago

Whiteboard: [qf:p2] → [perf:p2]

Naveed Ihsanullah [:naveed]

Updated

•

8 years ago

Whiteboard: [perf:p2] → [qf:p2]

Naveed Ihsanullah [:naveed]

Comment 19

•

8 years ago

Jeff can you take a look at what is left here?

Flags: needinfo?(jwalden+bmo)

Whiteboard: [qf:p2] → [qf:p1]

Jean Gong :jgong

Updated

•

8 years ago

Whiteboard: [qf:p1] → [qf:i60][qf:p1]

Jean Gong :jgong

Updated

•

8 years ago

Whiteboard: [qf:i60][qf:p1] → [qf:f60][qf:p1]

Jean Gong :jgong

Updated

•

8 years ago

Whiteboard: [qf:f60][qf:p1] → [qf:f61][qf:p1]

Jeff Walden [:Waldo]

Assignee

Comment 20

•

8 years ago

In the time since this was filed, it looks like we've picked up JS_NewMaybeExternalString which is an "update" to the API as mooted in comment 6. So all that's necessary here, then, is just to augment js::NewMaybeExternalString to return inline strings when appropriate. (And maybe not solely for Latin-1 -- no reason it couldn't happen for two-byte strings as well.) So, next step: figure out when is "appropriate". I'll instrument a build and get some length stats.

Jeff Walden [:Waldo]

Assignee

Comment 21

•

8 years ago

Attached patch Stats-dumping patch — Details — Splinter Review

This is not all that representative, but from a random browsing session by me, with some Facebook visiting and other stuff, a few stats-dumps with the attached (obviously hackish) patch. I didn't try to visit any non-English sites, and I'm not sure what "representative" browsing would be that takes into account what a median browsing session incorporating some non-English sites would look like. But these stats-dumps (none from the same process -- I scrolled through scrollback to grab copies from runtimes in the different processes) do suggest substantial room to optimize into thin and maybe fat inline strings: NOTE: Thin: max Latin-1: 15 max two-byte: 7 Fat: max Latin-1: 23 max two-byte: 11 NewMaybeExternalString observed sizes N | Latin-1 | Two-byte ---------------------------- 0 | 0 | 0 1 | 0 | 0 2 | 0 | 0 3 | 46946 | 0 4 | 4019 | 0 5 | 4631 | 0 6 | 978 | 0 7 | 3983 | 0 8 | 1732 | 0 9 | 1372 | 0 10 | 4122 | 0 11 | 3232 | 0 12 | 137 | 0 13 | 1186 | 0 14 | 1598 | 0 15 | 698 | 0 16 | 970 | 0 17 | 1595 | 0 18 | 46 | 0 19 | 498 | 0 20 | 570 | 0 21 | 29 | 0 22 | 86 | 0 23+ | 0 | 17574 NewMaybeExternalString observed sizes N | Latin-1 | Two-byte ---------------------------- 0 | 0 | 0 1 | 0 | 0 2 | 2 | 0 3 | 413 | 0 4 | 2091 | 0 5 | 2448 | 0 6 | 2197 | 0 7 | 1033 | 0 8 | 552 | 0 9 | 235 | 0 10 | 447 | 0 11 | 431 | 1 12 | 3004 | 0 13 | 593 | 0 14 | 181 | 0 15 | 220 | 0 16 | 162 | 1 17 | 357 | 0 18 | 140 | 1 19 | 105 | 0 20 | 104 | 0 21 | 138 | 0 22 | 66 | 0 23+ | 0 | 8247 NewMaybeExternalString observed sizes N | Latin-1 | Two-byte ---------------------------- 0 | 0 | 0 1 | 0 | 0 2 | 0 | 0 3 | 46497 | 0 4 | 4630 | 0 5 | 4878 | 0 6 | 1549 | 0 7 | 2589 | 0 8 | 1504 | 0 9 | 1551 | 0 10 | 3127 | 0 11 | 2885 | 0 12 | 506 | 0 13 | 914 | 0 14 | 1413 | 0 15 | 685 | 0 16 | 885 | 0 17 | 1392 | 0 18 | 77 | 0 19 | 520 | 0 20 | 605 | 0 21 | 58 | 0 22 | 75 | 0 23+ | 0 | 18336 There remains some question just how much effort we should go to to use inline strings, but with these sorts of numbers, it seems at least worth *some* effort. I'll write up a patch that does only thin inline strings, then a patch extending it to fat inline strings, and we can review/land the parts incrementally and see what the numbers look like in the real world.

Assignee: nobody → jwalden+bmo

Status: NEW → ASSIGNED

Flags: needinfo?(jwalden+bmo)

Jeff Walden [:Waldo]

Assignee

Comment 22

•

7 years ago

Attached patch Make JS_NewMaybeExternalString return thin (but not fat) inline Latin-1 strings when the provided chars/length will fit within one — Details — Splinter Review

Talos comparison results here: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=0f7d159b26e2b2cb5582bf6a80730cb96ba52051&newProject=try&newRevision=6f211a77e5cd74ce8637fd0e458220651779bf77&framework=1 As noted on IRC, this does not seem a particularly easy patch to benchmark. You slow down the process of creating a string using this function, but you gain in not having to call a finalizer...eventually. And a finalizer that maybe has a well-predicted indirect call? *shrug* Data just doesn't seem like it's going to be clear here, unfortunately. However, the basic idea seems pretty obviously like one that would pay off, so I wouldn't sweat trying to get a real measurement and do thin inlines anyway. Fat is maybe different -- your strings get bigger -- so at the worst, hold off on it til someone comes up with a good way to prove its necessity.

Attachment #8952599 - Flags: review?(sphink)

Steve Fink [:sfink] [:s:]

Comment 23

•

7 years ago

Comment on attachment 8952599 [details] [diff] [review] Make JS_NewMaybeExternalString return thin (but not fat) inline Latin-1 strings when the provided chars/length will fit within one Review of attachment 8952599 [details] [diff] [review]: ----------------------------------------------------------------- The talos results don't have anything that looks scary to me, and I agree that this intuitively seems worthwhile.

Attachment #8952599 - Flags: review?(sphink) → review+

Pulsebot

Comment 24

•

7 years ago

Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/0852d89a06a1 Make JS_NewMaybeExternalString return thin (but not fat) inline Latin-1 strings when the provided chars/length will fit within one. r=sfink

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 25

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0852d89a06a1

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox60: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla60

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → P1

Whiteboard: [qf:f61][qf:p1]

Stats-dumping patch 8 years ago Jeff Walden [:Waldo] 3.05 KB, patch		Details \| Diff \| Splinter Review
Make JS_NewMaybeExternalString return thin (but not fat) inline Latin-1 strings when the provided chars/length will fit within one 7 years ago Jeff Walden [:Waldo] 1.27 KB, patch	sfink : review+	Details \| Diff \| Splinter Review