860574 - No performant way to construct a JS String from a typed array containing code points

Reporter

Description

•

12 years ago

After a discussion with Azakai and Chad in the #emscripten channel, I did some testing and discovered that there is no performant way to construct a JS string instance from a typed array containing some codepoints. For a normal JS array containing codepoints, you can use String.fromCharCode.apply, which - while definitely suboptimal, apply has overhead - performs fairly okay. Unfortunately, .apply seems to perform awful when passed a typed array for the arguments list. And worse still, the obvious ways of doing this by hand in JS are just as slow (if not slower). There appears to be no fast way to construct a string from typed array based code (like Emscripten applications), and it sounds like this has actually shown up as a bottleneck in tests. JSIL probably needs a fast way to do this too, as it moves closer to using Typed Arrays whenever possible instead of JS arrays. Here's a SPS profile from nightly: http://people.mozilla.com/~bgirard/cleopatra/#report=6ef88764e125a9e28a0341ad222989dd1b269275 Apologies for the test case being jsperf; I took steps to try and ensure that it's actually testing the workload and not being optimized out by LICM or anything. In the short term I think it's worth considering figuring out how to optimize at least one of the alternative approaches for constructing a string from a typed array - making those pure JS loops faster seems like it could benefit other workloads. Alternately, could .apply be made fast for typed arrays? I don't really know what's involved there. That seems like it could provide some wins for other workloads too. In the long run we probably want a well-specified way to do string decoding/encoding against typed arrays, using the browser's built in text encoding machinery and collation tables. I've been thinking about this some so hopefully I'll have a proposal for that to share in the near future.

Nicolas B. Pierron [:nbp] {back on 17/11}

Comment 1

•

12 years ago

(In reply to Kevin Gadd (:kael) from comment #0) > In the short term I think it's worth considering figuring out how to > optimize at least one of the alternative approaches for constructing a > string from a typed array - making those pure JS loops faster seems like it > could benefit other workloads. I did some work to improve append operations on strings (see Bug 856178). This optimization cut by 2 the time needed for the execution of such loop, by allocating only one buffer ahead where the compiler can ensure that it is safe to append single character to it. Bug 856178 will not improve the case reported in the jsperf benchmarks, as it construct a charCache which does not provide the same guarantee as fromCharCode. 1. arguments[i] is sub-optimal because it copies the vector to the C stack. (and fallback to JM / interpreter if the vector is too large) 2. fun.apply is sub-optimal because it copies the vector to the JSStackFrame. 3. Using the same function with both Array and Typed Array cause us to generate a GetElementIC, which check the bounds and the type of the array at every read. 4. fromCharCode should be faster than a charCache. (at least on the 0-255 range)

Katelyn Gadd (:kael)

Reporter

Comment 2

•

12 years ago

Thanks for the specfic perf advice! 1 and 2 aren't a surprise to me. If they will remain suboptimal forever, that makes sense. Does that apply to String.fromCharCode.apply as well, though? 3 - I actually took steps to avoid that, if you look at the test cases that don't use Function.apply, I cloned the function in order to give them separate PIC data so that they only ever get passed an Array or Typed Array. So the reads should be fast. (Whether they actually are is another question, I guess). 4. I didn't know fromCharCode is cached! Someone had said in the channel that it was slower than doing a cache manually. I'll update the test to use fromCharCode, then.

Till Schneidereit [:till]

Updated

•

12 years ago

Blocks: 885526

Katelyn Gadd (:kael)

Reporter

Updated

•

12 years ago

Whiteboard: [games:p?]

Katelyn Gadd (:kael)

Reporter

Updated

•

12 years ago

Blocks: JSIL

Brendan Dahl [:bdahl]

Comment 3

•

12 years ago

I'm not sure how performant it is but what about using a TextDecoder? http://wiki.whatwg.org/wiki/StringEncoding#TextDecoder

Katelyn Gadd (:kael)

Reporter

Comment 4

•

12 years ago

Looks perfect assuming it's performant. When were those classes introduced? Are they shipping in FF and/or Chrome at present?

Boris Zbarsky [:bzbarsky]

Comment 5

•

12 years ago

On main thread, we've supported TextDecoder since Firefox 18. In workers, since Firefox 20. Not sure about Chrome. Also, if your string is small, performance of TextDecoder may not be that great. Worth measuring for large strings, though.

Katelyn Gadd (:kael)

Reporter

Comment 6

•

12 years ago

I'll try and do some benchmarking with TextDecoder. IMO the only thing that would really improve on it in theory would be a way to create a 'string view' backed by a region of a Uint16Array containing UTF-16 codepoints (basically, given that the internal representation of JS strings is set in stone, allow you to create a string backed by your typed array instead of having to make a copy). But I imagine that would involve some gotchas like having to invalidate any cached hashes for the string when the view changed, etc.

Brendan Dahl [:bdahl]

Comment 7

•

12 years ago

It's not supported by chrome yet, they've started work http://code.google.com/p/chromium/issues/detail?id=243354. For Chrome, in pdf.js we use FileReaderSync().readAsText(new Blob([bytes]), encoding); but FileReaderSync is only available on the worker.

Alon Zakai (:azakai)

Comment 8

•

12 years ago

3x faster than the current emscripten stringification code on a 1024-character test string.

Boris Zbarsky [:bzbarsky]

Comment 9

•

12 years ago

Fwiw, if you have a testcase to profile, we might be able to speed up TextDecoder too. Just not sure where the hotspots might be.

Marco Mucci [:MarcoM]

Updated

•

12 years ago

Blocks: gecko-games

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Updated

•

11 years ago

Whiteboard: [games:p?] → [games:p3]

Nobody; OK to take it and work on it

Assignee

Updated

•

11 years ago

Assignee: general → nobody

Mike Hoye [:mhoye]

Updated

•

11 years ago

Whiteboard: [games:p3] → [games:p3][diamond]

Anthony Vaughn [:anthony][:avaughn][San Francisco - Pacific]

Updated

•

9 years ago

Whiteboard: [games:p3][diamond] → [games:p?][diamond]

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Mayank Bansal

Comment 10

•

1 year ago

Is this still an issue for you? If yes, please could you attach a testcase?

Flags: needinfo?(kg)

Mayank Bansal

Comment 11

•

1 year ago

No response from OP.
Code like this should target WASM these days.

Therefore I am closing this bug. Please reopen if needed!

Status: NEW → RESOLVED

Closed: 1 year ago

Resolution: --- → WONTFIX

Bugzilla

No performant way to construct a JS String from a typed array containing code points

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: kael, Unassigned, NeedInfo)

References

(Blocks 1 open bug,
URL
)

Details

(Whiteboard: [games:p?][diamond])

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Updated

Updated

Updated

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Updated

Updated

Updated

Updated

Updated

Comment 10

Comment 11