Closed
Bug 1493441
Opened Last year
Closed Last year
Make ScriptSource able to hold UTF-8 in addition to UTF-16
Categories
(Core :: JavaScript Engine, enhancement, P2)
Core
JavaScript Engine
P2
Tracking
()
RESOLVED
FIXED
mozilla64
Tracking | Status | |
---|---|---|
firefox64 | --- | fixed |
People
(Reporter: Waldo, Assigned: Waldo)
References
Details
Attachments
(5 files, 3 obsolete files)
63.44 KB,
patch
|
Details | Diff | Splinter Review | |
2.20 KB,
patch
|
tcampbell
:
review+
|
Details | Diff | Splinter Review |
2.26 KB,
patch
|
tcampbell
:
review+
|
Details | Diff | Splinter Review |
2.59 KB,
patch
|
tcampbell
:
review+
|
Details | Diff | Splinter Review |
67.28 KB,
patch
|
tcampbell
:
review+
|
Details | Diff | Splinter Review |
If we're going to parse UTF-8 and save any memory, we can't have lazily compiled functions on UTF-8 source text without making the held/compressed source text be stored as UTF-8 (and not inflated to UTF-16). In theory I finished up the patching to make this *possible* -- but not actually to create any such ScriptSources backed by UTF-8 text -- on the flights yesterday. Still need to test it, for one. And for another, it currently uses "char" instead of "Utf8Unit" for code units, and the intent of that type was to use it for things like this, so I want to try swapping that into place for the final patch.
Assignee | ||
Comment 1•Last year
|
||
Mostly the process of fixing this is just making ScriptSource::Compressed and ScriptSource::Uncompressed templates and then templating a bunch of code and then doing type-tests before instantiating PinnedChars with the appropriate type. Lots of mess, but not actually many messy changes in it. Still have various cleanup to do as mentioned in comment 0.
Assignee | ||
Updated•Last year
|
Assignee: nobody → jwalden+bmo
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•Last year
|
||
In a tree a few days old, all jstests pass. jit-test results are: [6386| 22| 5| 0] 100% ==========================================>| 976.2s FAILURES: auto-regress/bug1468629.js basic/bug1057571.js latin1/assorted.js parser/bug-1431353.js xdr/async.js xdr/bug1186973.js xdr/classes.js xdr/debug-lazy.js xdr/decode-off-thread.js xdr/incremental-encoder.js xdr/function-flags.js xdr/scope.js xdr/tagged-template-literals.js xdr/lazy.js xdr/relazify.js xdr/tagged-template-literals-2.js xdr/trivial.js TIMEOUTS: basic/bug832203.js basic/inflate-oom.js gc/bug-1143706.js gc/bug-1462337.js gc/oomInGetJumpLabelForBranch.js so there clearly remain *some* issues to this.
Assignee | ||
Comment 3•Last year
|
||
Turns out XDRState::codeChars for Latin1Char is only half-implemented, not for both encoding and decoding. Still going to look/work more on this to see if I can get it to use Utf8Unit, tho, before posting for review.
Assignee | ||
Updated•Last year
|
Attachment #9011238 -
Attachment is obsolete: true
Assignee | ||
Comment 4•Last year
|
||
Ideally we need do as few Utf8Unit<->char/unsigned char conversions in as few places as possible. So this makes sense to do.
Attachment #9011974 -
Flags: review?(tcampbell)
Assignee | ||
Comment 5•Last year
|
||
I busted up in my patching because this wasn't happening. There is not a caller that would not need to align first, or that couldn't benefit from a harmless don't-need-to-align: it should be in codeChars.
Attachment #9011975 -
Flags: review?(tcampbell)
Assignee | ||
Comment 6•Last year
|
||
Given that I added the Utf8Unit overload, it's possible my subsequent patches don't need this, come to think of it. But it is rather Dumb that we had this and it only worked for one way, when it's easy to support both ways.
Attachment #9011976 -
Flags: review?(tcampbell)
Assignee | ||
Comment 7•Last year
|
||
Mostly this is just a lot of adding templates to stuff. Our source caching mechanisms are already keyed by a thing that can incorporate source text type. The source cache mechanism under the hood supports both one-byte and two-byte data incoming (and under the hood just treats it all as bags'o'bytes). All accesses are type-directed from a PinnedChars, and with a runtime tag-test on the relevant variant, it's easy to have one path for PinnedChars<Utf8Unit> and one path for PinnedChars<char16_t>. This passes all jstests and jit-tests locally. I'll kick off a try-push for whenever this is reviewed, which doesn't need to be very soon, because the rest of this week I'm out at Strange Loop. Should be around next week, at least after EDT mornings due to SCOTUS oral arguments I'm attending. Opening week OT2018, baby! 🤘⚖🤓
Attachment #9011978 -
Flags: review?(tcampbell)
Comment 8•Last year
|
||
Comment on attachment 9011974 [details] [diff] [review] Implement XDRState::codeChars(Utf8Unit*, size_t) Review of attachment 9011974 [details] [diff] [review]: ----------------------------------------------------------------- r=me if we are sure std::transform meets our minimum compiler targets. ::: js/src/vm/Xdr.cpp @@ +79,5 @@ > + if (!ptr) { > + return fail(JS::TranscodeResult_Throw); > + } > + > + std::transform(units, units + count, It looks like std::transform is marked C++17 and I thought we were C++14 in Gecko? Change my mind. @@ +87,5 @@ > + if (!ptr) { > + return fail(JS::TranscodeResult_Failure_BadDecode); > + } > + > + std::transform(ptr, ptr + count, This was a zero-copy for Latin1, but that seems a silly ideal for this XDR stuff anyways.
Attachment #9011974 -
Flags: review?(tcampbell) → review+
Comment 9•Last year
|
||
Comment on attachment 9011975 [details] [diff] [review] Make XDRState::codeChars(char16_t*, size_t) perform buffer-alignment before acting so that its callers don't have to do so manually Review of attachment 9011975 [details] [diff] [review]: ----------------------------------------------------------------- My understanding here is that codeChars(char16_t) is only called for the ENCODE case (and we use peek for the other case). Therefore the align needs to be outside codeChars? Feel free to ping me about this if I'm missing something.
Attachment #9011975 -
Flags: review?(tcampbell) → review-
Comment 10•Last year
|
||
Comment on attachment 9011976 [details] [diff] [review] Change XDRState::codeChars(const Latin1Char*, size_t) to XDRState::codeChars(Latin1Char*, size_t) for consistency with the other overloads of the same name for char16_t and Utf8Unit Review of attachment 9011976 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/Xdr.cpp @@ +48,4 @@ > > template<XDRMode mode> > XDRResult > +XDRState<mode>::codeChars(Latin1Char* chars, size_t nchars) Agreed. It was bizarre to have a |mode| template parameter and const buffer.
Attachment #9011976 -
Flags: review?(tcampbell) → review+
Assignee | ||
Comment 11•Last year
|
||
(In reply to Ted Campbell [:tcampbell] (PTO until Oct 9) from comment #8) > r=me if we are sure std::transform meets our minimum compiler targets. I imagine you're reading https://en.cppreference.com/w/cpp/algorithm/transform which has somewhat confusing notation -- std::transform is new as constexpr since C++20 (I think), but it's been in C++ for a good long time. Plus I found std::transform in the C++11 spec and was going off that. I expect we can rely on it. :-) Although I should probably try for safety.
Comment 12•Last year
|
||
Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/60633a71c04d Implement XDRState::codeChars(Utf8Unit*, size_t). r=tcampbell https://hg.mozilla.org/integration/mozilla-inbound/rev/2f853725d168 Change XDRState::codeChars(const Latin1Char*, size_t) to XDRState::codeChars(Latin1Char*, size_t) for consistency with the other overloads of the same name for char16_t and Utf8Unit. r=tcampbell
Assignee | ||
Updated•Last year
|
Keywords: leave-open
Comment 13•Last year
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/60633a71c04d https://hg.mozilla.org/mozilla-central/rev/2f853725d168
Assignee | ||
Comment 14•Last year
|
||
Mm, I think you're right about that. Nonetheless, I think always aligning -- and making the exceptional cases be extra-careful about it -- is still good to do. This version of the patch should do it correctly, I think.
Attachment #9014975 -
Flags: review?(tcampbell)
Assignee | ||
Updated•Last year
|
Attachment #9011975 -
Attachment is obsolete: true
Assignee | ||
Comment 15•Last year
|
||
Updated for after bug 1459067's changes that made a mess of the code in these areas. :-)
Attachment #9014984 -
Flags: review?(tcampbell)
Assignee | ||
Updated•Last year
|
Attachment #9011978 -
Attachment is obsolete: true
Attachment #9011978 -
Flags: review?(tcampbell)
Updated•Last year
|
Priority: -- → P2
Updated•Last year
|
Attachment #9014975 -
Flags: review?(tcampbell) → review+
Comment 16•Last year
|
||
Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/acd510f0152a Make XDRState::codeChars(char16_t*, size_t) perform buffer-alignment before acting so that its callers don't have to do so manually. r=tcampbell
Comment 17•Last year
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/acd510f0152a
Comment 18•Last year
|
||
Comment on attachment 9014984 [details] [diff] [review] Allow ScriptSource to store UTF-8 script data in addition to UTF-16 script data (but don't create any UTF-8-backed ScriptSources yet) Review of attachment 9014984 [details] [diff] [review]: ----------------------------------------------------------------- Once I understood the patterns for mozilla::Variant, this makes far more sense to me. Seems to be some good cleanups in the XDR code too which is nice. Rubberstamp for the RR-related stuff, but it seems reasonable and we checked with :bhackett. ::: js/src/vm/JSFunction.cpp @@ +1816,5 @@ > + > + // XXX There are no UTF-8 ScriptSources now, so just crash so this > + // gets filled in later. > + MOZ_CRASH("UTF-8 lazy function compilation not implemented yet"); > + } else { MOZ_ASSERT(lazy->scriptSource()->hasSourceType<uint16_t>()); or similar. ::: js/src/vm/JSScript.cpp @@ +1834,4 @@ > cursor += numChars; > } > > + // XXX Can we remove the null-termination? It's unclear if anyone uses I've opened Bug 1499192 to remove the null-termination. ::: js/src/vm/JSScript.h @@ +460,5 @@ > + > + static UniqueChars toCacheable(EntryChars<mozilla::Utf8Unit> str) { > + // The cache only stores strings of |char| or |char16_t|, and right now > + // it seems best not to gunk up the cache with |Utf8Unit| too. So > + // cache |Utf8Unit| strings by interpreting them as |char| strings. *Please* make sure the SharedImmutableStringCache gets some comments indicating that we may have UTF-8 strings in it. I almost wonder if we should have a follow-up bug to de-type the ImmutableStringCache and have it just store black-box raw bytes. ::: toolkit/recordreplay/ipc/JSControl.cpp @@ +691,5 @@ > + if (!aContentType) { > + return false; > + } > + > + aContent.set(info.mContent8.length() > 0 Can you assert that only one length>0 at a time?
Attachment #9014984 -
Flags: review?(tcampbell) → review+
Comment 19•Last year
|
||
Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/2ec2641a20ae Allow ScriptSource to store UTF-8 script data in addition to UTF-16 script data (but don't create any UTF-8-backed ScriptSources yet). r=tcampbell
Assignee | ||
Comment 20•Last year
|
||
And with that, this is done. More work in other bugs is necessary to *create* and *consume* UTF-8 ScriptSources, but this should finish up ScriptSource changes (barring the unexpected tweak or two, as always).
Keywords: leave-open
Comment 21•Last year
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/2ec2641a20ae
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
You need to log in
before you can comment on or make changes to this bug.
Description
•