Closed
Bug 1485800
Opened 6 years ago
Closed 6 years ago
Allow SourceBufferHolder to store both UTF-8 and UTF-16 source text
Categories
(Core :: JavaScript Engine, defect, P1)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla65
People
(Reporter: Waldo, Assigned: Waldo)
References
Details
Attachments
(1 file, 1 obsolete file)
81.74 KB,
patch
|
tcampbell
:
review+
fitzgen
:
review+
|
Details | Diff | Splinter Review |
No description provided.
Updated•6 years ago
|
Priority: -- → P1
Updated•6 years ago
|
Updated•6 years ago
|
status-firefox64:
--- → wontfix
status-firefox65:
--- → affected
Assignee | ||
Comment 1•6 years ago
|
||
The templates are technically all that's needed here, but if I have to touch everyone, why not rename to simplify as well? Also I like "take" as instructing the various functions what to do, and "borrowed" is consistent with increasing Rust terminology IMO. Also, more introduction of UniquePtr for always-taking callers.
Attachment #9021011 -
Flags: review?(tcampbell)
Assignee | ||
Comment 2•6 years ago
|
||
On Windows, the various Windows API things that work with char16_t on our side, return char16ptr_t to represent a pointer to characters -- which converts to char16_t* *and* to wchar_t that Windows APIs want. That means we can't just have one overload to handle both |const Unit*| and |const CharT*|.
So instead, there's one for Unit. And for CharT, we use some SFINAE to only expose it if CharT != Unit. Bleh.
The Rust build also needed minor tweaking -- seems that now we need a PhantomData in the single SourceText creation in pure Rust code. Not sure why the change exactly, but it's not hard to do. Poking fitzen to review the js/rust changes -- the only other file you probably would want to look at to understand that is js/public/Source{BufferHolder,Text}.h in the patch.
Attachment #9022354 -
Flags: review?(tcampbell)
Attachment #9022354 -
Flags: review?(nfitzgerald)
Assignee | ||
Updated•6 years ago
|
Attachment #9021011 -
Attachment is obsolete: true
Attachment #9021011 -
Flags: review?(tcampbell)
Updated•6 years ago
|
Flags: needinfo?(bzbarsky)
Assignee | ||
Comment 3•6 years ago
|
||
Er, whoops -- sorry bz, I meant bug 1503086. :-( That's the bug that moves source-length enforcement out of the JS engine, into callers who must manually/fallibly initialize SourceBufferHolder, and handle errors if it fails. I don't think there's anything here that needs a Gecko-minded look.
Updated•6 years ago
|
Flags: needinfo?(bzbarsky)
Comment 4•6 years ago
|
||
Comment on attachment 9022354 [details] [diff] [review]
Updated so the Rust bindings will compile, and so (on Windows) char16ptr_t can be passed to the |init| function
Review of attachment 9022354 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM! A thought on API design below, feel free to take it or leave it.
::: js/public/SourceText.h
@@ +9,5 @@
> + * units ("source units").
> + *
> + * A SourceText either observes without owning, or takes ownership of, source
> + * units passed to |SourceText::init|. Thus SourceText can be used to
> + * efficiently avoid copying.
I wonder if the API would be nicer if `JS::SourceOwnership` wasn't exposed to API consumers, and instead there were two diffferent factory/initialization methods:
1. JS::SourceText::CreateOwned(...) or `JS::SourceText::initOwned(...)`
2. JS::SourceText::CreateBorrowed(...) or `JS::SourceText::initBorrowed`
Attachment #9022354 -
Flags: review?(nfitzgerald) → review+
Assignee | ||
Comment 5•6 years ago
|
||
It's a fair/plausible thought. However, there are some existing callers in Gecko and SpiderMonkey that do pass a variable SourceOwnership -- generally but not exclusively through JS::AutoStableStringChars::maybeGiveOwnershipToCaller -- and I'm not sure what I'd do about them, precisely. Not every single one of those callers can/could be adapted for this. :-\ Still, maybe we can cut this knot at some point somehow.
Comment 6•6 years ago
|
||
Comment on attachment 9022354 [details] [diff] [review]
Updated so the Rust bindings will compile, and so (on Windows) char16ptr_t can be passed to the |init| function
Review of attachment 9022354 [details] [diff] [review]:
-----------------------------------------------------------------
I can see you are trying to be precise about CharT and Unit, but a lot of code feels very obtuse. A bit of exposition of the practical encodings we are finagling and the why our UTF-8 is sometimes char and sometimes Utf8Unit would go a long way. The code itself holds up on technical grounds, and I think the style and organization makes sense.
::: js/public/SourceText.h
@@ +5,5 @@
>
> /*
> + * SourceText encapsulates a count of char16_t (UTF-16) or Utf8Unit (UTF-8)
> + * code units (note: code *units*, not bytes or code points) and those code
> + * units ("source units").
It is probably worth having an explanatory paragraph explaining that the parser does not support Latin-1, but it exists in many places in the engine but is expanded to char16_t before becoming a SourceText.
@@ +9,5 @@
> + * units ("source units").
> + *
> + * A SourceText either observes without owning, or takes ownership of, source
> + * units passed to |SourceText::init|. Thus SourceText can be used to
> + * efficiently avoid copying.
I like this idea a lot. (I would also like a third option for externally owned, but always available so no copying is needed... for reasons.)
@@ +21,5 @@
> + * keep them alive until associated JS compilation is complete.
> + * 3) Code that calls |SourceText::take{Chars,Units}()| must keep the source
> + * units alive until JS compilation completes. Normally only the JS engine
> + * should call |SourceText::take{Chars,Units}()|.
> + * 4) Use the appropriate SourceText parametrization depending on the source
*parameterization
@@ +32,2 @@
> * if (!chars) {
> * return false;
This may be a comment, but probably should include JS_ReportOutOfMemory
@@ +148,2 @@
>
> + static const CharT nonNullChar = '\0';
Something along the lines of |static const CharT emptyString[] = { '\0' };|
@@ +151,5 @@
> // Initialize all fields *before* checking length. This ensures that
> + // if |ownership == SourceOwnership::TakeOwnership|, |units| will be
> + // freed when |this|'s destructor is called.
> + if (units) {
> + units_ = reinterpret_cast<const Unit*>(units);
const_cast
@@ +200,3 @@
> }
>
> + const CharT* get() const { return reinterpret_cast<const CharT*>(units_); }
Add a short comment suggesting get() is used when doing string-related operations, and units() is for the raw storage. Or some sort of hints of the subtlety here.
Attachment #9022354 -
Flags: review?(tcampbell) → review+
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/56eaf6c976d3
Rename SourceBufferHolder to SourceText, and add a <typename Unit> template parameter to it so it can hold putative UTF-8 or UTF-16 source text. r=tcampbell, r=fitzgen
Comment 8•6 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
You need to log in
before you can comment on or make changes to this bug.
Description
•