Closed Bug 1485800 Opened 6 years ago Closed 6 years ago

Allow SourceBufferHolder to store both UTF-8 and UTF-16 source text

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- fixed

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Attachments

(1 file, 1 obsolete file)

No description provided.
Blocks: 1485805
Depends on: 1486577
Priority: -- → P1
The templates are technically all that's needed here, but if I have to touch everyone, why not rename to simplify as well? Also I like "take" as instructing the various functions what to do, and "borrowed" is consistent with increasing Rust terminology IMO. Also, more introduction of UniquePtr for always-taking callers.
Attachment #9021011 - Flags: review?(tcampbell)
On Windows, the various Windows API things that work with char16_t on our side, return char16ptr_t to represent a pointer to characters -- which converts to char16_t* *and* to wchar_t that Windows APIs want. That means we can't just have one overload to handle both |const Unit*| and |const CharT*|. So instead, there's one for Unit. And for CharT, we use some SFINAE to only expose it if CharT != Unit. Bleh. The Rust build also needed minor tweaking -- seems that now we need a PhantomData in the single SourceText creation in pure Rust code. Not sure why the change exactly, but it's not hard to do. Poking fitzen to review the js/rust changes -- the only other file you probably would want to look at to understand that is js/public/Source{BufferHolder,Text}.h in the patch.
Attachment #9022354 - Flags: review?(tcampbell)
Attachment #9022354 - Flags: review?(nfitzgerald)
Attachment #9021011 - Attachment is obsolete: true
Attachment #9021011 - Flags: review?(tcampbell)
Flags: needinfo?(bzbarsky)
Er, whoops -- sorry bz, I meant bug 1503086. :-( That's the bug that moves source-length enforcement out of the JS engine, into callers who must manually/fallibly initialize SourceBufferHolder, and handle errors if it fails. I don't think there's anything here that needs a Gecko-minded look.
Flags: needinfo?(bzbarsky)
Comment on attachment 9022354 [details] [diff] [review] Updated so the Rust bindings will compile, and so (on Windows) char16ptr_t can be passed to the |init| function Review of attachment 9022354 [details] [diff] [review]: ----------------------------------------------------------------- LGTM! A thought on API design below, feel free to take it or leave it. ::: js/public/SourceText.h @@ +9,5 @@ > + * units ("source units"). > + * > + * A SourceText either observes without owning, or takes ownership of, source > + * units passed to |SourceText::init|. Thus SourceText can be used to > + * efficiently avoid copying. I wonder if the API would be nicer if `JS::SourceOwnership` wasn't exposed to API consumers, and instead there were two diffferent factory/initialization methods: 1. JS::SourceText::CreateOwned(...) or `JS::SourceText::initOwned(...)` 2. JS::SourceText::CreateBorrowed(...) or `JS::SourceText::initBorrowed`
Attachment #9022354 - Flags: review?(nfitzgerald) → review+
It's a fair/plausible thought. However, there are some existing callers in Gecko and SpiderMonkey that do pass a variable SourceOwnership -- generally but not exclusively through JS::AutoStableStringChars::maybeGiveOwnershipToCaller -- and I'm not sure what I'd do about them, precisely. Not every single one of those callers can/could be adapted for this. :-\ Still, maybe we can cut this knot at some point somehow.
Comment on attachment 9022354 [details] [diff] [review] Updated so the Rust bindings will compile, and so (on Windows) char16ptr_t can be passed to the |init| function Review of attachment 9022354 [details] [diff] [review]: ----------------------------------------------------------------- I can see you are trying to be precise about CharT and Unit, but a lot of code feels very obtuse. A bit of exposition of the practical encodings we are finagling and the why our UTF-8 is sometimes char and sometimes Utf8Unit would go a long way. The code itself holds up on technical grounds, and I think the style and organization makes sense. ::: js/public/SourceText.h @@ +5,5 @@ > > /* > + * SourceText encapsulates a count of char16_t (UTF-16) or Utf8Unit (UTF-8) > + * code units (note: code *units*, not bytes or code points) and those code > + * units ("source units"). It is probably worth having an explanatory paragraph explaining that the parser does not support Latin-1, but it exists in many places in the engine but is expanded to char16_t before becoming a SourceText. @@ +9,5 @@ > + * units ("source units"). > + * > + * A SourceText either observes without owning, or takes ownership of, source > + * units passed to |SourceText::init|. Thus SourceText can be used to > + * efficiently avoid copying. I like this idea a lot. (I would also like a third option for externally owned, but always available so no copying is needed... for reasons.) @@ +21,5 @@ > + * keep them alive until associated JS compilation is complete. > + * 3) Code that calls |SourceText::take{Chars,Units}()| must keep the source > + * units alive until JS compilation completes. Normally only the JS engine > + * should call |SourceText::take{Chars,Units}()|. > + * 4) Use the appropriate SourceText parametrization depending on the source *parameterization @@ +32,2 @@ > * if (!chars) { > * return false; This may be a comment, but probably should include JS_ReportOutOfMemory @@ +148,2 @@ > > + static const CharT nonNullChar = '\0'; Something along the lines of |static const CharT emptyString[] = { '\0' };| @@ +151,5 @@ > // Initialize all fields *before* checking length. This ensures that > + // if |ownership == SourceOwnership::TakeOwnership|, |units| will be > + // freed when |this|'s destructor is called. > + if (units) { > + units_ = reinterpret_cast<const Unit*>(units); const_cast @@ +200,3 @@ > } > > + const CharT* get() const { return reinterpret_cast<const CharT*>(units_); } Add a short comment suggesting get() is used when doing string-related operations, and units() is for the raw storage. Or some sort of hints of the subtlety here.
Attachment #9022354 - Flags: review?(tcampbell) → review+
Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/56eaf6c976d3 Rename SourceBufferHolder to SourceText, and add a <typename Unit> template parameter to it so it can hold putative UTF-8 or UTF-16 source text. r=tcampbell, r=fitzgen
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: