Closed Bug 1347224 Opened 7 years ago Closed 7 years ago

Expose fallible SetLength and fallible Assign of XPCOM strings to Rust

Categories

(Core :: XPCOM, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: hsivonen, Assigned: nika)

References

Details

Attachments

(2 files, 1 obsolete file)

My current code for bug 1261841 recreates in C++ using XPCOM strings the methods that in the pure-Rust encoding_rs return (infallibly-allocated) Cow<'a, str> or Cow<'a, [u8]>.

The existing C++ callers for the corresponding functionality generally want to perform SetLength() on the target nsAString or nsACString infallibly.

I could move this code to (Gecko-specific) Rust, if SetLength(uint32_t n, fallible) (on both nsACString and nsAString) and nsACString::Assign(const nsACString& s, fallible) were exposed to Rust.

Moving this code to Rust would have the benefit of being able to avoid frequent malloc for the encoder and decoder objects. (These are stack-allocatable in Rust, but the FFI exposes them only as boxed, because their size depends on non-C-like enums.)
(In reply to Henri Sivonen (:hsivonen) from comment #0)
> The existing C++ callers for the corresponding functionality generally want
> to perform SetLength() on the target nsAString or nsACString infallibly.

Oops. s/infallibly/fallibly/
(In reply to Henri Sivonen (:hsivonen) from comment #0)
> My current code for bug 1261841 recreates in C++ using XPCOM strings the
> methods that in the pure-Rust encoding_rs return (infallibly-allocated)
> Cow<'a, str> or Cow<'a, [u8]>.
> 
> The existing C++ callers for the corresponding functionality generally want
> to perform SetLength() on the target nsAString or nsACString infallibly.
> 
> I could move this code to (Gecko-specific) Rust, if SetLength(uint32_t n,
> fallible) (on both nsACString and nsAString) and nsACString::Assign(const
> nsACString& s, fallible) were exposed to Rust.
> 
> Moving this code to Rust would have the benefit of being able to avoid
> frequent malloc for the encoder and decoder objects. (These are
> stack-allocatable in Rust, but the FFI exposes them only as boxed, because
> their size depends on non-C-like enums.)

So what you're asking for is for the nsString bindings to expose setlength, and assign in fallible forms?

That shouldn't be too hard. I actually have a comment expressing that it might be nice to have in the rust bindings, but I didn't include it as I wasn't aware of any consumers. Adding the support would be quite easy.
Priority: -- → P2
This patch adds a series of fallible methods for the rust ns[C]String
bindings, as well as the `set_length` method, which is the same as the
`SetLength` method in C++. `set_length` is marked as unsafe.

The decision was made to make the fallible methods seperate from the
infallible methods, and to use seperate Rust->C++ bindings for each of
them, rather than only binding the fallible bindings, and unwrapping
them in rust-land. This is to try to match the C++ API as closely as
possible, and to ensure that the behavior matches.

MozReview-Commit-ID: FkSomkFUFGD
Attachment #8847201 - Flags: review?(nfroyd)
Assignee: nobody → michael
Comment on attachment 8847201 [details] [diff] [review]
Expose fallible methods on the rust ns[C]String bindings

Does this patch add the methods you were needing for your encoding_rs work?
Attachment #8847201 - Flags: feedback?(hsivonen)
Comment on attachment 8847201 [details] [diff] [review]
Expose fallible methods on the rust ns[C]String bindings

Review of attachment 8847201 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Michael Layzell [:mystor] from comment #4)
> Comment on attachment 8847201 [details] [diff] [review]
> Expose fallible methods on the rust ns[C]String bindings
> 
> Does this patch add the methods you were needing for your encoding_rs work?

For fallible SetLength(), yes. For assigning an nsACString to another, I'm not sure,, since it's not clear to me how magic the code is. (See inline.)

::: xpcom/rust/nsstring/src/lib.rs
@@ +536,5 @@
>          }
>      }
>  
> +    pub fn fallible_assign<T: AsRef<[u8]> + ?Sized>(&mut self, other: &T) -> Result<(), ()> {
> +        let s = nsCString::from(other.as_ref());

Does this end up sharing the underlying nsStringBuffer (instead of copying the slice) if possible?

::: xpcom/string/nsReadableUtils.cpp
@@ +1381,5 @@
>  }
>  
> +bool Gecko_FallibleAppendUTF16toCString(nsACString* aThis, const nsAString* aOther)
> +{
> +  AppendUTF16toUTF8(*aOther, *aThis, mozilla::fallible);

This seems to be missing "return".

@@ +1386,5 @@
> +}
> +
> +bool Gecko_FallibleAppendUTF8toString(nsAString* aThis, const nsACString* aOther)
> +{
> +  AppendUTF8toUTF16(*aOther, *aThis, mozilla::fallible);

This seems to be missing "return".
Comment on attachment 8847201 [details] [diff] [review]
Expose fallible methods on the rust ns[C]String bindings

Review of attachment 8847201 [details] [diff] [review]:
-----------------------------------------------------------------

Henri raises good points as well.

::: xpcom/rust/nsstring/src/lib.rs
@@ +538,5 @@
>  
> +    pub fn fallible_assign<T: AsRef<[u8]> + ?Sized>(&mut self, other: &T) -> Result<(), ()> {
> +        let s = nsCString::from(other.as_ref());
> +        unsafe {
> +            if Gecko_FallibleAssignCString(self, &*s) {

I have a slight preference for:

if unsafe { Gecko_FallibleAssignCString(...) } {
  Ok(())
} else {
  Err(())
}

here and elsewhere for scoping `unsafe` use more tightly.

@@ +593,5 @@
> +        }
> +    }
> +
> +    pub unsafe fn set_length(&mut self, len: u32) {
> +        Gecko_SetLengthCString(self, len);

What's the rationale behind making the entire function `unsafe` as opposed to scoping it to the call of `Gecko_SetLengthCString`?

@@ +597,5 @@
> +        Gecko_SetLengthCString(self, len);
> +    }
> +
> +    pub unsafe fn fallible_set_length(&mut self, len: u32) -> Result<(), ()> {
> +        if Gecko_FallibleSetLengthCString(self, len) {

Same question here.

@@ +755,5 @@
> +        }
> +    }
> +
> +    pub unsafe fn set_length(&mut self, len: u32) {
> +        Gecko_SetLengthString(self, len);

Same question as for the CString variants.

@@ +759,5 @@
> +        Gecko_SetLengthString(self, len);
> +    }
> +
> +    pub unsafe fn fallible_set_length(&mut self, len: u32) -> Result<(), ()> {
> +        if Gecko_FallibleSetLengthString(self, len) {

Same question as for the CString variants.
Attachment #8847201 - Flags: review?(nfroyd) → feedback+
This time I did some reworkings of how the trait dispatch of assign() and other methods work to allow for using the passed in ns[A][C]String object directly when it is provided.

This should work better in that scenario.

MozReview-Commit-ID: FkSomkFUFGD
Attachment #8847806 - Flags: review?(nfroyd)
Attachment #8847201 - Attachment is obsolete: true
Attachment #8847201 - Flags: feedback?(hsivonen)
Comment on attachment 8847806 [details] [diff] [review]
Expose fallible methods on the rust ns[C]String bindings

I'd like you to look over this version too - I think it should do what you would be looking for :).
Attachment #8847806 - Flags: feedback?(hsivonen)
(In reply to Nathan Froyd [:froydnj] from comment #6)
> @@ +593,5 @@
> > +        }
> > +    }
> > +
> > +    pub unsafe fn set_length(&mut self, len: u32) {
> > +        Gecko_SetLengthCString(self, len);
> 
> What's the rationale behind making the entire function `unsafe` as opposed
> to scoping it to the call of `Gecko_SetLengthCString`?

SetLength with a larger length than the old length exposes uninitialized memory for reading, which is unsafe from the Rust perspective.
Comment on attachment 8847806 [details] [diff] [review]
Expose fallible methods on the rust ns[C]String bindings

Review of attachment 8847806 [details] [diff] [review]:
-----------------------------------------------------------------

This version appears to meet my needs. Thank you!
Attachment #8847806 - Flags: feedback?(hsivonen) → feedback+
Comment on attachment 8847806 [details] [diff] [review]
Expose fallible methods on the rust ns[C]String bindings

Review of attachment 8847806 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the explanations.  Assuming my understanding below is correct, I think this patch is good to go.

::: xpcom/rust/nsstring/src/lib.rs
@@ +708,4 @@
>  }
>  
>  impl nsACString {
> +    pub fn assign_utf16<T: nsStringLike + ?Sized>(&mut self, other: &T) {

AIUI, the reason you did all the assign() changes is to address Henri's comment 5, correct?
Attachment #8847806 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #11)
> AIUI, the reason you did all the assign() changes is to address Henri's
> comment 5, correct?

Yes, the reason for the assign() changes was that I needed to change the trait bounds to a new trait (ns[C]StringLike) which would allow me to recover the underlying nsA[C]String reference if avaliable, while still allowing objects like &[u8] or str which don't have an underlying nsA[C]String reference to be passed to assign and co.

Rewriting this logic ended up requiring a lot of changes to the assign methods. I took the opportunity of needing to rewrite most of it to also move the common methods into the macro to avoid code duplication.

Emilio, I can't land this patch directly into M-C as it removes the Gecko_Truncate[C]String methods (as the bindings now generate calls to SetLength(0) directly). This means that once this patch is landed, the version of nsstring in servo (nsstring_vendor) will no longer compile correctly.

How would you recommend orchestrating the landing of this patch on both nsstring and nsstring_vendor to avoid this build breakage? I'm considering re-introducing the Gecko_Truncate[C]String method to land this patch, and then filing a follow-up to remove it once nsstring_vendor has been updated, but if there is a better option I'd like to hear about it :).
Flags: needinfo?(emilio+bugs)
(In reply to Michael Layzell [:mystor] from comment #12)
> How would you recommend orchestrating the landing of this patch on both
> nsstring and nsstring_vendor to avoid this build breakage? I'm considering
> re-introducing the Gecko_Truncate[C]String method to land this patch, and
> then filing a follow-up to remove it once nsstring_vendor has been updated,
> but if there is a better option I'd like to hear about it :).

That sounds doable. Alternative could be:

 * I pull your patches and revendor on the Servo side.
 * We wait for it to land, and autoland this right-after vcs-sync lands the servo commit.

Whatever you prefer is fine for me.
Flags: needinfo?(emilio+bugs)
Comment on attachment 8848629 [details] [diff] [review]
Part 2: Re-add Gecko_Truncate[C]String temporarially to avoid stylo breakage

Review of attachment 8848629 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/string/nsSubstring.cpp
@@ +427,5 @@
>  
> +// NOTE: These two methods, Gecko_TruncateString and Gecko_TruncateCString are
> +// not used by the nsstring bindings, but until the version in servo
> +// (nsstring_vendor) is udpated, they still need to be included in the binary to
> +// not break the tree. These will be removed in bug 1348398.

Do you think it would be more valuable to have the nsstring bindings mirror the operations on ns*String?  I understand that SetLength(0) is the same as Truncate, but we still have the latter...just as Length() == 0 is the same as IsEmpty(), but we prefer the latter for clarity.

The case for getting rid of these in the bindings is not necessarily straightforward, is all I'm saying.
Attachment #8848629 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #15)
> Do you think it would be more valuable to have the nsstring bindings mirror
> the operations on ns*String?  I understand that SetLength(0) is the same as
> Truncate, but we still have the latter...just as Length() == 0 is the same
> as IsEmpty(), but we prefer the latter for clarity.
> 
> The case for getting rid of these in the bindings is not necessarily
> straightforward, is all I'm saying.

So, the methods which are being defined here (in the `extern "C"` block) are only meant for internal use within the bindings. They are just a set of symbols which are used to link up the actual `SetLength` method to the `set_length` method.

When removing this binding, I am not actually removing the `truncate` method from nsA[C]String in rust-land, instead I am doing what the nsACString bindings do locally, which is call `set_length(0)` (assuming you passed no arguments to Truncate, which is the usual behavior). My intentions behind removing this binding was simply to reduce the amount of IPC surface. I figure that doing very simple work in rust-land is often worthwhile.

TL;DR: I am not removing Truncate from the rust side, I am only removing the binding, meaning that instead of the call looking like:
nsACString::truncate -> Gecko_TruncateCString -> nsACString::Truncate(0) -> nsACString::SetLength(0)
it looks like:
nsACString::truncate -> nsACString::set_length(0) -> Gecko_SetLengthCString(0) -> nsACString::SetLength(0)
Ah, indeed.  Proceed!
Pushed by michael@thelayzells.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/367073fab1aa
Part 1: Expose fallible methods on the rust ns[C]String bindings, r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/c8862bd99750
Part 2: Re-add Gecko_Truncate[C]String temporarially to avoid stylo breakage, r=froydnj
https://hg.mozilla.org/mozilla-central/rev/367073fab1aa
https://hg.mozilla.org/mozilla-central/rev/c8862bd99750
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: