Closed Bug 770215 Opened 12 years ago Closed 12 years ago

[OS.File] Utilities for strings

Categories

(Core :: Networking: File, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla17

People

(Reporter: Yoric, Assigned: Yoric)

References

Details

Attachments

(4 files, 14 obsolete files)

6.70 KB, patch
Yoric
: review+
Details | Diff | Splinter Review
4.55 KB, patch
Yoric
: review+
Details | Diff | Splinter Review
1.54 KB, patch
Yoric
: review+
Details | Diff | Splinter Review
3.01 KB, patch
Yoric
: review+
Details | Diff | Splinter Review
We need the following features:
- convert a js string to a \0-terminated jschar*, getting the length of that jschar* (required for writing strings under Windows);
- convert a js string to a \0-terminated char* in a given encoding, getting the length of that char* (required for writing strings under Unix);
- convert a \0-terminated char* in a given encoding to a js string (required for reading a complete file under Unix);
- convert a \0-terminated jschar* to a js string (required for reading a complete file under Windows);
- convert a char* with length in a given encoding to a js string (required for reading a chunk of a file under Unix);
- convert a jschar* with length in to a js string (required for reading a chunk of a file under Windows).

We might not need reading chunks for the time being.
Attached patch Companion testsuite (obsolete) — Splinter Review
Assignee: nobody → dteller
Attached patch Native utilities (obsolete) — Splinter Review
Here is a first prototype.
Mike, could you tell me what you think of this implementation?
Attachment #638991 - Flags: feedback?(mh+mozilla)
Attached patch JS bindings (obsolete) — Splinter Review
Attachment #638990 - Attachment is obsolete: true
Attachment #638991 - Flags: feedback?(mh+mozilla) → review?(taras.mozilla)
I wish to add that we need these utilities either as part of OS.File itself or for any user of OS.File who wants to read strings from or write strings to a file (e.g session store).
Attachment #638989 - Flags: review?(taras.mozilla)
Attachment #639064 - Flags: review?(taras.mozilla)
Comment on attachment 639064 [details] [diff] [review]
JS bindings

      exports.OS.Shared.declareFFI = declareFFI;
 
 
+
      /**

really? need even more whitespace? 

:)

+// Logistics
again. whitespace.

I'm not sure i'm the best person for this. jorendorff is a better for ctypes utility apis like this
Attachment #639064 - Flags: review?(taras.mozilla)
Attachment #639064 - Flags: review?(jorendorff)
Attachment #638991 - Flags: review?(taras.mozilla) → review?(jorendorff)
Comment on attachment 638989 [details] [diff] [review]
Companion testsuite

ok by me
Attachment #638989 - Flags: review?(taras.mozilla) → review+
(In reply to Taras Glek (:taras) from comment #6)
> Comment on attachment 639064 [details] [diff] [review]
> JS bindings
> 
>       exports.OS.Shared.declareFFI = declareFFI;
>  
>  
> +
>       /**
> 
> really? need even more whitespace? 
> 
> :)
> 
> +// Logistics
> again. whitespace.
> 
> I'm not sure i'm the best person for this. jorendorff is a better for ctypes
> utility apis like this

I actually asked jorendorff first and he declined reviewing these patches.
Comment on attachment 638991 [details] [diff] [review]
Native utilities

Perhaps Nathan would like to review this?
Attachment #638991 - Flags: review?(nfroyd)
Comment on attachment 639064 [details] [diff] [review]
JS bindings

Review of attachment 639064 [details] [diff] [review]:
-----------------------------------------------------------------

There's some DOM efforts to expose real string encoding/decoding primitives going on; you may want to look at those.  I don't have the bug #s immediately available.  Might be some overlap, might not.

::: toolkit/components/osfile/osfile_shared.jsm
@@ +508,5 @@
>       };
>       exports.OS.Shared.declareFFI = declareFFI;
>  
>  
> +

Collapse whitespace, please.

@@ +519,5 @@
> +     _aux.initUnicode = function initUnicode(free, strdup) {
> +       let libxul = ctypes.open(exports.OS.Constants.Path.libxul);
> +       exports.OS.Shared.libxul = libxul;
> +
> +// Logistics

Nit: indentation.

@@ +530,5 @@
> +         /*return*/ Types.void_t.implementation,
> +         /*decoder*/Types.udecoder.in_ptr.implementation);
> +
> +// Public functions
> +       let getEncoder = declareFFI(libxul, "osfile_GetEncoder",

Enlightenment question: why use declareFFI for these functions and libxul.declare for the logistics functions?  Consistency would be betteer, but if there some subtle reason lurking, then leave it as-is.

@@ +532,5 @@
> +
> +// Public functions
> +       let getEncoder = declareFFI(libxul, "osfile_GetEncoder",
> +         ctypes.default_abi,
> +         /*return*/ Types.uencoder.out_ptr.releaseWith(releaseEncoder),

Oooo, I like.

@@ +540,5 @@
> +         /*return*/ Types.udecoder.out_ptr.releaseWith(releaseDecoder),
> +         /*charset*/Types.char.in_ptr);
> +       let encode = declareFFI(libxul, "osfile_Encode",
> +        ctypes.default_abi,
> +         /*return*/     Types.jschar.out_ptr.releaseWith(free),

I think there is a little too much copy-paste going on here.  Please double-check that encode/decode match up with the declarations in the header file.

@@ +574,5 @@
> +          * the number of bytes in the output.
> +          * @return {CData} A C string in the given encoding,
> +          * or |null| in case of error.
> +          */
> +         encodeAll:  function encodeAll(encoder, string, outBytes) {

I am skeptical about providing outBytes as a parameter; it seems like it'd be more useful as a return value.  Then again, I'm not sure that it'd be really useful to the caller, either.  (It would be useful for incremental encoding, but that's not supported here.)  Have you written code calling encodeAll?  Have you used the outBytes parameter?

Actually, do we even need getEncoder/encodeAll pairings?  Why not just provide:

  encode(encoding-name, js-string) => c-string

and hide all the details?

@@ +587,5 @@
> +          * @param {CData} bytes An out pointer to a int32_t representing
> +          * the number of chars in the output.
> +          * @return {string} A JS string or |null| in case of error.
> +          */
> +         decodeAll:  function decodeAll(decoder, cstring, outChars) {

I don't see the usefulness of outChars here, since we can get the length directly from JS.  Ergo, we should probably change the C API to not require this parameter to decode().  Doing so also means that we can eliminate the confused definition of the bytes parameter in the docstring here. :)

@@ +614,5 @@
> +          * This value will be automatically garbage-collected once it is
> +          * not referenced anymore.
> +          */
> +         exportWString: function exportWString(string) {
> +           return strdup(string);

Pretty sure using strdup to copy JavaScript strings is going to end poorly due to confusion over 8-bit versus 16-bit characters.

::: toolkit/components/osfile/osfile_unix_back.jsm
@@ +389,5 @@
>  
> +       // Initialize Unicode conversion
> +       let wstrdup = declareFFI("strdup", ctypes.default_abi,
> +                               /*return*/ Types.jschar.out_ptr,
> +                               /*str*/    Types.jschar.in_ptr.releaseWith(UnixFile.free));

This definition looks confused about what to name the function and what function it actually wants to call.

::: toolkit/components/osfile/osfile_win_back.jsm
@@ +232,5 @@
>  
> +       // Initialize Unicode conversion
> +       let wstrdup = declareFFI("StrDup", ctypes.default_abi,
> +                     /*return*/ Types.char.out_ptr,
> +                     /*str*/    Types.char.in_ptr.releaseWith(WinFile.free));

Same comments from the Unix backend apply here.
Attachment #639064 - Flags: review?(nfroyd) → review-
Comment on attachment 638991 [details] [diff] [review]
Native utilities

Review of attachment 638991 [details] [diff] [review]:
-----------------------------------------------------------------

The FIXMEs wrt SetLastError/errno need to be corrected.  SetLastError/errno should be set at all returns (e.g. failing to get nsICharsetConverterManager), too.  Preferably so that you're not constantly saying #if defined(XP_WIN).

I think this looks fairly sane, but after looking at the JS bits, I think we may want to modify the C API.

::: toolkit/components/osfile/osfileutils.cpp
@@ +17,5 @@
> +MOZ_EXPORT_API(void) osfile_ReleaseDecoder(nsIUnicodeDecoder* decoder)
> +{
> +  if (!decoder) {
> +    return;
> +  }

You don't need this check with NS_IF_RELEASE.

@@ +25,5 @@
> +MOZ_EXPORT_API(void) osfile_ReleaseEncoder(nsIUnicodeEncoder* encoder)
> +{
> +  if (!encoder) {
> +    return;
> +  }

Likewise.

@@ +29,5 @@
> +  }
> +  NS_IF_RELEASE(encoder);
> +}
> +
> +MOZ_EXPORT_API(nsIUnicodeEncoder*) osfile_GetEncoder(const char* encoding)

Shared template functions for osfile_Get{Encoder,Decoder} and osfile_{Encode,Decode} would be a worthwhile change, I think.

::: toolkit/components/osfile/osfileutils.h
@@ +39,5 @@
> +
> +/**
> + * Decode a sequence of bytes into a Unicode string.
> + *
> + * @param aDecode The decoder to use.

"aDecoder"

@@ +45,5 @@
> + *   end of the output.
> + * @param aSource The sequence of bytes to decode.
> + * @param aSourceBytes (in) The number of bytes to decode. If NULL,
> + *   the number of bytes will be |strlen(aSource)|.
> + * @param aSourceBytes (out) The number of bytes effectively decoded.

You should label aSourceBytes as inout.

@@ +47,5 @@
> + * @param aSourceBytes (in) The number of bytes to decode. If NULL,
> + *   the number of bytes will be |strlen(aSource)|.
> + * @param aSourceBytes (out) The number of bytes effectively decoded.
> + * @param aOutChars (out) The number of Unicode chars in the output,
> + *   not including the nul char.

This parameter should have a more descriptive name; I was puzzled why it was "aOutChars" with type int32_t*.  Maybe aCharsWritten?

@@ +58,5 @@
> +   int32_t* aOutChars);
> +/**
> + * Encode a complete Unicode string into a set of bytes.
> + *
> + * @return null in case of error, otherwise a nul-terminated string.

Needs more documentation.

@@ +62,5 @@
> + * @return null in case of error, otherwise a nul-terminated string.
> + */
> +MOZ_EXPORT_API(char*) osfile_Encode(nsIUnicodeEncoder* aEncoder,
> +   bool aAppendNulToOutput, const PRUnichar* aSource, int32_t* aSourceChars,
> +   int32_t* aOutBytes);

Same comments about naming.
Attachment #638991 - Flags: review?(nfroyd) → review-
(In reply to Nathan Froyd (:froydnj) from comment #10)
> Comment on attachment 639064 [details] [diff] [review]
> JS bindings
> 
> Review of attachment 639064 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> There's some DOM efforts to expose real string encoding/decoding primitives
> going on; you may want to look at those.  I don't have the bug #s
> immediately available.  Might be some overlap, might not.

I have heard rumours about that. However, until these primitives are available (and better than that, available to all threads), I will need a workaround.

> 
> ::: toolkit/components/osfile/osfile_shared.jsm
> @@ +508,5 @@
> >       };
> >       exports.OS.Shared.declareFFI = declareFFI;
> >  
> >  
> > +
> 
> Collapse whitespace, please.


Will do.

> 
> @@ +519,5 @@
> > +     _aux.initUnicode = function initUnicode(free, strdup) {
> > +       let libxul = ctypes.open(exports.OS.Constants.Path.libxul);
> > +       exports.OS.Shared.libxul = libxul;
> > +
> > +// Logistics
> 
> Nit: indentation.

Will do.

> 
> @@ +530,5 @@
> > +         /*return*/ Types.void_t.implementation,
> > +         /*decoder*/Types.udecoder.in_ptr.implementation);
> > +
> > +// Public functions
> > +       let getEncoder = declareFFI(libxul, "osfile_GetEncoder",
> 
> Enlightenment question: why use declareFFI for these functions and
> libxul.declare for the logistics functions?  Consistency would be betteer,
> but if there some subtle reason lurking, then leave it as-is.

There are places at which I have no choice for subtle reasons, but here, I could avoid this. I will fix that.

> 
> @@ +532,5 @@
> > +
> > +// Public functions
> > +       let getEncoder = declareFFI(libxul, "osfile_GetEncoder",
> > +         ctypes.default_abi,
> > +         /*return*/ Types.uencoder.out_ptr.releaseWith(releaseEncoder),
> 
> Oooo, I like.

:)

> 
> @@ +540,5 @@
> > +         /*return*/ Types.udecoder.out_ptr.releaseWith(releaseDecoder),
> > +         /*charset*/Types.char.in_ptr);
> > +       let encode = declareFFI(libxul, "osfile_Encode",
> > +        ctypes.default_abi,
> > +         /*return*/     Types.jschar.out_ptr.releaseWith(free),
> 
> I think there is a little too much copy-paste going on here.  Please
> double-check that encode/decode match up with the declarations in the header
> file.

Well-spotted, thanks.

> 
> @@ +574,5 @@
> > +          * the number of bytes in the output.
> > +          * @return {CData} A C string in the given encoding,
> > +          * or |null| in case of error.
> > +          */
> > +         encodeAll:  function encodeAll(encoder, string, outBytes) {
> 
> I am skeptical about providing outBytes as a parameter; it seems like it'd
> be more useful as a return value.  Then again, I'm not sure that it'd be
> really useful to the caller, either.  (It would be useful for incremental
> encoding, but that's not supported here.)  Have you written code calling
> encodeAll?  Have you used the outBytes parameter?

Actually, I have examples, but you are right, they are probably too premature. Let's forget about them atm.

> Actually, do we even need getEncoder/encodeAll pairings?  Why not just
> provide:
> 
>   encode(encoding-name, js-string) => c-string
> 
> and hide all the details?

The main reason is symmetry: with an API that allocates the string, decoders can be discarded after use, but encoders are stateful regardless, so I cannot provide

  decode(encoding-name, c-string) => js-string

If you think it wiser, I can provide asymmetric APIs for encoding and decoding. This will be a little bit slower (due to no reuse of the encoder), but possibly not by much.

> 
> @@ +587,5 @@
> > +          * @param {CData} bytes An out pointer to a int32_t representing
> > +          * the number of chars in the output.
> > +          * @return {string} A JS string or |null| in case of error.
> > +          */
> > +         decodeAll:  function decodeAll(decoder, cstring, outChars) {
> 
> I don't see the usefulness of outChars here, since we can get the length
> directly from JS.

This was mostly for symmetry, too. You are right, let's forget about it.

> Ergo, we should probably change the C API to not require
> this parameter to decode().  Doing so also means that we can eliminate the
> confused definition of the bytes parameter in the docstring here. :)

:)

> 
> @@ +614,5 @@
> > +          * This value will be automatically garbage-collected once it is
> > +          * not referenced anymore.
> > +          */
> > +         exportWString: function exportWString(string) {
> > +           return strdup(string);
> 
> Pretty sure using strdup to copy JavaScript strings is going to end poorly
> due to confusion over 8-bit versus 16-bit characters.

You are right, I should change the name.
... plus as you point further down, I should also change the function I pass as argument.

> 
> ::: toolkit/components/osfile/osfile_unix_back.jsm
> @@ +389,5 @@
> >  
> > +       // Initialize Unicode conversion
> > +       let wstrdup = declareFFI("strdup", ctypes.default_abi,
> > +                               /*return*/ Types.jschar.out_ptr,
> > +                               /*str*/    Types.jschar.in_ptr.releaseWith(UnixFile.free));
> 
> This definition looks confused about what to name the function and what
> function it actually wants to call.

Right, this should have been wcpcpy on both sides. Thanks for the spot.

> 
> ::: toolkit/components/osfile/osfile_win_back.jsm
> @@ +232,5 @@
> >  
> > +       // Initialize Unicode conversion
> > +       let wstrdup = declareFFI("StrDup", ctypes.default_abi,
> > +                     /*return*/ Types.char.out_ptr,
> > +                     /*str*/    Types.char.in_ptr.releaseWith(WinFile.free));
> 
> Same comments from the Unix backend apply here.

Right, this should have been StrDupW and jschar instead of char. Thanks for the spot.
(In reply to Nathan Froyd (:froydnj) from comment #11)
> Comment on attachment 638991 [details] [diff] [review]
> Native utilities
> 
> Review of attachment 638991 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> The FIXMEs wrt SetLastError/errno need to be corrected.  SetLastError/errno
> should be set at all returns (e.g. failing to get
> nsICharsetConverterManager), too.  Preferably so that you're not constantly
> saying #if defined(XP_WIN).

Will do.

> I think this looks fairly sane, but after looking at the JS bits, I think we
> may want to modify the C API.

You mean for removing the outbound information? We can certainly remove it with the current set of features exposed by JS. I will readd what is missing once we start working on streams (which require more info).

> 
> ::: toolkit/components/osfile/osfileutils.cpp
> @@ +17,5 @@
> > +MOZ_EXPORT_API(void) osfile_ReleaseDecoder(nsIUnicodeDecoder* decoder)
> > +{
> > +  if (!decoder) {
> > +    return;
> > +  }
> 
> You don't need this check with NS_IF_RELEASE.

Ah, good to know.

> 
> @@ +25,5 @@
> > +MOZ_EXPORT_API(void) osfile_ReleaseEncoder(nsIUnicodeEncoder* encoder)
> > +{
> > +  if (!encoder) {
> > +    return;
> > +  }
> 
> Likewise.
> 
> @@ +29,5 @@
> > +  }
> > +  NS_IF_RELEASE(encoder);
> > +}
> > +
> > +MOZ_EXPORT_API(nsIUnicodeEncoder*) osfile_GetEncoder(const char* encoding)
> 
> Shared template functions for osfile_Get{Encoder,Decoder} and
> osfile_{Encode,Decode} would be a worthwhile change, I think.

Ok, will do.

> 
> ::: toolkit/components/osfile/osfileutils.h
> @@ +39,5 @@
> > +
> > +/**
> > + * Decode a sequence of bytes into a Unicode string.
> > + *
> > + * @param aDecode The decoder to use.
> 
> "aDecoder"

Well spotted :)

> 
> @@ +45,5 @@
> > + *   end of the output.
> > + * @param aSource The sequence of bytes to decode.
> > + * @param aSourceBytes (in) The number of bytes to decode. If NULL,
> > + *   the number of bytes will be |strlen(aSource)|.
> > + * @param aSourceBytes (out) The number of bytes effectively decoded.
> 
> You should label aSourceBytes as inout.

Thanks.

> 
> @@ +47,5 @@
> > + * @param aSourceBytes (in) The number of bytes to decode. If NULL,
> > + *   the number of bytes will be |strlen(aSource)|.
> > + * @param aSourceBytes (out) The number of bytes effectively decoded.
> > + * @param aOutChars (out) The number of Unicode chars in the output,
> > + *   not including the nul char.
> 
> This parameter should have a more descriptive name; I was puzzled why it was
> "aOutChars" with type int32_t*.  Maybe aCharsWritten?

Thanks for the suggestion, I'll see how that works out.

> 
> @@ +58,5 @@
> > +   int32_t* aOutChars);
> > +/**
> > + * Encode a complete Unicode string into a set of bytes.
> > + *
> > + * @return null in case of error, otherwise a nul-terminated string.
> 
> Needs more documentation.

Ok.

> 
> @@ +62,5 @@
> > + * @return null in case of error, otherwise a nul-terminated string.
> > + */
> > +MOZ_EXPORT_API(char*) osfile_Encode(nsIUnicodeEncoder* aEncoder,
> > +   bool aAppendNulToOutput, const PRUnichar* aSource, int32_t* aSourceChars,
> > +   int32_t* aOutBytes);
> 
> Same comments about naming.

Ok.
(In reply to David Rajchenbach Teller [:Yoric] from comment #12)
> > There's some DOM efforts to expose real string encoding/decoding primitives
> > going on; you may want to look at those.  I don't have the bug #s
> > immediately available.  Might be some overlap, might not.
> 
> I have heard rumours about that. However, until these primitives are
> available (and better than that, available to all threads), I will need a
> workaround.

Indeed.

> > I am skeptical about providing outBytes as a parameter; it seems like it'd
> > be more useful as a return value.  Then again, I'm not sure that it'd be
> > really useful to the caller, either.  (It would be useful for incremental
> > encoding, but that's not supported here.)  Have you written code calling
> > encodeAll?  Have you used the outBytes parameter?
> 
> Actually, I have examples, but you are right, they are probably too
> premature. Let's forget about them atm.
> 
> > Actually, do we even need getEncoder/encodeAll pairings?  Why not just
> > provide:
> > 
> >   encode(encoding-name, js-string) => c-string
> > 
> > and hide all the details?
> 
> The main reason is symmetry: with an API that allocates the string, decoders
> can be discarded after use, but encoders are stateful regardless, so I
> cannot provide
> 
>   decode(encoding-name, c-string) => js-string
> 
> If you think it wiser, I can provide asymmetric APIs for encoding and
> decoding. This will be a little bit slower (due to no reuse of the encoder),
> but possibly not by much.

I am a fan of symmetry, but I do not understand the justification provided here.  I do not see why providing the method you describe above is any different than:

  let d = getDecoder(encoding-name);
  return decodeAll(d, c-string, /*XXX param to be removed*/null);

I am also not seeing what the statefulness argument re: encoders and decoders has to do with providing an easy API.  Could you please elaborate?

I agree that it would be nice to reuse encoders and decoders when possible.  But you provide no way to get at nsIUnicode{Encoder,Decoder}.Reset, so arguing for reuse seems like a non-starter.  (If you are going to argue for efficiency, then pushing for nsIUnicode{Encoder,Decoder} to be more fully exposed through this API is the way to go; the big win from that is incremental encoding/decoding of buffers.  But it's not clear that we need that at the moment.)
(In reply to David Rajchenbach Teller [:Yoric] from comment #13)
> > I think this looks fairly sane, but after looking at the JS bits, I think we
> > may want to modify the C API.
> 
> You mean for removing the outbound information? We can certainly remove it
> with the current set of features exposed by JS. I will readd what is missing
> once we start working on streams (which require more info).

Yes, removing the outbound information.  If it's actually going to be needed later, then I'd be OK with leaving it in.  But I don't see a good reason to expose the outbound parameters all the way up through JS with an API like {encode,decode}All.  Those two functions should just do the translation and sweep whatever messy details the C API requires under the rug.

IMHO, there should be a separate API for doing incremental, stream-type encoding/decoding.
(In reply to Nathan Froyd (:froydnj) from comment #15)
> IMHO, there should be a separate API for doing incremental, stream-type
> encoding/decoding.

Clarification: separate methods for doing incremental, stream-type encoding/decoding in OS.File alongside {encode,decode}All or whatever we wind up calling them.
Attachment #638991 - Flags: review?(jorendorff)
Attachment #639064 - Flags: review?(jorendorff)
(In reply to Nathan Froyd (:froydnj) from comment #16)
> (In reply to Nathan Froyd (:froydnj) from comment #15)
> > IMHO, there should be a separate API for doing incremental, stream-type
> > encoding/decoding.
> 
> Clarification: separate methods for doing incremental, stream-type
> encoding/decoding in OS.File alongside {encode,decode}All or whatever we
> wind up calling them.

Yes, that's what I had in mind.
(In reply to Nathan Froyd (:froydnj) from comment #14)
> > > Actually, do we even need getEncoder/encodeAll pairings?  Why not just
> > > provide:
> > > 
> > >   encode(encoding-name, js-string) => c-string
> > > 
> > > and hide all the details?
> > 
> > The main reason is symmetry: with an API that allocates the string, decoders
> > can be discarded after use, but encoders are stateful regardless, so I
> > cannot provide
> > 
> >   decode(encoding-name, c-string) => js-string
> > 
> > If you think it wiser, I can provide asymmetric APIs for encoding and
> > decoding. This will be a little bit slower (due to no reuse of the encoder),
> > but possibly not by much.
> 
> I am a fan of symmetry, but I do not understand the justification provided
> here.  I do not see why providing the method you describe above is any
> different than:
> 
>   let d = getDecoder(encoding-name);
>   return decodeAll(d, c-string, /*XXX param to be removed*/null);
>
> 
> I am also not seeing what the statefulness argument re: encoders and
> decoders has to do with providing an easy API.  Could you please elaborate?

The problem appears if c-string ends with a partial sequence. In this case, we need to reuse the same decoder with the next sequence.

I grant you that it is unlikely that we end up with a nul-terminated partial sequence, so we can probably forget about this argument.

> I agree that it would be nice to reuse encoders and decoders when possible. 
> But you provide no way to get at nsIUnicode{Encoder,Decoder}.Reset, so
> arguing for reuse seems like a non-starter.  (If you are going to argue for
> efficiency, then pushing for nsIUnicode{Encoder,Decoder} to be more fully
> exposed through this API is the way to go; the big win from that is
> incremental encoding/decoding of buffers.  But it's not clear that we need
> that at the moment.)

The idea was indeed to progressively publish more of nsIUnicode{Encoder,Decoder}, to allow progressive encoding/decoding of buffers/streams.
Attached patch Native utilities (obsolete) — Splinter Review
Attachment #638991 - Attachment is obsolete: true
Attachment #642941 - Flags: review?(nfroyd)
Attached patch JS bindings v2 (obsolete) — Splinter Review
Here is a simplified version of the JS bindings. Until we decide whether to simplify it further.
Attachment #639064 - Attachment is obsolete: true
Attachment #642942 - Flags: review?(nfroyd)
(In reply to David Rajchenbach Teller (away until early August) [:Yoric] from comment #18)
> > I am a fan of symmetry, but I do not understand the justification provided
> > here.  I do not see why providing the method you describe above is any
> > different than:
> > 
> >   let d = getDecoder(encoding-name);
> >   return decodeAll(d, c-string, /*XXX param to be removed*/null);
> >
> > 
> > I am also not seeing what the statefulness argument re: encoders and
> > decoders has to do with providing an easy API.  Could you please elaborate?
> 
> The problem appears if c-string ends with a partial sequence. In this case,
> we need to reuse the same decoder with the next sequence.

Ah.  That is a good point and definitely a case we should handle.  But I think that decodeAll should not be obligated to deal with this sort of thing; the caller should be responsible for providing fully-formed input.  (A more incremental API should deal properly with this case.)

In any event, I don't think the scenario you describe works with the current C support, since when Convert() fails, you just return nsnull rather than the partially decoded sequence.

> > I agree that it would be nice to reuse encoders and decoders when possible. 
> > But you provide no way to get at nsIUnicode{Encoder,Decoder}.Reset, so
> > arguing for reuse seems like a non-starter.  (If you are going to argue for
> > efficiency, then pushing for nsIUnicode{Encoder,Decoder} to be more fully
> > exposed through this API is the way to go; the big win from that is
> > incremental encoding/decoding of buffers.  But it's not clear that we need
> > that at the moment.)
> 
> The idea was indeed to progressively publish more of
> nsIUnicode{Encoder,Decoder}, to allow progressive encoding/decoding of
> buffers/streams.

Very good.
Actually, there is something wrong brewing here.

For most encodings, applying |strlen| to a c buffer is absolutely useless. Rather, we need to return the length of the c buffer in chars from EncodeAll and take this length as argument in DecodeAll.

My bad for only testing with utf-8.
(In reply to David Rajchenbach Teller (away until early August) [:Yoric] from comment #22)
> For most encodings, applying |strlen| to a c buffer is absolutely useless.
> Rather, we need to return the length of the c buffer in chars from EncodeAll
> and take this length as argument in DecodeAll.

Mmm, yes; strlen on a UTF-16 buffer is not going to end well.  My bad for not noticing that >8-bit encodings do need lengths of buffers returned and I shouldn't have complained about encodeAll.

Do you want to cancel the r? flags, then?
Attached patch Native utilities, v2 (obsolete) — Splinter Review
Here is a new version that does not expose encoders/decoders yet, but exposes number of bytes.

As discussed previously, this means that the only way to read an encoded file is to first load it completely to memory, then convert everything to a string. Probably not a major deal for the scope intended (e.g. configuration files), since we are going to read them off-main thread.
Attachment #642941 - Attachment is obsolete: true
Attachment #642941 - Flags: review?(nfroyd)
Attachment #643414 - Flags: review?(nfroyd)
Attached patch JS bindings v3 (obsolete) — Splinter Review
Attachment #642942 - Attachment is obsolete: true
Attachment #642942 - Flags: review?(nfroyd)
Attachment #643415 - Flags: review?(nfroyd)
Attached patch Companion testsuite (obsolete) — Splinter Review
Attachment #638989 - Attachment is obsolete: true
Attachment #643416 - Flags: review?(nfroyd)
Comment on attachment 642941 [details] [diff] [review]
Native utilities

Review of attachment 642941 [details] [diff] [review]:
-----------------------------------------------------------------

Works for me.  I think I'd like to see the EncodeAll length outparam changes and the osfile_ns_alloc question answered, though.

::: toolkit/components/osfile/osfileutils.cpp
@@ +212,5 @@
> +  // Convert, add trailing \0
> +
> +  rv = aDecoder->Convert(aSource, &srcBytes, dest.rwget(), &upperBoundChars);
> +  if (NS_FAILED(rv)) {
> +    error_wrong_sequence();

Nit: I think error_invalid_argument is more correct here.  Also in osfile_EncodeAll.  Then error_wrong_sequence can go away.

@@ +234,5 @@
> +  int32_t upperBoundBytes;
> +  nsresult rv = aEncoder->GetMaxLength(aSource, srcChars, &upperBoundBytes);
> +  if (NS_FAILED(rv)) {
> +    error_unexpected();
> +    return nsnull;

FWIW, GetMaxLength is specified to never fail (!).  I think it'd be OK to turn this into an NS_ASSERTION (or the MOZ flavor-of-the-rewrite equivalent) that NS_SUCCEEDED(rv).  You should also initialize upperBoundBytes properly, just in case.  Then you can get rid of error_unexpected.  Also in osfile_EncodeAll.  (See how much easier this would have been with template functions? :)

::: toolkit/components/osfile/osfileutils.h
@@ +8,5 @@
> +extern "C" {
> +
> +// Memory utilities
> +
> +MOZ_EXPORT_API(void*) osfile_ns_alloc(PRSize size);

This doesn't look used in the current bindings.  Is this another save-it-for-later sort of function?

@@ +60,5 @@
> + * @param aDecoder The decoder to use.
> + * @param aSource The C string to decode. Must be null-terminated.
> + *
> + * @return null in case of error, otherwise a sequence of Unicode chars,
> + *   representing |aSource|, optionally terminated with a nul-terminator.

The optionally terminated language here is no longer necessary.

@@ +61,5 @@
> + * @param aSource The C string to decode. Must be null-terminated.
> + *
> + * @return null in case of error, otherwise a sequence of Unicode chars,
> + *   representing |aSource|, optionally terminated with a nul-terminator.
> + *   This memory should be released with |free|.

Point of order: this is not what the JS code does.  Fix the docstring or the code, please.

@@ +75,5 @@
> + *
> + * @return null in case of error, otherwise a nul-terminated string.
> + */
> +MOZ_EXPORT_API(char*) osfile_EncodeAll(nsIUnicodeEncoder* aEncoder,
> +   const PRUnichar* aSource);

Later comments suggested that we should pass a length parameter out of here and out to JS.
Attachment #642941 - Attachment is obsolete: false
Attachment #642941 - Attachment is obsolete: true
Comment on attachment 643414 [details] [diff] [review]
Native utilities, v2

Review of attachment 643414 [details] [diff] [review]:
-----------------------------------------------------------------

Please see comment 27 for caveats; I think we mid-aired in adding/removing patches.
Attachment #643414 - Flags: review?(nfroyd) → review+
Comment on attachment 643415 [details] [diff] [review]
JS bindings v3

Review of attachment 643415 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  I would like to see the encodeAll change before commit.

::: toolkit/components/osfile/osfile_shared.jsm
@@ +689,5 @@
> +     /**
> +      * A trimmed-down version of declareFFI that returns a raw js-ctypes
> +      * function, which does not translate its arguments.
> +      *
> +      * Used to declare finalizers.

I am rs'ing this part; I assume you know what you are doing here. :)

@@ +702,5 @@
> +       abi = abi || ctypes.default_abi;
> +       if (Object.prototype.toString.call(abi) != "[object CABI]") {
> +         // Note: This is the only known manner of checking whether an object
> +         // is an abi.
> +         throw new TypeError("declareFFI expects as second argument an abi or null");

"declareFinalizer expects..."

@@ +705,5 @@
> +         // is an abi.
> +         throw new TypeError("declareFFI expects as second argument an abi or null");
> +       }
> +       let signature = [symbol, abi];
> +       let argtypes  = [];

argtypes is unused.  Bug?

@@ +773,5 @@
> +        ctypes.default_abi,
> +         /*return*/     Types.char.out_ptr.releaseWith(NS_Free),
> +         /*encoding*/   Types.char.in_ptr,
> +         /*source*/     Types.jschar.in_ptr,
> +         /*bytes*/      Types.uint32_t.out_ptr);

We should try to make the caller's job easy, though; just because the C code returns outparams doesn't mean we have to.  Please make encodeAll return the allocated buffer and count as multiple values.  I am uncertain about whether [buffer, count] or { buffer: ..., count: ... } is preferred style here; I think I lean towards the former.

I realize this makes the API asymmetric.  I don't have good ideas as to how to make them symmetric.
Attachment #643415 - Flags: review?(nfroyd) → review+
Comment on attachment 643416 [details] [diff] [review]
Companion testsuite

Review of attachment 643416 [details] [diff] [review]:
-----------------------------------------------------------------

Taras already r+'d this, so this is essentially a rubberstamp.  I assume you can change the test appropriately if the API changes from the other patches. :)

::: toolkit/components/osfile/tests/mochi/worker_test_osfile_front.js
@@ +55,5 @@
> +  ok(true, "Starting test_unicode");
> +  function test_go_round(encoding, sentence)  {
> +    let bytes = new OS.Shared.Type.uint32_t.implementation();
> +    let pBytes = bytes.address();
> +    ok(true, "test_unicode: testing encoding of " + sentence + " with encoding " + encoding);

Please use info() here instead of ok().
Attachment #643416 - Flags: review?(nfroyd) → review+
Ok, this time I am sure that I have replied to your question but that the reply does not appear. Bugzilla seems to be eating my comments, so I will just post updated patches and reply later :)
Attached patch JS bindings v4 (obsolete) — Splinter Review
While I definitely agree that EncodeAll is not client-friendly, I intended to address the issue in a followup patch. The idea here is to provide the lower-level bindings and to provide in a second patch higher-level bindings without outparams and with exception-handling, etc.
Attachment #643415 - Attachment is obsolete: true
Attachment #644301 - Flags: review+
Attached patch Companion makefile (obsolete) — Splinter Review
(self-reviewing the makefile)
Attachment #644302 - Flags: review+
Attached patch Native utilities, v3 (obsolete) — Splinter Review
Applied feedback.
Attachment #643414 - Attachment is obsolete: true
Attachment #644303 - Flags: review+
(In reply to Nathan Froyd (:froydnj) from comment #30)
> Comment on attachment 643416 [details] [diff] [review]
> Companion testsuite
> 
> Review of attachment 643416 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Taras already r+'d this, so this is essentially a rubberstamp.  I assume you
> can change the test appropriately if the API changes from the other patches.
> :)
> 
> ::: toolkit/components/osfile/tests/mochi/worker_test_osfile_front.js
> @@ +55,5 @@
> > +  ok(true, "Starting test_unicode");
> > +  function test_go_round(encoding, sentence)  {
> > +    let bytes = new OS.Shared.Type.uint32_t.implementation();
> > +    let pBytes = bytes.address();
> > +    ok(true, "test_unicode: testing encoding of " + sentence + " with encoding " + encoding);
> 
> Please use info() here instead of ok().

Note that this is a worker, so I had to redefine |ok| myself, I/O does not always work (etc.), so if you do not mind, I will keep this for a followup bug.
Attached patch Companion testsuite (obsolete) — Splinter Review
Attachment #643416 - Attachment is obsolete: true
Attachment #644304 - Flags: review+
(In reply to David Rajchenbach Teller (away until early August) [:Yoric] from comment #35)
> Note that this is a worker, so I had to redefine |ok| myself, I/O does not
> always work (etc.), so if you do not mind, I will keep this for a followup
> bug.

I totally missed the worker bit.  I don't mind one way or the other, just a tidyness detail.

(In reply to David Rajchenbach Teller (away until early August) [:Yoric] from comment #32)
> While I definitely agree that EncodeAll is not client-friendly, I intended
> to address the issue in a followup patch. The idea here is to provide the
> lower-level bindings and to provide in a second patch higher-level bindings
> without outparams and with exception-handling, etc.

OK, that seems reasonable.  Presumably the low-level {en,de}codeAll will either no longer be available or have their API changed, then?
(In reply to Nathan Froyd (:froydnj) from comment #37)
> (In reply to David Rajchenbach Teller (away until early August) [:Yoric]
> from comment #32)
> > While I definitely agree that EncodeAll is not client-friendly, I intended
> > to address the issue in a followup patch. The idea here is to provide the
> > lower-level bindings and to provide in a second patch higher-level bindings
> > without outparams and with exception-handling, etc.
> 
> OK, that seems reasonable.  Presumably the low-level {en,de}codeAll will
> either no longer be available or have their API changed, then?

That's the idea. Haven't quite decided the specifics yet.
Attached patch Native utilities (obsolete) — Splinter Review
Attachment #644303 - Attachment is obsolete: true
Attachment #647159 - Flags: review+
Attached patch Native utilitiesSplinter Review
Attachment #647159 - Attachment is obsolete: true
Attachment #650868 - Flags: review+
Attached patch JS bindingsSplinter Review
Attachment #644301 - Attachment is obsolete: true
Attachment #650869 - Flags: review+
Attachment #644302 - Attachment is obsolete: true
Attachment #650870 - Flags: review+
Attachment #644304 - Attachment is obsolete: true
Attachment #650871 - Flags: review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: