Last Comment Bug 770215 - [OS.File] Utilities for strings
: [OS.File] Utilities for strings
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking: File (show other bugs)
: unspecified
: All All
: -- normal with 1 vote (vote)
: mozilla17
Assigned To: David Rajchenbach-Teller [:Yoric] (please use "needinfo")
:
Mentors:
Depends on:
Blocks: 762867 771094 771927 771928 775950
  Show dependency treegraph
 
Reported: 2012-07-02 09:22 PDT by David Rajchenbach-Teller [:Yoric] (please use "needinfo")
Modified: 2012-08-11 19:56 PDT (History)
6 users (show)
ryanvm: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Companion testsuite (3.05 KB, patch)
2012-07-04 00:49 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
taras.mozilla: review+
Details | Diff | Review
JS bindings (9.17 KB, patch)
2012-07-04 00:51 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
no flags Details | Diff | Review
Native utilities (6.98 KB, patch)
2012-07-04 00:52 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
nfroyd: review-
Details | Diff | Review
JS bindings (9.17 KB, patch)
2012-07-04 05:37 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
nfroyd: review-
Details | Diff | Review
Native utilities (8.58 KB, patch)
2012-07-17 07:23 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
no flags Details | Diff | Review
JS bindings v2 (7.48 KB, patch)
2012-07-17 07:28 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
no flags Details | Diff | Review
Native utilities, v2 (7.79 KB, patch)
2012-07-18 09:26 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
nfroyd: review+
Details | Diff | Review
JS bindings v3 (5.23 KB, patch)
2012-07-18 09:27 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
nfroyd: review+
Details | Diff | Review
Companion testsuite (2.98 KB, patch)
2012-07-18 09:28 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
nfroyd: review+
Details | Diff | Review
JS bindings v4 (6.05 KB, patch)
2012-07-20 06:56 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
dteller: review+
Details | Diff | Review
Companion makefile (1.52 KB, patch)
2012-07-20 06:57 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
dteller: review+
Details | Diff | Review
Native utilities, v3 (6.70 KB, patch)
2012-07-20 06:58 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
dteller: review+
Details | Diff | Review
Companion testsuite (2.99 KB, patch)
2012-07-20 07:00 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
dteller: review+
Details | Diff | Review
Native utilities (6.70 KB, patch)
2012-07-30 07:26 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
dteller: review+
Details | Diff | Review
Native utilities (6.70 KB, patch)
2012-08-10 06:50 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
dteller: review+
Details | Diff | Review
JS bindings (4.55 KB, patch)
2012-08-10 06:51 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
dteller: review+
Details | Diff | Review
Companion makefile (1.54 KB, patch)
2012-08-10 06:52 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
dteller: review+
Details | Diff | Review
Companion testsuite (3.01 KB, patch)
2012-08-10 06:53 PDT, David Rajchenbach-Teller [:Yoric] (please use "needinfo")
dteller: review+
Details | Diff | Review

Description David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-02 09:22:16 PDT
We need the following features:
- convert a js string to a \0-terminated jschar*, getting the length of that jschar* (required for writing strings under Windows);
- convert a js string to a \0-terminated char* in a given encoding, getting the length of that char* (required for writing strings under Unix);
- convert a \0-terminated char* in a given encoding to a js string (required for reading a complete file under Unix);
- convert a \0-terminated jschar* to a js string (required for reading a complete file under Windows);
- convert a char* with length in a given encoding to a js string (required for reading a chunk of a file under Unix);
- convert a jschar* with length in to a js string (required for reading a chunk of a file under Windows).

We might not need reading chunks for the time being.
Comment 1 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-04 00:49:43 PDT
Created attachment 638989 [details] [diff] [review]
Companion testsuite
Comment 2 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-04 00:51:36 PDT
Created attachment 638990 [details] [diff] [review]
JS bindings
Comment 3 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-04 00:52:24 PDT
Created attachment 638991 [details] [diff] [review]
Native utilities

Here is a first prototype.
Mike, could you tell me what you think of this implementation?
Comment 4 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-04 05:37:40 PDT
Created attachment 639064 [details] [diff] [review]
JS bindings
Comment 5 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-09 13:46:25 PDT
I wish to add that we need these utilities either as part of OS.File itself or for any user of OS.File who wants to read strings from or write strings to a file (e.g session store).
Comment 6 (dormant account) 2012-07-10 15:13:37 PDT
Comment on attachment 639064 [details] [diff] [review]
JS bindings

      exports.OS.Shared.declareFFI = declareFFI;
 
 
+
      /**

really? need even more whitespace? 

:)

+// Logistics
again. whitespace.

I'm not sure i'm the best person for this. jorendorff is a better for ctypes utility apis like this
Comment 7 (dormant account) 2012-07-10 15:15:46 PDT
Comment on attachment 638989 [details] [diff] [review]
Companion testsuite

ok by me
Comment 8 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-11 09:47:56 PDT
(In reply to Taras Glek (:taras) from comment #6)
> Comment on attachment 639064 [details] [diff] [review]
> JS bindings
> 
>       exports.OS.Shared.declareFFI = declareFFI;
>  
>  
> +
>       /**
> 
> really? need even more whitespace? 
> 
> :)
> 
> +// Logistics
> again. whitespace.
> 
> I'm not sure i'm the best person for this. jorendorff is a better for ctypes
> utility apis like this

I actually asked jorendorff first and he declined reviewing these patches.
Comment 9 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-13 07:10:31 PDT
Comment on attachment 638991 [details] [diff] [review]
Native utilities

Perhaps Nathan would like to review this?
Comment 10 Nathan Froyd [:froydnj] 2012-07-13 09:11:20 PDT
Comment on attachment 639064 [details] [diff] [review]
JS bindings

Review of attachment 639064 [details] [diff] [review]:
-----------------------------------------------------------------

There's some DOM efforts to expose real string encoding/decoding primitives going on; you may want to look at those.  I don't have the bug #s immediately available.  Might be some overlap, might not.

::: toolkit/components/osfile/osfile_shared.jsm
@@ +508,5 @@
>       };
>       exports.OS.Shared.declareFFI = declareFFI;
>  
>  
> +

Collapse whitespace, please.

@@ +519,5 @@
> +     _aux.initUnicode = function initUnicode(free, strdup) {
> +       let libxul = ctypes.open(exports.OS.Constants.Path.libxul);
> +       exports.OS.Shared.libxul = libxul;
> +
> +// Logistics

Nit: indentation.

@@ +530,5 @@
> +         /*return*/ Types.void_t.implementation,
> +         /*decoder*/Types.udecoder.in_ptr.implementation);
> +
> +// Public functions
> +       let getEncoder = declareFFI(libxul, "osfile_GetEncoder",

Enlightenment question: why use declareFFI for these functions and libxul.declare for the logistics functions?  Consistency would be betteer, but if there some subtle reason lurking, then leave it as-is.

@@ +532,5 @@
> +
> +// Public functions
> +       let getEncoder = declareFFI(libxul, "osfile_GetEncoder",
> +         ctypes.default_abi,
> +         /*return*/ Types.uencoder.out_ptr.releaseWith(releaseEncoder),

Oooo, I like.

@@ +540,5 @@
> +         /*return*/ Types.udecoder.out_ptr.releaseWith(releaseDecoder),
> +         /*charset*/Types.char.in_ptr);
> +       let encode = declareFFI(libxul, "osfile_Encode",
> +        ctypes.default_abi,
> +         /*return*/     Types.jschar.out_ptr.releaseWith(free),

I think there is a little too much copy-paste going on here.  Please double-check that encode/decode match up with the declarations in the header file.

@@ +574,5 @@
> +          * the number of bytes in the output.
> +          * @return {CData} A C string in the given encoding,
> +          * or |null| in case of error.
> +          */
> +         encodeAll:  function encodeAll(encoder, string, outBytes) {

I am skeptical about providing outBytes as a parameter; it seems like it'd be more useful as a return value.  Then again, I'm not sure that it'd be really useful to the caller, either.  (It would be useful for incremental encoding, but that's not supported here.)  Have you written code calling encodeAll?  Have you used the outBytes parameter?

Actually, do we even need getEncoder/encodeAll pairings?  Why not just provide:

  encode(encoding-name, js-string) => c-string

and hide all the details?

@@ +587,5 @@
> +          * @param {CData} bytes An out pointer to a int32_t representing
> +          * the number of chars in the output.
> +          * @return {string} A JS string or |null| in case of error.
> +          */
> +         decodeAll:  function decodeAll(decoder, cstring, outChars) {

I don't see the usefulness of outChars here, since we can get the length directly from JS.  Ergo, we should probably change the C API to not require this parameter to decode().  Doing so also means that we can eliminate the confused definition of the bytes parameter in the docstring here. :)

@@ +614,5 @@
> +          * This value will be automatically garbage-collected once it is
> +          * not referenced anymore.
> +          */
> +         exportWString: function exportWString(string) {
> +           return strdup(string);

Pretty sure using strdup to copy JavaScript strings is going to end poorly due to confusion over 8-bit versus 16-bit characters.

::: toolkit/components/osfile/osfile_unix_back.jsm
@@ +389,5 @@
>  
> +       // Initialize Unicode conversion
> +       let wstrdup = declareFFI("strdup", ctypes.default_abi,
> +                               /*return*/ Types.jschar.out_ptr,
> +                               /*str*/    Types.jschar.in_ptr.releaseWith(UnixFile.free));

This definition looks confused about what to name the function and what function it actually wants to call.

::: toolkit/components/osfile/osfile_win_back.jsm
@@ +232,5 @@
>  
> +       // Initialize Unicode conversion
> +       let wstrdup = declareFFI("StrDup", ctypes.default_abi,
> +                     /*return*/ Types.char.out_ptr,
> +                     /*str*/    Types.char.in_ptr.releaseWith(WinFile.free));

Same comments from the Unix backend apply here.
Comment 11 Nathan Froyd [:froydnj] 2012-07-13 09:14:39 PDT
Comment on attachment 638991 [details] [diff] [review]
Native utilities

Review of attachment 638991 [details] [diff] [review]:
-----------------------------------------------------------------

The FIXMEs wrt SetLastError/errno need to be corrected.  SetLastError/errno should be set at all returns (e.g. failing to get nsICharsetConverterManager), too.  Preferably so that you're not constantly saying #if defined(XP_WIN).

I think this looks fairly sane, but after looking at the JS bits, I think we may want to modify the C API.

::: toolkit/components/osfile/osfileutils.cpp
@@ +17,5 @@
> +MOZ_EXPORT_API(void) osfile_ReleaseDecoder(nsIUnicodeDecoder* decoder)
> +{
> +  if (!decoder) {
> +    return;
> +  }

You don't need this check with NS_IF_RELEASE.

@@ +25,5 @@
> +MOZ_EXPORT_API(void) osfile_ReleaseEncoder(nsIUnicodeEncoder* encoder)
> +{
> +  if (!encoder) {
> +    return;
> +  }

Likewise.

@@ +29,5 @@
> +  }
> +  NS_IF_RELEASE(encoder);
> +}
> +
> +MOZ_EXPORT_API(nsIUnicodeEncoder*) osfile_GetEncoder(const char* encoding)

Shared template functions for osfile_Get{Encoder,Decoder} and osfile_{Encode,Decode} would be a worthwhile change, I think.

::: toolkit/components/osfile/osfileutils.h
@@ +39,5 @@
> +
> +/**
> + * Decode a sequence of bytes into a Unicode string.
> + *
> + * @param aDecode The decoder to use.

"aDecoder"

@@ +45,5 @@
> + *   end of the output.
> + * @param aSource The sequence of bytes to decode.
> + * @param aSourceBytes (in) The number of bytes to decode. If NULL,
> + *   the number of bytes will be |strlen(aSource)|.
> + * @param aSourceBytes (out) The number of bytes effectively decoded.

You should label aSourceBytes as inout.

@@ +47,5 @@
> + * @param aSourceBytes (in) The number of bytes to decode. If NULL,
> + *   the number of bytes will be |strlen(aSource)|.
> + * @param aSourceBytes (out) The number of bytes effectively decoded.
> + * @param aOutChars (out) The number of Unicode chars in the output,
> + *   not including the nul char.

This parameter should have a more descriptive name; I was puzzled why it was "aOutChars" with type int32_t*.  Maybe aCharsWritten?

@@ +58,5 @@
> +   int32_t* aOutChars);
> +/**
> + * Encode a complete Unicode string into a set of bytes.
> + *
> + * @return null in case of error, otherwise a nul-terminated string.

Needs more documentation.

@@ +62,5 @@
> + * @return null in case of error, otherwise a nul-terminated string.
> + */
> +MOZ_EXPORT_API(char*) osfile_Encode(nsIUnicodeEncoder* aEncoder,
> +   bool aAppendNulToOutput, const PRUnichar* aSource, int32_t* aSourceChars,
> +   int32_t* aOutBytes);

Same comments about naming.
Comment 12 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-14 07:35:29 PDT
(In reply to Nathan Froyd (:froydnj) from comment #10)
> Comment on attachment 639064 [details] [diff] [review]
> JS bindings
> 
> Review of attachment 639064 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> There's some DOM efforts to expose real string encoding/decoding primitives
> going on; you may want to look at those.  I don't have the bug #s
> immediately available.  Might be some overlap, might not.

I have heard rumours about that. However, until these primitives are available (and better than that, available to all threads), I will need a workaround.

> 
> ::: toolkit/components/osfile/osfile_shared.jsm
> @@ +508,5 @@
> >       };
> >       exports.OS.Shared.declareFFI = declareFFI;
> >  
> >  
> > +
> 
> Collapse whitespace, please.


Will do.

> 
> @@ +519,5 @@
> > +     _aux.initUnicode = function initUnicode(free, strdup) {
> > +       let libxul = ctypes.open(exports.OS.Constants.Path.libxul);
> > +       exports.OS.Shared.libxul = libxul;
> > +
> > +// Logistics
> 
> Nit: indentation.

Will do.

> 
> @@ +530,5 @@
> > +         /*return*/ Types.void_t.implementation,
> > +         /*decoder*/Types.udecoder.in_ptr.implementation);
> > +
> > +// Public functions
> > +       let getEncoder = declareFFI(libxul, "osfile_GetEncoder",
> 
> Enlightenment question: why use declareFFI for these functions and
> libxul.declare for the logistics functions?  Consistency would be betteer,
> but if there some subtle reason lurking, then leave it as-is.

There are places at which I have no choice for subtle reasons, but here, I could avoid this. I will fix that.

> 
> @@ +532,5 @@
> > +
> > +// Public functions
> > +       let getEncoder = declareFFI(libxul, "osfile_GetEncoder",
> > +         ctypes.default_abi,
> > +         /*return*/ Types.uencoder.out_ptr.releaseWith(releaseEncoder),
> 
> Oooo, I like.

:)

> 
> @@ +540,5 @@
> > +         /*return*/ Types.udecoder.out_ptr.releaseWith(releaseDecoder),
> > +         /*charset*/Types.char.in_ptr);
> > +       let encode = declareFFI(libxul, "osfile_Encode",
> > +        ctypes.default_abi,
> > +         /*return*/     Types.jschar.out_ptr.releaseWith(free),
> 
> I think there is a little too much copy-paste going on here.  Please
> double-check that encode/decode match up with the declarations in the header
> file.

Well-spotted, thanks.

> 
> @@ +574,5 @@
> > +          * the number of bytes in the output.
> > +          * @return {CData} A C string in the given encoding,
> > +          * or |null| in case of error.
> > +          */
> > +         encodeAll:  function encodeAll(encoder, string, outBytes) {
> 
> I am skeptical about providing outBytes as a parameter; it seems like it'd
> be more useful as a return value.  Then again, I'm not sure that it'd be
> really useful to the caller, either.  (It would be useful for incremental
> encoding, but that's not supported here.)  Have you written code calling
> encodeAll?  Have you used the outBytes parameter?

Actually, I have examples, but you are right, they are probably too premature. Let's forget about them atm.

> Actually, do we even need getEncoder/encodeAll pairings?  Why not just
> provide:
> 
>   encode(encoding-name, js-string) => c-string
> 
> and hide all the details?

The main reason is symmetry: with an API that allocates the string, decoders can be discarded after use, but encoders are stateful regardless, so I cannot provide

  decode(encoding-name, c-string) => js-string

If you think it wiser, I can provide asymmetric APIs for encoding and decoding. This will be a little bit slower (due to no reuse of the encoder), but possibly not by much.

> 
> @@ +587,5 @@
> > +          * @param {CData} bytes An out pointer to a int32_t representing
> > +          * the number of chars in the output.
> > +          * @return {string} A JS string or |null| in case of error.
> > +          */
> > +         decodeAll:  function decodeAll(decoder, cstring, outChars) {
> 
> I don't see the usefulness of outChars here, since we can get the length
> directly from JS.

This was mostly for symmetry, too. You are right, let's forget about it.

> Ergo, we should probably change the C API to not require
> this parameter to decode().  Doing so also means that we can eliminate the
> confused definition of the bytes parameter in the docstring here. :)

:)

> 
> @@ +614,5 @@
> > +          * This value will be automatically garbage-collected once it is
> > +          * not referenced anymore.
> > +          */
> > +         exportWString: function exportWString(string) {
> > +           return strdup(string);
> 
> Pretty sure using strdup to copy JavaScript strings is going to end poorly
> due to confusion over 8-bit versus 16-bit characters.

You are right, I should change the name.
... plus as you point further down, I should also change the function I pass as argument.

> 
> ::: toolkit/components/osfile/osfile_unix_back.jsm
> @@ +389,5 @@
> >  
> > +       // Initialize Unicode conversion
> > +       let wstrdup = declareFFI("strdup", ctypes.default_abi,
> > +                               /*return*/ Types.jschar.out_ptr,
> > +                               /*str*/    Types.jschar.in_ptr.releaseWith(UnixFile.free));
> 
> This definition looks confused about what to name the function and what
> function it actually wants to call.

Right, this should have been wcpcpy on both sides. Thanks for the spot.

> 
> ::: toolkit/components/osfile/osfile_win_back.jsm
> @@ +232,5 @@
> >  
> > +       // Initialize Unicode conversion
> > +       let wstrdup = declareFFI("StrDup", ctypes.default_abi,
> > +                     /*return*/ Types.char.out_ptr,
> > +                     /*str*/    Types.char.in_ptr.releaseWith(WinFile.free));
> 
> Same comments from the Unix backend apply here.

Right, this should have been StrDupW and jschar instead of char. Thanks for the spot.
Comment 13 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-14 07:39:49 PDT
(In reply to Nathan Froyd (:froydnj) from comment #11)
> Comment on attachment 638991 [details] [diff] [review]
> Native utilities
> 
> Review of attachment 638991 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> The FIXMEs wrt SetLastError/errno need to be corrected.  SetLastError/errno
> should be set at all returns (e.g. failing to get
> nsICharsetConverterManager), too.  Preferably so that you're not constantly
> saying #if defined(XP_WIN).

Will do.

> I think this looks fairly sane, but after looking at the JS bits, I think we
> may want to modify the C API.

You mean for removing the outbound information? We can certainly remove it with the current set of features exposed by JS. I will readd what is missing once we start working on streams (which require more info).

> 
> ::: toolkit/components/osfile/osfileutils.cpp
> @@ +17,5 @@
> > +MOZ_EXPORT_API(void) osfile_ReleaseDecoder(nsIUnicodeDecoder* decoder)
> > +{
> > +  if (!decoder) {
> > +    return;
> > +  }
> 
> You don't need this check with NS_IF_RELEASE.

Ah, good to know.

> 
> @@ +25,5 @@
> > +MOZ_EXPORT_API(void) osfile_ReleaseEncoder(nsIUnicodeEncoder* encoder)
> > +{
> > +  if (!encoder) {
> > +    return;
> > +  }
> 
> Likewise.
> 
> @@ +29,5 @@
> > +  }
> > +  NS_IF_RELEASE(encoder);
> > +}
> > +
> > +MOZ_EXPORT_API(nsIUnicodeEncoder*) osfile_GetEncoder(const char* encoding)
> 
> Shared template functions for osfile_Get{Encoder,Decoder} and
> osfile_{Encode,Decode} would be a worthwhile change, I think.

Ok, will do.

> 
> ::: toolkit/components/osfile/osfileutils.h
> @@ +39,5 @@
> > +
> > +/**
> > + * Decode a sequence of bytes into a Unicode string.
> > + *
> > + * @param aDecode The decoder to use.
> 
> "aDecoder"

Well spotted :)

> 
> @@ +45,5 @@
> > + *   end of the output.
> > + * @param aSource The sequence of bytes to decode.
> > + * @param aSourceBytes (in) The number of bytes to decode. If NULL,
> > + *   the number of bytes will be |strlen(aSource)|.
> > + * @param aSourceBytes (out) The number of bytes effectively decoded.
> 
> You should label aSourceBytes as inout.

Thanks.

> 
> @@ +47,5 @@
> > + * @param aSourceBytes (in) The number of bytes to decode. If NULL,
> > + *   the number of bytes will be |strlen(aSource)|.
> > + * @param aSourceBytes (out) The number of bytes effectively decoded.
> > + * @param aOutChars (out) The number of Unicode chars in the output,
> > + *   not including the nul char.
> 
> This parameter should have a more descriptive name; I was puzzled why it was
> "aOutChars" with type int32_t*.  Maybe aCharsWritten?

Thanks for the suggestion, I'll see how that works out.

> 
> @@ +58,5 @@
> > +   int32_t* aOutChars);
> > +/**
> > + * Encode a complete Unicode string into a set of bytes.
> > + *
> > + * @return null in case of error, otherwise a nul-terminated string.
> 
> Needs more documentation.

Ok.

> 
> @@ +62,5 @@
> > + * @return null in case of error, otherwise a nul-terminated string.
> > + */
> > +MOZ_EXPORT_API(char*) osfile_Encode(nsIUnicodeEncoder* aEncoder,
> > +   bool aAppendNulToOutput, const PRUnichar* aSource, int32_t* aSourceChars,
> > +   int32_t* aOutBytes);
> 
> Same comments about naming.

Ok.
Comment 14 Nathan Froyd [:froydnj] 2012-07-16 11:24:01 PDT
(In reply to David Rajchenbach Teller [:Yoric] from comment #12)
> > There's some DOM efforts to expose real string encoding/decoding primitives
> > going on; you may want to look at those.  I don't have the bug #s
> > immediately available.  Might be some overlap, might not.
> 
> I have heard rumours about that. However, until these primitives are
> available (and better than that, available to all threads), I will need a
> workaround.

Indeed.

> > I am skeptical about providing outBytes as a parameter; it seems like it'd
> > be more useful as a return value.  Then again, I'm not sure that it'd be
> > really useful to the caller, either.  (It would be useful for incremental
> > encoding, but that's not supported here.)  Have you written code calling
> > encodeAll?  Have you used the outBytes parameter?
> 
> Actually, I have examples, but you are right, they are probably too
> premature. Let's forget about them atm.
> 
> > Actually, do we even need getEncoder/encodeAll pairings?  Why not just
> > provide:
> > 
> >   encode(encoding-name, js-string) => c-string
> > 
> > and hide all the details?
> 
> The main reason is symmetry: with an API that allocates the string, decoders
> can be discarded after use, but encoders are stateful regardless, so I
> cannot provide
> 
>   decode(encoding-name, c-string) => js-string
> 
> If you think it wiser, I can provide asymmetric APIs for encoding and
> decoding. This will be a little bit slower (due to no reuse of the encoder),
> but possibly not by much.

I am a fan of symmetry, but I do not understand the justification provided here.  I do not see why providing the method you describe above is any different than:

  let d = getDecoder(encoding-name);
  return decodeAll(d, c-string, /*XXX param to be removed*/null);

I am also not seeing what the statefulness argument re: encoders and decoders has to do with providing an easy API.  Could you please elaborate?

I agree that it would be nice to reuse encoders and decoders when possible.  But you provide no way to get at nsIUnicode{Encoder,Decoder}.Reset, so arguing for reuse seems like a non-starter.  (If you are going to argue for efficiency, then pushing for nsIUnicode{Encoder,Decoder} to be more fully exposed through this API is the way to go; the big win from that is incremental encoding/decoding of buffers.  But it's not clear that we need that at the moment.)
Comment 15 Nathan Froyd [:froydnj] 2012-07-16 11:30:57 PDT
(In reply to David Rajchenbach Teller [:Yoric] from comment #13)
> > I think this looks fairly sane, but after looking at the JS bits, I think we
> > may want to modify the C API.
> 
> You mean for removing the outbound information? We can certainly remove it
> with the current set of features exposed by JS. I will readd what is missing
> once we start working on streams (which require more info).

Yes, removing the outbound information.  If it's actually going to be needed later, then I'd be OK with leaving it in.  But I don't see a good reason to expose the outbound parameters all the way up through JS with an API like {encode,decode}All.  Those two functions should just do the translation and sweep whatever messy details the C API requires under the rug.

IMHO, there should be a separate API for doing incremental, stream-type encoding/decoding.
Comment 16 Nathan Froyd [:froydnj] 2012-07-16 11:32:17 PDT
(In reply to Nathan Froyd (:froydnj) from comment #15)
> IMHO, there should be a separate API for doing incremental, stream-type
> encoding/decoding.

Clarification: separate methods for doing incremental, stream-type encoding/decoding in OS.File alongside {encode,decode}All or whatever we wind up calling them.
Comment 17 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-17 06:45:56 PDT
(In reply to Nathan Froyd (:froydnj) from comment #16)
> (In reply to Nathan Froyd (:froydnj) from comment #15)
> > IMHO, there should be a separate API for doing incremental, stream-type
> > encoding/decoding.
> 
> Clarification: separate methods for doing incremental, stream-type
> encoding/decoding in OS.File alongside {encode,decode}All or whatever we
> wind up calling them.

Yes, that's what I had in mind.
Comment 18 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-17 07:16:47 PDT
(In reply to Nathan Froyd (:froydnj) from comment #14)
> > > Actually, do we even need getEncoder/encodeAll pairings?  Why not just
> > > provide:
> > > 
> > >   encode(encoding-name, js-string) => c-string
> > > 
> > > and hide all the details?
> > 
> > The main reason is symmetry: with an API that allocates the string, decoders
> > can be discarded after use, but encoders are stateful regardless, so I
> > cannot provide
> > 
> >   decode(encoding-name, c-string) => js-string
> > 
> > If you think it wiser, I can provide asymmetric APIs for encoding and
> > decoding. This will be a little bit slower (due to no reuse of the encoder),
> > but possibly not by much.
> 
> I am a fan of symmetry, but I do not understand the justification provided
> here.  I do not see why providing the method you describe above is any
> different than:
> 
>   let d = getDecoder(encoding-name);
>   return decodeAll(d, c-string, /*XXX param to be removed*/null);
>
> 
> I am also not seeing what the statefulness argument re: encoders and
> decoders has to do with providing an easy API.  Could you please elaborate?

The problem appears if c-string ends with a partial sequence. In this case, we need to reuse the same decoder with the next sequence.

I grant you that it is unlikely that we end up with a nul-terminated partial sequence, so we can probably forget about this argument.

> I agree that it would be nice to reuse encoders and decoders when possible. 
> But you provide no way to get at nsIUnicode{Encoder,Decoder}.Reset, so
> arguing for reuse seems like a non-starter.  (If you are going to argue for
> efficiency, then pushing for nsIUnicode{Encoder,Decoder} to be more fully
> exposed through this API is the way to go; the big win from that is
> incremental encoding/decoding of buffers.  But it's not clear that we need
> that at the moment.)

The idea was indeed to progressively publish more of nsIUnicode{Encoder,Decoder}, to allow progressive encoding/decoding of buffers/streams.
Comment 19 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-17 07:23:10 PDT
Created attachment 642941 [details] [diff] [review]
Native utilities
Comment 20 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-17 07:28:15 PDT
Created attachment 642942 [details] [diff] [review]
JS bindings v2

Here is a simplified version of the JS bindings. Until we decide whether to simplify it further.
Comment 21 Nathan Froyd [:froydnj] 2012-07-17 07:31:43 PDT
(In reply to David Rajchenbach Teller (away until early August) [:Yoric] from comment #18)
> > I am a fan of symmetry, but I do not understand the justification provided
> > here.  I do not see why providing the method you describe above is any
> > different than:
> > 
> >   let d = getDecoder(encoding-name);
> >   return decodeAll(d, c-string, /*XXX param to be removed*/null);
> >
> > 
> > I am also not seeing what the statefulness argument re: encoders and
> > decoders has to do with providing an easy API.  Could you please elaborate?
> 
> The problem appears if c-string ends with a partial sequence. In this case,
> we need to reuse the same decoder with the next sequence.

Ah.  That is a good point and definitely a case we should handle.  But I think that decodeAll should not be obligated to deal with this sort of thing; the caller should be responsible for providing fully-formed input.  (A more incremental API should deal properly with this case.)

In any event, I don't think the scenario you describe works with the current C support, since when Convert() fails, you just return nsnull rather than the partially decoded sequence.

> > I agree that it would be nice to reuse encoders and decoders when possible. 
> > But you provide no way to get at nsIUnicode{Encoder,Decoder}.Reset, so
> > arguing for reuse seems like a non-starter.  (If you are going to argue for
> > efficiency, then pushing for nsIUnicode{Encoder,Decoder} to be more fully
> > exposed through this API is the way to go; the big win from that is
> > incremental encoding/decoding of buffers.  But it's not clear that we need
> > that at the moment.)
> 
> The idea was indeed to progressively publish more of
> nsIUnicode{Encoder,Decoder}, to allow progressive encoding/decoding of
> buffers/streams.

Very good.
Comment 22 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-17 08:14:14 PDT
Actually, there is something wrong brewing here.

For most encodings, applying |strlen| to a c buffer is absolutely useless. Rather, we need to return the length of the c buffer in chars from EncodeAll and take this length as argument in DecodeAll.

My bad for only testing with utf-8.
Comment 23 Nathan Froyd [:froydnj] 2012-07-17 08:20:09 PDT
(In reply to David Rajchenbach Teller (away until early August) [:Yoric] from comment #22)
> For most encodings, applying |strlen| to a c buffer is absolutely useless.
> Rather, we need to return the length of the c buffer in chars from EncodeAll
> and take this length as argument in DecodeAll.

Mmm, yes; strlen on a UTF-16 buffer is not going to end well.  My bad for not noticing that >8-bit encodings do need lengths of buffers returned and I shouldn't have complained about encodeAll.

Do you want to cancel the r? flags, then?
Comment 24 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-18 09:26:05 PDT
Created attachment 643414 [details] [diff] [review]
Native utilities, v2

Here is a new version that does not expose encoders/decoders yet, but exposes number of bytes.

As discussed previously, this means that the only way to read an encoded file is to first load it completely to memory, then convert everything to a string. Probably not a major deal for the scope intended (e.g. configuration files), since we are going to read them off-main thread.
Comment 25 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-18 09:27:44 PDT
Created attachment 643415 [details] [diff] [review]
JS bindings v3
Comment 26 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-18 09:28:11 PDT
Created attachment 643416 [details] [diff] [review]
Companion testsuite
Comment 27 Nathan Froyd [:froydnj] 2012-07-18 09:38:27 PDT
Comment on attachment 642941 [details] [diff] [review]
Native utilities

Review of attachment 642941 [details] [diff] [review]:
-----------------------------------------------------------------

Works for me.  I think I'd like to see the EncodeAll length outparam changes and the osfile_ns_alloc question answered, though.

::: toolkit/components/osfile/osfileutils.cpp
@@ +212,5 @@
> +  // Convert, add trailing \0
> +
> +  rv = aDecoder->Convert(aSource, &srcBytes, dest.rwget(), &upperBoundChars);
> +  if (NS_FAILED(rv)) {
> +    error_wrong_sequence();

Nit: I think error_invalid_argument is more correct here.  Also in osfile_EncodeAll.  Then error_wrong_sequence can go away.

@@ +234,5 @@
> +  int32_t upperBoundBytes;
> +  nsresult rv = aEncoder->GetMaxLength(aSource, srcChars, &upperBoundBytes);
> +  if (NS_FAILED(rv)) {
> +    error_unexpected();
> +    return nsnull;

FWIW, GetMaxLength is specified to never fail (!).  I think it'd be OK to turn this into an NS_ASSERTION (or the MOZ flavor-of-the-rewrite equivalent) that NS_SUCCEEDED(rv).  You should also initialize upperBoundBytes properly, just in case.  Then you can get rid of error_unexpected.  Also in osfile_EncodeAll.  (See how much easier this would have been with template functions? :)

::: toolkit/components/osfile/osfileutils.h
@@ +8,5 @@
> +extern "C" {
> +
> +// Memory utilities
> +
> +MOZ_EXPORT_API(void*) osfile_ns_alloc(PRSize size);

This doesn't look used in the current bindings.  Is this another save-it-for-later sort of function?

@@ +60,5 @@
> + * @param aDecoder The decoder to use.
> + * @param aSource The C string to decode. Must be null-terminated.
> + *
> + * @return null in case of error, otherwise a sequence of Unicode chars,
> + *   representing |aSource|, optionally terminated with a nul-terminator.

The optionally terminated language here is no longer necessary.

@@ +61,5 @@
> + * @param aSource The C string to decode. Must be null-terminated.
> + *
> + * @return null in case of error, otherwise a sequence of Unicode chars,
> + *   representing |aSource|, optionally terminated with a nul-terminator.
> + *   This memory should be released with |free|.

Point of order: this is not what the JS code does.  Fix the docstring or the code, please.

@@ +75,5 @@
> + *
> + * @return null in case of error, otherwise a nul-terminated string.
> + */
> +MOZ_EXPORT_API(char*) osfile_EncodeAll(nsIUnicodeEncoder* aEncoder,
> +   const PRUnichar* aSource);

Later comments suggested that we should pass a length parameter out of here and out to JS.
Comment 28 Nathan Froyd [:froydnj] 2012-07-19 09:06:28 PDT
Comment on attachment 643414 [details] [diff] [review]
Native utilities, v2

Review of attachment 643414 [details] [diff] [review]:
-----------------------------------------------------------------

Please see comment 27 for caveats; I think we mid-aired in adding/removing patches.
Comment 29 Nathan Froyd [:froydnj] 2012-07-19 09:24:33 PDT
Comment on attachment 643415 [details] [diff] [review]
JS bindings v3

Review of attachment 643415 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  I would like to see the encodeAll change before commit.

::: toolkit/components/osfile/osfile_shared.jsm
@@ +689,5 @@
> +     /**
> +      * A trimmed-down version of declareFFI that returns a raw js-ctypes
> +      * function, which does not translate its arguments.
> +      *
> +      * Used to declare finalizers.

I am rs'ing this part; I assume you know what you are doing here. :)

@@ +702,5 @@
> +       abi = abi || ctypes.default_abi;
> +       if (Object.prototype.toString.call(abi) != "[object CABI]") {
> +         // Note: This is the only known manner of checking whether an object
> +         // is an abi.
> +         throw new TypeError("declareFFI expects as second argument an abi or null");

"declareFinalizer expects..."

@@ +705,5 @@
> +         // is an abi.
> +         throw new TypeError("declareFFI expects as second argument an abi or null");
> +       }
> +       let signature = [symbol, abi];
> +       let argtypes  = [];

argtypes is unused.  Bug?

@@ +773,5 @@
> +        ctypes.default_abi,
> +         /*return*/     Types.char.out_ptr.releaseWith(NS_Free),
> +         /*encoding*/   Types.char.in_ptr,
> +         /*source*/     Types.jschar.in_ptr,
> +         /*bytes*/      Types.uint32_t.out_ptr);

We should try to make the caller's job easy, though; just because the C code returns outparams doesn't mean we have to.  Please make encodeAll return the allocated buffer and count as multiple values.  I am uncertain about whether [buffer, count] or { buffer: ..., count: ... } is preferred style here; I think I lean towards the former.

I realize this makes the API asymmetric.  I don't have good ideas as to how to make them symmetric.
Comment 30 Nathan Froyd [:froydnj] 2012-07-19 11:50:15 PDT
Comment on attachment 643416 [details] [diff] [review]
Companion testsuite

Review of attachment 643416 [details] [diff] [review]:
-----------------------------------------------------------------

Taras already r+'d this, so this is essentially a rubberstamp.  I assume you can change the test appropriately if the API changes from the other patches. :)

::: toolkit/components/osfile/tests/mochi/worker_test_osfile_front.js
@@ +55,5 @@
> +  ok(true, "Starting test_unicode");
> +  function test_go_round(encoding, sentence)  {
> +    let bytes = new OS.Shared.Type.uint32_t.implementation();
> +    let pBytes = bytes.address();
> +    ok(true, "test_unicode: testing encoding of " + sentence + " with encoding " + encoding);

Please use info() here instead of ok().
Comment 31 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-20 06:52:26 PDT
Ok, this time I am sure that I have replied to your question but that the reply does not appear. Bugzilla seems to be eating my comments, so I will just post updated patches and reply later :)
Comment 32 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-20 06:56:46 PDT
Created attachment 644301 [details] [diff] [review]
JS bindings v4

While I definitely agree that EncodeAll is not client-friendly, I intended to address the issue in a followup patch. The idea here is to provide the lower-level bindings and to provide in a second patch higher-level bindings without outparams and with exception-handling, etc.
Comment 33 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-20 06:57:49 PDT
Created attachment 644302 [details] [diff] [review]
Companion makefile

(self-reviewing the makefile)
Comment 34 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-20 06:58:51 PDT
Created attachment 644303 [details] [diff] [review]
Native utilities, v3

Applied feedback.
Comment 35 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-20 06:59:58 PDT
(In reply to Nathan Froyd (:froydnj) from comment #30)
> Comment on attachment 643416 [details] [diff] [review]
> Companion testsuite
> 
> Review of attachment 643416 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Taras already r+'d this, so this is essentially a rubberstamp.  I assume you
> can change the test appropriately if the API changes from the other patches.
> :)
> 
> ::: toolkit/components/osfile/tests/mochi/worker_test_osfile_front.js
> @@ +55,5 @@
> > +  ok(true, "Starting test_unicode");
> > +  function test_go_round(encoding, sentence)  {
> > +    let bytes = new OS.Shared.Type.uint32_t.implementation();
> > +    let pBytes = bytes.address();
> > +    ok(true, "test_unicode: testing encoding of " + sentence + " with encoding " + encoding);
> 
> Please use info() here instead of ok().

Note that this is a worker, so I had to redefine |ok| myself, I/O does not always work (etc.), so if you do not mind, I will keep this for a followup bug.
Comment 36 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-20 07:00:46 PDT
Created attachment 644304 [details] [diff] [review]
Companion testsuite
Comment 37 Nathan Froyd [:froydnj] 2012-07-20 07:25:12 PDT
(In reply to David Rajchenbach Teller (away until early August) [:Yoric] from comment #35)
> Note that this is a worker, so I had to redefine |ok| myself, I/O does not
> always work (etc.), so if you do not mind, I will keep this for a followup
> bug.

I totally missed the worker bit.  I don't mind one way or the other, just a tidyness detail.

(In reply to David Rajchenbach Teller (away until early August) [:Yoric] from comment #32)
> While I definitely agree that EncodeAll is not client-friendly, I intended
> to address the issue in a followup patch. The idea here is to provide the
> lower-level bindings and to provide in a second patch higher-level bindings
> without outparams and with exception-handling, etc.

OK, that seems reasonable.  Presumably the low-level {en,de}codeAll will either no longer be available or have their API changed, then?
Comment 38 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-21 10:13:17 PDT
(In reply to Nathan Froyd (:froydnj) from comment #37)
> (In reply to David Rajchenbach Teller (away until early August) [:Yoric]
> from comment #32)
> > While I definitely agree that EncodeAll is not client-friendly, I intended
> > to address the issue in a followup patch. The idea here is to provide the
> > lower-level bindings and to provide in a second patch higher-level bindings
> > without outparams and with exception-handling, etc.
> 
> OK, that seems reasonable.  Presumably the low-level {en,de}codeAll will
> either no longer be available or have their API changed, then?

That's the idea. Haven't quite decided the specifics yet.
Comment 39 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-07-30 07:26:01 PDT
Created attachment 647159 [details] [diff] [review]
Native utilities
Comment 40 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-08-10 06:50:44 PDT
Created attachment 650868 [details] [diff] [review]
Native utilities
Comment 41 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-08-10 06:51:23 PDT
Created attachment 650869 [details] [diff] [review]
JS bindings
Comment 42 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-08-10 06:52:16 PDT
Created attachment 650870 [details] [diff] [review]
Companion makefile
Comment 43 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-08-10 06:53:03 PDT
Created attachment 650871 [details] [diff] [review]
Companion testsuite
Comment 44 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-08-10 06:54:05 PDT
Try run: https://tbpl.mozilla.org/?tree=Try&rev=1135d39e54fc

Note You need to log in before you can comment on or make changes to this bug.