Closed
Bug 770215
Opened 12 years ago
Closed 12 years ago
[OS.File] Utilities for strings
Categories
(Core :: Networking: File, defect)
Core
Networking: File
Tracking
()
RESOLVED
FIXED
mozilla17
People
(Reporter: Yoric, Assigned: Yoric)
References
Details
Attachments
(4 files, 14 obsolete files)
6.70 KB,
patch
|
Yoric
:
review+
|
Details | Diff | Splinter Review |
4.55 KB,
patch
|
Yoric
:
review+
|
Details | Diff | Splinter Review |
1.54 KB,
patch
|
Yoric
:
review+
|
Details | Diff | Splinter Review |
3.01 KB,
patch
|
Yoric
:
review+
|
Details | Diff | Splinter Review |
We need the following features: - convert a js string to a \0-terminated jschar*, getting the length of that jschar* (required for writing strings under Windows); - convert a js string to a \0-terminated char* in a given encoding, getting the length of that char* (required for writing strings under Unix); - convert a \0-terminated char* in a given encoding to a js string (required for reading a complete file under Unix); - convert a \0-terminated jschar* to a js string (required for reading a complete file under Windows); - convert a char* with length in a given encoding to a js string (required for reading a chunk of a file under Unix); - convert a jschar* with length in to a js string (required for reading a chunk of a file under Windows). We might not need reading chunks for the time being.
Assignee | ||
Comment 1•12 years ago
|
||
Assignee: nobody → dteller
Assignee | ||
Comment 2•12 years ago
|
||
Assignee | ||
Comment 3•12 years ago
|
||
Here is a first prototype. Mike, could you tell me what you think of this implementation?
Attachment #638991 -
Flags: feedback?(mh+mozilla)
Assignee | ||
Comment 4•12 years ago
|
||
Attachment #638990 -
Attachment is obsolete: true
Assignee | ||
Updated•12 years ago
|
Attachment #638991 -
Flags: feedback?(mh+mozilla) → review?(taras.mozilla)
Assignee | ||
Comment 5•12 years ago
|
||
I wish to add that we need these utilities either as part of OS.File itself or for any user of OS.File who wants to read strings from or write strings to a file (e.g session store).
Assignee | ||
Updated•12 years ago
|
Attachment #638989 -
Flags: review?(taras.mozilla)
Assignee | ||
Updated•12 years ago
|
Attachment #639064 -
Flags: review?(taras.mozilla)
Comment 6•12 years ago
|
||
Comment on attachment 639064 [details] [diff] [review] JS bindings exports.OS.Shared.declareFFI = declareFFI; + /** really? need even more whitespace? :) +// Logistics again. whitespace. I'm not sure i'm the best person for this. jorendorff is a better for ctypes utility apis like this
Attachment #639064 -
Flags: review?(taras.mozilla)
Updated•12 years ago
|
Attachment #639064 -
Flags: review?(jorendorff)
Updated•12 years ago
|
Attachment #638991 -
Flags: review?(taras.mozilla) → review?(jorendorff)
Comment 7•12 years ago
|
||
Comment on attachment 638989 [details] [diff] [review] Companion testsuite ok by me
Attachment #638989 -
Flags: review?(taras.mozilla) → review+
Assignee | ||
Comment 8•12 years ago
|
||
(In reply to Taras Glek (:taras) from comment #6) > Comment on attachment 639064 [details] [diff] [review] > JS bindings > > exports.OS.Shared.declareFFI = declareFFI; > > > + > /** > > really? need even more whitespace? > > :) > > +// Logistics > again. whitespace. > > I'm not sure i'm the best person for this. jorendorff is a better for ctypes > utility apis like this I actually asked jorendorff first and he declined reviewing these patches.
Assignee | ||
Comment 9•12 years ago
|
||
Comment on attachment 638991 [details] [diff] [review] Native utilities Perhaps Nathan would like to review this?
Attachment #638991 -
Flags: review?(nfroyd)
Assignee | ||
Updated•12 years ago
|
Attachment #639064 -
Flags: review?(nfroyd)
Comment 10•12 years ago
|
||
Comment on attachment 639064 [details] [diff] [review] JS bindings Review of attachment 639064 [details] [diff] [review]: ----------------------------------------------------------------- There's some DOM efforts to expose real string encoding/decoding primitives going on; you may want to look at those. I don't have the bug #s immediately available. Might be some overlap, might not. ::: toolkit/components/osfile/osfile_shared.jsm @@ +508,5 @@ > }; > exports.OS.Shared.declareFFI = declareFFI; > > > + Collapse whitespace, please. @@ +519,5 @@ > + _aux.initUnicode = function initUnicode(free, strdup) { > + let libxul = ctypes.open(exports.OS.Constants.Path.libxul); > + exports.OS.Shared.libxul = libxul; > + > +// Logistics Nit: indentation. @@ +530,5 @@ > + /*return*/ Types.void_t.implementation, > + /*decoder*/Types.udecoder.in_ptr.implementation); > + > +// Public functions > + let getEncoder = declareFFI(libxul, "osfile_GetEncoder", Enlightenment question: why use declareFFI for these functions and libxul.declare for the logistics functions? Consistency would be betteer, but if there some subtle reason lurking, then leave it as-is. @@ +532,5 @@ > + > +// Public functions > + let getEncoder = declareFFI(libxul, "osfile_GetEncoder", > + ctypes.default_abi, > + /*return*/ Types.uencoder.out_ptr.releaseWith(releaseEncoder), Oooo, I like. @@ +540,5 @@ > + /*return*/ Types.udecoder.out_ptr.releaseWith(releaseDecoder), > + /*charset*/Types.char.in_ptr); > + let encode = declareFFI(libxul, "osfile_Encode", > + ctypes.default_abi, > + /*return*/ Types.jschar.out_ptr.releaseWith(free), I think there is a little too much copy-paste going on here. Please double-check that encode/decode match up with the declarations in the header file. @@ +574,5 @@ > + * the number of bytes in the output. > + * @return {CData} A C string in the given encoding, > + * or |null| in case of error. > + */ > + encodeAll: function encodeAll(encoder, string, outBytes) { I am skeptical about providing outBytes as a parameter; it seems like it'd be more useful as a return value. Then again, I'm not sure that it'd be really useful to the caller, either. (It would be useful for incremental encoding, but that's not supported here.) Have you written code calling encodeAll? Have you used the outBytes parameter? Actually, do we even need getEncoder/encodeAll pairings? Why not just provide: encode(encoding-name, js-string) => c-string and hide all the details? @@ +587,5 @@ > + * @param {CData} bytes An out pointer to a int32_t representing > + * the number of chars in the output. > + * @return {string} A JS string or |null| in case of error. > + */ > + decodeAll: function decodeAll(decoder, cstring, outChars) { I don't see the usefulness of outChars here, since we can get the length directly from JS. Ergo, we should probably change the C API to not require this parameter to decode(). Doing so also means that we can eliminate the confused definition of the bytes parameter in the docstring here. :) @@ +614,5 @@ > + * This value will be automatically garbage-collected once it is > + * not referenced anymore. > + */ > + exportWString: function exportWString(string) { > + return strdup(string); Pretty sure using strdup to copy JavaScript strings is going to end poorly due to confusion over 8-bit versus 16-bit characters. ::: toolkit/components/osfile/osfile_unix_back.jsm @@ +389,5 @@ > > + // Initialize Unicode conversion > + let wstrdup = declareFFI("strdup", ctypes.default_abi, > + /*return*/ Types.jschar.out_ptr, > + /*str*/ Types.jschar.in_ptr.releaseWith(UnixFile.free)); This definition looks confused about what to name the function and what function it actually wants to call. ::: toolkit/components/osfile/osfile_win_back.jsm @@ +232,5 @@ > > + // Initialize Unicode conversion > + let wstrdup = declareFFI("StrDup", ctypes.default_abi, > + /*return*/ Types.char.out_ptr, > + /*str*/ Types.char.in_ptr.releaseWith(WinFile.free)); Same comments from the Unix backend apply here.
Attachment #639064 -
Flags: review?(nfroyd) → review-
Comment 11•12 years ago
|
||
Comment on attachment 638991 [details] [diff] [review] Native utilities Review of attachment 638991 [details] [diff] [review]: ----------------------------------------------------------------- The FIXMEs wrt SetLastError/errno need to be corrected. SetLastError/errno should be set at all returns (e.g. failing to get nsICharsetConverterManager), too. Preferably so that you're not constantly saying #if defined(XP_WIN). I think this looks fairly sane, but after looking at the JS bits, I think we may want to modify the C API. ::: toolkit/components/osfile/osfileutils.cpp @@ +17,5 @@ > +MOZ_EXPORT_API(void) osfile_ReleaseDecoder(nsIUnicodeDecoder* decoder) > +{ > + if (!decoder) { > + return; > + } You don't need this check with NS_IF_RELEASE. @@ +25,5 @@ > +MOZ_EXPORT_API(void) osfile_ReleaseEncoder(nsIUnicodeEncoder* encoder) > +{ > + if (!encoder) { > + return; > + } Likewise. @@ +29,5 @@ > + } > + NS_IF_RELEASE(encoder); > +} > + > +MOZ_EXPORT_API(nsIUnicodeEncoder*) osfile_GetEncoder(const char* encoding) Shared template functions for osfile_Get{Encoder,Decoder} and osfile_{Encode,Decode} would be a worthwhile change, I think. ::: toolkit/components/osfile/osfileutils.h @@ +39,5 @@ > + > +/** > + * Decode a sequence of bytes into a Unicode string. > + * > + * @param aDecode The decoder to use. "aDecoder" @@ +45,5 @@ > + * end of the output. > + * @param aSource The sequence of bytes to decode. > + * @param aSourceBytes (in) The number of bytes to decode. If NULL, > + * the number of bytes will be |strlen(aSource)|. > + * @param aSourceBytes (out) The number of bytes effectively decoded. You should label aSourceBytes as inout. @@ +47,5 @@ > + * @param aSourceBytes (in) The number of bytes to decode. If NULL, > + * the number of bytes will be |strlen(aSource)|. > + * @param aSourceBytes (out) The number of bytes effectively decoded. > + * @param aOutChars (out) The number of Unicode chars in the output, > + * not including the nul char. This parameter should have a more descriptive name; I was puzzled why it was "aOutChars" with type int32_t*. Maybe aCharsWritten? @@ +58,5 @@ > + int32_t* aOutChars); > +/** > + * Encode a complete Unicode string into a set of bytes. > + * > + * @return null in case of error, otherwise a nul-terminated string. Needs more documentation. @@ +62,5 @@ > + * @return null in case of error, otherwise a nul-terminated string. > + */ > +MOZ_EXPORT_API(char*) osfile_Encode(nsIUnicodeEncoder* aEncoder, > + bool aAppendNulToOutput, const PRUnichar* aSource, int32_t* aSourceChars, > + int32_t* aOutBytes); Same comments about naming.
Attachment #638991 -
Flags: review?(nfroyd) → review-
Assignee | ||
Comment 12•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #10) > Comment on attachment 639064 [details] [diff] [review] > JS bindings > > Review of attachment 639064 [details] [diff] [review]: > ----------------------------------------------------------------- > > There's some DOM efforts to expose real string encoding/decoding primitives > going on; you may want to look at those. I don't have the bug #s > immediately available. Might be some overlap, might not. I have heard rumours about that. However, until these primitives are available (and better than that, available to all threads), I will need a workaround. > > ::: toolkit/components/osfile/osfile_shared.jsm > @@ +508,5 @@ > > }; > > exports.OS.Shared.declareFFI = declareFFI; > > > > > > + > > Collapse whitespace, please. Will do. > > @@ +519,5 @@ > > + _aux.initUnicode = function initUnicode(free, strdup) { > > + let libxul = ctypes.open(exports.OS.Constants.Path.libxul); > > + exports.OS.Shared.libxul = libxul; > > + > > +// Logistics > > Nit: indentation. Will do. > > @@ +530,5 @@ > > + /*return*/ Types.void_t.implementation, > > + /*decoder*/Types.udecoder.in_ptr.implementation); > > + > > +// Public functions > > + let getEncoder = declareFFI(libxul, "osfile_GetEncoder", > > Enlightenment question: why use declareFFI for these functions and > libxul.declare for the logistics functions? Consistency would be betteer, > but if there some subtle reason lurking, then leave it as-is. There are places at which I have no choice for subtle reasons, but here, I could avoid this. I will fix that. > > @@ +532,5 @@ > > + > > +// Public functions > > + let getEncoder = declareFFI(libxul, "osfile_GetEncoder", > > + ctypes.default_abi, > > + /*return*/ Types.uencoder.out_ptr.releaseWith(releaseEncoder), > > Oooo, I like. :) > > @@ +540,5 @@ > > + /*return*/ Types.udecoder.out_ptr.releaseWith(releaseDecoder), > > + /*charset*/Types.char.in_ptr); > > + let encode = declareFFI(libxul, "osfile_Encode", > > + ctypes.default_abi, > > + /*return*/ Types.jschar.out_ptr.releaseWith(free), > > I think there is a little too much copy-paste going on here. Please > double-check that encode/decode match up with the declarations in the header > file. Well-spotted, thanks. > > @@ +574,5 @@ > > + * the number of bytes in the output. > > + * @return {CData} A C string in the given encoding, > > + * or |null| in case of error. > > + */ > > + encodeAll: function encodeAll(encoder, string, outBytes) { > > I am skeptical about providing outBytes as a parameter; it seems like it'd > be more useful as a return value. Then again, I'm not sure that it'd be > really useful to the caller, either. (It would be useful for incremental > encoding, but that's not supported here.) Have you written code calling > encodeAll? Have you used the outBytes parameter? Actually, I have examples, but you are right, they are probably too premature. Let's forget about them atm. > Actually, do we even need getEncoder/encodeAll pairings? Why not just > provide: > > encode(encoding-name, js-string) => c-string > > and hide all the details? The main reason is symmetry: with an API that allocates the string, decoders can be discarded after use, but encoders are stateful regardless, so I cannot provide decode(encoding-name, c-string) => js-string If you think it wiser, I can provide asymmetric APIs for encoding and decoding. This will be a little bit slower (due to no reuse of the encoder), but possibly not by much. > > @@ +587,5 @@ > > + * @param {CData} bytes An out pointer to a int32_t representing > > + * the number of chars in the output. > > + * @return {string} A JS string or |null| in case of error. > > + */ > > + decodeAll: function decodeAll(decoder, cstring, outChars) { > > I don't see the usefulness of outChars here, since we can get the length > directly from JS. This was mostly for symmetry, too. You are right, let's forget about it. > Ergo, we should probably change the C API to not require > this parameter to decode(). Doing so also means that we can eliminate the > confused definition of the bytes parameter in the docstring here. :) :) > > @@ +614,5 @@ > > + * This value will be automatically garbage-collected once it is > > + * not referenced anymore. > > + */ > > + exportWString: function exportWString(string) { > > + return strdup(string); > > Pretty sure using strdup to copy JavaScript strings is going to end poorly > due to confusion over 8-bit versus 16-bit characters. You are right, I should change the name. ... plus as you point further down, I should also change the function I pass as argument. > > ::: toolkit/components/osfile/osfile_unix_back.jsm > @@ +389,5 @@ > > > > + // Initialize Unicode conversion > > + let wstrdup = declareFFI("strdup", ctypes.default_abi, > > + /*return*/ Types.jschar.out_ptr, > > + /*str*/ Types.jschar.in_ptr.releaseWith(UnixFile.free)); > > This definition looks confused about what to name the function and what > function it actually wants to call. Right, this should have been wcpcpy on both sides. Thanks for the spot. > > ::: toolkit/components/osfile/osfile_win_back.jsm > @@ +232,5 @@ > > > > + // Initialize Unicode conversion > > + let wstrdup = declareFFI("StrDup", ctypes.default_abi, > > + /*return*/ Types.char.out_ptr, > > + /*str*/ Types.char.in_ptr.releaseWith(WinFile.free)); > > Same comments from the Unix backend apply here. Right, this should have been StrDupW and jschar instead of char. Thanks for the spot.
Assignee | ||
Comment 13•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #11) > Comment on attachment 638991 [details] [diff] [review] > Native utilities > > Review of attachment 638991 [details] [diff] [review]: > ----------------------------------------------------------------- > > The FIXMEs wrt SetLastError/errno need to be corrected. SetLastError/errno > should be set at all returns (e.g. failing to get > nsICharsetConverterManager), too. Preferably so that you're not constantly > saying #if defined(XP_WIN). Will do. > I think this looks fairly sane, but after looking at the JS bits, I think we > may want to modify the C API. You mean for removing the outbound information? We can certainly remove it with the current set of features exposed by JS. I will readd what is missing once we start working on streams (which require more info). > > ::: toolkit/components/osfile/osfileutils.cpp > @@ +17,5 @@ > > +MOZ_EXPORT_API(void) osfile_ReleaseDecoder(nsIUnicodeDecoder* decoder) > > +{ > > + if (!decoder) { > > + return; > > + } > > You don't need this check with NS_IF_RELEASE. Ah, good to know. > > @@ +25,5 @@ > > +MOZ_EXPORT_API(void) osfile_ReleaseEncoder(nsIUnicodeEncoder* encoder) > > +{ > > + if (!encoder) { > > + return; > > + } > > Likewise. > > @@ +29,5 @@ > > + } > > + NS_IF_RELEASE(encoder); > > +} > > + > > +MOZ_EXPORT_API(nsIUnicodeEncoder*) osfile_GetEncoder(const char* encoding) > > Shared template functions for osfile_Get{Encoder,Decoder} and > osfile_{Encode,Decode} would be a worthwhile change, I think. Ok, will do. > > ::: toolkit/components/osfile/osfileutils.h > @@ +39,5 @@ > > + > > +/** > > + * Decode a sequence of bytes into a Unicode string. > > + * > > + * @param aDecode The decoder to use. > > "aDecoder" Well spotted :) > > @@ +45,5 @@ > > + * end of the output. > > + * @param aSource The sequence of bytes to decode. > > + * @param aSourceBytes (in) The number of bytes to decode. If NULL, > > + * the number of bytes will be |strlen(aSource)|. > > + * @param aSourceBytes (out) The number of bytes effectively decoded. > > You should label aSourceBytes as inout. Thanks. > > @@ +47,5 @@ > > + * @param aSourceBytes (in) The number of bytes to decode. If NULL, > > + * the number of bytes will be |strlen(aSource)|. > > + * @param aSourceBytes (out) The number of bytes effectively decoded. > > + * @param aOutChars (out) The number of Unicode chars in the output, > > + * not including the nul char. > > This parameter should have a more descriptive name; I was puzzled why it was > "aOutChars" with type int32_t*. Maybe aCharsWritten? Thanks for the suggestion, I'll see how that works out. > > @@ +58,5 @@ > > + int32_t* aOutChars); > > +/** > > + * Encode a complete Unicode string into a set of bytes. > > + * > > + * @return null in case of error, otherwise a nul-terminated string. > > Needs more documentation. Ok. > > @@ +62,5 @@ > > + * @return null in case of error, otherwise a nul-terminated string. > > + */ > > +MOZ_EXPORT_API(char*) osfile_Encode(nsIUnicodeEncoder* aEncoder, > > + bool aAppendNulToOutput, const PRUnichar* aSource, int32_t* aSourceChars, > > + int32_t* aOutBytes); > > Same comments about naming. Ok.
Comment 14•12 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] from comment #12) > > There's some DOM efforts to expose real string encoding/decoding primitives > > going on; you may want to look at those. I don't have the bug #s > > immediately available. Might be some overlap, might not. > > I have heard rumours about that. However, until these primitives are > available (and better than that, available to all threads), I will need a > workaround. Indeed. > > I am skeptical about providing outBytes as a parameter; it seems like it'd > > be more useful as a return value. Then again, I'm not sure that it'd be > > really useful to the caller, either. (It would be useful for incremental > > encoding, but that's not supported here.) Have you written code calling > > encodeAll? Have you used the outBytes parameter? > > Actually, I have examples, but you are right, they are probably too > premature. Let's forget about them atm. > > > Actually, do we even need getEncoder/encodeAll pairings? Why not just > > provide: > > > > encode(encoding-name, js-string) => c-string > > > > and hide all the details? > > The main reason is symmetry: with an API that allocates the string, decoders > can be discarded after use, but encoders are stateful regardless, so I > cannot provide > > decode(encoding-name, c-string) => js-string > > If you think it wiser, I can provide asymmetric APIs for encoding and > decoding. This will be a little bit slower (due to no reuse of the encoder), > but possibly not by much. I am a fan of symmetry, but I do not understand the justification provided here. I do not see why providing the method you describe above is any different than: let d = getDecoder(encoding-name); return decodeAll(d, c-string, /*XXX param to be removed*/null); I am also not seeing what the statefulness argument re: encoders and decoders has to do with providing an easy API. Could you please elaborate? I agree that it would be nice to reuse encoders and decoders when possible. But you provide no way to get at nsIUnicode{Encoder,Decoder}.Reset, so arguing for reuse seems like a non-starter. (If you are going to argue for efficiency, then pushing for nsIUnicode{Encoder,Decoder} to be more fully exposed through this API is the way to go; the big win from that is incremental encoding/decoding of buffers. But it's not clear that we need that at the moment.)
Comment 15•12 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] from comment #13) > > I think this looks fairly sane, but after looking at the JS bits, I think we > > may want to modify the C API. > > You mean for removing the outbound information? We can certainly remove it > with the current set of features exposed by JS. I will readd what is missing > once we start working on streams (which require more info). Yes, removing the outbound information. If it's actually going to be needed later, then I'd be OK with leaving it in. But I don't see a good reason to expose the outbound parameters all the way up through JS with an API like {encode,decode}All. Those two functions should just do the translation and sweep whatever messy details the C API requires under the rug. IMHO, there should be a separate API for doing incremental, stream-type encoding/decoding.
Comment 16•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #15) > IMHO, there should be a separate API for doing incremental, stream-type > encoding/decoding. Clarification: separate methods for doing incremental, stream-type encoding/decoding in OS.File alongside {encode,decode}All or whatever we wind up calling them.
Updated•12 years ago
|
Attachment #638991 -
Flags: review?(jorendorff)
Updated•12 years ago
|
Attachment #639064 -
Flags: review?(jorendorff)
Assignee | ||
Comment 17•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #16) > (In reply to Nathan Froyd (:froydnj) from comment #15) > > IMHO, there should be a separate API for doing incremental, stream-type > > encoding/decoding. > > Clarification: separate methods for doing incremental, stream-type > encoding/decoding in OS.File alongside {encode,decode}All or whatever we > wind up calling them. Yes, that's what I had in mind.
Assignee | ||
Comment 18•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #14) > > > Actually, do we even need getEncoder/encodeAll pairings? Why not just > > > provide: > > > > > > encode(encoding-name, js-string) => c-string > > > > > > and hide all the details? > > > > The main reason is symmetry: with an API that allocates the string, decoders > > can be discarded after use, but encoders are stateful regardless, so I > > cannot provide > > > > decode(encoding-name, c-string) => js-string > > > > If you think it wiser, I can provide asymmetric APIs for encoding and > > decoding. This will be a little bit slower (due to no reuse of the encoder), > > but possibly not by much. > > I am a fan of symmetry, but I do not understand the justification provided > here. I do not see why providing the method you describe above is any > different than: > > let d = getDecoder(encoding-name); > return decodeAll(d, c-string, /*XXX param to be removed*/null); > > > I am also not seeing what the statefulness argument re: encoders and > decoders has to do with providing an easy API. Could you please elaborate? The problem appears if c-string ends with a partial sequence. In this case, we need to reuse the same decoder with the next sequence. I grant you that it is unlikely that we end up with a nul-terminated partial sequence, so we can probably forget about this argument. > I agree that it would be nice to reuse encoders and decoders when possible. > But you provide no way to get at nsIUnicode{Encoder,Decoder}.Reset, so > arguing for reuse seems like a non-starter. (If you are going to argue for > efficiency, then pushing for nsIUnicode{Encoder,Decoder} to be more fully > exposed through this API is the way to go; the big win from that is > incremental encoding/decoding of buffers. But it's not clear that we need > that at the moment.) The idea was indeed to progressively publish more of nsIUnicode{Encoder,Decoder}, to allow progressive encoding/decoding of buffers/streams.
Assignee | ||
Comment 19•12 years ago
|
||
Attachment #638991 -
Attachment is obsolete: true
Attachment #642941 -
Flags: review?(nfroyd)
Assignee | ||
Comment 20•12 years ago
|
||
Here is a simplified version of the JS bindings. Until we decide whether to simplify it further.
Attachment #639064 -
Attachment is obsolete: true
Attachment #642942 -
Flags: review?(nfroyd)
Comment 21•12 years ago
|
||
(In reply to David Rajchenbach Teller (away until early August) [:Yoric] from comment #18) > > I am a fan of symmetry, but I do not understand the justification provided > > here. I do not see why providing the method you describe above is any > > different than: > > > > let d = getDecoder(encoding-name); > > return decodeAll(d, c-string, /*XXX param to be removed*/null); > > > > > > I am also not seeing what the statefulness argument re: encoders and > > decoders has to do with providing an easy API. Could you please elaborate? > > The problem appears if c-string ends with a partial sequence. In this case, > we need to reuse the same decoder with the next sequence. Ah. That is a good point and definitely a case we should handle. But I think that decodeAll should not be obligated to deal with this sort of thing; the caller should be responsible for providing fully-formed input. (A more incremental API should deal properly with this case.) In any event, I don't think the scenario you describe works with the current C support, since when Convert() fails, you just return nsnull rather than the partially decoded sequence. > > I agree that it would be nice to reuse encoders and decoders when possible. > > But you provide no way to get at nsIUnicode{Encoder,Decoder}.Reset, so > > arguing for reuse seems like a non-starter. (If you are going to argue for > > efficiency, then pushing for nsIUnicode{Encoder,Decoder} to be more fully > > exposed through this API is the way to go; the big win from that is > > incremental encoding/decoding of buffers. But it's not clear that we need > > that at the moment.) > > The idea was indeed to progressively publish more of > nsIUnicode{Encoder,Decoder}, to allow progressive encoding/decoding of > buffers/streams. Very good.
Assignee | ||
Comment 22•12 years ago
|
||
Actually, there is something wrong brewing here. For most encodings, applying |strlen| to a c buffer is absolutely useless. Rather, we need to return the length of the c buffer in chars from EncodeAll and take this length as argument in DecodeAll. My bad for only testing with utf-8.
Comment 23•12 years ago
|
||
(In reply to David Rajchenbach Teller (away until early August) [:Yoric] from comment #22) > For most encodings, applying |strlen| to a c buffer is absolutely useless. > Rather, we need to return the length of the c buffer in chars from EncodeAll > and take this length as argument in DecodeAll. Mmm, yes; strlen on a UTF-16 buffer is not going to end well. My bad for not noticing that >8-bit encodings do need lengths of buffers returned and I shouldn't have complained about encodeAll. Do you want to cancel the r? flags, then?
Assignee | ||
Comment 24•12 years ago
|
||
Here is a new version that does not expose encoders/decoders yet, but exposes number of bytes. As discussed previously, this means that the only way to read an encoded file is to first load it completely to memory, then convert everything to a string. Probably not a major deal for the scope intended (e.g. configuration files), since we are going to read them off-main thread.
Attachment #642941 -
Attachment is obsolete: true
Attachment #642941 -
Flags: review?(nfroyd)
Attachment #643414 -
Flags: review?(nfroyd)
Assignee | ||
Comment 25•12 years ago
|
||
Attachment #642942 -
Attachment is obsolete: true
Attachment #642942 -
Flags: review?(nfroyd)
Attachment #643415 -
Flags: review?(nfroyd)
Assignee | ||
Comment 26•12 years ago
|
||
Attachment #638989 -
Attachment is obsolete: true
Attachment #643416 -
Flags: review?(nfroyd)
Comment 27•12 years ago
|
||
Comment on attachment 642941 [details] [diff] [review] Native utilities Review of attachment 642941 [details] [diff] [review]: ----------------------------------------------------------------- Works for me. I think I'd like to see the EncodeAll length outparam changes and the osfile_ns_alloc question answered, though. ::: toolkit/components/osfile/osfileutils.cpp @@ +212,5 @@ > + // Convert, add trailing \0 > + > + rv = aDecoder->Convert(aSource, &srcBytes, dest.rwget(), &upperBoundChars); > + if (NS_FAILED(rv)) { > + error_wrong_sequence(); Nit: I think error_invalid_argument is more correct here. Also in osfile_EncodeAll. Then error_wrong_sequence can go away. @@ +234,5 @@ > + int32_t upperBoundBytes; > + nsresult rv = aEncoder->GetMaxLength(aSource, srcChars, &upperBoundBytes); > + if (NS_FAILED(rv)) { > + error_unexpected(); > + return nsnull; FWIW, GetMaxLength is specified to never fail (!). I think it'd be OK to turn this into an NS_ASSERTION (or the MOZ flavor-of-the-rewrite equivalent) that NS_SUCCEEDED(rv). You should also initialize upperBoundBytes properly, just in case. Then you can get rid of error_unexpected. Also in osfile_EncodeAll. (See how much easier this would have been with template functions? :) ::: toolkit/components/osfile/osfileutils.h @@ +8,5 @@ > +extern "C" { > + > +// Memory utilities > + > +MOZ_EXPORT_API(void*) osfile_ns_alloc(PRSize size); This doesn't look used in the current bindings. Is this another save-it-for-later sort of function? @@ +60,5 @@ > + * @param aDecoder The decoder to use. > + * @param aSource The C string to decode. Must be null-terminated. > + * > + * @return null in case of error, otherwise a sequence of Unicode chars, > + * representing |aSource|, optionally terminated with a nul-terminator. The optionally terminated language here is no longer necessary. @@ +61,5 @@ > + * @param aSource The C string to decode. Must be null-terminated. > + * > + * @return null in case of error, otherwise a sequence of Unicode chars, > + * representing |aSource|, optionally terminated with a nul-terminator. > + * This memory should be released with |free|. Point of order: this is not what the JS code does. Fix the docstring or the code, please. @@ +75,5 @@ > + * > + * @return null in case of error, otherwise a nul-terminated string. > + */ > +MOZ_EXPORT_API(char*) osfile_EncodeAll(nsIUnicodeEncoder* aEncoder, > + const PRUnichar* aSource); Later comments suggested that we should pass a length parameter out of here and out to JS.
Attachment #642941 -
Attachment is obsolete: false
Updated•12 years ago
|
Attachment #642941 -
Attachment is obsolete: true
Comment 28•12 years ago
|
||
Comment on attachment 643414 [details] [diff] [review] Native utilities, v2 Review of attachment 643414 [details] [diff] [review]: ----------------------------------------------------------------- Please see comment 27 for caveats; I think we mid-aired in adding/removing patches.
Attachment #643414 -
Flags: review?(nfroyd) → review+
Comment 29•12 years ago
|
||
Comment on attachment 643415 [details] [diff] [review] JS bindings v3 Review of attachment 643415 [details] [diff] [review]: ----------------------------------------------------------------- Looks good. I would like to see the encodeAll change before commit. ::: toolkit/components/osfile/osfile_shared.jsm @@ +689,5 @@ > + /** > + * A trimmed-down version of declareFFI that returns a raw js-ctypes > + * function, which does not translate its arguments. > + * > + * Used to declare finalizers. I am rs'ing this part; I assume you know what you are doing here. :) @@ +702,5 @@ > + abi = abi || ctypes.default_abi; > + if (Object.prototype.toString.call(abi) != "[object CABI]") { > + // Note: This is the only known manner of checking whether an object > + // is an abi. > + throw new TypeError("declareFFI expects as second argument an abi or null"); "declareFinalizer expects..." @@ +705,5 @@ > + // is an abi. > + throw new TypeError("declareFFI expects as second argument an abi or null"); > + } > + let signature = [symbol, abi]; > + let argtypes = []; argtypes is unused. Bug? @@ +773,5 @@ > + ctypes.default_abi, > + /*return*/ Types.char.out_ptr.releaseWith(NS_Free), > + /*encoding*/ Types.char.in_ptr, > + /*source*/ Types.jschar.in_ptr, > + /*bytes*/ Types.uint32_t.out_ptr); We should try to make the caller's job easy, though; just because the C code returns outparams doesn't mean we have to. Please make encodeAll return the allocated buffer and count as multiple values. I am uncertain about whether [buffer, count] or { buffer: ..., count: ... } is preferred style here; I think I lean towards the former. I realize this makes the API asymmetric. I don't have good ideas as to how to make them symmetric.
Attachment #643415 -
Flags: review?(nfroyd) → review+
Comment 30•12 years ago
|
||
Comment on attachment 643416 [details] [diff] [review] Companion testsuite Review of attachment 643416 [details] [diff] [review]: ----------------------------------------------------------------- Taras already r+'d this, so this is essentially a rubberstamp. I assume you can change the test appropriately if the API changes from the other patches. :) ::: toolkit/components/osfile/tests/mochi/worker_test_osfile_front.js @@ +55,5 @@ > + ok(true, "Starting test_unicode"); > + function test_go_round(encoding, sentence) { > + let bytes = new OS.Shared.Type.uint32_t.implementation(); > + let pBytes = bytes.address(); > + ok(true, "test_unicode: testing encoding of " + sentence + " with encoding " + encoding); Please use info() here instead of ok().
Attachment #643416 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 31•12 years ago
|
||
Ok, this time I am sure that I have replied to your question but that the reply does not appear. Bugzilla seems to be eating my comments, so I will just post updated patches and reply later :)
Assignee | ||
Comment 32•12 years ago
|
||
While I definitely agree that EncodeAll is not client-friendly, I intended to address the issue in a followup patch. The idea here is to provide the lower-level bindings and to provide in a second patch higher-level bindings without outparams and with exception-handling, etc.
Attachment #643415 -
Attachment is obsolete: true
Attachment #644301 -
Flags: review+
Assignee | ||
Comment 33•12 years ago
|
||
(self-reviewing the makefile)
Attachment #644302 -
Flags: review+
Assignee | ||
Comment 34•12 years ago
|
||
Applied feedback.
Attachment #643414 -
Attachment is obsolete: true
Attachment #644303 -
Flags: review+
Assignee | ||
Comment 35•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #30) > Comment on attachment 643416 [details] [diff] [review] > Companion testsuite > > Review of attachment 643416 [details] [diff] [review]: > ----------------------------------------------------------------- > > Taras already r+'d this, so this is essentially a rubberstamp. I assume you > can change the test appropriately if the API changes from the other patches. > :) > > ::: toolkit/components/osfile/tests/mochi/worker_test_osfile_front.js > @@ +55,5 @@ > > + ok(true, "Starting test_unicode"); > > + function test_go_round(encoding, sentence) { > > + let bytes = new OS.Shared.Type.uint32_t.implementation(); > > + let pBytes = bytes.address(); > > + ok(true, "test_unicode: testing encoding of " + sentence + " with encoding " + encoding); > > Please use info() here instead of ok(). Note that this is a worker, so I had to redefine |ok| myself, I/O does not always work (etc.), so if you do not mind, I will keep this for a followup bug.
Assignee | ||
Comment 36•12 years ago
|
||
Attachment #643416 -
Attachment is obsolete: true
Attachment #644304 -
Flags: review+
Comment 37•12 years ago
|
||
(In reply to David Rajchenbach Teller (away until early August) [:Yoric] from comment #35) > Note that this is a worker, so I had to redefine |ok| myself, I/O does not > always work (etc.), so if you do not mind, I will keep this for a followup > bug. I totally missed the worker bit. I don't mind one way or the other, just a tidyness detail. (In reply to David Rajchenbach Teller (away until early August) [:Yoric] from comment #32) > While I definitely agree that EncodeAll is not client-friendly, I intended > to address the issue in a followup patch. The idea here is to provide the > lower-level bindings and to provide in a second patch higher-level bindings > without outparams and with exception-handling, etc. OK, that seems reasonable. Presumably the low-level {en,de}codeAll will either no longer be available or have their API changed, then?
Assignee | ||
Comment 38•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #37) > (In reply to David Rajchenbach Teller (away until early August) [:Yoric] > from comment #32) > > While I definitely agree that EncodeAll is not client-friendly, I intended > > to address the issue in a followup patch. The idea here is to provide the > > lower-level bindings and to provide in a second patch higher-level bindings > > without outparams and with exception-handling, etc. > > OK, that seems reasonable. Presumably the low-level {en,de}codeAll will > either no longer be available or have their API changed, then? That's the idea. Haven't quite decided the specifics yet.
Assignee | ||
Comment 39•12 years ago
|
||
Attachment #644303 -
Attachment is obsolete: true
Attachment #647159 -
Flags: review+
Assignee | ||
Comment 40•12 years ago
|
||
Attachment #647159 -
Attachment is obsolete: true
Attachment #650868 -
Flags: review+
Assignee | ||
Comment 41•12 years ago
|
||
Attachment #644301 -
Attachment is obsolete: true
Attachment #650869 -
Flags: review+
Assignee | ||
Comment 42•12 years ago
|
||
Attachment #644302 -
Attachment is obsolete: true
Attachment #650870 -
Flags: review+
Assignee | ||
Comment 43•12 years ago
|
||
Attachment #644304 -
Attachment is obsolete: true
Attachment #650871 -
Flags: review+
Assignee | ||
Updated•12 years ago
|
Keywords: checkin-needed
Assignee | ||
Comment 44•12 years ago
|
||
Try run: https://tbpl.mozilla.org/?tree=Try&rev=1135d39e54fc
Comment 45•12 years ago
|
||
(In reply to David Rajchenbach Teller from comment #44) > Try run: https://tbpl.mozilla.org/?tree=Try&rev=1135d39e54fc <3 https://hg.mozilla.org/integration/mozilla-inbound/rev/dccf9af0230e https://hg.mozilla.org/integration/mozilla-inbound/rev/fda9d490e006 https://hg.mozilla.org/integration/mozilla-inbound/rev/f096ccec3bb4 https://hg.mozilla.org/integration/mozilla-inbound/rev/aaa7993df9f6
Flags: in-testsuite+
Keywords: checkin-needed
Comment 46•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/dccf9af0230e https://hg.mozilla.org/mozilla-central/rev/fda9d490e006 https://hg.mozilla.org/mozilla-central/rev/f096ccec3bb4 https://hg.mozilla.org/mozilla-central/rev/aaa7993df9f6
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
You need to log in
before you can comment on or make changes to this bug.
Description
•