Closed Bug 771094 Opened 9 years ago Closed 8 years ago

[OS.File] read/write strings with encodings

Categories

(Toolkit :: OS.File, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: Yoric, Assigned: Yoric)

References

Details

Attachments

(4 files, 1 obsolete file)

No description provided.
Attached patch First prototype (obsolete) — Splinter Review
Ok, here is a possible implementation of readString/writeString. For the moment, Unix only, but the code could actually be shared with Windows.
Taras, could you tell me what you think of it?
Assignee: nobody → dteller
Attachment #639352 - Flags: feedback?(taras.mozilla)
Comment on attachment 639352 [details] [diff] [review]
First prototype

use string concatenation instead of .join.

instead of passing buffer size, pass a buffer to use directly in options. This abstraction feels like it should live outside of the file api. I'm not a fan of adding readString, readInt.
Conversion functions should live independent of IO.
Attachment #639352 - Flags: feedback?(taras.mozilla) → feedback-
(In reply to Taras Glek (:taras) from comment #2)
> Comment on attachment 639352 [details] [diff] [review]
> First prototype
> 
> use string concatenation instead of .join.

Really? Algorithmically, this sounds scary: concatenation is generally quadratic, while .join can be implemented linearly (I hope it is, I haven't checked), or at least O(n.log(n)).

> 
> instead of passing buffer size, pass a buffer to use directly in options.
> This abstraction feels like it should live outside of the file api. I'm not
> a fan of adding readString, readInt.
> Conversion functions should live independent of IO.

Ok. Perhaps we should open a bug for |OS.Stream| or something such.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Component: Networking: File → OS.File
Product: Core → Toolkit
So, as expected, I now need Unicode conversion for bug 794091 et al. In all these bugs, we pretty much have the same scenario: take a string, convert it to Unicode, write it asynchronously to the disk. Since, with the current API, the process is clumsy and segfault-prone, I prefer defining a streamlined version.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Summary: [OS.File] readString, writeString → [OS.File] read/write strings with encodings
Attachment #639352 - Attachment is obsolete: true
Attachment #665443 - Flags: review?(nfroyd)
Comment on attachment 665443 [details] [diff] [review]
1. Read/write strings (sync version)

Review of attachment 665443 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/osfile/osfile_shared_front.jsm
@@ +157,5 @@
> +    let root;
> +    if (typeof buffer == "string") {
> +      let encoding = options.encoding || "utf-8";
> +      let bytesC = new Type.uint32_t.implementation(0);
> +      root = OS.Shared.Utils.Strings.encodeAll(encoding, buffer, bytesC.address());

Yuck. :)

Let's make |bytesC| more explicit, like |nBytes| or |byteCount|.

@@ +170,4 @@
>  
>      let pos = 0;
>      while (pos < bytes) {
> +      LOG("write", "writing at position " + pos);

This is controlled by a flag somewhere, correct?

@@ +374,5 @@
> + * - {number} bytes If unspecified, read all the remaining bytes from
> + * this file. If specified, read |bytes| bytes, or less if the file does not
> + * contain that many bytes.
> + * - {string} encoding The name of an encoding to use to decode the string.
> + * If unspecified, use "utf-8".

I'm not sure this is the right API.  Saying that we return a string and count in bytes doesn't fit together very well.

I think you should drop the |bytes| option and just declare that we read the whole file for now.  That's the biggest use case you need to solve here, yes?  We can revisit specifying an amount later.

@@ +399,5 @@
>   * Important note: In the current implementation, option |tmpPath|
>   * is required. This requirement should disappear as part of bug 793660.
>   *
>   * @param {string} path The path of the file to modify.
> + * @param {ArrayBuffer|String} buffer A buffer containing the bytes to write.

Nit: you use |string| for the type everywhere else.  Please do so here.

Nit2: please be consistent in adding spaces around | (you have spaces in the docs above).
Attachment #665443 - Flags: review?(nfroyd) → feedback+
Comment on attachment 665443 [details] [diff] [review]
1. Read/write strings (sync version)

Review of attachment 665443 [details] [diff] [review]:
-----------------------------------------------------------------

Also, having read through previous comments, what do you think about the idea of moving this out of .File, since it has nothing to do with OS-level I/O?

::: toolkit/components/osfile/osfile_shared_front.jsm
@@ +15,5 @@
>  (function(exports) {
>  
>  let LOG = exports.OS.Shared.LOG.bind(OS.Shared, "Shared front-end");
> +let Type = OS.Shared.Type;
> +let Strings = OS.Shared.Utils.Strings;

If you're going to define this, you should use it in the code that you add. :)
(In reply to Nathan Froyd (:froydnj) from comment #9)
> Comment on attachment 665443 [details] [diff] [review]
> 1. Read/write strings (sync version)
> 
> Review of attachment 665443 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Also, having read through previous comments, what do you think about the
> idea of moving this out of .File, since it has nothing to do with OS-level
> I/O?

Well, my current problem is that |encodeAll| leaks abstractions, insofar as it requires a C pointer (that can be wrapped in a nicer API) but also returns a C pointer (that's more annoying), which needs to be deallocated, which in turns requires the user to understand the finer points of CDataFinalizer, etc.

Now, we could try and do something smarter with |encodeAll| to ensure that it returns an ArrayBuffer. I believe that I now see how we could do that without additional copies.

> 
> ::: toolkit/components/osfile/osfile_shared_front.jsm
> @@ +15,5 @@
> >  (function(exports) {
> >  
> >  let LOG = exports.OS.Shared.LOG.bind(OS.Shared, "Shared front-end");
> > +let Type = OS.Shared.Type;
> > +let Strings = OS.Shared.Utils.Strings;
> 
> If you're going to define this, you should use it in the code that you add.
> :)

:)
If we wish to to keep it out of OS.File, let's try another style, then.
Attachment #665913 - Flags: feedback?(nfroyd)
Comment on attachment 665444 [details] [diff] [review]
2. Read/write strings (async version)

Review of attachment 665444 [details] [diff] [review]:
-----------------------------------------------------------------

Un-r?-ing until we get interfaces straightened out.
Attachment #665444 - Flags: review?(nfroyd)
Comment on attachment 665445 [details] [diff] [review]
3. Read/write strings (tests for async version)

Review of attachment 665445 [details] [diff] [review]:
-----------------------------------------------------------------

Un-r?-ing until we get interfaces straightened out.
Attachment #665445 - Flags: review?(nfroyd)
(In reply to David Rajchenbach Teller [:Yoric] from comment #10)
> (In reply to Nathan Froyd (:froydnj) from comment #9)
> > Comment on attachment 665443 [details] [diff] [review]
> > 1. Read/write strings (sync version)
> > 
> > Review of attachment 665443 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > Also, having read through previous comments, what do you think about the
> > idea of moving this out of .File, since it has nothing to do with OS-level
> > I/O?
> 
> Well, my current problem is that |encodeAll| leaks abstractions, insofar as
> it requires a C pointer (that can be wrapped in a nicer API) but also
> returns a C pointer (that's more annoying), which needs to be deallocated,
> which in turns requires the user to understand the finer points of
> CDataFinalizer, etc.
> 
> Now, we could try and do something smarter with |encodeAll| to ensure that
> it returns an ArrayBuffer. I believe that I now see how we could do that
> without additional copies.

The usual solution to avoiding copying in something like this is to have an EncodedLength function or similar that tells you how long a given run of characters would be in bytes (resp. DecodedLength, bytes/characters).  I don't know if we have those facilities available.  If you have something more clever than that, that'd be good too.
Comment on attachment 665913 [details] [diff] [review]
Possible API for Unicode conversion

>/**
>  * @param {string} encoding The name of the encoding (e.g, âutf-8â)
>  *
>  * @constructor
>  */
>OS.Shared.Encoding = function Encoding(encoding) {
>};
>
>OS.Shared.Encoding.prototype = {
>   /**
>     * Encode a string to an ArrayBuffer in the current encoding
>     *
>     * @param {String | ArrayBuffer | C pointer} text
>     * @param {object=} options
>     * - {number} chars
>     * - {number} offset //in chars
>     *
>     * @return  {{buffer: ArrayBuffer, bytes: number}}
>     */
>   encode: function(text) {
>   },
>   /**
>     * Decode a buffer to a string using the current encoding
>     *
>     * @param {ArrayBuffer | C pointer} buffer
>     * @param {object=} options
>     * - {number} bytes
>     * - {number} offset //in bytes
>     *
>     * @return {string}
>     */
>   decode: function(buffer) {
>   },
>   reset: function() {
>   }
>};
Attachment #665913 - Flags: feedback?(nfroyd) → feedback+
Gee, thanks, Bugzilla, for throwing away my comments there.

What is the reset() function for?

I think you are asking for trouble to pass ArrayBuffers and C pointers to encode().  Even though the documentation says the options are in chars, people are going to pass bytes, which is right, except for all those time that it isn't.
Ok, let's step back for a second.
Bug 764234 introduces the string encoding API. The only missing feature is that it returns a TypedArray rather than an ArrayBuffer, so we can't feed it to js-ctypes yet.

Now:
- returning a TypedArray rather than an ArrayBuffer makes lots of sense, and this is probably what we should do instead of returning {{buffer: ArrayBuffer, bytes: number}};
- we have discussed feeding TypedArray to js-ctypes, and I have just opened bug 795505 along these lines.

So let me suggest the following:
- I will work on bug 795505;
- once I have landed this, replace {{buffer: ArrayBuffer, bytes: number}} with |Uint8Array| and accept TypedArray wherever we currently accept ArrayBuffer;
- once both are done, if the patch for bug 764234 has stuck, our work in this bug is done.
Status: REOPENED → RESOLVED
Closed: 9 years ago8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.