Closed Bug 771094 Opened 12 years ago Closed 12 years ago

[OS.File] read/write strings with encodings

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: Yoric, Assigned: Yoric)

References

Details

Attachments

(4 files, 1 obsolete file)

First prototype 12 years ago David Teller [:Yoric] - still alive but not very active 3.84 KB, patch	taras.mozilla : feedback-	Details \| Diff \| Splinter Review
1. Read/write strings (sync version) 12 years ago David Teller [:Yoric] - still alive but not very active 8.61 KB, patch	froydnj : feedback+	Details \| Diff \| Splinter Review
2. Read/write strings (async version) 12 years ago David Teller [:Yoric] - still alive but not very active 4.69 KB, patch		Details \| Diff \| Splinter Review
3. Read/write strings (tests for async version) 12 years ago David Teller [:Yoric] - still alive but not very active 3.60 KB, patch		Details \| Diff \| Splinter Review
Possible API for Unicode conversion 12 years ago David Teller [:Yoric] - still alive but not very active 840 bytes, patch	froydnj : feedback+	Details \| Diff \| Splinter Review

David Teller [:Yoric] - still alive but not very active

Assignee

Description

•

12 years ago

      No description provided.

David Teller [:Yoric] - still alive but not very active

Assignee

Updated

•

12 years ago

Blocks: OS.File

Depends on: 769191, 770215

David Teller [:Yoric] - still alive but not very active

Assignee

Updated

•

12 years ago

Depends on: 766194

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 1

•

12 years ago

Attached patch First prototype (obsolete) — Details — Splinter Review

Ok, here is a possible implementation of readString/writeString. For the moment, Unix only, but the code could actually be shared with Windows.
Taras, could you tell me what you think of it?

Assignee: nobody → dteller

Attachment #639352 - Flags: feedback?(taras.mozilla)

(dormant account)

Comment 2

•

12 years ago

Comment on attachment 639352 [details] [diff] [review]
First prototype

use string concatenation instead of .join.

instead of passing buffer size, pass a buffer to use directly in options. This abstraction feels like it should live outside of the file api. I'm not a fan of adding readString, readInt.
Conversion functions should live independent of IO.

Attachment #639352 - Flags: feedback?(taras.mozilla) → feedback-

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 3

•

12 years ago

(In reply to Taras Glek (:taras) from comment #2)
> Comment on attachment 639352 [details] [diff] [review]
> First prototype
> 
> use string concatenation instead of .join.

Really? Algorithmically, this sounds scary: concatenation is generally quadratic, while .join can be implemented linearly (I hope it is, I haven't checked), or at least O(n.log(n)).

> 
> instead of passing buffer size, pass a buffer to use directly in options.
> This abstraction feels like it should live outside of the file api. I'm not
> a fan of adding readString, readInt.
> Conversion functions should live independent of IO.

Ok. Perhaps we should open a bug for |OS.Stream| or something such.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → WONTFIX

David Teller [:Yoric] - still alive but not very active

Assignee

Updated

•

12 years ago

Blocks: 794091

Nathan Froyd [:froydnj]

Updated

•

12 years ago

Component: Networking: File → OS.File

Product: Core → Toolkit

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 4

•

12 years ago

So, as expected, I now need Unicode conversion for bug 794091 et al. In all these bugs, we pretty much have the same scenario: take a string, convert it to Unicode, write it asynchronously to the disk. Since, with the current API, the process is clumsy and segfault-prone, I prefer defining a streamlined version.

Status: RESOLVED → REOPENED

Resolution: WONTFIX → ---

Summary: [OS.File] readString, writeString → [OS.File] read/write strings with encodings

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 5

•

12 years ago

Attached patch 1. Read/write strings (sync version) — Details — Splinter Review

Attachment #639352 - Attachment is obsolete: true

Attachment #665443 - Flags: review?(nfroyd)

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 6

•

12 years ago

Attached patch 2. Read/write strings (async version) — Details — Splinter Review

Attachment #665444 - Flags: review?(nfroyd)

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 7

•

12 years ago

Attached patch 3. Read/write strings (tests for async version) — Details — Splinter Review

Attachment #665445 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 8

•

12 years ago

Comment on attachment 665443 [details] [diff] [review]
1. Read/write strings (sync version)

Review of attachment 665443 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/osfile/osfile_shared_front.jsm
@@ +157,5 @@
> +    let root;
> +    if (typeof buffer == "string") {
> +      let encoding = options.encoding || "utf-8";
> +      let bytesC = new Type.uint32_t.implementation(0);
> +      root = OS.Shared.Utils.Strings.encodeAll(encoding, buffer, bytesC.address());

Yuck. :)

Let's make |bytesC| more explicit, like |nBytes| or |byteCount|.

@@ +170,4 @@
>  
>      let pos = 0;
>      while (pos < bytes) {
> +      LOG("write", "writing at position " + pos);

This is controlled by a flag somewhere, correct?

@@ +374,5 @@
> + * - {number} bytes If unspecified, read all the remaining bytes from
> + * this file. If specified, read |bytes| bytes, or less if the file does not
> + * contain that many bytes.
> + * - {string} encoding The name of an encoding to use to decode the string.
> + * If unspecified, use "utf-8".

I'm not sure this is the right API.  Saying that we return a string and count in bytes doesn't fit together very well.

I think you should drop the |bytes| option and just declare that we read the whole file for now.  That's the biggest use case you need to solve here, yes?  We can revisit specifying an amount later.

@@ +399,5 @@
>   * Important note: In the current implementation, option |tmpPath|
>   * is required. This requirement should disappear as part of bug 793660.
>   *
>   * @param {string} path The path of the file to modify.
> + * @param {ArrayBuffer|String} buffer A buffer containing the bytes to write.

Nit: you use |string| for the type everywhere else.  Please do so here.

Nit2: please be consistent in adding spaces around | (you have spaces in the docs above).

Attachment #665443 - Flags: review?(nfroyd) → feedback+

Nathan Froyd [:froydnj]

Comment 9

•

12 years ago

Comment on attachment 665443 [details] [diff] [review]
1. Read/write strings (sync version)

Review of attachment 665443 [details] [diff] [review]:
-----------------------------------------------------------------

Also, having read through previous comments, what do you think about the idea of moving this out of .File, since it has nothing to do with OS-level I/O?

::: toolkit/components/osfile/osfile_shared_front.jsm
@@ +15,5 @@
>  (function(exports) {
>  
>  let LOG = exports.OS.Shared.LOG.bind(OS.Shared, "Shared front-end");
> +let Type = OS.Shared.Type;
> +let Strings = OS.Shared.Utils.Strings;

If you're going to define this, you should use it in the code that you add. :)

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 10

•

12 years ago

(In reply to Nathan Froyd (:froydnj) from comment #9)
> Comment on attachment 665443 [details] [diff] [review]
> 1. Read/write strings (sync version)
> 
> Review of attachment 665443 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Also, having read through previous comments, what do you think about the
> idea of moving this out of .File, since it has nothing to do with OS-level
> I/O?

Well, my current problem is that |encodeAll| leaks abstractions, insofar as it requires a C pointer (that can be wrapped in a nicer API) but also returns a C pointer (that's more annoying), which needs to be deallocated, which in turns requires the user to understand the finer points of CDataFinalizer, etc.

Now, we could try and do something smarter with |encodeAll| to ensure that it returns an ArrayBuffer. I believe that I now see how we could do that without additional copies.

> 
> ::: toolkit/components/osfile/osfile_shared_front.jsm
> @@ +15,5 @@
> >  (function(exports) {
> >  
> >  let LOG = exports.OS.Shared.LOG.bind(OS.Shared, "Shared front-end");
> > +let Type = OS.Shared.Type;
> > +let Strings = OS.Shared.Utils.Strings;
> 
> If you're going to define this, you should use it in the code that you add.
> :)

:)

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 11

•

12 years ago

Attached patch Possible API for Unicode conversion — Details — Splinter Review

If we wish to to keep it out of OS.File, let's try another style, then.

Attachment #665913 - Flags: feedback?(nfroyd)

Nathan Froyd [:froydnj]

Comment 12

•

12 years ago

Comment on attachment 665444 [details] [diff] [review]
2. Read/write strings (async version)

Review of attachment 665444 [details] [diff] [review]:
-----------------------------------------------------------------

Un-r?-ing until we get interfaces straightened out.

Attachment #665444 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 13

•

12 years ago

Comment on attachment 665445 [details] [diff] [review]
3. Read/write strings (tests for async version)

Review of attachment 665445 [details] [diff] [review]:
-----------------------------------------------------------------

Un-r?-ing until we get interfaces straightened out.

Attachment #665445 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 14

•

12 years ago

(In reply to David Rajchenbach Teller [:Yoric] from comment #10)
> (In reply to Nathan Froyd (:froydnj) from comment #9)
> > Comment on attachment 665443 [details] [diff] [review]
> > 1. Read/write strings (sync version)
> > 
> > Review of attachment 665443 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > Also, having read through previous comments, what do you think about the
> > idea of moving this out of .File, since it has nothing to do with OS-level
> > I/O?
> 
> Well, my current problem is that |encodeAll| leaks abstractions, insofar as
> it requires a C pointer (that can be wrapped in a nicer API) but also
> returns a C pointer (that's more annoying), which needs to be deallocated,
> which in turns requires the user to understand the finer points of
> CDataFinalizer, etc.
> 
> Now, we could try and do something smarter with |encodeAll| to ensure that
> it returns an ArrayBuffer. I believe that I now see how we could do that
> without additional copies.

The usual solution to avoiding copying in something like this is to have an EncodedLength function or similar that tells you how long a given run of characters would be in bytes (resp. DecodedLength, bytes/characters).  I don't know if we have those facilities available.  If you have something more clever than that, that'd be good too.

Nathan Froyd [:froydnj]

Comment 15

•

12 years ago

Comment on attachment 665913 [details] [diff] [review]
Possible API for Unicode conversion

>/**
>  * @param {string} encoding The name of the encoding (e.g, âutf-8â)
>  *
>  * @constructor
>  */
>OS.Shared.Encoding = function Encoding(encoding) {
>};
>
>OS.Shared.Encoding.prototype = {
>   /**
>     * Encode a string to an ArrayBuffer in the current encoding
>     *
>     * @param {String | ArrayBuffer | C pointer} text
>     * @param {object=} options
>     * - {number} chars
>     * - {number} offset //in chars
>     *
>     * @return  {{buffer: ArrayBuffer, bytes: number}}
>     */
>   encode: function(text) {
>   },
>   /**
>     * Decode a buffer to a string using the current encoding
>     *
>     * @param {ArrayBuffer | C pointer} buffer
>     * @param {object=} options
>     * - {number} bytes
>     * - {number} offset //in bytes
>     *
>     * @return {string}
>     */
>   decode: function(buffer) {
>   },
>   reset: function() {
>   }
>};

Attachment #665913 - Flags: feedback?(nfroyd) → feedback+

Nathan Froyd [:froydnj]

Comment 16

•

12 years ago

Gee, thanks, Bugzilla, for throwing away my comments there.

What is the reset() function for?

I think you are asking for trouble to pass ArrayBuffers and C pointers to encode().  Even though the documentation says the options are in chars, people are going to pass bytes, which is right, except for all those time that it isn't.

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 17

•

12 years ago

Ok, let's step back for a second.
Bug 764234 introduces the string encoding API. The only missing feature is that it returns a TypedArray rather than an ArrayBuffer, so we can't feed it to js-ctypes yet.

Now:
- returning a TypedArray rather than an ArrayBuffer makes lots of sense, and this is probably what we should do instead of returning {{buffer: ArrayBuffer, bytes: number}};
- we have discussed feeding TypedArray to js-ctypes, and I have just opened bug 795505 along these lines.

So let me suggest the following:
- I will work on bug 795505;
- once I have landed this, replace {{buffer: ArrayBuffer, bytes: number}} with |Uint8Array| and accept TypedArray wherever we currently accept ArrayBuffer;
- once both are done, if the patch for bug 764234 has stuck, our work in this bug is done.

David Teller [:Yoric] - still alive but not very active

Assignee

Updated

•

12 years ago

Depends on: 795505

David Teller [:Yoric] - still alive but not very active

Assignee

Updated

•

12 years ago

Depends on: 781615

David Teller [:Yoric] - still alive but not very active

Assignee

Updated

•

12 years ago

Status: REOPENED → RESOLVED

Closed: 12 years ago → 12 years ago

Resolution: --- → FIXED

BMO Automation

Updated

•

11 months ago

Product: Toolkit → Toolkit Graveyard

You need to log in before you can comment on or make changes to this bug.