Closed Bug 957424 Opened 10 years ago Closed 7 years ago

/lib/sdk/io/buffer.js should support ASCII encoding option

Categories

(Add-on SDK Graveyard :: General, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: rodneyd.teal, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 5.1; rv:29.0) Gecko/20100101 Firefox/29.0 (Beta/Release)
Build ID: 20131228030204

Steps to reproduce:

http://pastebin.mozilla.org/3962845

// make a random string.
var charArray = Array.apply(null, Array(20)).map(() => String.fromCharCode(parseInt(Math.random() * 0xFF)+1));
var string = charArray.join('');

// construct a buffer using the string
var buffer = new Buffer(string); // will default to UTF-8 encoding option

console.log(buffer.length, buffer.byteLength); // would be 20, 20 if the ASCII encoding option was supported.

console.log(Buffer.byteLength(string));

// actual results vary and is unreliable not that it should be consistent with an ASCII only encoding
I think that StringView should replace TextEncoder and TextDecoder which would create support for the ASCII encoding.

StringView: https://developer.mozilla.org/en-US/Add-ons/Code_snippets/StringView

I've made the changes here: https://github.com/DecipherCode/FireBit/commit/87aa6dea4590bda0334ff2346232f8a06e9b8381#diff-b11a7397c4f97594328821e7e713ca5d

This seems to be the best solution.
StringView:

Its not a WebGL Standard as such its of no concern to Bug 889977. Although perhaps it should be as it depends upon ArrayBuffer and TypedArrays: https://www.khronos.org/registry/webgl/specs/1.0/#5.13

StringView has been proposed as strawman for ES6 on ECMAScript Bugs.

Depends: https://bugs.ecmascript.org/show_bug.cgi?id=1557
Its not in the ES6 draft at the moment: http://people.mozilla.org/~jorendorff/es6-draft.html

I'm waiting on a reply from it's developer about status and expected time frame.

Thoughts?
To quote Brendan Eich from a recent ES discussion regarding StringView: 
What can we do to make sure this thing stays dead, if it is dead?

That said StringView is not headed towards being available as a native Web API. TextEncoder and TextDecoder serve the Buffer API nicely with the exception regarding interpreting chars as ASCII encoded, bytes(0-255) code points. The problem is when the code point is > 127, you may end up with a two byte representation of that code point. The desired effect when specifying ASCII as the encoding argument is that code points in the range of a single byte(0-255) would be interpreted as a single byte value. i.e. 

let string = String.fromCharCode(255);
let buffer = new Buffer(string, 'ASCII');

// result
buffer.byteLength === 1 // true
buffer.length === 1 // true

I'm no longer certain that StringView is the best solution as far as efficiency goes.
A simple check for the encoding argument. -> if (encoding === 'ASCII') ....
Then:

string.split('').map((char) => String.charCodeAt(char)); 

Is all that is needed to create an array of bytes to construct the Buffer(Object) with.

The "buffer.toString()" method could do a similar check and conversion back to string.

Thoughts?
Flags: needinfo?(jsantell)
If that one liner can handle encoding/decoding in ASCII, sounds great! Of note, the tests for buffer were taken from node.js, so the ASCII tests should be reenabled in that case, and maybe pull out an encode/decode function (not exposed) where it uses Text[En|Dec]oder or the ascii specific line. Also should be added to 'isEncoding' if everything works out.
Flags: needinfo?(jsantell)

(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #4)
> If that one liner can handle encoding/decoding in ASCII, sounds great! Of
> note, the tests for buffer were taken from node.js, so the ASCII tests
> should be reenabled in that case, and maybe pull out an encode/decode
> function (not exposed) where it uses Text[En|Dec]oder or the ascii specific
> line. Also should be added to 'isEncoding' if everything works out.

I think I need to be clear about the "ASCII" of which I'm referring to. ASCII which is a subset of UTF-8 will be handled properly when specifying the encoding as UTF-8. ISO 8859-1(Extended ASCII)(0-255) might be represented as more than one byte when the code point is (p > 127). If that is the proper behavior? I'm suggesting that when specifying ASCII as the encoding that each character should resolve to a single byte value. Why else would one need an ASCII encoding option when they could just use UTF-8?

That one liner will correctly convert a JS String to an Array of single byte values which can be passed to the Buffer(Uint8Array) constructor. It will not do the reverse. ;)

The """buffer.toString('ascii')""" method could do the reverse using the following line:

String.fromCharCode(...buffer); // <- rest parameter

The user of the buffer API would need to know that the string contains only ASCII(ISO 8859-1) before specifying ASCII(ISO 8859-1) as the encoding argument to expect consistent results between multiple conversions but that holds true for the other encodings as well, no?

I have no idea how node.js handles this or where to find their test code for comparison. I don't however see any other logical way of handling ASCII(ISO 8859-1). I'll send a pull request with the changes given the go ahead. If some tests need to be performed then please inform where they can be found.
Hmm, sounds good -- read up on some encoding issues to brush up. What you said makes sense, but would be worried about edge cases.

Our buffer tests are here -- 'ascii' would need to be added to the tests, and the original tests from node are here, which some ascii tests were removed or commented out.

https://github.com/mozilla/addon-sdk/blob/master/test/test-buffer.js
https://github.com/joyent/node/blob/master/test/simple/test-buffer.js

Node also has additional ascii-specific tests that could be useful:
https://github.com/joyent/node/blob/master/test/simple/test-buffer-ascii.js
https://bugzilla.mozilla.org/show_bug.cgi?id=1399562
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.