Last Comment Bug 764234 - String Encoding Decoding API
: String Encoding Decoding API
Status: RESOLVED FIXED
[LOE:M] [WebAPI:P2]
: dev-doc-complete, feature
Product: Core
Classification: Components
Component: DOM (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla18
Assigned To: Masatoshi Kimura [:emk]
:
Mentors:
http://wiki.whatwg.org/wiki/StringEnc...
Depends on: 764277 767930 779364 781046 782412 782721 795542 795544 801487 820373 825195
Blocks: 745095 768697 790855
  Show dependency treegraph
 
Reported: 2012-06-12 18:28 PDT by bsurender
Modified: 2013-08-21 10:06 PDT (History)
20 users (show)
mounir: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-


Attachments
String encoding decoding patch (12.75 KB, patch)
2012-06-12 18:28 PDT, bsurender
no flags Details | Diff | Review
Text decoder implementation & unit tests. (63.20 KB, patch)
2012-08-01 00:20 PDT, bsurender
no flags Details | Diff | Review
Decoder complete. Tests complete. Ready for review. (74.90 KB, patch)
2012-08-03 19:46 PDT, bsurender
dougt: review-
Details | Diff | Review
Bug 763234 - Implement Text Encoder per the http://wiki.whatwg.org/wiki/StringEncoding StringEncoding API. (34.88 KB, patch)
2012-08-10 15:46 PDT, bsurender
no flags Details | Diff | Review
Bug 763234 - Implement Text Encoder per the http://wiki.whatwg.org/wiki/StringEncoding StringEncoding API. (35.28 KB, patch)
2012-08-10 16:25 PDT, bsurender
no flags Details | Diff | Review
String API, encoder + decoder patch. (101.53 KB, patch)
2012-08-15 23:56 PDT, bsurender
dougt: review-
Details | Diff | Review
Encoder + decoder patch with improvements. (103.93 KB, patch)
2012-08-17 23:12 PDT, bsurender
VYV03354: review-
Details | Diff | Review
diff of latest review corrections to the entire api. (32.68 KB, patch)
2012-08-17 23:27 PDT, bsurender
no flags Details | Diff | Review
Fix fatal problems of the patch (1.61 KB, patch)
2012-08-18 07:36 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Review
{streaming: true} tests (1.55 KB, patch)
2012-08-18 07:39 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Review
Update bsurender's patch to tip (103.96 KB, patch)
2012-09-02 18:48 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Review
Implement StringEncoding API (140.54 KB, patch)
2012-09-02 18:49 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Review
Part 2: Import tests from StringEncoding shim (1.04 MB, patch)
2012-09-02 18:50 PDT, Masatoshi Kimura [:emk]
dougt: review+
Details | Diff | Review
Incorporate the imported test into our tree and updated some tests to the latest spec (9.98 KB, patch)
2012-09-02 18:51 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Review
Fix tests for multi-byte encodings (2.80 KB, patch)
2012-09-02 18:51 PDT, Masatoshi Kimura [:emk]
dougt: review+
Details | Diff | Review
Update bsurender's patch to tip (103.95 KB, patch)
2012-09-08 03:45 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Review
Implement StringEncoding API (140.94 KB, patch)
2012-09-08 03:47 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Review
Implement StringEncoding API (141.04 KB, patch)
2012-09-08 04:56 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Review
Implement StringEncoding API (97.42 KB, patch)
2012-09-12 15:37 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Review
Implement StringEncoding API (87.90 KB, patch)
2012-09-13 04:45 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Review
Interdiff from the previous patch (for ease of review) (38.89 KB, patch)
2012-09-13 04:45 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Review
Implement StringEncoding API (87.96 KB, patch)
2012-09-13 08:07 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Review
Implement StringEncoding API (87.98 KB, patch)
2012-09-16 03:15 PDT, Masatoshi Kimura [:emk]
dougt: review+
Details | Diff | Review
Interdiff (15.24 KB, patch)
2012-09-16 03:16 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Review
Incorporate the imported test into our tree and updated some tests to the latest spec (9.65 KB, patch)
2012-09-16 03:17 PDT, Masatoshi Kimura [:emk]
dougt: review+
Details | Diff | Review
Implement StringEncoding API (87.91 KB, patch)
2012-09-18 16:18 PDT, Masatoshi Kimura [:emk]
smontagu: review+
Details | Diff | Review
Part 3: Incorporate the imported test into our tree and updated some tests to the latest spec. r=doug (9.71 KB, patch)
2012-09-18 17:00 PDT, Masatoshi Kimura [:emk]
VYV03354: review+
Details | Diff | Review
Part 4: Fix tests for multi-byte encodings. r=doug (3.25 KB, patch)
2012-09-18 17:01 PDT, Masatoshi Kimura [:emk]
VYV03354: review+
Details | Diff | Review
Part 1: Implement StringEncoding API. r=doug, smontagu (88.12 KB, patch)
2012-09-27 08:39 PDT, Masatoshi Kimura [:emk]
VYV03354: review+
Details | Diff | Review
Part 1: Implement StringEncoding API. r=doug, smontagu (88.12 KB, patch)
2012-09-27 09:18 PDT, Masatoshi Kimura [:emk]
VYV03354: review+
Details | Diff | Review
Part 1: Implement StringEncoding API. r=doug, smontagu (88.08 KB, patch)
2012-09-27 10:18 PDT, Masatoshi Kimura [:emk]
VYV03354: review+
Details | Diff | Review

Description bsurender 2012-06-12 18:28:56 PDT
Created attachment 632490 [details] [diff] [review]
String encoding decoding patch

http://wiki.whatwg.org/wiki/StringEncoding

Currently failing to call the TextEncoder constructor from the JS .html test.
Comment 1 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-06-12 20:32:40 PDT
So the issue is that registering webidl bindings with the script namespace manager is as a no-op for names the script namespace manager doesn't already know about.

Peter, would it make sense to add entries in that code if there isn't an entry already?  Can we use one of the existing nametypes, or should we mint a new one just to be safe?
Comment 2 bsurender 2012-08-01 00:20:32 PDT
Created attachment 647863 [details] [diff] [review]
Text decoder implementation & unit tests.

BOM needs to be addressed.
Comment 3 Masatoshi Kimura [:emk] 2012-08-01 00:47:56 PDT
Comment on attachment 647863 [details] [diff] [review]
Text decoder implementation & unit tests.

> +#ifndef mozilla_dom_base_encodings_h__
Please do not include double underscore in the symbol. That's always reserved.

> +DOM_MSG_DEF(NS_ERROR_DOM_ENCODER_DECODER_ENCODING_FAILURE_ERR, "EncodingError.")
DOM4_MSG_DEF(EncodingError, "EncodingError.", NS_ERROR_DOM_ENCODER_DECODER_ENCODING_FAILURE_ERR)
and more informative messages would be preffered.

> +    xs->RegisterExceptionProvider(provider, NS_ERROR_MODULE_DOM_ENCODER_DECODER);
Is a new error module really required? You can just add messages under the DOM module.
Comment 4 bsurender 2012-08-01 14:50:38 PDT
(In reply to Masatoshi Kimura [:emk] from comment #3)
> Comment on attachment 647863 [details] [diff] [review]
> Text decoder implementation & unit tests.
> 
> > +#ifndef mozilla_dom_base_encodings_h__
> Please do not include double underscore in the symbol. That's always
> reserved.
> 
- Will do.

> > +DOM_MSG_DEF(NS_ERROR_DOM_ENCODER_DECODER_ENCODING_FAILURE_ERR, "EncodingError.")
> DOM4_MSG_DEF(EncodingError, "EncodingError.",
> NS_ERROR_DOM_ENCODER_DECODER_ENCODING_FAILURE_ERR)
> and more informative messages would be preffered.

- Will double check the API and then include a more informative message.
 
> > +    xs->RegisterExceptionProvider(provider, NS_ERROR_MODULE_DOM_ENCODER_DECODER);
> Is a new error module really required? You can just add messages under the
> DOM module.

- can do if the new error module is really not required.
Comment 5 Masatoshi Kimura [:emk] 2012-08-01 15:47:25 PDT
I don't see any reason to add a new module for each new API. SVG, XPath, DOMFile, and IndexedDB modules were present for a historical reason (see bug 730161 and bug 743888). I didn't remove them because I do not want to change nsresult values as much as possible.
Comment 6 bsurender 2012-08-02 23:50:45 PDT
(In reply to Masatoshi Kimura [:emk] from comment #5)
> I don't see any reason to add a new module for each new API. SVG, XPath,
> DOMFile, and IndexedDB modules were present for a historical reason (see bug
> 730161 and bug 743888). I didn't remove them because I do not want to change
> nsresult values as much as possible.

ok. Can do.
Thanks for the pointers.
Comment 7 bsurender 2012-08-03 19:46:05 PDT
Created attachment 648933 [details] [diff] [review]
Decoder complete. Tests complete. Ready for review.

Decoder complete. Tests complete.
Comment 8 Masatoshi Kimura [:emk] 2012-08-06 00:31:41 PDT
Comment on attachment 648933 [details] [diff] [review]
Decoder complete. Tests complete. Ready for review.

>+#define NS_ERROR_DOM_ENCODER_DECODER_ERR         NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_DOM,26)
> 
> /* XXX Should be JavaScript native errors */
> 
> #define NS_ERROR_TYPE_ERR                        NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_DOM,26)
The nsresult value is duplicated...

>+    { 26, "EncoderDecoderError" }
>+  const unsigned short      ENCODER_DECODER_ERR            = 26;
>+  EncoderDecoderError        = nsIDOMDOMException::ENCODER_DECODER_ERR,
Don't define a new code (numeric value). Error code was deprecated in favor of error name. No new error code should never be defined.
Comment 9 Masatoshi Kimura [:emk] 2012-08-06 00:34:01 PDT
> No new error code should never be defined.
No new error code should ever be defined.
Just use zero for new errors.
Comment 10 :Ms2ger 2012-08-06 01:10:38 PDT
Comment on attachment 648933 [details] [diff] [review]
Decoder complete. Tests complete. Ready for review.

Review of attachment 648933 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/Encodings.cpp
@@ +61,5 @@
> +                               PRUint32 aLength,
> +                               nsAString& aRetval,
> +                               PRUint32& aOffset)
> +{
> +  aRetval.AssignASCII("");

Truncate()

::: dom/base/TextDecoder.cpp
@@ +9,5 @@
> +#include "nsContentUtils.h"
> +
> +using namespace mozilla::dom;
> +
> +mzTextDecoder::mzTextDecoder(nsISupports* aGlobal,

What's the mz for?

@@ +50,5 @@
> +nsresult
> +mzTextDecoder::Decode(const TextDecodeOptions& aOption,
> +                      const Optional<ArrayBufferView>& aView,
> +                      nsAString& aOutDecodedString,
> +                      ErrorResult& aRv)

You either have an ErrorResult& aRv outparam or an nsresult return value, not both

@@ +54,5 @@
> +                      ErrorResult& aRv)
> +{
> +  if (!aView.WasPassed()) {
> +    mFatal ? aRv.Throw(NS_ERROR_DOM_ENCODER_DECODER_ERR)
> +      : aRv.Throw(NS_ERROR_FAILURE);

You throw if fatal is false? That doesn't sound right.

@@ +61,5 @@
> +
> +  if (aView.Value().Length() <= 0) {
> +    mFatal ? aRv.Throw(NS_ERROR_DOM_ENCODER_DECODER_ERR)
> +      : aRv.Throw(NS_ERROR_FAILURE);
> +    return NS_OK;

Neither of these cases should throw, afaict.

@@ +72,5 @@
> +                                  aView.Value().Length(),
> +                                  BOMEncoding,
> +                                  offset);
> +
> +  if (mEncodingProvided && BOMEncoding.Length() &&

!BOMEncoding.IsEmpty()

@@ +80,5 @@
> +    return NS_OK;
> +  }
> +
> +  dataAfterOffset += offset;
> +  mConverter->SetCharset(NS_ConvertUTF16toUTF8(mEncoding).get());

Not sure if this is safe

@@ +81,5 @@
> +  }
> +
> +  dataAfterOffset += offset;
> +  mConverter->SetCharset(NS_ConvertUTF16toUTF8(mEncoding).get());
> +  nsresult  rv =

One less space

@@ +89,5 @@
> +
> +  // Setting streaming for next call.
> +  mStreaming = aOption.stream;
> +  if (!mStreaming) {
> +    mEncoding.Assign(NS_LITERAL_STRING("utf-8"));

AssignLiteral; but it's not clear to me what streaming has to do with utf-8

@@ +99,5 @@
> +nsresult
> +mzTextDecoder::GetEncoding(nsAString& aEncoding, ErrorResult& aRv)
> +{
> +  aEncoding.Assign(mEncoding);
> +  return NS_OK;

void
TextDecoder::GetEncoding(nsAString& aEncoding)
{
  aEncoding.Assign(mEncoding);
}

@@ +104,5 @@
> +}
> +
> +mzTextDecoder::~mzTextDecoder()
> +{
> +  mEncodings = nsnull;

nullptr

::: dom/base/TextDecoder.h
@@ +13,5 @@
> +#include "jsapi.h"
> +#include "mozilla/ErrorResult.h"
> +#include "mozilla/dom/BindingUtils.h"
> +#include "mozilla/dom/TextDecoderBinding.h"
> +#include "mozilla/dom/TextDecoder.h"

You're including this file into itself?

@@ +22,5 @@
> +#include "nsAutoPtr.h"
> +#include "nsCOMPtr.h"
> +#include "nsCycleCollectionParticipant.h"
> +
> +using namespace mozilla::dom;

No using statements in headers.

@@ +55,5 @@
> +  {
> +    const Optional<nsAString> decodeString;
> +    const TextDecoderOptions decoderOption;
> +    return Constructor(aGlobal, decoderOption, decodeString, aRv);
> +  }

Is this used?

@@ +76,5 @@
> +  }
> +
> +  bool GetDecodeOptions()
> +  {
> +      return false;

Indentation

::: dom/base/domerr.msg
@@ +29,5 @@
>  DOM4_MSG_DEF(QuotaExceededError, "The quota has been exceeded.", NS_ERROR_DOM_QUOTA_EXCEEDED_ERR)
>  DOM4_MSG_DEF(TimeoutError, "The operation timed out.", NS_ERROR_DOM_TIMEOUT_ERR)
>  DOM4_MSG_DEF(InvalidNodeTypeError, "The supplied node is incorrect or has an incorrect ancestor for this operation.", NS_ERROR_DOM_INVALID_NODE_TYPE_ERR)
>  DOM4_MSG_DEF(DataCloneError, "The object could not be cloned.", NS_ERROR_DOM_DATA_CLONE_ERR)
> +DOM4_MSG_DEF(EncoderDecoderError, "Encoder or Decoder failed.", NS_ERROR_DOM_ENCODER_DECODER_ERR)

EncodingError, per spec

::: dom/base/test/Makefile.in
@@ +19,5 @@
>    test_gsp-standards.html \
>    test_gsp-quirks.html \
>    test_nondomexception.html \
>    test_screen_orientation.html \
> +	test_TextDecoder.html \

No tabs

::: dom/base/test/test_bug764234_helpers.js
@@ +47,5 @@
> +  if (eRv === null) {
> +    ok(true, "Passed " + msg);
> +  } else {
> +    ok(false, "Failed " + msg);
> +  }

ise(eRv, null, msg)

::: dom/webidl/WebIDL.mk
@@ +14,5 @@
>    PerformanceTiming.webidl \
>    XMLHttpRequest.webidl \
>    XMLHttpRequestEventTarget.webidl \
>    XMLHttpRequestUpload.webidl \
> +	TextDecoder.webidl \

No tab
Comment 11 Doug Turner (:dougt) 2012-08-06 16:48:12 PDT
Comment on attachment 648933 [details] [diff] [review]
Decoder complete. Tests complete. Ready for review.

Review of attachment 648933 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/Encodings.cpp
@@ +1,2 @@
> +/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> +/* vim: set sw=2 ts=2 et tw=78: */

do you really use vim?  if not, remove.

@@ +9,5 @@
> +#include "nsUnicharUtils.h"
> +
> +using namespace mozilla::dom;
> +
> +namespace {

why this unnamed namespace?

@@ +11,5 @@
> +using namespace mozilla::dom;
> +
> +namespace {
> +  Encodings* gEncodings = nsnull;
> +  const PRUint8 gUTF16Bytes[] = {0xFF, 0xFE};

a comment above gUTF16Bytes needed.  Explain that these are the headers to utf strings used to declare their type

@@ +13,5 @@
> +namespace {
> +  Encodings* gEncodings = nsnull;
> +  const PRUint8 gUTF16Bytes[] = {0xFF, 0xFE};
> +  const PRUint8 gUTF16BEBytes[] = {0xFE, 0xFF};
> +  const PRUint8 gUTF8Bytes[] = {0xEF, 0xBB, 0xBF};

if you only use these to setup your array, no need for them to be globals.  They should also be static too.

@@ +26,5 @@
> +  MOZ_ASSERT(!gEncodings);
> +  MOZ_ASSERT(NS_SUCCEEDED(PopulateEncodingsTable()),
> +             "Failed to populated hash table of labels and encodings.");
> +
> +  BOMEncodingTypes.AppendElement(BOMEncoding("utf-16", gUTF16Bytes));

So, you can heap allocate these....

Also... is there always a 1-1 mapping between the encoding and the encoding bytes?  In other words, will there ever be a utf-16 that points to two different byte prefixes?  If NO, then just use a hash table here that mapps the string to the encoding bytes.

@@ +34,5 @@
> +
> +Encodings::~Encodings()
> +{
> +  MOZ_ASSERT(gEncodings && gEncodings == this);
> +  mLabelsEncodings.Clear();

I do not think you need to actually call Clear() here.  It should be automatically cleared.

@@ +35,5 @@
> +Encodings::~Encodings()
> +{
> +  MOZ_ASSERT(gEncodings && gEncodings == this);
> +  mLabelsEncodings.Clear();
> +  gEncodings = nsnull;

I don't like how the Release of gEncodings and the nulling of gEcondings happens in two different places.  What you should do is ensure that these happen at the same time.

@@ +41,5 @@
> +
> +void
> +Encodings::ShutDown()
> +{
> +  NS_IF_RELEASE(gEncodings);

and set to null?

@@ +63,5 @@
> +                               PRUint32& aOffset)
> +{
> +  aRetval.AssignASCII("");
> +  aOffset = 0;
> +  if (aLength < 2) {

why <2 ?  Add a comment.

@@ +67,5 @@
> +  if (aLength < 2) {
> +    return;
> +  }
> +
> +  for (PRUint32 i = 0; i < BOMEncodingTypes.Length() - 1; i++) {

you could enumerate a hashtable in the same time.

@@ +92,5 @@
> +
> +nsresult
> +Encodings::FindEncodingForLabel(nsAString& aLabel, nsAString& aRvEncoding)
> +{
> +  aLabel.StripChar(PRUnichar(' '), 0);

add comment as to why you'd wnat to strip the white space here.

@@ +99,5 @@
> +  }
> +
> +  nsString* encoding;
> +  nsresult rv = mLabelsEncodings.Get(aLabel, &encoding);
> +  NS_ENSURE_SUCCESS(rv, rv);

if this fails, aRvEncoding will not be cleared.  Maybe you want to also truncate aRvEncoding when you enter this method?

@@ +100,5 @@
> +
> +  nsString* encoding;
> +  nsresult rv = mLabelsEncodings.Get(aLabel, &encoding);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +  NS_ENSURE_ARG_POINTER(encoding);

you don't need NS_ENSURE_ARG_POINTER.

@@ +134,5 @@
> +  private:
> +    LabelEncoding();
> +  };
> +
> +  const LabelEncoding labelsEncodings[] = {LabelEncoding("unicode-1-1-utf-8", "utf-8"),

Does list change alot?  Will it change based on locale?  Is this the sort of thing that should be in a file (jar) that we read-in on use?



Also, i do not think you need to declare each LabelEncoding if your using structs.

 typedef struct {
    char* a, * b;
  } LabelEncoding;                                                              
                                                                                
  const LabelEncoding labelsEncodings[] = {                                     
    {"", ""},                                                                   
    {"", ""},                                                                   
    {"", ""},                                                                   
    {"", ""},                                                                   
    {"", ""}                                                                    
  };                                                                            

For example..

@@ +357,5 @@
> +  mLabelsEncodings.Init(numLabels);
> +
> +  for (PRUint32 i = 0; i < numLabels; i++) {
> +    nsString key;
> +    key.AssignASCII(labelsEncodings[i].mLabel);

why not just keep the strings above in unicode (wide), then you don't need to convert here.

::: dom/base/Encodings.h
@@ +12,5 @@
> +
> +using namespace mozilla;
> +
> +namespace mozilla {
> +  namespace dom {

don't indent this line.

@@ +19,5 @@
> +{
> +public:
> +  NS_INLINE_DECL_THREADSAFE_REFCOUNTING(Encodings)
> +  static already_AddRefed<Encodings> GetOrCreate();
> +  void IdentifyBOMEncoding(const PRUint8* aData,

Does more than id's the encoding.  Maybe a better method name.

@@ +27,5 @@
> +  nsresult FindEncodingForLabel(nsAString& aLabel,
> +                                nsAString& aRvEncoding);
> +
> +  /* Called to free up Encoding instance */
> +  static NS_HIDDEN_(void) ShutDown();

why NS_HIDDEN?

Shutdown(), not ShutDown().

** worried about who calls this and when...

@@ +32,5 @@
> +
> +protected:
> +  nsClassHashtable<nsStringHashKey, nsString> mLabelsEncodings;
> +
> +  struct BOMEncoding {

This might be going away, but if not... this looks alot like a class, not a struct.

@@ +51,5 @@
> +  private:
> +    BOMEncoding();
> +  };
> +
> +  nsTArray<BOMEncoding> BOMEncodingTypes;

m prefix for members. mBOMEncodingTypes, please.

::: dom/base/TextDecoder.cpp
@@ +22,5 @@
> +
> +  nsresult rv;
> +  mConverter =
> +    do_CreateInstance("@mozilla.org/intl/scriptableunicodeconverter", &rv);
> +  if (NS_FAILED(rv)) {

Don't test the rv... just test mConverter.

@@ +23,5 @@
> +  nsresult rv;
> +  mConverter =
> +    do_CreateInstance("@mozilla.org/intl/scriptableunicodeconverter", &rv);
> +  if (NS_FAILED(rv)) {
> +    NS_WARNING("Failed to get converter.");

Should this be fatal?

@@ +44,5 @@
> +  }
> +
> +  mConverter->SetCharset(NS_ConvertUTF16toUTF8(mEncoding).get());
> +  mFatal = aFatal.fatal;
> +}

There are alot of ways for mzTextDecoder::mzTextDecoder to fail.  We are returning after some of them.  Maybe instead of doing all of this initalization in the constructor, we add an init() method?  Then in your "Constructor" method, you create mzTextDecoder.  Then call Init() on it.  If it fails, you do not return an object back to the dom.

::: dom/base/TextDecoder.h
@@ +13,5 @@
> +#include "jsapi.h"
> +#include "mozilla/ErrorResult.h"
> +#include "mozilla/dom/BindingUtils.h"
> +#include "mozilla/dom/TextDecoderBinding.h"
> +#include "mozilla/dom/TextDecoder.h"

recursion is fun!

@@ +32,5 @@
> +  NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS(mzTextDecoder)
> +
> +  // The WebIDL constructor.
> +  static already_AddRefed<mzTextDecoder>
> +    Constructor(nsISupports* aGlobal,

no indent on the Constructor line.

@@ +45,5 @@
> +    }
> +
> +    nsRefPtr<mzTextDecoder> txtDecoder = new mzTextDecoder(aGlobal,
> +                                                           aFatal,
> +                                                           aEncoding, aRv);

Here's where you'd call Init() on mzTextDecoder

@@ +63,5 @@
> +                const Optional<nsAString>& aEncoding,
> +                ErrorResult& aRv);
> +  virtual ~mzTextDecoder();
> +
> +  virtual JSObject* WrapObject(JSContext *aCx, JSObject *aScope,

virtual JSObject*
WrapObject(JSContext *aCx, JSObject *aScope,

@@ +69,5 @@
> +  {
> +    return TextDecoderBinding::Wrap(aCx, aScope, this, aTriedToWrap);
> +  }
> +
> +  nsISupports* GetParentObject()

nsISupports*
GetParentObject()

@@ +74,5 @@
> +  {
> +    return mGlobal;
> +  }
> +
> +  bool GetDecodeOptions()

similar - put the return type on its own line.

::: dom/interfaces/core/nsIDOMDOMException.idl
@@ +44,5 @@
>    const unsigned short      QUOTA_EXCEEDED_ERR             = 22;
>    const unsigned short      TIMEOUT_ERR                    = 23;
>    const unsigned short      INVALID_NODE_TYPE_ERR          = 24;
>    const unsigned short      DATA_CLONE_ERR                 = 25;
> +  const unsigned short      ENCODER_DECODER_ERR            = 26;

Bump the uuid of this interface.
Comment 12 Doug Turner (:dougt) 2012-08-06 16:51:32 PDT
maybe we don't care if you bump the uuid of nsIDOMDOMException.idl.
Comment 13 Jonas Sicking (:sicking) PTO Until July 5th 2012-08-06 16:57:32 PDT
FWIW, I'm not sure that a hash table for BOMs is worth it. There's only 3 different BOMs, and it's highly unlikely that that number will ever grow. So looping through an array of 3 is likely just as fast.
Comment 14 Masatoshi Kimura [:emk] 2012-08-06 16:58:57 PDT
> ::: dom/interfaces/core/nsIDOMDOMException.idl
> @@ +44,5 @@
> >    const unsigned short      QUOTA_EXCEEDED_ERR             = 22;
> >    const unsigned short      TIMEOUT_ERR                    = 23;
> >    const unsigned short      INVALID_NODE_TYPE_ERR          = 24;
> >    const unsigned short      DATA_CLONE_ERR                 = 25;
> > +  const unsigned short      ENCODER_DECODER_ERR            = 26;
> 
> Bump the uuid of this interface.

This constant should not be added in the first place. See comment #8.
Comment 15 Doug Turner (:dougt) 2012-08-06 17:57:15 PDT
regarding comment 13 - makes sense.  If that is the case, we really don't even need a nsTArray<BOMEncoding>.
Comment 16 Jonas Sicking (:sicking) PTO Until July 5th 2012-08-06 18:01:21 PDT
Quite possible.
Comment 17 Makoto Kato [:m_kato] 2012-08-09 00:07:43 PDT
nsIScriptableUnicode(Encoder|Decoder) cannot detect invalid sequence well.  If encoder found invalid sequence, it may replace with '?'.  (Current WHATWG spec throws exception).

So, use nsIUnicode(Encoder|Decoder) or nsIUnicode(Encoder|Decoder)Raw instead of.  These interfaces have an option for exception.
Comment 18 Masatoshi Kimura [:emk] 2012-08-09 05:02:09 PDT
(In reply to Makoto Kato from comment #17)
> (Current WHATWG spec throws exception).
Only if "fatal: true" is given to TextDecoderOptions.
> So, use nsIUnicode(Encoder|Decoder) or nsIUnicode(Encoder|Decoder)Raw
> instead of.  These interfaces have an option for exception.
Of course, the option will be required anyway.
Comment 19 Jonas Sicking (:sicking) PTO Until July 5th 2012-08-09 16:12:18 PDT
FWIW, the latest discussion in the mailing list is that during decoding we by default use the replacement character. But if the "fatal" flag is set it should throw an exception. I think we should implement with the assumption that this behavior will stick.
Comment 20 Andrew Sutherland [:asuth] 2012-08-09 16:29:50 PDT
2 questions from the perspective of the e-mail client which wants this:

1) Is the plan to implement TextEncoder once TextDecoder works?

2) Is there a way to get these new-fangled DOM bindings into an xpcshell global context somehow?  IndexedDB does it, but it uses old-school XPConnect bindings.  The mail library unit tests run in xpcshell, and I would be very happy to still be able to do that and use these native bindings.  It's okay if the xpcshell code needs to create a richer sandbox like an actual DOM window, especially if there is existing code someone can point me at...
Comment 21 bsurender 2012-08-09 16:48:39 PDT
(In reply to Andrew Sutherland (:asuth) from comment #20)
> 2 questions from the perspective of the e-mail client which wants this:
> 
> 1) Is the plan to implement TextEncoder once TextDecoder works?
> 
The TextEncoder implementation is currently underway. Should be up for review soon once I've resolved the nsIScriptableUnicodeConverter vs nsIUnicodeEncoder detail.

> 2) Is there a way to get these new-fangled DOM bindings into an xpcshell
> global context somehow?  IndexedDB does it, but it uses old-school XPConnect
> bindings.  The mail library unit tests run in xpcshell, and I would be very
> happy to still be able to do that and use these native bindings.  It's okay
> if the xpcshell code needs to create a richer sandbox like an actual DOM
> window, especially if there is existing code someone can point me at...

- have no idea about this. Sicking?
Comment 22 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-08-09 18:03:50 PDT
You should be able to create a docshell and set it up from xpcshell.  Or even just an nsWebBrowser?  The problem is it will expect all sorts of ui-like stuff to be hooked up.  :(

Note, however, bug 781615.
Comment 23 Jonas Sicking (:sicking) PTO Until July 5th 2012-08-10 12:35:44 PDT
Let's discuss exposing this to xpcshell in a separate bug. I don't actually know the answer but have to defer to people that know the new bindings better.

Likewise for exposing this API in jsm components. I.e. please file a separate bug if that's functionality that we want (which I suspect it is).
Comment 24 bsurender 2012-08-10 15:46:45 PDT
Created attachment 651023 [details] [diff] [review]
Bug 763234 - Implement Text Encoder per the http://wiki.whatwg.org/wiki/StringEncoding StringEncoding API.

This patch is not the text decoder patch. This patch contains only the text encoder and its tests. Minor changes to Encodings.cpp that is shared by both the text encoder and decoder has also been included.

Review comments wrt to Encodings and the text decoder will be applied to the text decoder patch.
Comment 25 bsurender 2012-08-10 16:25:50 PDT
Created attachment 651043 [details] [diff] [review]
Bug 763234 - Implement Text Encoder per the http://wiki.whatwg.org/wiki/StringEncoding StringEncoding API.


This patch is not the text decoder patch. This patch contains only the text encoder and its tests. Minor changes to Encodings.cpp that is shared by both the text encoder and decoder has also been included.

Review comments wrt to Encodings and the text decoder will be applied to the text decoder patch.
Comment 26 bsurender 2012-08-10 16:28:24 PDT
Comment on attachment 651043 [details] [diff] [review]
Bug 763234 - Implement Text Encoder per the http://wiki.whatwg.org/wiki/StringEncoding StringEncoding API.

This patch is not the text decoder patch. This patch contains only the text encoder and its tests. Minor changes to Encodings.cpp that is shared by both the text encoder and decoder has also been included.

Review comments wrt to Encodings and the text decoder will be applied to the text decoder patch.
Comment 27 Jonas Sicking (:sicking) PTO Until July 5th 2012-08-10 16:44:50 PDT
Comment on attachment 648933 [details] [diff] [review]
Decoder complete. Tests complete. Ready for review.

Review of attachment 648933 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/TextDecoder.cpp
@@ +54,5 @@
> +                      ErrorResult& aRv)
> +{
> +  if (!aView.WasPassed()) {
> +    mFatal ? aRv.Throw(NS_ERROR_DOM_ENCODER_DECODER_ERR)
> +      : aRv.Throw(NS_ERROR_FAILURE);

This is not at all what the spec says the fatal property means. It should affect whether encoding errors throw.
Comment 28 Masatoshi Kimura [:emk] 2012-08-10 20:41:08 PDT
Comment on attachment 651043 [details] [diff] [review]
Bug 763234 - Implement Text Encoder per the http://wiki.whatwg.org/wiki/StringEncoding StringEncoding API.

You are still using nsIScriptableUnicodeConverter.
I'm not sure whether the spec note
> NOTE: Because only UTF encodings are supported, and because of the algorithm used to convert a DOMString to a sequence of Unicode characters, no input can cause the encoding process to emit an encoder error.
is correct. What happens if the input contains an unpaired surrogate?
Comment 29 Jonas Sicking (:sicking) PTO Until July 5th 2012-08-10 21:12:57 PDT
Indeed, that is not correct. I'll raise it on the list.
Comment 30 Masatoshi Kimura [:emk] 2012-08-10 21:27:11 PDT
Ah, I found the answer myself.
The "convert a DOMString to a sequence of Unicode characters" algorithm will convert an unpaired surrogate to a replacement character silently *before* TextEncoder receive the input.
http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode
So TextEncoder will never see unpaired surrogates in the input as long as our WebIDL bindings implementation is correct.
Comment 31 :Ms2ger 2012-08-10 23:48:27 PDT
(In reply to Masatoshi Kimura [:emk] from comment #30)
> Ah, I found the answer myself.
> The "convert a DOMString to a sequence of Unicode characters" algorithm will
> convert an unpaired surrogate to a replacement character silently *before*
> TextEncoder receive the input.
> http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode
> So TextEncoder will never see unpaired surrogates in the input as long as
> our WebIDL bindings implementation is correct.

We don't implement this requirement, and I don't think we intend to.
Comment 32 Jonas Sicking (:sicking) PTO Until July 5th 2012-08-11 00:51:58 PDT
Note that no spec says that all DOMStrings have to be valid UTF16. However the algorithm in question here explicitly invokes an algorithm which converts the DOMString into a set of valid unicode characters before encoding to the target encoding.

So I believe we implement the WebIDL parts just fine here, since they simply say to pass the DOMString data through.
Comment 33 :Ms2ger 2012-08-11 01:30:49 PDT
Comment on attachment 651043 [details] [diff] [review]
Bug 763234 - Implement Text Encoder per the http://wiki.whatwg.org/wiki/StringEncoding StringEncoding API.

Review of attachment 651043 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/Encodings.cpp
@@ +94,3 @@
>  Encodings::FindEncodingForLabel(nsAString& aLabel, nsAString& aRvEncoding)
>  {
>    aLabel.StripChar(PRUnichar(' '), 0);

Needs to strip

U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR).

@@ +107,4 @@
>    NS_ENSURE_SUCCESS(rv, rv);
> +  if (!encoding) {
> +    return NS_ERROR_FAILURE;
> +  }

Does this belong in this patch?

::: dom/base/TextEncoder.cpp
@@ +25,5 @@
> +    aRv.Throw(NS_ERROR_FAILURE);
> +    return;
> +  }
> +
> +  if (!aEncoding.WasPassed() || aEncoding.Value().IsEmpty()) {

The IsEmpty check here doesn't match the spec.

However, I tend to think we should only support UTF-8...

@@ +31,5 @@
> +  } else {
> +    if (NS_FAILED(InvalidEncoding(aEncoding.Value()))) {
> +      aRv.Throw(NS_ERROR_DOM_ENCODER_DECODER_ERR);
> +      return;
> +    }

This will throw for " utf-8"

@@ +54,5 @@
> +                    ErrorResult& aRv)
> +{
> +  if (aString.IsEmpty()) {
> +    return JSVAL_NULL;
> +  }

Doesn't look correct

@@ +58,5 @@
> +  }
> +
> +  nsCString data;
> +  PRUint32 dataLength;
> +  nsresult rv = mConverter->ConvertFromUnicode(aString, data);

Just assign to aRv directly

@@ +78,5 @@
> +  return OBJECT_TO_JSVAL(Uint8Array::Create(aCx, this, dataLength, convertedData));
> +}
> +
> +nsresult
> +TextEncoder::GetEncoding(nsAString& aEncoding)

Return void.

@@ +90,5 @@
> +{ 
> +  if (aEncoding.LowerCaseEqualsASCII("utf-8") ||
> +      aEncoding.LowerCaseEqualsASCII("utf-16") ||
> +      aEncoding.LowerCaseEqualsASCII("utf-16be")) {
> +    return NS_OK;

Not worth the extra function

@@ +100,5 @@
> +{
> +  mEncodings = nullptr;
> +  mOptions = nullptr;
> +  mConverter = nullptr;
> +  mGlobal = nullptr;

No need for this

::: dom/base/TextEncoder.h
@@ +44,5 @@
> +
> +    nsRefPtr<TextEncoder> txtEncoder = new TextEncoder();
> +    txtEncoder->Init(aGlobal, aEncoding, aRv);
> +    if (aRv.Failed()) {
> +      txtEncoder = nullptr;

No need for this

@@ +47,5 @@
> +    if (aRv.Failed()) {
> +      txtEncoder = nullptr;
> +      return nullptr;
> +    } else {
> +      return  txtEncoder.forget();

No else-after-return; extra space

@@ +58,5 @@
> +    const Optional<nsAString> encodingString;
> +    return Constructor(aGlobal, encodingString, aRv);
> +  }
> +
> +  

Trailing whitespace

@@ +62,5 @@
> +  
> +  TextEncoder()
> +    : mStreaming(false)
> +  { }
> +  

.

@@ +66,5 @@
> +  
> +  virtual ~TextEncoder();
> +
> +  virtual JSObject*
> +  WrapObject(JSContext *aCx, JSObject *aScope, bool *aTriedToWrap)

* to the left

@@ +80,5 @@
> +
> +  bool
> +  GetEncodeOptions()
> +  {
> +      return false;

Indentation

@@ +94,5 @@
> +   * @param aOption    Streaming option. Initialised by default to false.
> +   *                   If the streaming option is false, then the encoding
> +   *                   will get reset to "utf-8". If set to true then the 
> +   *                   previous encoding is reused/continued.
> +   * @return JS::Value The Uint8Array is wrapped in a JS object that is then 

Trailing whitespace

@@ +104,5 @@
> +                  ErrorResult& aRv);
> +private:
> +  bool mStreaming;
> +  nsString mEncoding;
> +  nsAutoPtr<TextEncodeOptions> mOptions;

mOptions is unused

@@ +114,5 @@
> +   * Validates provided encoding and throws an exception if invalid encoding.
> +   * If no encoding is provided then mEncoding is default initialised to "utf-8".
> +   *
> +   * @param aGlobal      Global object
> +   * @param aEncoding    Optional encoding (case insensitive) provided. 

.

@@ +123,5 @@
> +  void Init(nsISupports* aGlobal, const Optional<nsAString>& aEncoding,
> +            ErrorResult& aRv);
> +
> +  /**
> +   * Checks if the incoming encoding is either a case insensitive "utf-8", 

.

::: dom/bindings/Bindings.conf
@@ +187,5 @@
>  
> +'TextEncoder': {
> +    'nativeType': 'TextEncoder',
> +    'headerFile': 'TextEncoder.h',
> +    'infallible': [ 'encoding' ],

Infallibility is now declared in the IDL file. Are you on an old tree or does this code not compile?

@@ +188,5 @@
> +'TextEncoder': {
> +    'nativeType': 'TextEncoder',
> +    'headerFile': 'TextEncoder.h',
> +    'infallible': [ 'encoding' ],
> +    'implicitJSContext': [ 'encode' ],

I don't think you need this; please check if you still get a cx without this line

::: dom/webidl/TextEncoder.webidl
@@ +3,5 @@
> + * License, v. 2.0. If a copy of the MPL was not distributed with this file,
> + * You can obtain one at http://mozilla.org/MPL/2.0/.
> + *
> + * The origin of this IDL file is
> + * http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html

It most definitely is not.

@@ +7,5 @@
> + * http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html
> + * http://dev.w3.org/2006/webapi/WebIDL
> + *
> + * Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C
> + * liability, trademark and document use rules apply.

And that's a lie too.

@@ +14,5 @@
> +[Constructor(optional DOMString encoding)]
> +interface TextEncoder {
> +  readonly attribute DOMString encoding;
> +  any encode(DOMString? aString,
> +             optional TextEncodeOptions aOptions);

You do want to return a Uint8Array here.

::: dom/webidl/WebIDL.mk
@@ +15,5 @@
>    XMLHttpRequest.webidl \
>    XMLHttpRequestEventTarget.webidl \
>    XMLHttpRequestUpload.webidl \
> +  TextDecoder.webidl \
> +  TextEncoder.webidl \

You should keep this list sorted
Comment 34 :Ms2ger 2012-08-11 01:47:34 PDT
(In reply to Jonas Sicking (:sicking) from comment #32)
> Note that no spec says that all DOMStrings have to be valid UTF16. However
> the algorithm in question here explicitly invokes an algorithm which
> converts the DOMString into a set of valid unicode characters before
> encoding to the target encoding.
> 
> So I believe we implement the WebIDL parts just fine here, since they simply
> say to pass the DOMString data through.

Oh, looks like that changed. Never mind me.
Comment 35 bsurender 2012-08-11 14:38:19 PDT
Comment on attachment 651043 [details] [diff] [review]
Bug 763234 - Implement Text Encoder per the http://wiki.whatwg.org/wiki/StringEncoding StringEncoding API.

Making review corrections.
Comment 36 bsurender 2012-08-11 18:03:41 PDT
Wrt Comment 11

> +  private:
> +    LabelEncoding();
> +  };
> +
> +  const LabelEncoding labelsEncodings[] = {LabelEncoding("unicode-1-1-utf-8", "utf-8"),

Does list change alot?  Will it change based on locale?  Is this the sort of thing that should be in a file (jar) that we read-in on use?

No it doesn't change a lot. It's pretty much a static list and should not be modified so I did it this way. Should I change it?
Comment 37 bsurender 2012-08-11 18:51:36 PDT
wrt comment 11

@@ +27,5 @@
> +  nsresult FindEncodingForLabel(nsAString& aLabel,
> +                                nsAString& aRvEncoding);
> +
> +  /* Called to free up Encoding instance */
> +  static NS_HIDDEN_(void) ShutDown();

why NS_HIDDEN?

Shutdown(), not ShutDown().

** worried about who calls this and when...


I need the Encodings instance alive for the entire application and released when the application shuts down. Therefore the Encodings::Shutdown() is called from within nsContentUtils.cpp only.

Encodings is used by both the TextEncoder and the TextDecoder.
Comment 38 bsurender 2012-08-11 18:58:28 PDT
(In reply to Doug Turner (:dougt) from comment #11)
> Comment on attachment 648933 [details] [diff] [review]
> Decoder complete. Tests complete. Ready for review.
> 
> Review of attachment 648933 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +23,5 @@
> > +  nsresult rv;
> > +  mConverter =
> > +    do_CreateInstance("@mozilla.org/intl/scriptableunicodeconverter", &rv);
> > +  if (NS_FAILED(rv)) {
> > +    NS_WARNING("Failed to get converter.");
> 
> Should this be fatal?
> 
Based on my interpretation of http://wiki.whatwg.org/wiki/StringEncoding under the TextDecoder constructor point number 3 

"Run the steps to get an encoding from Encoding, with label as label.

    If the steps result in failure, throw a "EncodingError" exception and terminate these steps.
    Otherwise, set the decoder object's internal encoding property to the returned encoding. "

It doesn't cover how to handle other errors but specifies when an "EncodingError" is to be thrown so I thought may be NS_ERROR_FAILURE instead.
Comment 39 bsurender 2012-08-11 19:01:26 PDT
Jonas

I had a review comment that wanted the "EncodingError" to be more descriptive and then another review comment that wanted me to stick to the spec. I think I should stick to the spec. Should the encoding error failure notification be more descriptive or does it just depend on people being familiar with the API? If it should be more descriptive should I modify the spec?
Comment 40 Jonas Sicking (:sicking) PTO Until July 5th 2012-08-11 20:10:45 PDT
We should just stick to the spec. We can always add more properties to the thrown exception in a followup bug.
Comment 41 Masatoshi Kimura [:emk] 2012-08-11 22:49:47 PDT
(In reply to bsurender from comment #39)
> Jonas
> 
> I had a review comment that wanted the "EncodingError" to be more
> descriptive and then another review comment that wanted me to stick to the
> spec. I think I should stick to the spec. Should the encoding error failure
> notification be more descriptive or does it just depend on people being
> familiar with the API? If it should be more descriptive should I modify the
> spec?

I meant to say more descriptive error messages, not an error name. Error name should be follow the spec (i.e. "EncodingError").

You can define error messages such as the following to make error messages more descriptive:
DOM4_MSG_DEF(EncodingError, "Encoding label not supported.", NS_ERROR_DOM_ENCODING_LABEL_NOT_SUPPORTED_ERR)
DOM4_MSG_DEF(EncodingError, "Encoding label must be utf-8, utf-16, or utf-16be.", NS_ERROR_DOM_ENCODING_LABEL_NOT_UTF_ERR)
DOM4_MSG_DEF(EncodingError, "Decoder failed to convert the codepoint.", NS_ERROR_DOM_DECODER_ERR)
Comment 42 bsurender 2012-08-11 23:42:51 PDT
(In reply to :Ms2ger from comment #33)
> Comment on attachment 651043 [details] [diff] [review]
> Bug 763234 - Implement Text Encoder per the
> http://wiki.whatwg.org/wiki/StringEncoding StringEncoding API.
> 
> Review of attachment 651043 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +7,5 @@
> > + * http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html
> > + * http://dev.w3.org/2006/webapi/WebIDL
> > + *
> > + * Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C
> > + * liability, trademark and document use rules apply.
> 
> And that's a lie too.
> 
Hmmmm all of the webidl files that I just randomly picked and saw seem to have this Copyright, unless I misunderstood what you are referring too. What are you referring to? If it's the copyright what do I do?

> @@ +14,5 @@
> > +[Constructor(optional DOMString encoding)]
> > +interface TextEncoder {
> > +  readonly attribute DOMString encoding;
> > +  any encode(DOMString? aString,
> > +             optional TextEncodeOptions aOptions);
> 
> You do want to return a Uint8Array here.

Return type Uint8Array currently not supported by the new DOM Bindings. Threw a compilation error along the lines of "bindings does not know return type Uint8Array".
Comment 43 :Ms2ger 2012-08-12 00:06:22 PDT
Ah, right, bug 767930. Either build on top of that patch or file a followup, please?
Comment 44 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-08-12 09:47:26 PDT
> So TextEncoder will never see unpaired surrogates in the input as long as our WebIDL
> bindings implementation is correct.

This has nothing to do with bindings.  The bindings don't promise that they're handing in a sequence of unicode characters (and indeed, there are lots of places in the platform that expect to take a DOMString and get code units, not characters).

We can certainly add a helper function that implements that algorithm, and then places that explicitly call for a sequence of unicode characters (like this spec) can use it.  But that would happen in the C++ callee, not in the WebIDL binding layer, because this is a prose requirement, not something expressed in IDL.

If you want to have a way to express this in IDL, that would be fine too, of course.  Just talk to heycam!

> We don't implement this requirement, and I don't think we intend to.

See above; it's not a "requirement"; it's just an algorithm that the post-binding-layer processing can use.
Comment 45 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-08-12 10:19:20 PDT
> Hmmmm all of the webidl files that I just randomly picked and saw seem to have this
> Copyright

The copyright depends on where the WebIDL comes from.  It's not something _we_ decide; it's an intrinsic property of where the spec is being developed.

For a W3C spec, the copyright is what you have there.  But you're not implementing a W3C spec, so I seriously doubt the spec text is copyrighted by the W3C.  Find out who the real copyright holder is, please.  If the spec doesn't say, that's a spec bug.
Comment 46 :Ms2ger 2012-08-12 13:01:21 PDT
Per <http://wiki.whatwg.org/wiki/WHATWG_Wiki:Copyrights> it's MIT-licensed.
Comment 47 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-08-12 13:42:08 PDT
OK, and the relevant part is "Copyright © 2006 The WHATWG Contributors"
Comment 48 Masatoshi Kimura [:emk] 2012-08-12 15:56:44 PDT
(In reply to Jonas Sicking (:sicking) from comment #32)
> Note that no spec says that all DOMStrings have to be valid UTF16. However
> the algorithm in question here explicitly invokes an algorithm which
> converts the DOMString into a set of valid unicode characters before
> encoding to the target encoding.
> 
> So I believe we implement the WebIDL parts just fine here, since they simply
> say to pass the DOMString data through.

(In reply to Boris Zbarsky (:bz) [In and out Aug 1 - 10, out Aug 11-20] from comment #44)
> > So TextEncoder will never see unpaired surrogates in the input as long as our WebIDL
> > bindings implementation is correct.
> 
> This has nothing to do with bindings.  The bindings don't promise that
> they're handing in a sequence of unicode characters (and indeed, there are
> lots of places in the platform that expect to take a DOMString and get code
> units, not characters).
> 
> We can certainly add a helper function that implements that algorithm, and
> then places that explicitly call for a sequence of unicode characters (like
> this spec) can use it.  But that would happen in the C++ callee, not in the
> WebIDL binding layer, because this is a prose requirement, not something
> expressed in IDL.
> 
> If you want to have a way to express this in IDL, that would be fine too, of
> course.  Just talk to heycam!
> 
> > We don't implement this requirement, and I don't think we intend to.
> 
> See above; it's not a "requirement"; it's just an algorithm that the
> post-binding-layer processing can use.

I see. Correction:
TextEncoder.encode explicitly calls "convert a DOMString to a sequence of Unicode characters".
http://wiki.whatwg.org/wiki/StringEncoding#TextEncoder
So it should never fail. In other words, it should be annotated with [infallible] and unpaired surrogates should be handled inside the function. (I don't know whether nsIScriptableUnicodeConverter handles the error following the spec.)
Comment 49 Masatoshi Kimura [:emk] 2012-08-12 21:00:43 PDT
Comment on attachment 648933 [details] [diff] [review]
Decoder complete. Tests complete. Ready for review.

> +                                           LabelEncoding("utf-16", "utf-16"),
> +                                           LabelEncoding("utf-16le", "utf-16"),
> +                                           LabelEncoding("utf-16be", "utf-16be")};
Our utf-16 converter does not comply with the Encoding Standard. You need to use utf-16le converter for the encoding label "utf-16", and add a comment explaining the workaround.
Of course, "encoding" property should return "utf-16" regardless of our internal name.
Comment 50 Masatoshi Kimura [:emk] 2012-08-12 21:14:06 PDT
> +                                           LabelEncoding("cseuckr", "euc-kr"),
> +                                           LabelEncoding("csksc56011987", "euc-kr"),
> +                                           LabelEncoding("euc-kr", "euc-kr"),
> +                                           LabelEncoding("iso-ir-149", "euc-kr"),
> +                                           LabelEncoding("korean", "euc-kr"),
> +                                           LabelEncoding("ks_c_5601-1987", "euc-kr"),
> +                                           LabelEncoding("ks_c_5601-1989", "euc-kr"),
> +                                           LabelEncoding("ksc5601", "euc-kr"),
> +                                           LabelEncoding("ksc_5601", "euc-kr"),
> +                                           LabelEncoding("windows-949", "euc-kr"),
Similarly, use "x-windows-949" for all "euc-kr" family.
Comment 51 bsurender 2012-08-13 12:02:15 PDT
(In reply to Boris Zbarsky (:bz) [In and out Aug 1 - 10, out Aug 11-20] from comment #44)
> > So TextEncoder will never see unpaired surrogates in the input as long as our WebIDL
> > bindings implementation is correct.
> 
> This has nothing to do with bindings.  The bindings don't promise that
> they're handing in a sequence of unicode characters (and indeed, there are
> lots of places in the platform that expect to take a DOMString and get code
> units, not characters).
> 
> We can certainly add a helper function that implements that algorithm, and
> then places that explicitly call for a sequence of unicode characters (like
> this spec) can use it.  But that would happen in the C++ callee, not in the
> WebIDL binding layer, because this is a prose requirement, not something
> expressed in IDL.
> 
> If you want to have a way to express this in IDL, that would be fine too, of
> course.  Just talk to heycam!
> 
> > We don't implement this requirement, and I don't think we intend to.
> 
> See above; it's not a "requirement"; it's just an algorithm that the
> post-binding-layer processing can use.

Doesn't unpaired surrogates in the incoming DOMString mean that the incoming UTF-16 code units are malformed which could result in erroneous UTF-8 & UTF-16BE conversions, in which case shouldn't the existing Encoder module throw an error/exception?
Comment 52 bsurender 2012-08-13 15:39:51 PDT
(In reply to bsurender from comment #51)
> (In reply to Boris Zbarsky (:bz) [In and out Aug 1 - 10, out Aug 11-20] from
> comment #44)
> > > So TextEncoder will never see unpaired surrogates in the input as long as our WebIDL
> > > bindings implementation is correct.
> > 
> > This has nothing to do with bindings.  The bindings don't promise that
> > they're handing in a sequence of unicode characters (and indeed, there are
> > lots of places in the platform that expect to take a DOMString and get code
> > units, not characters).
> > 
> > We can certainly add a helper function that implements that algorithm, and
> > then places that explicitly call for a sequence of unicode characters (like
> > this spec) can use it.  But that would happen in the C++ callee, not in the
> > WebIDL binding layer, because this is a prose requirement, not something
> > expressed in IDL.
> > 
> > If you want to have a way to express this in IDL, that would be fine too, of
> > course.  Just talk to heycam!
> > 
> > > We don't implement this requirement, and I don't think we intend to.
> > 
> > See above; it's not a "requirement"; it's just an algorithm that the
> > post-binding-layer processing can use.
> 
> Doesn't unpaired surrogates in the incoming DOMString mean that the incoming
> UTF-16 code units are malformed which could result in erroneous UTF-8 &
> UTF-16BE conversions, in which case shouldn't the existing Encoder module
> throw an error/exception?

Sorted. https://bugzilla.mozilla.org/show_bug.cgi?id=782412

Based on the spec. and http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html for unmatched surrogates for the utf-16 encoder handle as follows: "If code point is in the range 0xD800 to 0xDFFF, emit an encoder error."
Comment 53 bsurender 2012-08-14 12:59:24 PDT
(In reply to Doug Turner (:dougt) from comment #11)
> Comment on attachment 648933 [details] [diff] [review]
> Decoder complete. Tests complete. Ready for review.
> 
> Review of attachment 648933 [details] [diff] [review]:
> -----------------------------------------------------------------
> @@ +357,5 @@
> > +  mLabelsEncodings.Init(numLabels);
> > +
> > +  for (PRUint32 i = 0; i < numLabels; i++) {
> > +    nsString key;
> > +    key.AssignASCII(labelsEncodings[i].mLabel);
> 
> why not just keep the strings above in unicode (wide), then you don't need
> to convert here.
> 

Cannot do this because the nsString("some chars here"); constructor is protected in nsTString.h
Comment 54 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-08-14 15:26:47 PDT
> Cannot do this because the nsString("some chars here"); constructor is protected in
> nsTString.h

You could make mLabel a NS_ConvertUTF8toUTF16.  And make labelsEncodings static, which is probably desirable anyway.

On a side node, AssignASCII("") is probably better written as Truncate().
Comment 55 bsurender 2012-08-14 23:35:41 PDT
(In reply to Boris Zbarsky (:bz) [In and out Aug 1 - 10, out Aug 11-20] from comment #54)
> > Cannot do this because the nsString("some chars here"); constructor is protected in
> > nsTString.h
> 
> You could make mLabel a NS_ConvertUTF8toUTF16.  And make labelsEncodings
> static, which is probably desirable anyway.
> 
> On a side node, AssignASCII("") is probably better written as Truncate().

hmmmm agreed with the static but if I make labelsEncodings static I get the "WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/bbsurender/mz_src_string_api/xpcom/base/nsTraceRefcntImpl.cpp, line 141".

I created a seperate LabelEncoding class and removed the former struct, I NS_ADDREF_THIS then tried MOZ_COUNT_CTOR and MOZ_COUNT_DTOR thinking it was a non-xpcom object and still get the warning because the labelsEncodings array is static.

How do I go about this?
Comment 56 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-08-14 23:40:10 PDT
Ah, your problem is then the static strings you end up with?

Your options are basically to either only store const char* data in the static thing or to do dynamic allocation/deallocation of the big array.  Or I guess to leave it as-is, and have a ton of code running to copy all those strings every single time through that function.
Comment 57 bsurender 2012-08-15 14:48:08 PDT
(In reply to Boris Zbarsky (:bz) [In and out Aug 1 - 10, out Aug 11-20] from comment #54)
> > Cannot do this because the nsString("some chars here"); constructor is protected in
> > nsTString.h
> 
> You could make mLabel a NS_ConvertUTF8toUTF16.  And make labelsEncodings
> static, which is probably desirable anyway.
> 
> On a side node, AssignASCII("") is probably better written as Truncate().

- NS_LITERAL_STRING can also be used. 
- Creating static nsStrings is prohibited. I get the XPCOM warning and jonas has confirmed this if I have described it correctly. So I will leave the labelsEncodings array as is.
Comment 58 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-08-15 21:03:56 PDT
> - NS_LITERAL_STRING can also be used. 

Yes, but Truncate() is the right way to say "make this an empty string".

As you pointed out, the big array is in an init function that's only called once.  Given that, the non-static array is fine.
Comment 59 bsurender 2012-08-15 21:06:59 PDT
(In reply to Boris Zbarsky (:bz) [In and out Aug 1 - 10, out Aug 11-20] from comment #58)
> > - NS_LITERAL_STRING can also be used. 
> 
> Yes, but Truncate() is the right way to say "make this an empty string".

Agreed and rectified in the code. Patch on it's way! :)

> 
> As you pointed out, the big array is in an init function that's only called
> once.  Given that, the non-static array is fine.
Comment 60 bsurender 2012-08-15 23:56:39 PDT
Created attachment 652340 [details] [diff] [review]
String API, encoder + decoder patch.

The decoder depends on https://bugzilla.mozilla.org/show_bug.cgi?id=782721
for the replacement character feature. I will have to look into the recommended workaround and its efficiency.

The encoder depends on bug https://bugzilla.mozilla.org/show_bug.cgi?id=782412

Additionally, bug 764234 (this bug), comment 49, "Our utf-16 converter does not comply with the Encoding Standard. You need to use utf-16le converter for the encoding label "utf-16", and add a comment explaining the workaround.
Of course, "encoding" property should return "utf-16" regardless of our internal name." Also UTF-16LE is used instead of the UTF-16 based encoder.

This work around has been used and commented on in the code.
Comment 61 Doug Turner (:dougt) 2012-08-16 12:47:58 PDT
Comment on attachment 652340 [details] [diff] [review]
String API, encoder + decoder patch.

Review of attachment 652340 [details] [diff] [review]:
-----------------------------------------------------------------

Didn't review tests or encoder.  Many of the issues found in the encoder will apply to the decoder as well.

::: dom/base/DOMError.cpp
@@ +41,5 @@
>  DOMError::CreateForDOMExceptionCode(PRUint16 aDOMExceptionCode)
>  {
>    // All of these codes (and yes, some are skipped) come from the spec.
>    static const NameMap kNames[] = {
> +    {  0, "EncodingoError" },

The spec being referred to is here:  http://www.w3.org/TR/dom/#error-types-0

It doesn't defined EncodingError.  If I were to guess, if it was added, it would not be in position 0.

Why did you add it here?

::: dom/base/Encodings.cpp
@@ +2,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +// Local Includes

Useless comment

@@ +25,5 @@
> +void
> +Encodings::Shutdown()
> +{
> +  NS_IF_RELEASE(gEncodings);
> +  gEncodings = nullptr;

No need to set nullptr if gEnconding is already null.

if (gEnconding) {
  NS_RELEASE(gEnconding)
  gEnconding = nullptr;
}

::: dom/base/Encodings.h
@@ +17,5 @@
> +
> +class Encodings
> +{
> +public:
> +  NS_INLINE_DECL_THREADSAFE_REFCOUNTING(Encodings)

Does this really need to be threadsafe?  If so, there is a race on GetOrCreate().

::: dom/base/TextDecoder.cpp
@@ +2,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +// Local Includes

useless comment

@@ +21,5 @@
> +
> +  nsresult rv;
> +  mConverter =
> +    do_CreateInstance("@mozilla.org/intl/scriptableunicodeconverter", &rv);
> +  if (!mConverter) {

you're not using rv.  Instead:

mConverter = do_CreateInstance("@mozilla.org/intl/scriptableunicodeconverter");

Also, move this down to where you use it (above SetCharset())

@@ +34,5 @@
> +    mEncodingProvided = true;
> +  }
> +
> +  mEncodings = Encodings::GetOrCreate();
> +  MOZ_ASSERT(mEncodings);

Assert or throw?

@@ +56,5 @@
> +    aRv = NS_ERROR_FAILURE;
> +    return;
> +  }
> +
> +  nsString BOMEncoding;

variable names don't start with upper case.  try bomEncodining

@@ +58,5 @@
> +  }
> +
> +  nsString BOMEncoding;
> +  PRUint32 offset;
> +  PRUint8* dataAfterOffset = aView.Value().Data();

please move this to where you use it.  Above the increment by offset.

@@ +74,5 @@
> +  dataAfterOffset += offset;
> +  mConverter->SetCharset(NS_ConvertUTF16toUTF8(mEncoding).get());
> +  nsresult rv = mConverter->ConvertFromByteArray(dataAfterOffset,
> +                                                 aView.Value().Length() - offset,
> +                                                 aOutDecodedString);

If this fails, why don't we throw and return immediately?

@@ +75,5 @@
> +  mConverter->SetCharset(NS_ConvertUTF16toUTF8(mEncoding).get());
> +  nsresult rv = mConverter->ConvertFromByteArray(dataAfterOffset,
> +                                                 aView.Value().Length() - offset,
> +                                                 aOutDecodedString);
> +  // Setting streaming for next call.

I don't understand what this next part is about -- if we aren't streaming, why do you set things back to utf8?

@@ +99,5 @@
> +{
> +  mEncodings = nullptr;
> +  mOptions = nullptr;
> +  mConverter = nullptr;
> +  mGlobal = nullptr;

You do not need to null any of these.

@@ +109,5 @@
> +NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION(TextDecoder)
> +  NS_INTERFACE_MAP_ENTRY(nsISupports)
> +NS_INTERFACE_MAP_END
> +
> +NS_IMPL_CYCLE_COLLECTION_WRAPPERCACHE_1(TextDecoder, mGlobal)

Don't you have to also add the members to CC?

nsAutoPtr<TextDecodeOptions> mOptions;
nsRefPtr<Encodings> mEncodings;
nsCOMPtr<nsIScriptableUnicodeConverter> mConverter;
nsCOMPtr<nsISupports> mGlobal;

::: dom/base/TextDecoder.h
@@ +16,5 @@
> +#include "nsIScriptableUConv.h"
> +#include "nsString.h"
> +
> +// Local Includes
> +// Helper Classes

useless includes

@@ +45,5 @@
> +
> +    nsRefPtr<TextDecoder> txtDecoder = new TextDecoder();
> +    txtDecoder->Init(aGlobal, aFatal, aEncoding, aRv);
> +    if (aRv.Failed()) {
> +      txtDecoder = nullptr;

no need to null txtDecoder.  It will fall out of scope and die.

@@ +48,5 @@
> +    if (aRv.Failed()) {
> +      txtDecoder = nullptr;
> +      return nullptr;
> +    }
> +    return  txtDecoder.forget();

Two many spaces after the return.

@@ +67,5 @@
> +  virtual ~TextDecoder();
> +
> +  virtual JSObject*
> +  WrapObject(JSContext* aCx, JSObject* aScope,
> +             bool* aTriedToWrap)

you probably can move bool up on the same line.

@@ +90,5 @@
> +  /**
> +   * Decodes incoming byte stream of characters in charset indicated by
> +   * encoding.
> +   *
> +   * The encoding algorithm state is reset if mStreaming is not set. The default

this mentions mStreaming, but it isn't clear how that is set.

@@ +99,5 @@
> +   * NS_ERROR_DOM_ENCODING_NOT_SUPPORTED_ERR. Else the decoder will return a
> +   * decoded string with replacement character(s) for unidentified character(s).
> +   * Currently replacement characters are not supported by nsIUnicodeDecoder and
> +   * nsBasicDecoder,
> +   * follow up bug at https://bugzilla.mozilla.org/show_bug.cgi?id=782721

so, what happens?

@@ +103,5 @@
> +   * follow up bug at https://bugzilla.mozilla.org/show_bug.cgi?id=782721
> +   *
> +   * The modification to the decode function is a one liner ALONG THE LINES OF
> +   * mConverter->SetInputErrorBehavior(mConverter->kOnError_Replace,
> +   *                                   ?????, 0xfffd);

Is this important information in the header?  Add the workaround somewhere, then document it there.

::: dom/base/TextEncoder.cpp
@@ +19,5 @@
> +  SetIsDOMBinding();
> +
> +  nsresult rv;
> +  mConverter =
> +    do_CreateInstance("@mozilla.org/intl/scriptableunicodeconverter", &rv);

Same thing - not using rv

::: dom/interfaces/core/nsIDOMDOMException.idl
@@ +19,5 @@
>  
>  [scriptable, uuid(5bd766d3-57a9-4833-995d-dbe21da29595)]
>  interface nsIDOMDOMException : nsISupports
>  {
> +  const unsigned short      ENCODING_NOT_SUPPORTED_ERR     = 0;

same thing - this probably isn't the right value for the encoding error.

::: xpcom/base/nsError.h
@@ +806,5 @@
>   * dom/src/base/domerr.msg */
>  
>  /* Standard DOM error codes: http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html */
> +#define NS_ERROR_DOM_ENCODING_NOT_SUPPORTED_ERR \
> +  NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_DOM, 0)

not the right value.
Comment 62 Blake Kaplan (:mrbkap) (please use needinfo!) 2012-08-16 13:42:58 PDT
(In reply to Doug Turner (:dougt) from comment #61)
> > +  NS_IF_RELEASE(gEncodings);
> > +  gEncodings = nullptr;
> 
> No need to set nullptr if gEnconding is already null.
> 
> if (gEnconding) {
>   NS_RELEASE(gEnconding)
>   gEnconding = nullptr;
> }

For what it's worth, NS_IF_RELEASE and NS_RELEASE both already null out their argument, so this could simply be:

NS_IF_RELEASE(gEncoding);

and that's it.
Comment 63 bsurender 2012-08-16 15:17:33 PDT
(In reply to Doug Turner (:dougt) from comment #61)
> Comment on attachment 652340 [details] [diff] [review]
> String API, encoder + decoder patch.
> 
> Review of attachment 652340 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Didn't review tests or encoder.  Many of the issues found in the encoder
> will apply to the decoder as well.
> 
> ::: dom/base/DOMError.cpp
> @@ +41,5 @@
> >  DOMError::CreateForDOMExceptionCode(PRUint16 aDOMExceptionCode)
> >  {
> >    // All of these codes (and yes, some are skipped) come from the spec.
> >    static const NameMap kNames[] = {
> > +    {  0, "EncodingoError" },
> 
> The spec being referred to is here:  http://www.w3.org/TR/dom/#error-types-0
> 
> It doesn't defined EncodingError.  If I were to guess, if it was added, it
> would not be in position 0.
> 
> Why did you add it here?
>
> 
> ::: dom/interfaces/core/nsIDOMDOMException.idl
> @@ +19,5 @@
> >  
> >  [scriptable, uuid(5bd766d3-57a9-4833-995d-dbe21da29595)]
> >  interface nsIDOMDOMException : nsISupports
> >  {
> > +  const unsigned short      ENCODING_NOT_SUPPORTED_ERR     = 0;
> 
> same thing - this probably isn't the right value for the encoding error.
> 
> ::: xpcom/base/nsError.h
> @@ +806,5 @@
> >   * dom/src/base/domerr.msg */
> >  
> >  /* Standard DOM error codes: http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html */
> > +#define NS_ERROR_DOM_ENCODING_NOT_SUPPORTED_ERR \
> > +  NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_DOM, 0)
> 
> not the right value.

For the above comments I followed these numeric values based on comment 8

"
>+    { 26, "EncoderDecoderError" }
>+  const unsigned short      ENCODER_DECODER_ERR            = 26;
>+  EncoderDecoderError        = nsIDOMDOMException::ENCODER_DECODER_ERR,
Don't define a new code (numeric value). Error code was deprecated in favor of error name. No new error code should never be defined."

and comment 9 

"
> No new error code should never be defined.
No new error code should ever be defined.
Just use zero for new errors."

If I do not define a new number (formerly 26, then 0) and have the exception implemented as in the existing patch I get the following warning.

"WARNING: Huh, someone is throwing non-DOM errors using the DOM module!: file /home/bbsurender/mz_src_string_api/dom/base/nsDOMException.cpp, line 102"

I've reverted the number back to 26. What is the right way to proceed with this?
Comment 64 bsurender 2012-08-16 15:38:20 PDT
(In reply to Doug Turner (:dougt) from comment #61)
> Comment on attachment 652340 [details] [diff] [review]
> String API, encoder + decoder patch.
> 
> Review of attachment 652340 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Didn't review tests or encoder.  Many of the issues found in the encoder
> will apply to the decoder as well.
> 
> ::: dom/base/Encodings.cpp
> @@ +2,5 @@
> > +/* This Source Code Form is subject to the terms of the Mozilla Public
> > + * License, v. 2.0. If a copy of the MPL was not distributed with this
> > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> > +
> > +// Local Includes
> 
> Useless comment

Followed existing http://mxr.mozilla.org/mozilla-central/source/embedding/browser/webBrowser/nsWebBrowser.cpp#6 and many more examples. Redundant style I suppose. Will remove. Np.

> ::: dom/base/TextDecoder.cpp
> @@ +2,5 @@
> > +/* This Source Code Form is subject to the terms of the Mozilla Public
> > + * License, v. 2.0. If a copy of the MPL was not distributed with this
> > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> > +
> > +// Local Includes
> 
> useless comment
> 
Same as above.

> @@ +21,5 @@
> > +
> > +  nsresult rv;
> > +  mConverter =
> > +    do_CreateInstance("@mozilla.org/intl/scriptableunicodeconverter", &rv);
> > +  if (!mConverter) {
> 
> you're not using rv.  Instead:
> 

Hmmm some code files check rv for do_CreateInstance and some don't. 

E.g. at https://bugzilla.mozilla.org/show_bug.cgi?id=715041

"
@@ +9110,5 @@
> +    mLocalIdleTimer = nsnull;
> +    mLocalIdleTimerIndex = -1;
> +    
> +    if (mIdleTimer) {
> +      mIdleTimer = do_CreateInstance("@mozilla.org/timer;1", &rv);

Please return rv if it fails."

Also comment 11 in this bug

"
::: dom/base/TextDecoder.cpp
@@ +22,5 @@
> +
> +  nsresult rv;
> +  mConverter =
> +    do_CreateInstance("@mozilla.org/intl/scriptableunicodeconverter", &rv);
> +  if (NS_FAILED(rv)) {

Don't test the rv... just test mConverter."


I did an NS_ENSURE_SUCCESS(rv, rv) before but removed that and left this in. So if rv can be passed as an arg, do I NS_ENSURE_SUCCESS it or do I just check the return ptr value. I.e. just check mConverter in this case. Which style is more secure? 

> 
> @@ +56,5 @@
> > +    aRv = NS_ERROR_FAILURE;
> > +    return;
> > +  }
> > +
> > +  nsString BOMEncoding;
> 
> variable names don't start with upper case.  try bomEncodining
> 

BOM for Byte Order Mark so thought was a special case. But I could lowercase it. Np.
Comment 65 bsurender 2012-08-16 16:52:47 PDT
(In reply to Doug Turner (:dougt) from comment #61)
> Comment on attachment 652340 [details] [diff] [review]
> String API, encoder + decoder patch.
> 
> Review of attachment 652340 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Didn't review tests or encoder.  Many of the issues found in the encoder
> will apply to the decoder as well.
> 
@@ +74,5 @@
> +  dataAfterOffset += offset;
> +  mConverter->SetCharset(NS_ConvertUTF16toUTF8(mEncoding).get());
> +  nsresult rv = mConverter->ConvertFromByteArray(dataAfterOffset,
> +                                                 aView.Value().Length() - offset,
> +                                                 aOutDecodedString);

If this fails, why don't we throw and return immediately?

In the JS there is an array of multiple byte streams. This particular byte stream (call it byte stream 0) may have failed but the next byte stream (call it byte stream 1) for the same decoder object needs to be decoded. If the streaming flag was set for byte stream zero then before I return failure I need to set the decoder mStreaming flag to that value so that byte stream 1 continues to use the same encoding that was set for byte stream zero. If the mStreaming flag was not set when decode was called for byte stream 0 then before returning failure I need to reset the mStreaming flag to false and reset the encoding to the default of utf-8. This way byte stream 1 will be decoded to uft-8 code units. This is also done this way so that just in case the decoder error is ignored by the JS function and the next byte stream for the same decoder object is passed anyway then the correct streaming behaviour should still be reflected.

from http://wiki.whatwg.org/wiki/StringEncoding

"If the internal streaming flag of the decoder object is not set, then reset the encoding algorithm state to the default values for encoding encoding. Otherwise, the encoding algorithm state is re-used from the previous call to decode on this object."

Please let me know if I have misunderstood.
Comment 66 bsurender 2012-08-16 16:55:53 PDT
(In reply to Doug Turner (:dougt) from comment #61)
> Comment on attachment 652340 [details] [diff] [review]
> String API, encoder + decoder patch.
> 
> Review of attachment 652340 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Didn't review tests or encoder.  Many of the issues found in the encoder
> will apply to the decoder as well.
> 
> @@ +75,5 @@
> > +  mConverter->SetCharset(NS_ConvertUTF16toUTF8(mEncoding).get());
> > +  nsresult rv = mConverter->ConvertFromByteArray(dataAfterOffset,
> > +                                                 aView.Value().Length() - offset,
> > +                                                 aOutDecodedString);
> > +  // Setting streaming for next call.
> 
> I don't understand what this next part is about -- if we aren't streaming,
> why do you set things back to utf8?

From http://wiki.whatwg.org/wiki/StringEncoding

"The decode method runs the following steps:

    If the internal streaming flag of the decoder object is not set, then reset the encoding algorithm state to the default values for encoding encoding. Otherwise, the encoding algorithm state is re-used from the previous call to decode on this object.

    If the options parameter is specified and the stream option is true, then the internal streaming flag is set; otherwise the internal streaming flag is cleared."
Comment 67 Masatoshi Kimura [:emk] 2012-08-16 18:09:07 PDT
(In reply to bsurender from comment #63)
Sorry for my vague comments. I meant to say change
>+  EncoderDecoderError        = nsIDOMDOMException::ENCODER_DECODER_ERR,
to
>+  EncoderDecoderError        = 0,
.

I'll clarify other points in comment #8:
>+    { 26, "EncoderDecoderError" }
Do not add any members to kNames (not 26, not zero, not whatever other values).

>+  const unsigned short      ENCODER_DECODER_ERR            = 26;
Do not add this constant.

> > > +  NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_DOM, 0)
> > 
> > not the right value.
> 
> For the above comments I followed these numeric values based on comment 8
You need to use a unique value for each nsresults. The low word of nsresult has nothing to do with error code value which I mentioned in comment #9.
Comment 68 Masatoshi Kimura [:emk] 2012-08-16 18:24:42 PDT
Comment on attachment 652340 [details] [diff] [review]
String API, encoder + decoder patch.

>+     {NS_LITERAL_STRING("x-windows-949"), NS_LITERAL_STRING("x-windows-949")},
Change back the left value to "euc-kr" (but leave the right value to "x-windows-949").
>+     {NS_LITERAL_STRING("utf-16le"), NS_LITERAL_STRING("utf-16")},
Change "utf-16" to "utf-16le" here, too.
>+   * Label "utf-16le" is instead used for encoding "utf-16".
Could you rewrite TextDecoder::GetEncoding and TextEncoder::GetEncoding so that they will return "utf-16" for "utf-16le" and return "euc-kr" for "x-windows-949"? We should not expose our implementation details to public API.
Comment 69 Masatoshi Kimura [:emk] 2012-08-16 18:40:25 PDT
(In reply to Masatoshi Kimura [:emk] from comment #67)
> >+  EncoderDecoderError        = 0,
Ugh,
>+  EncodingError        = 0,
.
Comment 70 Masatoshi Kimura [:emk] 2012-08-16 19:18:45 PDT
(In reply to bsurender from comment #66)
> (In reply to Doug Turner (:dougt) from comment #61)
> > I don't understand what this next part is about -- if we aren't streaming,
> > why do you set things back to utf8?
> 
> From http://wiki.whatwg.org/wiki/StringEncoding
> 
> "The decode method runs the following steps:
> 
>     If the internal streaming flag of the decoder object is not set, then
> reset the encoding algorithm state to the default values for encoding
> encoding. Otherwise, the encoding algorithm state is re-used from the
> previous call to decode on this object.
> 
>     If the options parameter is specified and the stream option is true,
> then the internal streaming flag is set; otherwise the internal streaming
> flag is cleared."

You misunderstand the concept "encoding algorithm state". It means, for example, "utf-8 code point", "utf-8 bytes seen", "utf-8 bytes needed", and "utf-8 lower boundary" for utf-8 encoding,
http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#utf-8
It doesn't mean options of TextEncodeer/TextDecoder object.

And the "encoding algorithm state" is managed by nsIUnicodeDecoder/nsIUnicodeEncoder objects. So you don't have to manipulate it directly. All you have to do is calling Reset() method of nsIUnicodeDecoder/nsIUnicodeEncoder.
Comment 71 Masatoshi Kimura [:emk] 2012-08-16 20:53:30 PDT
(In reply to bsurender from comment #64)
> (In reply to Doug Turner (:dougt) from comment #61)
> > +  nsresult rv;
> > +  mConverter =
> > +    do_CreateInstance("@mozilla.org/intl/scriptableunicodeconverter", &rv);
> > +  if (NS_FAILED(rv)) {
> 
> Don't test the rv... just test mConverter."
> 
> I did an NS_ENSURE_SUCCESS(rv, rv) before but removed that and left this in.
> So if rv can be passed as an arg, do I NS_ENSURE_SUCCESS it or do I just
> check the return ptr value. I.e. just check mConverter in this case. Which
> style is more secure? 

mConverter = do_CreateInstance("@mozilla.org/intl/scriptableunicodeconverter");
if (!mConverter) {
  return NS_ERROR_UNEXPECTED;
}

And do not use nsIScriptableUnicodeConverter. You don't have to wait until bug 782721 and bug 782412 has been resolved. nsIScriptableUnicodeConverter will have the same problems anyway unless those bugs are fixed because nsIScriptableUnicodeConverter uses nsIUnicodeDecoder/nsIUnicodeEncoder internally.
Comment 72 bsurender 2012-08-17 10:51:54 PDT
(In reply to Doug Turner (:dougt) from comment #61)
> Comment on attachment 652340 [details] [diff] [review]
> String API, encoder + decoder patch.
> 
> Review of attachment 652340 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Didn't review tests or encoder.  Many of the issues found in the encoder
> will apply to the decoder as well.
> 
> 
> @@ +109,5 @@
> > +NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION(TextDecoder)
> > +  NS_INTERFACE_MAP_ENTRY(nsISupports)
> > +NS_INTERFACE_MAP_END
> > +
> > +NS_IMPL_CYCLE_COLLECTION_WRAPPERCACHE_1(TextDecoder, mGlobal)
> 
> Don't you have to also add the members to CC?
> 
> nsAutoPtr<TextDecodeOptions> mOptions;
> nsRefPtr<Encodings> mEncodings;
> nsCOMPtr<nsIScriptableUnicodeConverter> mConverter;
> nsCOMPtr<nsISupports> mGlobal;

mOptions doesn't need to be added to CC b/c mOptions is not a JS object but just a struct with a bool in it. So there are no circular object references here. It also doesn't follow the cases in khuey's guide to cc http://blog.kylehuey.com/post/27564411715/cycle-collection.

mEncodings doesn't need to be added to CC b/c the nsClassHash only contains nsStrings and the likes. No xpcom objects or cyclic references.

Yes mConverter needs to be added to the cycle collection. I've added it.
Comment 73 :Ms2ger 2012-08-17 11:06:39 PDT
As I said before, mOptions is even unused. Can you please address all outstanding comments before uploading a new patch?
Comment 74 bsurender 2012-08-17 11:08:55 PDT
(In reply to :Ms2ger from comment #73)
> As I said before, mOptions is even unused. Can you please address all
> outstanding comments before uploading a new patch?

Ah i removed it before but it must have crept back some how during my merges and updates. thanks.
Comment 75 bsurender 2012-08-17 11:12:25 PDT
(In reply to :Ms2ger from comment #73)
> As I said before, mOptions is even unused. Can you please address all
> outstanding comments before uploading a new patch?


Sorry I think I meant qfold.
Comment 76 Masatoshi Kimura [:emk] 2012-08-17 16:14:27 PDT
(In reply to bsurender from comment #72)
> Yes mConverter needs to be added to the cycle collection. I've added it.
No, replace it with nsIUnicodeEncoder and nsIUnicodeDecoder.
Comment 77 bsurender 2012-08-17 23:09:34 PDT
(In reply to Masatoshi Kimura [:emk] from comment #76)
> (In reply to bsurender from comment #72)
> > Yes mConverter needs to be added to the cycle collection. I've added it.
> No, replace it with nsIUnicodeEncoder and nsIUnicodeDecoder.

Please let me know why I cannot use nsIScriptableUnicodeConverter.

I think I can use it because nsIScriptableUnicodeConverter does exactly what I need to do. It sets up the nsIUnicode(Encoder | Decoder) via the nsICharsetConverterManager in the same manner that I would have to. It (nsIScriptabeUnicodeConverter) then very few, if at all any additional layers except for error checking before it calls nsUnicode(Encoder | Decoder) Convert() functions. To me it seems like a straightforward interface to nsIUnicode(Encoder | Decoder) Convert(). Even if I were to write my own direct calls to nsIUnicode(Encoder | Decoder) I would have to write the same code. I think that this code is reusable and that also means less versions of interfaces to nsIUnicode(Encoder | Decoder) to maintain. I only use the functions I need and make sure that any related variables and functions are setup correctly. 

What do you think?

As for the decoder replacement character bug and encoder surrogate pairs bugs, I've also  stated in those bugs that they occur in nsIUnicode(Encoder | Decoder). So clearly it will have to be fixed in nsIUnicode(Encoder | Decoder) not nsIScriptableUnicodeConverter. This may be what you were trying to say and if so I agree with you.
Comment 78 bsurender 2012-08-17 23:12:28 PDT
Created attachment 653026 [details] [diff] [review]
Encoder + decoder patch with improvements.
Comment 79 bsurender 2012-08-17 23:25:45 PDT
Comment on attachment 653026 [details] [diff] [review]
Encoder + decoder patch with improvements.

Doug I'm uploading the entire patch with corrections to everyone's comments. I am uploading an entire patch because you said that you hadn't reviewed the Encoder and the tests because of review comments for the decoder which the encoder also is similar to.

I will also be uploading another patch with just the diff's from the your last review and everyone else's comments too for everyone's convenience.

I think I have agreeably replied to all the comments explaining why they cannot be corrected and have corrected those that can be.
Comment 80 bsurender 2012-08-17 23:27:45 PDT
Created attachment 653027 [details] [diff] [review]
diff of latest review corrections to the entire api.

Just a diff of the latest review comment fixes.
Comment 81 Masatoshi Kimura [:emk] 2012-08-18 00:09:49 PDT
(In reply to bsurender from comment #77)
> (In reply to Masatoshi Kimura [:emk] from comment #76)
> > (In reply to bsurender from comment #72)
> > > Yes mConverter needs to be added to the cycle collection. I've added it.
> > No, replace it with nsIUnicodeEncoder and nsIUnicodeDecoder.
> 
> Please let me know why I cannot use nsIScriptableUnicodeConverter.

How do you implement "streaming" flag with nsIScriptableUnicodeConverter?
It is completely wrong to change the encoding to "utf-8" when "streaming" is false (see comment #70, Doug and Ms2ger will agree).

> Even if I were to
> write my own direct calls to nsIUnicode(Encoder | Decoder) I would have to
> write the same code.
It is very few as you said yourself. It may be even less than what you have to write for nsIScriptableUnicodeConverter.
nsIScritableConverter is a wrapper to give an interface to JavaScript (as the name implies). It's the very purpose of StringEncoding API. It would not have been created if StringEncoding API was present from the start.
It is not more convenient than direct nsIUnicodeEncoder/Decoder for C++ caller. Indeed, no C++ callers use nsIScriptableUnicodeConverter in the tree.
https://mxr.mozilla.org/mozilla-central/search?string=nsIScriptableUnicodeConverter

> As for the decoder replacement character bug and encoder surrogate pairs
> bugs, I've also  stated in those bugs that they occur in nsIUnicode(Encoder
> | Decoder). So clearly it will have to be fixed in nsIUnicode(Encoder |
> Decoder) not nsIScriptableUnicodeConverter. This may be what you were trying
> to say and if so I agree with you.
You cannot workaround bugs unless using nsIUnicodeEncoder/Decoder directly because nsIScriptableUnicodeConverter do not tell the error (that's aready said in comment #17).
Comment 82 :Ms2ger 2012-08-18 01:51:58 PDT
Comment on attachment 653026 [details] [diff] [review]
Encoder + decoder patch with improvements.

Review of attachment 653026 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/src/nsContentUtils.cpp
@@ +119,5 @@
>  #include "nsParserConstants.h"
>  #include "nsIWebNavigation.h"
>  #include "nsTextFragment.h"
>  #include "mozilla/Selection.h"
> +#include "Encodings.h"

"mozilla/dom/Encodings.h"

::: dom/base/Encodings.cpp
@@ +4,5 @@
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "Encodings.h"
> +
> +using namespace mozilla::dom;

Wrap the file in

namespace mozilla {
namespace dom {

} // namespace dom
} // namespace mozilla

@@ +12,5 @@
> +Encodings::Encodings()
> +{
> +  MOZ_ASSERT(!gEncodings);
> +  MOZ_ASSERT(NS_SUCCEEDED(PopulateEncodingsTable()),
> +             "Failed to populate hash table of labels and encodings.");

This will not run PopulateEncodingsTable() in opt builds.

@@ +38,5 @@
> +  return gEncodings;
> +}
> +
> +nsresult
> +Encodings::IdentifyDataOffset(const PRUint8* aData,

You don't check the return value of this function, so please make it return void.

@@ +41,5 @@
> +nsresult
> +Encodings::IdentifyDataOffset(const PRUint8* aData,
> +                              const PRUint32 aLength,
> +                              nsAString& aRetval,
> +                              PRUint32& aOffset)

PRUint32*, or return it directly.

@@ +80,5 @@
> +  return NS_OK;
> +}
> +
> +nsresult
> +Encodings::FindEncodingForLabel(const nsAString& aLabel, nsAString& aRvEncoding)

I would return a boolean (whether the encoding was found)

@@ +93,5 @@
> +  ToLowerCase(label);
> +
> +  // Truncating to clear aRvEncoding incase of failure.
> +  nsString* encoding;
> +  nsresult rv = mLabelsEncodings.Get(label, &encoding);

I'm pretty sure this Get() returns a boolean. However, I think you could just use

nsString* encoding = mLabelsEncodings.Get(label);

@@ +333,5 @@
> +    mLabelsEncodings.Put(labelsEncodings[i].GetLabel(),
> +                         new nsString(labelsEncodings[i].GetEncoding()));
> +  }
> +
> +  return NS_OK;

Just make this function return void.

::: dom/base/Encodings.h
@@ +5,5 @@
> +
> +#ifndef mozilla_dom_base_encodings_h_
> +#define mozilla_dom_base_encodings_h_
> +
> +#include "LabelEncoding.h"

No need to include this in the header; include it in Encodings.cpp as "mozilla/dom/LabelEncoding.h"

@@ +9,5 @@
> +#include "LabelEncoding.h"
> +#include "nsClassHashtable.h"
> +#include "nsString.h"
> +
> +using namespace mozilla;

Remove this

::: dom/base/LabelEncoding.h
@@ +10,5 @@
> +
> +namespace mozilla {
> +namespace dom {
> +
> +class LabelEncoding {

{ on the next line

@@ +16,5 @@
> +  LabelEncoding(const nsAString& aLabel, const nsAString& aEncoding)
> +    : mLabel(aLabel), mEncoding(aEncoding)
> +  {
> +    mLabel.StripChars(" \t\n\f\r");
> +    mEncoding.StripChars(" \t\n\f\r");

You don't need to do this, I don't think.

@@ +27,5 @@
> +  {
> +    MOZ_COUNT_DTOR(LabelEncoding);
> +  }
> +
> +  nsString

const nsString&, I think

@@ +33,5 @@
> +  {
> +    return mLabel;
> +  }
> +
> +  nsString

Same here

@@ +42,5 @@
> +
> +private:
> +  nsString mLabel;
> +  nsString mEncoding;
> +  LabelEncoding() {}

MOZ_DELETE, I think

::: dom/base/TextDecoder.cpp
@@ +3,5 @@
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "nsContentUtils.h"
> +#include "TextDecoder.h"

Include "mozilla/dom/TextDecoder.h" *first*

@@ +17,5 @@
> +  MOZ_ASSERT(aGlobal);
> +  mGlobal = aGlobal;
> +  SetIsDOMBinding();
> +
> +  if (!aEncoding.WasPassed() || aEncoding.Value().IsEmpty()) {

I think I told you that the aEncoding.Value().IsEmpty() check is wrong. This needs a test.

@@ +30,5 @@
> +
> +  mEncodings = Encodings::GetOrCreate();
> +  if (!mEncodings) {
> +    aRv.Throw(NS_ERROR_OUT_OF_MEMORY);
> +  }

Assert that it isn't null.

@@ +56,5 @@
> +{
> +  if (!aView.WasPassed() || aView.Value().Length() <= 0) {
> +    aRv = NS_ERROR_FAILURE;
> +    return;
> +  }

Wrong.

@@ +95,5 @@
> +void
> +TextDecoder::GetEncoding(nsAString& aEncoding)
> +{
> +  if (mOriginalEncoding.EqualsASCII("utf-16") &&
> +      mEncoding.EqualsASCII("utf-16le")) {

EqualsLiteral

::: dom/base/TextDecoder.h
@@ +19,5 @@
> +#include "nsAutoPtr.h"
> +#include "nsCOMPtr.h"
> +#include "nsCycleCollectionParticipant.h"
> +
> +using namespace mozilla::dom;

Don't do this

::: dom/base/TextEncoder.cpp
@@ +16,5 @@
> +  MOZ_ASSERT(aGlobal);
> +  mGlobal = aGlobal;
> +  SetIsDOMBinding();
> +
> +  if (!aEncoding.WasPassed() || aEncoding.Value().IsEmpty()) {

Wrong...

::: dom/webidl/TextDecoder.webidl
@@ +7,5 @@
> + * http://wiki.whatwg.org/wiki/StringEncoding
> + * http://dev.w3.org/2006/webapi/WebIDL
> + *
> + * Copyright © 2012 W3C® (MIT), All Rights Reserved. W3C
> + * liability, trademark and document use rules apply.

That's a lie.

@@ +16,5 @@
> +interface TextDecoder {
> +  [GetterInfallible]
> +  readonly attribute DOMString encoding;
> +  DOMString decode(optional TextDecodeOptions decodeOptions,
> +                   optional ArrayBufferView aView);

... Why did you swap around the arguments? The view needs to come first.
Comment 83 Masatoshi Kimura [:emk] 2012-08-18 04:05:41 PDT
Comment on attachment 653026 [details] [diff] [review]
Encoder + decoder patch with improvements.

This patch failed to build using vc11 with a lot of C2552 errors:
h:/m/mozilla-central/dom/base/Encodings.cpp(209) : error C2552: 'labelsEncodings
' : 初期化子リストによる個別の識別子の初期化に誤りがあります。
        'mozilla::dom::LabelEncoding' : private または protected データ メンバー
を含む型はアグリゲートではありません。

Sorry for the localized message. possible translation is:

h:/m/mozilla-central/dom/base/Encodings.cpp(209) : error C2552: 'labelsEncodings' : non-aggregates cannot be initialized with initializer list.
        'mozilla::dom::LabelEncoding' : Types with private or protected data members are not aggregate.
Comment 84 :Ms2ger 2012-08-18 04:08:28 PDT
(In reply to Masatoshi Kimura [:emk] from comment #83)
> Comment on attachment 653026 [details] [diff] [review]
> Encoder + decoder patch with improvements.
> 
> This patch failed to build using vc11 with a lot of C2552 errors:
> h:/m/mozilla-central/dom/base/Encodings.cpp(209) : error C2552:
> 'labelsEncodings
> ' : 初期化子リストによる個別の識別子の初期化に誤りがあります。
>         'mozilla::dom::LabelEncoding' : private または protected データ メンバー
> を含む型はアグリゲートではありません。
> 
> Sorry for the localized message. possible translation is:
> 
> h:/m/mozilla-central/dom/base/Encodings.cpp(209) : error C2552:
> 'labelsEncodings' : non-aggregates cannot be initialized with initializer
> list.
>         'mozilla::dom::LabelEncoding' : Types with private or protected data
> members are not aggregate.

I was surprised this would build indeed; to initialize the table in Encodings::PopulateEncodingsTable(), LabelEncoding needs to be a POD (<http://en.wikipedia.org/wiki/Plain_old_data_structure>) class.
Comment 85 Masatoshi Kimura [:emk] 2012-08-18 06:05:21 PDT
(In reply to :Ms2ger from comment #82)
> @@ +16,5 @@
> > +interface TextDecoder {
> > +  [GetterInfallible]
> > +  readonly attribute DOMString encoding;
> > +  DOMString decode(optional TextDecodeOptions decodeOptions,
> > +                   optional ArrayBufferView aView);
> 
> ... Why did you swap around the arguments? The view needs to come first.
Because WebIDL.py complained? (Of course, even so, argument order should not be arranged)
There's a spec bug because TextDecoder's syntax is not valid per WebIDL spec.
http://dev.w3.org/2006/webapi/WebIDL/#dfn-optional-argument-default-value
> If an optional argument has a default value, then all preceding optional
> arguments MUST also have a default value.
>
> If the type of an argument is a dictionary type, and this argument is either
> the final argument or is followed by an optional argument, then the argument
> is implicitly considered optional and MUST be specified as optional. Optional
> dictionary type arguments are always considered to have a default value of an
> empty dictionary. 
decodeOptions is an optional dictionary type arguments and considered to have a default value. So the preceding optional argument (i.e. aView) must have a default value.
The same can be said for the TextDecoder constructor.
Comment 86 Masatoshi Kimura [:emk] 2012-08-18 07:36:55 PDT
Created attachment 653058 [details] [diff] [review]
Fix fatal problems of the patch

I couldn't even run the test without this patch.
Comment 87 Masatoshi Kimura [:emk] 2012-08-18 07:39:41 PDT
Created attachment 653059 [details] [diff] [review]
{streaming: true} tests

This test fails with your patch. Your patch does not implement the "streaming" option correctly.
Comment 88 Masatoshi Kimura [:emk] 2012-08-18 07:41:06 PDT
Comment on attachment 653026 [details] [diff] [review]
Encoder + decoder patch with improvements.

>+   * Resets the charset converter so it may be recycled for a completely
>+   * different and urelated buffer of data.
>+   *
>+   * @return NS_OK
>+   */
>+  void ResetEncoder();
>+
>+  /**
>+   * Resets the charset converter so it may be recycled for a completely
>+   * different and urelated buffer of data.
>+   *
>+   * @return NS_OK
>+   */
>+  void ResetDecoder();
You need to bump an iid whenever you change an XPCOM interface.
But I will not accept this ad-hoc change as a de facto intl peer. JS callers should not be able to reset the converter randomly. It will break the object integrity and make the interface harder to use.
In the first place, this change is useless to implement "streaming" option (see the above attachment).
Comment 89 Masatoshi Kimura [:emk] 2012-08-18 09:18:16 PDT
Comment on attachment 653026 [details] [diff] [review]
Encoder + decoder patch with improvements.

>+function CheckEncodingErrorException(msg)
>+{
>+  if (eRv && eRv.name === EncodingErrorName) {
>+    ok(true, "Passed " + msg);
>+  } else {
>+    ok(false, "Failed " + msg);
>+  }
>+  eRv = null;
>+}
>+
>+function CheckNSErrorException(msg)
>+{
>+  if (eRv && eRv.name === NSErrorFailureName) {
>+    ok(true, "Passed " + msg);
>+  } else {
>+    ok(false, "Failed " + msg);
>+  }
>+  eRv = null;
>+}
>+
>+function CheckERvNull(msg)
>+{
>+  if (eRv === null) {
>+    ok(true, "Passed " + msg);
>+  } else {
>+    ok(false, "Failed " + msg);
>+  }
>+}
>+
>+function CheckNullABVException()
>+{
>+  if (eRv.name === "TypeError") {
>+    ok(true, "Passed decoder test, null ABV view passed to decode().");
>+  } else {
>+    ok(false, "Failed decoder test, null ABV view passed to decode().");
>+  }
>+}
Just embed is() in the tests rather than defining bogus functions. Those junks hide expected values and actual values, so they will make the test log uninformative. "PASSED" or "FAIL" will be automatically appended so you don't have to care about them.
For example, replace
>  CheckNSErrorException("constructor fatal option set to false test part 1.");
with
  is(eRv && eRv.name, NSErrorFailureName, "constructor fatal option set to false test part 1.");

>+function ValidateNonEmptyOutText(msg)
>+{
>+  if (outTextLength && outTextIndexLength) {
>+    ok(true, "Passed " + msg);
>+  } else {
>+    ok(false, "Failed " + msg);
>+  }
>+}
>+
>+function ValidateEmptyOutText(msg)
>+{
>+  if (!outTextLength && !outTextIndexLength) {
>+    ok(true, "Passed " + msg);
>+  } else {
>+    ok(false, "Failed " + msg);
>+  }
>+}

>+function NotifySimpleTest(result, msg)
>+{
>+  if (result) {
>+    ok(true, "Passed " + msg);
>+  } else {
>+    ok(false, "Failed " + msg);
>+  }
>+}
Ditto. Use ok() directly.
Comment 90 :Ms2ger 2012-08-18 11:36:28 PDT
Oh, and actually I'd like those tests to use testharness.js :)
Comment 91 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-08-19 00:41:40 PDT
Do we have documentation how to write testharness.js tests?
Comment 92 :Ms2ger 2012-08-19 01:00:12 PDT
There's documentation in the file itself: <http://mxr.mozilla.org/mozilla-central/source/dom/imptests/testharness.js#9>, and a tutorial by Robin Berjon: <http://darobin.github.com/test-harness-tutorial/docs/using-testharness.html>.
Comment 93 bsurender 2012-08-19 12:04:39 PDT
(In reply to Masatoshi Kimura [:emk] from comment #81)
> (In reply to bsurender from comment #77)
> > (In reply to Masatoshi Kimura [:emk] from comment #76)
> > > (In reply to bsurender from comment #72)
> > > > Yes mConverter needs to be added to the cycle collection. I've added it.
> > > No, replace it with nsIUnicodeEncoder and nsIUnicodeDecoder.
> > 
> > Please let me know why I cannot use nsIScriptableUnicodeConverter.
> 
> How do you implement "streaming" flag with nsIScriptableUnicodeConverter?
> It is completely wrong to change the encoding to "utf-8" when "streaming" is
> false (see comment #70, Doug and Ms2ger will agree).
> 

Can someone please clarify 

"
decode

    The decode method runs the following steps:

        If the internal streaming flag of the decoder object is not set, then reset the encoding algorithm state to the default values for encoding encoding." 

May be Boris or Jonas? Especially the part "then reset the encoding algorithm state to the default values for encoding encoding."
Comment 94 bsurender 2012-08-19 12:14:36 PDT
Comment on attachment 653026 [details] [diff] [review]
Encoder + decoder patch with improvements.

Fixes tbd.
Comment 95 Masatoshi Kimura [:emk] 2012-08-19 15:40:35 PDT
(In reply to bsurender from comment #93)
> Can someone please clarify 
I already said in comment #70. Did you decide not to trust my comment? Great.
If "encoding algorithm state" means resetting the encoding to utf-8, TextDecoder con not use nothing but utf-8. Because TextDecoder constructor will "Initialize the internal encoding algorithm state to the default values for encoding encoding." in step 5.
http://wiki.whatwg.org/wiki/StringEncoding#TextDecoder

> May be Boris or Jonas? Especially the part "then reset the encoding
> algorithm state to the default values for encoding encoding."
Everyone except you will agree.
Comment 96 bsurender 2012-08-19 17:04:12 PDT
 (In reply to :Ms2ger from comment #82)
> Comment on attachment 653026 [details] [diff] [review]
> Encoder + decoder patch with improvements.
> 
> Review of attachment 653026 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +80,5 @@
> > +  return NS_OK;
> > +}
> > +
> > +nsresult
> > +Encodings::FindEncodingForLabel(const nsAString& aLabel, nsAString& aRvEncoding)
> 
> I would return a boolean (whether the encoding was found)

I thought about this when I was coding it but then thought that the more gecko way to do it would be to return an NS_ERROR_FAILURE if an encoding was not found and NS_OK if it was found at which point aRvEncoding would be populated with a valid encoding.
Comment 97 bsurender 2012-08-19 17:10:34 PDT
(In reply to bsurender from comment #96)
>  (In reply to :Ms2ger from comment #82)
> > Comment on attachment 653026 [details] [diff] [review]
> > Encoder + decoder patch with improvements.
> > 
> > Review of attachment 653026 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > @@ +80,5 @@
> > > +  return NS_OK;
> > > +}
> > > +
> > > +nsresult
> > > +Encodings::FindEncodingForLabel(const nsAString& aLabel, nsAString& aRvEncoding)
> > 
> > I would return a boolean (whether the encoding was found)
> 
> I thought about this when I was coding it but then thought that the more
> gecko way to do it would be to return an NS_ERROR_FAILURE if an encoding was
> not found and NS_OK if it was found at which point aRvEncoding would be
> populated with a valid encoding.

Actually Get() does return a bool. I think I'll follow a similar pattern.
Comment 98 Masatoshi Kimura [:emk] 2012-08-19 17:55:48 PDT
(In reply to bsurender from comment #96)
> I thought about this when I was coding it but then thought that the more
> gecko way to do it would be to return an NS_ERROR_FAILURE if an encoding was
> not found and NS_OK if it was found at which point aRvEncoding would be
> populated with a valid encoding.
No, not at all. At most, it is only a convention required by XPCOM. You don't have to follow the "coonvention" for internal functions.
Comment 99 Masatoshi Kimura [:emk] 2012-08-19 17:58:57 PDT
Comment on attachment 653026 [details] [diff] [review]
Encoder + decoder patch with improvements.

>+        eRv = NSErrorFailureName;
Do not mix a failure indicator with a real exception.
>+  } catch (e) {
>+    eRv = e;
Do not pass around exceptions using a global variable.
Honestly, I can't read your cryptic tests.
Comment 100 Masatoshi Kimura [:emk] 2012-08-19 18:14:50 PDT
>+Encodings::FindEncodingForLabel(const nsAString& aLabel, nsAString& aRvEncoding)
Do not name non-nsresult arguments such as aRvFoo.
"rv" is only for nsresult/ErrorResult.
Comment 101 bsurender 2012-08-19 19:18:08 PDT
(In reply to :Ms2ger from comment #82)
> Comment on attachment 653026 [details] [diff] [review]
> Encoder + decoder patch with improvements.
> 
> Review of attachment 653026 [details] [diff] [review]:
> -----------------------------------------------------------------
> > ::: dom/base/TextDecoder.cpp
> @@ +3,5 @@
> > + * License, v. 2.0. If a copy of the MPL was not distributed with this
> > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> > +
> > +#include "nsContentUtils.h"
> > +#include "TextDecoder.h"
> 
> Include "mozilla/dom/TextDecoder.h" *first*
> 
Ah yes. I did think about this but then decided to follow the alphabetised version.
Comment 102 bsurender 2012-08-19 19:28:24 PDT
(In reply to :Ms2ger from comment #82)
> Comment on attachment 653026 [details] [diff] [review]
> Encoder + decoder patch with improvements.
> 
> Review of attachment 653026 [details] [diff] [review]:
> -----------------------------------------------------------------
> @@ +17,5 @@
> > +  MOZ_ASSERT(aGlobal);
> > +  mGlobal = aGlobal;
> > +  SetIsDOMBinding();
> > +
> > +  if (!aEncoding.WasPassed() || aEncoding.Value().IsEmpty()) {
> 
> I think I told you that the aEncoding.Value().IsEmpty() check is wrong. This
> needs a test.

Why is this check wrong or why won't this check work?
Comment 103 bsurender 2012-08-19 19:35:16 PDT
(In reply to :Ms2ger from comment #82)
> Comment on attachment 653026 [details] [diff] [review]
> Encoder + decoder patch with improvements.
> 
> Review of attachment 653026 [details] [diff] [review]:
> ----------------------------------------------------------------- 
> @@ +30,5 @@
> > +
> > +  mEncodings = Encodings::GetOrCreate();
> > +  if (!mEncodings) {
> > +    aRv.Throw(NS_ERROR_OUT_OF_MEMORY);
> > +  }
> 
> Assert that it isn't null.
> 

From comment 61
@@ +34,5 @@
> +    mEncodingProvided = true;
> +  }
> +
> +  mEncodings = Encodings::GetOrCreate();
> +  MOZ_ASSERT(mEncodings);

Assert or throw?

I did assert it in my very first patch then got the review comment and thought about it. The only way Encodings::GetOrCreate() fails is if 'new' fails or NS_ADDREF fails. Additionally, Encodings::GetOrCreate() is called from within Text(Encoder | Decoder)::Init() which throws for an encoding error, so I figured it could just throw another error for a non encoding type error.
Comment 104 Masatoshi Kimura [:emk] 2012-08-19 19:44:49 PDT
(In reply to bsurender from comment #102)
> > > +  if (!aEncoding.WasPassed() || aEncoding.Value().IsEmpty()) {
> > 
> > I think I told you that the aEncoding.Value().IsEmpty() check is wrong. This
> > needs a test.
> 
> Why is this check wrong or why won't this check work?

http://wiki.whatwg.org/wiki/StringEncoding#TextDecoder
> 2. If called without an encoding argument, let label be the "utf-8".
>    Otherwise, let label be the value of the encoding argument.
It does not say "If called without an encoding argument or with an empty string,". Why did you set to "utf-8" when the argument is empty? Why don't you read the spec while quoting "reset the encoding algorithm state..." repeatedly?
Comment 105 :Ms2ger 2012-08-20 09:18:35 PDT
(In reply to bsurender from comment #103)
> (In reply to :Ms2ger from comment #82)
> > Comment on attachment 653026 [details] [diff] [review]
> > Encoder + decoder patch with improvements.
> > 
> > Review of attachment 653026 [details] [diff] [review]:
> > ----------------------------------------------------------------- 
> > @@ +30,5 @@
> > > +
> > > +  mEncodings = Encodings::GetOrCreate();
> > > +  if (!mEncodings) {
> > > +    aRv.Throw(NS_ERROR_OUT_OF_MEMORY);
> > > +  }
> > 
> > Assert that it isn't null.
> > 
> 
> From comment 61
> @@ +34,5 @@
> > +    mEncodingProvided = true;
> > +  }
> > +
> > +  mEncodings = Encodings::GetOrCreate();
> > +  MOZ_ASSERT(mEncodings);
> 
> Assert or throw?
> 
> I did assert it in my very first patch then got the review comment and
> thought about it. The only way Encodings::GetOrCreate() fails is if 'new'
> fails or NS_ADDREF fails. Additionally, Encodings::GetOrCreate() is called
> from within Text(Encoder | Decoder)::Init() which throws for an encoding
> error, so I figured it could just throw another error for a non encoding
> type error.

new can't fail, and I'm not sure what you mean by NS_ADDREF failing. If we need to do something there that can fail at some point, we can reconsider then.
Comment 106 Doug Turner (:dougt) 2012-08-23 09:24:18 PDT
clearing assignee.  feel free to pick this up.
Comment 107 Masatoshi Kimura [:emk] 2012-09-02 18:48:35 PDT
Created attachment 657737 [details] [diff] [review]
Update bsurender's patch to tip

PR types to stdint types, nsnull to nullptr, etc.
Comment 108 Masatoshi Kimura [:emk] 2012-09-02 18:49:33 PDT
Created attachment 657738 [details] [diff] [review]
Implement StringEncoding API

This patch is based on bsurender's work.
- Created an own directory for the encoding module.
- Resolved some overlooked review comments.
- Replaced nsIScriptableUnicodeConverter with nsIUnicode(De|En)coder.
- Deciphered and completely rewrote unreadable tests.
- And many more cleanups.
Comment 109 Masatoshi Kimura [:emk] 2012-09-02 18:50:14 PDT
Created attachment 657739 [details] [diff] [review]
Part 2: Import tests from StringEncoding shim
Comment 110 Masatoshi Kimura [:emk] 2012-09-02 18:51:10 PDT
Created attachment 657740 [details] [diff] [review]
Incorporate the imported test into our tree and updated some tests to the latest spec

- TextEncoder no longer supports non-UTF encodings per the latest spec.
- Decoder algorithms for utf-16, utf-16le and utf-16be aren't supposed to consume the BOM.
- Workaround Gecko bugs about ibm864 and big5 decoders.
Comment 111 Masatoshi Kimura [:emk] 2012-09-02 18:51:52 PDT
Created attachment 657741 [details] [diff] [review]
Fix tests for multi-byte encodings

Separated a patch and generated it with "hg diff -p -U 0" to reduce the patch size.
Comment 112 Masatoshi Kimura [:emk] 2012-09-02 18:52:47 PDT
Comment on attachment 657739 [details] [diff] [review]
Part 2: Import tests from StringEncoding shim

The import source is <https://code.google.com/p/stringencoding/>.
Comment 113 Masatoshi Kimura [:emk] 2012-09-02 18:55:10 PDT
https://tbpl.mozilla.org/?tree=Try&pusher=VYV03354@nifty.ne.jp
Comment 114 Andrew Sutherland [:asuth] 2012-09-05 16:43:14 PDT
Requesting blocking-basecamp because the polyfill that we are using as an alternative to a proper native implementation for this API is big'n'slow.

Aside: The try push's specific revision for those who don't want to hit the down button a lot:
https://tbpl.mozilla.org/?tree=Try&rev=c4a8a6ae8177

== Without these changes:
gaia-email-opt.js is 1568K

Slow decodes when synchronizing 39 messages from a test yahoo.com subscribed to the gaia dev list.  Anything that takes more than 2 milliseconds per Date.now stamping gets logged  The first number is the number of milliseconds, the second is the length of the text decoded:

ERR: SLOWDECODE 3 817 ascii
ERR: SLOWDECODE 3 1013 ascii
ERR: SLOWDECODE 3 877 ascii
ERR: SLOWDECODE 3 1066 ascii
ERR: SLOWDECODE 4 976 ascii
ERR: SLOWDECODE 3 1119 ascii
ERR: SLOWDECODE 3 1053 ascii
ERR: SLOWDECODE 3 1007 ascii
ERR: SLOWDECODE 7 1172 ascii
ERR: SLOWDECODE 4 1681 utf-8
ERR: SLOWDECODE 8 2759 utf-8
ERR: SLOWDECODE 4 1441 utf-8
ERR: SLOWDECODE 3 979 utf-8
ERR: SLOWDECODE 5 1622 utf-8
ERR: SLOWDECODE 3 1111 utf-8
ERR: SLOWDECODE 6 2235 utf-8
ERR: SLOWDECODE 6 2247 utf-8
ERR: SLOWDECODE 17 4385 utf-8
ERR: SLOWDECODE 10 2721 utf-8
ERR: SLOWDECODE 5 1220 utf-8
ERR: SLOWDECODE 7 2368 utf-8
ERR: SLOWDECODE 8 2275 utf-8
ERR: SLOWDECODE 4 1046 utf-8
ERR: SLOWDECODE 8 1467 utf-8
ERR: SLOWDECODE 3 581 utf-8
ERR: SLOWDECODE 3 432 utf-8
ERR: SLOWDECODE 10 2845 utf-8
ERR: SLOWDECODE 5 1893 utf-8
ERR: SLOWDECODE 4 1693 utf-8
ERR: SLOWDECODE 6 1598 utf-8
ERR: SLOWDECODE 4 1257 utf-8
ERR: SLOWDECODE 4 1019 utf-8
ERR: SLOWDECODE 3 789 utf-8

I have elided 5 lines which were binary decodes which (at least) the polyfill doesn't do anymore so I manually implemented them and are not meaningful.  Note that we synchronized 39 messages, so *almost every* message required a slow decode.  The tally here is 172ms.

== With these patches applied:
gaia-email-opt.js is 960k (a savings of 608k, little of which is comments)

There are no slow decodes (after eliding the binary decodes).

Also, things seemed to work as well in the UI with the patches applied as they do without.  Unfortunately, I can't run the unit tests that actually deal with all the i18n and encoding logic until bug 781615 makes things easier or I rejigger the test framework to fake out a docshell or run as mochitests or something like that.
Comment 115 Andrew Overholt [:overholt] 2012-09-07 09:11:19 PDT
Blocking email app so a blocker.
Comment 116 Masatoshi Kimura [:emk] 2012-09-08 03:45:57 PDT
Created attachment 659469 [details] [diff] [review]
Update bsurender's patch to tip

Rebased to tip.
Comment 117 Masatoshi Kimura [:emk] 2012-09-08 03:47:03 PDT
Created attachment 659470 [details] [diff] [review]
Implement StringEncoding API

Rebased to tip, fixed some comments, and removed an useless check from the constructor.
Comment 118 Masatoshi Kimura [:emk] 2012-09-08 04:56:47 PDT
Created attachment 659479 [details] [diff] [review]
Implement StringEncoding API

Sorry, forgot to qrefresh.
Comment 119 Doug Turner (:dougt) 2012-09-11 13:41:53 PDT
masatoshi, in order to review this, i should apply "Update bsurender's patch to tip", then apply "Implement StringEncoding API", then look at how that diffs against m-c?  Or am I mistaking something.

If, yes, I think it makes sense to merge "Update bsurender's patch to tip" into your patch.
Comment 120 Masatoshi Kimura [:emk] 2012-09-12 15:37:35 PDT
Created attachment 660595 [details] [diff] [review]
Implement StringEncoding API

Sure. Here is a folded patch.
Comment 121 Doug Turner (:dougt) 2012-09-12 17:32:47 PDT
Comment on attachment 660595 [details] [diff] [review]
Implement StringEncoding API

Review of attachment 660595 [details] [diff] [review]:
-----------------------------------------------------------------

Lets remove all mode lines in this patch.  They are not consistent.  (Or you can fix them to be consistent).

I haven't completed the review.  It would be really good to get Simon to review too.

::: dom/encoding/EncodingUtils.cpp
@@ +48,5 @@
> +
> +  // Minimum bytes in input stream data that represents
> +  // the Byte Order Mark is 2. Max is 3.
> +  if (aLength < 2) {
> +    return 0;

shouldn't this be -1 or something to indicate an error?

@@ +72,5 @@
> +  if (aData[0] == '\xEF' && aData[1] == '\xBB' && aData[2] == '\xBF') {
> +    aRetval.AssignLiteral("utf-8");
> +    return 3;
> +  }
> +  return 0;

Same here.

@@ +88,5 @@
> +  // Truncating to clear aOutEncoding in case of failure.
> +  aOutEncoding.Truncate();
> +
> +  // Steps to get an encoding from Encoding spec
> +  // 1. Remove any leading and trailing space characters from label.

Remove this comment.  No need to comment on exactly what the source code is doing (and in this case it is doing it incorrectly as StripChars removes ALL matching chars in the string, not just leading and trailing.

@@ +89,5 @@
> +  aOutEncoding.Truncate();
> +
> +  // Steps to get an encoding from Encoding spec
> +  // 1. Remove any leading and trailing space characters from label.
> +  label.StripChars(" \t\n\f\r");

does .StripWhitespace() work?

@@ +96,5 @@
> +  }
> +
> +  // 2. If label is an ASCII case-insensitive match for any of the
> +  //    labels listed in the table below, return the corresponding
> +  //    encoding, or return failure otherwise. 

This comment isn't readable.  Also remove it.

@@ +118,5 @@
> +  const nsString&
> +  GetEncoding() const
> +  {
> +    return mEncoding;
> +  }

No need to have getter methods here.  You can directly access mLabel and mEncoding.

@@ +121,5 @@
> +    return mEncoding;
> +  }
> +
> +  nsString mLabel;
> +  nsString mEncoding;

These probably should be narrow (const char*) strings so that you don't take up two bytes per char for no real reason.

::: dom/encoding/EncodingUtils.h
@@ +64,5 @@
> +   *
> +   * Source for labels and encodingsArray arrays is:
> +   * http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#encodings
> +   */
> +  void PopulateEncodingsTable();

I'd remove this method and just do this in the constructor.  I'd also move the array of encodings out of the method block and make it static.

::: dom/encoding/Makefile.in
@@ +26,5 @@
> +  TextDecoder.h \
> +  TextEncoder.h \
> +  $(NULL)
> +
> +CPPSRCS =			\

CPPSRCS = \

too many tabs

::: dom/encoding/TextDecoder.cpp
@@ +17,5 @@
> +                  const TextDecoderOptions& aFatal,
> +                  ErrorResult& aRv)
> +{
> +  mEncoding.Assign(aEncoding);
> +  mEncoding.StripChars(" \t\n\f\r");

StripWhitespace() might work.

@@ +22,5 @@
> +
> +  // If label is a case-insensitive match for "utf-16"
> +  // then set the internal useBOM flag.
> +  if (mEncoding.LowerCaseEqualsLiteral("utf-16")) {
> +    mUseBOM = true;

Since you are special casing, you can avoid the FileEncodingForLabel() call, and just add:

mEnconding.AssignLiteral("utf-16le");
return;

@@ +33,5 @@
> +    aRv.Throw(NS_ERROR_DOM_ENCODING_NOT_SUPPORTED_ERR);
> +    return;
> +  }
> +
> +  // If BOM is used, we can't determine the converter yet.

You can then remove this if stmt.

@@ +38,5 @@
> +  if (mUseBOM) {
> +    return;
> +  }
> +
> +  // For UTF encodings, feed a BOM to workaround a bug of our decoders.

Do you know what bugs this is referencing?  We should add a comment referencing the bug number.

@@ +48,5 @@
> +    mOffset = 2;
> +  } else if (mEncoding.EqualsLiteral("utf-16be")) {
> +    memcpy(mInitialBytes, "\xFE\xFF", 2);
> +    mOffset = 2;
> +  }

Right now mOffset is undefined if we aren't encoding utf8.  Shouldn't we add:

else {
  mOffset = -1;
}

@@ +87,5 @@
> +
> +void
> +TextDecoder::InitDecoder(nsAString* aOutString)
> +{
> +  if (!mOffset) {

Should we also return early for a null string?

@@ +94,5 @@
> +  PRUnichar buf[3];
> +  int32_t srcLen = mOffset;
> +  int32_t dstLen = mozilla::ArrayLength(buf);
> +  DebugOnly<nsresult> rv =
> +    mDecoder->Convert(mInitialBytes, &srcLen, buf, &dstLen);

I am very worried that mOffset and mInitialBytes are not initialized for non utf8 strings.

It looks like we only initialize theses member variables in TextDecoder::Init() when mEncoding is a utf8 type.

@@ +108,5 @@
> +TextDecoder::ResetDecoder(nsAString* aOutString)
> +{
> +  mDecoder->Reset();
> +  if (!aOutString && mUseBOM) {
> +    mOffset = 0;

I am not sure if the logic is right here.  If we don't have any outstring, and we were using BOM, we set mOffset to 0, then call InitDecoder which returns early in this case.  We can avoid the extra method call and return early?

@@ +124,5 @@
> +  uint32_t length;
> +  // If view is not specified, let view be a Uint8Array of length 0.
> +  if (!aView) {
> +    data = EmptyCString().BeginReading();
> +    length = EmptyCString().Length();

for no aView, can't we return early here?  Is there really any decoding to do?

@@ +145,5 @@
> +  if (NS_FAILED(rv)) {
> +    aRv.Throw(rv);
> +    return;
> +  }
> +  // Need a fallible allocator because the caller may be a content.

You should clarify and say :

// Need a fallible allocator because the caller may be a content and the content can specify the length of the string.


It is probably a good idea to use a nsAutoPtr so that you have it manage the life span of the memory.

@@ +170,5 @@
> +      ResetDecoder();
> +    } else {
> +      data += srcLen + 1;
> +      length -= srcLen + 1;
> +      aOutDecodedString.Append(static_cast<PRUnichar>(0xFFFD));

0xFFFD - you probably want to comment what you are doing here.

@@ +237,5 @@
> +void
> +TextDecoder::GetEncoding(nsAString& aEncoding)
> +{
> +  if (mUseBOM || mEncoding.EqualsLiteral("utf-16le")) {
> +    aEncoding.AssignLiteral("utf-16");

what about utf-16be utf-16l?

@@ +242,5 @@
> +    return;
> +  }
> +
> +  if (mEncoding.EqualsLiteral("x-windows-949")) {
> +    aEncoding.AssignLiteral("euc-kr");

I don't understand this replacement.  Maybe a comment is required.
Comment 122 Masatoshi Kimura [:emk] 2012-09-12 19:43:37 PDT
(In reply to Doug Turner (:dougt) from comment #121)
> ::: dom/encoding/EncodingUtils.cpp
> @@ +48,5 @@
> > +
> > +  // Minimum bytes in input stream data that represents
> > +  // the Byte Order Mark is 2. Max is 3.
> > +  if (aLength < 2) {
> > +    return 0;
> 
> shouldn't this be -1 or something to indicate an error?
No, it means the stream didn't start with BOM and the decoder should read the stream from offset 0.
The function implements "decode a byte stream" algorithm's step 1 & 2.
http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#decode-and-encode
I'll update a comment to clarify that.

> @@ +88,5 @@
> > +  // Truncating to clear aOutEncoding in case of failure.
> > +  aOutEncoding.Truncate();
> > +
> > +  // Steps to get an encoding from Encoding spec
> > +  // 1. Remove any leading and trailing space characters from label.
> 
> Remove this comment.  No need to comment on exactly what the source code is
> doing (and in this case it is doing it incorrectly as StripChars removes ALL
> matching chars in the string, not just leading and trailing.
I'll remove the verbose comment, but I copied the text from the StringEncoding spec. Therefore, the code should also be fixed so that it removes only leading and trailing space.

> @@ +89,5 @@
> > +  aOutEncoding.Truncate();
> > +
> > +  // Steps to get an encoding from Encoding spec
> > +  // 1. Remove any leading and trailing space characters from label.
> > +  label.StripChars(" \t\n\f\r");
> 
> does .StripWhitespace() work?
The spec says "The space characters, for the purposes of this specification, are U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR)."
http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#terminology
The existing StripWhitespace() doesn't exactly match the definition. (Maybe I can add a function to EncodingUtils.)

> @@ +121,5 @@
> > +    return mEncoding;
> > +  }
> > +
> > +  nsString mLabel;
> > +  nsString mEncoding;
> 
> These probably should be narrow (const char*) strings so that you don't take
> up two bytes per char for no real reason.
Originally these strings are widen by your review comment #11. But ok, I'd prefer using const char* here.

> @@ +38,5 @@
> > +  if (mUseBOM) {
> > +    return;
> > +  }
> > +
> > +  // For UTF encodings, feed a BOM to workaround a bug of our decoders.
> 
> Do you know what bugs this is referencing?  We should add a comment
> referencing the bug number.
Bug 634541, I'll update a comment.

> @@ +48,5 @@
> > +    mOffset = 2;
> > +  } else if (mEncoding.EqualsLiteral("utf-16be")) {
> > +    memcpy(mInitialBytes, "\xFE\xFF", 2);
> > +    mOffset = 2;
> > +  }
> 
> Right now mOffset is undefined if we aren't encoding utf8.  Shouldn't we add:
> 
> else {
>   mOffset = -1;
> }
mOffset is initialized to zero in the constructor. See the above explanation why it is not -1.

> @@ +87,5 @@
> > +
> > +void
> > +TextDecoder::InitDecoder(nsAString* aOutString)
> > +{
> > +  if (!mOffset) {
> 
> Should we also return early for a null string?
Only HandleBOM() uses aOutString. Null aOutString is valid.

> @@ +94,5 @@
> > +  PRUnichar buf[3];
> > +  int32_t srcLen = mOffset;
> > +  int32_t dstLen = mozilla::ArrayLength(buf);
> > +  DebugOnly<nsresult> rv =
> > +    mDecoder->Convert(mInitialBytes, &srcLen, buf, &dstLen);
> 
> I am very worried that mOffset and mInitialBytes are not initialized for non
> utf8 strings.
> 
> It looks like we only initialize theses member variables in
> TextDecoder::Init() when mEncoding is a utf8 type.
mOffset is initialized to zero in the constructor. mOffset will be non-zero if and only if mInitialBytes is initialized. If mOffset is zero, mInitialBytes is not read at all. Is it really needed to initialize the garbage?

> @@ +124,5 @@
> > +  uint32_t length;
> > +  // If view is not specified, let view be a Uint8Array of length 0.
> > +  if (!aView) {
> > +    data = EmptyCString().BeginReading();
> > +    length = EmptyCString().Length();
> 
> for no aView, can't we return early here?  Is there really any decoding to
> do?
If the previous decode() was called with {stream: true}, decode() may emit a replacement character even if the input is null or empty ArrayBuffer.

> @@ +237,5 @@
> > +void
> > +TextDecoder::GetEncoding(nsAString& aEncoding)
> > +{
> > +  if (mUseBOM || mEncoding.EqualsLiteral("utf-16le")) {
> > +    aEncoding.AssignLiteral("utf-16");
> 
> what about utf-16be utf-16l?
If the BOM is not used, utf-16be is not an alias of utf-16. But I'm not sure whether we should return "utf-16" or "utf-16be" when TextDecoder was constructed with encoding "utf-16" and the BOM indicates the stream was utf-16be. The spec says nothing about how to use the internal useBOM flag.

> @@ +242,5 @@
> > +    return;
> > +  }
> > +
> > +  if (mEncoding.EqualsLiteral("x-windows-949")) {
> > +    aEncoding.AssignLiteral("euc-kr");
> 
> I don't understand this replacement.  Maybe a comment is required.
Our euc-kr decoder differs from what the Encoding spec requires (see bug 562091). So I used x-windows-949 decoder which is more close to the behavior required by the spec. I'll add a bug number and a comment explaining that.

I'll fix all other points not replied here.
Comment 123 Masatoshi Kimura [:emk] 2012-09-13 04:45:21 PDT
Created attachment 660795 [details] [diff] [review]
Implement StringEncoding API

Resolved review comments.
Comment 124 Masatoshi Kimura [:emk] 2012-09-13 04:45:58 PDT
Created attachment 660796 [details] [diff] [review]
Interdiff from the previous patch (for ease of review)
Comment 125 Masatoshi Kimura [:emk] 2012-09-13 08:07:57 PDT
Created attachment 660846 [details] [diff] [review]
Implement StringEncoding API

The nsAutoPtr usage in the previous patch was obviously wrong.
Comment 126 Masatoshi Kimura [:emk] 2012-09-16 03:15:43 PDT
Created attachment 661575 [details] [diff] [review]
Implement StringEncoding API

The latest spec update clarified that utf encodings will consume the BOM even if it is not used to determine the encoding (that is, the internal useBOM flag is false). I updated the patch accordingly.
Comment 127 Masatoshi Kimura [:emk] 2012-09-16 03:16:13 PDT
Created attachment 661576 [details] [diff] [review]
Interdiff
Comment 128 Masatoshi Kimura [:emk] 2012-09-16 03:17:47 PDT
Created attachment 661577 [details] [diff] [review]
Incorporate the imported test into our tree and updated some tests to the latest spec

Removed one fixup which is no longer needed.
Comment 129 Doug Turner (:dougt) 2012-09-17 08:54:59 PDT
Comment on attachment 661575 [details] [diff] [review]
Implement StringEncoding API

Review of attachment 661575 [details] [diff] [review]:
-----------------------------------------------------------------

with those changes.  i'd really like someone with more intl experience to review too.

::: dom/encoding/EncodingUtils.cpp
@@ +243,5 @@
> +  mLabelsEncodings.Init(numLabels);
> +
> +  for (uint32_t i = 0; i < numLabels; i++) {
> +    mLabelsEncodings.Put(NS_ConvertASCIItoUTF16(labelsEncodings[i].mLabel),
> +      new NS_ConvertASCIItoUTF16(labelsEncodings[i].mEncoding));

nit: typically the style says that the second line should be aligned with the NS_ConvertASCIItoUTF16

::: dom/encoding/EncodingUtils.h
@@ +63,5 @@
> +   *
> +   * Source for labels and encodingsArray arrays is:
> +   * http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#encodings
> +   */
> +  void PopulateEncodingsTable();

You can remove this method now.

::: dom/encoding/TextDecoder.cpp
@@ +15,5 @@
> +
> +void
> +TextDecoder::Init(const nsAString& aEncoding,
> +                  const TextDecoderOptions& aFatal,
> +                  ErrorResult& aRv)

aRv is a weird parameter name.  (rv usually is reserved for nsresults).  I think everyone would be happier with aErrorResult.

@@ +18,5 @@
> +                  const TextDecoderOptions& aFatal,
> +                  ErrorResult& aRv)
> +{
> +  mEncoding.Assign(aEncoding);
> +  mEncoding.Trim(" \t\n\f\r");

I liked your suggestion -- move this .Trim() into utils.  In this method, be sure to mention why we can't use .RemoveWhitespace().

::: dom/encoding/TextEncoder.cpp
@@ +33,5 @@
> +  }
> +
> +  // Otherwise, if the Name of the returned encoding is not one of
> +  // "utf-8", "utf-16", or "utf-16be" throw an "EncodingError" exception
> +  // and terminate these steps. 

trailing whitespace in a few places in this file (it is really easy to see with the splinter review tool, or M-x remove-trailing-whitespace)

@@ +82,5 @@
> +    aRv.Throw(NS_ERROR_OUT_OF_MEMORY);
> +    return nullptr;
> +  }
> +
> +  JSObject* outView = nullptr;

move this down to where it is used.  (outside the if block, of course)

::: dom/encoding/TextEncoder.h
@@ +12,5 @@
> +#include "mozilla/ErrorResult.h"
> +#include "nsIUnicodeEncoder.h"
> +#include "nsString.h"
> +
> +#include "nsAutoPtr.h"

Do you need this #include?
Comment 130 Doug Turner (:dougt) 2012-09-17 08:56:45 PDT
Comment on attachment 657741 [details] [diff] [review]
Fix tests for multi-byte encodings

you really should have a comment as to why these tests are disabled. r+ with that change.
Comment 131 Doug Turner (:dougt) 2012-09-17 08:57:39 PDT
Comment on attachment 661577 [details] [diff] [review]
Incorporate the imported test into our tree and updated some tests to the latest spec

Same thing - if you comment out a test, add a comment:
   -testEncodeDecode('windows-1252', 0, 0xFF);
   +//testEncodeDecode('windows-1252', 0, 0xFF);
Comment 132 Doug Turner (:dougt) 2012-09-17 08:59:19 PDT
Comment on attachment 661575 [details] [diff] [review]
Implement StringEncoding API

Review of attachment 661575 [details] [diff] [review]:
-----------------------------------------------------------------

with those changes.  i'd really like someone with more intl experience to review too.

Oh, one more thing:
When constructing the hash table, you had something like 'new  NS_ConvertASCIItoUTF16(...)'.  I wasn't sure if that was the best way to create a nsString.  I asked on #developers for clarification.  I didn't hear anything immediately.  You should follow up:

   if I have a const char* string, and I need to create a new'ed string to pass into nsClassHashtable, is it permissible to do:  'new NS_ConvertASCIItoUTF16(...)'

::: dom/encoding/EncodingUtils.cpp
@@ +243,5 @@
> +  mLabelsEncodings.Init(numLabels);
> +
> +  for (uint32_t i = 0; i < numLabels; i++) {
> +    mLabelsEncodings.Put(NS_ConvertASCIItoUTF16(labelsEncodings[i].mLabel),
> +      new NS_ConvertASCIItoUTF16(labelsEncodings[i].mEncoding));

nit: typically the style says that the second line should be aligned with the NS_ConvertASCIItoUTF16

::: dom/encoding/EncodingUtils.h
@@ +63,5 @@
> +   *
> +   * Source for labels and encodingsArray arrays is:
> +   * http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#encodings
> +   */
> +  void PopulateEncodingsTable();

You can remove this method now.

::: dom/encoding/TextDecoder.cpp
@@ +15,5 @@
> +
> +void
> +TextDecoder::Init(const nsAString& aEncoding,
> +                  const TextDecoderOptions& aFatal,
> +                  ErrorResult& aRv)

aRv is a weird parameter name.  (rv usually is reserved for nsresults).  I think everyone would be happier with aErrorResult.

@@ +18,5 @@
> +                  const TextDecoderOptions& aFatal,
> +                  ErrorResult& aRv)
> +{
> +  mEncoding.Assign(aEncoding);
> +  mEncoding.Trim(" \t\n\f\r");

I liked your suggestion -- move this .Trim() into utils.  In this method, be sure to mention why we can't use .RemoveWhitespace().

::: dom/encoding/TextEncoder.cpp
@@ +33,5 @@
> +  }
> +
> +  // Otherwise, if the Name of the returned encoding is not one of
> +  // "utf-8", "utf-16", or "utf-16be" throw an "EncodingError" exception
> +  // and terminate these steps. 

trailing whitespace in a few places in this file (it is really easy to see with the splinter review tool, or M-x remove-trailing-whitespace)

@@ +82,5 @@
> +    aRv.Throw(NS_ERROR_OUT_OF_MEMORY);
> +    return nullptr;
> +  }
> +
> +  JSObject* outView = nullptr;

move this down to where it is used.  (outside the if block, of course)

::: dom/encoding/TextEncoder.h
@@ +12,5 @@
> +#include "mozilla/ErrorResult.h"
> +#include "nsIUnicodeEncoder.h"
> +#include "nsString.h"
> +
> +#include "nsAutoPtr.h"

Do you need this #include?
Comment 133 :Ms2ger 2012-09-17 09:29:16 PDT
(In reply to Doug Turner (:dougt) from comment #129)
> ::: dom/encoding/TextDecoder.cpp
> @@ +15,5 @@
> > +
> > +void
> > +TextDecoder::Init(const nsAString& aEncoding,
> > +                  const TextDecoderOptions& aFatal,
> > +                  ErrorResult& aRv)
> 
> aRv is a weird parameter name.  (rv usually is reserved for nsresults).  I
> think everyone would be happier with aErrorResult.

Not really, no. aRv is the canonical name for an ErrorResult& argument.
Comment 134 Doug Turner (:dougt) 2012-09-17 09:49:13 PDT
Okay, everyone but ms2ger.  It is a terrible name.
Comment 135 Masatoshi Kimura [:emk] 2012-09-18 02:07:10 PDT
(In reply to Doug Turner (:dougt) from comment #132)
> Oh, one more thing:
> When constructing the hash table, you had something like 'new 
> NS_ConvertASCIItoUTF16(...)'.  I wasn't sure if that was the best way to
> create a nsString.  I asked on #developers for clarification.  I didn't hear
> anything immediately.  You should follow up:
> 
>    if I have a const char* string, and I need to create a new'ed string to
> pass into nsClassHashtable, is it permissible to do:  'new
> NS_ConvertASCIItoUTF16(...)'
Hm, NS_ConvertASCIItoUTF16 inherits from nsAutoString, which is a NS_STACK_CLASS.
We will need other way to create hash values.
Comment 136 Masatoshi Kimura [:emk] 2012-09-18 16:18:03 PDT
Created attachment 662349 [details] [diff] [review]
Implement StringEncoding API

I've changed nsClassHashtable<nsStringHashKey, nsString> to nsDataHashtable<nsStringHashKey, const char*> because:
- GetUnicode(En|De)coder() takes a const char* parameter.
- .encode attribute usage will be less frequent than .decode/.encode methods.

Simon, could you review the usage of nsIUnicode(En|De)coder?
Comment 137 Masatoshi Kimura [:emk] 2012-09-18 17:00:23 PDT
Created attachment 662360 [details] [diff] [review]
Part 3: Incorporate the imported test into our tree and updated some tests to the latest spec. r=doug

Added a comment explaining the reason of comment out.
Comment 138 Masatoshi Kimura [:emk] 2012-09-18 17:01:18 PDT
Created attachment 662361 [details] [diff] [review]
Part 4: Fix tests for multi-byte encodings. r=doug
Comment 139 Andrew Overholt [:overholt] 2012-09-24 10:40:35 PDT
We discussed this during triage today and decided that it isn't strictly a blocker anymore.  As always, feel free to disagree and re-nom.
Comment 140 Simon Montagu :smontagu 2012-09-27 03:43:52 PDT
Comment on attachment 662349 [details] [diff] [review]
Implement StringEncoding API

Review of attachment 662349 [details] [diff] [review]:
-----------------------------------------------------------------

The intl API usage looks OK to me, but I have a couple of unrelated nits:

::: dom/encoding/TextEncoder.cpp
@@ +43,5 @@
> +    return;
> +  }
> +
> +  // Initialize the internal encoding algorithm state
> +  // to the default values for the encoding encoding.

firstly, rephrase "encoding encoding", because like this it looks like a repeated word typo. Secondly, I don't really understand what this comment is saying anyway -- it sounds more appropriate to a call to mEncoder->Reset() than anything that this method is actually doing. Thirdly, the whole comment is copy-pasted into TextDecoder::CreateDecoder, where it makes even less sense: at a minimum there it should talk about the "decoding" instead of "encoding".

::: dom/encoding/test/test_BOMEncoding.js
@@ +77,5 @@
> +
> +  testBOMCharset({encoding: "utf-16", data: data, expected: expectedString,
> +                  msg: "test decoder BOM encoding for utf-16."});
> +
> +  dataUTF16 = [0xFF, 0xFE, 0x22, 0x00, 0x12, 0x04, 0x41, 0x04, 0x35, 0x04, 0x20, 0x00, 0x41, 0x04, 0x47, 0x04, 0x30, 0x04, 0x41, 0x04, 0x42, 0x04, 0x3B, 0x04, 0x38, 0x04, 0x32, 0x04, 0x4B, 0x04, 0x35, 0x04, 0x20, 0x00, 0x41, 0x04, 0x35, 0x04, 0x3C, 0x04, 0x4C, 0x04, 0x38, 0x04, 0x20, 0x00, 0x3F, 0x04, 0x3E, 0x04, 0x45, 0x04, 0x3E, 0x04, 0x36, 0x04, 0x38, 0x04, 0x20, 0x00, 0x34, 0x04, 0x40, 0x04, 0x43, 0x04, 0x33, 0x04, 0x20, 0x00, 0x3D, 0x04, 0x30, 0x04, 0x20, 0x00, 0x34, 0x04, 0x40, 0x04, 0x43, 0x04, 0x33, 0x04, 0x30, 0x04, 0x2C, 0x00, 0x20, 0x00, 0x3A, 0x04, 0x30, 0x04, 0x36, 0x04, 0x34, 0x04, 0x30, 0x04, 0x4F, 0x04, 0x20, 0x00, 0x3D, 0x04, 0x35, 0x04, 0x41, 0x04, 0x47, 0x04, 0x30, 0x04, 0x41, 0x04, 0x42, 0x04, 0x3B, 0x04, 0x38, 0x04, 0x32, 0x04, 0x30, 0x04, 0x4F, 0x04, 0x20, 0x00, 0x41, 0x04, 0x35, 0x04, 0x3C, 0x04, 0x4C, 0x04, 0x4F, 0x04, 0x20, 0x00, 0x3D, 0x04, 0x35, 0x04, 0x41, 0x04, 0x47, 0x04, 0x30, 0x04, 0x41, 0x04, 0x42, 0x04, 0x3B, 0x04, 0x38, 0x04, 0x32, 0x04, 0x30, 0x04, 0x20, 0x00, 0x3F, 0x04, 0x3E, 0x04, 0x2D, 0x00, 0x41, 0x04, 0x32, 0x04, 0x3E, 0x04, 0x35, 0x04, 0x3C, 0x04, 0x43, 0x04, 0x2E, 0x00, 0x22, 0x00];

Please add comments to the testcases that don't already have comments to specify what case each one is testing.
Comment 141 Masatoshi Kimura [:emk] 2012-09-27 08:39:47 PDT
Created attachment 665482 [details] [diff] [review]
Part 1: Implement StringEncoding API. r=doug, smontagu

Patch for checkin
Comment 142 Masatoshi Kimura [:emk] 2012-09-27 09:18:57 PDT
Created attachment 665503 [details] [diff] [review]
Part 1: Implement StringEncoding API. r=doug, smontagu

The previous patch failed to compile with compilers other than MSVC.
Comment 143 Masatoshi Kimura [:emk] 2012-09-27 10:18:01 PDT
Created attachment 665528 [details] [diff] [review]
Part 1: Implement StringEncoding API. r=doug, smontagu

Removed a bogus assertion.
Comment 144 Masatoshi Kimura [:emk] 2012-09-28 01:40:35 PDT
https://tbpl.mozilla.org/?tree=Try&rev=8026fec01d28
Comment 147 Jean-Yves Perrier [:teoli] 2013-08-21 10:06:22 PDT
The new https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder
and 
https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder

describes the latest version of the specification.

A note indicates the existence of this earlier implementation in the compat notes.

There also is a mention in Fx 18 for developers:
https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/18

Note You need to log in before you can comment on or make changes to this bug.