Closed Bug 801487 Opened 12 years ago Closed 12 years ago

Update StringEncoding API per the latest spec and fix some bugs

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla19

People

(Reporter: emk, Assigned: emk)

References

Details

(Keywords: dev-doc-complete)

Attachments

(9 files, 13 obsolete files)

12.09 KB, patch
sicking
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
5.18 KB, patch
sicking
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
10.15 KB, patch
sicking
: review+
Details | Diff | Splinter Review
14.59 KB, patch
sicking
: review+
Details | Diff | Splinter Review
21.85 KB, patch
sicking
: review+
Details | Diff | Splinter Review
4.79 KB, patch
sicking
: review+
Details | Diff | Splinter Review
7.02 KB, patch
sicking
: review+
Details | Diff | Splinter Review
9.88 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
7.82 KB, patch
emk
: review+
Details | Diff | Splinter Review
I should have noticed that sooner :(
I found some other bugs and spec updates. Morphing the bug.
Summary: new TextDecoder("shift_jis").encoding returns "shift-jis", not "shift_jis" → Update StringEncoding API per the latest spec and fix some bugs
Attached patch patch (obsolete) — Splinter Review
- Cahnged ToLowerCase to ASCIIToLower.
- Added 2 encodings ("iso-8859-8-i", "x-user-defined") to catch up the spec.
- Fixed "shift_jis" encoding name.
- Removed "x-windows-949" hack because EUC-KR decoder was just an alias of x-windows-949 decoder and TextEncoder does not support non-UTF encodings.
- Added testcases verifying above changes.
Assignee: nobody → VYV03354
Status: NEW → ASSIGNED
Attachment #671544 - Flags: review?(jonas)
Attached patch patch v2Splinter Review
Improved the test code a bit.
Attachment #671544 - Attachment is obsolete: true
Attachment #671544 - Flags: review?(jonas)
Attachment #671648 - Flags: review?(jonas)
Blocks: 801402
Attached patch Update WebIDL definition (obsolete) — Splinter Review
The first argument of encode/decode has a default value until https://www.w3.org/Bugs/Public/show_bug.cgi?id=19646 is fixed.
This patch also fix a bug of js_ErrorToException. js_ErrorToException couldn't create an Exception with a non-ASCII message.
According to the latest spec.
Attachment #674408 - Attachment is obsolete: true
Please flag any other patches that you need reviewed.
Keywords: checkin-needed
Whiteboard: [land 671648] [leave open]
Attachment #674406 - Attachment is obsolete: true
Attachment #674828 - Flags: review?(jonas)
Attachment #674407 - Attachment is obsolete: true
Attachment #674829 - Flags: review?(jwalden+bmo)
Attachment #674426 - Attachment is obsolete: true
Attachment #674832 - Flags: review?(jonas)
Whiteboard: [land 671648] [leave open] → [land 671648, 674828] [leave open]
Comment on attachment 674832 [details] [diff] [review]
Change some exceptions from EncodingError to TypeError

Review of attachment 674832 [details] [diff] [review]:
-----------------------------------------------------------------

I think bz needs to review this. I can do the TextEncoder/Decoder parts once he has ok'ed the other changes. It seems very unfortunate to me to require a context for throwing type errors. Can't the generated bindings code do that?
Attachment #674832 - Flags: review?(jonas) → review?(bzbarsky)
TypeError is spec'ed by the Encoding Standard itself, not the WebIDL spec. So I don't think it belongs to the bindings code.
Anne, why did you change the error to TypeError?
I think the right way to do this is to have ThrowTypeError on ErrorResult, as this patch does, but have that just set state on the object.  Then the generated code should do the actual exception-throwing.  That's sort of been the exceptions plan all along.
And in particular, what the codegen does right now looks like this:

          if (rv.Failed()) {
            return ThrowMethodFailedWithDetails<true>(cx, rv, "XMLHttpRequest", "send");
          }

and ThrowMethodFailedWithDetails could do special things in case rv actually had a TypeError thrown on it.
How can I pass around varargs across the function?
Hmm.  You want varargs because you want to be able to report the label that was passed in?

In this case, I would say just throw NOT_UTF for either failure.  But in general we'd need to figure out some way of doing that, yes...

One option, I guess, might be for the ErrorResult to have a JSContext*.  But I'd really like to keep it as cheap as possible in the non-exceptional cases.  :(
Flags: in-testsuite+
Keywords: checkin-needed
Whiteboard: [land 671648, 674828] [leave open] → [leave open]
To answer comment 15, the idea is to emulate the semantics of a string enum, without writing out the actual enum in Web IDL as it would be very verbose (at least for TextDecoder). If I add a comment to that effect to the spec, would that be sufficient?
No longer blocks: 801402
> without writing out the actual enum in Web IDL as it would be very verbose 

Why is that a problem, exactly?

But note that in this case the callee only accepts the UTF-8/16 encodings anyway, not all of them....
TextDecoder supports non-UTF encodings while TextEncoder doesn't.
In which case it might actually make sense to have them take different types in the IDL to make this clear, fwiw.
Other problem is, Web IDL enum doesn't support case-insensitive matching and trimming space characters.
Ah.  Yes, ok.  That last makes it infeasible to use here.
bz, shall I name the argument utfLabel for TextEncoder to indicate it is somewhat different? FWIW, I'm happy to use a different exception here, maybe NotSupportedError? EncodingError just did not seem right as it's not related to any encoding or decoding error, it's just an incorrect argument.
> utfLabel for TextEncoder to indicate it is somewhat different?

Might not be a bad idea.

I don't think there's a problem with having a TypeError here per se, for what it's worth.  But then again, I also think that it doesn't matter that much what sorts of exceptions are thrown.  ;)
I changed the function not to depend on varargs.
Attachment #674829 - Attachment is obsolete: true
Attachment #674829 - Flags: review?(jwalden+bmo)
Attachment #675729 - Flags: review?(jwalden+bmo)
The only overhead is one pointer-size storage. It will not be even touched unless TypeError is thrown. It should be negligible because ErrorResult is noncopyable.
Attachment #675735 - Flags: review?(bzbarsky)
Attachment #674832 - Attachment is obsolete: true
Attachment #674832 - Flags: review?(bzbarsky)
Attachment #675737 - Flags: review?(jonas)
Comment on attachment 675735 [details] [diff] [review]
Implement ErrorResult::ThrowTypeError

Why not make mErrorNumber be a dom::ErrNum?

>+  MOZ_ASSERT(argCount <= 10);

How about doing that in ThrowTypeError?  So if argCount there is too large, simply delete message, set mMessage to null, and move on (after asserting).

That will make the failure mode "wrong exception thrown" not "out of bounds stack write.

Alternately, you could just set argCount = NS_MIN(argCount, 10) or something and ignore extra args (after the assert you have).

But also, you could just do:

  const jschar* args[argCount+1];

and not worry about all this, no?

>+#ifdef DEBUG
>+  ~ErrorResult() {
>+    MOZ_ASSERT_IF(IsTypeError(), !mMessage);
>+  }
>+#endif

There are ErrorResult consumers who don't ever use ThrowMethodFailedWithDetails.  In fact, they just swallow the error...

It might be safer to just make mMessage an nsAutoPtr....

r=me with that, I guess.
Attachment #675735 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky (:bz) from comment #34)
> Why not make mErrorNumber be a dom::ErrNum?
Will do.

> >+  MOZ_ASSERT(argCount <= 10);
> 
> How about doing that in ThrowTypeError?  So if argCount there is too large,
> simply delete message, set mMessage to null, and move on (after asserting).
> 
> That will make the failure mode "wrong exception thrown" not "out of bounds
> stack write.
It will crash with buffer overrun in js anyway.
https://mxr.mozilla.org/mozilla-central/source/js/src/jscntxt.cpp?rev=d1984b76a0b2#886

> Alternately, you could just set argCount = NS_MIN(argCount, 10) or something
> and ignore extra args (after the assert you have).
We can't change the argCount here (see the js code above). I don't think assertion can actually fail for a reason other than silly typo because argCount is statically embedded in Errors.msg.

> But also, you could just do:
> 
>   const jschar* args[argCount+1];
> 
> and not worry about all this, no?
MSVC doesn't support C99 variable length arrays :( 

> >+#ifdef DEBUG
> >+  ~ErrorResult() {
> >+    MOZ_ASSERT_IF(IsTypeError(), !mMessage);
> >+  }
> >+#endif
> 
> There are ErrorResult consumers who don't ever use
> ThrowMethodFailedWithDetails.  In fact, they just swallow the error...
Even worse, it will cause memory leak.
But no such consumers use NS_ERROR_TYPE_ERR because ThrowTypeError is a newly added method. Also I added assertions to catch such a usage.

> It might be safer to just make mMessage an nsAutoPtr....
nsAutoPtr will always initialize mMessage even if no errors are thrown. Is it OK?
> nsAutoPtr will always initialize mMessage even if no errors are thrown. Is it OK?
And it will always check mMessage on destruction. If it doesn't matter, I'll gladly replace it with nsAutoPtr.
> We can't change the argCount here (see the js code above).

My point is that you can just ignore args after the 10th one.  Is that not the case?

> MSVC doesn't support C99 variable length arrays :( 

Even in C++?  I was pretty sure we use them elsewhere in the tree....

> Even worse, it will cause memory leak.

Well, right.   That's the problem.  Swallowing the error is intended behavior for those consumers, but leaking is not.

> Also I added assertions to catch such a usage.

Hmm.  OK, fair.  Given that, I guess let's go without nsAutoPtr for now.  I agree that having the extra init/check is undesirable.
(In reply to Boris Zbarsky (:bz) from comment #37)
> > We can't change the argCount here (see the js code above).
> 
> My point is that you can just ignore args after the 10th one.  Is that not
> the case?

OK, I've moved assertion into ThrowTypeError and added |argCount = std::min<uint16_t>(argCount, 10);|. (NS_MIN is going to be obsoleted per bug 786533.)

> > MSVC doesn't support C99 variable length arrays :( 
> 
> Even in C++?  I was pretty sure we use them elsewhere in the tree....

When I changed the code to |const jschar* args[argCount + 1];|, compile failed with the following error:
h:/m/mozilla-central/dom/bindings/BindingUtils.cpp(82) : error C2057: expected constant expression
h:/m/mozilla-central/dom/bindings/BindingUtils.cpp(82) : error C2466: cannot allocate an array of constant size 0
h:/m/mozilla-central/dom/bindings/BindingUtils.cpp(82) : error C2133: 'args' : unknown size
Supporting variable length arrays in C++ is a gcc extension. Even C++11 didn't spec that. I'd be surprised if it is actually used in our XP code.

> > Also I added assertions to catch such a usage.
> 
> Hmm.  OK, fair.  Given that, I guess let's go without nsAutoPtr for now.  I
> agree that having the extra init/check is undesirable.

I've added a comment explaining why a raw pointer is used.
Attachment #675735 - Attachment is obsolete: true
Attachment #675896 - Flags: review+
Rebased to tip
Attachment #675896 - Attachment is obsolete: true
Attachment #676692 - Flags: review+
Comment on attachment 675729 [details] [diff] [review]
Add a JSAPI to make a report with unicode arguments

Review of attachment 675729 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsapi.cpp
@@ +6732,5 @@
> +                         va_list ap)
> +{
> +    AssertHeapIsIdle(cx);
> +    js_ReportErrorNumberVA(cx, JSREPORT_ERROR, errorCallback, userRef,
> +                           errorNumber, JS_FALSE, ap);

The JS_FALSE here (which I would ordinarily say should be false, just so you're aware for other patches) is totally unintelligible when read here.  :-)

These days we would make this an enum ArgumentType { AsciiString, UnicodeString } so that places like this could pass something readable, and more obviously correct or not-correct.  Could you file a followup bug to make this change, please?  Could be good first bug material, if I don't get irritated enough by this to just go do it myself.  :-)

::: js/src/jscntxt.cpp
@@ -914,5 @@
>                  argLengths[i] = js_strlen(reportp->messageArgs[i]);
>                  totalArgsLength += argLengths[i];
>              }
> -            /* NULL-terminate for easy copying. */
> -            reportp->messageArgs[i] = NULL;

SpiderMonkey's error reporting API, mechanism, and code is a disaster, as I'm sure you've discovered.  :-)  Which makes me think that here, if you're not going to null-terminate in the case when messageArgs were passed in from the start, this entry is NULL.  You should assert this fact as well, so if any user screws it up somehow at some point, we'll quickly discover it.

@@ +1072,5 @@
> +                            void *userRef, const unsigned errorNumber,
> +                            const jschar **args)
> +{
> +    if (checkReportFlags(cx, &flags))
> +        return JS_TRUE;

SpiderMonkey doesn't use JS_TRUE or JS_FALSE in new code -- we use true and false directly.  We also use bool instead of JSBool, everywhere except in APIs (at least for the moment; SpiderMonkey's transition to being C++ means we might remove JSBool entirely sometime).  (There might be one or two cases where we make exceptions, that don't immediately spring to mind; nothing in this function is such a case.)

@@ +1073,5 @@
> +                            const jschar **args)
> +{
> +    if (checkReportFlags(cx, &flags))
> +        return JS_TRUE;
> +    JSBool warning = JSREPORT_IS_WARNING(flags);

bool
Attachment #675729 - Flags: review?(jwalden+bmo) → review+
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #40)
> Comment on attachment 675729 [details] [diff] [review]
> Add a JSAPI to make a report with unicode arguments
> 
> Review of attachment 675729 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jsapi.cpp
> @@ +6732,5 @@
> > +                         va_list ap)
> > +{
> > +    AssertHeapIsIdle(cx);
> > +    js_ReportErrorNumberVA(cx, JSREPORT_ERROR, errorCallback, userRef,
> > +                           errorNumber, JS_FALSE, ap);
> 
> The JS_FALSE here (which I would ordinarily say should be false, just so
> you're aware for other patches) is totally unintelligible when read here. 
> :-)
> 
> These days we would make this an enum ArgumentType { AsciiString,
> UnicodeString } so that places like this could pass something readable, and
> more obviously correct or not-correct.  Could you file a followup bug to
> make this change, please?  Could be good first bug material, if I don't get
> irritated enough by this to just go do it myself.  :-)

Filed bug 808286.

> ::: js/src/jscntxt.cpp
> @@ -914,5 @@
> >                  argLengths[i] = js_strlen(reportp->messageArgs[i]);
> >                  totalArgsLength += argLengths[i];
> >              }
> > -            /* NULL-terminate for easy copying. */
> > -            reportp->messageArgs[i] = NULL;
> 
> SpiderMonkey's error reporting API, mechanism, and code is a disaster, as
> I'm sure you've discovered.  :-)  Which makes me think that here, if you're
> not going to null-terminate in the case when messageArgs were passed in from
> the start, this entry is NULL.  You should assert this fact as well, so if
> any user screws it up somehow at some point, we'll quickly discover it.

Added an assertion.

> @@ +1072,5 @@
> > +                            void *userRef, const unsigned errorNumber,
> > +                            const jschar **args)
> > +{
> > +    if (checkReportFlags(cx, &flags))
> > +        return JS_TRUE;
> 
> SpiderMonkey doesn't use JS_TRUE or JS_FALSE in new code -- we use true and
> false directly.  We also use bool instead of JSBool, everywhere except in
> APIs (at least for the moment; SpiderMonkey's transition to being C++ means
> we might remove JSBool entirely sometime).  (There might be one or two cases
> where we make exceptions, that don't immediately spring to mind; nothing in
> this function is such a case.)

Done.

> @@ +1073,5 @@
> > +                            const jschar **args)
> > +{
> > +    if (checkReportFlags(cx, &flags))
> > +        return JS_TRUE;
> > +    JSBool warning = JSREPORT_IS_WARNING(flags);
> 
> bool

Done.
Attachment #675729 - Attachment is obsolete: true
Attachment #677999 - Flags: review+
The latest spec doesn't mention useBOM internal flag anymore.
Attachment #678112 - Flags: review?(jonas)
Using -u0 to reduce the patch size.
Attachment #678162 - Flags: review?(jonas)
- Disabled tests for unpaired surrogates because Gecko doesn't comply the spec yet.
- Removed "ibm864" from and added "iso-8859-8-i" and "x-user-defined" to the encoding list. The test doesn't catch up the latest spec yet.
Attachment #678163 - Flags: review?(jonas)
The new imported test caused an assertion failure here.
https://mxr.mozilla.org/mozilla-central/source/js/src/jsexn.cpp?rev=265427a0694d#150
Attachment #676692 - Attachment is obsolete: true
Attachment #678164 - Flags: review?(bzbarsky)
Added an assertion to catch the error earlier.
Attachment #677999 - Attachment is obsolete: true
Attachment #678165 - Flags: review?(jwalden+bmo)
Comment on attachment 678164 [details] [diff] [review]
Part 4: Implement ErrorResult::ThrowTypeError

r=me
Attachment #678164 - Flags: review?(bzbarsky) → review+
Attachment #678165 - Flags: review?(jwalden+bmo) → review+
https://tbpl.mozilla.org/?tree=Try&rev=ab08d2661719
Keywords: checkin-needed
Whiteboard: [leave open] → [leave open][land 678164,678165]
Comment on attachment 678112 [details] [diff] [review]
Part 6: Remove encoding detection using BOM

Review of attachment 678112 [details] [diff] [review]:
-----------------------------------------------------------------

r=me assuming that this is what the spec says.

That said, I'm a bit surprised that the API doesn't support any type of BOM detection. But I guess this is fine to do for now and if people want BOM detection they can request it added to the spec.
Attachment #678112 - Flags: review?(jonas) → review+
Whiteboard: [leave open][land 678164,678165]
Attachment #678165 - Attachment description: Add a JSAPI to make a report with unicode arguments → Part 3: Add a JSAPI to make a report with unicode arguments
Attachment #678164 - Attachment description: Implement ErrorResult::ThrowTypeError → Part 4: Implement ErrorResult::ThrowTypeError
Attachment #675737 - Attachment description: Change some exceptions from EncodingError to TypeError → Part 5: Change some exceptions from EncodingError to TypeError
Attachment #678112 - Attachment description: Remove encoding detection using BOM → Part 6: Remove encoding detection using BOM
Attachment #678161 - Attachment description: Import the latest tests from https://code.google.com/p/stringencoding/ → Part 7: Import the latest tests from https://code.google.com/p/stringencoding/
Attachment #678162 - Attachment description: Import the latest tests from https://code.google.com/p/stringencoding/ , part 2 → Part 8: Import the latest tests from https://code.google.com/p/stringencoding/ , part 2
Attachment #678163 - Attachment description: Fixup imported tests → Part 9: Fixup imported tests
Unbitrotted
Attachment #678164 - Attachment is obsolete: true
Attachment #678829 - Flags: review+
Depends on: 1125698
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.