Closed Bug 1092737 Opened 5 years ago Closed 5 years ago

[Encoding] Implement fatal property for TextDecoder and update both TextDecoder and TextEncoder to spec a bit

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: crimsteam, Assigned: bzbarsky)

References

(Blocks 1 open bug, )

Details

Attachments

(3 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0
Build ID: 20141027150301

Steps to reproduce:

TextDecoder.fatal and TextDecoder.ignoreBOM properites are not supported (return undefined). 

In the other site TextDecoder() constructor take second argument (object) where we can set "fatal" state:
https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder.TextDecoder

Small test:

<script type = "text/javascript">

	var decoder1 = new TextDecoder();
	
	document.write(decoder1.fatal + "<br>"); // undefined (but should be false)
	document.write(decoder1.ignoreBOM + "<br><br>"); // undefined (but should be false)
	
	var decoder1 = new TextDecoder("utf8", {fatal:true,ignoreBOM:true});
	
	document.write(decoder1.fatal + "<br>"); // undefined (but should be true)
	document.write(decoder1.ignoreBOM); // undefined (but should be true)
 
</script>

At now only Chrome support this stuff correct.
Anne, do you know if there's any tests for this?
OS: Windows 7 → All
Hardware: x86_64 → All
Version: 36 Branch → Trunk
Perhaps in Chromium's code base. You could ask jsbell (doesn't seem to have a Bugzilla account).
I guess these properties postdate us implementing the spec?  I really wish there were good ways to track this sort of thing (e.g. spec IDL published somewhere and a way to diff against it or something....)

In any case, Ms2ger, are you planning to fix this, or should I steal it?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(Ms2ger)
Not planning to right now.
Flags: needinfo?(Ms2ger)
These were added later, yes. Note also that the dictionary argument to TextEncoder's encode() was removed as it is redundant with USVString. We can probably clean that up too.
Blocks: encoding
Depends on: 1098739
So the main problem here is that I'm not sure what the ignoreBOM thing actually means in the context of Gecko's implementation....  I'll post patches that update stuff to the spec except for that bit, but someone else will need to pick this up after that.
Flags: needinfo?(VYV03354)
By default, our utf decoders will eat (or respect in terms of the spec) the BOM. To implement the ignoreBOM=true behavior, we have two options.
1. Feed a BOM to utf decoders internally whenever decoders are initialized if ignore BOM flag is set. This will prevent decoders from eating a BOM anymore.
2. Add ignoreBOM option to nsIUnicodeDecoder and use it from TextDecoder.
Flags: needinfo?(VYV03354)
We should have a way already to decode a string without removing a BOM. There's various URL pieces that need it for instance.
We don't remove it for something like http://example.com/%EF%BB%BF though (in the address bar, but perhaps that is all Firefox UI).
Because nsStandardURL does not decode the string. It just encodes with various charsets.
The internal storage of nsStandardURL is not a Unicode string but a byte array.
Sorry, I meant that for display purposes we decode it (sometimes, not here apparently).

Anyway, we need some way to decode without removing the BOM, but I guess we should track that separately?
(In reply to Anne (:annevk) from comment #14)
> Sorry, I meant that for display purposes we decode it (sometimes, not here
> apparently).

We have NS_ConvertUTF8ToUTF16, but it does not support streaming.

> Anyway, we need some way to decode without removing the BOM, but I guess we
> should track that separately?

Bug 634541 already exists.
A note: dom/bindings/test/test_exception_messages.html depends on TextEncoder.encode taking a dictionary second argument.
OK.  So do we want to scope this down to just exposing "fatal" for now and put off all the rest of the changes to a separate bug?
Sounds good.
OK, spun off ignoreBOM into bug 1102679.
Attachment #8522679 - Attachment is obsolete: true
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Attachment #8522680 - Attachment is obsolete: true
Attachment #8526544 - Flags: review?(VYV03354) → review+
Comment on attachment 8526548 [details] [diff] [review]
part 2.  Update TextDecoder to various spec changes

Review of attachment 8526548 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/encoding/TextDecoder.cpp
@@ +107,5 @@
> +  }
> +  const ArrayBufferViewOrArrayBuffer& buf = aBuffer.Value();
> +  uint8_t* data;
> +  int32_t length; // XXXbz Should the other Decode signature really
> +                  // take a signed int, and not size_t, here?

nsIUnicodeDecoder::Convert takes int32_t* and it uses -1 to indicate an error :(
Attachment #8526548 - Flags: review?(VYV03354) → review+
I'll update the comment accordingly.

Do you think we need to worry about array buffers that are >2GB?  Seems like we should at least safely throw on those...
I'm for throwing an exception. Maybe NS_ERROR_OUT_OF_MEMORY?
How about this?
Attachment #8526870 - Flags: review?(VYV03354)
Attachment #8526870 - Flags: review?(VYV03354) → review+
   https://hg.mozilla.org/integration/mozilla-inbound/rev/fee44dd3af53
   https://hg.mozilla.org/integration/mozilla-inbound/rev/762d9365ed89
Summary: [Encoding] Implement fatal and ignoreBOM properties for TextDecoder → [Encoding] Implement fatal property for TextDecoder and update both TextDecoder and TextEncoder to spec a bit
Target Milestone: --- → mozilla36
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.