Support JSON encoded in UTF-16

RESOLVED WONTFIX

Status

()

Firefox
Developer Tools: JSON Viewer
P3
normal
RESOLVED WONTFIX
5 months ago
2 months ago

People

(Reporter: Oriol, Assigned: Oriol)

Tracking

unspecified
Firefox 58
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox57 fix-optional, firefox58 fixed)

Details

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

5 months ago
From https://tools.ietf.org/html/rfc7159#section-8.1,
> JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.

The JSON Viewer should support UTF-16 and UTF-32.
(Assignee)

Comment 1

5 months ago
My idea was:

1. Wait for some data, determine the encoding.
2. Set the encoding of the document to the same encoding as the data.
3. Encode the HTML to that encoding.
4. Send the data as is.

The problem is that bug 1261841 removed the capability of encoding to UTF-16 using nsIScriptableUnicodeConverter, and bug 1257877 removed it from TextEncoder.

So I guess it should be:

1. Always set the encoding of the document to UTF-8 (already done).
2. Determine the encoding of the data, decode it, and encode it to UTF-8.

I'm afraid this will be problematic if the data has encoding errors, or if the stream is segmented at a bad place.
(Assignee)

Comment 2

5 months ago
Firefox does not seem to support UTF-32, let's focus on UTF-16.
Summary: Support JSON encoded in UTF-16 or UTF-32 → Support JSON encoded in UTF-16
(Assignee)

Comment 3

4 months ago
Created attachment 8907256 [details] [diff] [review]
json-utf-16.patch
Assignee: nobody → oriol-bugzilla
Status: NEW → ASSIGNED
Attachment #8907256 - Flags: review?(odvarko)
(Assignee)

Comment 6

4 months ago
For some reason the try push didn't include the patch for 1395313 despite it was in hg out.
I hope the tests will work now that it has landed.
(Assignee)

Comment 8

4 months ago
Created attachment 8907608 [details] [diff] [review]
json-utf-16.patch

Ah, I forgot to modify browser.ini
Attachment #8907256 - Attachment is obsolete: true
Attachment #8907256 - Flags: review?(odvarko)
Attachment #8907608 - Flags: review?(odvarko)
status-firefox57: --- → fix-optional
Priority: -- → P3
Comment hidden (mozreview-request)
(Assignee)

Comment 11

4 months ago
Comment on attachment 8907608 [details] [diff] [review]
json-utf-16.patch

I changed readByteArray to readArrayBuffer for better performance with UTF-16 documents. I would have directly used it if in bug 831107 they remembered to update the MDN documentation.
Attachment #8907608 - Attachment is obsolete: true
Attachment #8907608 - Flags: review?(odvarko)

Comment 12

4 months ago
mozreview-review
Comment on attachment 8911501 [details]
Bug 1396286 - Support UTF-16 in JSON Viewer.

https://reviewboard.mozilla.org/r/182954/#review188780

Thank you for the patch.

I think the patch itself is good.  I would r+ it.  However before doing so I would like to understand the rationale for the charset choices.

I looked through RFC 7159.  It makes sense on that basis to want to expand the charsets recognized by jsonviewer.  However, I don't understand why it makes sense to assume that the transferred data will necessarily be limited to one of those charsets.  Instead, it seems to me that the response could carry any charset, and that perhaps this code should use whatever algorithm the browser is using to determine the charset.

That said, I'm not completely sure this makes sense either; but I would like to understand this before signing off on the patch.
(Assignee)

Comment 13

4 months ago
I don't know the reason either, but this is what the RFC says.

From https://tools.ietf.org/html/rfc7159#page-11,

   Note:  No "charset" parameter is defined for this registration.
   Adding one really has no effect on compliant recipients.

From https://tools.ietf.org/html/rfc7159#section-8.1,

   JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.  The default
   encoding is UTF-8, and JSON texts that are encoded in UTF-8 are
   interoperable in the sense that they will be read successfully by the
   maximum number of implementations; there are many implementations
   that cannot successfully read texts in other encodings (such as
   UTF-16 and UTF-32).

Note that XHR ignores the HTTP charset for JSON responses. To avoid confusing people, the JSON Viewer should also ignore it (see bug 741776).

Comment 14

4 months ago
mozreview-review
Comment on attachment 8911501 [details]
Bug 1396286 - Support UTF-16 in JSON Viewer.

https://reviewboard.mozilla.org/r/182954/#review191350

I think :tromey should be assigned here as the reviewer.
Honza
Attachment #8911501 - Flags: review?(odvarko)
(Assignee)

Updated

4 months ago
Attachment #8911501 - Flags: review?(ttromey)

Comment 15

4 months ago
mozreview-review
Comment on attachment 8911501 [details]
Bug 1396286 - Support UTF-16 in JSON Viewer.

https://reviewboard.mozilla.org/r/182954/#review191504

Thanks.  I'm satisfied with that answer.  I wasn't aware that the charset was ignored in the XHR request as well.
Attachment #8911501 - Flags: review?(ttromey) → review+
(Assignee)

Updated

4 months ago
Keywords: checkin-needed

Comment 16

4 months ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/469b7b77071c
Support UTF-16 in JSON Viewer. r=tromey
Keywords: checkin-needed
Backed out in https://hg.mozilla.org/integration/autoland/rev/d656e0a216b2 for a 70% chance of https://treeherder.mozilla.org/logviewer.html#?job_id=135060437&repo=autoland on ASan (and a lower but non-zero chance on other slow platforms, there was one on Linux64 debug too).
(Assignee)

Comment 18

3 months ago
The problem is that JSON Viewer is so slow that the test exceeds the timeout. I guess checking that the charset parameter is ignored does not need to be repeated for every encoding test, so I will move that to a different test file. This will halve the time needed by the encoding tests, I hope this will be enough.
Comment hidden (mozreview-request)
(Assignee)

Updated

3 months ago
Attachment #8911501 - Flags: review+ → review?(ttromey)

Comment 20

3 months ago
mozreview-review
Comment on attachment 8911501 [details]
Bug 1396286 - Support UTF-16 in JSON Viewer.

https://reviewboard.mozilla.org/r/182954/#review192264

Thank you.  This looks good.
Attachment #8911501 - Flags: review?(ttromey) → review+
(Assignee)

Updated

3 months ago
Keywords: checkin-needed

Comment 21

3 months ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/168e7a52c760
Support UTF-16 in JSON Viewer. r=tromey
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/168e7a52c760
Status: ASSIGNED → RESOLVED
Last Resolved: 3 months ago
status-firefox58: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 58
(Assignee)

Updated

3 months ago
Depends on: 1412074
Do we really support this when the HTML spec going to reject non-UTF-8 charsets that will overrides RFC?
https://github.com/whatwg/html/issues/1403

This change should be backed out.
Flags: needinfo?(hsivonen)
(Assignee)

Comment 24

2 months ago
I only tried to follow the RFC. Note that UTF-16 is only used if there is the corresponding BOM or if one of the two first bytes in NULL. In both cases this would result in invalid JSON if using UTF-8.

Note that when the JSON Viewer is enabled, application/json becomes a "explicitly supported JSON type" (https://html.spec.whatwg.org/#explicitly-supported-json-type), so https://html.spec.whatwg.org/#navigating-across-documents:json-mime-type (which seems the thing that will be modified to always use UTF-8) does not apply at all.

Anyways, this bug added some machinery used in other bugs. If you want to disable UTF-16 the simplest way would be a small modification in determineEncoding to always choose UTF-8. This could be uplifted, and a follow-up bug could cleanup the useless code and ride the trains.
kCharsetFromUtf8OnlyMime has higher precedence than kCharsetFromByteOrderMark, so we chose to intentionally ignore BOM.
Also note that RFC does not *require* supporting UTF-16 encodings. Rather, it warns about interop. concerns:
>   JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.  The default
>   encoding is UTF-8, and JSON texts that are encoded in UTF-8 are
>   interoperable in the sense that they will be read successfully by the
>   maximum number of implementations; there are many implementations
>   that cannot successfully read texts in other encodings (such as
>   UTF-16 and UTF-32).
>
>   Implementations MUST NOT add a byte order mark to the beginning of a
>   JSON text.  In the interests of interoperability, implementations
>   that parse JSON texts MAY ignore the presence of a byte order mark
>   rather than treating it as an error.

I strongly recommend to disable UTF-16 support if "because RFC says it's possible" is the only reason. We don't support UTF-32 anyway.
(Assignee)

Comment 27

2 months ago
(In reply to Masatoshi Kimura [:emk] from comment #25)
> kCharsetFromUtf8OnlyMime has higher precedence than
> kCharsetFromByteOrderMark, so we chose to intentionally ignore BOM.

The JSON Viewer switches the content type to "text/html", so kCharsetFromUtf8OnlyMime is not used.

(In reply to Masatoshi Kimura [:emk] from comment #26)
> I strongly recommend to disable UTF-16 support if "because RFC says it's
> possible" is the only reason.

Well, I think it's a nice-to-have, but I don't have a strong opinion. Honza, what do you think?
Flags: needinfo?(odvarko)
(In reply to Oriol Brufau [:Oriol] from comment #27)
> Well, I think it's a nice-to-have, but I don't have a strong opinion. Honza,
> what do you think?

Apparently you are asking a different "Honza" :)
Flags: needinfo?(odvarko) → needinfo?(honzab.moz)
(Assignee)

Comment 29

2 months ago
(In reply to Masatoshi Kimura [:emk] from comment #28)
> Apparently you are asking a different "Honza" :)

I really wanted to ask Honza Odvarko, since he is the Triage Owner. But well, I don't know who is supposed to make these decisions.
Flags: needinfo?(honzab.moz) → needinfo?(odvarko)
I think the JSON viewer should not support UTF-16-encoded JSON. I think the JSON viewer should be consistent with XHR's JSON handling, since it's confusing if the JSON viewer gives different results compared to XHR. XHR's responseType='json' supports only UTF-8, and that's intentional. The WHATWG byte serialization of JSON is defined in terms of ECMA JSON and the Encoding Standard and not in terms of IETF JSON.

It not reasonable to support IETF JSON in Firefox, because IETF JSON does unreasonable things like supporting UTF-32, which we intentionally don't support.

That is, I think this patch should be backed out.
Flags: needinfo?(hsivonen)

Comment 31

2 months ago
Indeed, we should decode all JSON using UTF-8 (stripping a single leading UTF-8 BOM if present). We should support nothing more.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Updated

2 months ago
Depends on: 1419416
(Assignee)

Comment 32

2 months ago
I filed bug 1419416 and bug 1419471 because I couldn't push review here.

Comment 33

2 months ago
Thank you, much appreciated.

Comment 34

2 months ago
Resolving this as WONTFIX now the code has been backed out again. That makes the most sense relative to comment 0 I think.
Status: REOPENED → RESOLVED
Last Resolved: 3 months ago2 months ago
Flags: needinfo?(odvarko)
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.