Closed Bug 1250191 Opened 8 years ago Closed 8 years ago

Add a way to serialize JSON canonically

Categories

(Toolkit :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox48 --- fixed

People

(Reporter: alexis+bugs, Assigned: leplatrem)

References

(Depends on 1 open bug)

Details

Attachments

(2 files, 6 obsolete files)

Attached patch canonical-json-patch.diff (obsolete) — Splinter Review
To handle signatures on content, we have to provide a canonical form of the JSON objects. Otherwise each serialization might have a different output, making it impracticable to sign or verify against a signature.

I have added this inside `services-common/canonical-json` but I think this should go somewhere else (just not sure where). Please tell me where this should leave.

I also have added tests to ensure that this is behaving properly.

This builds on top of bug 1250104, which adds JavaScript escaping library.
Attachment #8722023 - Flags: feedback?(rnewman)
Comment on attachment 8722023 [details] [diff] [review]
canonical-json-patch.diff

Review of attachment 8722023 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/common/canonical-json.js
@@ +1,1 @@
> +var {classes: Cc, interfaces: Ci, utils: Cu, results: Cr} = Components;

License block at the top, please, and `const`, not `var`.

@@ +4,5 @@
> +
> +Cu.import("resource://services-common/jsesc.js");
> +
> +/**
> + * Return the canonical form of the passed Array, sorting all the object keys

Is it an array or an object?

@@ +8,5 @@
> + * Return the canonical form of the passed Array, sorting all the object keys
> + * and ordering the passed array via the `sorter` function given as argument.
> + *
> + * @param input
> + *        The array of elements to be serialized.

Array of objects?

@@ +21,5 @@
> + * @usage
> + *        function compareElements(a, b) { return a.id - b.id }
> + *        getCanonicalJSON(listOfRecords, compareElements);
> + **/
> +function getCanonicalJSON(input, compareFunction) {

If the caller is providing the comparison function, why not have the caller provide sorted input?

@@ +42,5 @@
> + * - Does not alter "null" and "undefined" nodes;
> + * - Walks inside arrays and call this function recursively;
> + * - Reorders the object keys by inserting them in alphabetical order.
> + **/
> +function sortObjectKeys(node) {

This should be named `asCanonicalValue`: it doesn't (just) sort, and it doesn't (just) take objects! What it does is return canonical values for its input.

@@ +45,5 @@
> + **/
> +function sortObjectKeys(node) {
> +  if (node === undefined) {
> +    return undefined;
> +  } else if (typeof node !== "object" || node === null) {

You don't need `else` in the case of early returns:

```
if (node === undefined) {
  return undefined;
}
if (typeof node …) {
  return node;
}
if (…) {
  …
```

@@ +48,5 @@
> +    return undefined;
> +  } else if (typeof node !== "object" || node === null) {
> +    return node;
> +  } else if (Array.isArray(node)) {
> +    const out = [];

Simply

  return node.map(asCanonicalValue);

@@ +60,5 @@
> +    Object.keys(node).forEach(function(key) {
> +      sortedKeys.push(key);
> +    });
> +    sortedKeys.sort();
> +    const newObject = {};

`const` isn't the right keyword for something you're about to write into :)

@@ +63,5 @@
> +    sortedKeys.sort();
> +    const newObject = {};
> +    sortedKeys.forEach(function(key) {
> +      if (node.hasOwnProperty(key)) {
> +        newObject[key] = sortObjectKeys(node[key]);

This is fundamentally wrong. You're taking an object like

{"b": 1, "a": 2}

taking the keys:

["b", "a"]

sorting them:

["a", "b"]

And generating a new object by retrieving values from the first object:

{"b": 1, "a": 2}

See what I did there? Note that a JS object's key-value pairs are not ordered, so constructing a new object here gives you no guarantees about its internal structure!

If you want to generate a canonical representation of an object, you need to do one of two things:

* Pass along a separate ordering array, and retrieve keys in that order.
* Use a data structure other than a plain object (e.g., an array of pairs, or implement an OrderedMap class).

::: services/common/tests/unit/test_canonicaljson.js
@@ +16,5 @@
> +    }];
> +
> +    do_check_eq(
> +      getCanonicalJSON(input),
> +      '[{"a":["zero","one"],"b":["two","three"]}]'

This test behavior isn't guaranteed. I suspect that if you create an object with thousands of keys you'll see this start to work more like a HashMap than an ArrayMap, and this test will fail.
Attachment #8722023 - Flags: feedback?(rnewman) → feedback-
The only way to do what I think you're trying to do is serialize as you walk the object tree. You cannot produce an equivalent-but-ordered result for a JS object, because there is no concept of ordering for JS objects.
Hi Richard, and thanks for your feedback!

> I suspect that if you create an object with thousands of keys you'll see this start to work more like a HashMap than an ArrayMap, and this test will fail.

I'm currently struggling writing such a test that makes the current implementation fail. How would you write such a test?

So, it seems that:

- json.stringify is non-deterministic in theory;
- In practice, the ordering of the keys seems to be guaranteed by the current platform implementation.

So we're left in a situation where the current implementation seems to solve our use case, but shouldn't in theory (regarding the spec).

I would like to describe the failure with a test case, but can't find any way to write it right now.

Once we've got this test lined up, showing what the problem is, we could solve the problem with two different approaches:

- Use JSON.Stringify recursively on all elements but objects — as proposed by https://github.com/substack/json-stable-stringify/blob/master/index.js
- Rewrite a JSON serializer by hand, and sort the object keys before doing the serialization, as done in https://github.com/mirkokiefer/canonical-json/blob/master/index.js

Any help would be greatly appreciated!
Flags: needinfo?(rnewman)
Paired on this with :NiKo` this afternoon. Can you provide another round of feedback :rnewman ? Thanks!
Attachment #8722023 - Attachment is obsolete: true
Attachment #8727847 - Flags: review?(rnewman)
(In reply to Alexis Metaireau (:alexis) from comment #3)
> How would you write such a test?

White-box, mainly.


> - In practice, the ordering of the keys seems to be guaranteed by the
> current platform implementation.

Yes; jsobj.h suggests that the keys of an object are ordered, but it doesn't say how they're ordered or whether that ordering could change at runtime. If you try to rely on that, you're relying on an implementation detail that might change without you noticing… and that seems like the wrong call for a library that is supposed to produce canonical output!


> I would like to describe the failure with a test case, but can't find any
> way to write it right now.

It's not possible to conclusively define the issue with a simple test, precisely because the issue is that the implementation might change while still obeying its published contract.

You could try to simulate it by permuting your test objects into all of their legal representations (pseudocode):

  let testObj = {"a": 1, "b": 2, "c": 3};
  let keys = Object.keys(testObj);
  for ks in permutations(keys) {
    let permutedObj = {};
    // Insert in each possible order.
    for k in ks {
      permutedObj[k] = testObj[k];
    }
    // The objects each have the same keys and values.
    assert(sameObj(permutedObj, testObj));

    // The objects have the same canonical representations.
    assert(canonical(permutedObj) == canonical(testObj));

    // In most cases this won't be true in SpiderMonkey,
    // because the keys were inserted in different orders.
    // But it might be, so you can't uncomment this line!
    //assertNot(JSON.toString(permutedObj), JSON.toString(testObj));
  }


but I'm not sure what the value is in the test; this is an intrinsic part of handling objects in JavaScript.



> - Use JSON.Stringify recursively on all elements but objects — as proposed
> by https://github.com/substack/json-stable-stringify/blob/master/index.js

Yeah, <https://github.com/substack/json-stable-stringify/blob/master/index.js#L53> being the key.
Flags: needinfo?(rnewman)
Comment on attachment 8727847 [details] [diff] [review]
New attempt, serializing the object keys on the fly.

Review of attachment 8727847 [details] [diff] [review]:
-----------------------------------------------------------------

Overall points:

* Add tests for the edge cases: empty object, empty array, NaN, Inf, empty string, backslashes in strings.
* Split out the changes for moz-kinto-client.js into a separate patch.
* Are there any tests for canonicalized JSON already out in the world that we can test against?

::: services/common/canonical-json.js
@@ +1,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this file,
> + * You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +const {classes: Cc, interfaces: Ci, utils: Cu, results: Cr} = Components;

Remove the binds that you don't use.

@@ +20,5 @@
> + *
> + * @usage
> + *        getCanonicalJSON(listOfRecords);
> + **/
> +function getCanonicalJSON(source) {

I'd prefer this to be a method 'stringify' on an object named 'CanonicalJSON'.

@@ +24,5 @@
> +function getCanonicalJSON(source) {
> +  if (Array.isArray(source)) {
> +    const jsonArray = source.map(x => typeof x === "undefined" ? null : x);
> +    return `[${jsonArray.map(getCanonicalJSON).join(",")}]`;
> +  }

Nit: newline after these major clauses.
Attachment #8727847 - Flags: review?(rnewman)
Attachment #8727847 - Attachment is obsolete: true
After another pair session with :NiKo`, here is the updated patch.

> Are there any tests for canonicalized JSON already out in the world that we can test against?

Currently there are not. We've took the tests from the python canonical json implementation instead and made them pass.

The Canonical JSON produced here is compatible with what's described in http://webpki.org/ietf/draft-rundgren-predictable-serialization-for-json-tools-00.html minus a single point, described in section 2.2.3:

> The textual representation of numbers MUST be preserved during parsing and serialization. That is, if numbers like 3.50 and -0 are encountered during a parsing process, they MUST be serialized as 3.50 and -0 respectively although 3.5 and 0 would be the most natural outcome.

It seems that JavaScript directly converts 3.50 into 3.5, and we haven't found any way to get back the initial 3.50 value.
It's not blocking for our use case, but it means that this implementation wouldn't be compatible with the spec.

I believe that it would be okay to ship it like that, as it addresses our use case.

Let me know what you think!
Flags: needinfo?(rnewman)
Attachment #8727847 - Flags: review?(rnewman)
(In reply to Alexis Metaireau (:alexis) from comment #9)

> The Canonical JSON produced here is compatible with what's described in
> http://webpki.org/ietf/draft-rundgren-predictable-serialization-for-json-
> tools-00.html minus a single point

It looks like minus *two* key points of the spec, no?

Canonical JSON produces object keys in a universal order and numbers in a canonical format.

Predictable serialization (as defined in that doc) produces object keys in the _input_ order and numbers in the _input format_. That absolutely requires parser support.

Your code sorts object keys, so it doesn't follow that draft.

I'd ignore that doc: it's solving a very different problem -- indeed, the opposite problem! -- to canonicalization, and I generally disagree with that document's assertion that it's applicable to signing JSON.


> It's not blocking for our use case, but it means that this implementation
> wouldn't be compatible with the spec.

To me that's a requirement: you cannot implement canonicalization and be compatible with that spec!


> I believe that it would be okay to ship it like that, as it addresses our
> use case.

WFM.
Flags: needinfo?(rnewman)
Comment on attachment 8728403 [details] [diff] [review]
imported patch canonical-json

Review of attachment 8728403 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good with these nits.

::: services/common/canonical-json.js
@@ +29,5 @@
> +    }
> +
> +    if (typeof source === "number") {
> +      if (source === 0) {
> +        return (1/source) > 0 ?  "0" : "-0";

Oh god JS makes me so sad.

But if you have ES2015, use Object.is, which knows the difference:

  return (Object.is(source, -0)) ? "-0" : "0";

@@ +47,5 @@
> +      if (typeof value === "undefined") {
> +        return serial;
> +      }
> +      const jsonValue = value && value.toJSON ? value.toJSON() : value;
> +      const suffix = index !== sortedKeys.length - 1 ? "," : "";

Don't check the length property every time. It's a stored value so the lookup is quick, but decrementing it each time isn't free.

let lastIndex = sortedKeys.length - 1;
return sortedKeys.reduce( …
  const suffix = index !== lastIndex ? "," : "";

@@ +51,5 @@
> +      const suffix = index !== sortedKeys.length - 1 ? "," : "";
> +      const escapedKey = toJSON(key);
> +      return serial + `${escapedKey}:${stringify(jsonValue)}${suffix}`;
> +    }, "{") + "}";
> +  }

Nit: trailing comma. (Saves churn when adding another method to this object.)

@@ +52,5 @@
> +      const escapedKey = toJSON(key);
> +      return serial + `${escapedKey}:${stringify(jsonValue)}${suffix}`;
> +    }, "{") + "}";
> +  }
> +}

Nit: trailing semicolon.

@@ +58,5 @@
> +/**
> + * Encode the given input to valid escaped JSON.
> + * @private
> + **/
> +function toJSON(input) {

Perhaps just inline this in the two places it's used? It's a single function call.

::: services/common/tests/unit/test_canonicaljson.js
@@ +97,5 @@
> +  do_check_eq(CanonicalJSON.stringify([]), "[]");
> +});
> +
> +add_task(function* test_canonicalJSON_serializes_NaN() {
> +  do_check_eq(CanonicalJSON.stringify(NaN), "null");

JS makes me so sad.
Attachment #8728403 - Flags: review+
Status: NEW → ASSIGNED
Attachment #8727847 - Flags: review?(rnewman)
See Also: → 1262389
Assignee: alexis+bugs → mathieu
Blocks: 1263602
I gave r+ on this; assuming your changes are all addressing my comments, you don't need to request re-review.
Noted. I'm sorry.

(The intention was to switch to MozReview, I hope it's still easier for you)
Attachment #8739969 - Flags: review?(rnewman) → review?(MattN+bmo)
Attachment #8739968 - Flags: review?(rnewman) → review?(MattN+bmo)
Comment on attachment 8739968 [details]
MozReview Request: Bug 1250191 - Add a way to serialize JSON canonically

https://reviewboard.mozilla.org/r/45459/#review42911

Some initial feedback… I don't mean to clear the review flag but I can't give comments ahead of time without doing so until bug 1197879 lands.

::: services/common/canonical-json.js:5
(Diff revision 1)
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this file,
> + * You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +const {utils: Cu} = Components;

(In reply to Richard Newman [:rnewman] from comment #7)
> ::: services/common/canonical-json.js
> @@ +1,5 @@
> > +/* This Source Code Form is subject to the terms of the Mozilla Public
> > + * License, v. 2.0. If a copy of the MPL was not distributed with this file,
> > + * You can obtain one at http://mozilla.org/MPL/2.0/. */
> > +
> > +const {classes: Cc, interfaces: Ci, utils: Cu, results: Cr} = Components;
> 
> Remove the binds that you don't use.

This is actually the opposite of what we do in browser/. We recommend keep the same line with all four so developers don't need to ever think about which ones are already in scope

::: services/common/canonical-json.js:9
(Diff revision 1)
> +
> +const {utils: Cu} = Components;
> +
> +this.EXPORTED_SYMBOLS = ["CanonicalJSON"];
> +
> +Cu.import("resource://gre/modules/third_party/jsesc/jsesc.js");

This should be lazily loaded. No need to load jsec until stringify is actually used.

::: services/common/canonical-json.js:25
(Diff revision 1)
> +   * as lowercase hexadecimal.
> +   *
> +   * @usage
> +   *        getCanonicalJSON(listOfRecords);
> +   **/
> +  stringify: function stringify(source) {

Nit: new method syntax:
stringify(source) {

::: services/common/tests/unit/test_canonicaljson.js:1
(Diff revision 1)
> +/* Any copyright is dedicated to the Public Domain.
> +   http://creativecommons.org/publicdomain/zero/1.0/ */
> +

Nit: no longer necessary for test as they're PD by default so save a few lines by removing this.
Attachment #8739968 - Flags: review?(MattN+bmo)
Comment on attachment 8739969 [details]
MozReview Request: Bug 1250191 - Address rnewman review, r=rnewman

https://reviewboard.mozilla.org/r/45461/#review43187

Can you squash this into the other commit for review? We always squash follow-up commits before review so we're reviewing the commits exactly as they would land.
Attachment #8739969 - Flags: review?(MattN+bmo) → review+
https://reviewboard.mozilla.org/r/45459/#review42911

> Nit: new method syntax:
> stringify(source) {

Actually, the function name `stringify` is used to perform the recursivity. I suggest we leave it this way.
Comment on attachment 8741844 [details]
MozReview Request: Bug 1250191 - Load 3rd party lazily, r=MattN

https://reviewboard.mozilla.org/r/46791/#review43463
Attachment #8741844 - Flags: review?(MattN+bmo) → review+
https://reviewboard.mozilla.org/r/45457/#review43477

::: services/common/canonical-json.js:5
(Diff revision 2)
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this file,
> + * You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +this.EXPORTED_SYMBOLS = ["CanonicalJSON"];

This file should be named CanonicalJSON.jsm

::: services/common/canonical-json.js:27
(Diff revision 2)
> +   * as lowercase hexadecimal.
> +   *
> +   * @usage
> +   *        CanonicalJSON.stringify(listOfRecords);
> +   **/
> +  stringify: function stringify(source) {

One problem with this function is that it will cause an infinite loop if there are cycles in the `source`. I realize this isn't going to a problem for Kinto but if we're making this a module for anyone to use then it should probably handle that.

One suggestion is to do an initial JSON.stringify on the initial argument so the JSON engine can catch the cycle, another option is to keep track of the visited ancestors during recursion. Can you file a follow-up bug for this?

::: services/common/moz.build:17
(Diff revision 2)
>      'async.js',
> +    'canonical-json.js',
>      'kinto-updater.js',

Can we put this in toolkit/modules instead? It seems like this doesn't need to be specific to services

::: services/common/tests/unit/test_canonicaljson.js:3
(Diff revision 2)
> +function stringRepresentation(obj) {
> +  const clone = JSON.parse(JSON.stringify(obj));
> +  return JSON.stringify(clone);
> +}
> +
> +add_task(function* test_canonicalJSON_should_preserve_array_order() {
> +    const input = ['one', 'two', 'three'];
> +    // No sorting should be done on arrays.
> +    do_check_eq(CanonicalJSON.stringify(input), '["one","two","three"]');

This file should be using 2-space indentation (the default of mozilla-central) but it's using a mix of 2 and 4.
Blocks: 1265357
https://reviewboard.mozilla.org/r/45457/#review43477

> One problem with this function is that it will cause an infinite loop if there are cycles in the `source`. I realize this isn't going to a problem for Kinto but if we're making this a module for anyone to use then it should probably handle that.
> 
> One suggestion is to do an initial JSON.stringify on the initial argument so the JSON engine can catch the cycle, another option is to keep track of the visited ancestors during recursion. Can you file a follow-up bug for this?

Indeed! I created the bug https://bugzilla.mozilla.org/show_bug.cgi?id=1265357 and assigned it to me.
Comment on attachment 8739968 [details]
MozReview Request: Bug 1250191 - Add a way to serialize JSON canonically

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/45459/diff/1-2/
Attachment #8739968 - Flags: review?(MattN+bmo)
Attachment #8739969 - Attachment is obsolete: true
Attachment #8741844 - Attachment is obsolete: true
Attachment #8742323 - Attachment is obsolete: true
Attachment #8742323 - Flags: review?(MattN+bmo)
Attachment #8742324 - Attachment is obsolete: true
Attachment #8742324 - Flags: review?(MattN+bmo)
Comment on attachment 8739968 [details]
MozReview Request: Bug 1250191 - Add a way to serialize JSON canonically

https://reviewboard.mozilla.org/r/45459/#review43465

Thanks, I landed this with a comment about bug 1265357
Attachment #8739968 - Flags: review?(MattN+bmo) → review+
No longer blocks: 1265357
Depends on: 1265357
https://hg.mozilla.org/mozilla-central/rev/4046917f854b
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
You need to log in before you can comment on or make changes to this bug.