Closed Bug 1067699 Opened 5 years ago Closed 5 years ago

Add mfbt/JSONWriter.h and use it for memory reporting

Categories

(Core :: MFBT, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla35

People

(Reporter: njn, Assigned: njn)

References

Details

Attachments

(2 files, 4 obsolete files)

Memory reports are JSON, and the code for generating the JSON is ugly and ad
hoc. I'm in the process of converting DMD to use JSON for output as well (bug
1044709) so it seems like a good idea to have a library for doing this. Rolling
our own library seems reasonable because:

- generating JSON isn't very hard, and

- most JSON libs require building up a big data structure, which we want to
  avoid.
Comment on attachment 8489749 [details] [diff] [review]
Add mfbt/JSONWriter.h and use it for memory reporting

Review of attachment 8489749 [details] [diff] [review]:
-----------------------------------------------------------------

It would be *most* lovely if whatever JSON writing bits in the JS engine were made to use this instead.  (That's not for this bug, I think, unless you feel really motivated.)  There may be good reasons to keep things separate; I see the JS engine's version contains comments detailing the spec steps involved, for instance.

::: mfbt/JSONWriter.h
@@ +79,5 @@
> +// places we don't want it to.
> +class JSONWriteFunc
> +{
> +public:
> +  virtual void Write(const char* aStr) const = 0;

I am not convinced about the need for |const| here (or on JSONWriter's methods itself, really).  I realize that the idea behind all this stuff is to have a streaming API, but if you have:

class BufferingJSONWriter : JSONWriterFunc {
  nsCString mBuffer;   // or similar
};

the |const| requirement here forces whatever is storing the buffer in the subclass to be |mutable|.  My perception is that |mutable| is not something that should be widely used, much less that API design should require to be used.

Can you provide some background on why |const| was used so widely in this API?

@@ +101,5 @@
> +  //
> +  class EscapedString
> +  {
> +    static_assert(sizeof(char) == 1,
> +                  "JSONWriter assumes that chars are not multi-byte");

This is a tautological assert; see the C++ standard, [expr.sizeof]p1.  I think what you really want is something like:

#include <limits.h>

static_assert(CHAR_BIT == 8, ...);

@@ +119,5 @@
> +      MOZ_ASSERT_IF(!mIsOwned, !mOwnedStr.get() &&  mUnownedStr);
> +    }
> +
> +  public:
> +    EscapedString(const char* aStr)

This constructor should be |explicit|.

@@ +126,5 @@
> +    {
> +      const char* p;
> +      char* q;
> +
> +      // First, see if we need to modify the string.

Instead of doing a two-pass version, you could do a one-pass version similar to how the JSON engine does it, writing maximal-length unescaped substrings in one go.  mozilla::Vector could be used for the buffer management.  Might get complicated if you want to keep the owned/unowned distinction, though...

@@ +151,5 @@
> +      }
> +
> +      // Escapes are needed. We'll create a new string.
> +      size_t len = (p - aStr) + nExtra;
> +      char* str = q = new char[len + 1];

Slightly fancier would be to make this UniquePtr<char[]> so you could Move it into mOwnedStr below.  But that would require keeping a separate index variable.

@@ +185,5 @@
> +    {
> +      SanityCheck();
> +    }
> +
> +    static char hexDigitToAsciiChar(uint8_t u)

I daresay this doesn't need to be public.

@@ +287,5 @@
> +    mWriter->Write(aEndChar);
> +  }
> +
> +public:
> +  // JSONWriter takes ownership of |aWriter|, which must be heap-allocated.

Maybe have it take UniquePtr instead, to show the ownership constraint in the API?

@@ +298,5 @@
> +  }
> +
> +  // Returns the JSONWriteFunc passed in at creation, for temporary use. The
> +  // JSONWriter object still owns the JSONWriteFunc.
> +  JSONWriteFunc* WriteFunc() const { return mWriter.get(); }

Have this return const UniquePtr<JSONWriteFunc>& ?  I guess you could |Move()| the writer out, so that would be bad, but at least the API is a little clearer.

I do like the UniquePtr<> usage for class members in this patch.  I'm not yet sure how wild we should go with sprinkling UniquePtr<> throughout the API contracts in MFBT and/or Gecko code, so please feel free to offer an opinion!

@@ +342,5 @@
> +  // Prints: <aDouble>
> +  void DoubleElement(double aDouble) const
> +  {
> +    char buf[64];
> +    sprintf(buf, "%f", aDouble);

I am leery of using sprintf widely in this API because of differences between implementations for floating-point printing and because ISTR good reasons why sprintf usage is discouraged on Windows, though I can't recall exact reasons at this time.  Maybe those reasons just relate to return values and/or improper termination of the string in some cases, neither of which are a concern here...

For doubles, at least, we could use the FP formatting stuff in MFBT itself.

@@ +370,5 @@
> +    EscapedString escapedStr(aStr);
> +    QuotedScalar(aName, escapedStr.get());
> +  }
> +
> +  // Prints: "<aPtr>"

Nit: "...as a hexadecimal integer with a leading 0x"?

@@ +371,5 @@
> +    QuotedScalar(aName, escapedStr.get());
> +  }
> +
> +  // Prints: "<aPtr>"
> +  void PointerAsStringElement(const void* aPtr) const

It is a little odd to have the previous things be ${THING}{Element,Property} but this be ${THING}AsString{Element,Property}.  I think one could make a reasonable argument that:

void PointerProperty(const void* aPtr) const

could really only print the thing as a string.

@@ +387,5 @@
> +    QuotedScalar(aName, buf);
> +  }
> +
> +  // Prints: [
> +  void StartArrayElement() const { StartCollection(nullptr, "["); }

WDYT about having a MOZ_STACK_CLASS |AutoArrayElement| and |AutoArrayProperty(const char* aName)|, along with the analogous Object versions?

@@ +415,5 @@
> +// The chars with non-'___' entries in this table are those that can be
> +// represented with a two-char escape sequence. The value is the second char in
> +// the sequence, that which follows the initial backslash.
> +#define ___ 0
> +/*static*/ const char JSONWriter::EscapedString::mTwoCharEscapes[256] = {

I think you want to place this in its own .cpp file, so multiple places that #include "JSONWriter.h" don't multiply-define this.

::: mfbt/tests/TestJSONWriter.cpp
@@ +309,5 @@
> +    }
> +    w.StringProperty("ascii", buf);
> +
> +    // Test lots of unicode stuff. Note that this file is encoded as UTF-8.
> +    w.BoolProperty("مرحبا هناك", true);

Out of curiosity, have you verified that MSVC does the Right Thing with this?
Attachment #8489749 - Flags: review?(nfroyd) → feedback+
Thank you for the fast and attentive review, which will result in better code being committed :)

> It would be *most* lovely if whatever JSON writing bits in the JS engine
> were made to use this instead.  (That's not for this bug, I think, unless
> you feel really motivated.)  There may be good reasons to keep things
> separate; I see the JS engine's version contains comments detailing the spec
> steps involved, for instance.

Alas, I think that would be putting a square peg in a round hole.


> Can you provide some background on why |const| was used so widely in this
> API?

I was ambivalent about this exact point, so I'll remove it and the |mutable| occurrences.


> static_assert(CHAR_BIT == 8, ...);

I'll just remove it. I bet that assumption is baked in 100 other places.


> Instead of doing a two-pass version, you could do a one-pass version similar
> to how the JSON engine does it, writing maximal-length unescaped substrings
> in one go.  mozilla::Vector could be used for the buffer management.  Might
> get complicated if you want to keep the owned/unowned distinction, though...

My goal was to minimize memory usage. I don't understand your suggestion.


> > +      // Escapes are needed. We'll create a new string.
> > +      size_t len = (p - aStr) + nExtra;
> > +      char* str = q = new char[len + 1];
> 
> Slightly fancier would be to make this UniquePtr<char[]> so you could Move
> it into mOwnedStr below.  But that would require keeping a separate index
> variable.

I think I'll leave it as is. I like the setting of the class members at the end, for symmetry with the other case.


> > +public:
> > +  // JSONWriter takes ownership of |aWriter|, which must be heap-allocated.
> 
> Maybe have it take UniquePtr instead, to show the ownership constraint in
> the API?

Ok. I'll do likewise with JSONWriter's constructor.


> > +  // Returns the JSONWriteFunc passed in at creation, for temporary use. The
> > +  // JSONWriter object still owns the JSONWriteFunc.
> > +  JSONWriteFunc* WriteFunc() const { return mWriter.get(); }
> 
> Have this return const UniquePtr<JSONWriteFunc>& ?  I guess you could
> |Move()| the writer out, so that would be bad, but at least the API is a
> little clearer.

I tried that, but the uses all involve immediately downcasting, and I couldn't work out how to do that with a |const UniquePtr<JSONWriteFunc>&|. I didn't try *that* hard because I figured it wasn't worth the effort -- a turd with a bow on it is still a turd :)


> I do like the UniquePtr<> usage for class members in this patch.  I'm not
> yet sure how wild we should go with sprinkling UniquePtr<> throughout the
> API contracts in MFBT and/or Gecko code, so please feel free to offer an
> opinion!

I'm still learning UniquePtr too. It's got some nice parts, but the inability to nicely express a borrowed use is unfortunate.


> For doubles, at least, we could use the FP formatting stuff in MFBT itself.

Good idea. I've changed it, and the output is much nicer. However, it did require some build system jiggery-pokery, because double-conversion.h is currently weird. I've put that in a preliminary patch.


> WDYT about having a MOZ_STACK_CLASS |AutoArrayElement| and
> |AutoArrayProperty(const char* aName)|, along with the analogous Object
> versions?

I considered that. It's nice if you're emitting all your JSON in a single function. But if you're split across functions -- e.g. as happens with the memory reporter dumping, due to the use of async functions -- then they're no use, because you'd have to maintain an explicit stack of them, which is no better than just calling the End() functions.

Furthermore, unlike many RAII cases where the consequences of non-destruction are subtle, if you forget the End() functions you'll detect the problem with the most basic of testing.

So I don't think it's worth it. But I'll add a comment about this.


> > +/*static*/ const char JSONWriter::EscapedString::mTwoCharEscapes[256] = {
> 
> I think you want to place this in its own .cpp file, so multiple places that #include "JSONWriter.h"
> don't multiply-define this.

You're right, but putting it into a .cpp file caused link errors I don't understand. Something about MFBT .cpp files going into libmozglue. So I just made the table creation dynamic instead, which is easy in a table that's mostly zeroes.


> > +    // Test lots of unicode stuff. Note that this file is encoded as UTF-8.
> > +    w.BoolProperty("مرحبا هناك", true);
> 
> Out of curiosity, have you verified that MSVC does the Right Thing with this?

No, but the try server will have my back!
double-conversion.h isn't exported from MFBT. The two files that use it
(jsnum.cpp and nsTSubstring.cpp) access it via LOCAL_INCLUDES, which is totally
gross.

This patch just exports it like a normal file, such as mfbt/decimal/Decimal.h.
Straightforward, except that utils.h also must be exported, which is a little
unfortunate.

froydnj, feel free to punt this review to mshal if you're not confident in you
build system quasi-peerdom :)
Attachment #8490554 - Flags: review?(nfroyd)
Changes in this version:

- Added a comment why there's no RAII facilities for closing collections.

- Used UniquePtr<> in more places.

- Renamed PointerAsString*() as Pointer*().

- Used MFBT formatting for FP values.

- Removed |const| from JSONWriteFunc::Write().

- Made mTwoCharEscapes dynamic.

- A couple of other small things.
Attachment #8490555 - Flags: review?(nfroyd)
Attachment #8489749 - Attachment is obsolete: true
Comment on attachment 8490555 [details] [diff] [review]
(part 2) - Add mfbt/JSONWriter.h and use it for memory reporting

Review of attachment 8490555 [details] [diff] [review]:
-----------------------------------------------------------------

Random driveby comments after seeing this in mail, no need for formal reviewing or whatever.

::: mfbt/JSONWriter.h
@@ +3,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +// A typical JSON-writing library requires you to first build up a data

This could use a /* JSON generation class for non-critical uses. */ comment or so, to act as summary on MXR.

@@ +15,5 @@
> +//
> +// The API is slightly uglier than you would see in a typical JSON-writing
> +// library, but still fairly easy to use. It's possible to generate invalid
> +// JSON with JSONWriter, but typically the most basic testing will identify any
> +// such problems.

You're awfully trusting.

@@ +28,5 @@
> +// pointers) and the consequences of defects are more subtle.
> +//
> +// Importantly, the class does solve the two hard problems of JSON generation,
> +// which are (a) correctly escaping strings, and (b) adding appropriate
> +// indentation and commas between items.

Add in "human-readable" before "JSON generation", because prettyprinting isn't really a JSON generation concern.

@@ +42,5 @@
> +//   JSONWriter w(MakeUnique<MyWriteFunc>());
> +//   w.Start();
> +//   {
> +//     w.BoolProperty("bool", true);
> +//     w.IntProperty("int", 1);

Erm, mfbt consistently uses camelCaps for method names.  My understanding was that we were sticking to camelCaps given the the large presence of such names throughout the code, both in Mozilla code (JS, XPConnect, etc.) and in imported code making absolute InterCaps adherence an impossibility.

@@ +57,5 @@
> +//     w.EndArrayProperty();
> +//   }
> +//   w.End();
> +//
> +// will produce the following output:

I suggest "will produce prettyprinted formatting for this JSON structure", so as to be slightly vague about the exact makeup.  Exact structuring for pretty-printed data usually isn't a great idea, because then people start cheating and expecting more than the baseline syntax.

@@ +66,5 @@
> +//   "string": "hello",
> +//   "array": [
> +//    3.4,
> +//    {
> +//     "0x12345678"

Uh...this is not right.

@@ +134,5 @@
> +      return u < 10 ? '0' + u : 'a' + (u - 10);
> +    }
> +
> +  public:
> +    EscapedString(const char* aStr, char aTwoCharEscapes[256])

If you really want a 256-element array, make it a reference to such an array.  This is just a pointer.

@@ +166,5 @@
> +      }
> +
> +      // Escapes are needed. We'll create a new string.
> +      size_t len = (p - aStr) + nExtra;
> +      char* str = q = new char[len + 1];

The traditional way to do this would be a UniquePtr<char[]> coupled with a |size_t i| to index into it.  Raw pointers into singly-owned arrays aren't much good, and you're almost there with UniquePtr, practically.

@@ +209,5 @@
> +  };
> +
> +  const UniquePtr<JSONWriteFunc> mWriter;
> +  Vector<bool, 8> mNeedComma;     // do we need a comma at depth N?
> +  int mDepth;                     // the current nesting depth

This is never negative, and it's a count, so make it a size_t.  Same for |i| in indent().

@@ +296,5 @@
> +  }
> +
> +public:
> +  // JSONWriter takes ownership of |aWriter|, which must be heap-allocated.
> +  explicit JSONWriter(UniquePtr<JSONWriteFunc> aWriter)

This is inherent in the interface, so no need to call it out.

@@ +304,5 @@
> +  {
> +    // The chars with non-'___' entries in this table are those that can be
> +    // represented with a two-char escape sequence. The value is the second
> +    // char in the sequence, that which follows the initial backslash.
> +    mozilla::PodArrayZero(mTwoCharEscapes);

I don't understand why this is a thing in every writer rather than static const data in a JSONWriter.cpp.  Sure, defining big mostly-zeroed arrays is obnoxious, but it only needs to happen once.

@@ +366,5 @@
> +  // Prints: <aDouble>
> +  void DoubleElement(double aDouble) { DoubleProperty(nullptr, aDouble); }
> +
> +  // Prints: "<aName>": "<aStr>"
> +  // Escapes the string as needed.

both strings

@@ +375,5 @@
> +  }
> +
> +  // Prints: "<aStr>"
> +  // Escapes the string as needed.
> +  void StringElement(const char* aStr) { StringProperty(nullptr, aStr); }

Probably worth adding nullElement() for JSON-completeness.

::: mfbt/tests/TestJSONWriter.cpp
@@ +8,5 @@
> +#include "mozilla/JSONWriter.h"
> +#include <stdio.h>
> +#include <string.h>
> +
> +using namespace mozilla;

We do specific using statements in tests, rather than open up the entirety of Pandora's box.
Comment on attachment 8490554 [details] [diff] [review]
(part 1) - Export double-conversion.h normally from MFBT

Review of attachment 8490554 [details] [diff] [review]:
-----------------------------------------------------------------

I think I can review this even without being a build peer.
Attachment #8490554 - Flags: review?(nfroyd) → review+
Comment on attachment 8490555 [details] [diff] [review]
(part 2) - Add mfbt/JSONWriter.h and use it for memory reporting

Review of attachment 8490555 [details] [diff] [review]:
-----------------------------------------------------------------

I have very little to add to Jeff's comments.

::: mfbt/JSONWriter.h
@@ +35,5 @@
> +// char*| throughout, and can be ASCII or UTF-8.
> +//
> +// EXAMPLE
> +// -------
> +// Assume that |MyWriteFunc| is class that implements |JSONWriteFunc|. The

Nit: "is a class"

@@ +304,5 @@
> +  {
> +    // The chars with non-'___' entries in this table are those that can be
> +    // represented with a two-char escape sequence. The value is the second
> +    // char in the sequence, that which follows the initial backslash.
> +    mozilla::PodArrayZero(mTwoCharEscapes);

Ditto what Jeff said.  I believe the magic ingredient that you're missing is tagging the declaration with MFBT_DATA.
Attachment #8490555 - Flags: review?(nfroyd) → review+
> Erm, mfbt consistently uses camelCaps for method names.  My understanding
> was that we were sticking to camelCaps given the the large presence of such
> names throughout the code, both in Mozilla code (JS, XPConnect, etc.) and in
> imported code making absolute InterCaps adherence an impossibility.

mfbt/STYLE file says the following.

> Some of the files use a lower-case letter at the start of function names.
> This is because MFBT used to use a different style, and was later converted
> to standard Mozilla style. These functions have not been changed to use an
> upper-case letter because it would cause a lot of churn in other parts of the
> codebase. However, new files should follow standard Mozilla style and use an
> upper-case letter at the start of function names.

Thank you for the extra review! I've addressed all your other comments.


> I believe the magic ingredient that you're missing is tagging the declaration
> with MFBT_DATA.

Aha! Excellent.
Updated version... the MFBT_DATA thing works locally but causes lots of bustage
on try :(
Attachment #8490555 - Attachment is obsolete: true
Here's the version using a global variable instead, which avoids the link
errors, and which froydnj r+'d via IRC.
Attachment #8491190 - Attachment is obsolete: true
froydnj suggested putting it inside mozilla::detail.
Attachment #8491198 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/6fc33b189e9f
https://hg.mozilla.org/mozilla-central/rev/abe819e638a9
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
(In reply to Nicholas Nethercote [:njn] from comment #9)
> mfbt/STYLE file says the following.
> 
> > Some of the files use a lower-case letter at the start of function names.
> > This is because MFBT used to use a different style, and was later converted
> > to standard Mozilla style. These functions have not been changed to use an
> > upper-case letter because it would cause a lot of churn in other parts of the
> > codebase. However, new files should follow standard Mozilla style and use an
> > upper-case letter at the start of function names.

This is not what I remember agreeing to.  My understanding was that camelCaps for method names, InterCaps for class names would remain, precisely because the style is widely used, is prevalent in the JS engine and in XPConnect as well as elsewhere, and there would be no possibility of us expunging the third-party code using this capitalization convention to achieve actual uniformity.

But whatever, this is landed, I have better things to worry about right now.
You need to log in before you can comment on or make changes to this bug.