In telemetry, don't do malloc_usable_size(vec[0]) where vec is stl::vector; it's not safe

RESOLVED FIXED in Firefox 62

Status

()

defect
P3
normal
RESOLVED FIXED
7 years ago
a year ago

People

(Reporter: justin.lebar+bug, Assigned: rofael, Mentored)

Tracking

Trunk
mozilla62
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox62 fixed)

Details

Attachments

(2 attachments, 2 obsolete attachments)

See bug 806377 comment 1 - 5:

histogram.cc does malloc_size_of(vec[0]), where vec is an stl::vector.  That's not safe because the vector might use inline storage (like nsAutoTArray), or vec[0] might not be the start of a malloc'ed block.

Instead, we should use a custom allocator, or some other trick here.
Blocks: 746714
A custom allocator sounds better, but how would it work?  Would it count the number of bytes allocated?  Would it know about frees?  For it to integrate with DMD, all the heap blocks need to be measured with an nsMallocSizeOfFun.
The allocator would wrap the default allocator, which wraps new and delete.  So you'd probably want to do a counter-based memory reporter, but you could still call nsMallocSizeOfFn and you could still track deletes.
Ah yep, we already wrap hunspell's allocator in a similar way and DMD has machinery to handle it.  Good.
A tempting thing to do would be to have one allocator per vector and let that allocator store the amount of memory used by that vector.

That is, the vector's allocator would contain a counter and add malloc_usable_size to it on every allocation, subtracting malloc_usable_size on every de-allocation.

Unfortunately C++03 [1] rules this out.  In section 20.1.5, it says:

  "All instances of a given allocator type are required to be interchangeable and always compare equal to each other."

So that means that the allocators are going to have to increment and decrement a global variable.

[1] I don't want to link to it because it's not freely-distributable, but try Googling "ISO/IEC 14882:2003".
In section 19.4.1 of Stroustrup's 3rd edition (see http://mmc.geofisica.unam.mx/utils/CyC++/Documentacion/The%20C++%20Programming%20Language,%203rd%20Edition%20-%20%28pdf%29/ch19.pdf) he claims that "Allocators can still hold data on a per-allocator-type basis", but if I've understood his example correctly this requires a non-standard, extended allocator type and so it doesn't work on std types like |vector|.

Assuming that's right, my best idea involves (a) an allocator class that has a static |mLiveBytes| member, and (b) a macro that generates instances of this allocator class.

E.g.:

  #define ALLOCATOR(name)                     \
    template <class T> class name#Allocator { \
    private:                                  \
      static const size_t mLiveBytes;         \
    public:                                   \
      /* All the usual allocator stuff */     \
  }

We'd create as many instances of this class as necessary, effectively achieving per-allocator byte counting.  The macro is a hack but I don't see how else to do it.  I guess this would end up in mfbt/.
I'm not sure you need a macro.  We could instead use the curiously-recurring template pattern.

template<class OuterAlloc>
class CountingAllocator {
  // usual allocator stuff
  void* Allocate(size_t size) {
    void* p = malloc(size);
    OuterAlloc::NoteAllocation(malloc_usable_size(p));
    return p;
  }

  void Free(void* p) {
    OuterAlloc::NoteDeallocation(malloc_usable_size(p));
  }
};

class MyAllocator : public CountingAllocator<MyAllocator>
{
  void NoteAllocation(size_t s);
  void NoteDeallocation(size_t s);
};

Perhaps that's marginally better than the macro.  Perhaps CountingAllocator isn't the right name.
Even with the approach in comment 5, this thing is still gonna be a pain to use.  To measure all the histograms we'll need something like this:

  size_t n = 0;
  for each h in Histograms {
    n += h->SizeOfIncludingThis();   // measures everything but counts_ and ranges_
  }
  n += RangesAllocator.LiveBytes();  // measures all the counts_ fields
  n += CountsAllocator.LiveBytes();  // measures all the ranges_ fields
  
At this point I think the least worst approach is to simply convert these two fields to nsTArray.
This patch just converts Histogram::{ranges_,counts_} from vector<int> to
nsTArray<int>.
Attachment #679542 - Flags: review?(nfroyd)
Assignee: nobody → n.nethercote
Comment on attachment 679542 [details] [diff] [review]
Convert Histogram's std::vector members to nsTArray to facilitate memory reporting.

Review of attachment 679542 [details] [diff] [review]:
-----------------------------------------------------------------

I have no problem with the code; the only twinge of worry I feel is that this is the first depend-on-xpcom-headers change to the base ipc/chromium bits.  But we already depend on xpcom headers elsewhere in ipc/ and I think we've basically given up on wholesale imports of upstream for this bit of code, so it probably doesn't matter.

::: ipc/chromium/src/base/histogram.cc
@@ +451,5 @@
>      range_checksum_(0),
>      sample_() {
> +  for (uint32_t i = 0; i < bucket_count + 1; i++) {
> +    ranges_.AppendElement(0);
> +  }

Since we do this loop in both constructors, can we just move it to Initialize()?  I understand the conceptual nicety of consistency with what was there before, but there's no point in duplicating the code.

@@ +580,5 @@
>  }
>  
>  uint32_t Histogram::CalculateRangeChecksum() const {
> +  DCHECK_EQ(ranges_.Length(), bucket_count() + 1);
> +  uint32_t checksum = static_cast<uint32_t>(ranges_.Length()); // Seed checksum.

Nit: No need for cast, Length() returns uint32_t.

@@ +718,5 @@
> +  size_t oldLength = counts_.Length();
> +  size_t newLength = histogram.bucket_count();
> +  counts_.SetLength(newLength);
> +  for (uint32_t i = oldLength; i < newLength; i++) {
> +    counts_[i] = 0;  

Nit: Trailing whitespace.

@@ +761,5 @@
>    // negative values when snapshots are later combined (and deltas calculated).
>    // As a result, we don't currently CHCEK() for positive values.
>    sum_ -= other.sum_;
>    redundant_count_ -= other.redundant_count_;
> +  for (size_t index = 0; index < counts_.Length(); ++index) {

Nit: |index| should really be uint32_t, for consistency.  Or size_t everywhere.  Or Counts::size_type.  Whichever you like, just be consistent.  Here and elsewhere.
Attachment #679542 - Flags: review?(nfroyd) → review+
Blocks: 719181
I get lots of test failures like this:

TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 200 bytes during test execution
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 50 instances of nsTArray_base with size 4 bytes each (200 bytes total)


I don't understand this.  The nsTArrays are within Histogram.
(In reply to Nicholas Nethercote [:njn] from comment #10)
> I get lots of test failures like this:
> 
> TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 50
> instances of nsTArray_base with size 4 bytes each (200 bytes total)
> 
> I don't understand this.  The nsTArrays are within Histogram.

This is because we leak Histograms semi-deliberately.  See e.g. bug 717630 and:

http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/histogram.cc#95

I suppose we could try to enforce that nobody is reading/writing to histograms post-XPCOM shutdown and therefore use ClearOnShutdown with them.
> This is because we leak Histograms semi-deliberately.

Sigh.  Did this not show up as a leak before because it was std::vector, and we can't detect leaks of std::vector?
(In reply to Nicholas Nethercote [:njn] from comment #12)
> > This is because we leak Histograms semi-deliberately.
> 
> Sigh.  Did this not show up as a leak before because it was std::vector, and
> we can't detect leaks of std::vector?

Yeah, aiui trace-malloc reports leaks of objects which are registered with it; it doesn't report leaks of malloc's.
This patch moves around the initialization of StatisticsRecorder so that it
will be deleted at XPCOM shutdown.  IIUC, that should enable the histograms
and the data within them to be properly noted by the leak checking machinery
(though freeing the histograms will require the patch in bug 824647).
Attachment #721819 - Flags: review?(taras.mozilla)
> This patch moves around the initialization of StatisticsRecorder so that it
> will be deleted at XPCOM shutdown.

Cool!  Thanks for doing this.
The code trying to prevent InitializeStatisticsRecorder from being called
more than once was just braindead.  I think we'll just have to accept that
if people attempt to record telemetry post-XPCOM shutdown that things will
crash.  Maybe that's not such a bad thing, since the data being recorded
would have no way of being sent in anyway...
Attachment #721819 - Attachment is obsolete: true
Attachment #721819 - Flags: review?(taras.mozilla)
Attachment #722245 - Flags: review?(taras.mozilla)
Comment on attachment 722245 [details] [diff] [review]
initialize StatisticsRecorder lazily so it can be properly freed for leak-checking purposes

This isn't going to work as-is: we destroy StatisticsRecorder and associated histograms at xpcom-shutdown.  But we are still holding pointers to histograms in Telemetry itself, and C++ clients can still get at them.  Accessing freed memory never ends well.
Attachment #722245 - Flags: review?(taras.mozilla)
Depends on: 849272
Comment on attachment 722245 [details] [diff] [review]
initialize StatisticsRecorder lazily so it can be properly freed for leak-checking purposes

This work has been moved over to bug 849272.
Attachment #722245 - Attachment is obsolete: true
It seems as though the concerns in this bug may have been ameliorated by the work in bug 798914?
Flags: needinfo?(n.nethercote)
(In reply to Chris H-C :chutten from comment #19)
> It seems as though the concerns in this bug may have been ameliorated by the
> work in bug 798914?

That's not relevant, AFAICT.

Measuring the memory usage of an std::vector like this is still dubious, as per comment 0. In practice it doesn't seem to be a problem though. The proper fix is still to convert these std::vector instances to nsTArray or mozilla::Vector.
Flags: needinfo?(n.nethercote)
Mentor: chutten
Priority: -- → P3
Assignee: n.nethercote → nobody
Assignee

Comment 21

a year ago
Per comment 20, what scope would all of those replacements need to be done in? Just histogram.cc (vector usage appears to be entirely internal if I'm reading https://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/histogram.h correctly)?
Flags: needinfo?(chutten)
That is my reading of it as well, yes. But my approach thus far has been to mostly ignore histogram.{h,cc} and hope it keeps working, so I may be wrong :)
Flags: needinfo?(chutten)
Assignee

Comment 23

a year ago
I might as well pick this up. Seems to be a straightforward fix.
Assignee: nobody → me
Status: NEW → ASSIGNED
Assignee

Comment 24

a year ago
Okay. not as easy as I thought. I ran it on Try and for some reason it fails lots of xpcshell tests on linux64 opt. https://treeherder.mozilla.org/#/jobs?repo=try&revision=990bf5177146a38c7a2783abc945c074a2733358

Any ideas before I dig through those tests? I assume it's relevant because I see others are passing it no problem.
Flags: needinfo?(chutten)
Sounds as though there's a crash:

[task 2018-05-24T03:06:37.253Z] 03:06:37     INFO - E   PID 38044 | ExceptionHandler::GenerateDump cloned child 38071

Though it's weird it only happened on the opt build. The debug build proceeded without issue.
Let's see if a retrigger will sort it (though if it failed twice I'm not terribly hopeful...)
Flags: needinfo?(chutten)
Assignee

Comment 26

a year ago
This time I'm only doing only opt builds across different OSs. I also rebased on top of the latest commit, just in case (I had some weird heads going on). 

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e3125bc7655832a5dd454bfb09270f9b9e687806
Assignee

Comment 27

a year ago
>/builds/worker/workspace/build/src/toolkit/components/telemetry/TelemetryHistogram.cpp:2539:11: error: no member named 'resize' in 'nsTArray<int>'

Huh, looks like there's a class (https://dxr.mozilla.org/mozilla-central/rev/47e81ea1ef10189ef210867934bf36e14cf223dc/toolkit/components/telemetry/TelemetryHistogram.cpp#2539) that inherited from SampleSet. 

> PersistedSampleSet::PersistedSampleSet(const nsTArray<Histogram::Count>& aCounts,
>                                        int64_t aSampleSum)
> {
>   // Initialize the data in the base class. See Histogram::SampleSet
>   // for the fields documentation.
>   const size_t numCounts = aCounts.Length();
>   counts_.resize(numCounts);
>   for (size_t i = 0; i < numCounts; i++) {
>     counts_[i] = aCounts[i];
>     redundant_count_ += aCounts[i];
>   }
>   sum_ = aSampleSum;
> };

I'll try again with that and see if it makes a change (wonder why the other builds didn't pick that up...).

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ee8dbed96b9cc8a588d8cf13567b7b279a91d72c
It was only just added in bug 1457127 so it is very new.
Assignee

Comment 29

a year ago
Nope, same problem. According to these, the tests to make sure the XPCShell Testing API is working are all failing (because XPCShell keeps crashing?). That has nothing to do with my code, so why would that be happening?

If it's because XPCShell can't boot at all, I'm going to try running it locally after this build and see if that can tell me something...
The tests might call into code that calls code that calls code that accumulates Telemetry in a way that's odd. If there were a way to get stacktraces from the crashing location, that'd probably help investigation.
Assignee

Comment 31

a year ago
In one of the test reports:

>[task 2018-05-24T20:50:54.244Z] 20:50:54     INFO -   1  libxul.so!InvalidArrayIndex_CRASH [nsTArray.cpp:ee8dbed96b9cc8a588d8cf13567b7b279a91d72c : 26 + 0xf]
>[task 2018-05-24T20:50:54.244Z] 20:50:54     INFO -   2  libxul.so!nsTArray_Impl<BloatEntry *, nsTArrayInfallibleAllocator>::ElementAt [nsTArray.h:ee8dbed96b9cc8a588d8cf13567b7b279a91d72c : 1031 + 0x7]
>[task 2018-05-24T20:50:54.245Z] 20:50:54     INFO -   3  libxul.so!base::Histogram::SampleSet::Resize [nsTArray.h:ee8dbed96b9cc8a588d8cf13567b7b279a91d72c : 1069 + 0x7]

Ah, in my code, I call SetLength(histogram.bucket_count()) and try to zero out counts_[0..histogram.bucket_count()]. If SetLength only increases the capacity of the array, rather than the size, my loop would be too large, explaining the crashes.
Assignee

Comment 32

a year ago
And I just realized that Resize isn't a reset. I've changed Resize so now it removes elements if it wants fewer buckets, and appends zeros if it wants more.
Assignee

Comment 33

a year ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=22b38aed2d643ae003f26b3b9b11f71c4fc8449a

Looks like that's been fixed! I'll send it to reviewboard if everything else looks good.
Comment hidden (mozreview-request)

Comment 35

a year ago
mozreview-review
Comment on attachment 8980465 [details]
Bug 806514 - Replaced std::vector with nsTArray in Histogram::SampleSet.

https://reviewboard.mozilla.org/r/246628/#review252978

::: ipc/chromium/src/base/histogram.cc:433
(Diff revision 1)
>  
>  void Histogram::SampleSet::Resize(const Histogram& histogram) {
> -  counts_.resize(histogram.bucket_count(), 0);
> +  size_t newSize = histogram.bucket_count();
> +
> +  while (newSize < counts_.Length()) counts_.RemoveLastElement();
> +  while (newSize > counts_.Length()) counts_.AppendElement(0);

Won't SetLength take care of this?

Comment 36

a year ago
mozreview-review-reply
Comment on attachment 8980465 [details]
Bug 806514 - Replaced std::vector with nsTArray in Histogram::SampleSet.

https://reviewboard.mozilla.org/r/246628/#review252978

> Won't SetLength take care of this?

Sorry, this was a little short on detail.

SetLength -should- take care of this for you. See how it's used, for instance, over here: https://searchfox.org/mozilla-central/rev/bf4def01bf8f6ff0d18f02f2d7e9efc73e12c63f/dom/events/DataTransferItemList.cpp#58

Is it possible that your SetLength was returning false for some reason (failing to resize)?
Assignee

Comment 37

a year ago
mozreview-review-reply
Comment on attachment 8980465 [details]
Bug 806514 - Replaced std::vector with nsTArray in Histogram::SampleSet.

https://reviewboard.mozilla.org/r/246628/#review252978

> Sorry, this was a little short on detail.
> 
> SetLength -should- take care of this for you. See how it's used, for instance, over here: https://searchfox.org/mozilla-central/rev/bf4def01bf8f6ff0d18f02f2d7e9efc73e12c63f/dom/events/DataTransferItemList.cpp#58
> 
> Is it possible that your SetLength was returning false for some reason (failing to resize)?

Ah, there it is! I only found that fallible one after digging through nsTArray.h. I had to set any new values to zero in SampleSet::Resize, but it's working now.
Comment hidden (mozreview-request)

Comment 39

a year ago
mozreview-review
Comment on attachment 8980465 [details]
Bug 806514 - Replaced std::vector with nsTArray in Histogram::SampleSet.

https://reviewboard.mozilla.org/r/246628/#review253330

This is looking good! Thank you for your efforts. I'll send this to autoland.

::: ipc/chromium/src/base/histogram.cc:433
(Diff revision 2)
>  
>  void Histogram::SampleSet::Resize(const Histogram& histogram) {
> -  counts_.resize(histogram.bucket_count(), 0);
> +  size_t oldSize = counts_.Length();
> +  counts_.SetLength(histogram.bucket_count());
> +
> +  for (size_t i = oldSize; i < histogram.bucket_count(); ++i) counts_[i] = 0;

In case you're wondering why you have to go to this effort, it's because of a decade-old design decision to have nsTArray behave like a "regulare C/C++ array" instead of using default initialization (https://searchfox.org/mozilla-central/rev/5a744713370ec47969595e369fd5125f123e6d24/xpcom/ds/nsTArray.h#519)
Attachment #8980465 - Flags: review?(chutten) → review+

Comment 40

a year ago
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/19e5c0a9b0af
Replaced std::vector with nsTArray in Histogram::SampleSet. r=chutten

Comment 41

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/19e5c0a9b0af
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
You need to log in before you can comment on or make changes to this bug.