Closed Bug 1278821 Opened 3 years ago Closed 2 years ago

Re-factor JS-exposed TelemetryHistogram:: functions so as to avoid both races and deadlocks

Categories

(Toolkit :: Telemetry, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: jseward, Assigned: mortificador, Mentored)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [measurement:client] [lang=c++])

Attachments

(1 file, 6 obsolete files)

Copied from bug 1258183 comment 38:

Review of attachment 8757305 [details] [diff] [review] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +1070,5 @@
> +// deadlock because the JS_ calls that they make may call back into the
> +// TelemetryHistogram interface, hence trying to re-acquire the mutex.
> +//
> +// This means that these functions potentially race against threads, but
> +// that seems preferable to risking deadlock.

Can't we avoid deadlocks properly by re-factoring the JS-exposed functions?

In general something like:
* initial JS calls (arg conversion, ...)
* lock
* internal_* calls
* unlock
* final JS calls (return value construction, serialization, ...)
OS: Unspecified → All
Priority: -- → P3
Hardware: Unspecified → All
Whiteboard: [measurement:client]
Version: unspecified → Trunk
This would at least affect:
* TelemetryHistogram::{Create,GetKeyed,GetAddon}HistogramSnapshots
* JSHistogram_*()
* JSKeyedHistogram_*()
Blocks: 1304519
Assignee: nobody → chutten
No longer blocks: 1304519
Flags: needinfo?(gfritzsche)
Blocks: 1304519
The 3-phased approach from comment 0 was already implemented for TelemetryScalar.cpp, e.g.:
https://dxr.mozilla.org/mozilla-central/rev/42c95d88aaaa7c2eca1d278399421d437441ac4d/toolkit/components/telemetry/TelemetryScalar.cpp#1177
Flags: needinfo?(gfritzsche)
Priority: P3 → P1
Priority: P1 → P2
Assignee: chutten → nobody
Priority: P2 → P3
Mentor: gfritzsche
Mentor: chutten
Whiteboard: [measurement:client] → [measurement:client] [lang=c++]
Depends on: 1362953
Could you elaborate a bit on this?
Flags: needinfo?(chutten)
Some functions in TelemetryHistogram are unprotected by locks because they contain calls into the javascript engine. These calls into JS could in turn call back into TelemetryHistogram (say, to record the length of a garbage collection) which leaves us with a re-entrancy problem that would deadlock. The same thread would both have and attempt to claim the lock.

This bug is about refactoring those functions mentioned in comment#1 to perform their JS activities without the lock (no worries here, if they trigger a Telemetry action they can gain the lock without problems), then claim the telemetry lock, perform Telemetry activities, release the lock, then perform any last JS activities before returning.

This three-phase (JS without lock, Telemetry with lock, final JS without lock) approach has worked in other parts of telemetry with decent success. Examples can be found in the JS-related functions of, for instance, TelemetryScalars.
Flags: needinfo?(chutten)
Hey Fernando, would be interested in working on this bug?
Flags: needinfo?(mortificador)
Hi Alessio,

Yes! I'd like to work on this bug :) I'll start taking a look now

Thank you very much!
Flags: needinfo?(mortificador)
Assignee: nobody → mortificador
Hi Alessio,

I have a patch, I think that I've done everything that needs to be done (please let me know if that's not the case!).
To check that I haven't broken anything I've run these tests (I'm not sure if I should run additional tests):

mach gtest *Telemetry*Histogram*
mach test toolkit/components/telemetry/tests/unit/test_TelemetryHistograms.js

All of them pass :)

Also, we could do as we did here https://bugzilla.mozilla.org/show_bug.cgi?id=1362957 and refactor the internal_* calls so they take the lock as function parameter(as a safer pattern to avoid deadlocks). I guess that would be a different bug.

I'm sure that there will be things to fix in the patch, but I hope that is good enough at least a as first approach.

Thank you very much!
Attached patch bug1278821.patch (obsolete) — Splinter Review
Attachment #8930223 - Flags: review?(alessio.placitelli)
We do actually have a bug for refactoring TelemetryHistogram locking: bug 1362953

:lvk was looking into that last week. Your work here will certainly make that job more straightforward :)
Comment on attachment 8930223 [details] [diff] [review]
bug1278821.patch

Review of attachment 8930223 [details] [diff] [review]:
-----------------------------------------------------------------

Hey Fernando!

This is a bit trickier than the other bugs, so don't be scared about the comments below :) The approach is good, we just need to deal with some complexity due to the JS/Telemetry internals exchanges.

In general, for testing, I would suggest you run the full test suites for telemetry in addition to the gtest:

./mach test toolkit/components/telemetry

Let me know if anything is unclear!

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +1137,5 @@
>    HistogramID id = data->histogramId;
> +
> +  {
> +    StaticMutexAutoLock locker(gTelemetryHistogramMutex);
> +    MOZ_ASSERT(internal_IsHistogramEnumId(id));

Ideally, we want a single |StaticMutexAutoLock| per function as holding/releasing might have impact on performance. In addition, we need all the JS-related functions to be out of the scope of our lock. For this particular function, things get a bit complicated due to the nested if(s) below which makes use of both telemetry internals and JS. Can you try to untangle that ?

@@ +1149,5 @@
>    args.rval().setUndefined();
>  
> +  {
> +    StaticMutexAutoLock locker(gTelemetryHistogramMutex);
> +    if (!internal_CanRecordBase()) {

Let's move this check a few lines below, near |internal_Accumulate|: this way we won't need to use two different lockers.

@@ +1236,5 @@
>  
> +  reflectStatus status;
> +  {
> +    StaticMutexAutoLock locker(gTelemetryHistogramMutex);
> +    status = internal_ReflectHistogramSnapshot(cx, snapshot, h);

This |internal_ReflectHistogramSnapshot| does two things: it takes a snapshot of Histogram::SampleSet (which should be protected by the lock) and then mirrors it to a JS structure (which should be outside of the lock protection). We can change this part and have:

Histogram::SampleSet ss;
h->SnapshotSample(&ss);

Within the mutex lock where we fetch |h|. Then pass it to |internal_ReflectHistogramAndSamples| (instead of |internal_ReflectHistogramSnapshot|) here, without holding any lock.

@@ +1268,5 @@
>    HistogramID id = data->histogramId;
> +
> +  {
> +    StaticMutexAutoLock locker(gTelemetryHistogramMutex);
> +    MOZ_ASSERT(internal_IsHistogramEnumId(id));

Let's move |HistogramID id = ...| and the |MOZ_ASSERT(..)| below, just before |internal_ClearHistogram|. This way we won't need two locks.

@@ +1474,5 @@
>    HistogramID id = data->histogramId;
> +
> +  {
> +    StaticMutexAutoLock locker(gTelemetryHistogramMutex);
> +    MOZ_ASSERT(internal_IsHistogramEnumId(id));

Let's drop this change, so we don't have two locks.

@@ +1571,5 @@
>    HistogramID id = data->histogramId;
> +
> +  {
> +    StaticMutexAutoLock locker(gTelemetryHistogramMutex);
> +    MOZ_ASSERT(internal_IsHistogramEnumId(id));

Let's move this |MOZ_ASSERT()| within the AutoLock below.

@@ +1597,4 @@
>  bool
>  internal_JSKeyedHistogram_Snapshot(JSContext *cx, unsigned argc, JS::Value *vp)
>  {
> +  StaticMutexAutoLock locker(gTelemetryHistogramMutex);

Let's not do this: |internal_KeyedHistogram_SnapshotImpl| contains some JS calls and we don't want to hold the mutex while they're being executed.

@@ +1606,4 @@
>  internal_JSKeyedHistogram_SubsessionSnapshot(JSContext *cx,
>                                               unsigned argc, JS::Value *vp)
>  {
> +  StaticMutexAutoLock locker(gTelemetryHistogramMutex);

Let's not do this: |internal_KeyedHistogram_SnapshotImpl| contains some JS calls and we don't want to hold the mutex while they're being executed.

@@ +1622,4 @@
>      JS_ReportErrorASCII(cx, "No key arguments supported for snapshotSubsessionAndClear");
>    }
>  
> +  StaticMutexAutoLock locker(gTelemetryHistogramMutex);

Let's not do this: |internal_KeyedHistogram_SnapshotImpl| contains some JS calls and we don't want to hold the mutex while they're being executed.

@@ +1642,5 @@
>    HistogramID id = data->histogramId;
> +
> +  {
> +    StaticMutexAutoLock locker(gTelemetryHistogramMutex);
> +    MOZ_ASSERT(internal_IsHistogramEnumId(id));

Let's move the |MOZ_ASSERT(...)| a few lines below, in the scope of the other AutoLock.

@@ +1663,3 @@
>    if (!keyed) {
>      return true;
>    }

There's something else we could do right below this point. Let's move this (https://searchfox.org/mozilla-central/rev/919dce54f43356c22d6ff6b81c07ef412b1bf933/toolkit/components/telemetry/TelemetryHistogram.cpp#1612-1627) block right below |args.rval().setUndefined();|. Let's take |bool onlySubsession = false;| out of the #ifdef and only set it false after the #else. This will allow us to simply add |keyed->Clear(onlySubsession);| and the |if (!keyed)| validity check to the scoped lock above.

@@ +2174,5 @@
>        }
>  
> +      Histogram* h = nullptr;
> +      {
> +        StaticMutexAutoLock locker(gTelemetryHistogramMutex);

This is a bit tricky: we don't really want to hold the mutex and then release it for each histogram. We would rather want to take a snapshot of all the histograms with a single lock operation. Think of it like this:

-> lock
-> get a copy of all the histograms for all the processes
-> unlock
-> iterate through the copy of the histogram data and create the JavaScript structures

You can have a look at how this was solved in TelemetryScalars (https://searchfox.org/mozilla-central/rev/919dce54f43356c22d6ff6b81c07ef412b1bf933/toolkit/components/telemetry/TelemetryScalar.cpp#2131-2166)

@@ +2268,5 @@
>        }
>  
> +      KeyedHistogram* keyed = nullptr;
> +      {
> +        StaticMutexAutoLock locker(gTelemetryHistogramMutex);

The same considerations from TelemetryHistogram::CreateHistogramSnapshots apply here as well
Attachment #8930223 - Flags: review?(alessio.placitelli)
Hi Alessio,

Cool, thank you very much for the thorough review! I'll work on all the changes and come back with a patch (or more doubts!)

Thank you very much!
(In reply to Fernando from comment #11)
> Hi Alessio,
> 
> Cool, thank you very much for the thorough review! I'll work on all the
> changes and come back with a patch (or more doubts!)
> 
> Thank you very much!

You're welcome! Keep up the great work there and feel free to ask questions whenever you want :D
Hi Alessio!

Sorry for taking so long to come back with a patch, I got a cold (is that time of the year already here in London :( ) and I was unable to work on this for a while.

I've made modifications according to your comments. In some functions it was impossible (or so I think) to only have one lock, fortunately the problem could be solved if I didn't have to hold a lock when calling internal_isHistogramEnumId, so I took a look at the body of that function to see if I could do something, and it turns out that it's thread safe, it's ok if several threads call that method at the same time, we don't need to have a lock when calling internal_isHistogramEnumId (I think). Considering that, I've been able to change the code so all the internal_* calls appear after locking a mutex, and not locking a mutex more than once per method at the same time.

In "internal_JSHistogram_Add" I've replaced:  
    if(!internal_CanRecordBase()) {
      return true;
    }
    internal_Accumulate(id, value);
    return true;

by
 internal_Accumulate(id, value);
 return true;

Because the first thing inside internal_Accumulate is this:
if (!gInitDone || !internal_CanRecordBase() ||
      internal_RemoteAccumulate(aId, aKey, aSample)) {
    return;
  }

So, internal_CanRecordBase() is already checked in that function, we don't need to check it before.
I've ran "mach test toolkit/components/telemetry" and all tests pass.

I assume that the patch is not going to be perfect yet, but I hope that it's much more closer to what you have in mind!

Thank you very much!
Attached patch bug1278821.patch (obsolete) — Splinter Review
Attachment #8930223 - Attachment is obsolete: true
Attachment #8933911 - Flags: review?(alessio.placitelli)
(In reply to Fernando from comment #14)
> Created attachment 8933911 [details] [diff] [review]
> bug1278821.patch

Sorry for the delay, I only wanted to mention that I might not be able to get to this review quickly, so you might have to wait a couple of days more.
(In reply to Alessio Placitelli [:Dexter] from comment #15)
> (In reply to Fernando from comment #14)
> > Created attachment 8933911 [details] [diff] [review]
> > bug1278821.patch
> 
> Sorry for the delay, I only wanted to mention that I might not be able to
> get to this review quickly, so you might have to wait a couple of days more.

Hi Alessio,

No problem! thank you very much :)
Comment on attachment 8933911 [details] [diff] [review]
bug1278821.patch

Review of attachment 8933911 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the huge delay.

This is the right way to go. Most of the comments are nits due to indentation: please double check that :)

I only have 2 major comments about the snapshotting:

- we should make sure to copy the data while a lock is held, then release the lock, and eventually mirror the copied data to JS;
- the keyed snapshot function might be a bit more challenging, you are free to either work on that as part of this patch or file a follow-up bug to address it :)

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +1135,4 @@
>    JSHistogramData* data = static_cast<JSHistogramData*>(JS_GetPrivate(obj));
>    MOZ_ASSERT(data);
>    HistogramID id = data->histogramId;
> +

nit: let's remove this blank line

@@ +1142,5 @@
>    // This function should always return |undefined| and never fail but
>    // rather report failures using the console.
>    args.rval().setUndefined();
>  
> +  uint32_t type = gHistogramInfos[id].histogramType;

nit: let's move this line back to where it was originally, after the MOZ_ASSERT

@@ -1142,5 @@
>    // This function should always return |undefined| and never fail but
>    // rather report failures using the console.
>    args.rval().setUndefined();
>  
> -  if (!internal_CanRecordBase()) {

Good that you found |internal_Accumulate| takes care of that!

@@ +1143,5 @@
>    // rather report failures using the console.
>    args.rval().setUndefined();
>  
> +  uint32_t type = gHistogramInfos[id].histogramType;
> +  // If we don't have an argument for the count histogram, assume an increment of 1.

I don't think this comment should be here: it's identical to the one below (which is the original one) :) Let's leave it like that.

@@ -1190,4 @@
>      StaticMutexAutoLock locker(gTelemetryHistogramMutex);
>      internal_Accumulate(id, value);
>    }
> -

nit: let's keep this blank line

@@ +1220,5 @@
> +    // This is not good standard behavior given that we have histogram instances
> +    // covering multiple processes and two session types.
> +    // However, changing this requires some broader changes to callers.
> +    h = internal_GetHistogramById(id, ProcessID::Parent, SessionType::Session);
> +    MOZ_ASSERT(h);

nit: let's add a comment above this MOZ_ASSERT, explaining where this lines come from and why they are here :)

@@ +1456,4 @@
>    JSHistogramData* data = static_cast<JSHistogramData*>(JS_GetPrivate(obj));
>    MOZ_ASSERT(data);
>    HistogramID id = data->histogramId;
> +

nit: let's remove this newly added blank line

@@ +2088,5 @@
> +// snapshot should be created
> +struct HistogramProcessInfo {
> +  Histogram* h;
> +  Histogram::SampleSet ss;
> +  size_t index;

If we get rid of the |index| here, we can just use mozilla:MakePair in the function below.

@@ +2123,5 @@
>  
> +  // Only lock the mutex while accessing our internal data. Copy all the
> +  // histograms for all the processes, and make the JS calls later
> +  nsTArray<nsTArray<HistogramProcessInfo>>
> +          processHistArray(static_cast<uint32_t>(ProcessID::Count));

nit: let's use 2-spaces indent here

@@ +2128,5 @@
> +  {
> +    StaticMutexAutoLock locker(gTelemetryHistogramMutex);
> +    for (uint32_t process = 0; process < static_cast<uint32_t>(ProcessID::Count); ++process) {
> +      nsTArray<HistogramProcessInfo>* hArray = processHistArray.AppendElement();
> +      if(!hArray) {

nit: please add a space between the "if" and the "("

@@ +2141,5 @@
> +
> +        HistogramID id = HistogramID(i);
> +
> +        if (!CanRecordInProcess(info.record_in_processes, ProcessID(process)) ||
> +          ((ProcessID(process) == ProcessID::Gpu) && !includeGPUProcess)) {

nit: please align "((ProcessID" with "!CanRecordInProcess(...

@@ +2152,5 @@
> +
> +        bool shouldInstantiate =
> +          info.histogramType == nsITelemetry::HISTOGRAM_FLAG;
> +        Histogram* h = internal_GetHistogramById(id, ProcessID(process),sessionType,
> +                                      shouldInstantiate);

nits: please keep the formatting of this line: "sessionType" should go on a new line and "shouldInstantiate" on another new line, both aligned with the "id" on the first line.

@@ +2159,5 @@
> +        }
> +
> +        Histogram::SampleSet ss;
> +        h->SnapshotSample(&ss);
> +        if(!hArray->AppendElement(HistogramProcessInfo{h, ss, i})) {

nit: please add a space between the "if" and the "(" (also check that in other places as well :D)

@@ +2161,5 @@
> +        Histogram::SampleSet ss;
> +        h->SnapshotSample(&ss);
> +        if(!hArray->AppendElement(HistogramProcessInfo{h, ss, i})) {
> +          return NS_ERROR_FAILURE;
> +        }

We should clear the histogram after this point, as we need to hold the lock to do that. The rest of the information we need from the Histogram class are not tied to the data, but rather to the histogram definition (e.g. min, max, et..). We can add the whole

#if !defined(MOZ_WIDGET_ANDROID)
      if ((sessionType == SessionType::Subsession) && aClearSubsession) {
        h->Clear();
      }
#endif

here.

@@ +2181,5 @@
>  
> +    const nsTArray<HistogramProcessInfo>& hArray = processHistArray.ElementAt(process);
> +    for (size_t i = 0; i < hArray.Length(); ++i) {
> +      const HistogramProcessInfo& hData = hArray.ElementAt(i);
> +      uint32_t histogramIndex = hData.index;

Since the histogram snapshots are added in order, can't we use "i" here? If we can, then we are able to remove "index" from the HistogramProcessInfo structure and just use |mozilla::MakePair| for the Histogram/SampleSet.

@@ +2240,5 @@
>      includeGPUProcess = gpm->AttemptedGPUProcess();
>    }
>  
> +  using KeyedHistogramDataPair = mozilla::Pair<KeyedHistogram*, size_t>;
> +  nsTArray<nsTArray<KeyedHistogramDataPair>> processHistArray;

nit: please leave a blank line below this one

@@ +2245,3 @@
>    for (uint32_t process = 0; process < static_cast<uint32_t>(ProcessID::Count); ++process) {
> +    nsTArray<KeyedHistogramDataPair>* hArray = processHistArray.AppendElement();
> +    if(!hArray) {

nit: add the whitespace :)

@@ +2262,4 @@
>          continue;
>        }
>  
> +      StaticMutexAutoLock locker(gTelemetryHistogramMutex);

I think we should acquire the mutex just once, outside of the outer loop.

@@ +2266,3 @@
>        KeyedHistogram* keyed = internal_GetKeyedHistogramById(HistogramID(id),
> +                                              ProcessID(process),
> +                          /* instantiate = */ false);

nit: this should be aligned with "ProcessID"

@@ +2270,4 @@
>        if (!keyed) {
>          continue;
>        }
> +      if(!hArray->AppendElement(mozilla::MakePair(keyed, id))) {

This is just copying the pointer to the keyed scalar while the lock is held, not the data itself. This means that, after we release the mutex, other threads can still mess with the data. We should make a copy of the data and clear, see |GetJSSnapshot|. However, this is a bit more convoluted compared to the non-keyed one. For this patch, I'm happy to leave this function as it is and file a follow-up bug for fixing this specific function.
Attachment #8933911 - Flags: review?(alessio.placitelli) → feedback+
Hi Alessio,

Thank you very much for the review! Sorry about the indentation problems, I'll pay more attention next time.
I'm going to address all your comments and I will try to fix the keyed snapshot function too :)

Thank you!
Hi Alessio,

Just wanted to ask a couple of things.

@@ +2181,5 @@
>  
> +    const nsTArray<HistogramProcessInfo>& hArray = processHistArray.ElementAt(process);
> +    for (size_t i = 0; i < hArray.Length(); ++i) {
> +      const HistogramProcessInfo& hData = hArray.ElementAt(i);
> +      uint32_t histogramIndex = hData.index;

"Since the histogram snapshots are added in order, can't we use "i" here? If we can, then we are able to remove "index" from the HistogramProcessInfo structure and just use |mozilla::MakePair| for the Histogram/SampleSet."
The only problem with that approach is that, although the histogram snapshots are added in order, their position in the array (hArray) does not correspond to their id (because we can skip some histogram snapshots, for example when internal_IsExpired returns true):
        if (!h || internal_IsExpired(h) || !internal_ShouldReflectHistogram(h, id)) {
          continue;
        }
For example, if the histogram snapshot with id 1 is skipped, then the next histogram snapshot added to the array will occupy the position 1, but it's id will be different. That's why I used the index field in the struct. We could use something different, for example a map of id to a mozilla::Pair, and then iterate over the map.

2270,4 @@
>        if (!keyed) {
>          continue;
>        }
> +      if(!hArray->AppendElement(mozilla::MakePair(keyed, id))) {

"This is just copying the pointer to the keyed scalar while the lock is held, not the data itself. This means that, after we release the mutex, other threads can still mess with the data. We should make a copy of the data and clear, see |GetJSSnapshot|."
The KeyedHistogram class has a member that is a map from nsCStringHashKey to Histogram*. I see that the Clear() method (called in GetJSSnapshot) clears that map and calls delete on the Histogram* values stored in the map. If we are going to call Clear just after copying the data, I need to copy the Histogram values (not just the pointers, because they are going to be deleted in the call to Clear, but the values) and store them in an array or some other data structure, am I right? If that's the case, I don't see any method to get the internal map used by KeyedHistogram, is it ok to add a method to return a const reference to that internal map?

Thank you very much!
(In reply to Fernando from comment #19)
> Hi Alessio,
> 
> Just wanted to ask a couple of things.
> 
> @@ +2181,5 @@
> >  
> > +    const nsTArray<HistogramProcessInfo>& hArray = processHistArray.ElementAt(process);
> > +    for (size_t i = 0; i < hArray.Length(); ++i) {
> > +      const HistogramProcessInfo& hData = hArray.ElementAt(i);
> > +      uint32_t histogramIndex = hData.index;
> 
> "Since the histogram snapshots are added in order, can't we use "i" here? If
> we can, then we are able to remove "index" from the HistogramProcessInfo
> structure and just use |mozilla::MakePair| for the Histogram/SampleSet."
> The only problem with that approach is that, although the histogram
> snapshots are added in order, their position in the array (hArray) does not
> correspond to their id (because we can skip some histogram snapshots, for
> example when internal_IsExpired returns true):
>         if (!h || internal_IsExpired(h) ||
> !internal_ShouldReflectHistogram(h, id)) {
>           continue;
>         }
> For example, if the histogram snapshot with id 1 is skipped, then the next
> histogram snapshot added to the array will occupy the position 1, but it's
> id will be different. That's why I used the index field in the struct. We
> could use something different, for example a map of id to a mozilla::Pair,
> and then iterate over the map.

Good point, you're right!

> 2270,4 @@
> >        if (!keyed) {
> >          continue;
> >        }
> > +      if(!hArray->AppendElement(mozilla::MakePair(keyed, id))) {
> 
> "This is just copying the pointer to the keyed scalar while the lock is
> held, not the data itself. This means that, after we release the mutex,
> other threads can still mess with the data. We should make a copy of the
> data and clear, see |GetJSSnapshot|."
> The KeyedHistogram class has a member that is a map from nsCStringHashKey to
> Histogram*. I see that the Clear() method (called in GetJSSnapshot) clears
> that map and calls delete on the Histogram* values stored in the map. If we
> are going to call Clear just after copying the data, I need to copy the
> Histogram values (not just the pointers, because they are going to be
> deleted in the call to Clear, but the values) and store them in an array or
> some other data structure, am I right? If that's the case, I don't see any
> method to get the internal map used by KeyedHistogram, is it ok to add a
> method to return a const reference to that internal map?
> 
> Thank you very much!

Yes please, let's add another method for this. You will notice that |GetJSSnapshot| won't be used anymore after you do that :) Since this may be a bit more complicated than the rest of the patch, I would suggest we try to finish up the work without this keyed histogram snapshotting part and file a new bug for it. You will be able to pick it up right after this one ;)
Hi Alessio,

I've addressed the issues that you mentioned, and I've also reverted the changes that I made on the keyed snapshot function so that work can be done under a new bug.

I hope everything is fine now :)

I've run the tests (by running ./mach test toolkit/components/telemetry) and I can see in the logs that they pass, but there is an error in the (I think) last stage:
"
Error running mach:
 
    ['test', 'toolkit/components/telemetry']
 
The error occurred in code that was called by the mach command. This is either
a bug in the called code itself or in the way that mach is calling it.
 
You should consider filing a bug for this issue.
 
If filing a bug, please include the full output of mach, including this error
message.
 
The details of the failure are as follows:
 
TypeError: run_python_tests() got an unexpected keyword argument 'log'
"

I've asked in the IRC developers channel, and other people see the same error (so I know it's not related to my changes), I think something is broken in toolkit/components/telemetry/tests/python, should I report a bug for this?

Thank you very much!
Attached patch bug1278821.patch (obsolete) — Splinter Review
Attachment #8933911 - Attachment is obsolete: true
Attachment #8940548 - Flags: review?(alessio.placitelli)
(In reply to Fernando from comment #21)
> Hi Alessio,
> 
> I've addressed the issues that you mentioned, and I've also reverted the
> changes that I made on the keyed snapshot function so that work can be done
> under a new bug.
> 
> I hope everything is fine now :)

Thank you very much for your patience and perseverance :)

> I've asked in the IRC developers channel, and other people see the same
> error (so I know it's not related to my changes), I think something is
> broken in toolkit/components/telemetry/tests/python, should I report a bug
> for this?
> 
> Thank you very much!

Yes, it's unrelated, I filed bug 1427703 a few days ago for this :)
Comment on attachment 8940548 [details] [diff] [review]
bug1278821.patch

Review of attachment 8940548 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Fernando, there are 3 nits left and we're done!

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +1207,2 @@
>  
>    JS::Rooted<JSObject*> snapshot(cx, JS_NewPlainObject(cx));

nit: let's move this block (JS::Rooted + the if below) right before | reflectStatus status = internal_ReflectHistogramAndSamples(cx, snapshot, h, ss);| down below.

@@ +2085,5 @@
>  }
>  
> +// Struct used to keep information about the histograms for which a
> +// snapshot should be created
> +struct HistogramProcessInfo {

nit: this is only used in |CreateHistogramSnapshots|, let's move this definition before the comment preceding |nsTArray<nsTArray<HistogramProcessInfo>> ...| within that function.

@@ +2205,3 @@
>        case REFLECT_FAILURE:
>          return NS_ERROR_FAILURE;
>        case REFLECT_OK:

We're clearing the histogram right after getting the data snapshot earlier, so we can remove the bogus clear block below, at around line 2214:

#if !defined(MOZ_WIDGET_ANDROID)
      if ((sessionType == SessionType::Subsession) && aClearSubsession) {
        h->Clear();
      }
#endif

The histogram should already be cleared at this point :)
Attachment #8940548 - Flags: review?(alessio.placitelli) → feedback+
Hi Alessio,

Thank you very much! And great to see that the bug was already reported!

Here is a new patch with you comments addressed. Thank you very much for your mentoring and patience :)
Attached patch bug1278821.patch (obsolete) — Splinter Review
Attachment #8940548 - Attachment is obsolete: true
Attachment #8940819 - Flags: review?(alessio.placitelli)
Comment on attachment 8940819 [details] [diff] [review]
bug1278821.patch

The code looks good now, thank you for your hard work on this patch.

Sorry, but I missed one thing in all the previous reviews: the patch file is missing a commit message :) Would you mind adding one?

You can find out more about our commit messages conventions here: https://developer.mozilla.org/en-US/docs/Mercurial/Using_Mercurial#Commit_Message_Conventions
Attachment #8940819 - Flags: review?(alessio.placitelli) → feedback+
(In reply to Alessio Placitelli [:Dexter] from comment #27)
> Comment on attachment 8940819 [details] [diff] [review]
> bug1278821.patch
> 
> The code looks good now, thank you for your hard work on this patch.
> 
> Sorry, but I missed one thing in all the previous reviews: the patch file is
> missing a commit message :) Would you mind adding one?
> 
> You can find out more about our commit messages conventions here:
> https://developer.mozilla.org/en-US/docs/Mercurial/
> Using_Mercurial#Commit_Message_Conventions

Ops, sorry! Here you can find the patch with the commit message.

Thank you very much!
Attached patch bug1278821.patch (obsolete) — Splinter Review
Attachment #8941164 - Flags: review?(alessio.placitelli)
Comment on attachment 8941164 [details] [diff] [review]
bug1278821.patch

Review of attachment 8941164 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good now, thank you for your patience and efforts! Let me push this to our try servers: if everything goes as planned, and nothing breaks, we will go ahead and land this.
Attachment #8941164 - Flags: review?(alessio.placitelli) → review+
Attachment #8940819 - Attachment is obsolete: true
Fernando, looks like some try jobs are failing: https://treeherder.mozilla.org/#/jobs?repo=try&revision=52f2bea3d5725c083a50d4de571a4b9d1bafb231&selectedJob=155275074

Would you kindly look into this?
Flags: needinfo?(mortificador)
(In reply to Alessio Placitelli [:Dexter] from comment #32)
> Fernando, looks like some try jobs are failing:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=52f2bea3d5725c083a50d4de571a4b9d1bafb231&selectedJob=1
> 55275074
> 
> Would you kindly look into this?

Hi Alessio,

The error is:
 /builds/worker/workspace/build/src/obj-firefox/dist/include/nsTArray.h:679:34: error: Cannot instantiate 'nsTArray_CopyChooser<HistogramProcessInfo>' with non-memmovable template argument 'HistogramProcessInfo'

I think this error is caught by a custom clang tool that runs on treeherder (and that's why I didn't see the error when I compiled locally). 
HistogramProcessInfo has a member Histogram::SampleSet, and Histogram::SampleSet has a member std::vector, and I think that the tool thinks that std::vector is not memmovable (obviously the std library does not have the MOZ_NON_MEMMOVABLE attribute, so I guess there is code in the custom tool that deals with the std classes?). If that's the case, I think that the class Histogram::SampleSet should have the attribute MOZ_NON_MEMMOVABLE, right?

This evening I will submit a patch where I'll add the attribute MOZ_NON_MEMMOVABLE to HistogramProcessInfo and I will use a std::vector instead of a nsTArray, is that ok?

Thank you very much!
Flags: needinfo?(mortificador)
(In reply to Fernando from comment #33)
> (In reply to Alessio Placitelli [:Dexter] from comment #32)
> > Fernando, looks like some try jobs are failing:
> > https://treeherder.mozilla.org/#/
> > jobs?repo=try&revision=52f2bea3d5725c083a50d4de571a4b9d1bafb231&selectedJob=1
> > 55275074
> > 
> > Would you kindly look into this?

> This evening I will submit a patch where I'll add the attribute
> MOZ_NON_MEMMOVABLE to HistogramProcessInfo and I will use a std::vector
> instead of a nsTArray, is that ok?

Good catch! Let's not touch the HistogramSet part for now, but just prepend MOZ_NON_MEMMOVABLE to the HistogramProcessInfo structure and see if that works. I triggered a try build to test that as I cannot reproduce locally, let's see  how that works.
(In reply to Alessio Placitelli [:Dexter] from comment #34)
> > This evening I will submit a patch where I'll add the attribute
> > MOZ_NON_MEMMOVABLE to HistogramProcessInfo and I will use a std::vector
> > instead of a nsTArray, is that ok?
> 
> Good catch! Let's not touch the HistogramSet part for now, but just prepend
> MOZ_NON_MEMMOVABLE to the HistogramProcessInfo structure and see if that
> works. I triggered a try build to test that as I cannot reproduce locally,
> let's see  how that works.

Allright, that didn't fix the problem and turns out we can change HistogramSet if that fixes the issue :)
(In reply to Alessio Placitelli [:Dexter] from comment #35)
> (In reply to Alessio Placitelli [:Dexter] from comment #34)
> > > This evening I will submit a patch where I'll add the attribute
> > > MOZ_NON_MEMMOVABLE to HistogramProcessInfo and I will use a std::vector
> > > instead of a nsTArray, is that ok?
> > 
> > Good catch! Let's not touch the HistogramSet part for now, but just prepend
> > MOZ_NON_MEMMOVABLE to the HistogramProcessInfo structure and see if that
> > works. I triggered a try build to test that as I cannot reproduce locally,
> > let's see  how that works.
> 
> Allright, that didn't fix the problem and turns out we can change
> HistogramSet if that fixes the issue :)

I think that the only way to fix it is changing from nsTArray<HistogramProcessInfo> to std::vector<HistogramProcessInfo> (I think nsTArray won't work with a class that is nonmovable), I suggested adding the attribute to HistogramSet because I think that if the class had the attribute we could have avoided this problem in the first place.

If you think that this is ok, I'll submit a patch this evening with those changes.

Thank you very much!
Hi Alessio,

New day, new patch :) as said, here you can find the new patch: I've added the attribute MOZ_NON_MEMMOVABLE to HistogramProcessInfo and change nsTArray and now we use a std::vector instead.

I've checked and all tests pass, please let me know if there is any problem!

Thank you very much!!
Attached patch bug1278821.patch (obsolete) — Splinter Review
Attachment #8941164 - Attachment is obsolete: true
Attachment #8941615 - Flags: review?(alessio.placitelli)
Comment on attachment 8941615 [details] [diff] [review]
bug1278821.patch

Review of attachment 8941615 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +2136,5 @@
> +    Histogram::SampleSet ss;
> +    size_t index;
> +  };
> +
> +  std::vector<std::vector<HistogramProcessInfo>>

We should prefer using Mozilla libraries in place of stl ones, when possible (See https://developer.mozilla.org/en-US/docs/Mozilla/Using_CXX_in_Mozilla_code#C_and_Mozilla_standard_libraries ). Luckily, there's a |mozilla::Vector| implementation that can easily come to our rescue :) I've sketched out a fix using an |nsDataHashtable| as the outer container (https://treeherder.mozilla.org/#/jobs?repo=try&revision=b1d213e00fb1cf13f3ceb0807bd0f0e02391b98a&selectedJob=155576783) and mozilla::Vector as the inner one. This seems to work, but feel free to use mozilla::Vector for the outer per-process storage as well if you want.
Attachment #8941615 - Flags: review?(alessio.placitelli)
(In reply to Alessio Placitelli [:Dexter] from comment #39)
> Comment on attachment 8941615 [details] [diff] [review]
> bug1278821.patch
> 
> Review of attachment 8941615 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/components/telemetry/TelemetryHistogram.cpp
> @@ +2136,5 @@
> > +    Histogram::SampleSet ss;
> > +    size_t index;
> > +  };
> > +
> > +  std::vector<std::vector<HistogramProcessInfo>>
> 
> We should prefer using Mozilla libraries in place of stl ones, when possible
> (See
> https://developer.mozilla.org/en-US/docs/Mozilla/
> Using_CXX_in_Mozilla_code#C_and_Mozilla_standard_libraries ). Luckily,
> there's a |mozilla::Vector| implementation that can easily come to our
> rescue :) I've sketched out a fix using an |nsDataHashtable| as the outer
> container
> (https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=b1d213e00fb1cf13f3ceb0807bd0f0e02391b98a&selectedJob=1
> 55576783) and mozilla::Vector as the inner one. This seems to work, but feel
> free to use mozilla::Vector for the outer per-process storage as well if you
> want.

Hi Alessio,

Thank you very much! I've attached another patch, using now mozilla::Vector instead of std::vector (I use a mozilla::Vector for the outer per-process storage too).

Hopefully this will be the definitive one :)

Thank you very much for your help!!
Attached patch bug1278821.patchSplinter Review
Attachment #8941615 - Attachment is obsolete: true
Attachment #8941976 - Flags: review?(alessio.placitelli)
Comment on attachment 8941976 [details] [diff] [review]
bug1278821.patch

Review of attachment 8941976 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me now! Let's see if everything goes smoothly on treeherder ;)
Attachment #8941976 - Flags: review?(alessio.placitelli) → review+
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b8ded26d5d19
Re-factor JS-exposed functions in TelemetryHistogram.cpp to avoid deadlocks. r=Dexter
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/b8ded26d5d19
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Blocks: 1435812
You need to log in before you can comment on or make changes to this bug.