Refactor lock-protected TelemetryHistogram code

RESOLVED FIXED in Firefox 62

Status

()

enhancement
P4
normal
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: gfritzsche, Assigned: mcalabrese85, Mentored)

Tracking

(Blocks 3 bugs)

Trunk
mozilla62
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox62 fixed)

Details

(Whiteboard: [lang=c++])

Attachments

(1 attachment, 4 obsolete attachments)

In TelemetryHistogram.cpp [1], all functions that are named `internal_*()` are meant to be called with a global TelemetryHistogram lock to be held (on `gTelemetryHistogramMutex`).

A safer pattern is to require passing a lock instance as seen in [2].
We should refactor TelemetryHistogram to that pattern, so e.g.:
> void
> internal_SetHistogramRecordingEnabled(...)
... becomes
> void
> SetHistogramRecordingEnabled(const StaticMutexAutoLock& lock, ...)
... and callers become:
> internal_SetHistogramRecordingEnabled(locker, ...);

1: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryHistogram.cpp
2: https://dxr.mozilla.org/mozilla-central/rev/17d8a1e278a9c54a6fdda9d390abce4077e55b20/toolkit/components/telemetry/TelemetryEvent.cpp#280
Reporter

Comment 1

2 years ago
This will require:
- Changing TelemetryHistogram.cpp as described.
- We will also need to update the comments to match the code changes.
- Making sure tests pass:
  - first: mach test toolkit/components/telemetry/tests/unit/test_TelemetryHistograms.js
  - before submitting a patch, also make sure the other tests pass too: mach test toolkit/components/telemetry/tests/unit/
Reporter

Comment 2

2 years ago
Note that the following functions can't be adopted yet:
- internal_JSHistogram_*()
- internal_WrapAndReturnHistogram()
- internal_KeyedHistogram_SnapshotImpl()
- internal_JSKeyedHistogram_*()
- internal_WrapAndReturnKeyedHistogram()

This is due to bug 1278821.
Reporter

Updated

2 years ago
Mentor: chutten
I was able to build successfully. The patch passed all tests.

Some notes:

The methods |KeyedHistogram::Add()|, |KeyedHistogram::GetHistogram()| and |KeyedHistogram::GetEnumId()| also now require a lock instance to be passed as a parameter, as they are called from |internal_*()| methods and also call certain |internal_*()| methods (specifically, KeyedHistogram::GetHistogram() calls internal_HistogramGet() and KeyedHistogram::GetEnumId() calls internal_GetHistogramEnumId()).

The method |KeyedHistogram::ReflectKeyedHistogram()| does not require passing a lock instance even though it calls |internal_ReflectHistogramSnapshot()|. It, along with a few of the listed methods that cannot be adopted yet, but call other |internal_*()| methods, acquires |StaticMutexAutoLock locker(gTelemetryHistogramMutex);| with a limited scope.
Attachment #8866551 - Flags: review?(chutten)
Comment on attachment 8866551 [details] [diff] [review]
Proposed patch for bug 1362953.

Review of attachment 8866551 [details] [diff] [review]:
-----------------------------------------------------------------

Overall you should not be creating any lockers. If you find yourself in a situation where you have to, then it gets interesting.

If any call descending from where you think you need a new locker might call into the JS engine, then we can't do it. Deadlocks will hit us. In that case we need to remove the locker parameter from those internal_* methods and accept that we may race.

If it doesn't call into the JS engine, then a tightly-scoped new locker may be possible. Call these cases out specifically when you put it up for review so that I don't accidentally miss one :)

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +858,5 @@
> +// it is used, and (2) because it is never de-initialised, and
> +// a normal Mutex would show up as a leak in BloatView.  StaticMutex
> +// also has the "OffTheBooks" property, so it won't show as a leak
> +// in BloatView.
> +static StaticMutex gTelemetryHistogramMutex;

It doesn't appear as though you need to move these lines

@@ +1081,5 @@
>      return false;
>    }
>  
> +  {
> +    StaticMutexAutoLock locker(gTelemetryHistogramMutex);

This is likely to cause problems as reflecting the histogram snapshot could cause the JS engine to try to allocate memory which could cause it to start a GC which would cause it to accumulate telemetry which would deadlock on trying to reacquire this mutex.

Your patch should not actually contain any StaticMutexAutoLock creations. It should only pass the ones around that already have been created.

@@ +2058,5 @@
>  TelemetryHistogram::Accumulate(mozilla::Telemetry::HistogramID aID,
>                                 const nsCString& aKey, uint32_t aSample)
>  {
> +  {
> +    StaticMutexAutoLock locker(gTelemetryHistogramMutex);

In this case and a few others in your patch you can just raise the original `locker` to the top of the function.
Attachment #8866551 - Flags: review?(chutten) → review-
(In reply to Chris H-C :chutten from comment #4)
> Comment on attachment 8866551 [details] [diff] [review]
> Proposed patch for bug 1362953.
> 
> Review of attachment 8866551 [details] [diff] [review]:
> -----------------------------------------------------------------

> Your patch should not actually contain any StaticMutexAutoLock creations. It
> should only pass the ones around that already have been created.

Oops. I thought it needs to be enforced to all |internal_*()| methods, which necessitated adding more. I've fixed this now.
 
> @@ +2058,5 @@
> >  TelemetryHistogram::Accumulate(mozilla::Telemetry::HistogramID aID,
> >                                 const nsCString& aKey, uint32_t aSample)
> >  {
> > +  {
> > +    StaticMutexAutoLock locker(gTelemetryHistogramMutex);
> 
> In this case and a few others in your patch you can just raise the original
> `locker` to the top of the function.

That would've worked too. I thought there was a specific reason behind keeping it lower, and hence, chose fine-grained locking over coarse-grained locking. I've fixed this now.
I was able to build successfully. The patch passed all tests.

I've added a tightly scoped locker to internal_IdentifyCorruptHistograms(), even though it is called by TelemetryHistogram::CreateHistogramSnapshots(), as it would allow enforcing the pattern to internal_Accumulate(), and subsequently, to internal_RemoteAccumulate(), internal_HistogramAdd() and internal_CanRecordExtended().

For the above-mentioned implementation, the declaration of |gTelemetryHistogramMutex| has been moved up.

The following functions have been left untouched :

** internal_CanRecordBase() -> called by internal_JSHistogram_Add()

** internal_GetHistogramEnumId() -> called by internal_JSHistogram_Add()

** internal_HistogramClear() -> called by internal_JSHistogram_Clear()

** internal_ReflectHistogramSnapshot() -> called by internal_JSHistogram_Snapshot(), internal_KeyedHistogram_SnapshotImpl(), and TelemetryHistogram::CreateHistogramSnapshots()

** internal_IsEmpty() -> called by TelemetryHistogram::CreateHistogramSnapshots()

** internal_IsExpired() -> called by TelemetryHistogram::CreateHistogramSnapshots()

** internal_ShouldReflectHistogram() -> called only by TelemetryHistogram::CreateHistogramSnapshots()

** internal_IdentifyCorruptHistograms() -> called only by TelemetryHistogram::CreateHistogramSnapshots()

** internal_GetHistogramByEnumId() -> called by TelemetryHistogram::CreateHistogramSnapshots()

** internal_GetSubsessionHistogram() -> called by TelemetryHistogram::CreateHistogramSnapshots()

** internal_GetHistogramMapEntry() -> called by internal_GetHistogramEnumId(), which is further called by internal_JSHistogram_Add().
Since internal_GetHistogramEnumId() can also be called by functions possessing the lock, it cannot try to possess lock again.

** internal_CloneHistogram() -> called by internal_GetSubsessionHistogram(), which is further called by TelemetryHistogram::CreateHistogramSnapshots().
Since internal_GetSubsessionHistogram() can also be called by functions possessing the lock, it cannot try to possess the lock again.

** internal_HistogramGet() -> called by internal_GetHistogramByEnumId(), which is further called by TelemetryHistogram::CreateHistogramSnapshots().
Since internal_GetHistogramByEnumId() can also be called by functions possessing the lock, it cannot try to possess the lock again.

** internal_ReflectHistogramAndSamples() -> called only by internal_ReflectHistogramSnapshot(). Therefore, left untouched.

** internal_FillRanges() -> called only by internal_ReflectHistogramAndSamples(). Therefore, left untouched.

** internal_CheckHistogramArguments() -> called only by internal_HistogramGet(). Therefore, left untouched.
Attachment #8866551 - Attachment is obsolete: true
Attachment #8867307 - Flags: review?(chutten)
Comment on attachment 8867307 [details] [diff] [review]
Rectified patch for bug 1362953.

Review of attachment 8867307 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +742,5 @@
>        } else if (check & Histogram::COUNT_LOW_ERROR) {
>          corruptID = mozilla::Telemetry::TOTAL_COUNT_LOW_ERRORS;
>        }
> +      {
> +        StaticMutexAutoLock locker(gTelemetryHistogramMutex);

If this is okay, then the entirety of internal_IdentifyCorruptHistograms is okay. If the entirety's okay, then you can push the tightly-scoped locker into TelemetryHistogram::CreateHistogramSnapshots around the call site. Then this would be one fewer internal_* function missing a locker.

@@ +1174,4 @@
>  }
>  
>  bool
> +internal_RemoteAccumulate(const StaticMutexAutoLock& lock, 

end-of-line whitespace should be omitted.

@@ +1209,4 @@
>    return true;
>  }
>  
> +void internal_Accumulate(const StaticMutexAutoLock& lock, 

more EOL whitespace
Attachment #8867307 - Flags: review?(chutten) → review-

Updated

2 years ago
Assignee: nobody → vedant.sareen
Status: NEW → ASSIGNED
Attachment #8867307 - Attachment is obsolete: true
Attachment #8867849 - Flags: review?(chutten)
Comment on attachment 8867849 [details] [diff] [review]
Cleaned up patch for bug 1362953.

Review of attachment 8867849 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, I'll send this to try tomorrow morning.

In the meantime, if you're looking at getting the rest of these in Part 2, may I recommend tracking your work on bug 1278821? Apparently we have a bug for it we've been neglecting this past little while :)
Attachment #8867849 - Flags: review?(chutten) → review+
Reporter

Updated

2 years ago
Blocks: 1278821
It seems as though some of the media tests are, once again, showing us when Telemetry locks up. Somewhere in this patch we must have introduced a deadlock.

:jseward, there are tools available to help us out with this, right? I seem to recall the name "TSan" being bandied about, is that available on all of our Tier 1 platforms? (and is there documentation about running it?)
Flags: needinfo?(jseward)
TSan is one of those tools that "probably works for Firefox" most of the time,
but AFAIK nobody actually ensures that Fx always works on it.  It's only
available on x86_64-linux.  I haven't run it for about six months.

If you can reproduce the hangs in a debugger, obviously the thing to do is look
at the stacks of the 2 (or more) threads involved.

If not, you have a difficult problem.  One thing you could look for is
inconsistencies in lock order acquisition.  That is, if there are two locks, A
and B, that must be acquired in order to complete some activity, then all
threads must acquire them in the same order.  It doesn't matter what the order
is (it can be A first and then B, or B first and then A).  What can happen if
the order is inconsistent, is that one thread can acquire A and then wait for
B, whilst another acquires B and then waits for A.  So they deadlock.

I believe that debug builds of Firefox have some amount lock cycle detection
built in, although I suspect that doesn't apply to all flavours of locks.
Nathan might know more.
Flags: needinfo?(jseward) → needinfo?(nfroyd)
(In reply to Julian Seward [:jseward] from comment #12)
> I believe that debug builds of Firefox have some amount lock cycle detection
> built in, although I suspect that doesn't apply to all flavours of locks.
> Nathan might know more.

Yes, we'll do deadlock detection for mozilla::Mutex.  We won't do it for the lock flavor used by the IPC code, which appears to be implicated in some of the crash stacks from that try run.  Perhaps converting those locks to mozilla::Mutex, which would be an excellent cleanup on its own, might assist you in finding the bug?
Flags: needinfo?(nfroyd)
According to the comments[1] we are using StaticMutex to ensure thread-safe initialization and to skip BloatView. Are those concerns no longer founded? (can we convert safely?)

[1]: http://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetryHistogram.cpp#1135
Flags: needinfo?(nfroyd)
(In reply to Chris H-C :chutten from comment #14)
> According to the comments[1] we are using StaticMutex to ensure thread-safe
> initialization and to skip BloatView. Are those concerns no longer founded?
> (can we convert safely?)

Oh, sorry.  When I said "lock flavor used by the IPC code", I meant the locks used primarily (only?) by the code under $SRCDIR/ipc:

http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/lock.h#13

It looked like you might be calling into that code from Telemetry and dead/livelocking in some way, which would be difficult to detect with the IPC-specific locks not participating in our deadlock detection.  (TSan runs ought to be able to catch that sort of thing, but our deadlock detection would have the virtue of working well on try, so...)

The comments that you refer to are still valid.
Flags: needinfo?(nfroyd)
(To be clear, the errors in communication were entirely on my end!)
Reporter

Updated

2 years ago
Blocks: 1277552
Reporter

Updated

2 years ago
Blocks: 1370565
Reporter

Updated

2 years ago
Blocks: 1381470
Reporter

Updated

2 years ago
No longer blocks: 1381470
Reporter

Updated

2 years ago
Blocks: 1362957
Reporter

Comment 17

2 years ago
Hi Vedant, are you still working on this?
Flags: needinfo?(gfritzsche)
Reporter

Updated

2 years ago
Flags: needinfo?(gfritzsche) → needinfo?(vedantsareen)
Hey!

Yes, I am still working on it. However, my pace of work has been extremely slow, as I've been caught up in a lot of other work too :/

Since the bug activity has been idle for 4 months, I guess it would make sense to mark this bug as unassigned for now, just in case someone else is also interested in working on it.

If the bug remains free and if I'm able to make some progress in this, I'll make sure to upload a WIP patch for it :)

I humbly apologize for the inconvenience caused :/
Flags: needinfo?(vedantsareen)
Reporter

Comment 19

2 years ago
That sounds good to me :)
And no worries, thanks for your work on it so far!
Assignee: vedantsareen → nobody
Status: ASSIGNED → NEW
Priority: P3 → P4
Hi, I'd like to try fixing this bug. Where should I start?
Flags: needinfo?(gfritzsche)
Reporter

Comment 21

2 years ago
Chris spent some time on refactoring the TelemetryHistogram code recently.
Chris, can you help get Leo started?
Flags: needinfo?(gfritzsche) → needinfo?(chutten)
The first step would be to get the Firefox source code and build it. This is not an easy bug, so I will trust that you are able to find out how to do this on your own. If you can't, no worries! We have lots of easy bugs to start with[1] and you can always come back to this one later.

The second step would be to carefully read the bug comments above. There is a lot of information we learned through the first attempt at this bug, and any successful attempt would do well to keep those lessons in mind.

The third will be to rework the TelemetryHistogram code so that TelemetryHistogram::* methods continue to generate locks, passing them as proof to internal_* methods. (see early comments on this bug)

The fourth will be to test this locally. It should build without errors or warnings, and ./mach test toolkit/components/telemetry/tests should succeed without errors or deadlock or timeout.

The fifth will be to submit the code here as a patch that I can review. You can use MozReview or attach the patch as a file to this bug.

Then we review and retest for a bit. Then we push the patch to try to make sure it builds and tests on all of the Operating Systems we need to support.

Then we push it to mozilla-central and it gets included in a build!

Do you wish to continue with this bug?

[1]: https://www.joshmatthews.net/bugsahoy/?internals=1
Flags: needinfo?(chutten) → needinfo?(lkhodel)
(In reply to Chris H-C :chutten from comment #22)
> The first step would be to get the Firefox source code and build it. This is
> not an easy bug, so I will trust that you are able to find out how to do
> this on your own. If you can't, no worries! We have lots of easy bugs to
> start with[1] and you can always come back to this one later.

No problem, I've landed a very simple bug fix, so I know the process.

> 
> The second step would be to carefully read the bug comments above. There is
> a lot of information we learned through the first attempt at this bug, and
> any successful attempt would do well to keep those lessons in mind.

Will do!

> 
> The third will be to rework the TelemetryHistogram code so that
> TelemetryHistogram::* methods continue to generate locks, passing them as
> proof to internal_* methods. (see early comments on this bug)
> 
> The fourth will be to test this locally. It should build without errors or
> warnings, and ./mach test toolkit/components/telemetry/tests should succeed
> without errors or deadlock or timeout.
> 
> The fifth will be to submit the code here as a patch that I can review. You
> can use MozReview or attach the patch as a file to this bug.
> 
> Then we review and retest for a bit. Then we push the patch to try to make
> sure it builds and tests on all of the Operating Systems we need to support.
> 
> Then we push it to mozilla-central and it gets included in a build!

Got it.

> 
> Do you wish to continue with this bug?
> 
> [1]: https://www.joshmatthews.net/bugsahoy/?internals=1

Yes, I do. I'll get started and will let you know as soon as I have any questions.

Thanks!
Flags: needinfo?(lkhodel)

Updated

2 years ago
Assignee: nobody → lkhodel
Status: NEW → ASSIGNED
Hello :lvk, how are things progressing here? Anything I can help with?
Flags: needinfo?(lkhodel)
Hi :chutten, I'm getting familiar with the code that I'm going to be changing but I got ahead myself trying to build a callgraph for the internal methods using rtags. I've got Eclipse set up with the index and I'm going to try to walk through the call hierarchy through the UI, but it'd be really nice to have a tool that can help me do it more efficiently. I found 'Callgraph' here:

https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Callgraph

but it depends on Treehydra, which was abandoned in 2010, so I'm not sure it's the best option. Is there something else available that I'm missing? I've Googled around but couldn't find anything that worked out of the box.
Flags: needinfo?(lkhodel) → needinfo?(chutten)
Hrm, that's a tough one. My tooling amounts to just a couple of vim instances, so I might not be the best person to ask. The #developer channel on irc.mozilla.org[1] might be able to help you out. 

In general, calls come in on TelemetryHistogram::* methods where they acquire the lock before calling any number of internal_* methods in any order. There are also the JS entrypoints we register (the JSHistogram and JSKeyedHistogram functions) which have a problematic lack of locks (which are being addressed in bug 1278821), and are alternate places calls may enter the module.

[1]: https://wiki.mozilla.org/Irc
Flags: needinfo?(chutten)
(In reply to Chris H-C :chutten from comment #26)
> Hrm, that's a tough one. My tooling amounts to just a couple of vim
> instances, so I might not be the best person to ask. The #developer channel
> on irc.mozilla.org[1] might be able to help you out. 

Got it, thanks. I found Sourcetrail[1], which is free for non-commercial use, but haven't had a chance to get it running.

> 
> In general, calls come in on TelemetryHistogram::* methods where they
> acquire the lock before calling any number of internal_* methods in any
> order. There are also the JS entrypoints we register (the JSHistogram and
> JSKeyedHistogram functions) which have a problematic lack of locks (which
> are being addressed in bug 1278821), and are alternate places calls may
> enter the module.
> 
> [1]: https://wiki.mozilla.org/Irc

Good to know, thanks! I'm going to get started refactoring and will let you know if I have any questions.

[1]: https://www.sourcetrail.com/
Hello, Leo! How are things going with this bug?

If it hasn't been going well, no worries. I'm here to help. And if that help means helping find you a different bug to work on, that's okay, too!
Flags: needinfo?(lkhodel)
Assignee: lkhodel → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(lkhodel)
Hello there,
Mind if I give this one a try ? 
It seems like a heavy one !
Sure! Let me know if you have any questions.
Assignee: nobody → jeremy.lempereur
Status: NEW → ASSIGNED
Hey Jeremy, I know you have a few other things you're looking into across the project. Are you still planning on looking into this sometime soon?
Flags: needinfo?(jeremy.lempereur)
Hey Chris :),

Considering the Autotimer / Autocounter + benchmarks I'd rather pass on that one ^^
Flags: needinfo?(jeremy.lempereur)
Assignee: jeremy.lempereur → nobody
Status: ASSIGNED → NEW
Michael, would you be interested in taking this to the finish line?
Flags: needinfo?(mcalabrese85)
Assignee

Comment 34

a year ago
(In reply to Alessio Placitelli [:Dexter] from comment #33)
> Michael, would you be interested in taking this to the finish line?

Hey Alessio, I'm looking into this now. I may be a little slow on it for about two weeks as I'll be traveling, though. While I'm looking it over, I want to make sure I have the constraints right:

1. No new locks should be made, only the ones that are already there should be passed around.

2. Each lock will be passed into each internal_* function that follows the creation of the lock, and will continue to be passed on if the callee calls another internal_* function. Ex: internal_JSHistogram_Add creates a lock before calling internal_Accumulate. internal_Accumulate should take the lock as an argument and use it when it calls internal_RemoteAccumulate, which has also be modified to accept the lock as a parameter.

3. If necessary, fine-grained locks can become more coarse-grained ones if needed to cover more internal functions. This may be done in a function like TelemetryHistogram::SetHistogramRecordingEnabled, where internal_IsHistogramEnumId is called without a lock although one is created before internal_SetHistogramRecordingEnabled.

4. Locks should not occur before calling interal_JS* functions because they may try to create their own lock, causing a deadlock.

Does that seem right? Or am I missing something?

Also, is markdown still available for comments? I tried a few things, but nothing shows up in the preview and I don't see the expected checkbox that is mentioned somewhere on the Bugzilla site.
Flags: needinfo?(mcalabrese85)
(In reply to Michael Calabrese from comment #34)
> (In reply to Alessio Placitelli [:Dexter] from comment #33)
> > Michael, would you be interested in taking this to the finish line?
> 
> Hey Alessio, I'm looking into this now. I may be a little slow on it for
> about two weeks as I'll be traveling, though. While I'm looking it over, I
> want to make sure I have the constraints right:
> 
> 1. No new locks should be made, only the ones that are already there should
> be passed around.
> 
> 2. Each lock will be passed into each internal_* function that follows the
> creation of the lock, and will continue to be passed on if the callee calls
> another internal_* function. Ex: internal_JSHistogram_Add creates a lock
> before calling internal_Accumulate. internal_Accumulate should take the lock
> as an argument and use it when it calls internal_RemoteAccumulate, which has
> also be modified to accept the lock as a parameter.
> 
> 3. If necessary, fine-grained locks can become more coarse-grained ones if
> needed to cover more internal functions. This may be done in a function like
> TelemetryHistogram::SetHistogramRecordingEnabled, where
> internal_IsHistogramEnumId is called without a lock although one is created
> before internal_SetHistogramRecordingEnabled.
> 
> 4. Locks should not occur before calling interal_JS* functions because they
> may try to create their own lock, causing a deadlock.
> 
> Does that seem right? Or am I missing something?

Yes, this sounds about right! Let's evaluate (3) on a case by case basis.
 
> Also, is markdown still available for comments? I tried a few things, but
> nothing shows up in the preview and I don't see the expected checkbox that
> is mentioned somewhere on the Bugzilla site.

Markdown is just available for comments in MozReview I'm afraid.
Assignee

Comment 36

a year ago
Awesome, thank you. I don't intend to mess with case 3 much, if at all, just wanted it out there as a possibility. I plan on working on this for at least a few hours each day over the next few days and hopefully more over the weekend if necessary.
(In reply to Michael Calabrese from comment #36)
> Awesome, thank you. I don't intend to mess with case 3 much, if at all, just
> wanted it out there as a possibility. I plan on working on this for at least
> a few hours each day over the next few days and hopefully more over the
> weekend if necessary.

Definitely, thank you for that! Take your time and let us know if you're blocked or want to discuss about it :)
Assignee: nobody → mcalabrese85
Comment hidden (mozreview-request)

Comment 39

a year ago
mozreview-review
Comment on attachment 8976371 [details]
Bug 1362953 - Refactored lock-protected TelemetryHistogram code to be more similar to other sections of code.

https://reviewboard.mozilla.org/r/244534/#review250652

This looks good as a start, thank you! I'm mostly pointing out that the argument needs to be rename, hence the long list of changes requested. Moreover, I would suggest you rebase your change on a more recent version of our mozilla-central: a few bugs landed lately that changed TelemetryHistogram.cpp. Even better, you might want to wait a couple of days for bug 1457127 to land before rebasing :)

::: commit-message-3c9d6:1
(Diff revision 1)
> +Bug 1362953 - Refactored lock-protected TelemetryHistogram code to be more similar to other sections of code. r?Dexter

Let's be specific about what we did here :) Something like "Bug 1362953 - Require lock proof in TelemetryHistogram internal functions. r?dexter"

::: toolkit/components/telemetry/TelemetryHistogram.cpp:266
(Diff revision 1)
>                                             ProcessID aProcessId)
>  {
>    return aHistogramId * size_t(ProcessID::Count) + size_t(aProcessId);
>  }
>  
> -size_t internal_HistogramStorageIndex(HistogramID aHistogramId,
> +size_t internal_HistogramStorageIndex(const StaticMutexAutoLock& lock,

Let's follow our [code style guide](https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style) and call the parameter `aLock` instead of `lock`. Let's do this here and in other places where you added it.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:278
(Diff revision 1)
>          "Too many histograms and processes to store in a 1D array.");
>  
>    return aHistogramId * size_t(ProcessID::Count) + size_t(aProcessId);
>  }
>  
> -Histogram* internal_GetHistogramFromStorage(HistogramID aHistogramId,
> +Histogram* internal_GetHistogramFromStorage(const StaticMutexAutoLock& lock,

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:286
(Diff revision 1)
>  {
> -  size_t index = internal_HistogramStorageIndex(aHistogramId, aProcessId);
> +  size_t index = internal_HistogramStorageIndex(lock, aHistogramId, aProcessId);
>    return gHistogramStorage[index];
>  }
>  
> -void internal_SetHistogramInStorage(HistogramID aHistogramId,
> +void internal_SetHistogramInStorage(const StaticMutexAutoLock& lock,

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:333
(Diff revision 1)
>    return aID < HistogramCount;
>  }
>  
>  // Look up a plain histogram by id.
>  Histogram*
> -internal_GetHistogramById(HistogramID histogramId, ProcessID processId, bool instantiate = true)
> +internal_GetHistogramById(const StaticMutexAutoLock& lock, HistogramID histogramId, ProcessID processId, bool instantiate = true)

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:377
(Diff revision 1)
>    return kh;
>  }
>  
>  // Look up a histogram id from a histogram name.
>  nsresult
> -internal_GetHistogramIdByName(const nsACString& name, HistogramID* id)
> +internal_GetHistogramIdByName(const StaticMutexAutoLock& lock, const nsACString& name, HistogramID* id)

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:389
(Diff revision 1)
>    return NS_OK;
>  }
>  
>  // Clear a histogram from storage.
>  void
> -internal_ClearHistogramById(HistogramID histogramId, ProcessID processId)
> +internal_ClearHistogramById(const StaticMutexAutoLock& lock, HistogramID histogramId, ProcessID processId)

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:421
(Diff revision 1)
>    return gCanRecordExtended;
>  }
>  
>  // Note: this is completely unrelated to mozilla::IsEmpty.
>  bool
> -internal_IsEmpty(const Histogram *h)
> +internal_IsEmpty(const StaticMutexAutoLock& lock, const Histogram *h)

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:427
(Diff revision 1)
>  {
>    return h->is_empty();
>  }
>  
>  bool
> -internal_IsExpired(Histogram* h)
> +internal_IsExpired(const StaticMutexAutoLock& lock, Histogram* h)

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:433
(Diff revision 1)
>  {
>    return h == gExpiredHistogram;
>  }
>  
>  void
> -internal_SetHistogramRecordingEnabled(HistogramID id, bool aEnabled)
> +internal_SetHistogramRecordingEnabled(const StaticMutexAutoLock& lock, HistogramID id, bool aEnabled)

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:440
(Diff revision 1)
>    MOZ_ASSERT(internal_IsHistogramEnumId(id));
>    gHistogramRecordingDisabled[id] = !aEnabled;
>  }
>  
>  bool
>  internal_IsRecordingEnabled(HistogramID id)

Any reason why you didn't add the lock here as well?

::: toolkit/components/telemetry/TelemetryHistogram.cpp:602
(Diff revision 1)
>  
>    return h;
>  }
>  
>  nsresult
> -internal_HistogramAdd(Histogram& histogram,
> +internal_HistogramAdd(const StaticMutexAutoLock& lock,

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:743
(Diff revision 1)
>  
>    return NS_OK;
>  }
>  
>  bool
> -internal_ShouldReflectHistogram(Histogram* h, HistogramID id)
> +internal_ShouldReflectHistogram(const StaticMutexAutoLock& lock, Histogram* h, HistogramID id)

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:990
(Diff revision 1)
>  // PRIVATE: thread-unsafe helpers for the external interface
>  
>  namespace {
>  
>  bool
> -internal_RemoteAccumulate(HistogramID aId, uint32_t aSample)
> +internal_RemoteAccumulate(const StaticMutexAutoLock& lock, HistogramID aId, uint32_t aSample)

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:1005
(Diff revision 1)
>    TelemetryIPCAccumulator::AccumulateChildHistogram(aId, aSample);
>    return true;
>  }
>  
>  bool
> -internal_RemoteAccumulate(HistogramID aId,
> +internal_RemoteAccumulate(const StaticMutexAutoLock& lock,

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:1021
(Diff revision 1)
>  
>    TelemetryIPCAccumulator::AccumulateChildKeyedHistogram(aId, aKey, aSample);
>    return true;
>  }
>  
> -void internal_Accumulate(HistogramID aId, uint32_t aSample)
> +void internal_Accumulate(const StaticMutexAutoLock& lock, HistogramID aId, uint32_t aSample)

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:1034
(Diff revision 1)
>    MOZ_ASSERT(h);
> -  internal_HistogramAdd(*h, aId, aSample, ProcessID::Parent);
> +  internal_HistogramAdd(lock, *h, aId, aSample, ProcessID::Parent);
>  }
>  
>  void
> -internal_Accumulate(HistogramID aId,
> +internal_Accumulate(const StaticMutexAutoLock& lock, HistogramID aId,

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:1048
(Diff revision 1)
>    MOZ_ASSERT(keyed);
>    keyed->Add(aKey, aSample, ProcessID::Parent);
>  }
>  
>  void
> -internal_AccumulateChild(ProcessID aProcessType, HistogramID aId, uint32_t aSample)
> +internal_AccumulateChild(const StaticMutexAutoLock& lock, ProcessID aProcessType, HistogramID aId, uint32_t aSample)

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:1062
(Diff revision 1)
>      NS_WARNING("Failed GetHistogramById for CHILD");
>    }
>  }
>  
>  void
> -internal_AccumulateChildKeyed(ProcessID aProcessType, HistogramID aId,
> +internal_AccumulateChildKeyed(const StaticMutexAutoLock& lock, ProcessID aProcessType, HistogramID aId,

This should be `aLock`.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:1075
(Diff revision 1)
>    MOZ_ASSERT(keyed);
>    keyed->Add(aKey, aSample, aProcessType);
>  }
>  
>  void
> -internal_ClearHistogram(HistogramID id)
> +internal_ClearHistogram(const StaticMutexAutoLock& lock, HistogramID id)

This should be `aLock`.
Attachment #8976371 - Flags: review?(alessio.placitelli)
Attachment #8867849 - Attachment is obsolete: true
Assignee

Comment 40

a year ago
Oops, sorry about the variable name. That can be corrected easily, thankfully.

internal_IsRecordingEnabled cannot be locked because KeyedHistogram::Add does not have a lock to pass to the function. I'm not sure how KeyedHistogram::Add is called, so I'm not sure if a lock can be created in that function.

internal_IsHistogramEnumId cannot be locked because internal_IsRecordingEnabled, internal_KeyedHistogram_SnapshotImpl and internal_JSHistogram_Add do not contain locks. internal_JSKeyedHistogram_Clear and TelemetryHistogram::GetHistogramName have locks, but they are created after internal_IsHistogramEnumId is called.

In internal_JSKeyedHistogram_Clear the lock is created right after internal_IsHistogramEnumId is called, so that could be moved. In TelemetryHistogram::GetHistogramName, the same could probable happen as there is only an NS_WARN_IF and MOZ_ASSERT_UNREACHABLE call before the lock is created. internal_IsRecordingEnabled has the problem listed above, which is the same for caller functions regarding internal_KeyedHistogram_SnapshotImpl and internal_JSHistogram_Add.

I see that Bug 1457127 may change the code further, so I will keep an eye on it. I had merged changes that had been made already, but can understand that it may make it difficult for others if multiple people are working on the same file.
Hi Michael, bug 1457127 landed and this bug is now unblocked :) Feel free to rebase and work on it again!
Flags: needinfo?(mcalabrese85)
Comment hidden (mozreview-request)
Assignee

Updated

a year ago
Attachment #8976371 - Attachment is obsolete: true
Assignee

Comment 43

a year ago
Hi Alessio,

I updated the patch to work with the new code. As far as I can tell the new code did not require additional functions to use the lock as a parameter (or at least none that you did not already implement in the other patches for Bug 1457127), but please let me if this is incorrect.
Flags: needinfo?(mcalabrese85)

Comment 44

a year ago
mozreview-review
Comment on attachment 8979820 [details]
Bug 1362953 - Require lock proof in TelemetryHistogram internal functions where possible.

https://reviewboard.mozilla.org/r/245986/#review252048

Hi Michael, this looks like it's almost there. A few whitespaces were introduced mistakenly, so let's get rid of these and we should be done!

::: toolkit/components/telemetry/TelemetryHistogram.cpp:398
(Diff revision 1)
>    return kh;
>  }
>  
>  // Look up a histogram id from a histogram name.
>  nsresult
> -internal_GetHistogramIdByName(const nsACString& name, HistogramID* id)
> +internal_GetHistogramIdByName(const StaticMutexAutoLock& aLock, 

nit: there's a trailing space at the end of this

::: toolkit/components/telemetry/TelemetryHistogram.cpp:413
(Diff revision 1)
>  }
>  
>  // Clear a histogram from storage.
>  void
> -internal_ClearHistogramById(HistogramID histogramId, ProcessID processId)
> +internal_ClearHistogramById(const StaticMutexAutoLock& aLock,
> +                            HistogramID histogramId, 

nit: there's a trailing space at the end of this

::: toolkit/components/telemetry/TelemetryHistogram.cpp:468
(Diff revision 1)
>  {
>    return h == gExpiredHistogram;
>  }
>  
>  void
> -internal_SetHistogramRecordingEnabled(HistogramID id, bool aEnabled)
> +internal_SetHistogramRecordingEnabled(const StaticMutexAutoLock& aLock, 

nit: there's a trailing space at the end of this

::: toolkit/components/telemetry/TelemetryHistogram.cpp:639
(Diff revision 1)
>  
>    return h;
>  }
>  
>  nsresult
> -internal_HistogramAdd(Histogram& histogram,
> +internal_HistogramAdd(const StaticMutexAutoLock& aLock, 

nit: there's a trailing space at the end of this

::: toolkit/components/telemetry/TelemetryHistogram.cpp:1225
(Diff revision 1)
>    MOZ_ASSERT(keyed);
>    keyed->Add(aKey, aSample, ProcessID::Parent);
>  }
>  
>  void
> -internal_AccumulateChild(ProcessID aProcessType, HistogramID aId, uint32_t aSample)
> +internal_AccumulateChild(const StaticMutexAutoLock& aLock, 

nit: there's a trailing space at the end of this
Attachment #8979820 - Flags: review?(alessio.placitelli)
Assignee

Comment 45

a year ago
Hello Alessio,

Thank you for your review. Is the trailing space what was being shown in the diff as a solid red block? I wasn't sure what it meant, thought it just meant an additional line break was added. And when I push the commit for review, do I just mention that trailing whitespace was removed or should I treat it as a new patch that is used in place of the one previously submitted?
(In reply to Michael Calabrese from comment #45)
> Hello Alessio,
> 
> Thank you for your review. Is the trailing space what was being shown in the
> diff as a solid red block? I wasn't sure what it meant, thought it just
> meant an additional line break was added.

Yes, the solid red blocks :)

> And when I push the commit for
> review, do I just mention that trailing whitespace was removed or should I
> treat it as a new patch that is used in place of the one previously
> submitted?

You can amend the commit and push again the updated version. With the current commit as the "tip" of your repo, you can do this:

> hg log -r .

To check that your commit is the most recent one. Then you take off the whitespaces. After that, you do

> hg diff

And verify that you only fixed the whitespaces.

Finally

> hg commit --amend
> hg push review
Comment hidden (mozreview-request)

Comment 48

a year ago
mozreview-review
Comment on attachment 8979820 [details]
Bug 1362953 - Require lock proof in TelemetryHistogram internal functions where possible.

https://reviewboard.mozilla.org/r/245986/#review252800

This looks good to me now, thank you for your efforts and patience!
Attachment #8979820 - Flags: review?(alessio.placitelli) → review+
Hey Michael, looks like this is failing to build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b74f2f8576bef4a44cebdd53f5f1cba147293bbe&selectedJob=180188595

Would you kindly take a look at what's going on?
Flags: needinfo?(mcalabrese85)
Assignee

Comment 50

a year ago
Hey Alessio,

Looks like I didn't look over some of the functions that were added closely enough after all. I didn't catch them because I have only been testing on Windows so far, but I will be changing that habit in the future. The patch is now updated and builds and runs correctly on my Android phone and tablet.

Unfortunately, I cannot run the automated tests for some reason. I'm not sure if it is because I'm running in a virtual machine or because I need to have the devices rooted at the moment. I'll have to look into it more over the weekend and try to figure it out before I push the revised patch.
Flags: needinfo?(mcalabrese85)
(In reply to Michael Calabrese from comment #50)
> Hey Alessio,
> 
> Looks like I didn't look over some of the functions that were added closely
> enough after all. I didn't catch them because I have only been testing on
> Windows so far, but I will be changing that habit in the future. The patch
> is now updated and builds and runs correctly on my Android phone and tablet.

Ah, no problem for that. I think you can even trigger a try push for these platforms yourself from MozReview! From "Automation" pick "Trigger a try build", then on the left select the platform (both debug and optimized) and finally the test suite. I would go for xpcshell,gtest,mochitest-e10s-bc !

> Unfortunately, I cannot run the automated tests for some reason. I'm not
> sure if it is because I'm running in a virtual machine or because I need to
> have the devices rooted at the moment. I'll have to look into it more over
> the weekend and try to figure it out before I push the revised patch.

It could be because of both things :) Don't worry, I can push to try if you cannot!
Flags: needinfo?(mcalabrese85)
Comment hidden (mozreview-request)
Assignee

Comment 53

a year ago
Hey Alessio,

No luck with getting the tests to work on my computer. I'm sure it's something small I'm not doing right, but I'll have to wait to jump onto the IRC channel to see if someone can give me a hand.

In the meantime, I pushed the revised patch. I cannot trigger a try build on my own, but I will keep an eye out on the results if you can do it for me.
(In reply to Michael Calabrese from comment #53)
> Hey Alessio,
> 
> No luck with getting the tests to work on my computer. I'm sure it's
> something small I'm not doing right, but I'll have to wait to jump onto the
> IRC channel to see if someone can give me a hand.

No worries, looks like you dealt with the problem the right way :) Unfortunately working on Android is still a bit rough on the edges :)

> In the meantime, I pushed the revised patch. I cannot trigger a try build on
> my own, but I will keep an eye out on the results if you can do it for me.

I've done that (https://treeherder.mozilla.org/#/jobs?repo=try&revision=eeda541f14bb70c3bcd453afb3efcfbdf8d0b657&selectedJob=180668966), looks like this is good to land now!
Flags: needinfo?(mcalabrese85)

Comment 55

a year ago
Pushed by alessio.placitelli@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/9f9b17412a65
Require lock proof in TelemetryHistogram internal functions where possible. r=Dexter

Comment 56

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/9f9b17412a65
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
You need to log in before you can comment on or make changes to this bug.