1362953 - Refactor lock-protected TelemetryHistogram code

Reporter

Description

•

8 years ago

In TelemetryHistogram.cpp [1], all functions that are named `internal_*()` are meant to be called with a global TelemetryHistogram lock to be held (on `gTelemetryHistogramMutex`). A safer pattern is to require passing a lock instance as seen in [2]. We should refactor TelemetryHistogram to that pattern, so e.g.: > void > internal_SetHistogramRecordingEnabled(...) ... becomes > void > SetHistogramRecordingEnabled(const StaticMutexAutoLock& lock, ...) ... and callers become: > internal_SetHistogramRecordingEnabled(locker, ...); 1: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryHistogram.cpp 2: https://dxr.mozilla.org/mozilla-central/rev/17d8a1e278a9c54a6fdda9d390abce4077e55b20/toolkit/components/telemetry/TelemetryEvent.cpp#280

Georg Fritzsche [:gfritzsche]

Reporter

Comment 1

•

8 years ago

This will require: - Changing TelemetryHistogram.cpp as described. - We will also need to update the comments to match the code changes. - Making sure tests pass: - first: mach test toolkit/components/telemetry/tests/unit/test_TelemetryHistograms.js - before submitting a patch, also make sure the other tests pass too: mach test toolkit/components/telemetry/tests/unit/

Georg Fritzsche [:gfritzsche]

Reporter

Comment 2

•

8 years ago

Note that the following functions can't be adopted yet: - internal_JSHistogram_*() - internal_WrapAndReturnHistogram() - internal_KeyedHistogram_SnapshotImpl() - internal_JSKeyedHistogram_*() - internal_WrapAndReturnKeyedHistogram() This is due to bug 1278821.

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

Mentor: chutten

Vedant Sareen [:fionn_mac]

Comment 3

•

8 years ago

Attached patch Proposed patch for bug 1362953. (obsolete) — Details — Splinter Review

I was able to build successfully. The patch passed all tests. Some notes: The methods |KeyedHistogram::Add()|, |KeyedHistogram::GetHistogram()| and |KeyedHistogram::GetEnumId()| also now require a lock instance to be passed as a parameter, as they are called from |internal_*()| methods and also call certain |internal_*()| methods (specifically, KeyedHistogram::GetHistogram() calls internal_HistogramGet() and KeyedHistogram::GetEnumId() calls internal_GetHistogramEnumId()). The method |KeyedHistogram::ReflectKeyedHistogram()| does not require passing a lock instance even though it calls |internal_ReflectHistogramSnapshot()|. It, along with a few of the listed methods that cannot be adopted yet, but call other |internal_*()| methods, acquires |StaticMutexAutoLock locker(gTelemetryHistogramMutex);| with a limited scope.

Attachment #8866551 - Flags: review?(chutten)

Chris H-C :chutten

Comment 4

•

8 years ago

Comment on attachment 8866551 [details] [diff] [review] Proposed patch for bug 1362953. Review of attachment 8866551 [details] [diff] [review]: ----------------------------------------------------------------- Overall you should not be creating any lockers. If you find yourself in a situation where you have to, then it gets interesting. If any call descending from where you think you need a new locker might call into the JS engine, then we can't do it. Deadlocks will hit us. In that case we need to remove the locker parameter from those internal_* methods and accept that we may race. If it doesn't call into the JS engine, then a tightly-scoped new locker may be possible. Call these cases out specifically when you put it up for review so that I don't accidentally miss one :) ::: toolkit/components/telemetry/TelemetryHistogram.cpp @@ +858,5 @@ > +// it is used, and (2) because it is never de-initialised, and > +// a normal Mutex would show up as a leak in BloatView. StaticMutex > +// also has the "OffTheBooks" property, so it won't show as a leak > +// in BloatView. > +static StaticMutex gTelemetryHistogramMutex; It doesn't appear as though you need to move these lines @@ +1081,5 @@ > return false; > } > > + { > + StaticMutexAutoLock locker(gTelemetryHistogramMutex); This is likely to cause problems as reflecting the histogram snapshot could cause the JS engine to try to allocate memory which could cause it to start a GC which would cause it to accumulate telemetry which would deadlock on trying to reacquire this mutex. Your patch should not actually contain any StaticMutexAutoLock creations. It should only pass the ones around that already have been created. @@ +2058,5 @@ > TelemetryHistogram::Accumulate(mozilla::Telemetry::HistogramID aID, > const nsCString& aKey, uint32_t aSample) > { > + { > + StaticMutexAutoLock locker(gTelemetryHistogramMutex); In this case and a few others in your patch you can just raise the original `locker` to the top of the function.

Attachment #8866551 - Flags: review?(chutten) → review-

Vedant Sareen [:fionn_mac]

Comment 5

•

8 years ago

(In reply to Chris H-C :chutten from comment #4) > Comment on attachment 8866551 [details] [diff] [review] > Proposed patch for bug 1362953. > > Review of attachment 8866551 [details] [diff] [review]: > ----------------------------------------------------------------- > Your patch should not actually contain any StaticMutexAutoLock creations. It > should only pass the ones around that already have been created. Oops. I thought it needs to be enforced to all |internal_*()| methods, which necessitated adding more. I've fixed this now. > @@ +2058,5 @@ > > TelemetryHistogram::Accumulate(mozilla::Telemetry::HistogramID aID, > > const nsCString& aKey, uint32_t aSample) > > { > > + { > > + StaticMutexAutoLock locker(gTelemetryHistogramMutex); > > In this case and a few others in your patch you can just raise the original > `locker` to the top of the function. That would've worked too. I thought there was a specific reason behind keeping it lower, and hence, chose fine-grained locking over coarse-grained locking. I've fixed this now.

Vedant Sareen [:fionn_mac]

Comment 6

•

8 years ago

Attached patch Rectified patch for bug 1362953. (obsolete) — Details — Splinter Review

I was able to build successfully. The patch passed all tests. I've added a tightly scoped locker to internal_IdentifyCorruptHistograms(), even though it is called by TelemetryHistogram::CreateHistogramSnapshots(), as it would allow enforcing the pattern to internal_Accumulate(), and subsequently, to internal_RemoteAccumulate(), internal_HistogramAdd() and internal_CanRecordExtended(). For the above-mentioned implementation, the declaration of |gTelemetryHistogramMutex| has been moved up. The following functions have been left untouched : ** internal_CanRecordBase() -> called by internal_JSHistogram_Add() ** internal_GetHistogramEnumId() -> called by internal_JSHistogram_Add() ** internal_HistogramClear() -> called by internal_JSHistogram_Clear() ** internal_ReflectHistogramSnapshot() -> called by internal_JSHistogram_Snapshot(), internal_KeyedHistogram_SnapshotImpl(), and TelemetryHistogram::CreateHistogramSnapshots() ** internal_IsEmpty() -> called by TelemetryHistogram::CreateHistogramSnapshots() ** internal_IsExpired() -> called by TelemetryHistogram::CreateHistogramSnapshots() ** internal_ShouldReflectHistogram() -> called only by TelemetryHistogram::CreateHistogramSnapshots() ** internal_IdentifyCorruptHistograms() -> called only by TelemetryHistogram::CreateHistogramSnapshots() ** internal_GetHistogramByEnumId() -> called by TelemetryHistogram::CreateHistogramSnapshots() ** internal_GetSubsessionHistogram() -> called by TelemetryHistogram::CreateHistogramSnapshots() ** internal_GetHistogramMapEntry() -> called by internal_GetHistogramEnumId(), which is further called by internal_JSHistogram_Add(). Since internal_GetHistogramEnumId() can also be called by functions possessing the lock, it cannot try to possess lock again. ** internal_CloneHistogram() -> called by internal_GetSubsessionHistogram(), which is further called by TelemetryHistogram::CreateHistogramSnapshots(). Since internal_GetSubsessionHistogram() can also be called by functions possessing the lock, it cannot try to possess the lock again. ** internal_HistogramGet() -> called by internal_GetHistogramByEnumId(), which is further called by TelemetryHistogram::CreateHistogramSnapshots(). Since internal_GetHistogramByEnumId() can also be called by functions possessing the lock, it cannot try to possess the lock again. ** internal_ReflectHistogramAndSamples() -> called only by internal_ReflectHistogramSnapshot(). Therefore, left untouched. ** internal_FillRanges() -> called only by internal_ReflectHistogramAndSamples(). Therefore, left untouched. ** internal_CheckHistogramArguments() -> called only by internal_HistogramGet(). Therefore, left untouched.

Attachment #8866551 - Attachment is obsolete: true

Attachment #8867307 - Flags: review?(chutten)

Chris H-C :chutten

Comment 7

•

8 years ago

Comment on attachment 8867307 [details] [diff] [review] Rectified patch for bug 1362953. Review of attachment 8867307 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/TelemetryHistogram.cpp @@ +742,5 @@ > } else if (check & Histogram::COUNT_LOW_ERROR) { > corruptID = mozilla::Telemetry::TOTAL_COUNT_LOW_ERRORS; > } > + { > + StaticMutexAutoLock locker(gTelemetryHistogramMutex); If this is okay, then the entirety of internal_IdentifyCorruptHistograms is okay. If the entirety's okay, then you can push the tightly-scoped locker into TelemetryHistogram::CreateHistogramSnapshots around the call site. Then this would be one fewer internal_* function missing a locker. @@ +1174,4 @@ > } > > bool > +internal_RemoteAccumulate(const StaticMutexAutoLock& lock, end-of-line whitespace should be omitted. @@ +1209,4 @@ > return true; > } > > +void internal_Accumulate(const StaticMutexAutoLock& lock, more EOL whitespace

Attachment #8867307 - Flags: review?(chutten) → review-

Chris H-C :chutten

Updated

•

8 years ago

Assignee: nobody → vedant.sareen

Status: NEW → ASSIGNED

Vedant Sareen [:fionn_mac]

Comment 8

•

8 years ago

Attached patch Cleaned up patch for bug 1362953. (obsolete) — Details — Splinter Review

Attachment #8867307 - Attachment is obsolete: true

Attachment #8867849 - Flags: review?(chutten)

Chris H-C :chutten

Comment 9

•

8 years ago

Comment on attachment 8867849 [details] [diff] [review] Cleaned up patch for bug 1362953. Review of attachment 8867849 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, I'll send this to try tomorrow morning. In the meantime, if you're looking at getting the rest of these in Part 2, may I recommend tracking your work on bug 1278821? Apparently we have a bug for it we've been neglecting this past little while :)

Attachment #8867849 - Flags: review?(chutten) → review+

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

Blocks: 1278821

Chris H-C :chutten

Comment 10

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e6ace3c27e122d90bbc6cf43628959c6f5f02ef5

Chris H-C :chutten

Comment 11

•

8 years ago

It seems as though some of the media tests are, once again, showing us when Telemetry locks up. Somewhere in this patch we must have introduced a deadlock. :jseward, there are tools available to help us out with this, right? I seem to recall the name "TSan" being bandied about, is that available on all of our Tier 1 platforms? (and is there documentation about running it?)

Flags: needinfo?(jseward)

Julian Seward [:jseward]

Comment 12

•

8 years ago

TSan is one of those tools that "probably works for Firefox" most of the time, but AFAIK nobody actually ensures that Fx always works on it. It's only available on x86_64-linux. I haven't run it for about six months. If you can reproduce the hangs in a debugger, obviously the thing to do is look at the stacks of the 2 (or more) threads involved. If not, you have a difficult problem. One thing you could look for is inconsistencies in lock order acquisition. That is, if there are two locks, A and B, that must be acquired in order to complete some activity, then all threads must acquire them in the same order. It doesn't matter what the order is (it can be A first and then B, or B first and then A). What can happen if the order is inconsistent, is that one thread can acquire A and then wait for B, whilst another acquires B and then waits for A. So they deadlock. I believe that debug builds of Firefox have some amount lock cycle detection built in, although I suspect that doesn't apply to all flavours of locks. Nathan might know more.

Flags: needinfo?(jseward) → needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Comment 13

•

8 years ago

(In reply to Julian Seward [:jseward] from comment #12) > I believe that debug builds of Firefox have some amount lock cycle detection > built in, although I suspect that doesn't apply to all flavours of locks. > Nathan might know more. Yes, we'll do deadlock detection for mozilla::Mutex. We won't do it for the lock flavor used by the IPC code, which appears to be implicated in some of the crash stacks from that try run. Perhaps converting those locks to mozilla::Mutex, which would be an excellent cleanup on its own, might assist you in finding the bug?

Flags: needinfo?(nfroyd)

Chris H-C :chutten

Comment 14

•

8 years ago

According to the comments[1] we are using StaticMutex to ensure thread-safe initialization and to skip BloatView. Are those concerns no longer founded? (can we convert safely?) [1]: http://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetryHistogram.cpp#1135

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Comment 15

•

8 years ago

(In reply to Chris H-C :chutten from comment #14) > According to the comments[1] we are using StaticMutex to ensure thread-safe > initialization and to skip BloatView. Are those concerns no longer founded? > (can we convert safely?) Oh, sorry. When I said "lock flavor used by the IPC code", I meant the locks used primarily (only?) by the code under $SRCDIR/ipc: http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/lock.h#13 It looked like you might be calling into that code from Telemetry and dead/livelocking in some way, which would be difficult to detect with the IPC-specific locks not participating in our deadlock detection. (TSan runs ought to be able to catch that sort of thing, but our deadlock detection would have the virtue of working well on try, so...) The comments that you refer to are still valid.

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Comment 16

•

8 years ago

(To be clear, the errors in communication were entirely on my end!)

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

Blocks: 1277552

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

Blocks: 1370565

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

Blocks: 1381470

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

No longer blocks: 1381470

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

Blocks: 1362957

Georg Fritzsche [:gfritzsche]

Reporter

Comment 17

•

8 years ago

Hi Vedant, are you still working on this?

Flags: needinfo?(gfritzsche)

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

Flags: needinfo?(gfritzsche) → needinfo?(vedantsareen)

Vedant Sareen [:fionn_mac]

Comment 18

•

8 years ago

Hey! Yes, I am still working on it. However, my pace of work has been extremely slow, as I've been caught up in a lot of other work too :/ Since the bug activity has been idle for 4 months, I guess it would make sense to mark this bug as unassigned for now, just in case someone else is also interested in working on it. If the bug remains free and if I'm able to make some progress in this, I'll make sure to upload a WIP patch for it :) I humbly apologize for the inconvenience caused :/

Flags: needinfo?(vedantsareen)

Georg Fritzsche [:gfritzsche]

Reporter

Comment 19

•

8 years ago

That sounds good to me :) And no worries, thanks for your work on it so far!

Assignee: vedantsareen → nobody

Status: ASSIGNED → NEW

Priority: P3 → P4

Leo Khodel [:lvk]

Comment 20

•

8 years ago

Hi, I'd like to try fixing this bug. Where should I start?

Flags: needinfo?(gfritzsche)

Georg Fritzsche [:gfritzsche]

Reporter

Comment 21

•

8 years ago

Chris spent some time on refactoring the TelemetryHistogram code recently. Chris, can you help get Leo started?

Flags: needinfo?(gfritzsche) → needinfo?(chutten)

Chris H-C :chutten

Comment 22

•

8 years ago

The first step would be to get the Firefox source code and build it. This is not an easy bug, so I will trust that you are able to find out how to do this on your own. If you can't, no worries! We have lots of easy bugs to start with[1] and you can always come back to this one later. The second step would be to carefully read the bug comments above. There is a lot of information we learned through the first attempt at this bug, and any successful attempt would do well to keep those lessons in mind. The third will be to rework the TelemetryHistogram code so that TelemetryHistogram::* methods continue to generate locks, passing them as proof to internal_* methods. (see early comments on this bug) The fourth will be to test this locally. It should build without errors or warnings, and ./mach test toolkit/components/telemetry/tests should succeed without errors or deadlock or timeout. The fifth will be to submit the code here as a patch that I can review. You can use MozReview or attach the patch as a file to this bug. Then we review and retest for a bit. Then we push the patch to try to make sure it builds and tests on all of the Operating Systems we need to support. Then we push it to mozilla-central and it gets included in a build! Do you wish to continue with this bug? [1]: https://www.joshmatthews.net/bugsahoy/?internals=1

Flags: needinfo?(chutten) → needinfo?(lkhodel)

Leo Khodel [:lvk]

Comment 23

•

8 years ago

(In reply to Chris H-C :chutten from comment #22) > The first step would be to get the Firefox source code and build it. This is > not an easy bug, so I will trust that you are able to find out how to do > this on your own. If you can't, no worries! We have lots of easy bugs to > start with[1] and you can always come back to this one later. No problem, I've landed a very simple bug fix, so I know the process. > > The second step would be to carefully read the bug comments above. There is > a lot of information we learned through the first attempt at this bug, and > any successful attempt would do well to keep those lessons in mind. Will do! > > The third will be to rework the TelemetryHistogram code so that > TelemetryHistogram::* methods continue to generate locks, passing them as > proof to internal_* methods. (see early comments on this bug) > > The fourth will be to test this locally. It should build without errors or > warnings, and ./mach test toolkit/components/telemetry/tests should succeed > without errors or deadlock or timeout. > > The fifth will be to submit the code here as a patch that I can review. You > can use MozReview or attach the patch as a file to this bug. > > Then we review and retest for a bit. Then we push the patch to try to make > sure it builds and tests on all of the Operating Systems we need to support. > > Then we push it to mozilla-central and it gets included in a build! Got it. > > Do you wish to continue with this bug? > > [1]: https://www.joshmatthews.net/bugsahoy/?internals=1 Yes, I do. I'll get started and will let you know as soon as I have any questions. Thanks!

Flags: needinfo?(lkhodel)

Chris H-C :chutten

Updated

•

8 years ago

Assignee: nobody → lkhodel

Status: NEW → ASSIGNED

Chris H-C :chutten

Comment 24

•

8 years ago

Hello :lvk, how are things progressing here? Anything I can help with?

Flags: needinfo?(lkhodel)

Leo Khodel [:lvk]

Comment 25

•

8 years ago

Hi :chutten, I'm getting familiar with the code that I'm going to be changing but I got ahead myself trying to build a callgraph for the internal methods using rtags. I've got Eclipse set up with the index and I'm going to try to walk through the call hierarchy through the UI, but it'd be really nice to have a tool that can help me do it more efficiently. I found 'Callgraph' here: https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Callgraph but it depends on Treehydra, which was abandoned in 2010, so I'm not sure it's the best option. Is there something else available that I'm missing? I've Googled around but couldn't find anything that worked out of the box.

Flags: needinfo?(lkhodel) → needinfo?(chutten)

Chris H-C :chutten

Comment 26

•

8 years ago

Hrm, that's a tough one. My tooling amounts to just a couple of vim instances, so I might not be the best person to ask. The #developer channel on irc.mozilla.org[1] might be able to help you out. In general, calls come in on TelemetryHistogram::* methods where they acquire the lock before calling any number of internal_* methods in any order. There are also the JS entrypoints we register (the JSHistogram and JSKeyedHistogram functions) which have a problematic lack of locks (which are being addressed in bug 1278821), and are alternate places calls may enter the module. [1]: https://wiki.mozilla.org/Irc

Flags: needinfo?(chutten)

Leo Khodel [:lvk]

Comment 27

•

8 years ago

(In reply to Chris H-C :chutten from comment #26) > Hrm, that's a tough one. My tooling amounts to just a couple of vim > instances, so I might not be the best person to ask. The #developer channel > on irc.mozilla.org[1] might be able to help you out. Got it, thanks. I found Sourcetrail[1], which is free for non-commercial use, but haven't had a chance to get it running. > > In general, calls come in on TelemetryHistogram::* methods where they > acquire the lock before calling any number of internal_* methods in any > order. There are also the JS entrypoints we register (the JSHistogram and > JSKeyedHistogram functions) which have a problematic lack of locks (which > are being addressed in bug 1278821), and are alternate places calls may > enter the module. > > [1]: https://wiki.mozilla.org/Irc Good to know, thanks! I'm going to get started refactoring and will let you know if I have any questions. [1]: https://www.sourcetrail.com/

Chris H-C :chutten

Comment 28

•

8 years ago

Hello, Leo! How are things going with this bug? If it hasn't been going well, no worries. I'm here to help. And if that help means helping find you a different bug to work on, that's okay, too!

Flags: needinfo?(lkhodel)

Chris H-C :chutten

Updated

•

7 years ago

Assignee: lkhodel → nobody

Status: ASSIGNED → NEW

Flags: needinfo?(lkhodel)

Jeremy Lempereur [:o0ignition0o]

Comment 29

•

7 years ago

Hello there, Mind if I give this one a try ? It seems like a heavy one !

Chris H-C :chutten

Comment 30

•

7 years ago

Sure! Let me know if you have any questions.

Assignee: nobody → jeremy.lempereur

Status: NEW → ASSIGNED

Chris H-C :chutten

Comment 31

•

7 years ago

Hey Jeremy, I know you have a few other things you're looking into across the project. Are you still planning on looking into this sometime soon?

Flags: needinfo?(jeremy.lempereur)

Jeremy Lempereur [:o0ignition0o]

Comment 32

•

7 years ago

Hey Chris :), Considering the Autotimer / Autocounter + benchmarks I'd rather pass on that one ^^

Flags: needinfo?(jeremy.lempereur)

Chris H-C :chutten

Updated

•

7 years ago

Assignee: jeremy.lempereur → nobody

Status: ASSIGNED → NEW

Alessio Placitelli [:Dexter]

Comment 33

•

7 years ago

Michael, would you be interested in taking this to the finish line?

Flags: needinfo?(mcalabrese85)

Michael Calabrese

Assignee

Comment 34

•

7 years ago

(In reply to Alessio Placitelli [:Dexter] from comment #33) > Michael, would you be interested in taking this to the finish line? Hey Alessio, I'm looking into this now. I may be a little slow on it for about two weeks as I'll be traveling, though. While I'm looking it over, I want to make sure I have the constraints right: 1. No new locks should be made, only the ones that are already there should be passed around. 2. Each lock will be passed into each internal_* function that follows the creation of the lock, and will continue to be passed on if the callee calls another internal_* function. Ex: internal_JSHistogram_Add creates a lock before calling internal_Accumulate. internal_Accumulate should take the lock as an argument and use it when it calls internal_RemoteAccumulate, which has also be modified to accept the lock as a parameter. 3. If necessary, fine-grained locks can become more coarse-grained ones if needed to cover more internal functions. This may be done in a function like TelemetryHistogram::SetHistogramRecordingEnabled, where internal_IsHistogramEnumId is called without a lock although one is created before internal_SetHistogramRecordingEnabled. 4. Locks should not occur before calling interal_JS* functions because they may try to create their own lock, causing a deadlock. Does that seem right? Or am I missing something? Also, is markdown still available for comments? I tried a few things, but nothing shows up in the preview and I don't see the expected checkbox that is mentioned somewhere on the Bugzilla site.

Flags: needinfo?(mcalabrese85)

Alessio Placitelli [:Dexter]

Comment 35

•

7 years ago

(In reply to Michael Calabrese from comment #34) > (In reply to Alessio Placitelli [:Dexter] from comment #33) > > Michael, would you be interested in taking this to the finish line? > > Hey Alessio, I'm looking into this now. I may be a little slow on it for > about two weeks as I'll be traveling, though. While I'm looking it over, I > want to make sure I have the constraints right: > > 1. No new locks should be made, only the ones that are already there should > be passed around. > > 2. Each lock will be passed into each internal_* function that follows the > creation of the lock, and will continue to be passed on if the callee calls > another internal_* function. Ex: internal_JSHistogram_Add creates a lock > before calling internal_Accumulate. internal_Accumulate should take the lock > as an argument and use it when it calls internal_RemoteAccumulate, which has > also be modified to accept the lock as a parameter. > > 3. If necessary, fine-grained locks can become more coarse-grained ones if > needed to cover more internal functions. This may be done in a function like > TelemetryHistogram::SetHistogramRecordingEnabled, where > internal_IsHistogramEnumId is called without a lock although one is created > before internal_SetHistogramRecordingEnabled. > > 4. Locks should not occur before calling interal_JS* functions because they > may try to create their own lock, causing a deadlock. > > Does that seem right? Or am I missing something? Yes, this sounds about right! Let's evaluate (3) on a case by case basis. > Also, is markdown still available for comments? I tried a few things, but > nothing shows up in the preview and I don't see the expected checkbox that > is mentioned somewhere on the Bugzilla site. Markdown is just available for comments in MozReview I'm afraid.

Michael Calabrese

Assignee

Comment 36

•

7 years ago

Awesome, thank you. I don't intend to mess with case 3 much, if at all, just wanted it out there as a possibility. I plan on working on this for at least a few hours each day over the next few days and hopefully more over the weekend if necessary.

Alessio Placitelli [:Dexter]

Comment 37

•

7 years ago

(In reply to Michael Calabrese from comment #36) > Awesome, thank you. I don't intend to mess with case 3 much, if at all, just > wanted it out there as a possibility. I plan on working on this for at least > a few hours each day over the next few days and hopefully more over the > weekend if necessary. Definitely, thank you for that! Take your time and let us know if you're blocked or want to discuss about it :)

Assignee: nobody → mcalabrese85

Comment hidden (mozreview-request)

Alessio Placitelli [:Dexter]

Comment 39

•

7 years ago

mozreview-review

Comment on attachment 8976371 [details] Bug 1362953 - Refactored lock-protected TelemetryHistogram code to be more similar to other sections of code. https://reviewboard.mozilla.org/r/244534/#review250652 This looks good as a start, thank you! I'm mostly pointing out that the argument needs to be rename, hence the long list of changes requested. Moreover, I would suggest you rebase your change on a more recent version of our mozilla-central: a few bugs landed lately that changed TelemetryHistogram.cpp. Even better, you might want to wait a couple of days for bug 1457127 to land before rebasing :) ::: commit-message-3c9d6:1 (Diff revision 1) > +Bug 1362953 - Refactored lock-protected TelemetryHistogram code to be more similar to other sections of code. r?Dexter Let's be specific about what we did here :) Something like "Bug 1362953 - Require lock proof in TelemetryHistogram internal functions. r?dexter" ::: toolkit/components/telemetry/TelemetryHistogram.cpp:266 (Diff revision 1) > ProcessID aProcessId) > { > return aHistogramId * size_t(ProcessID::Count) + size_t(aProcessId); > } > > -size_t internal_HistogramStorageIndex(HistogramID aHistogramId, > +size_t internal_HistogramStorageIndex(const StaticMutexAutoLock& lock, Let's follow our [code style guide](https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style) and call the parameter `aLock` instead of `lock`. Let's do this here and in other places where you added it. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:278 (Diff revision 1) > "Too many histograms and processes to store in a 1D array."); > > return aHistogramId * size_t(ProcessID::Count) + size_t(aProcessId); > } > > -Histogram* internal_GetHistogramFromStorage(HistogramID aHistogramId, > +Histogram* internal_GetHistogramFromStorage(const StaticMutexAutoLock& lock, This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:286 (Diff revision 1) > { > - size_t index = internal_HistogramStorageIndex(aHistogramId, aProcessId); > + size_t index = internal_HistogramStorageIndex(lock, aHistogramId, aProcessId); > return gHistogramStorage[index]; > } > > -void internal_SetHistogramInStorage(HistogramID aHistogramId, > +void internal_SetHistogramInStorage(const StaticMutexAutoLock& lock, This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:333 (Diff revision 1) > return aID < HistogramCount; > } > > // Look up a plain histogram by id. > Histogram* > -internal_GetHistogramById(HistogramID histogramId, ProcessID processId, bool instantiate = true) > +internal_GetHistogramById(const StaticMutexAutoLock& lock, HistogramID histogramId, ProcessID processId, bool instantiate = true) This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:377 (Diff revision 1) > return kh; > } > > // Look up a histogram id from a histogram name. > nsresult > -internal_GetHistogramIdByName(const nsACString& name, HistogramID* id) > +internal_GetHistogramIdByName(const StaticMutexAutoLock& lock, const nsACString& name, HistogramID* id) This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:389 (Diff revision 1) > return NS_OK; > } > > // Clear a histogram from storage. > void > -internal_ClearHistogramById(HistogramID histogramId, ProcessID processId) > +internal_ClearHistogramById(const StaticMutexAutoLock& lock, HistogramID histogramId, ProcessID processId) This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:421 (Diff revision 1) > return gCanRecordExtended; > } > > // Note: this is completely unrelated to mozilla::IsEmpty. > bool > -internal_IsEmpty(const Histogram *h) > +internal_IsEmpty(const StaticMutexAutoLock& lock, const Histogram *h) This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:427 (Diff revision 1) > { > return h->is_empty(); > } > > bool > -internal_IsExpired(Histogram* h) > +internal_IsExpired(const StaticMutexAutoLock& lock, Histogram* h) This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:433 (Diff revision 1) > { > return h == gExpiredHistogram; > } > > void > -internal_SetHistogramRecordingEnabled(HistogramID id, bool aEnabled) > +internal_SetHistogramRecordingEnabled(const StaticMutexAutoLock& lock, HistogramID id, bool aEnabled) This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:440 (Diff revision 1) > MOZ_ASSERT(internal_IsHistogramEnumId(id)); > gHistogramRecordingDisabled[id] = !aEnabled; > } > > bool > internal_IsRecordingEnabled(HistogramID id) Any reason why you didn't add the lock here as well? ::: toolkit/components/telemetry/TelemetryHistogram.cpp:602 (Diff revision 1) > > return h; > } > > nsresult > -internal_HistogramAdd(Histogram& histogram, > +internal_HistogramAdd(const StaticMutexAutoLock& lock, This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:743 (Diff revision 1) > > return NS_OK; > } > > bool > -internal_ShouldReflectHistogram(Histogram* h, HistogramID id) > +internal_ShouldReflectHistogram(const StaticMutexAutoLock& lock, Histogram* h, HistogramID id) This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:990 (Diff revision 1) > // PRIVATE: thread-unsafe helpers for the external interface > > namespace { > > bool > -internal_RemoteAccumulate(HistogramID aId, uint32_t aSample) > +internal_RemoteAccumulate(const StaticMutexAutoLock& lock, HistogramID aId, uint32_t aSample) This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:1005 (Diff revision 1) > TelemetryIPCAccumulator::AccumulateChildHistogram(aId, aSample); > return true; > } > > bool > -internal_RemoteAccumulate(HistogramID aId, > +internal_RemoteAccumulate(const StaticMutexAutoLock& lock, This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:1021 (Diff revision 1) > > TelemetryIPCAccumulator::AccumulateChildKeyedHistogram(aId, aKey, aSample); > return true; > } > > -void internal_Accumulate(HistogramID aId, uint32_t aSample) > +void internal_Accumulate(const StaticMutexAutoLock& lock, HistogramID aId, uint32_t aSample) This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:1034 (Diff revision 1) > MOZ_ASSERT(h); > - internal_HistogramAdd(*h, aId, aSample, ProcessID::Parent); > + internal_HistogramAdd(lock, *h, aId, aSample, ProcessID::Parent); > } > > void > -internal_Accumulate(HistogramID aId, > +internal_Accumulate(const StaticMutexAutoLock& lock, HistogramID aId, This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:1048 (Diff revision 1) > MOZ_ASSERT(keyed); > keyed->Add(aKey, aSample, ProcessID::Parent); > } > > void > -internal_AccumulateChild(ProcessID aProcessType, HistogramID aId, uint32_t aSample) > +internal_AccumulateChild(const StaticMutexAutoLock& lock, ProcessID aProcessType, HistogramID aId, uint32_t aSample) This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:1062 (Diff revision 1) > NS_WARNING("Failed GetHistogramById for CHILD"); > } > } > > void > -internal_AccumulateChildKeyed(ProcessID aProcessType, HistogramID aId, > +internal_AccumulateChildKeyed(const StaticMutexAutoLock& lock, ProcessID aProcessType, HistogramID aId, This should be `aLock`. ::: toolkit/components/telemetry/TelemetryHistogram.cpp:1075 (Diff revision 1) > MOZ_ASSERT(keyed); > keyed->Add(aKey, aSample, aProcessType); > } > > void > -internal_ClearHistogram(HistogramID id) > +internal_ClearHistogram(const StaticMutexAutoLock& lock, HistogramID id) This should be `aLock`.

Attachment #8976371 - Flags: review?(alessio.placitelli)

Alessio Placitelli [:Dexter]

Updated

•

7 years ago

Attachment #8867849 - Attachment is obsolete: true

Michael Calabrese

Assignee

Comment 40

•

7 years ago

Oops, sorry about the variable name. That can be corrected easily, thankfully. internal_IsRecordingEnabled cannot be locked because KeyedHistogram::Add does not have a lock to pass to the function. I'm not sure how KeyedHistogram::Add is called, so I'm not sure if a lock can be created in that function. internal_IsHistogramEnumId cannot be locked because internal_IsRecordingEnabled, internal_KeyedHistogram_SnapshotImpl and internal_JSHistogram_Add do not contain locks. internal_JSKeyedHistogram_Clear and TelemetryHistogram::GetHistogramName have locks, but they are created after internal_IsHistogramEnumId is called. In internal_JSKeyedHistogram_Clear the lock is created right after internal_IsHistogramEnumId is called, so that could be moved. In TelemetryHistogram::GetHistogramName, the same could probable happen as there is only an NS_WARN_IF and MOZ_ASSERT_UNREACHABLE call before the lock is created. internal_IsRecordingEnabled has the problem listed above, which is the same for caller functions regarding internal_KeyedHistogram_SnapshotImpl and internal_JSHistogram_Add. I see that Bug 1457127 may change the code further, so I will keep an eye on it. I had merged changes that had been made already, but can understand that it may make it difficult for others if multiple people are working on the same file.

Alessio Placitelli [:Dexter]

Comment 41

•

7 years ago

Hi Michael, bug 1457127 landed and this bug is now unblocked :) Feel free to rebase and work on it again!

Flags: needinfo?(mcalabrese85)

Comment hidden (mozreview-request)

Michael Calabrese

Assignee

Updated

•

7 years ago

Attachment #8976371 - Attachment is obsolete: true

Michael Calabrese

Assignee

Comment 43

•

7 years ago

Hi Alessio, I updated the patch to work with the new code. As far as I can tell the new code did not require additional functions to use the lock as a parameter (or at least none that you did not already implement in the other patches for Bug 1457127), but please let me if this is incorrect.

Flags: needinfo?(mcalabrese85)

Alessio Placitelli [:Dexter]

Comment 44

•

7 years ago

mozreview-review

Comment on attachment 8979820 [details] Bug 1362953 - Require lock proof in TelemetryHistogram internal functions where possible. https://reviewboard.mozilla.org/r/245986/#review252048 Hi Michael, this looks like it's almost there. A few whitespaces were introduced mistakenly, so let's get rid of these and we should be done! ::: toolkit/components/telemetry/TelemetryHistogram.cpp:398 (Diff revision 1) > return kh; > } > > // Look up a histogram id from a histogram name. > nsresult > -internal_GetHistogramIdByName(const nsACString& name, HistogramID* id) > +internal_GetHistogramIdByName(const StaticMutexAutoLock& aLock, nit: there's a trailing space at the end of this ::: toolkit/components/telemetry/TelemetryHistogram.cpp:413 (Diff revision 1) > } > > // Clear a histogram from storage. > void > -internal_ClearHistogramById(HistogramID histogramId, ProcessID processId) > +internal_ClearHistogramById(const StaticMutexAutoLock& aLock, > + HistogramID histogramId, nit: there's a trailing space at the end of this ::: toolkit/components/telemetry/TelemetryHistogram.cpp:468 (Diff revision 1) > { > return h == gExpiredHistogram; > } > > void > -internal_SetHistogramRecordingEnabled(HistogramID id, bool aEnabled) > +internal_SetHistogramRecordingEnabled(const StaticMutexAutoLock& aLock, nit: there's a trailing space at the end of this ::: toolkit/components/telemetry/TelemetryHistogram.cpp:639 (Diff revision 1) > > return h; > } > > nsresult > -internal_HistogramAdd(Histogram& histogram, > +internal_HistogramAdd(const StaticMutexAutoLock& aLock, nit: there's a trailing space at the end of this ::: toolkit/components/telemetry/TelemetryHistogram.cpp:1225 (Diff revision 1) > MOZ_ASSERT(keyed); > keyed->Add(aKey, aSample, ProcessID::Parent); > } > > void > -internal_AccumulateChild(ProcessID aProcessType, HistogramID aId, uint32_t aSample) > +internal_AccumulateChild(const StaticMutexAutoLock& aLock, nit: there's a trailing space at the end of this

Attachment #8979820 - Flags: review?(alessio.placitelli)

Michael Calabrese

Assignee

Comment 45

•

7 years ago

Hello Alessio, Thank you for your review. Is the trailing space what was being shown in the diff as a solid red block? I wasn't sure what it meant, thought it just meant an additional line break was added. And when I push the commit for review, do I just mention that trailing whitespace was removed or should I treat it as a new patch that is used in place of the one previously submitted?

Alessio Placitelli [:Dexter]

Comment 46

•

7 years ago

(In reply to Michael Calabrese from comment #45) > Hello Alessio, > > Thank you for your review. Is the trailing space what was being shown in the > diff as a solid red block? I wasn't sure what it meant, thought it just > meant an additional line break was added. Yes, the solid red blocks :) > And when I push the commit for > review, do I just mention that trailing whitespace was removed or should I > treat it as a new patch that is used in place of the one previously > submitted? You can amend the commit and push again the updated version. With the current commit as the "tip" of your repo, you can do this: > hg log -r . To check that your commit is the most recent one. Then you take off the whitespaces. After that, you do > hg diff And verify that you only fixed the whitespaces. Finally > hg commit --amend > hg push review

Comment hidden (mozreview-request)

Alessio Placitelli [:Dexter]

Comment 48

•

7 years ago

mozreview-review

Comment on attachment 8979820 [details] Bug 1362953 - Require lock proof in TelemetryHistogram internal functions where possible. https://reviewboard.mozilla.org/r/245986/#review252800 This looks good to me now, thank you for your efforts and patience!

Attachment #8979820 - Flags: review?(alessio.placitelli) → review+

Alessio Placitelli [:Dexter]

Comment 49

•

7 years ago

Hey Michael, looks like this is failing to build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b74f2f8576bef4a44cebdd53f5f1cba147293bbe&selectedJob=180188595 Would you kindly take a look at what's going on?

Flags: needinfo?(mcalabrese85)

Michael Calabrese

Assignee

Comment 50

•

7 years ago

Hey Alessio, Looks like I didn't look over some of the functions that were added closely enough after all. I didn't catch them because I have only been testing on Windows so far, but I will be changing that habit in the future. The patch is now updated and builds and runs correctly on my Android phone and tablet. Unfortunately, I cannot run the automated tests for some reason. I'm not sure if it is because I'm running in a virtual machine or because I need to have the devices rooted at the moment. I'll have to look into it more over the weekend and try to figure it out before I push the revised patch.

Flags: needinfo?(mcalabrese85)

Alessio Placitelli [:Dexter]

Comment 51

•

7 years ago

(In reply to Michael Calabrese from comment #50) > Hey Alessio, > > Looks like I didn't look over some of the functions that were added closely > enough after all. I didn't catch them because I have only been testing on > Windows so far, but I will be changing that habit in the future. The patch > is now updated and builds and runs correctly on my Android phone and tablet. Ah, no problem for that. I think you can even trigger a try push for these platforms yourself from MozReview! From "Automation" pick "Trigger a try build", then on the left select the platform (both debug and optimized) and finally the test suite. I would go for xpcshell,gtest,mochitest-e10s-bc ! > Unfortunately, I cannot run the automated tests for some reason. I'm not > sure if it is because I'm running in a virtual machine or because I need to > have the devices rooted at the moment. I'll have to look into it more over > the weekend and try to figure it out before I push the revised patch. It could be because of both things :) Don't worry, I can push to try if you cannot!

Flags: needinfo?(mcalabrese85)

Comment hidden (mozreview-request)

Michael Calabrese

Assignee

Comment 53

•

7 years ago

Hey Alessio, No luck with getting the tests to work on my computer. I'm sure it's something small I'm not doing right, but I'll have to wait to jump onto the IRC channel to see if someone can give me a hand. In the meantime, I pushed the revised patch. I cannot trigger a try build on my own, but I will keep an eye out on the results if you can do it for me.

Alessio Placitelli [:Dexter]

Comment 54

•

7 years ago

(In reply to Michael Calabrese from comment #53) > Hey Alessio, > > No luck with getting the tests to work on my computer. I'm sure it's > something small I'm not doing right, but I'll have to wait to jump onto the > IRC channel to see if someone can give me a hand. No worries, looks like you dealt with the problem the right way :) Unfortunately working on Android is still a bit rough on the edges :) > In the meantime, I pushed the revised patch. I cannot trigger a try build on > my own, but I will keep an eye out on the results if you can do it for me. I've done that (https://treeherder.mozilla.org/#/jobs?repo=try&revision=eeda541f14bb70c3bcd453afb3efcfbdf8d0b657&selectedJob=180668966), looks like this is good to land now!

Flags: needinfo?(mcalabrese85)

Pulsebot

Comment 55

•

7 years ago

Pushed by alessio.placitelli@gmail.com: https://hg.mozilla.org/integration/autoland/rev/9f9b17412a65 Require lock proof in TelemetryHistogram internal functions where possible. r=Dexter

Noemi Erli[:noemi_erli]

Comment 56

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/9f9b17412a65

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox62: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla62

Proposed patch for bug 1362953. 8 years ago Vedant Sareen [:fionn_mac] 49.35 KB, patch	chutten : review-	Details \| Diff \| Splinter Review
Rectified patch for bug 1362953. 8 years ago Vedant Sareen [:fionn_mac] 20.56 KB, patch	chutten : review-	Details \| Diff \| Splinter Review
Cleaned up patch for bug 1362953. 8 years ago Vedant Sareen [:fionn_mac] 19.72 KB, patch	chutten : review+	Details \| Diff \| Splinter Review
Bug 1362953 - Refactored lock-protected TelemetryHistogram code to be more similar to other sections of code. 7 years ago Michael Calabrese 59 bytes, text/x-review-board-request		Details
Bug 1362953 - Require lock proof in TelemetryHistogram internal functions where possible. 7 years ago Michael Calabrese 59 bytes, text/x-review-board-request	Dexter : review+	Details