Closed
Bug 1342306
Opened 7 years ago
Closed 7 years ago
Properly synchronize the global state in platform.cpp
Categories
(Core :: Gecko Profiler, defect)
Core
Gecko Profiler
Tracking
()
RESOLVED
FIXED
mozilla55
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
Attachments
(3 files, 5 obsolete files)
40.87 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
27.51 KB,
patch
|
mstange
:
review+
|
Details | Diff | Splinter Review |
142.61 KB,
patch
|
mstange
:
review+
|
Details | Diff | Splinter Review |
I plan to overhaul the handling of all the global state in platform.cpp so that it's properly and obviously synchronized across threads. It's possible to protect it all with a single mutex. This will go a long way towards bug 1330184.
Assignee | ||
Comment 1•7 years ago
|
||
Because profiler_time() is going to need the global lock when I add it, and the lock will already be held when streaming is happening, so it'll cause the thread to deadlock itself. Unfortunately this requires adding an |aStartTime| parameter to a lot of functions, but this is the least worst way I can think of handling it. This also removes the need for one of the profiler_time() functions, which the patch removes.
Attachment #8842724 -
Flags: review?(mstange)
Assignee | ||
Comment 2•7 years ago
|
||
Apologies for the hard-to-review patch. Lots of got unavoidably moved things around. ---- This makes it more like platform-{macos,win32}.cpp, and will make it easier to introduce locking around all the globals in platform.cpp. The following things were moved into the new class. - Types: - SigHandlerCoordinator - Functions: - SigProfHandler() - SigprofSender() as Run() - Variables: - gOldSigprofHandler as mOldSigprofHandler - gSigprofSenderThread as mThread - gIntervalMicro as mIntervalMicro - gSigHandlerCoordinator as sSigHandlerCoordinator sInstance is the singleton instance. PlatformStart() and PlatformStop() now just delegate to StartSampler/StopSampler. The patch also does the following tidy-ups. Linux: - gWasPaused is now cleared in the parent and child processes after forking. - LUL: inlined and removed gLUL_initialization_routine(). - LUL: now only calling EnabledUnwinding() and doing the unit tests on the first call to profiler_start(), instead of all of them. Mac: - Removed a useless call to pthread_self() -- mThread is already set by the earlier call to pthread_create(). - Removed some low-value checking involving kNoThread. Mac + Win: - Renamed SamplerThread::mInstance as sInstance, because it's static. - Merged SamplerThread::Start() with the constructor. - Tweaked how mInterval/mIntervalMicro is initialized, so it can become const. All platforms: - Converted NULL to nullptr throughout. - A few other very minor clean-ups, e.g. formatting.
Attachment #8842751 -
Flags: review?(jseward)
Assignee | ||
Comment 3•7 years ago
|
||
Here's the big one. Julian, I'm f?ing you in case you're interested, particularly for the SamplerThread stuff we talked about, and if you want to do a TSan run. ---- This patch properly synchronizes all the global state in platform*.cpp, which gets us a long way towards implementing bug 1330184. - Most of the global state goes in a new class, ProfilerState, with a single instance, gPS. All accesses to gPS are protected by gPSMutex. All functions that access ProfilerState require a token proving that gPS is locked; this makes things much clearer. gRegisteredThreadsMutex is removed because it is subsumed by gPSMutex. - gVerbosity, however, does not go in ProfilerState. It stays separate, and gains its own mutex, gVerbosityMutex. Also, the tracking of the current profiler state is streamlined. Previously it was tracked via: - stack_key_initialized, gInitCount, gSampler, gIsProfiling, gIsActive, and gIsPaused. Now it is tracked via: - gPS, gPS->sActivity, and gPS->mIsPaused. This means that the Sampler class is no longer necessary, and the patch removes it. Other changes of note made by the patch are as follows. - It removes ThreadInfo::{mMutex,GetMutex}. This mutex was only used in two places, and both these are now protected by gPSMutex. - It tweaks the LOG calls. All the main functions (init(), shutdown(), start(), stop()) now do consistent BEGIN/END logging, and a couple of other low-value incidental LOG calls have been removed. - It adds a lot of release assertions requiring that gPS be initialized (e.g. profiler_init() has been called but profiler_shutdown() has not). - It uses alphabetical order for everything involving profiler feature names. - It removes Platform{Start,Stop}() and SamplerThread::{Start,Stop}Sampler(). These are no longer necessary now that SamplerThread::sInstance has been replaced with ProfilerState::mSamplerThread which allows more direct access to the current SamplerThread instance. - It removes PseudoStack::mPrivacyMode. This was derived from the "privacy" feature, and we now use gPS->mFeaturePrivacy directly, which is simpler. Finally, the new code does more locking than the old one. The following operations now lock a mutex when they previously didn't. - SamplerThread::Run() gets gPSMutex each time around even when the profiler is paused. Previously, it didn't get gRegisteredThreadsMutex in that case. This is necessary for the deleteExpiredStoredMarkers() call. - profiler_log() - profiler_will_gather_OOP_profile() - profiler_gathered_OOP_profile() - profiler_OOP_exit_profile() - profiler_get_buffer_info_helper() - profiler_is_paused() - profiler_pause() - profiler_resume() - profiler_feature_active() - profiler_is_active() - profiler_set_frame_number() - profiler_time() - profiler_in_privacy_mode() - profiler_get_backtrace() - profiler_add_marker() - profiler_tracing() None of these operations is especially hot; e.g. they are all far less frequent than the locking done in Run() when not paused. Furthermore, due to the removal of ThreadInfo::mMutex, all the following operations do less locking. - profiler_get_profile() - profiler_get_profile_jsobject() - profiler_get_profile_jsobject_async() - profiler_save_profiler_to_file_async() - profiler_save_profiler_to_file()
Attachment #8842752 -
Flags: review?(mstange)
Attachment #8842752 -
Flags: feedback?(jseward)
Comment 4•7 years ago
|
||
Comment on attachment 8842751 [details] [diff] [review] (part 2) - Introduced SamplerThread class in platform-linux-android.cpp Review of attachment 8842751 [details] [diff] [review]: ----------------------------------------------------------------- Looks fine to me. ::: tools/profiler/core/platform-linux-android.cpp @@ +122,5 @@ > gIsPaused = gWasPaused; > + gWasPaused = false; > +} > + > +// In the child, after the fork, leave paused. "leave paused" sounds strange. Is the comment incomplete? @@ +680,5 @@ > sa.sa_sigaction = StartSignalHandler; > sigemptyset(&sa.sa_mask); > sa.sa_flags = SA_RESTART | SA_SIGINFO; > if (sigaction(SIGSTART, &sa, &gOldSigstartHandler) != 0) { > + MOZ_CRASH("Error installing signal"); nit: the message is a bit vague. Maybe "Error installing SIGPROF for Gecko Profiler" ?
Attachment #8842751 -
Flags: review?(jseward) → review+
Comment 5•7 years ago
|
||
Comment on attachment 8842752 [details] [diff] [review] (part 3) - Properly synchronize the global state in platform*.cpp Review of attachment 8842752 [details] [diff] [review]: ----------------------------------------------------------------- It will be a great day for the profiler when this lands. From a read through the patch, I have three nits and a this-doesn't-look-quite-right. But on the whole, good. ::: tools/profiler/core/platform-linux-android.cpp @@ +555,5 @@ > } > > private: > + // The activity generation, for detecting when the sampler thread must stop. > + const uint32_t mActivity; Could you call this mGeneration or otherwise work the notion of generation or sequence-number into the name? I find that mActivity doesn't hint well at what it's for. @@ +622,5 @@ > char* value; > char* savePtr; > > if (file) { > + PS::AutoLock lock(gPSMutex); I wound up staring at this and wondering if we can deadlock here (in the strtok etc calls) against a (samplee) thread that has been suspended by the sampler loop. But I don't think we can, since the sampler loop has to hold gPSMutex any time a thread is suspended. Is that correct? ::: tools/profiler/core/platform-macos.cpp @@ +119,5 @@ > TimeDuration lastSleepOverhead = 0; > TimeStamp sampleStart = TimeStamp::Now(); > > + while (true) { > + { Please add a comment here to say this is a new scope specifically for |lock|. I checked the end of the loop to verify that the lock isn't held during the sleep and was confused about the scopes, until I saw the two opening braces at the head of the loop. ::: tools/profiler/core/platform.cpp @@ +1476,5 @@ > // Force-enable verbosity so that LOG prints something. > + { > + StaticMutexAutoLock lock(gVerbosityMutex); > + gVerbosity = Verbosity::VERBOSE; > + } Are you sure this is right? I would have thought you'd have to hold |lock| until after the last LOG call that follows.
Attachment #8842752 -
Flags: feedback?(jseward) → feedback+
Assignee | ||
Comment 6•7 years ago
|
||
Thank you for the review comments. Did you run the patch some more? > > + // The activity generation, for detecting when the sampler thread must stop. > > + const uint32_t mActivity; > > Could you call this mGeneration or otherwise work the notion > of generation or sequence-number into the name? I find that > mActivity doesn't hint well at what it's for. Ok. I'll probably go with mActivityGeneration, and rename ProfilerState::mActivity likewise. > @@ +622,5 @@ > > char* value; > > char* savePtr; > > > > if (file) { > > + PS::AutoLock lock(gPSMutex); > > I wound up staring at this and wondering if we can deadlock > here (in the strtok etc calls) against a (samplee) thread > that has been suspended by the sampler loop. But I don't > think we can, since the sampler loop has to hold gPSMutex > any time a thread is suspended. Is that correct? Yes. I don't think this case is fundamentally different from any of the other places that lock gPSMutex. > > + while (true) { > > + { > > Please add a comment here to say this is a new scope > specifically for |lock|. I checked the end of the loop > to verify that the lock isn't held during the sleep and > was confused about the scopes, until I saw the two opening > braces at the head of the loop. Ha, I did exactly that at one point but removed it. I'll put it back :) I do have a comment at the end of the scope. > > // Force-enable verbosity so that LOG prints something. > > + { > > + StaticMutexAutoLock lock(gVerbosityMutex); > > + gVerbosity = Verbosity::VERBOSE; > > + } > > Are you sure this is right? I would have thought you'd > have to hold |lock| until after the last LOG call that > follows. LOG expands to |if (profiler_verbose()) ...print...|, and profiler_verbose() locks gVerbosityMutex, reads it, then unlocks it. So every LOG call locks and unlocks gVerbosityMutex. Which isn't efficient but there's only ~20 LOG calls.
Assignee | ||
Comment 7•7 years ago
|
||
Updated part 2, with comments addressed, and renamed as part 1 because it's ready to land.
Assignee | ||
Updated•7 years ago
|
Attachment #8842751 -
Attachment is obsolete: true
Assignee | ||
Comment 8•7 years ago
|
||
Comment on attachment 8843498 [details] [diff] [review] (part 1) - Introduce SamplerThread class in platform-linux-android.cpp Review of attachment 8843498 [details] [diff] [review]: ----------------------------------------------------------------- Carrying over jseward's r+.
Attachment #8843498 -
Flags: review+
Assignee | ||
Comment 9•7 years ago
|
||
This was part 1, has now been renamed part 2.
Attachment #8843499 -
Flags: review?(mstange)
Assignee | ||
Updated•7 years ago
|
Attachment #8842724 -
Attachment is obsolete: true
Attachment #8842724 -
Flags: review?(mstange)
Assignee | ||
Comment 10•7 years ago
|
||
This new version addresses Julian's comments and also is updated for bug 1344118's changes.
Attachment #8843502 -
Flags: review?(mstange)
Assignee | ||
Updated•7 years ago
|
Attachment #8842752 -
Attachment is obsolete: true
Attachment #8842752 -
Flags: review?(mstange)
Updated•7 years ago
|
Attachment #8843499 -
Flags: review?(mstange) → review+
Assignee | ||
Comment 11•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/54a86d54e1c131ba32d7f5cb40651c44cfa4ed9f Bug 1342306 (part 1) - Introduce SamplerThread class in platform-linux-android.cpp. r=jseward. https://hg.mozilla.org/integration/mozilla-inbound/rev/bd730342c51508fd903a657030ae281163bf2c90 Bug 1342306 (part 2) - Avoid profiler_time() calls deep in profiler streaming code. r=mstange.
Assignee | ||
Updated•7 years ago
|
Whiteboard: leave-open
Comment 12•7 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #6) > Did you run the patch some more? Yes, I ran it more with a gcc -Og non-debug build and it appears to work. > > > // Force-enable verbosity so that LOG prints something. > > > + { > > > + StaticMutexAutoLock lock(gVerbosityMutex); > > > + gVerbosity = Verbosity::VERBOSE; > > > + } > > > > Are you sure this is right? I would have thought you'd > > have to hold |lock| until after the last LOG call that > > follows. > > LOG expands to |if (profiler_verbose()) ...print...|, and profiler_verbose() > locks gVerbosityMutex, reads it, then unlocks it. So every LOG call locks > and unlocks gVerbosityMutex. Argh. Then it's an atomicity-level problem. There's nothing to stop some other thread changing it in between the LOG calls. Not likely to happen, and not important, though -- let's ignore it.
Assignee | ||
Comment 13•7 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #12) > > Argh. Then it's an atomicity-level problem. There's nothing to > stop some other thread changing it in between the LOG calls. > Not likely to happen, and not important, though -- let's ignore it. Oh, interesting. But it actually works out. gVerbosity is only set in two places: - In the first call to profiler_verbose() -- which is called by LOG -- it's set according to the value (or non-value) of MOZ_PROFILER_VERBOSE, but only if it hasn't already been set. - In profiler_usage(), it's set to VERBOSE. And it terminates execution at the function end. So it can't change between the LOG calls with the current structure. But this is non-obvious. I might add a comment about it.
Assignee | ||
Comment 14•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/d01cc63b45043da203c2324c14658f8c5854156e Bug 1342306 (part 2a) - Fix a comment that part 2 made out-of-date. r=me.
Comment 15•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/54a86d54e1c1 https://hg.mozilla.org/mozilla-central/rev/bd730342c515 https://hg.mozilla.org/mozilla-central/rev/d01cc63b4504
Assignee | ||
Comment 16•7 years ago
|
||
A minor update: changes profiler_in_privacy_mode() to profiler_is_active_and_not_in_privacy_mode() because it was always called in conjunction with profiler_is_active() and that avoids locking and unlocking gPSMutex twice on a moderately hot path.
Attachment #8844148 -
Flags: review?(mstange)
Assignee | ||
Updated•7 years ago
|
Attachment #8843502 -
Attachment is obsolete: true
Attachment #8843502 -
Flags: review?(mstange)
Assignee | ||
Comment 17•7 years ago
|
||
Latest version slightly improves how profiler_tracing() and friends work. It also has more detail about the extra locking in the commit message.
Attachment #8844324 -
Flags: review?(mstange)
Assignee | ||
Updated•7 years ago
|
Attachment #8844148 -
Attachment is obsolete: true
Attachment #8844148 -
Flags: review?(mstange)
Comment 18•7 years ago
|
||
Comment on attachment 8844148 [details] [diff] [review] (part 3) - Properly synchronize the global state in platform*.cpp Review of attachment 8844148 [details] [diff] [review]: ----------------------------------------------------------------- Phew, that was a lot! Things are starting to look really good now. ::: tools/profiler/core/platform-macos.cpp @@ +127,4 @@ > > + // See the corresponding code in platform-linux-android.cpp for an > + // explanation of what's happening here. > + if (PS::Activity(lock) != mActivityGeneration) { This PS::Activity needs to be renamed to PS::ActivityGeneration. ::: tools/profiler/core/platform-win32.cpp @@ +130,5 @@ > } > > + void Stop(PS::LockRef aLock) { > + // Disable any timer resolution changes we've made. Do it now while > + // gPSMutex is locked. It's safe to do this now even though this This makes it sound like gPSMutex being locked is important for disabling the timer resolution change. Is that the case? @@ +154,4 @@ > > + // See the corresponding code in platform-linux-android.cpp for an > + // explanation of what's happening here. > + if (PS::Activity(lock) != mActivityGeneration) { Another PS::Activity that needs to be renamed. ::: tools/profiler/core/platform.cpp @@ +2013,5 @@ > + > + // ...therefore we don't need to lock gPSMutex. But we need a PS::AutoLock to > + // access gPS, so we make a fake one. This is gross but it's hard to get the > + // "profile-subprocess-gather" observers to call back here any other way > + // without exposing ProfileGatherer, which causes other difficulties. Oh dear. ::: tools/profiler/public/GeckoProfiler.h @@ +210,5 @@ > +// case the profiler_is_active() check is just an optimization -- it prevents > +// us calling CreateExpensiveData() unnecessarily in most cases, but the > +// expensive data will end up being created but not used if another thread > +// stops the profiler between the CreateExpensiveData() and PROFILER_OPERATION > +// calls. I like this comment. I also like all the other comments you're adding in this patch.
Attachment #8844148 -
Attachment is obsolete: false
Updated•7 years ago
|
Attachment #8844148 -
Attachment is obsolete: true
Updated•7 years ago
|
Attachment #8844324 -
Flags: review?(mstange) → review+
Assignee | ||
Comment 19•7 years ago
|
||
> > + void Stop(PS::LockRef aLock) {
> > + // Disable any timer resolution changes we've made. Do it now while
> > + // gPSMutex is locked. It's safe to do this now even though this
>
> This makes it sound like gPSMutex being locked is important for disabling
> the timer resolution change. Is that the case?
Yes, because otherwise it's possible that you might start another SamplerThread, call ::timeBeginPeriod() in that new SamplerThread, and then call this pending ::timeEndPeriod() in the stopping SamplerThread. And then the timer resolution for the new SamplerThread would be wrong. I'll update the comment to clarify this.
Comment 20•7 years ago
|
||
The documentation at https://msdn.microsoft.com/en-us/library/windows/desktop/dd757624(v=vs.85).aspx makes it sound like the API handles overlapping start/end calls in the way you'd want: It matches up start and end calls with the same numbers, and always sets the system time resolution to the minimum that's required by all currently active "periods". So it seems that this lock isn't really necessary. We can still keep the code in that function, but maybe the comment should be rephrased so that it no longer implies that locking the mutex is required.
Assignee | ||
Comment 21•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/9061c4b3f7aef7f6535828f05816159dae80763b Bug 1342306 (part 3) - Properly synchronize the global state in platform*.cpp. r=mstange.
Comment 22•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/9061c4b3f7ae
Assignee | ||
Updated•7 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Whiteboard: leave-open
Updated•7 years ago
|
status-firefox55:
--- → fixed
Target Milestone: --- → mozilla55
Updated•7 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•