Properly synchronize the global state in platform.cpp

RESOLVED FIXED in Firefox 55

Status

()

Core
Gecko Profiler
RESOLVED FIXED
6 months ago
3 months ago

People

(Reporter: njn, Assigned: njn)

Tracking

Trunk
mozilla55
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox54 wontfix, firefox55 fixed)

Details

Attachments

(3 attachments, 5 obsolete attachments)

(Assignee)

Description

6 months ago
I plan to overhaul the handling of all the global state in platform.cpp so that it's properly and obviously synchronized across threads. It's possible to protect it all with a single mutex.

This will go a long way towards bug 1330184.
(Assignee)

Updated

6 months ago
Depends on: 1328378
(Assignee)

Comment 1

6 months ago
Created attachment 8842724 [details] [diff] [review]
(part 1) - Avoid profiler_time() calls deep in profiler streaming code

Because profiler_time() is going to need the global lock when I add it, and the
lock will already be held when streaming is happening, so it'll cause the
thread to deadlock itself.

Unfortunately this requires adding an |aStartTime| parameter to a lot of
functions, but this is the least worst way I can think of handling it.

This also removes the need for one of the profiler_time() functions, which the
patch removes.
Attachment #8842724 - Flags: review?(mstange)
(Assignee)

Comment 2

6 months ago
Created attachment 8842751 [details] [diff] [review]
(part 2) - Introduced SamplerThread class in platform-linux-android.cpp

Apologies for the hard-to-review patch. Lots of got unavoidably moved things
around.

----

This makes it more like platform-{macos,win32}.cpp, and will make it easier to
introduce locking around all the globals in platform.cpp.

The following things were moved into the new class.

- Types:
  - SigHandlerCoordinator

- Functions:
  - SigProfHandler()
  - SigprofSender() as Run()

- Variables:
  - gOldSigprofHandler as mOldSigprofHandler
  - gSigprofSenderThread as mThread
  - gIntervalMicro as mIntervalMicro
  - gSigHandlerCoordinator as sSigHandlerCoordinator

sInstance is the singleton instance. PlatformStart() and PlatformStop() now
just delegate to StartSampler/StopSampler.

The patch also does the following tidy-ups.

Linux:
- gWasPaused is now cleared in the parent and child processes after forking.
- LUL: inlined and removed gLUL_initialization_routine().
- LUL: now only calling EnabledUnwinding() and doing the unit tests on the
  first call to profiler_start(), instead of all of them.

Mac:
- Removed a useless call to pthread_self() -- mThread is already set by the
  earlier call to pthread_create().
- Removed some low-value checking involving kNoThread.

Mac + Win:
- Renamed SamplerThread::mInstance as sInstance, because it's static.
- Merged SamplerThread::Start() with the constructor.
- Tweaked how mInterval/mIntervalMicro is initialized, so it can become const.

All platforms:
- Converted NULL to nullptr throughout.
- A few other very minor clean-ups, e.g. formatting.
Attachment #8842751 - Flags: review?(jseward)
(Assignee)

Comment 3

6 months ago
Created attachment 8842752 [details] [diff] [review]
(part 3) - Properly synchronize the global state in platform*.cpp

Here's the big one. Julian, I'm f?ing you in case you're interested,
particularly for the SamplerThread stuff we talked about, and if you want to do
a TSan run.

----

This patch properly synchronizes all the global state in platform*.cpp, which
gets us a long way towards implementing bug 1330184.

- Most of the global state goes in a new class, ProfilerState, with a single
  instance, gPS. All accesses to gPS are protected by gPSMutex. All functions
  that access ProfilerState require a token proving that gPS is locked; this
  makes things much clearer.

  gRegisteredThreadsMutex is removed because it is subsumed by gPSMutex.

- gVerbosity, however, does not go in ProfilerState. It stays separate, and
  gains its own mutex, gVerbosityMutex.

Also, the tracking of the current profiler state is streamlined. Previously it
was tracked via:

- stack_key_initialized, gInitCount, gSampler, gIsProfiling, gIsActive, and
  gIsPaused.

Now it is tracked via:

- gPS, gPS->sActivity, and gPS->mIsPaused.

This means that the Sampler class is no longer necessary, and the patch removes
it.

Other changes of note made by the patch are as follows.

- It removes ThreadInfo::{mMutex,GetMutex}. This mutex was only used in two
  places, and both these are now protected by gPSMutex.

- It tweaks the LOG calls. All the main functions (init(), shutdown(), start(),
  stop()) now do consistent BEGIN/END logging, and a couple of other low-value
  incidental LOG calls have been removed.

- It adds a lot of release assertions requiring that gPS be initialized (e.g.
  profiler_init() has been called but profiler_shutdown() has not).

- It uses alphabetical order for everything involving profiler feature names.

- It removes Platform{Start,Stop}() and SamplerThread::{Start,Stop}Sampler().
  These are no longer necessary now that SamplerThread::sInstance has been
  replaced with ProfilerState::mSamplerThread which allows more direct access
  to the current SamplerThread instance.

- It removes PseudoStack::mPrivacyMode. This was derived from the "privacy"
  feature, and we now use gPS->mFeaturePrivacy directly, which is simpler.

Finally, the new code does more locking than the old one. The following
operations now lock a mutex when they previously didn't.

- SamplerThread::Run() gets gPSMutex each time around even when the profiler is
  paused. Previously, it didn't get gRegisteredThreadsMutex in that case. This
  is necessary for the deleteExpiredStoredMarkers() call.
- profiler_log()
- profiler_will_gather_OOP_profile()
- profiler_gathered_OOP_profile()
- profiler_OOP_exit_profile()
- profiler_get_buffer_info_helper()
- profiler_is_paused()
- profiler_pause()
- profiler_resume()
- profiler_feature_active()
- profiler_is_active()
- profiler_set_frame_number()
- profiler_time()
- profiler_in_privacy_mode()
- profiler_get_backtrace()
- profiler_add_marker()
- profiler_tracing()

None of these operations is especially hot; e.g. they are all far less frequent
than the locking done in Run() when not paused. Furthermore, due to the removal
of ThreadInfo::mMutex, all the following operations do less locking.

- profiler_get_profile()
- profiler_get_profile_jsobject()
- profiler_get_profile_jsobject_async()
- profiler_save_profiler_to_file_async()
- profiler_save_profiler_to_file()
Attachment #8842752 - Flags: review?(mstange)
Attachment #8842752 - Flags: feedback?(jseward)
Comment on attachment 8842751 [details] [diff] [review]
(part 2) - Introduced SamplerThread class in platform-linux-android.cpp

Review of attachment 8842751 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me.

::: tools/profiler/core/platform-linux-android.cpp
@@ +122,5 @@
>    gIsPaused = gWasPaused;
> +  gWasPaused = false;
> +}
> +
> +// In the child, after the fork, leave paused.

"leave paused" sounds strange.  Is the comment incomplete?

@@ +680,5 @@
>    sa.sa_sigaction = StartSignalHandler;
>    sigemptyset(&sa.sa_mask);
>    sa.sa_flags = SA_RESTART | SA_SIGINFO;
>    if (sigaction(SIGSTART, &sa, &gOldSigstartHandler) != 0) {
> +    MOZ_CRASH("Error installing signal");

nit: the message is a bit vague.  Maybe "Error installing SIGPROF for Gecko Profiler" ?
Attachment #8842751 - Flags: review?(jseward) → review+
Comment on attachment 8842752 [details] [diff] [review]
(part 3) - Properly synchronize the global state in platform*.cpp

Review of attachment 8842752 [details] [diff] [review]:
-----------------------------------------------------------------

It will be a great day for the profiler when this lands.

From a read through the patch, I have three nits and a
this-doesn't-look-quite-right.  But on the whole, good.

::: tools/profiler/core/platform-linux-android.cpp
@@ +555,5 @@
>    }
>  
>  private:
> +  // The activity generation, for detecting when the sampler thread must stop.
> +  const uint32_t mActivity;

Could you call this mGeneration or otherwise work the notion
of generation or sequence-number into the name?  I find that
mActivity doesn't hint well at what it's for.

@@ +622,5 @@
>    char* value;
>    char* savePtr;
>  
>    if (file) {
> +    PS::AutoLock lock(gPSMutex);

I wound up staring at this and wondering if we can deadlock
here (in the strtok etc calls) against a (samplee) thread
that has been suspended by the sampler loop.  But I don't
think we can, since the sampler loop has to hold gPSMutex
any time a thread is suspended.  Is that correct?

::: tools/profiler/core/platform-macos.cpp
@@ +119,5 @@
>      TimeDuration lastSleepOverhead = 0;
>      TimeStamp sampleStart = TimeStamp::Now();
>  
> +    while (true) {
> +      {

Please add a comment here to say this is a new scope
specifically for |lock|.  I checked the end of the loop
to verify that the lock isn't held during the sleep and
was confused about the scopes, until I saw the two opening
braces at the head of the loop.

::: tools/profiler/core/platform.cpp
@@ +1476,5 @@
>    // Force-enable verbosity so that LOG prints something.
> +  {
> +    StaticMutexAutoLock lock(gVerbosityMutex);
> +    gVerbosity = Verbosity::VERBOSE;
> +  }

Are you sure this is right?  I would have thought you'd
have to hold |lock| until after the last LOG call that
follows.
Attachment #8842752 - Flags: feedback?(jseward) → feedback+
(Assignee)

Comment 6

6 months ago
Thank you for the review comments. Did you run the patch some more?


> > +  // The activity generation, for detecting when the sampler thread must stop.
> > +  const uint32_t mActivity;
> 
> Could you call this mGeneration or otherwise work the notion
> of generation or sequence-number into the name?  I find that
> mActivity doesn't hint well at what it's for.

Ok. I'll probably go with mActivityGeneration, and rename ProfilerState::mActivity likewise.


> @@ +622,5 @@
> >    char* value;
> >    char* savePtr;
> >  
> >    if (file) {
> > +    PS::AutoLock lock(gPSMutex);
> 
> I wound up staring at this and wondering if we can deadlock
> here (in the strtok etc calls) against a (samplee) thread
> that has been suspended by the sampler loop.  But I don't
> think we can, since the sampler loop has to hold gPSMutex
> any time a thread is suspended.  Is that correct?

Yes. I don't think this case is fundamentally different from any of the other places that lock gPSMutex.


> > +    while (true) {
> > +      {
> 
> Please add a comment here to say this is a new scope
> specifically for |lock|.  I checked the end of the loop
> to verify that the lock isn't held during the sleep and
> was confused about the scopes, until I saw the two opening
> braces at the head of the loop.

Ha, I did exactly that at one point but removed it. I'll put it back :)  I do have a comment at the end of the scope.
 

> >    // Force-enable verbosity so that LOG prints something.
> > +  {
> > +    StaticMutexAutoLock lock(gVerbosityMutex);
> > +    gVerbosity = Verbosity::VERBOSE;
> > +  }
> 
> Are you sure this is right?  I would have thought you'd
> have to hold |lock| until after the last LOG call that
> follows.

LOG expands to |if (profiler_verbose()) ...print...|, and profiler_verbose() locks gVerbosityMutex, reads it, then unlocks it. So every LOG call locks and unlocks gVerbosityMutex. Which isn't efficient but there's only ~20 LOG calls.
(Assignee)

Comment 7

6 months ago
Created attachment 8843498 [details] [diff] [review]
(part 1) - Introduce SamplerThread class in platform-linux-android.cpp

Updated part 2, with comments addressed, and renamed as part 1 because it's
ready to land.
(Assignee)

Updated

6 months ago
Attachment #8842751 - Attachment is obsolete: true
(Assignee)

Comment 8

6 months ago
Comment on attachment 8843498 [details] [diff] [review]
(part 1) - Introduce SamplerThread class in platform-linux-android.cpp

Review of attachment 8843498 [details] [diff] [review]:
-----------------------------------------------------------------

Carrying over jseward's r+.
Attachment #8843498 - Flags: review+
(Assignee)

Comment 9

6 months ago
Created attachment 8843499 [details] [diff] [review]
(part 2) - Avoid profiler_time() calls deep in profiler streaming code

This was part 1, has now been renamed part 2.
Attachment #8843499 - Flags: review?(mstange)
(Assignee)

Updated

6 months ago
Attachment #8842724 - Attachment is obsolete: true
Attachment #8842724 - Flags: review?(mstange)
(Assignee)

Comment 10

6 months ago
Created attachment 8843502 [details] [diff] [review]
(part 3) - Properly synchronize the global state in platform*.cpp

This new version addresses Julian's comments and also is updated for bug
1344118's changes.
Attachment #8843502 - Flags: review?(mstange)
(Assignee)

Updated

6 months ago
Attachment #8842752 - Attachment is obsolete: true
Attachment #8842752 - Flags: review?(mstange)
Attachment #8843499 - Flags: review?(mstange) → review+
(Assignee)

Comment 11

6 months ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/54a86d54e1c131ba32d7f5cb40651c44cfa4ed9f
Bug 1342306 (part 1) - Introduce SamplerThread class in platform-linux-android.cpp. r=jseward.

https://hg.mozilla.org/integration/mozilla-inbound/rev/bd730342c51508fd903a657030ae281163bf2c90
Bug 1342306 (part 2) - Avoid profiler_time() calls deep in profiler streaming code. r=mstange.
(Assignee)

Updated

6 months ago
Whiteboard: leave-open
(In reply to Nicholas Nethercote [:njn] from comment #6)
> Did you run the patch some more?

Yes, I ran it more with a gcc -Og non-debug build and it appears
to work.

> > >    // Force-enable verbosity so that LOG prints something.
> > > +  {
> > > +    StaticMutexAutoLock lock(gVerbosityMutex);
> > > +    gVerbosity = Verbosity::VERBOSE;
> > > +  }
> > 
> > Are you sure this is right?  I would have thought you'd
> > have to hold |lock| until after the last LOG call that
> > follows.
> 
> LOG expands to |if (profiler_verbose()) ...print...|, and profiler_verbose()
> locks gVerbosityMutex, reads it, then unlocks it. So every LOG call locks
> and unlocks gVerbosityMutex.

Argh.  Then it's an atomicity-level problem.  There's nothing to
stop some other thread changing it in between the LOG calls.
Not likely to happen, and not important, though -- let's ignore it.
(Assignee)

Comment 13

6 months ago
(In reply to Julian Seward [:jseward] from comment #12)
> 
> Argh.  Then it's an atomicity-level problem.  There's nothing to
> stop some other thread changing it in between the LOG calls.
> Not likely to happen, and not important, though -- let's ignore it.

Oh, interesting. But it actually works out. gVerbosity is only set in two places:

- In the first call to profiler_verbose() -- which is called by LOG -- it's set according to the value (or non-value) of MOZ_PROFILER_VERBOSE, but only if it hasn't already been set.

- In profiler_usage(), it's set to VERBOSE. And it terminates execution at the function end.

So it can't change between the LOG calls with the current structure. But this is non-obvious. I might add a comment about it.
(Assignee)

Comment 14

6 months ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/d01cc63b45043da203c2324c14658f8c5854156e
Bug 1342306 (part 2a) - Fix a comment that part 2 made out-of-date. r=me.

Comment 15

6 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/54a86d54e1c1
https://hg.mozilla.org/mozilla-central/rev/bd730342c515
https://hg.mozilla.org/mozilla-central/rev/d01cc63b4504
(Assignee)

Comment 16

6 months ago
Created attachment 8844148 [details] [diff] [review]
(part 3) - Properly synchronize the global state in platform*.cpp

A minor update: changes profiler_in_privacy_mode() to
profiler_is_active_and_not_in_privacy_mode() because it was always called in
conjunction with profiler_is_active() and that avoids locking and unlocking
gPSMutex twice on a moderately hot path.
Attachment #8844148 - Flags: review?(mstange)
(Assignee)

Updated

6 months ago
Attachment #8843502 - Attachment is obsolete: true
Attachment #8843502 - Flags: review?(mstange)
(Assignee)

Comment 17

6 months ago
Created attachment 8844324 [details] [diff] [review]
(part 3) - Properly synchronize the global state in platform*.cpp

Latest version slightly improves how profiler_tracing() and friends work. It
also has more detail about the extra locking in the commit message.
Attachment #8844324 - Flags: review?(mstange)
(Assignee)

Updated

6 months ago
Attachment #8844148 - Attachment is obsolete: true
Attachment #8844148 - Flags: review?(mstange)
Comment on attachment 8844148 [details] [diff] [review]
(part 3) - Properly synchronize the global state in platform*.cpp

Review of attachment 8844148 [details] [diff] [review]:
-----------------------------------------------------------------

Phew, that was a lot! Things are starting to look really good now.

::: tools/profiler/core/platform-macos.cpp
@@ +127,4 @@
>  
> +        // See the corresponding code in platform-linux-android.cpp for an
> +        // explanation of what's happening here.
> +        if (PS::Activity(lock) != mActivityGeneration) {

This PS::Activity needs to be renamed to PS::ActivityGeneration.

::: tools/profiler/core/platform-win32.cpp
@@ +130,5 @@
>    }
>  
> +  void Stop(PS::LockRef aLock) {
> +    // Disable any timer resolution changes we've made. Do it now while
> +    // gPSMutex is locked. It's safe to do this now even though this

This makes it sound like gPSMutex being locked is important for disabling the timer resolution change. Is that the case?

@@ +154,4 @@
>  
> +        // See the corresponding code in platform-linux-android.cpp for an
> +        // explanation of what's happening here.
> +        if (PS::Activity(lock) != mActivityGeneration) {

Another PS::Activity that needs to be renamed.

::: tools/profiler/core/platform.cpp
@@ +2013,5 @@
> +
> +  // ...therefore we don't need to lock gPSMutex. But we need a PS::AutoLock to
> +  // access gPS, so we make a fake one. This is gross but it's hard to get the
> +  // "profile-subprocess-gather" observers to call back here any other way
> +  // without exposing ProfileGatherer, which causes other difficulties.

Oh dear.

::: tools/profiler/public/GeckoProfiler.h
@@ +210,5 @@
> +// case the profiler_is_active() check is just an optimization -- it prevents
> +// us calling CreateExpensiveData() unnecessarily in most cases, but the
> +// expensive data will end up being created but not used if another thread
> +// stops the profiler between the CreateExpensiveData() and PROFILER_OPERATION
> +// calls.

I like this comment.

I also like all the other comments you're adding in this patch.
Attachment #8844148 - Attachment is obsolete: false
Attachment #8844148 - Attachment is obsolete: true
Attachment #8844324 - Flags: review?(mstange) → review+
(Assignee)

Comment 19

6 months ago
> > +  void Stop(PS::LockRef aLock) {
> > +    // Disable any timer resolution changes we've made. Do it now while
> > +    // gPSMutex is locked. It's safe to do this now even though this
> 
> This makes it sound like gPSMutex being locked is important for disabling
> the timer resolution change. Is that the case?

Yes, because otherwise it's possible that you might start another SamplerThread, call ::timeBeginPeriod() in that new SamplerThread, and then call this pending ::timeEndPeriod() in the stopping SamplerThread. And then the timer resolution for the new SamplerThread would be wrong. I'll update the comment to clarify this.
The documentation at https://msdn.microsoft.com/en-us/library/windows/desktop/dd757624(v=vs.85).aspx makes it sound like the API handles overlapping start/end calls in the way you'd want: It matches up start and end calls with the same numbers, and always sets the system time resolution to the minimum that's required by all currently active "periods". So it seems that this lock isn't really necessary. We can still keep the code in that function, but maybe the comment should be rephrased so that it no longer implies that locking the mutex is required.
(Assignee)

Comment 21

6 months ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/9061c4b3f7aef7f6535828f05816159dae80763b
Bug 1342306 (part 3) - Properly synchronize the global state in platform*.cpp. r=mstange.

Comment 22

6 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/9061c4b3f7ae
Depends on: 1345796
(Assignee)

Updated

6 months ago
No longer depends on: 1345796
Depends on: 1346356
Depends on: 1273422
(Assignee)

Updated

5 months ago
Status: ASSIGNED → RESOLVED
Last Resolved: 5 months ago
Resolution: --- → FIXED
Whiteboard: leave-open
Depends on: 1347274
status-firefox55: --- → fixed
Target Milestone: --- → mozilla55
Depends on: 1348374
(Assignee)

Updated

5 months ago
Depends on: 1347044
(Assignee)

Updated

5 months ago
Depends on: 1350967
Depends on: 1356249
status-firefox54: affected → wontfix
You need to log in before you can comment on or make changes to this bug.