Closed Bug 1141712 Opened 7 years ago Closed 7 years ago

Make LUL work with inplace ticking (not using the unwinder thread)

Categories

(Core :: Gecko Profiler, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: mstange, Assigned: jseward)

References

Details

Attachments

(1 file, 5 obsolete files)

We'd like to get rid of the unwinder thread, and LUL is only implemented for the unwinder thread at the moment.

Not using the unwinder thread means that stack unwinding happens inside a signal handler on the profiled thread. If we want LUL to work in that context, we need to separate out those parts that allocate memory from those that don't. The parts that do need to allocate memory can run on the sampler thread outside of the ticks, or when the profiler is started on the thread that starts it.
Blocks: 1141714
Attached patch bug1141712-2.diff (obsolete) — Splinter Review
WIP patch.  Provides LUL+inplace ticking at least for x86_64-linux.
Not tested on any other target.  The core change is to integrate
it like Jed's unwinder (USE_EHABI_STACKWALK) is, hence:

* starting up in Sampler::Start in platform-linux.cc

* a new implementation for TableTicker::doNativeBacktrace in
  TableTicker.cpp, for the case USE_LUL_STACKWALK

Other changes:

* completely removes UnwinderThread2.{h,cpp}

* removes HasUnwinderThread() from TableTicker

* BreakpadSampler.cpp is almost entirely empty (should be rm'd)

Stuff I'm not sure about:

* SyncProfile.cpp: some stuff commented out and replaced by MOZ_CRASH.
  This needs to be fixed.  I don't know how this works or what it 
  does.

* Current effect/status of control env vars
  MOZ_PROFILER_INTERVAL MOZ_PROFILER_MODE MOZ_PROFILER_VERBOSE

* the new doNativeBacktrace doesn't have the same machinery for
  handling JIT entry frames that is present in the EHABI version
  of this function.  I don't understand enough about that to
  replicate it yet.

* Possibly as a result of the previous point, I had to add a hack
  in mergeStacksIntoProfile to deal with the case where a native
  SP value is the same as either the JS or Pseudostack SP value.

* There are possibly still some unwinder-thread artefacts still
  present, particularly to do with SyncProfile.{cpp,h}.
More ToDo:

* gtest/LulTest.cpp: is now #if 0'd

* class AutoLulRWLocker can get rm'd, I think
The issue that Julian pointed out with native stackaddresses that conflict with pseudostack and jsstack addresses already exists.  Starting a debug build of the browser with the gecko profiler addon installed reveals them.  I'm extracting out the component of Julian's patch that handles that (and removing the #ifdef LUL guards around it) and landing it separately, since it is more than a LUL-specific symptom.
Attachment #8580237 - Flags: review?(mstange)
(In reply to Kannan Vijayan [:djvj] from comment #3)
> Created attachment 8580237 [details] [diff] [review]
> skip-conflicting-native-stackaddrs.patch
Maybe retain the comment, or something like it, so the next person
along doesn't have to figure out "by hand" why that logic is there?
Depends on: 1145333
(In reply to Julian Seward [:jseward] from comment #4)
> (In reply to Kannan Vijayan [:djvj] from comment #3)
> > Created attachment 8580237 [details] [diff] [review]
> > skip-conflicting-native-stackaddrs.patch
> Maybe retain the comment, or something like it, so the next person
> along doesn't have to figure out "by hand" why that logic is there?

Will do.  Also, markus asked that I move this to a new bug, and made a good argument for that, so I did: bug 1145333.  Patch posted there.
Attachment #8580237 - Attachment is obsolete: true
Attachment #8580237 - Flags: review?(mstange)
Attached patch bug1141712-5.cset (obsolete) — Splinter Review
First version of the patch worth serious consideration:

Deleted files: UnwinderThread2.cpp, UnwinderThread2.h
  because the unwinder thread is gone

Deleted BreakpadSampler.cpp
  because it became almost empty

Created platform-linux-lul.cpp
  is small; some glue code for hooking up SPS and LUL on Linux.

simplified the env vars that control the profiler
  Added MOZ_PROFILER_HELP
  removed MOZ_PROFILER_NEW (that selected LUL; now it's always selected)
  removed MOZ_PROFILER_MODE (now it always does combined native+pseudo
  on Desktop Linux)

Redid class LUL for in-place unwinding

Removed everything I could find relating to unwinder threads and UWT buffers

Fundamentally, integrated LUL in-place unwinding using USE_LUL_STACKWALK in
  the same style that USE_EHABI_STACKWALK is done.
Attachment #8579366 - Attachment is obsolete: true
Also deleted LulRWLock.h, LulRWLock.cpp as they are no longer needed.
Attachment #8581949 - Flags: review?(mstange)
This looks pretty good. Issues I've identified:
 - Some end-of-line whitespace scattered throughout the patch
 - The comment above "class LUL {" should briefly explain what admin mode is
 - Doesn't the LUL constructor need to stay explicit?
 - I'd rename RegisterUnwinderThread to RegisterThread, or maybe RegisterSampledThread
 - mStats_n_CONTEXT is a very strange naming convention. Can you instead make a LULStats struct with three fields, possibly templated so that you can have both uint32_t and mozilla::Atomic<uint32_t> versions, and give that an operator= so that you can do mPreviousStats = mStats, and  an operator- like "template<typename S> uint32_t operator-(const LULStats<S>& aOther)" that does the calculation in LUL::MaybeShowStats()?
 - Let's make ProfilerVerbosity an enum class.
 - I didn't see anything that lets LUL know about new threads that are created while the profiler is already running. Did I miss it?
 - What's the plan for reading the unwind maps for dynamic libraries that get loaded while the profiler is already running?
(In reply to Markus Stange [:mstange] from comment #8)

Thanks for the feedback.  I can fix all the low level coding/comment
issues.  That leaves the higher level questions:


>  - I didn't see anything that lets LUL know about new threads that are
> created while the profiler is already running. Did I miss it?

No, you didn't miss it.  That might be a bit tricky.  Registering a
new thread with the library requires adding an entry to LUL's
thread_id-to-CFICache mapping, but that mapping is read in parallel
and without synchronisation by threads being sampled.

If it is possible to temporarily guarantee that all sampling threads
are not sampling (out of their signal handlers), then we could switch
the LUL object back to admin mode, add the new thread, switch it to
unwind mode again, and resume sampling.

If that's not possible, then the mapping will have to be updated
whilst other threads are reading it, so we'll have to add some
lightweight locking.

So it depends on the rest of the sampling framework, and I'm not
familiar enough with it to say.  Can you advise?


>  - What's the plan for reading the unwind maps for dynamic libraries that
> get loaded while the profiler is already running?

A similar situation: we need to get all sampling threads out of their
signal handlers before we can safely load more stuff into LUL, else we
risk malloc-lock deadlocks.

Do you know if this is likely to happen in practice?  In the current
configuration, I see LUL loading unwind info from 70+ shared libraries
and I get the impression that all libraries are loaded by the time the
profiler starts up.
I'll talk to Kannan about the locking stuff. We might need to ask Benoit for help here.

The scenario where it matters is startup profiling: I'm pretty sure that at least the OpenGL libraries aren't loaded at the point where we start the profiler. I'm currently building your patch so that I can try it out.
Here's what we concluded:
 1. Let's not worry about new libraries getting loaded during profiling for now. If it turns out to be a problem, we can still fix it later. The statement about OpenGL libraries being added during runtime might not even be true. (What we know is that we do look up GL symbols during runtime, but it's possible that the library has been present during startup already.)
 2. Let's figure out why we need to do special per-thread stuff in the first place. Why do we have a per-thread CFICache? Does the cache ever get mutated? Don't we only write to it before we enable unwinding? Do different threads have different information in their cache?
If it turns out that all threads can share the information in a global cache, then we don't even need to tell LUL about new threads. The sampler thread knows about them and can then just ask LUL to unwind them without any notification.
Okay, I've read some more code in LulMain.cpp now and it turns out that the cache does get modified during unwinding, after we've left admin mode. And the modification happens in a way that doesn't need memory allocation.
So how about this: replace mCaches with a mozilla::ThreadLocal<CFICache*> mCache, make RegisterUnwinderThread be able to be called outside mAdminMode on the profiled thread outside a signal handler, have it initialize mCache with "new CFICache(mPriMap)", and call it from TableTicker::RegisterThread. Would that work?
mTlsCache would probably be a better name than mCache.
Attachment #8581949 - Flags: review?(mstange)
Attached patch bug1141712-11.cset (obsolete) — Splinter Review
Addresses all review comments.

Also removes the CFI caching (class CFICache).  This means that
LUL::Unwind is now a pure read-only operation and so there is no
need for thread-specific data (the CFICaches) and so no need for
thread registration.  This significantly simplifies the code, at
the cost of about a 20% performance loss for unwinding.

Also moves the initialisation of the main LUL object (sLUL) outwards,
to Sampler::Startup() and Sampler::Shutdown().  This is necessary
to make startup profiling work -- sLUL needs to be initialised
early for that.

Startup profiling (MOZ_PROFILER_STARTUP=anything) works, although
it does sometimes hang, for not obvious reasons.
Attachment #8581949 - Attachment is obsolete: true
(In reply to Julian Seward [:jseward] from comment #15)
> Created attachment 8584214 [details] [diff] [review]
> bug1141712-11.cset

Try results: https://treeherder.mozilla.org/#/jobs?repo=try&revision=61bf70d8a755

re Linux x64 asan, looks like there's some leakery to fix.  Otherwise
looks at least somewhat plausible.
Attachment #8584214 - Flags: review?(mstange)
Hmm, I just thought of another problem. We initialize the profiler during startup, and that now triggers reading the unwind information, right? As a result, do we regress startup performance? Can we initialize LulMain lazily once we start the profiler the first time?
(In reply to Markus Stange [:mstange] from comment #17)
> Hmm, I just thought of another problem. We initialize the profiler during
> startup, and that now triggers reading the unwind information, right?

Yes, correct.  That's a good point.  It reads the unwind information even 
when the profiler extension is disabled.  That's ungood.

> Can we initialize LulMain lazily once we start the profiler the first time?

Possibly so.  Now that all that thread-registration complexity has gone
away, maybe we could move the Lul initialisation back to the lifetime that
it was at originally, in which the Lul object is created at sampler start
time.  That I think would fix it.  I'll give it a go.
Can you still make it so that we only read the unwind information once, even if we start and stop the profiler multiple times?
Attached patch bug1141712-12.cset (obsolete) — Splinter Review
Rebased, and with fixes for concerns listed in comments 17, 18 and 19.
Attachment #8584214 - Attachment is obsolete: true
Attachment #8584214 - Flags: review?(mstange)
Comment on attachment 8586197 [details] [diff] [review]
bug1141712-12.cset

Review of attachment 8586197 [details] [diff] [review]:
-----------------------------------------------------------------

I am so happy about this patch!

::: tools/profiler/LulMain.cpp
@@ +868,5 @@
> +
> +  mAdminMode = false;
> +}
> +
> +  

end-of-line whitespace

@@ +875,5 @@
>                      const char* aFileName, const void* aMappedImage)
>  {
> +  MOZ_ASSERT(mAdminMode);
> +  MOZ_ASSERT(gettid() == mAdminThreadId);
> +  

end-of-line whitespace

::: tools/profiler/LulMain.h
@@ +160,5 @@
> +    return *this;
> +  }
> +
> +  template <typename S>
> +  S operator- (const LULStats<S>& aOther) {

I'd make this always return an uint32_t. It doesn't really make sense to make it anything else.

Also, please remove space between - and (

@@ +216,5 @@
> +  // admin-mode calls are no longer allowed.  The object is initially
> +  // created in admin mode.  The only possible transition is
> +  // admin->unwinding, therefore.
> +  void EnableUnwinding();
> +  

end-of-line whitespace

@@ +315,4 @@
>  private:
> +  // The statistics counters at the point where they were last printed.
> +  LULStats<uint32_t> mStatsPrevious;
> +  

end-of-line whitespace

::: tools/profiler/ProfileEntry.cpp
@@ +11,5 @@
>  
>  // JS
>  #include "jsapi.h"
> +#include "jsbytecode.h"
> +#include "jsfriendapi.h"

What are these for? Why are they needed now? Is it because you're removing files, so different files get grouped during unified build?

::: tools/profiler/platform-linux-lul.h
@@ +15,5 @@
> +// LUL needs a callback for its logging sink.
> +void
> +logging_sink_for_LUL(const char* str);
> +
> +// We need a definition of gettid(), but glibc doesn't provide a 

end-of-line whitespace

::: tools/profiler/platform-linux.cc
@@ +104,5 @@
> +// A singleton instance of the library, together with a pthread_once_t
> +// that is used to initialise it just once, at first use, without
> +// races.
> +lul::LUL* sLUL = nullptr;
> +static pthread_once_t sLUL_is_initialized = PTHREAD_ONCE_INIT;

This is the first time I've seen pthread_once_t. Why did you choose to use it? Sampler::Start() is only called on the main thread, so a simple bool should also do it, shouldn't it?

@@ +308,5 @@
>  #endif
>  
>    int vm_tgid_ = getpid();
> +  DebugOnly<int> my_tid = gettid();
> +  

end-of-line whitespace

@@ +310,5 @@
>    int vm_tgid_ = getpid();
> +  DebugOnly<int> my_tid = gettid();
> +  
> +  unsigned int nSignalsSent = 0;
> +  

end-of-line whitespace

::: tools/profiler/platform.cpp
@@ +323,5 @@
> +     // Now force the next enquiry of moz_profiler_verbose to re-query
> +     // env var MOZ_PROFILER_VERBOSE.
> +     moz_profiler_set_verbosity(ProfilerVerbosity::UNCHECKED);
> +  }
> +  

end-of-line whitespace
Attachment #8586197 - Flags: review+
Rebased, and addresses all review comments in Comment 21.

Also gives each LUL object its own UniqueStringUniverse object, so
that the UniqueStringUniverse can be freed up at the point where the
LUL object is destroyed.  This fixes try failures for Linux64/ASAN,
which was reporting leaks to do with UniqueStringUniverse.

Try results: 
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3e95fb2014be
Attachment #8586197 - Attachment is obsolete: true
I didn't get very far yet with reviewing, but I can already mention one thing I found: When you free the unique string universe, the strings get double-deleted: First you free the strings using delete (which is wrong because strdup-allocated things need to be freed with "free"), and then when the map is destroyed, it deletes its UniqueString objects, and the UniqueString destructor then frees the strings inside again.
Can you make UniqueString carry an std::string instead? Then you don't need strdup any more, and all memory management should be automatic.
Oh wait, I think I was wrong in one part: your delete call doesn't delete the "const char*" string, it deletes the UniqueString object. But doesn't the map also do that?
Oh, no it doesn't, because it only has a regular pointer to the UniqueString objects. I suppose that's fine. But it would probably be nicer to put mozilla::UniquePtr<UniqueString>s into the map, because then you wouldn't need the delete loop.
I checked this quite carefully with Valgrind, and it shows no evidence
of leaking (not enough deleting/freeing) or double freeing (too much
deleting/freeing).

As you observe, the map type is map<string, UniqueString*>, so deleting
the map automatically causes the keys (string) to be deleted, but the
values (UniqueString*) are ignored, which is why there is an explicit
loop to delete them.

The strdup allocations are indeed freed with "free", in ~UniqueString().

I don't think mozilla::UniquePtr has the right semantics.  IIUC it allows
only one pointer to the encapsulated resource to ever exist.  By contrast
there can be many pointers to a given UniqueString.  The point about
UniqueString is that there's only one UniqueString for any specific value,
so string equality checks can be done using pointer equality.  Besides,
it's only a few lines of code and with a bit of luck we can later on
get rid of UniqueString entirely, so I'm not convinced it's worth the
effort.
Comment on attachment 8590207 [details] [diff] [review]
bug1141712-13.cset

Sorry for the wait. This all looks good.

My comment about the double deletion was wrong.

Whether UniquePtr is appropriate here is subject to debate. Some say the "unique" is about "unique ownership", i.e. there is only one *owning* pointer to the resource, and all other pointers are non-owning. That interpretation would make it appropriate to use here. But let's not get into that now and just land the patch as-is.
Attachment #8590207 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/fa98fe5c4ca2
Assignee: nobody → jseward
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Duplicate of this bug: 1141714
You need to log in before you can comment on or make changes to this bug.