Closed Bug 1141712 Opened 11 years ago Closed 11 years ago

Make LUL work with inplace ticking (not using the unwinder thread)

Categories

(Core :: Gecko Profiler, defect)

Product:

Component:

Type:

defect

Priority:

Not set

Severity:

normal

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla40

Tracking Flags:

Tracking

Status

firefox40

---

fixed

People

(Reporter: mstange, Assigned: jseward)

References

Details

Attachments

(1 file, 5 obsolete files)

bug1141712-2.diff 11 years ago Julian Seward [:jseward] 130.26 KB, patch		Details \| Diff \| Splinter Review
skip-conflicting-native-stackaddrs.patch 11 years ago Kannan Vijayan [:djvj] 1.05 KB, patch		Details \| Diff \| Splinter Review
bug1141712-5.cset 11 years ago Julian Seward [:jseward] 153.46 KB, patch		Details \| Diff \| Splinter Review
bug1141712-11.cset 11 years ago Julian Seward [:jseward] 173.90 KB, patch		Details \| Diff \| Splinter Review
bug1141712-12.cset 11 years ago Julian Seward [:jseward] 174.69 KB, patch	mstange : review+	Details \| Diff \| Splinter Review
bug1141712-13.cset 11 years ago Julian Seward [:jseward] 191.77 KB, patch	mstange : review+	Details \| Diff \| Splinter Review

Markus Stange [:mstange]

Reporter

Description

•

11 years ago

We'd like to get rid of the unwinder thread, and LUL is only implemented for the unwinder thread at the moment. Not using the unwinder thread means that stack unwinding happens inside a signal handler on the profiled thread. If we want LUL to work in that context, we need to separate out those parts that allocate memory from those that don't. The parts that do need to allocate memory can run on the sampler thread outside of the ticks, or when the profiler is started on the thread that starts it.

Markus Stange [:mstange]

Reporter

Updated

•

11 years ago

Blocks: 1141714

Julian Seward [:jseward]

Assignee

Comment 1

•

11 years ago

Attached patch bug1141712-2.diff (obsolete) — Details — Splinter Review

WIP patch. Provides LUL+inplace ticking at least for x86_64-linux. Not tested on any other target. The core change is to integrate it like Jed's unwinder (USE_EHABI_STACKWALK) is, hence: * starting up in Sampler::Start in platform-linux.cc * a new implementation for TableTicker::doNativeBacktrace in TableTicker.cpp, for the case USE_LUL_STACKWALK Other changes: * completely removes UnwinderThread2.{h,cpp} * removes HasUnwinderThread() from TableTicker * BreakpadSampler.cpp is almost entirely empty (should be rm'd) Stuff I'm not sure about: * SyncProfile.cpp: some stuff commented out and replaced by MOZ_CRASH. This needs to be fixed. I don't know how this works or what it does. * Current effect/status of control env vars MOZ_PROFILER_INTERVAL MOZ_PROFILER_MODE MOZ_PROFILER_VERBOSE * the new doNativeBacktrace doesn't have the same machinery for handling JIT entry frames that is present in the EHABI version of this function. I don't understand enough about that to replicate it yet. * Possibly as a result of the previous point, I had to add a hack in mergeStacksIntoProfile to deal with the case where a native SP value is the same as either the JS or Pseudostack SP value. * There are possibly still some unwinder-thread artefacts still present, particularly to do with SyncProfile.{cpp,h}.

Julian Seward [:jseward]

Assignee

Comment 2

•

11 years ago

More ToDo: * gtest/LulTest.cpp: is now #if 0'd * class AutoLulRWLocker can get rm'd, I think

Kannan Vijayan [:djvj]

Comment 3

•

11 years ago

Attached patch skip-conflicting-native-stackaddrs.patch (obsolete) — Details — Splinter Review

The issue that Julian pointed out with native stackaddresses that conflict with pseudostack and jsstack addresses already exists. Starting a debug build of the browser with the gecko profiler addon installed reveals them. I'm extracting out the component of Julian's patch that handles that (and removing the #ifdef LUL guards around it) and landing it separately, since it is more than a LUL-specific symptom.

Attachment #8580237 - Flags: review?(mstange)

Julian Seward [:jseward]

Assignee

Comment 4

•

11 years ago

(In reply to Kannan Vijayan [:djvj] from comment #3) > Created attachment 8580237 [details] [diff] [review] > skip-conflicting-native-stackaddrs.patch Maybe retain the comment, or something like it, so the next person along doesn't have to figure out "by hand" why that logic is there?

Kannan Vijayan [:djvj]

Updated

•

11 years ago

Depends on: 1145333

Kannan Vijayan [:djvj]

Comment 5

•

11 years ago

(In reply to Julian Seward [:jseward] from comment #4) > (In reply to Kannan Vijayan [:djvj] from comment #3) > > Created attachment 8580237 [details] [diff] [review] > > skip-conflicting-native-stackaddrs.patch > Maybe retain the comment, or something like it, so the next person > along doesn't have to figure out "by hand" why that logic is there? Will do. Also, markus asked that I move this to a new bug, and made a good argument for that, so I did: bug 1145333. Patch posted there.

Markus Stange [:mstange]

Reporter

Updated

•

11 years ago

Attachment #8580237 - Attachment is obsolete: true

Attachment #8580237 - Flags: review?(mstange)

Julian Seward [:jseward]

Assignee

Comment 6

•

11 years ago

Attached patch bug1141712-5.cset (obsolete) — Details — Splinter Review

First version of the patch worth serious consideration: Deleted files: UnwinderThread2.cpp, UnwinderThread2.h because the unwinder thread is gone Deleted BreakpadSampler.cpp because it became almost empty Created platform-linux-lul.cpp is small; some glue code for hooking up SPS and LUL on Linux. simplified the env vars that control the profiler Added MOZ_PROFILER_HELP removed MOZ_PROFILER_NEW (that selected LUL; now it's always selected) removed MOZ_PROFILER_MODE (now it always does combined native+pseudo on Desktop Linux) Redid class LUL for in-place unwinding Removed everything I could find relating to unwinder threads and UWT buffers Fundamentally, integrated LUL in-place unwinding using USE_LUL_STACKWALK in the same style that USE_EHABI_STACKWALK is done.

Attachment #8579366 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 7

•

11 years ago

Also deleted LulRWLock.h, LulRWLock.cpp as they are no longer needed.

Julian Seward [:jseward]

Assignee

Updated

•

11 years ago

Attachment #8581949 - Flags: review?(mstange)

Markus Stange [:mstange]

Reporter

Comment 8

•

11 years ago

This looks pretty good. Issues I've identified: - Some end-of-line whitespace scattered throughout the patch - The comment above "class LUL {" should briefly explain what admin mode is - Doesn't the LUL constructor need to stay explicit? - I'd rename RegisterUnwinderThread to RegisterThread, or maybe RegisterSampledThread - mStats_n_CONTEXT is a very strange naming convention. Can you instead make a LULStats struct with three fields, possibly templated so that you can have both uint32_t and mozilla::Atomic<uint32_t> versions, and give that an operator= so that you can do mPreviousStats = mStats, and an operator- like "template<typename S> uint32_t operator-(const LULStats<S>& aOther)" that does the calculation in LUL::MaybeShowStats()? - Let's make ProfilerVerbosity an enum class. - I didn't see anything that lets LUL know about new threads that are created while the profiler is already running. Did I miss it? - What's the plan for reading the unwind maps for dynamic libraries that get loaded while the profiler is already running?

Julian Seward [:jseward]

Assignee

Comment 9

•

11 years ago

(In reply to Markus Stange [:mstange] from comment #8) Thanks for the feedback. I can fix all the low level coding/comment issues. That leaves the higher level questions: > - I didn't see anything that lets LUL know about new threads that are > created while the profiler is already running. Did I miss it? No, you didn't miss it. That might be a bit tricky. Registering a new thread with the library requires adding an entry to LUL's thread_id-to-CFICache mapping, but that mapping is read in parallel and without synchronisation by threads being sampled. If it is possible to temporarily guarantee that all sampling threads are not sampling (out of their signal handlers), then we could switch the LUL object back to admin mode, add the new thread, switch it to unwind mode again, and resume sampling. If that's not possible, then the mapping will have to be updated whilst other threads are reading it, so we'll have to add some lightweight locking. So it depends on the rest of the sampling framework, and I'm not familiar enough with it to say. Can you advise? > - What's the plan for reading the unwind maps for dynamic libraries that > get loaded while the profiler is already running? A similar situation: we need to get all sampling threads out of their signal handlers before we can safely load more stuff into LUL, else we risk malloc-lock deadlocks. Do you know if this is likely to happen in practice? In the current configuration, I see LUL loading unwind info from 70+ shared libraries and I get the impression that all libraries are loaded by the time the profiler starts up.

Markus Stange [:mstange]

Reporter

Comment 10

•

11 years ago

I'll talk to Kannan about the locking stuff. We might need to ask Benoit for help here. The scenario where it matters is startup profiling: I'm pretty sure that at least the OpenGL libraries aren't loaded at the point where we start the profiler. I'm currently building your patch so that I can try it out.

Markus Stange [:mstange]

Reporter

Comment 11

•

11 years ago

Here's what we concluded: 1. Let's not worry about new libraries getting loaded during profiling for now. If it turns out to be a problem, we can still fix it later. The statement about OpenGL libraries being added during runtime might not even be true. (What we know is that we do look up GL symbols during runtime, but it's possible that the library has been present during startup already.) 2. Let's figure out why we need to do special per-thread stuff in the first place. Why do we have a per-thread CFICache? Does the cache ever get mutated? Don't we only write to it before we enable unwinding? Do different threads have different information in their cache? If it turns out that all threads can share the information in a global cache, then we don't even need to tell LUL about new threads. The sampler thread knows about them and can then just ask LUL to unwind them without any notification.

Markus Stange [:mstange]

Reporter

Comment 12

•

11 years ago

Okay, I've read some more code in LulMain.cpp now and it turns out that the cache does get modified during unwinding, after we've left admin mode. And the modification happens in a way that doesn't need memory allocation.

Markus Stange [:mstange]

Reporter

Comment 13

•

11 years ago

So how about this: replace mCaches with a mozilla::ThreadLocal<CFICache*> mCache, make RegisterUnwinderThread be able to be called outside mAdminMode on the profiled thread outside a signal handler, have it initialize mCache with "new CFICache(mPriMap)", and call it from TableTicker::RegisterThread. Would that work?

Markus Stange [:mstange]

Reporter

Comment 14

•

11 years ago

mTlsCache would probably be a better name than mCache.

Markus Stange [:mstange]

Reporter

Updated

•

11 years ago

Attachment #8581949 - Flags: review?(mstange)

Julian Seward [:jseward]

Assignee

Comment 15

•

11 years ago

Attached patch bug1141712-11.cset (obsolete) — Details — Splinter Review

Addresses all review comments. Also removes the CFI caching (class CFICache). This means that LUL::Unwind is now a pure read-only operation and so there is no need for thread-specific data (the CFICaches) and so no need for thread registration. This significantly simplifies the code, at the cost of about a 20% performance loss for unwinding. Also moves the initialisation of the main LUL object (sLUL) outwards, to Sampler::Startup() and Sampler::Shutdown(). This is necessary to make startup profiling work -- sLUL needs to be initialised early for that. Startup profiling (MOZ_PROFILER_STARTUP=anything) works, although it does sometimes hang, for not obvious reasons.

Attachment #8581949 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 16

•

11 years ago

(In reply to Julian Seward [:jseward] from comment #15) > Created attachment 8584214 [details] [diff] [review] > bug1141712-11.cset Try results: https://treeherder.mozilla.org/#/jobs?repo=try&revision=61bf70d8a755 re Linux x64 asan, looks like there's some leakery to fix. Otherwise looks at least somewhat plausible.

Julian Seward [:jseward]

Assignee

Updated

•

11 years ago

Attachment #8584214 - Flags: review?(mstange)

Markus Stange [:mstange]

Reporter

Comment 17

•

11 years ago

Hmm, I just thought of another problem. We initialize the profiler during startup, and that now triggers reading the unwind information, right? As a result, do we regress startup performance? Can we initialize LulMain lazily once we start the profiler the first time?

Julian Seward [:jseward]

Assignee

Comment 18

•

11 years ago

(In reply to Markus Stange [:mstange] from comment #17) > Hmm, I just thought of another problem. We initialize the profiler during > startup, and that now triggers reading the unwind information, right? Yes, correct. That's a good point. It reads the unwind information even when the profiler extension is disabled. That's ungood. > Can we initialize LulMain lazily once we start the profiler the first time? Possibly so. Now that all that thread-registration complexity has gone away, maybe we could move the Lul initialisation back to the lifetime that it was at originally, in which the Lul object is created at sampler start time. That I think would fix it. I'll give it a go.

Markus Stange [:mstange]

Reporter

Comment 19

•

11 years ago

Can you still make it so that we only read the unwind information once, even if we start and stop the profiler multiple times?

Julian Seward [:jseward]

Assignee

Comment 20

•

11 years ago

Attached patch bug1141712-12.cset (obsolete) — Details — Splinter Review

Rebased, and with fixes for concerns listed in comments 17, 18 and 19.

Attachment #8584214 - Attachment is obsolete: true

Attachment #8584214 - Flags: review?(mstange)

Markus Stange [:mstange]

Reporter

Comment 21

•

11 years ago

Comment on attachment 8586197 [details] [diff] [review] bug1141712-12.cset Review of attachment 8586197 [details] [diff] [review]: ----------------------------------------------------------------- I am so happy about this patch! ::: tools/profiler/LulMain.cpp @@ +868,5 @@ > + > + mAdminMode = false; > +} > + > + end-of-line whitespace @@ +875,5 @@ > const char* aFileName, const void* aMappedImage) > { > + MOZ_ASSERT(mAdminMode); > + MOZ_ASSERT(gettid() == mAdminThreadId); > + end-of-line whitespace ::: tools/profiler/LulMain.h @@ +160,5 @@ > + return *this; > + } > + > + template <typename S> > + S operator- (const LULStats<S>& aOther) { I'd make this always return an uint32_t. It doesn't really make sense to make it anything else. Also, please remove space between - and ( @@ +216,5 @@ > + // admin-mode calls are no longer allowed. The object is initially > + // created in admin mode. The only possible transition is > + // admin->unwinding, therefore. > + void EnableUnwinding(); > + end-of-line whitespace @@ +315,4 @@ > private: > + // The statistics counters at the point where they were last printed. > + LULStats<uint32_t> mStatsPrevious; > + end-of-line whitespace ::: tools/profiler/ProfileEntry.cpp @@ +11,5 @@ > > // JS > #include "jsapi.h" > +#include "jsbytecode.h" > +#include "jsfriendapi.h" What are these for? Why are they needed now? Is it because you're removing files, so different files get grouped during unified build? ::: tools/profiler/platform-linux-lul.h @@ +15,5 @@ > +// LUL needs a callback for its logging sink. > +void > +logging_sink_for_LUL(const char* str); > + > +// We need a definition of gettid(), but glibc doesn't provide a end-of-line whitespace ::: tools/profiler/platform-linux.cc @@ +104,5 @@ > +// A singleton instance of the library, together with a pthread_once_t > +// that is used to initialise it just once, at first use, without > +// races. > +lul::LUL* sLUL = nullptr; > +static pthread_once_t sLUL_is_initialized = PTHREAD_ONCE_INIT; This is the first time I've seen pthread_once_t. Why did you choose to use it? Sampler::Start() is only called on the main thread, so a simple bool should also do it, shouldn't it? @@ +308,5 @@ > #endif > > int vm_tgid_ = getpid(); > + DebugOnly<int> my_tid = gettid(); > + end-of-line whitespace @@ +310,5 @@ > int vm_tgid_ = getpid(); > + DebugOnly<int> my_tid = gettid(); > + > + unsigned int nSignalsSent = 0; > + end-of-line whitespace ::: tools/profiler/platform.cpp @@ +323,5 @@ > + // Now force the next enquiry of moz_profiler_verbose to re-query > + // env var MOZ_PROFILER_VERBOSE. > + moz_profiler_set_verbosity(ProfilerVerbosity::UNCHECKED); > + } > + end-of-line whitespace

Attachment #8586197 - Flags: review+

Julian Seward [:jseward]

Assignee

Comment 22

•

11 years ago

Attached patch bug1141712-13.cset — Details — Splinter Review

Rebased, and addresses all review comments in Comment 21. Also gives each LUL object its own UniqueStringUniverse object, so that the UniqueStringUniverse can be freed up at the point where the LUL object is destroyed. This fixes try failures for Linux64/ASAN, which was reporting leaks to do with UniqueStringUniverse. Try results: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3e95fb2014be

Attachment #8586197 - Attachment is obsolete: true

Markus Stange [:mstange]

Reporter

Comment 23

•

11 years ago

I didn't get very far yet with reviewing, but I can already mention one thing I found: When you free the unique string universe, the strings get double-deleted: First you free the strings using delete (which is wrong because strdup-allocated things need to be freed with "free"), and then when the map is destroyed, it deletes its UniqueString objects, and the UniqueString destructor then frees the strings inside again. Can you make UniqueString carry an std::string instead? Then you don't need strdup any more, and all memory management should be automatic.

Markus Stange [:mstange]

Reporter

Comment 24

•

11 years ago

Oh wait, I think I was wrong in one part: your delete call doesn't delete the "const char*" string, it deletes the UniqueString object. But doesn't the map also do that?

Markus Stange [:mstange]

Reporter

Comment 25

•

11 years ago

Oh, no it doesn't, because it only has a regular pointer to the UniqueString objects. I suppose that's fine. But it would probably be nicer to put mozilla::UniquePtr<UniqueString>s into the map, because then you wouldn't need the delete loop.

Julian Seward [:jseward]

Assignee

Comment 26

•

11 years ago

I checked this quite carefully with Valgrind, and it shows no evidence of leaking (not enough deleting/freeing) or double freeing (too much deleting/freeing). As you observe, the map type is map<string, UniqueString*>, so deleting the map automatically causes the keys (string) to be deleted, but the values (UniqueString*) are ignored, which is why there is an explicit loop to delete them. The strdup allocations are indeed freed with "free", in ~UniqueString(). I don't think mozilla::UniquePtr has the right semantics. IIUC it allows only one pointer to the encapsulated resource to ever exist. By contrast there can be many pointers to a given UniqueString. The point about UniqueString is that there's only one UniqueString for any specific value, so string equality checks can be done using pointer equality. Besides, it's only a few lines of code and with a bit of luck we can later on get rid of UniqueString entirely, so I'm not convinced it's worth the effort.

Markus Stange [:mstange]

Reporter

Comment 27

•

11 years ago

Comment on attachment 8590207 [details] [diff] [review] bug1141712-13.cset Sorry for the wait. This all looks good. My comment about the double deletion was wrong. Whether UniquePtr is appropriate here is subject to debate. Some say the "unique" is about "unique ownership", i.e. there is only one *owning* pointer to the resource, and all other pointers are non-owning. That interpretation would make it appropriate to use here. But let's not get into that now and just land the patch as-is.

Attachment #8590207 - Flags: review+

Julian Seward [:jseward]

Assignee

Comment 28

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/fa98fe5c4ca2

Ryan VanderMeulen [:RyanVM]

Comment 29

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/fa98fe5c4ca2

Assignee: nobody → jseward

Status: NEW → RESOLVED

Closed: 11 years ago

status-firefox40: --- → fixed

Flags: in-testsuite+

Resolution: --- → FIXED

Target Milestone: --- → mozilla40

You need to log in before you can comment on or make changes to this bug.