Closed
Bug 1342102
Opened 7 years ago
Closed 7 years ago
Use the same threading structure in platform-linux-android.cpp as for the -macos and -win32 versions
Categories
(Core :: Gecko Profiler, defect)
Core
Gecko Profiler
Tracking
()
RESOLVED
FIXED
mozilla54
Tracking | Status | |
---|---|---|
firefox54 | --- | fixed |
People
(Reporter: jseward, Assigned: jseward)
References
Details
Attachments
(2 files, 2 obsolete files)
4.67 KB,
text/plain
|
Details | |
13.04 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
The -macos and -win32 profiler cores have the following structure: N threads being sampled one sampler thread, which does this: forever { for each registered thread t { t.suspend t.getcontext t.Tick (copy potentially lots of stuff out of t's stack) t.resume } } The -linux-android core has a different scheme: sampler thread: forever { for each registered thread t { send SIGPROF to t wait for t to indicate its SIGPROF handler has finished (a.k.a sem_wait(&gSignalHandlingDone)) } SIGPROF handler (runs in each sampled thread) { thisthread.getcontext thisthread.Tick tell the sampler thread we're done (a.k.a. sem_post(&gSignalHandlingDone)) } The different -linux-android scheme has a number of disadvantages: (1) Current work to rationalise/clean up the threading structure of the profiler is complicated by the need to reason about/verify two different schemes. In particular, the Tick call in the -win32 and -macos schemes will produce its output on the sampler thread. In the -linux-android scheme that is produced on the sampled threads. (2) It causes a lot of duplicated code in platform-*.cpp. For example SamplerThread::Run() in the -win32 and -macos files are very similar. Ideally all three could be merged into a single file with all the identical logic commoned up. (3) Running lots of code -- the entire contents of Tick -- in a signal handler isn't considered good practice. Doing this will require implementing the (suspend, getcontext, resume) primitives for Linux.
Assignee | ||
Comment 1•7 years ago
|
||
Doing this would be relatively straightforward if it were possible to send an extra machine word along with a signal to a thread. With the POSIX "realtime signals extension" (which we have in Linux), it is almost possible to do this using sigqueue() and siginfo.si_value. But with sigqueue() we can't deliver signals to a specific thread, so that scheme doesn't work. So we have to fall back to using a global variable. The existing platform-linux-android.cpp uses a global variable gCurrentThreadInfo anyway, so we don't lose any generality like this. Plan is: // Used to communicate between sampler and samplee threads struct Magic { sem_t acquire_sem; sem_t release_sem; sigcontext_t thread_ctx; } static Atomic<Magic*> gMagicPtr; // We still have a signal handler in the sampled threads, but now // it does almost nothing: static void SigprofHandler(int signal, siginfo_t* info, void* context) { // Stash the thread context so that sampler thread can use it memcpy(gMagicPtr->thread_ctx, context) // Tell the sampler thread that the context is valid, and // go to sleep sem_post(gMagicPtr->acquire_sem); // Wait for the sampler thread to un-freeze us sem_wait(gMagicPtr->release_sem); } // Sampler thread is now more or less isomorphic with the // -macos/-win32 versions: sampler_thread: forever { for t in registered_threads { // perform "suspend" and "getcontext" steps struct Magic magic; // on Sampler thread's stack gMagicPtr = &magic; memset(gMagicPtr, 0, sizeof(*gMagicPtr)) sem_init(&gMagicPtr->acquire_sem, 0); sem_init(&gMagicPtr->release_sem, 0); tgkill(SIGPROF, t) sem_wait(acquire_sem); // t is now frozen and gMagicPtr->thread_ctx is valid Tick // perform "release" step sem_post(&release_sem); } }
Comment 2•7 years ago
|
||
I really like the idea.
Comment 3•7 years ago
|
||
Nice! Does this completely eliminate the "we must be careful about how where we call malloc" problem? BTW, I have two plans to make platform-linux-android.cpp more similar to platform-{win32,macos}.cpp. - Factor out the common parts of the main "forever" loop so that we have one copy of it, which can go in platform.cpp. - Implement a SamplerThread class and put a lot of the global state within it. E.g. your gMagicPtr could go within it. This avoids having to clear the global state when the profiler is stopped, because it will instead disappear. But I'm happy to wait for you to do this first.
Assignee | ||
Comment 4•7 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #3) > Does this completely eliminate the "we must be careful about how > where we call malloc" problem? Alas, no .. this has no bearing on that problem. This merely moves the critical section into the sampler thread so that the thread structure is identical on all targets. It doesn't change the potential-deadlock semantics. > BTW, I have two plans to make platform-linux-android.cpp more similar to > platform-{win32,macos}.cpp. Interesting. I'll chase this next week and see if it's actually feasible. If yes then we can de-duplicate the platform-* files.
Assignee | ||
Comment 5•7 years ago
|
||
Proof-of-concept standalone test program. There's one dubious thing about this. In the signal handler we call both sem_post and sem_wait. sem_post is "async signal safe" (OK for use in a signal handler) but sem_wait isn't. sem_wait can be interrupted by a signal delivery. We can restart the wait after such an interruption, should it occur. I find myself wondering what happens if the kernel fails to deliver any of the SIGPROFs to the target thread. Then the sampling thread will deadlock. But that would also be the case for the existing implementation, so this doesn't make it any worse.
Assignee | ||
Comment 6•7 years ago
|
||
WIP patch. The good news is that it works, more or less as planned. The bad news is that I had to insert a spinloop, for reasons written in comments in the patch. It doesn't spin much though and with 1KHz sampling on Linux, doesn't seem to take measurably more CPU than the original version (per watching /bin/top). * Removes global variables: gCurrentThreadInfo, gRssMemory, gUssMemory, gSignalHandlingDone * Replaces then with a single pointer, gSigHandlerCoordinator, to a |struct SigHandlerCoordinator|, which is what is referred to as |struct Magic| in comment 1. * SigprofHandler() now only contains synchronisation code. * The critical section (copying of the suspended thread's state) is now done in the loop in SigprofSender().
Comment 7•7 years ago
|
||
Comment on attachment 8841554 [details] [diff] [review] bug1342102-1.cset Review of attachment 8841554 [details] [diff] [review]: ----------------------------------------------------------------- The spin loop is unfortunate. I can think of two possibilities for removing it. - Use a third semaphore instead. This seems straightforward. - Move the sem_init() calls from the inner loop to the outer loop. Then if the sem_wait(resume) in the signal handler takes a long time you don't have to worry about the next inner loop iteration causing problems via sem_init(resume). But I'm not sure what happens if signal handler is still waiting when the next inner loop iteration fires another signal. Might be a bad idea. AIUI, it's needed because you call sem_init() within the inner loop. What about if you moved it to the outer loop? I think you wouldn't need it then. Then it might be possible for SignalHandler's sem_wait(resume) call to still be incomplete by the time the next inner loop's iteration calls sem_post(resume), in which case it'll be interrupted and instead of looping the Signal handler could just return. (I'm not sure how the errno handling would work in this scenario.) ::: tools/profiler/core/platform-linux-android.cpp @@ +165,5 @@ > #endif > } > > +// A type used to coordinate between the sampler (signal sending) > +// thread and the thread currently being sampled (the target). You're inconsistent in your language -- sometimes you use "target", sometimes "samplee". @@ +166,5 @@ > } > > +// A type used to coordinate between the sampler (signal sending) > +// thread and the thread currently being sampled (the target). > +struct SigHandlerCoordinator { Nit: opening brace on its own line. @@ +167,5 @@ > > +// A type used to coordinate between the sampler (signal sending) > +// thread and the thread currently being sampled (the target). > +struct SigHandlerCoordinator { > + sem_t suspend_sem; // samplee waits on this to suspend Please use standard Mozilla style for field names, e.g. mSuspendSem, mThreadCtx. And no need to indent field names. @@ +168,5 @@ > +// A type used to coordinate between the sampler (signal sending) > +// thread and the thread currently being sampled (the target). > +struct SigHandlerCoordinator { > + sem_t suspend_sem; // samplee waits on this to suspend > + sem_t resume_sem; // sampler waits on this for samplee Nit: no need to indent field names. @@ +174,5 @@ > + mozilla::Atomic<int> busy; // set to 1 if the block is in use, else 0 > +}; > + > +// This the one-and-only global variable used to communicate between > +// the sampler thread and the samplee's signal handler. "samplee's signal handler" -- is that really the case? I'm totally unclear on which thread the signal handler runs. I'd find this sentence clearer if you just removed "samplee". In fact, there are various places where I find "samplee" unclear, and "signal handler" would probably be clearer. @@ +184,2 @@ > static void > +SigprofHandler(int signal, siginfo_t* info, void* contextV) While you're changing this function a lot, please rename these as aSignal, aInfo, aContextV. (Or just aContext? I'm not sure what the "V" is for.) @@ +198,2 @@ > > + // Tell the sampler thread that the context is valid, and go to I find this sentence easier to read if you change "and" to "then". (Makes it clearer that "go to sleep" isn't part of the message to the sampler thread.) @@ +198,3 @@ > > + // Tell the sampler thread that the context is valid, and go to > + // sleep. sem_post can never fail, providing the semaphore is valid. Nit: this and other comments look like they aren't 79 chars wide. You can fit more on each line :) @@ +203,5 @@ > + > + // Wait for the sampler thread to un-freeze us. In contrast to > + // sem_post, this can fail by being interrupted by a signal. > + while (true) { > + r = sem_wait(&(gSigHandlerCoordinator->resume_sem)); No need for the extra parens, here and on similar lines throughout the patch. @@ +325,5 @@ > > + // Suspend the samplee and get its context. > + SigHandlerCoordinator coord; // on sampler thread's stack > + PodZero(&coord); > + coord.busy = 1; Please give SigHandlerCoordinator a constructor to make the initialization nicer. I guess you'll have to initialize the fields individually instead of using PodZero, so you can remove the PodOperations.h include above. @@ +331,5 @@ > + > + int r; > + r = sem_init(&(gSigHandlerCoordinator->suspend_sem), /*!pshared*/0, 0); > + r |= sem_init(&(gSigHandlerCoordinator->resume_sem), /*!pshared*/0, 0); > + MY_MOZ_RELEASE_ASSERT(r == 0); The sem_init calls could be in the constructor too. @@ +336,5 @@ > + > + r = tgkill(vm_tgid_, threadId, SIGPROF); > + MY_MOZ_RELEASE_ASSERT(r == 0); > + > + // Wait for the signal handler to indicate that is has suspended. Typo: s/is/it/ @@ +338,5 @@ > + MY_MOZ_RELEASE_ASSERT(r == 0); > + > + // Wait for the signal handler to indicate that is has suspended. > + r = sem_wait(&(gSigHandlerCoordinator->suspend_sem)); > + MY_MOZ_RELEASE_ASSERT(r == 0); For the other sem_wait you check for interruption by signals. Why not the same here? @@ +345,5 @@ > + // gSigHandlerCoordinator->thread_ctx is valid. We can poke > + // around in it and unwind its stack as we like. > + > + TickSample sample_obj; > + TickSample* sample = &sample_obj; Since you've moving this code, can you rename sample_obj as sample and work with it directly instead of using a pointer? This indirection has been bugging me for a while and I have patches to remove it in the other platform-*.cpp files. @@ +364,5 @@ > + MY_MOZ_RELEASE_ASSERT(r == 0); > + > + // At this point, we are racing the samplee. The sem_post on > + // resume_sem just above makes the samplee logically able to > + // progress forwards, so that it will exit the signal handler. This sentence is a good example where my confusion between "signal handler" and "samplee" make things hard to understand... @@ +365,5 @@ > + > + // At this point, we are racing the samplee. The sem_post on > + // resume_sem just above makes the samplee logically able to > + // progress forwards, so that it will exit the signal handler. > + // But when it resumes making to progress depends on when the "making to progress"?
Attachment #8841554 -
Flags: feedback+
Comment 8•7 years ago
|
||
> AIUI, it's needed because you call sem_init() within the inner loop. What
> about if you moved it to the outer loop? I think you wouldn't need it then.
> Then it might be possible for SignalHandler's sem_wait(resume) call to still
> be incomplete by the time the next inner loop's iteration calls
> sem_post(resume), in which case it'll be interrupted and instead of looping
> the Signal handler could just return. (I'm not sure how the errno handling
> would work in this scenario.)
Sorry, I meant to delete that text. It's basically restating the second bullet point just above it. A third semaphore is probably still a better approach.
Comment 9•7 years ago
|
||
FWIW, here's what I wrote down about the three semaphores idea while I was getting my head around it: - First semaphore: makes the sampler thread wait on the signal handler (so the latter can set up context) - Second semaphore: makes the signal handler wait on the sampler thread (so the latter can Tick() using the context) - Third semaphore: makes the sampler thread wait on the signal handler (so the latter can finish cleanly) Also, even if this approach works out, it might be worth moving the sem_init() calls from the inner loop to the outer loop.
Assignee | ||
Comment 10•7 years ago
|
||
Thanks for the comments and review! Yes, I later realised that what is needed is a four-message protocol between the sampler thread and the thread being sampled. I missed the need for the fourth message in my test program, hence the need for the spinlock kludge. I'll redo it properly today, removing the spinlock.
Assignee | ||
Comment 11•7 years ago
|
||
Revised patch. Addresses all review comments, has improved comments, no spinlock, and three semaphores.
Attachment #8841554 -
Attachment is obsolete: true
Assignee | ||
Comment 12•7 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #7) Some comments: ---- I was indeed inconsistent in terminology for the sampling and signal-receiving threads. As it stands we don't appear to have decent names for: (1) the thread that does the sampling (2) the threads that are being sampled (3) the signal handler, when it runs For (2) we say clumsy and vague things like "the signal handler thread". Note that (2) and (3) are not the same thing. I propose the following: (1) "sampler thread" (2) "samplee thread" (3) "samplee thread's signal handler" (1) and (2) apply equally to the -win32 and -macos variants. In the revised patch I've used (1), (2) and (3) more consistently and removed mentions of "target thread". ---- Your analysis about the need for the spinlock matches mine. I agree that we need a four-message protocol using a signal and three semaphores, and the patch implements that. It also has a big comment documenting the protocol. You mentioned .. > - Move the sem_init() calls from the inner loop to the outer loop. I considered that, but I prefer not to do it, because: * I have a hard time figuring out what the correctness implications would be. It's difficult enough making it not crash or hang as it is. * Having the sampler's per-thread loop be able to "get ahead" of the threads it is sampling (ie, not wait for the signal handlers to exit) might make it more difficult to merge all three platform-* implementations together later. ---- > For the other sem_wait you check for interruption by signals. Why not the > same here? I made them all check for interruption by signals. I find it difficult to figure out whether interruption by (other) signals could cause deadlocking. So far I haven't seen any problems. I did notice that the sigaction() call that sets up the signal handler (in PlatformStart), leaves the sa_mask field clear. That means that the signal handler could be interrupted by some other signal. I'd prefer to set all entries of the sa_mask field, in an attempt to block all other signals in the handler, or at least as many as the kernel will let us. That would be an extra line of defence against the signal handler getting interrupted. That can go in a different bug, though.
Assignee | ||
Comment 13•7 years ago
|
||
For a final version of the patch I will of course remove the following hack #define MY_MOZ_RELEASE_ASSERT(_cond) MOZ_RELEASE_ASSERT(_cond) and replace MY_MOZ_RELEASE_ASSERT uses with standard MOZ_ASSERTS. The reason it is there is that currently the profiler is unusable when built with assertions, so I had to work with a non-debug build, but I still wanted to have assertions.
Assignee | ||
Comment 14•7 years ago
|
||
Fixes some comment typos. No functional change.
Attachment #8841938 -
Attachment is obsolete: true
Assignee | ||
Comment 15•7 years ago
|
||
Looks plausible on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3f6dfd890656039ff26fa71c550040bead666240
Assignee | ||
Updated•7 years ago
|
Attachment #8841945 -
Flags: review?(n.nethercote)
Comment 16•7 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #13) > For a final version of the patch I will of course remove the following hack > > #define MY_MOZ_RELEASE_ASSERT(_cond) MOZ_RELEASE_ASSERT(_cond) > > and replace MY_MOZ_RELEASE_ASSERT uses with standard MOZ_ASSERTS. The reason > it is there is that currently the profiler is unusable when built with > assertions, so I had to work with a non-debug build, but I still wanted to > have assertions. In general I prefer to review patches in their final form. One approach you could use in this case is to have two patches in your queue, the main one, and a follow-up that removes MY_MOZ_RELEASE_ASSERT, and post both for review, along with a promise to fold them together before landing. Also, again, I'd be happier if I could see the commit message. Especially for a patch like this where it would ideally contain an explanation of why the change is worthwhile.
Comment 17•7 years ago
|
||
Comment on attachment 8841945 [details] [diff] [review] bug1342102-3.cset Review of attachment 8841945 [details] [diff] [review]: ----------------------------------------------------------------- Looks good. I think you're still referring to the samplee's signal handler as the samplee in several places. Perhaps I'm being too pedantic? After all, the samplee's signal handler is acting on behalf of the samplee. I'll let you decide. Other than that, a couple of nits, and otherwise looks good to go. ::: tools/profiler/core/platform-linux-android.cpp @@ +165,5 @@ > #endif > } > > +// A four-message protocol is used to reliably suspend and later resume the > +// thread to be sampled (the samplee): This whole comment is good. But it would be nice to have a little more motivation. Specifically: - The only way to reliably interrupt a Linux/Android thread and inspect its register state is via a signal handler. - We don't want to run much code within signal handlers, because that's considered poor form, so we do the minimum we can: grab the register state and pass it back to the sampler thread, which does all the unwinding, etc. That's the crux of it, AIUI. @@ +167,5 @@ > > +// A four-message protocol is used to reliably suspend and later resume the > +// thread to be sampled (the samplee): > +// > +// Sampler (signal sender) thread Samplee (thread to be sampled) The right-hand column is really a combination of the samplee and the samplee's signal handler, right? @@ +174,5 @@ > +// and point gSigHandlerCoordinator at it > +// > +// send SIGPROF to samplee ------- MSG 1 ----> (enter signal handler) > +// wait(mMessage2) Copy register state > +// into gSigHandlerCoordinator Nit: where descriptions spill onto a 2nd or later line, I would indent those later lines by 2 spaces, to aid readability. @@ +183,5 @@ > +// > +// Release samplee: > +// post(mMessage3) ------- MSG 3 -----> > +// wait(mMessage4) Samplee now resumes. Tell > +// the sampler that we are done. More accurate to say "Samplee can now resume"? It doesn't actually resume until SigprofHandler() returns, right? @@ +186,5 @@ > +// wait(mMessage4) Samplee now resumes. Tell > +// the sampler that we are done. > +// <------ MSG 4 ------ post(mMessage4) > +// Now we know the samplee has > +// finished using gSigHandlerCoordinator. The samplee's signal handler has finished using it. @@ +206,5 @@ > +{ > + SigHandlerCoordinator() > + { > + PodZero(&mUContext); > + int r = sem_init(&mMessage2, /*!pshared*/0, 0); Nit: I would write: > int r = sem_init(&mMessage2, /* pshared */0, 0); i.e. spaces around the comment markers, and no '!'. @@ +222,5 @@ > + } > + > + sem_t mMessage2; // "to sampler: context is in gSigHandlerCoordinator" > + sem_t mMessage3; // "to samplee: resume" > + sem_t mMessage4; // "to sampler: finished with gSigHandlerCoordinator" Nit: put the opening quotes after the colons? @@ +253,5 @@ > > + // At this point, the sampler thread assumes we are suspended, so we must > + // not touch any global state here. > + > + // Wait for message 3: the sampler thread tells us to resume Nit: add trailing '.'. @@ +387,5 @@ > + // signalling at it. > + int r = tgkill(vm_tgid_, threadId, SIGPROF); > + MY_MOZ_RELEASE_ASSERT(r == 0); > + > + // Wait for message 2 from the samplee, indicating that the context is The message is coming from the samplee's signal handler. @@ +417,5 @@ > + sample.ussMemory = ussMemory; > + > + Tick(&sample); > + > + // Send message 3 to the samplee, which tells it to resume. Message goes the samplee's signal handler. @@ +421,5 @@ > + // Send message 3 to the samplee, which tells it to resume. > + r = sem_post(&gSigHandlerCoordinator->mMessage3); > + MY_MOZ_RELEASE_ASSERT(r == 0); > + > + // Wait for message 4 from the samplee, which tells us that it has Message comes from the samplee's signal handler.
Attachment #8841945 -
Flags: review?(n.nethercote) → review+
Comment 18•7 years ago
|
||
Pushed by jseward@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/1e1880fc4d8f Use the same threading structure in platform-linux-android.cpp as for the -macos and -win32 versions. r=n.nethercote.
Comment 19•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/1e1880fc4d8f
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
You need to log in
before you can comment on or make changes to this bug.
Description
•