Gecko profiler on b2g (and maybe other Linux?) eats half the CPU busy-waiting in SignalSender

RESOLVED FIXED in mozilla25

Status

()

Core
Gecko Profiler
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: jld, Assigned: jld)

Tracking

({perf})

Trunk
mozilla25
ARM
Gonk (Firefox OS)
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [c= p=2 s=2013.07.12])

Attachments

(1 attachment)

(Assignee)

Description

5 years ago
// Wait for the signal handler to run before moving on to the next one
        while (sCurrentThreadProfile)
          sched_yield();

That needs to die.

POSIX condition variables are not async signal safe, but the futex syscall itself should be fine.
(Assignee)

Comment 1

5 years ago
Here's a perf_event profile of the profiler: https://people.mozilla.com/~bgirard/cleopatra/#report=1cdd3cc135bcd446db14854202cbf8fe9dba24d0

Note thread 689 using almost as much CPU time as the Browser app's main thread, which is running sunspider.  This is without an unwinder thread, but the same effect also occurs when using breakpad.  Multithreaded profiling (for SPS) is not enabled.  The requested interval is 10ms, but the average interval seen in the SPS profile is over 20ms.

(Meta-profile created with the kernel fix to save the user-mode PC so we can see sched_yield, and with a small change to the Android libc to make sched_yield create a frame that the kernel unwinder understands so that SignalSender shows up.)
(Assignee)

Comment 2

5 years ago
Created attachment 774357 [details] [diff] [review]
Replace SPS signal handler completion busy loop with POSIX semaphore
Attachment #774357 - Flags: review?(bgirard)
Comment on attachment 774357 [details] [diff] [review]
Replace SPS signal handler completion busy loop with POSIX semaphore

Review of attachment 774357 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/platform-linux.cc
@@ +273,5 @@
>            continue;
>          }
>  
>          // Wait for the signal handler to run before moving on to the next one
> +        sem_wait(&sSignalHandlingDone);

http://sdepl.ucsd.edu/cgi-bin/yman2html?m=sem_wait&s=3#ASYNC-SIGNAL%20SAFETY

Maybe we should include a note that it is async signal safe. I wonder if it's worth whitelisting platforms with IFDEF since this wont be true everywhere and whitelisting and ports will have to explicitly verify it.

@@ +308,5 @@
>    SamplerRegistry::AddActiveSampler(this);
>  
> +  // Initialize signal handler communication
> +  sCurrentThreadProfile = NULL;
> +  if (sem_init(&sSignalHandlingDone, /* pshared: */ 0, /* value: */ 0) != 0) {

I prefer assigning these to local variable to give them a description but this is fine.
Attachment #774357 - Flags: review?(bgirard) → review+
(Assignee)

Comment 4

5 years ago
(In reply to Benoit Girard (:BenWa) from comment #3)
> >          // Wait for the signal handler to run before moving on to the next one
> > +        sem_wait(&sSignalHandlingDone);
> 
> http://sdepl.ucsd.edu/cgi-bin/yman2html?m=sem_wait&s=3#ASYNC-SIGNAL%20SAFETY
> 
> Maybe we should include a note that it is async signal safe. I wonder if
> it's worth whitelisting platforms with IFDEF since this wont be true
> everywhere and whitelisting and ports will have to explicitly verify it.

LinuxThreads?  Wasn't that removed in favor of NPTL a long time ago?  For comparison, my laptop's version of the sem_post(3) man page has the unqualified note “sem_post() is async-signal-safe: it may be safely called within a signal handler.”  Also, according to the document linked above, even LinuxThreads violated async signal safety only on actual 386es (i.e., not a 486 or newer), which even NetBSD hasn't supported since 2007, and the SPARC, which I'm sure has some way of correctly implementing sem_post in non-ancient versions of the CPU.

And, of course, the relevant standard (http://pubs.opengroup.org/onlinepubs/009695399/functions/sem_post.html) declares, “The sem_post() function shall be reentrant with respect to signals and may be invoked from a signal-catching function.”  This requirement appears to date back to Issue 5 in 1997, so it's not exactly new.

Thus, I'm feeling reasonably confident about sem_post following its specification on any platform we're likely to run Gecko on in 2013.


(In case anyone reading this sees the quoted call to sig_wait and is confused: It is in fact sem_post that we need to care about here; sem_wait is called from SignalSender in non-signal-handler context (and async signal safety is defined in terms of the interface being used inside the signal handler rather than the routine being potentially interrupted) so it's fine.)
(Assignee)

Updated

5 years ago
Attachment #774357 - Flags: checkin?
(Assignee)

Comment 5

5 years ago
Also, I did try builds of the Linux-based platforms: https://tbpl.mozilla.org/?tree=Try&rev=ed77caf69091  (The orange ggc, which I didn't even ask for, appears to be not my fault.)
Attachment #774357 - Flags: checkin? → checkin+
https://hg.mozilla.org/mozilla-central/rev/be42574c6416
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25

Updated

5 years ago
Keywords: perf
Whiteboard: [c= p=2] → [c= p=2 s=2013.07.12]
You need to log in before you can comment on or make changes to this bug.