Add event loop responsiveness instrumentation

RESOLVED FIXED in mozilla6

Status

()

RESOLVED FIXED
9 years ago
5 years ago

People

(Reporter: azakai, Assigned: ted)

Tracking

Trunk
mozilla6
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 13 obsolete attachments)

22.89 KB, patch
cjones
: review+
Details | Diff | Splinter Review
10.50 KB, patch
Details | Diff | Splinter Review
15.84 KB, patch
cjones
: review+
Details | Diff | Splinter Review
(Reporter)

Description

9 years ago
We have various settings that can trade off absolute speed for responsiveness (for example, nicing the child). We need objective, useful measurements of responsiveness, so we can tell what we can consider trading off for what.
(Reporter)

Updated

9 years ago
Blocks: 594121
(Reporter)

Comment 1

9 years ago
Posted patch some measurements (obsolete) — Splinter Review
This patch measures 2 things:

1. Behavior of the main thread. 'main_loop_busy' is the % of time, over the last 5 seconds, that the main thread was busy. 'main_loop_max_lag_ms' is the longest stretch during which the main thread was busy, over the last 5 seconds (if two events are very close together - within 5ms - we count them as a single 'stretch').

2. # of involuntary context switches ('nivcsw'). That is, how many times did the CPU grab control from us because we were taking too long, over the last second. This is an indicator of whether we would have wanted more CPU then we actually got.

With browser.tabs.remote set to false, running the v8 benchmark in dromaeo gives

  MOZ_PROFILE_PARENT_CPU { 'main_loop_busy': 0.998839, 'main_loop_max_lag_ms': 4994 }
  MOZ_PROFILE_PARENT_CPU { 'nivcsw': 99 }

- unsurprisingly, we were busy almost 100% of the time, we lagged for almost all of the last 5 seconds, and the CPU interrupted us 100 times over the last second. And fennec is of course very unresponsive.

When using browser.tabs.remote = true, we have the normal behavior,

  MOZ_PROFILE_PARENT_CPU { 'main_loop_busy': 0.002645, 'main_loop_max_lag_ms': 13 }
  MOZ_PROFILE_PARENT_CPU { 'nivcsw': 3 }

and fennec is nice and responsive.

Thoughts/ideas on these measurements?
(Reporter)

Comment 2

9 years ago
Forgot to say 2 things:

1. This patch is not meant for review or anything. It's just some hacks, in order to get measurements.

2. Obviously each measurement here is imperfect in itself. Hopefully though a variety of measurements can give us a good picture of things.
Didn't look too closely at the patch, but two initial points
 - Can't measure anything event-loop-related only at nsThread, because we also drop into native loop(s).  Most important are the AppShell ones.  (Yes, this design is completely idiotic and busted.)
 - In addition to measuring time processing events, we'll also want something like a "tracer event" to go through our event loop, like windowing systems use, to measure response latency.  That is, ping the main thread with a time-stamped tracer event periodically, and measure how long it takes for that to be processed.  This can measure backlog from a bunch of short-running events.

Thanks for taking this on, it's badly needed.
Also, I'm not sure what we gain by measuring involuntary context switches in addition to time processing events.
(Reporter)

Comment 5

9 years ago
(In reply to comment #3)
> Didn't look too closely at the patch, but two initial points
>  - Can't measure anything event-loop-related only at nsThread, because we also
> drop into native loop(s).  Most important are the AppShell ones.  (Yes, this
> design is completely idiotic and busted.)

After skimming through that code, I placed the 'event start' marker in the patch after OnProcessNextEvent() (in nsThread::ProcessNextEvent), as OnProcessNextEvent is what gets into the native event queue. So this basically causes it to ignore native events. Which is good for when they just wait for something to happen, but maybe not what we want if they do actual processing. I presume the former is much more common. Or is there a case here I am not thinking of?

>  - In addition to measuring time processing events, we'll also want something
> like a "tracer event" to go through our event loop, like windowing systems use,
> to measure response latency.  That is, ping the main thread with a time-stamped
> tracer event periodically, and measure how long it takes for that to be
> processed.  This can measure backlog from a bunch of short-running events.
> 

Very good idea, I'll do that next.

> Also, I'm not sure what we gain by measuring involuntary context switches in
> addition to time processing events.

Well, maybe not much, but if nothing else it's good as a sanity check on the other measurements.
(In reply to comment #5)
> (In reply to comment #3)
> > Didn't look too closely at the patch, but two initial points
> >  - Can't measure anything event-loop-related only at nsThread, because we also
> > drop into native loop(s).  Most important are the AppShell ones.  (Yes, this
> > design is completely idiotic and busted.)
> 
> After skimming through that code, I placed the 'event start' marker in the
> patch after OnProcessNextEvent() (in nsThread::ProcessNextEvent), as
> OnProcessNextEvent is what gets into the native event queue. So this basically
> causes it to ignore native events. Which is good for when they just wait for
> something to happen, but maybe not what we want if they do actual processing. I
> presume the former is much more common. Or is there a case here I am not
> thinking of?
> 

We repaint and dispatch input events from the native loop.  We'll want to include those in the measurements.
(Reporter)

Comment 7

9 years ago
Posted patch some measurements v2 (obsolete) — Splinter Review
Added a tracer event.

Oddly, the tracer delays are in the range of 100-1000ms when running dromaeo in a local tab, which means the UI is basically frozen. IOW the tracers get run, but not rendering or responses to UI events.
Attachment #485414 - Attachment is obsolete: true
(Reporter)

Comment 8

9 years ago
> We repaint and dispatch input events from the native loop.  We'll want to
> include those in the measurements.

I guess we'll need to detect which native events to ignore, and which not, which will be widget-dependent...
(In reply to comment #7)
> Created attachment 485451 [details] [diff] [review]
> some measurements v2
> 
> Added a tracer event.
> 
> Oddly, the tracer delays are in the range of 100-1000ms when running dromaeo in
> a local tab, which means the UI is basically frozen. IOW the tracers get run,
> but not rendering or responses to UI events.

That makes sense with an XPCOM-event tracer.  Ideally we'd probably want to fire off a tracer to the native loop, since that more closely corresponds to user-visible responsiveness, but I don't know of an easy way to do that across all platforms.
FYI, we're going to hook something like this up to Talos in bug 631571. I'm not sure I want to use your exact approach here, but tracer events do sound like a good way to do it. cjones suggested that native tracer events would be more realistic, and that makes sense to me and probably isn't that hard to implement on the platforms we care about.

Alon: should I steal this bug for the platform work I'm going to do, or should I just file a new one?
Stealing!
Assignee: azakai → ted.mielczarek
Blocks: 631571
Component: General → General
Product: Fennec → Toolkit
QA Contact: general → general
Summary: Measure Fennec's Responsiveness → Add event loop responsiveness instrumentation
Here's an implementation for GTK, it was pretty straightforward. I'll look into implementing this on other platforms next week.
Attachment #509852 - Attachment is obsolete: true
With some helpful pointers from jimm, here's a Win32 implementation. It's sending a WM_USER-type message at the hidden window and waiting for a response.
One extension of this that would be useful is to grab self-minidumps when the unresponsiveness gets "ridiculous".  That's cross-platform though and could easily be done in a followup.
Attachment #510727 - Attachment is obsolete: true
I decided to split up the widget implementations to make review simpler. Here's the core bits + the GTK2 implementation.

The only thing I'm not 100% on here is the output format. Right now it just prints:
MOZ_EVENT_INTERVAL <value>

where <value> is in integer milliseconds. I probably want to include a timestamp there as well, so that latency samples can be correlated with UI events, but I'm not sure of a good way to do that.

r?bsmedberg on the toolkit bits, r?karlt on the widget/gtk2 bits.
Attachment #511406 - Flags: review?(karlt)
Attachment #511406 - Flags: review?(benjamin)
Attachment #485451 - Attachment is obsolete: true
Here's the Win32 implementation.
Attachment #511409 - Flags: review?(jmathies)
And here's the Cocoa implementation.

Steven: I wound up subclassing NSApplication, since there didn't seem to be any other clean way to send events to the event loop and pick them off. It's not a hugely invasive change, but if you have any feedback on it I'm all ears.
Attachment #511410 - Flags: review?(smichaud)
Blocks: 633239
Comment on attachment 511406 [details] [diff] [review]
Implement event loop instrumentation using native events, core implementation + GTK2 implementation

>+bool FireAndWaitForTracerEvent()

Shouldn't this be in the "mozilla" namespace, or prefixed with NS_?

r+ on the logic in widget code.
Attachment #511406 - Flags: review?(karlt) → review+

Comment 20

8 years ago
Comment on attachment 511406 [details] [diff] [review]
Implement event loop instrumentation using native events, core implementation + GTK2 implementation

Quick drive-by, super minor nit: should only build WidgetTraceEvent.cpp in for builds that have MOZ_INSTRUMENT_EVENT_LOOP set?
It's not a configurable option, it gets selected per-widget-toolkit, so all GTK2 builds will have it enabled. (The feature is enabled at runtime by env var.)
(In reply to comment #19)
> >+bool FireAndWaitForTracerEvent()
> 
> Shouldn't this be in the "mozilla" namespace, or prefixed with NS_?

I hate NS_ prefixes, personally. I guess it could go in the mozilla namespace, I hadn't really considered that. I suppose that would be more in-line with what we're doing for internal APIs now.
Comment on attachment 511409 [details] [diff] [review]
Win32 event loop instrumentation

>+  NS_IMETHOD Run() {
>+    // Jump through some hoops to locate the hidden window.
>+    nsCOMPtr<nsIAppShellService> appShell(do_GetService(NS_APPSHELLSERVICE_CONTRACTID));
>+    nsCOMPtr<nsIXULWindow> hiddenWindow;
>+
>+    nsresult rv = appShell->GetHiddenWindow(getter_AddRefs(hiddenWindow));
>+    if (NS_FAILED(rv)) {
>+      return rv;
>+    }
>+
>+    nsCOMPtr<nsIDocShell> docShell;
>+    rv = hiddenWindow->GetDocShell(getter_AddRefs(docShell));
>+    if (NS_FAILED(rv) || !docShell) {
>+      return rv;
>+    }
>+

It would probably be simpler to QI to nsIBaseWindow here and just call GetMainWidget. (There's a function in WinTaskbar.cpp named GetHWNDFromDocShell that does this.)

>+  if (msg == MOZ_WM_TRACE) {
>+    SignalTracerThread();
>+    return 0;
>+  }
>+

Please add a comment on this explaining what it is.
Comment on attachment 511410 [details] [diff] [review]
Cocoa event loop instrumentation

This looks fine to me, but I've got a few nits:

GeckoApplication should probably be changed to GeckoNSApplication.
This conforms to the naming convention we've already used for
GeckoNSMenu, and is probably less likely to be confusing.

CocoaEvents.h should probably be CustomEvents.h or CustomCocoaEvents.h
-- again, I think this is less confusing.

In widget/src/cocoa/Makefile.in you've changed the formatting of the
whole "CMMSRCS =" block.  You should probably restore the original
formatting.

In FireAndWaitForTracerEvent() you use "[pool drain]" instead of the
"[pool release]" that we would "normally" use.  "[pool drain]" isn't
incorrect, and is functionally equivalent to "[pool release]" in
current code.  Apple would argue that "[pool drain]" is actually more
correct, because it can also be used with Objective-C garbage
collection (which we currently don't support).  But using Objective-C
garbage collection makes autorelease pools unnecessary.

So I'd be inclined to use "[pool drain]" ... though maybe I'm being
overly picky :-)

Finally, this won't work with the plugin-container process used by
OOPP (which uses its own NSApplication subclass, CrApplication, from
ipc/chromium/src/base/chrome_application_mac.mm).  But I'm not sure we
should care.
Attachment #511410 - Flags: review?(smichaud) → review+
> So I'd be inclined to use "[pool drain]" ... though maybe I'm being
> overly picky :-)

So I'd be inclined to use "[pool release]" :-)
Thanks for the review!

(In reply to comment #24)
> GeckoApplication should probably be changed to GeckoNSApplication.
> This conforms to the naming convention we've already used for
> GeckoNSMenu, and is probably less likely to be confusing.
> 
> CocoaEvents.h should probably be CustomEvents.h or CustomCocoaEvents.h
> -- again, I think this is less confusing.

Will do.

> In widget/src/cocoa/Makefile.in you've changed the formatting of the
> whole "CMMSRCS =" block.  You should probably restore the original
> formatting.

Yeah, that's intentional. Our (admittedly undocumented) style guidelines for Makefiles are to use two-space indentation in line continuation blocks like that, and I've been encouraging people to clean up blocks as they touch them. I don't think it's worth cleaning up entire files in one go, but since I was already touching this block it's not too bad.

> In FireAndWaitForTracerEvent() you use "[pool drain]" instead of the
> "[pool release]" that we would "normally" use.  "[pool drain]" isn't
> incorrect, and is functionally equivalent to "[pool release]" in
> current code.  Apple would argue that "[pool drain]" is actually more
> correct, because it can also be used with Objective-C garbage
> collection (which we currently don't support).  But using Objective-C
> garbage collection makes autorelease pools unnecessary.
> 
> So I'd be inclined to use "[pool drain]" ... though maybe I'm being
> overly picky :-)

I will use whatever you recommend here. I think I used drain just because that's what the docs seem to recommend.

> Finally, this won't work with the plugin-container process used by
> OOPP (which uses its own NSApplication subclass, CrApplication, from
> ipc/chromium/src/base/chrome_application_mac.mm).  But I'm not sure we
> should care.

That's fine, we only care about instrumenting the browser process, since that's where the user-interaction happens.
Duplicate of this bug: 15122
Comment on attachment 511406 [details] [diff] [review]
Implement event loop instrumentation using native events, core implementation + GTK2 implementation

The build-goop looks fine. I'm a little worried about the tracer thread never shutting down, because it's not clear to me whether TimeStamp and the other machinery it uses might disappear during the shutdown sequence. Redirecting review to cjones to help answer that question.
Attachment #511406 - Flags: review?(benjamin) → review?(jones.chris.g)
Comment on attachment 511406 [details] [diff] [review]
Implement event loop instrumentation using native events, core implementation + GTK2 implementation

When we have POSIX CLOCK_MONOTONIC available (linux), TimeStamp::Shutdown is a no-op and TimeStamp can be used forever.  Everywhere else we use NSPR currently and TimeStamp::Shutdown destroys a lock off a static dtor.  Once the static dtor runs, any use of TimeStamp will crash (safely, null deref).

So we either have to join the tracer thread during shutdown or not run the non-no-op TimeStamp::Shutdown if we're using the tracer instrumentation.  This code can't land as-is.

It seems a bit more future-proof to join the tracer thread sometime during shutdown; what was the reason you didn't want to do that?
Attachment #511406 - Flags: review?(jones.chris.g) → review-
Although TBH, I'd be fine with not freeing the TimeStamp lock for non-NS_FREE_PERMANENT_DATA builds, which would make this code OK as-is.
Sheer laziness on my part, it was just more work to communicate that back to the thread. I can fix the patch to do so. The intent is to make this usable in nightly/release builds, since we want to run this through Talos, so I'd rather not have any fiddly flags to deal with.
Comment 30 would make me happy for the time being.  Let's get this landed!
Attachment #511406 - Attachment is obsolete: true
I fixed things up so we shut down and join the background thread. I also shuffled some things around so that there's a mozilla/WidgetTraceEvent.h header exported from widget/, and the function exported from there lives in the mozilla namespace.

I fiddled with the output format to 1) include timestamps 2) only output samples if they exceed our responsiveness threshold (currently 50ms) 3) include start and end lines.

Also I added a whole bunch of comments, including a large usage block at the top of EventTracer.cpp.
Attachment #524622 - Flags: review?(jones.chris.g)
Attachment #511409 - Attachment is obsolete: true
Attachment #511409 - Flags: review?(jmathies)
Fixed up review comments, and also moved things into the mozilla namespace.
Attachment #525071 - Flags: review?(jmathies)
Attachment #511410 - Attachment is obsolete: true
With review comments addressed.
Comment on attachment 524622 [details] [diff] [review]
Implement event loop instrumentation using native events, core implementation + GTK2 implementation

Generally, style here is still likeThis for locals and sLikeThis for
file-static variables.  I don't like it but there it is.

>diff --git a/toolkit/xre/EventTracer.cpp b/toolkit/xre/EventTracer.cpp
>+ * Set MOZ_INSTRUMENT_EVENT_LOOP=1 in the environment to enable
>+ * this instrumentation.
>+ *
>+ * Set MOZ_INSTRUMENT_EVENT_LOOP_OUTPUT in the environment to a
>+ * file path to contain the log output, the default is to log to stdout.

This implicitly relies on child processes not using nsAppRunner (if
they did, badness would happen).  We might want to instrument child
process' event loops or not; should note here that only the UI process
is being traced atm.

>+using mozilla::TimeDuration;
>+using mozilla::TimeStamp;
>+using mozilla::FireAndWaitForTracerEvent;

using namespace mozilla;

>+void TracerThread(void *arg)
>+{
>+  // This should be set to the maximum latency we'd like to allow
>+  // for responsiveness.
>+  const PRIntervalTime kMeasureInterval = PR_MillisecondsToInterval(50);

Would be nice to allow this to be overridden by an environment
variable, but not a big deal.

>+    //TODO: only wait up to a maximum of kMeasureInterval, return
>+    // early if that threshold is exceeded and dump a stack trace
>+    // of some sort.

Or connect Shark or ...  Lots of stuff here.

>+    if (FireAndWaitForTracerEvent()) {
>+      TimeDuration duration = TimeStamp::Now() - start;
>+      // Only report samples that exceed our measurement interval.
>+      if (duration.ToMilliseconds() > kMeasureInterval) {

Not sure about this but I guess we can let experience tell how to
filter.

>+bool InitEventTracing()
>+{

NS_ABORT_IF_FALSE(!sTracerThread).  (Soon we'll have MOZ_ASSERT()!)

>diff --git a/toolkit/xre/EventTracer.h b/toolkit/xre/EventTracer.h
>+// Create a thread that will fire events back at the
>+// main thread to measure responsiveness. Return true
>+// if the thread was created successfully.
>+bool InitEventTracing();
>+
>+// Signal the background thread to stop, and join it.
>+void ShutdownEventTracing();

Need to note that both of these functions must be called from the same
thread.

These should be in namespace mozilla too.

>diff --git a/widget/public/WidgetTraceEvent.h b/widget/public/WidgetTraceEvent.h
>+// Fire a tracer event at the event loop, and block until
>+// the event is processed. This should only be called by
>+// a background thread.
>+bool FireAndWaitForTracerEvent();

Since there's no param to this function that specifies the event loop,
the comment should explicitly said "UI thread".  It seems like it
might be easier to just say that this function can only be called by a
thread that's not the UI thread.

>diff --git a/widget/src/gtk2/WidgetTraceEvent.cpp b/widget/src/gtk2/WidgetTraceEvent.cpp
>
>+pthread_mutex_t mutex         = PTHREAD_MUTEX_INITIALIZER;
>+pthread_cond_t  condition_var = PTHREAD_COND_INITIALIZER;

You need another variable to track signaled-ness, say |bool
sTracerProcessed|.  The pthread/CondVar/PRCondVar spec allows for
spurious wakeups of condition variables.  The pattern is

signaler:
  MutexAutoLock lock(sMutex);
  sTracerSignaled = true;
  sCondVar.Notify();

signalee:
  MutexAutoLock lock(sMutex);
  while (!sTracerProcessed)
    sCondVar.Wait();

Somewhat pedantic, but gtk2 != pthread so this code isn't technically
valid.  I'd just use mozilla::Mutex/CondVar instead (soon to be
mozilla::Monitor).

>+
>+// This function is called from the main (UI) thread.
>+gboolean TracerCallback(gpointer data)
>+{
>+  pthread_mutex_lock(&mutex);

NS_ABORT_IF_FALSE(!sTracerProcessed);
Attachment #524622 - Flags: review?(jones.chris.g)
Switching between Google C++ style and Mozilla is killing me. I think I've got Stockholm syndrome, I am starting to like their styleguide. (Perhaps because it's so pedantic it specifies everything and I don't have to guess, I just follow it blindly?)

(In reply to comment #36)
> Generally, style here is still likeThis for locals and sLikeThis for
> file-static variables.  I don't like it but there it is.

Bleh. Even in a new file I'm adding?

> >+using mozilla::TimeDuration;
> >+using mozilla::TimeStamp;
> >+using mozilla::FireAndWaitForTracerEvent;
> 
> using namespace mozilla;

So, this is explicitly forbidden by the Google style guide, and our style guide doesn't say anything about it. Similar to how in Python, it's frowned upon to "from foo import *", I think we should discourage "using namespace", since it makes it less clear where things are coming from. If it's alright with you I'd like to keep this the way it is.

> >+    if (FireAndWaitForTracerEvent()) {
> >+      TimeDuration duration = TimeStamp::Now() - start;
> >+      // Only report samples that exceed our measurement interval.
> >+      if (duration.ToMilliseconds() > kMeasureInterval) {
> 
> Not sure about this but I guess we can let experience tell how to
> filter.

bmoss suggested this, so I threw it in here. Without a filter like this the output is *very* verbose (20Hz), and that's going to be a pain to deal with. We can certainly tweak this if necessary later.


> >diff --git a/toolkit/xre/EventTracer.h b/toolkit/xre/EventTracer.h
> >+// Create a thread that will fire events back at the
> >+// main thread to measure responsiveness. Return true
> >+// if the thread was created successfully.
> >+bool InitEventTracing();
> >+
> >+// Signal the background thread to stop, and join it.
> >+void ShutdownEventTracing();
> 
> Need to note that both of these functions must be called from the same
> thread.
> 
> These should be in namespace mozilla too.

Why do these need to be in namespace mozilla? They're internal implementation details, not an exported API. (Unless we're moving everything into namespace mozilla?)
Attachment #524622 - Attachment is obsolete: true
This addresses all the review comments except the ones related to namespaces that I carped about in my previous comment.
Attachment #525374 - Flags: review?(jones.chris.g)
Attachment #525085 - Attachment is obsolete: true
Updated the Cocoa patch to use mozilla::{Mutex,CondVar} like the GTK2 patch.
(In reply to comment #37)
> Switching between Google C++ style and Mozilla is killing me. I think I've got
> Stockholm syndrome, I am starting to like their styleguide. (Perhaps because
> it's so pedantic it specifies everything and I don't have to guess, I just
> follow it blindly?)

Yeah ... IME, the style google uses is common, and IMNSHO it's much less ugly than Gecko's (note that SM C++ style is closer to what google uses).

> 
> (In reply to comment #36)
> > Generally, style here is still likeThis for locals and sLikeThis for
> > file-static variables.  I don't like it but there it is.
> 
> Bleh. Even in a new file I'm adding?

The new files would technically fall under roc's sr, and he pretty adamantly enforces the style guide.  I don't particularly care, but if you'd prefer to use this style, I'd ask roc.

> 
> > >+using mozilla::TimeDuration;
> > >+using mozilla::TimeStamp;
> > >+using mozilla::FireAndWaitForTracerEvent;
> > 
> > using namespace mozilla;
> 
> So, this is explicitly forbidden by the Google style guide, and our style guide
> doesn't say anything about it. Similar to how in Python, it's frowned upon to
> "from foo import *", I think we should discourage "using namespace", since it
> makes it less clear where things are coming from. If it's alright with you I'd
> like to keep this the way it is.

I used to be in that camp.  I came to it through java, where identifier collisions are common (even in the standard library) and this style is pretty much necessary.  But everyone uses eclipse to write java, and eclipse manages these imports automatically and hides them from developers.  In C++ in gecko, we don't have the same kind of collision problems, and we don't have IDEs managing "using" statements for us automatically.  Now that some files have built up 10's of lines of |using mozilla::Foo|, I'm starting to tire of it.  I'm not sure what motivated google's guideline; finding declarations of identifiers is the job of IDEs, which is the same reason I think hungarian notation is silly.

Here though, for 3, it's not worth arguing over too much.

> 
> > >+    if (FireAndWaitForTracerEvent()) {
> > >+      TimeDuration duration = TimeStamp::Now() - start;
> > >+      // Only report samples that exceed our measurement interval.
> > >+      if (duration.ToMilliseconds() > kMeasureInterval) {
> > 
> > Not sure about this but I guess we can let experience tell how to
> > filter.
> 
> bmoss suggested this, so I threw it in here. Without a filter like this the
> output is *very* verbose (20Hz), and that's going to be a pain to deal with. We
> can certainly tweak this if necessary later.

Sure, let's let experience tell.  We'll have plenty enough outliers above 50ms to keep everyone busy for the time being ;).

> 
> > >diff --git a/toolkit/xre/EventTracer.h b/toolkit/xre/EventTracer.h
> > >+// Create a thread that will fire events back at the
> > >+// main thread to measure responsiveness. Return true
> > >+// if the thread was created successfully.
> > >+bool InitEventTracing();
> > >+
> > >+// Signal the background thread to stop, and join it.
> > >+void ShutdownEventTracing();
> > 
> > Need to note that both of these functions must be called from the same
> > thread.
> > 
> > These should be in namespace mozilla too.
> 
> Why do these need to be in namespace mozilla? They're internal implementation
> details, not an exported API. (Unless we're moving everything into namespace
> mozilla?)

If we happen to import another C++ library that happens to define these symbols, they'll collide in the linker.  This happened when we imported the code in ipc/chromium.
Comment on attachment 525374 [details] [diff] [review]
Implement event loop instrumentation using native events, core implementation + GTK2 implementation.

>diff --git a/toolkit/xre/EventTracer.cpp b/toolkit/xre/EventTracer.cpp

>+using mozilla::TimeDuration;
>+using mozilla::TimeStamp;
>+using mozilla::FireAndWaitForTracerEvent;

Still prefer |using namespace mozilla| but this doesn't burn my eyes
enough to r-.

>+bool InitEventTracing()
>+{
>+  // Create a thread that will fire events back at the
>+  // main thread to measure responsiveness.
>+  sTracerThread = PR_CreateThread(PR_USER_THREAD,
>+                                  TracerThread,
>+                                  NULL,
>+                                  PR_PRIORITY_NORMAL,
>+                                  PR_GLOBAL_THREAD,
>+                                  PR_JOINABLE_THREAD,
>+                                  0);
>+  NS_ABORT_IF_FALSE(!sTracerThread, "Failed to create tracer thread!");

The assertion I asked for would guard against multiple-init, a logic
erro, and should go before the PR_CreateThread().  I don't
particularly care what we do if creating the tracer thread fails.

>diff --git a/toolkit/xre/EventTracer.h b/toolkit/xre/EventTracer.h
>+// Create a thread that will fire events back at the
>+// main thread to measure responsiveness. Return true
>+// if the thread was created successfully.
>+bool InitEventTracing();
>+
>+// Signal the background thread to stop, and join it.
>+// Must be called from the same thread that called InitEventTracing.
>+void ShutdownEventTracing();

namespace mozilla per discussion.

>diff --git a/widget/public/WidgetTraceEvent.h b/widget/public/WidgetTraceEvent.h
>+// Fire a tracer event at the event loop, and block until

"At the UI-thread event loop"

>diff --git a/widget/src/gtk2/WidgetTraceEvent.cpp b/widget/src/gtk2/WidgetTraceEvent.cpp
>+using mozilla::CondVar;
>+using mozilla::Mutex;
>+using mozilla::MutexAutoLock;

And here.

>+// This function is called from the background tracer thread.
>+bool FireAndWaitForTracerEvent()
>+{
>+  // Send a default-priority idle event through the
>+  // event loop, and wait for it to finish.
>+  MutexAutoLock lock(sMutex);
>+  sTracerProcessed = false;
>+  g_idle_add_full(G_PRIORITY_DEFAULT,
>+                  TracerCallback,
>+                  NULL,
>+                  NULL);
>+  while (!sTracerProcessed)
>+    sCondVar.Wait();

I think it makes more sense to set |sTracerProcessed = false| here
after the Wait() and then assert !sTracerProcessed just after entering
the mutex above.

r=me with that.
Attachment #525374 - Flags: review?(jones.chris.g) → review+

Updated

8 years ago
Attachment #525071 - Flags: review?(jmathies) → review+
Attachment #525374 - Attachment is obsolete: true
Pushed to m-c:
http://hg.mozilla.org/mozilla-central/rev/81ea8b39ed4e
http://hg.mozilla.org/mozilla-central/rev/b7bafc5210cb
http://hg.mozilla.org/mozilla-central/rev/2c5a545681cf
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla6

Comment 44

8 years ago
This leaked a Mutex and a CondVar in crashtests, so I backed it out:

http://hg.mozilla.org/mozilla-central/rev/d3bff8e97702
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #525697 - Attachment is obsolete: true
Okay, this patch removes the static Mutex/CondVar variables and replaces them with pointers. To facilitate this, I added Init/Cleanup methods to the widget backend. I verified that we no longer leak with this patch.
Attachment #525071 - Attachment is obsolete: true
Trivial changes to match the API changes.
Attachment #525390 - Attachment is obsolete: true
Changes to match the GTK2 impl.
Attachment #527344 - Flags: review?(jones.chris.g)
Attachment #527024 - Flags: review?(jones.chris.g)
Comment on attachment 527024 [details] [diff] [review]
Implement event loop instrumentation using native events, core implementation + GTK2 implementation.

Looks OK.  (Hey, interdiff worked!)

Nit: "cleanup" is a noun, "clean up" is a verb phrase.  Please s/Cleanup/CleanUp/.

r=me with that.
Attachment #527024 - Flags: review?(jones.chris.g) → review+
Comment on attachment 527344 [details] [diff] [review]
Cocoa event loop instrumentation.

Interdiff looks OK.

Same nit, s/Cleanup/CleanUp/.
Attachment #527344 - Flags: review?(jones.chris.g) → review+
Pushed revised patches to m-c:
http://hg.mozilla.org/mozilla-central/rev/9f474136c458
http://hg.mozilla.org/mozilla-central/rev/b6726f1fd6ed
http://hg.mozilla.org/mozilla-central/rev/855e5cd3c884
Status: REOPENED → RESOLVED
Last Resolved: 8 years ago8 years ago
Resolution: --- → FIXED
Blocks: 653701
Blocks: 653703
(In reply to comment #37)
> > >+using mozilla::TimeDuration;
> > >+using mozilla::TimeStamp;
> > >+using mozilla::FireAndWaitForTracerEvent;
> > 
> > using namespace mozilla;
> 
> So, this is explicitly forbidden by the Google style guide, and our style
> guide doesn't say anything about it. Similar to how in Python, it's frowned
> upon to "from foo import *", I think we should discourage "using namespace",
> since it makes it less clear where things are coming from. If it's alright
> with you I'd like to keep this the way it is.

FWIW we're "using namespace mozilla;" all over the place and I think that's good. The goal is prevent accidental name clashes, not to resolve confusion that is arising in practice. Your IDE or MXR will tell you where something is coming from.
Blocks: 700754
You need to log in before you can comment on or make changes to this bug.