Closed Bug 718801 Opened 12 years ago Closed 12 years ago

add telemetry for main thread lock held times

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: froydnj, Assigned: froydnj)

Details

(Whiteboard: [Snappy:P2])

Attachments

(4 files, 10 obsolete files)

7.05 KB, patch
froydnj
: review+
taras.mozilla
: feedback+
Details | Diff | Splinter Review
2.84 KB, patch
asuth
: review+
Details | Diff | Splinter Review
5.45 KB, patch
cjones
: review+
Details | Diff | Splinter Review
2.96 KB, patch
khuey
: review-
Details | Diff | Splinter Review
We think holding locks might be a source of jank.  Adding telemetry for lock times would help us measure that and make sure we don't hold locks for too long in the future.
Attached patch telemetry bits (obsolete) — Splinter Review
The mutex bits should be ready to go shortly.  In the meantime, here are the telemetry bits.  Straightforward, nothing surprising.

One question, though: do I need locking in GetLongMutexes?  There's locking in the SlowSQL accessors, but it's not clear to me that it'd be necessary, since we're measuring main-thread-only stuff with this.
Attachment #589570 - Flags: review?(taras.mozilla)
Attached patch telemetry bits (obsolete) — Splinter Review
Let's try that again, with correct names everywhere.  Locking question in GetLongMutexes still holds.
Attachment #589570 - Attachment is obsolete: true
Attachment #589570 - Flags: review?(taras.mozilla)
Attachment #589590 - Flags: review?(taras.mozilla)
Comment on attachment 589590 [details] [diff] [review]
telemetry bits

+  if (!(array
+        && JS_SetElement(args->cx, array, 0, &hitCount)
+        && JS_SetElement(args->cx, array, 0, &totalTime))) {

that looks like a typo.

We have conservative stack scanning, but I would still appreciate it if you 
+  if (!JS_DefineProperty(args->cx, args->obj, entry->GetKey(),
+                         OBJECT_TO_JSVAL(array), NULL, NULL,
+                         JSPROP_ENUMERATE)) {
right after JS_NewArrayObject

Same with JS_NewObject( and *ret = OBJECT_TO_JSVAL(root);


seems ok with above fixed
Attachment #589590 - Flags: review?(taras.mozilla) → review+
Attached patch telemetry bits (obsolete) — Splinter Review
Refreshed patch addressing review comments, carrying over r+.
Attachment #589590 - Attachment is obsolete: true
Attachment #590780 - Flags: review+
Attached patch mutex bits (obsolete) — Splinter Review
These are the potentially controversial mutex bits.  We add a TimeStamp to all Mutexes; we also record the name of the Mutex, even in opt builds, so we can provide it to Telemetry later.

Since we don't want this turned on all the time, everything is guarded by MOZ_MUTEX_TELEMETRY.  A configure patch for this is coming up next.

This patch does depend on bug 718214 for Telemetry::CanRecord.
Attachment #590785 - Flags: review?(benjamin)
Attached patch configure bits (obsolete) — Splinter Review
This adds --enable-mutex-telemetry and MOZ_MUTEX_TELEMETRY bits.

Ideally, we'd add --enable-mutex-telemetry to our builds for an entire release cycle or so to gather data.  That would, of course, be a separate patch.
Attachment #590786 - Flags: review?(khuey)
Comment on attachment 590785 [details] [diff] [review]
mutex bits

I'm going to punt this to cjones who is the original author and can think about it better than I!
Attachment #590785 - Flags: review?(benjamin) → review?(jones.chris.g)
I should get to this by EOD today.
Comment on attachment 590785 [details] [diff] [review]
mutex bits

I assume this is going to be off by default, right?  And the intention
is for it to be used in opt builds?

You're going to get some incredibly-long outliers when the CPU goes to
sleep while a lock is acquired.  I guess this is a general problem for
most telemetry probes, so maybe that's filtered already.

If there's a set of suspects you already have in mind (*cough*
storage), it would be a bit cleaner and nicer to have special helpers
to instrument blocking time.  But if you don't already have suspects,
yeah you want this.

Speaking of storage, it uses its own mutex helper wrapped around
sqlite3_mutex.  I assume you're going to instrument this separately,
or have already?

>diff --git a/xpcom/glue/BlockingResourceBase.h b/xpcom/glue/BlockingResourceBase.h

>+#ifdef MOZ_MUTEX_TELEMETRY
>+    /**
>+     * mName
>+     * The name of this resource.
>+     */
>+    const char *mName;
>+#endif

You don't use this, get rid of it.

>diff --git a/xpcom/glue/Mutex.h b/xpcom/glue/Mutex.h

>     Mutex(const char* name) :
>         BlockingResourceBase(name, eMutex)
>+#ifdef MOZ_MUTEX_TELEMETRY
>+        , mLockTime()
>+        , mName(name)

This duplicates state held by BlockingResourceBase, but that's OK by
me.  It would be hard and abstraction-leaky to get at that state.

>     void Lock()
>     {
>+#ifdef MOZ_MUTEX_TELEMETRY
>+        mLockTime = TimeStamp::Now();

This is racy: multiple threads contend for Lock().  TimeStamp can't be
atomically written and read on all archs, so you'll additionally get
corrupted data.

The race is a problem because the last thread here will win the race
for mLockTime.  That means you'll end up underestimating the amount of
block time.

More generally, what you're doing here is measuring the duration of
"blocked and mutex-held time", the amount of time a thread waits for
the mutex and then the amount of time it's held.  Trying to measure
the blocked time is leading to all these data hazards.

I don't think it's too subtle to see that if thread A spends a
significant amount of time waiting on a mutex, then it must be because
another thread B held the mutex for that duration.  So just measuring
the duration of time a mutex is *acquired* is sufficient for
telemetry.  That gets rid of all the data hazards too.

>+#endif
>         PR_Lock(mLock);

Move the timestamp write inside here.

>     void Unlock()
>     {
>         PR_Unlock(mLock);
>+        MaybeReportSlowMutex();

See above.  Move this above the PR_Unlock(), so that it happens while
the mutex is still acquired.

|MaybeReportSlowMutex()| is a misleading name.  Mutexes aren't slow or
fast.  |MaybeReportOverlyLongCriticalSection()| is better.

I don't need to go into all the problems that can occur with the impl
as it is here, but there are many.

>     PRLock* mLock;
>+#ifdef MOZ_MUTEX_TELEMETRY
>+    TimeStamp mLockTime;

Please add a comment about what this is used to measure, and why only
measuring the lock-held is enough.  Call this |mAcquiredTime| for
clarity.

Explain that |mAcquiredTime| is protected by |mLock|.

>+    const char *mName;

Please add a note that this is duplicated by state in
BlockingResourceBase, but that's OK.
Attachment #590785 - Flags: review?(jones.chris.g) → review-
(In reply to Chris Jones [:cjones] [:warhammer] from comment #9)
> Comment on attachment 590785 [details] [diff] [review]
>

One thing I left out here is that we're not measuring the block time of ReentrantMonitor, which uses PRMonitor directly.  I'm not sure how big of a problem this is.

> I don't think it's too subtle to see that if thread A spends a
> significant amount of time waiting on a mutex, then it must be because
> another thread B held the mutex for that duration.  So just measuring
> the duration of time a mutex is *acquired* is sufficient for
> telemetry.

While this is true, trying to measure this will give us bogus data.  The problem comes from CondVar/Monitor: when the thread starts to block on the condvar, its protecting mutex is released, but deep within the bowels of the OS.  The Mutex class doesn't observe this.  So the problem is that Mutex is used for critical sections, but also condvar waiting.

So what we want to do is measure the block-on-acquire time.  By the argument above, if thread B holds a mutex for too long, then thread A will block waiting on it for a long time, so measuring block-on-acquire gives us the same kind of data.  The problem is that measuring this correctly is a bit trickier.

What we need to do is
 - create a TLS TimeStamp variable called something like sMutexAcquireStartTime, in some .cpp file.
 - in Mutex::Lock(), use a protocol like

void
Mutex::Lock()
{
    // write TimeStamp::Now() to TLS sMutexAcquireStartTime.  Probably want to do this using a helper method defined in the same .cpp as sMutexAcquireStartTime.
    PR_Lock(mLock)
    MaybeReportLongWaitTime();
}

with appropriate ifdef'ery.  Mutex can never be recursively locked, but even if it could, this protocol would still be correct.  This will give 100% precise results (modulo CPU sleep etc.).

Let me know if this makes sense.
Also, Telemetry::RecordLongMutexWait() needs to be thread safe.  I don't know if it is.
Comment on attachment 590786 [details] [diff] [review]
configure bits

Review of attachment 590786 [details] [diff] [review]:
-----------------------------------------------------------------

Why does this need to be configurable at build time?
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #12)
> Why does this need to be configurable at build time?

The intent was that we would turn it on for a release cycle or portion thereof, and turn it off after collecting data.  Possibly turning it on for other release cycles thereafter.  configury seemed like the best way to do that.

I haven't measured how much of a slowdown it would be; it's possible that it has minimal impact and we can just leave it turned on all the time.  That scenario, however, seems unlikely.
Version: unspecified → Trunk
Thanks for the comments, they are helpful.

(In reply to Chris Jones [:cjones] [:warhammer] from comment #10)
> What we need to do is
>  - create a TLS TimeStamp variable called something like
> sMutexAcquireStartTime, in some .cpp file.
>  - in Mutex::Lock(), use a protocol like
> 
> void
> Mutex::Lock()
> {
>     // write TimeStamp::Now() to TLS sMutexAcquireStartTime.  Probably want
> to do this using a helper method defined in the same .cpp as
> sMutexAcquireStartTime.
>     PR_Lock(mLock)
>     MaybeReportLongWaitTime();
> }
> 
> with appropriate ifdef'ery.  Mutex can never be recursively locked, but even
> if it could, this protocol would still be correct.  This will give 100%
> precise results (modulo CPU sleep etc.).
> 
> Let me know if this makes sense.

Earlier you said that having a TimeStamp being written outside of the lock was not correct, since TimeStamp is not necessarily atomically modifiable on all architectures.  Why the difference between what you wrote and what you present above?

Also, do you mean to have MaybeReportLongWaitTime in addition to MaybeReportOverlyLongCriticalSection?
(In reply to Nathan Froyd (:froydnj) from comment #14)
> Thanks for the comments, they are helpful.
>
> Earlier you said that having a TimeStamp being written outside of the lock
> was not correct, since TimeStamp is not necessarily atomically modifiable on
> all architectures.  Why the difference between what you wrote and what you
> present above?
> 

You were writing to a member variable of Mutex, shared among all threads with a reference to that Mutex.  What you need to do instead is write to a TLS variable that tracks each *individual* thread's wait time on the Mutex it's currently trying to acquire.  Since the variable is local to each thread, there are never data hazards trying to read/write it (ignoring signals).

The TLS implementation will also mean there's exactly one timestamp per thread, instead of metadata per Mutex instance.  We shouldn't have enough Mutexes that it's a significant win, but it's a good habit to be in.

> Also, do you mean to have MaybeReportLongWaitTime in addition to
> MaybeReportOverlyLongCriticalSection?

Nope, all we need is MaybeReportLongWaitTime().  I was just suggesting as a name to rename the current helper function to.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #15)
> (In reply to Nathan Froyd (:froydnj) from comment #14)
> > Why the difference between what you wrote and what you
> > present above?
> 
> You were writing to a member variable of Mutex, shared among all threads
> with a reference to that Mutex.  What you need to do instead is write to a
> TLS variable that tracks each *individual* thread's wait time on the Mutex
> it's currently trying to acquire.

Oh, right.

> > Also, do you mean to have MaybeReportLongWaitTime in addition to
> > MaybeReportOverlyLongCriticalSection?
> 
> Nope, all we need is MaybeReportLongWaitTime()

Ah, I see what you are aiming at.  We don't so much care if some operation takes a long time--if it does, it should show up on a profile--but whether there's significant contention for things.

Reporting lock contention is certainly a useful thing, but does it really help in addressing jank-causing things?  What if we're locking things on the main thread, but there's never any contention for those locks?
(In reply to Nathan Froyd (:froydnj) from comment #13)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #12)
> > Why does this need to be configurable at build time?
> 
> The intent was that we would turn it on for a release cycle or portion
> thereof, and turn it off after collecting data.  Possibly turning it on for
> other release cycles thereafter.  configury seemed like the best way to do
> that.

Also, we can simply enable this on the profiling branch via configury.
If a lock is never contented, it will never cause jank.
Attached patch mutex bits, v2 (obsolete) — Splinter Review
Here's a reworked mutex patch implementing most of the review comments.  The one difference from the suggested implementation is using a on-stack TimeStamp instead of a TLS variable; this avoids OS-specific TLS hackery.

I put mName in BlockingResourceBase so that SQLiteMutex changes (separate patch) could use the same infrastructure.

Since this is aimed at jankiness, the Telemetry bits are main thread only; the mutex bits have been updated to reflect that.  This infrastructure would also be useful for lock contention generally with the main thread restriction removed.
Attachment #590785 - Attachment is obsolete: true
Attachment #594217 - Flags: review?(jones.chris.g)
Attached patch sqlite mutex bits (obsolete) — Splinter Review
We'd like to know if sqlite mutexes are causing jank, too.  This patch adds those, though it's likely mutex contention is correlated with long SQL queries.
Attachment #594218 - Flags: review?(mak77)
Rewriting the telemetry bits to indicate this is about lock acquisition/contention.  Carrying over r+, requesting feedback.
Attachment #590780 - Attachment is obsolete: true
Attachment #594219 - Flags: review+
Attachment #594219 - Flags: feedback?(taras.mozilla)
(In reply to Nathan Froyd (:froydnj) from comment #19)
> Created attachment 594217 [details] [diff] [review]
> mutex bits, v2
> 
> Here's a reworked mutex patch implementing most of the review comments.  The
> one difference from the suggested implementation is using a on-stack
> TimeStamp instead of a TLS variable; this avoids OS-specific TLS hackery.
> 

Duh!  Thanks.
Comment on attachment 594217 [details] [diff] [review]
mutex bits, v2

Looks good, but

>diff --git a/xpcom/glue/BlockingResourceBase.cpp b/xpcom/glue/BlockingResourceBase.cpp

>+#ifdef MOZ_MUTEX_TELEMETRY
>+    mLockTime = TimeStamp::Now();
>+#endif

>+    MaybeReportSlowMutex();

this code won't compile :).  Please fix.
Attachment #594217 - Flags: review?(jones.chris.g) → review+
Comment on attachment 590786 [details] [diff] [review]
configure bits

Review of attachment 590786 [details] [diff] [review]:
-----------------------------------------------------------------

Please remove the developer-visible configure option, and simply hardcode the value you want into the defaults section of configure.
Attachment #590786 - Flags: review?(khuey) → review-
Comment on attachment 594219 [details] [diff] [review]
telemetry bits, v2

+  struct AcquireStats {
+    PRUint32 hitCount;
+    PRUint32 totalTime;
+  };
Use template from bug 715927.

Should RAIIfy hashtable Init/Clear
Attachment #594219 - Flags: feedback?(taras.mozilla) → feedback+
Comment on attachment 594218 [details] [diff] [review]
sqlite mutex bits

Review of attachment 594218 [details] [diff] [review]:
-----------------------------------------------------------------

My major doubt regards the perf hit of this... Do we have any idea?
Apart that the code looks fine, excluding the define name, that should really be improved.

Btw, I think it's healthier to have this go through Asuth, that well knows the specifics of Storage mutexes. So please fix the define and get a review from him.

::: storage/test/Makefile.in
@@ +74,4 @@
>  endif
>  endif
>  
> +DEFINES += -DSTORAGE_TESTING

this name looks completely generic and unclear.
you want a better name that actually reflects what it is actually doing and a really nice comment above it.
Attachment #594218 - Flags: review?(mak77)
Whiteboard: [Snappy] → [Snappy:P2]
Some Talos numbers!

Base tree: https://tbpl.mozilla.org/?tree=Try&rev=caf67e8c23d2
Changed tree A: https://tbpl.mozilla.org/?tree=Try&rev=9b12d6c054e0
Changed tree B: https://tbpl.mozilla.org/?tree=Try&rev=1b7dfe5ee2f0

The A tree had some syntax errors in TelemetryPing.js, so wound up not running a few tests.  Nevertheless, we got some decent numbers.

Comparison between base and A: http://perf.snarkfest.net/compare-talos/index.html?oldRevs=caf67e8c23d2&newRev=9b12d6c054e0&tests=a11y,tdhtml,tdhtml_nochrome,a11y_paint,tdhtml_paint,tdhtml_nochrome_paint,tp4,tp4_memset,tp4_pbytes,tp4_rss,tp4_shutdown,tp4_xres,tp5,tp5_memset,tp5_pbytes,tp5_rss,tp5_shutdown,tp5_xres,tp5_paint,tp5_memset_paint,tp5_pbytes_paint,tp5_rss_paint,tp5_shutdown_paint,tp5_xres_paint,dromaeo_basics,dromaeo_css,dromaeo_dom,dromaeo_jslib,dromaeo_sunspider,dromaeo_v8,tsspider,tsspider_nochrome,tsspider_paint,tsspider_nochrome_paint,tgfx,tgfx_nochrome,tgfx_paint,tgfx_nochrome_paint,tscroll,tsvg,tsvg_opacity,ts,ts_paint,ts_cold,ts_cold_generated_max,ts_cold_generated_max_shutdown,ts_cold_generated_med,ts_cold_generated_med_shutdown,ts_cold_generated_min,ts_cold_generated_min_shutdown,ts_cold_shutdown,ts_places_generated_max,ts_places_generated_max_shutdown,ts_places_generated_med,ts_places_generated_med_shutdown,ts_places_generated_min,ts_places_generated_min_shutdown,ts_shutdown,twinopen,tpaint&submit=true

Comparison between base and B: http://perf.snarkfest.net/compare-talos/index.html?oldRevs=caf67e8c23d2&newRev=1b7dfe5ee2f0&tests=a11y,tdhtml,tdhtml_nochrome,a11y_paint,tdhtml_paint,tdhtml_nochrome_paint,tp4,tp4_memset,tp4_pbytes,tp4_rss,tp4_shutdown,tp4_xres,tp5,tp5_memset,tp5_pbytes,tp5_rss,tp5_shutdown,tp5_xres,tp5_paint,tp5_memset_paint,tp5_pbytes_paint,tp5_rss_paint,tp5_shutdown_paint,tp5_xres_paint,dromaeo_basics,dromaeo_css,dromaeo_dom,dromaeo_jslib,dromaeo_sunspider,dromaeo_v8,tsspider,tsspider_nochrome,tsspider_paint,tsspider_nochrome_paint,tgfx,tgfx_nochrome,tgfx_paint,tgfx_nochrome_paint,tscroll,tsvg,tsvg_opacity,ts,ts_paint,ts_cold,ts_cold_generated_max,ts_cold_generated_max_shutdown,ts_cold_generated_med,ts_cold_generated_med_shutdown,ts_cold_generated_min,ts_cold_generated_min_shutdown,ts_cold_shutdown,ts_places_generated_max,ts_places_generated_max_shutdown,ts_places_generated_med,ts_places_generated_med_shutdown,ts_places_generated_min,ts_places_generated_min_shutdown,ts_shutdown,twinopen,tpaint&submit=true

These actually look like pretty low overhead.  The tsvg numbers for Linux 64-bit are just wacky, though.

The ts_places numbers do regress somewhat, but maybe that's a good place to look as to *why* we are doing so much locking there?

In any event, this is not going to be turned on everywhere: nightly builds for a cycle or two.
Attached patch part 3 - configure bits v2 (obsolete) — Splinter Review
Now with hardcoded default.
Attachment #590786 - Attachment is obsolete: true
Attachment #597814 - Flags: review?(khuey)
Reworked patch with a better define name, some commentary, and refactoring.
Attachment #594218 - Attachment is obsolete: true
Attachment #597816 - Flags: review?(bugmail)
Comment on attachment 597814 [details] [diff] [review]
part 3 - configure bits v2

Review of attachment 597814 [details] [diff] [review]:
-----------------------------------------------------------------

Presumably you want to add this to autoconf.mk.in too ...
Attachment #597814 - Flags: review?(khuey) → review-
Comment on attachment 597816 [details] [diff] [review]
part 4 - sqlite mutex bits

This is pretty straightforward.  As you sure you shouldn't be adding an #include to the header since you're introducing a dependency to a header file?

In terms of accomplishing the stated goal, it's important to keep in mind that storage only uses these wrappers where we benefit from reduced lock churn or need the mutex held for a longer time than the SQLite calls themselves for correctness/sanity.  We do not acquire the mutex before every call into the SQLite API.  For example, synchronous statement stepping will not use our lock wrappers.

If you are really concerned about monitoring mutex contention, you may want to take a look at sqlite3_config's support for wrapping/replacing the mutex magic:
http://www.sqlite.org/c3ref/mutex_methods.html 
http://www.sqlite.org/c3ref/c_config_getmalloc.html

The true async unit test already does some wrapping of the mutex operations, although for more hacky reasons:
http://mxr.mozilla.org/mozilla-central/source/storage/test/test_true_async.cpp
Attachment #597816 - Flags: review?(bugmail) → review+
(In reply to Andrew Sutherland (:asuth) from comment #31)
> This is pretty straightforward.  As you sure you shouldn't be adding an
> #include to the header since you're introducing a dependency to a header
> file?

Which header file?  The only thing these changes require are functions declared in BlockingResourceBase.h and SQLiteMutex.h already includes that file.  Or are you suggesting I should be including TimeStamp.h etc.?

> In terms of accomplishing the stated goal, it's important to keep in mind
> that storage only uses these wrappers where we benefit from reduced lock
> churn or need the mutex held for a longer time than the SQLite calls
> themselves for correctness/sanity.  We do not acquire the mutex before every
> call into the SQLite API.  For example, synchronous statement stepping will
> not use our lock wrappers.
> 
> If you are really concerned about monitoring mutex contention, you may want
> to take a look at sqlite3_config's support for wrapping/replacing the mutex
> magic:

Thanks for the pointer.  I'll have a think about this.
(In reply to Nathan Froyd (:froydnj) from comment #32)
> Which header file?  The only thing these changes require are functions
> declared in BlockingResourceBase.h and SQLiteMutex.h already includes that
> file.  Or are you suggesting I should be including TimeStamp.h etc.?

Basically, I was thinking it would make sense that if you #include the header file as the first header in a source file, that it would be preferable for the compiler not to choke.  The TimeStamp stuff was the thing jumping out at me though, yes.
Attached patch part 2 - mutex bits, v3 (obsolete) — Splinter Review
I must have been running with --disable-tests before.  Here's an updated, ugly patch to address problems with --enable-tests.

If you compile with MOZ_MUTEX_TELEMETRY, you'll find that several xpcom tests won't link because they're trying to get at TimeStamp et al symbols in libxul.  These symbols are not available, as the related patch to sqlite mutexes show.  But adding something like -DMOZ_XPCOM_COMPILING_TESTS (again, as the sqlite patch does) is not sufficient; you still have to address BlockingResourceBase.cpp being compiled *with* MOZ_MUTEX_TELEMETRY.  It will therefore have symbols that can only be found inside of libxul.

So the chosen fix is to move all the necessary bits into BlockingResourceBase.h with some games:

- We have MOZ_COMPILING_XPCOM_TESTS for when we're compiling tests.
- We don't want to use that all the time, though, because we don't want the size of BlockingResourceBase to change depending on whether we're compiling tests (that would be bad).  So we still need to use MOZ_MUTEX_TELEMETRY in some spots.
- I chose to not have MaybeReportContendedAcquisition be inlined on account of code bloat.  To get the right linkonce semantics for it, we need to declare it as a template function and let the linker take care of merging.
- MOZ_MUTEX_TELEMETRY and DEBUG combined is now nonsensical; the Mutex methods will not call the necessary recording methods.  I don't think this is a problem.
Attachment #594217 - Attachment is obsolete: true
Attachment #599715 - Flags: review?(jones.chris.g)
Comment on attachment 599715 [details] [diff] [review]
part 2 - mutex bits, v3

Quite a hack, but oh well.
Attachment #599715 - Flags: review?(jones.chris.g) → review+
(In reply to Chris Jones [:cjones] [:warhammer] from comment #35)
> Quite a hack, but oh well.

The hacks required get worse.  Including the extra headers in BlockingResourceBase.h manage to pull in some libstdc++ headers, one of which #undef's free.  This leads to things like:

...
#include "mozalloc_macro_wrappers.h"  // #define free moz_free
...
#include "nsCRT.h" // declare nsCRT::free, really nsCRT::moz_free
...
#undef free
...
    nsCRT::free(myptr); // oops!  compile error. nsCRT::free does not exist

These are all hackishly patched up with something like:

#include "mozalloc_undef_macro_wrappers.h"
#include "mozalloc_macro_wrappers.h"

in the appropriate places.  But that starts really reaching to get this to work.

I'm inclined to say that we have decent visibility into the sorts of issues we're trying to address here with SQL telemetry and chromehang, and we should shelve this idea.
Actually, a better solution is just to move BlockingMutexBase::MaybeReportContendedAcquisition into Telemetry.cpp and forward-declare a few things where necessary.  That avoids dragging in Telemetry.h and all its goop.  cjones, what do you think?
Attachment #608408 - Flags: review?(jones.chris.g)
Attached patch part 3 - configure bits v3 (obsolete) — Splinter Review
Attachment #597814 - Attachment is obsolete: true
Attachment #608409 - Flags: review?(khuey)
Attachment #599715 - Attachment is obsolete: true
Ok, don't know why tryserver caught moz<->js sync errors and my builds didn't.  Let's try that one again.
Attachment #608409 - Attachment is obsolete: true
Attachment #608409 - Flags: review?(khuey)
Attachment #608417 - Flags: review?(khuey)
Try build without patches: https://tbpl.mozilla.org/?tree=Try&rev=157f9970a41a
Try build with patches: https://tbpl.mozilla.org/?tree=Try&rev=fea7dd09ab5c
compare-talos: http://perf.snarkfest.net/compare-talos/index.html?oldRevs=157f9970a41a&newRev=fea7dd09ab5c&submit=true

The compare-talos results look generally good.  Some of the tp4 tests are bad (though they also swing green hard in some cases, so maybe those tests are just noisy?), and a few tp5r rgressions.
Comment on attachment 608417 [details] [diff] [review]
part 3 - configure bits v3

Review of attachment 608417 [details] [diff] [review]:
-----------------------------------------------------------------

If you need a preprocessor define, add

if test -n "$MOZ_FOO"; then
  AC_DEFINE(MOZ_FOO)
fi

to configure.in rather than hardcoding your own stuff in config/config.mk
Attachment #608417 - Flags: review?(khuey) → review-
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #41)
> If you need a preprocessor define, add
> 
> if test -n "$MOZ_FOO"; then
>   AC_DEFINE(MOZ_FOO)
> fi

Doing just this does not appear to work; nothing in XPCOM or storage/ gets compiled with -DMOZ_MUTEX_TELEMETRY.  The above makes MOZ_FOO show up in ACDEFINES, but not DEFINES.
They should end up in mozilla-config.h.
Comment on attachment 608408 [details] [diff] [review]
part 2 - mutex bits, v4

Sure.

Sorry for the review latency.
Attachment #608408 - Flags: review?(jones.chris.g) → review+
Chromehang provides virtually the same information this does and is more general.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: