Closed Bug 1345262 Opened 7 years ago Closed 7 years ago

Assertion failure: activeContext() == js::TlsContext.get()

Categories

(Core :: Gecko Profiler, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

(Blocks 1 open bug)

Details

Attachments

(5 files)

I just updated and this assertion now reliably triggers on Linux, Mac and Windows as soon as the browser starts with the profiler enabled.

> Assertion failure: activeContext() == js::TlsContext.get(), at ../../../js/src/jscntxt.h:911
> 
> Thread 1 "firefox" received signal SIGSEGV, Segmentation fault.
> 0x00007fffe9a18547 in JSRuntime::activeContextFromOwnThread (
>     this=0x7fffcf768000) at ../../../js/src/jscntxt.h:911
> 911	    MOZ_ASSERT(activeContext() == js::TlsContext.get());
> (gdb) bt
> #0  0x00007fffe9a18547 in JSRuntime::activeContextFromOwnThread() (this=0x7fffcf768000) at ../../../js/src/jscntxt.h:911
> #1  0x00007fffe9c51f55 in js::GeckoProfiler::enable(bool) (this=0x7fffcf768258, enabled=true) at /home/njn/moz/mi0/js/src/vm/GeckoProfiler.cpp:77
> #2  0x00007fffe9c544d6 in js::EnableContextProfilingStack(JSContext*, bool) (cx=0x7fffcf72b000, enabled=true)
>     at /home/njn/moz/mi0/js/src/vm/GeckoProfiler.cpp:556
> #3  0x00007fffe2e85267 in PseudoStack::enableJSSampling() (this=0x7fffd364c000) at ../../dist/include/PseudoStack.h:323
> #4  0x00007fffe8b254b2 in profiler_start(int, double, char const**, unsigned int, char const**, unsigned int) (aEntries=1000000, aInterval=1, aFeatures=0x7fffc067ca00, aFeatureCount=4, aThreadNameFilters=0x7fffc7d7cc00, aFilterCount=2)
>     at /home/njn/moz/mi0/tools/profiler/core/platform.cpp:1999
> #5  0x00007fffe8b2a599 in nsProfiler::StartProfiler(unsigned int, double, char const**, unsigned int, char const**, unsigned int) (this=0x7fffd37caa60, aEntries=1000000, aInterval=1, aFeatures=0x7fffc067ca00, aFeatureCount=4, aThreadNameFilters=0x7fffc7d7cc00, aFilterCount=2)
>     at /home/njn/moz/mi0/tools/profiler/gecko/nsProfiler.cpp:90

This looks like the same basic problem as bug 1340193 -- the profiler calls EnableContextProfilingStack() for every thread from the main thread, but the JS engine expects those calls to occur on the relevant thread. But it appears something changed in the JS engine so that this problem is occurring even earlier and more reliably than before. Probably bug 1341317.

Relatedly, it's not good that an assertion failure of this obviousness -- debug builds with the profiler installed simply won't start -- wasn't detected by testing. The testing coverage for the profiler must be lousy.
Turns out you need to be profiling all threads for this to happen. Or at least, DOM worker threads.
There is already some machinery within PseudoStack that I think will be useful for this, centered around mStartJSSampling. I will attempt to fix this assertion.
Assignee: nobody → n.nethercote
Bug 1341255 changed PseudoStack::mStack to a js::ProfileEntry[], so we don't
need this assertion any more.
Attachment #8845254 - Flags: review?(jseward)
PseudoContext::sampleContext() is always called immediately after
profiler_get_pseudo_stack(). This patch introduces profiler_set_js_context()
and profiler_clear_js_context(), which replace the profiler_get_pseudo_stack()
+ sampleContext() pairs. This takes us a step closer to not having to export
PseudoStack outside the profiler.
Attachment #8845255 - Flags: review?(mstange)
It does almost entirely different things depending on whether we are setting or
clearing the context.
Attachment #8845256 - Flags: review?(mstange)
They each have a single call site, and this code makes makes more sense in
platform.cpp than as PseudoStack methods because it's mostly dealing with gPS.
Attachment #8845257 - Flags: review?(mstange)
mstange, please review the whole patch.

djvj, please review with a focus on the JS engine integration, incuding the
change in XPCJSContext.cpp.

Thank you!

----

Currently, JS sampling has major problems.

- JS sampling is enabled for all JS threads from the thread that runs
  locked_profiler_start() -- currently only the main thread -- but the JS
  engine can't handle enabling from off-thread, and asserts. This makes
  profiling workers impossible in a debug build.

- No JS thread will be JS sampled unless enableJSSampling() is called, but that
  only happens in locked_profiler_start(). That means any worker threads
  created while the profiler is active won't be JS sampled.

- Only the thread that runs locked_profiler_stop() -- currently only the main
  thread -- ever calls disableJSSampling(). This means that worker threads that
  start being JS sampled never stop being JS sampled.

This patch fixes these three problems in the following ways.

- locked_profiler_start() now sets a flag in PseudoStack that indicates
  JS sampling is desired, but doesn't directly enable it. Instead, the JS
  thread polls that flag and enables JS sampling itself when it sees the flag
  is set. The polling is done by the interrupt callback. There was already a
  flag of this sort (mJSSampling) but the new one is better.

  This required adding a call to profiler_js_operation_callback() to the
  InterruptCallback() in XPCJSContext.cpp. (In comparison, the
  InterruptCallback() in dom/workers/RuntimeService.cpp already had such a
  call.)

- RegisterCurrentThread() now requests JS sampling of a JS thread when the
  profiler is active, the thread is being profiled, and JS sampling is enabled.

- locked_profiler_stop() now calls stopJSSampling() on all live threads.

The patch makes the following smaller changes as well.

- Renames profiler_js_operation_callback() as profiler_js_interrupt_callback(),
  because "interrupt callback" is the standard name (viz.
  JS_AddInterruptCallback()).

- Calls js::RegisterContextProfilingEventMarker() with nullptr when stopping
  JS sampling, so that ProfilerJSEventMarker won't fire unnecessarily.

- Some minor formatting changes.
Attachment #8845259 - Flags: review?(mstange)
Attachment #8845259 - Flags: review?(kvijayan)
With these patches applied I haven't been able to reproduce any assertion failures related to the JS engine.
Attachment #8845254 - Flags: review?(jseward) → review+
It's better, but I can still get it to assert, with the patches
applied to m-c 346697:c40ca7a1bdd9 (about 12 hours ago), for
a debug build.  STR:

Click on the profiler button, Settings, Apply (Restart Profiler).
You don't need to change any settings, and it asserts immediately:

563	    MOZ_ASSERT(cx->runtime()->geckoProfiler().enabled());
(gdb) where
#0  0x00007fffeb0f94a2 in js::RegisterContextProfilingEventMarker(JSContext*, void (*)(char const*)) (cx=<optimized out>, fn=fn@entry=0x0)
    at /home/sewardj/MOZ/MC-PROF/js/src/vm/GeckoProfiler.cpp:563
#1  0x00007fffea4de4b9 in PseudoStack::pollJSSampling() (this=0x478010) at /home/sewardj/MOZ/MC-PROF/ff-Og-linux64-dbg/dist/include/PseudoStack.h:356

I had trouble getting a proper stack from gdb, but finally got this:

FAIL
   at 0x14EFE869: VALGRIND_PRINTF_BACKTRACE(char const*, ...) (/usr/include/valgrind/valgrind.h:6828)
   by 0x14F03605: js::RegisterContextProfilingEventMarker(JSContext*, void (*)(char const*)) (MC-PROF/js/src/vm/GeckoProfiler.cpp:563)
   by 0x142E84B8: PseudoStack::pollJSSampling() (MC-PROF/ff-Og-linux64-dbg/dist/include/PseudoStack.h:356)
   by 0x142E687C: locked_profiler_stop(mozilla::BaseAutoLock<ProfilerStateMutex> const&) (MC-PROF/tools/profiler/core/platform.cpp:2424)
   by 0x142E6AB1: profiler_stop() (MC-PROF/tools/profiler/core/platform.cpp:2478)
   by 0x142E6F04: nsProfiler::StopProfiler() (MC-PROF/tools/profiler/gecko/nsProfiler.cpp:99)
   by 0x102B9025: ??? (MC-PROF/xpcom/reflect/xptcall/md/unix/xptcinvoke_asm_x86_64_unix.S:88)
   by 0x10EEC338: XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) (MC-PROF/js/xpconnect/src/XPCWrappedNative.cpp:2010)
   by 0x10EF4203: XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) (MC-PROF/js/xpconnect/src/XPCWrappedNativeJSOps.cpp:983)
   by 0x148849B1: js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) (MC-PROF/js/src/jscntxtinlines.h:282)
   by 0x1487EB80: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) (MC-PROF/js/src/vm/Interpreter.cpp:448)
   by 0x1487EF83: InternalCall(JSContext*, js::AnyInvokeArgs const&) (MC-PROF/js/src/vm/Interpreter.cpp:493)

Assertion failure: cx->runtime()->geckoProfiler().enabled(), at /home/sewardj/MOZ/MC-PROF/js/src/vm/GeckoProfiler.cpp:564
(In reply to Julian Seward [:jseward] from comment #11)
> It's better, but I can still get it to assert, with the patches
> applied to m-c 346697:c40ca7a1bdd9 (about 12 hours ago), for
> a debug build.

Thank you for testing and the stack trace. The problem is with this change:

> - Calls js::RegisterContextProfilingEventMarker() with nullptr when stopping
>   JS sampling, so that ProfilerJSEventMarker won't fire unnecessarily.

Looks like RegisterContextProfilingEventMarker() doesn't handle being passed nullptr. I'll revert that bit; it's not crucial and I was a bit uncertain about it. (It was also a last-minute change, I must have done it after I did the more thorough testing because it's easy to hit!)
(In reply to Nicholas Nethercote [:njn] from comment #12)
> Looks like RegisterContextProfilingEventMarker() doesn't handle being passed
> nullptr. I'll revert that bit; [..]

FWIW (maybe not much, since I'm not sure what change you had in mind here),
I tried just commenting out the call, thusly:

       if (mJSSampling.compareExchange(ACTIVE_REQUESTED, ACTIVE)) {
         js::EnableContextProfilingStack(mContext, true);
         js::RegisterContextProfilingEventMarker(mContext,
                                                 &ProfilerJSEventMarker);
 
       } else if (mJSSampling.compareExchange(INACTIVE_REQUESTED, INACTIVE)) {
         js::EnableContextProfilingStack(mContext, false);
-        js::RegisterContextProfilingEventMarker(mContext, nullptr);
+        //js::RegisterContextProfilingEventMarker(mContext, nullptr);
       }

That fixes the comment 11 failure.  But I can still get it to assert elsewhere
when selecting all threads (",") for profiling.  I don't know if that's a
result of not implementing the fix you had in mind, or whether it's some other
problem.  Failure is as follows:

Assertion failure: cancelable (Only nsICancelableRunnable may be dispatched to a worker!), at /home/sewardj/MOZ/MC-PROF/dom/workers/WorkerThread.cpp:236
#01: ???[/home/sewardj/MOZ/MC-PROF/ff-Og-linux64-dbg/dist/bin/libxul.so +0x361515a]
#02: ???[/home/sewardj/MOZ/MC-PROF/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4be8d27]
#03: ???[/home/sewardj/MOZ/MC-PROF/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4bf78a9]
#04: ???[/home/sewardj/MOZ/MC-PROF/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4bf7f51]
#05: ???[/lib64/libpthread.so.0 +0x76ca]
#06: clone[/lib64/libc.so.6 +0x107f7f]
#07: ??? (???:???)

delivering signal 11 (SIGSEGV):1 to thread 47
push_signal_frame (thread 47): signal 11
   at 0x12D0915A: mozilla::dom::workers::WorkerThread::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) (MC-PROF/dom/workers/WorkerThread.cpp:235)
   by 0x142DCD26: Dispatch (MC-PROF/ff-Og-linux64-dbg/dist/include/nsIEventTarget.h:37)
   by 0x142DCD26: ThreadResponsiveness::Update(bool, nsIThread*) (MC-PROF/tools/profiler/gecko/ThreadResponsiveness.cpp:104)
   by 0x142EB8A8: UpdateThreadResponsiveness (MC-PROF/tools/profiler/core/ThreadInfo.h:80)
   by 0x142EB8A8: SamplerThread::Run() (MC-PROF/tools/profiler/core/platform-linux-android.cpp:464)
   by 0x142EBF50: SamplerThread::ThreadEntry(void*) (MC-PROF/tools/profiler/core/platform-linux-android.cpp:302)
   by 0x4C316C9: start_thread (/usr/src/debug/glibc-2.24-33-ge9e69e4/nptl/pthread_create.c:333)
   by 0x59FBF7E: clone (/usr/src/debug////////glibc-2.24-33-ge9e69e4/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:105)
> Assertion failure: cancelable (Only nsICancelableRunnable may be dispatched to a worker!), at /home/sewardj/MOZ/MC-PROF/dom/workers/WorkerThread.cpp:236

That's bug 1341924. If you want to test more there's an r-'d patch in that bug that is sufficient to fix that assertion failure.
Attachment #8845255 - Flags: review?(mstange) → review+
Attachment #8845256 - Flags: review?(mstange) → review+
Attachment #8845257 - Flags: review?(mstange) → review+
Comment on attachment 8845259 [details] [diff] [review]
(part 5) - Fix how JS sampling is started/stopped by the profiler

Review of attachment 8845259 [details] [diff] [review]:
-----------------------------------------------------------------

This is beautiful.
Attachment #8845259 - Flags: review?(mstange) → review+
Comment on attachment 8845259 [details] [diff] [review]
(part 5) - Fix how JS sampling is started/stopped by the profiler

Review of attachment 8845259 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  No real comments or nits.  I really like the cleanup, state machine, and the commenting.  Feels a lot tighter to me.
Attachment #8845259 - Flags: review?(kvijayan) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/6ae33b63f3a2c504d5fa36b0a663c6f1e4d710ac
Bug 1345262 (part 1) - Remove unnecessary assertion. r=jseward.

https://hg.mozilla.org/integration/mozilla-inbound/rev/6906fb64de7da456097896aeb1418c6f6ca822ac
Bug 1345262 (part 2) - Add profiler_{set,clear}_js_context(). r=mstange.

https://hg.mozilla.org/integration/mozilla-inbound/rev/7414c13a82a1b43316a5d169eb2c33f4f9b4253d
Bug 1345262 (part 3) - Split PseudoStack::sampleContext() in two. r=mstange.

https://hg.mozilla.org/integration/mozilla-inbound/rev/eed2dd8ea13feff9764dd630ead0a24dd4f3af94
Bug 1345262 (part 4) - Remove and inline PseudoStack::{clearContext,flushSamplerOnJSShutdown}(). r=mstange.

https://hg.mozilla.org/integration/mozilla-inbound/rev/da8e7dc470c8d5219c3312f637be5946e9f7f37d
Bug 1345262 (part 5) - Fix how JS sampling is started/stopped by the profiler. r=mstange,djvj.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: