1829433 - GeckoProfilerEntryMarker takes time in js::RunScript even when Firefox profiler isn't running

So the issue here is that the GeckoProfilingEntryMarker needs to run while the profiler isn't running, as it keeps track (for the profiler) of certain aspects of the state of the JS interpreter - specifically the label stack, and interpreter stack. (more info at js/public/ProfilingStack.h)

This would be fine if it was a lightweight operation, but as the js + label frames need to be in a consistent state while the profiler is running, it uses a lot of atomic operations.

For example, pushLabelFrame here does a number of atomic operations to manipulate the stack, and while modifying the stack pointer: https://searchfox.org/mozilla-central/rev/3b707c8fd7e978eebf24279ee51ccf07895cfbcb/js/public/ProfilingStack.h#411

I'm investigating whether or not we can do something different when the profiler isn't running. From what I understand, the concurrency is required while the profiler is sampling, but I don't see why we would need the "protection" before/after a profiling session. I think it was Markus who originally wrote this code, so he might have some ideas as to how (or whether!) we could do this.

Flags: needinfo?(mstange.moz)

Markus Stange [:mstange]

Comment 4

•

2 years ago

Rather than reducing the time spent in GeckoProfilingEntryMarker, I think it would be better to reduce the number of calls to it. Most of that work is already happening, in bug 1829411: We're only ending up in this code because we have frequent JIT -> C++ -> JIT calls, and bug 1829411 will turn those into JIT -> JIT calls.

What I'm curious about is why we need an "interpreter" profiling stack frame in these cases where we end up calling into JIT code - I'd have expected the JS frame of the called function to be filled in via JIT stack walking and not via the interpreter stack. js::RunScript is used for calls into both interpreted JS functions and JITted JS functions, so I wonder why we're pushing the profiling stack frame here and not in interpreter-specific code.

Also, I'm not sure if avoiding atomic operations here will actually improve anything. The atomic ordering we're using is ReleaseAcquire, which is what you get on x86 by default anyway.

Flags: needinfo?(mstange.moz)

Adam Brouwers-Harries [:aabh] [he/him] ⌚GMT

Comment 5

•

2 years ago

•

Edited

I think it would be better to reduce the number of calls to it

Ah, I wasn't aware of this effort, thanks for looping me in! I'll hold off on major changes then, as it may not end up being that much of a performance improvement if this code cools down.

so I wonder why we're pushing the profiling stack frame here and not in interpreter-specific code.

I'm unfamiliar with how we merge stack walking and the profiling stack, so I'm not sure about that. I could imagine that it's non-trivial, and so having a stack entry might help even if we can get info via stack walking.

The atomic ordering we're using is ReleaseAcquire, which is what you get on x86 by default anyway.

We seem to be using SequentiallyConsistent for the actual stack frames, which may be slower on x86 (there's an interesting comment at the top of the file regarding this). Additionally, this could be slow on ARM architecture or more relaxed memory architectures.

Nazım Can Altınova [:canova][:canaltinova on phabricator]

Comment 6

•

2 years ago

Thanks for the investigation Adam and thanks Markus for the additinal information about the current work happening in bug 1829411. It looks like that work has recently landed. Deprioritising this one until we can measure the new impact again.

Priority: P1 → P2

Adam Brouwers-Harries [:aabh] [he/him] ⌚GMT

Comment 7

•

1 year ago

:smaug, are you still seeing this taking time during profiles? Based on the previous discussion, it looks like the underlying cause might have been addressed in another bug.

Flags: needinfo?(smaug)

Adam Brouwers-Harries [:aabh] [he/him] ⌚GMT

Updated

•

1 year ago

Assignee: abrouwersharries → nobody

Status: ASSIGNED → NEW

Whiteboard: [sp3][fxp] → [sp3][fxp][aabh-optimisation]

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 8

•

1 year ago

Looks like it is still there a tiny bit https://share.firefox.dev/4f75xOP

Flags: needinfo?(smaug)

Bugzilla

GeckoProfilerEntryMarker takes time in js::RunScript even when Firefox profiler isn't running

Categories

(Core :: Gecko Profiler, defect, P2)

Tracking

()

People

(Reporter: smaug, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sp3][fxp][aabh-optimisation])

Crash Data

Security

(public)

User Story

Description

Updated

Updated

Comment 1

Comment 2

Updated

Updated

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Comment 8