Open Bug 1829433 Opened 2 years ago Updated 1 year ago

GeckoProfilerEntryMarker takes time in js::RunScript even when Firefox profiler isn't running

Categories

(Core :: Gecko Profiler, defect, P2)

defect

Tracking

()

People

(Reporter: smaug, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sp3][fxp][aabh-optimisation])

I wouldn't expect to see Firefox profiler stuff in profiles generated using other profilers
https://share.firefox.dev/3N4yArP

Whiteboard: [sp3]

Hey Adam, could you take a look at this too? We can also look at it together.

Assignee: nobody → abrouwersharries
Severity: -- → S3
Priority: -- → P1
Whiteboard: [sp3] → [sp3][fxp]
Status: NEW → ASSIGNED

So the issue here is that the GeckoProfilingEntryMarker needs to run while the profiler isn't running, as it keeps track (for the profiler) of certain aspects of the state of the JS interpreter - specifically the label stack, and interpreter stack. (more info at js/public/ProfilingStack.h)

This would be fine if it was a lightweight operation, but as the js + label frames need to be in a consistent state while the profiler is running, it uses a lot of atomic operations.

For example, pushLabelFrame here does a number of atomic operations to manipulate the stack, and while modifying the stack pointer: https://searchfox.org/mozilla-central/rev/3b707c8fd7e978eebf24279ee51ccf07895cfbcb/js/public/ProfilingStack.h#411

I'm investigating whether or not we can do something different when the profiler isn't running. From what I understand, the concurrency is required while the profiler is sampling, but I don't see why we would need the "protection" before/after a profiling session. I think it was Markus who originally wrote this code, so he might have some ideas as to how (or whether!) we could do this.

Flags: needinfo?(mstange.moz)

Rather than reducing the time spent in GeckoProfilingEntryMarker, I think it would be better to reduce the number of calls to it. Most of that work is already happening, in bug 1829411: We're only ending up in this code because we have frequent JIT -> C++ -> JIT calls, and bug 1829411 will turn those into JIT -> JIT calls.

What I'm curious about is why we need an "interpreter" profiling stack frame in these cases where we end up calling into JIT code - I'd have expected the JS frame of the called function to be filled in via JIT stack walking and not via the interpreter stack. js::RunScript is used for calls into both interpreted JS functions and JITted JS functions, so I wonder why we're pushing the profiling stack frame here and not in interpreter-specific code.

Also, I'm not sure if avoiding atomic operations here will actually improve anything. The atomic ordering we're using is ReleaseAcquire, which is what you get on x86 by default anyway.

Flags: needinfo?(mstange.moz)

I think it would be better to reduce the number of calls to it

Ah, I wasn't aware of this effort, thanks for looping me in! I'll hold off on major changes then, as it may not end up being that much of a performance improvement if this code cools down.

so I wonder why we're pushing the profiling stack frame here and not in interpreter-specific code.

I'm unfamiliar with how we merge stack walking and the profiling stack, so I'm not sure about that. I could imagine that it's non-trivial, and so having a stack entry might help even if we can get info via stack walking.

The atomic ordering we're using is ReleaseAcquire, which is what you get on x86 by default anyway.

We seem to be using SequentiallyConsistent for the actual stack frames, which may be slower on x86 (there's an interesting comment at the top of the file regarding this). Additionally, this could be slow on ARM architecture or more relaxed memory architectures.

Thanks for the investigation Adam and thanks Markus for the additinal information about the current work happening in bug 1829411. It looks like that work has recently landed. Deprioritising this one until we can measure the new impact again.

Priority: P1 → P2

:smaug, are you still seeing this taking time during profiles? Based on the previous discussion, it looks like the underlying cause might have been addressed in another bug.

Flags: needinfo?(smaug)
Assignee: abrouwersharries → nobody
Status: ASSIGNED → NEW
Whiteboard: [sp3][fxp] → [sp3][fxp][aabh-optimisation]

Looks like it is still there a tiny bit https://share.firefox.dev/4f75xOP

Flags: needinfo?(smaug)
You need to log in before you can comment on or make changes to this bug.