Bug 1779257 Comment 9 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

I spent some time to investigate the issue that we have and I found a fix to it. 

We are using [`getcontext`]() the get the user context when we need to do a sync backtrace capture. And we may use 2 different `getcontext`s depending on the platform and libc because it's not a part of the standard C library and no longer part of the Posix standard. If libc has it, then we use the libc provided one. If it doesn't, we [fallback](https://searchfox.org/mozilla-central/rev/66aa740e65a343659a7446b890145781f1b6a344/toolkit/moz.configure#2704-2708) to [breakpad_getcontext](https://searchfox.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/common/linux/breakpad_getcontext.h#45-46,48).

I'm not sure in which cases we don't have `getcontext`, but my ubuntu linux machine with the current mozilla-central clang has it. I tried both with libc and breakpad ones though to make sure.

It looks like breakpad's `getcontext` was working without a problem. But libc `getcontext` wasn't working properly (which seems like it's the mostly used one from the complaints).

Also I tested with different build flags, and found out that libc `getcontext` is working when built with no optimizations. When we build with optimization (starting from -O1), it stops working properly.

The next thing I tested was to call the `getcontext` in the parent right after initializing Registers [here](https://searchfox.org/mozilla-central/rev/66aa740e65a343659a7446b890145781f1b6a344/tools/profiler/core/platform.cpp#6768) instead of calling them inside [`Registers::SyncPopulate`](https://searchfox.org/mozilla-central/rev/66aa740e65a343659a7446b890145781f1b6a344/tools/profiler/core/platform-linux-android.cpp#632-636). It appears that this indeed fixes the issue. When I searched for it, it's mentioned in a few places that the saved context is invalid once the function that called `getcontext` returns. We need to call the `getcontext` while the frame where we called it is still on the stack. Because stack pointer address will be invalid once the `SyncPopulate` returns. So that explains why it works when we call it inside the `profiler_capture_backtrace_into` function but not inside the `Registers::SyncPopulate` method.

I have a patch now to move this `getcontext` function out of that method and inline to its parent with a macro. That way the user context will be valid during the stack walking. I tested it manually and it seems like it's working well.
I spent some time to investigate the issue that we have and I found a fix to it. 

We are using [`getcontext`](https://searchfox.org/mozilla-central/rev/66aa740e65a343659a7446b890145781f1b6a344/tools/profiler/core/platform-linux-android.cpp#633) the get the user context when we need to do a sync backtrace capture. And we may use 2 different `getcontext`s depending on the platform and libc because it's not a part of the standard C library and no longer part of the Posix standard. If libc has it, then we use the libc provided one. If it doesn't, we [fallback](https://searchfox.org/mozilla-central/rev/66aa740e65a343659a7446b890145781f1b6a344/toolkit/moz.configure#2704-2708) to [breakpad_getcontext](https://searchfox.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/common/linux/breakpad_getcontext.h#45-46,48).

I'm not sure in which cases we don't have `getcontext`, but my ubuntu linux machine with the current mozilla-central clang has it. I tried both with libc and breakpad ones though to make sure.

It looks like breakpad's `getcontext` was working without a problem. But libc `getcontext` wasn't working properly (which seems like it's the mostly used one from the complaints).

Also I tested with different build flags, and found out that libc `getcontext` is working when built with no optimizations. When we build with optimization (starting from -O1), it stops working properly.

The next thing I tested was to call the `getcontext` in the parent right after initializing Registers [here](https://searchfox.org/mozilla-central/rev/66aa740e65a343659a7446b890145781f1b6a344/tools/profiler/core/platform.cpp#6768) instead of calling them inside [`Registers::SyncPopulate`](https://searchfox.org/mozilla-central/rev/66aa740e65a343659a7446b890145781f1b6a344/tools/profiler/core/platform-linux-android.cpp#632-636). It appears that this indeed fixes the issue. When I searched for it, it's mentioned in a few places that the saved context is invalid once the function that called `getcontext` returns. We need to call the `getcontext` while the frame where we called it is still on the stack. Because stack pointer address will be invalid once the `SyncPopulate` returns. So that explains why it works when we call it inside the `profiler_capture_backtrace_into` function but not inside the `Registers::SyncPopulate` method.

I have a patch now to move this `getcontext` function out of that method and inline to its parent with a macro. That way the user context will be valid during the stack walking. I tested it manually and it seems like it's working well.

Back to Bug 1779257 Comment 9