Bug 1529108 Comment 14 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

(In reply to Julian Seward [:jseward] from comment #9)
> The EXIDX -> Internal form reader was removed some years ago in
> favour of simply using Jed's standalone EXIDX unwinder on arm32.
> But I guess we could reinstate it.

I investigated this some more. As Julian predicted, once I restored the EXIDX -> Internal form converter code that was removed in bug 1173847, stack walking libxul using LUL started working! I got beautiful full stacks for samples in libc.so, so the hand-off from Dwarf unwinding to EXIDX unwinding is working, too.
Compare: [EHABIStackWalk profile](https://perfht.ml/3baNPJg), [LUL profile with EXIDX reader reinstated](https://perfht.ml/2W7qLH4)

So, short term, I think switching 32-bit ARM away from EHABStackWalk to LUL + EXIDX reader would be the fastest way to fix this bug and to get usable call stacks from libc.so on the Moto G5. This would make the EHABStackWalk code unused.
It should be noted that we're already using LUL for arm64 stackwalking; it's just 32-bit ARM that's not using LUL today.

However, LUL comes with a pretty hefty startup cost, and with some memory overhead: This change caused profiler initialization time to increase from ~80ms to ~400ms on my phone. This is already a problem on arm64 (and really in all places where we're using LUL today), so we'll want to have a general solution for it. But for 32 bit arm, we could instead go for another approach.
Proposal: We could turn LUL into a Frankenstein unwinder that supports both the LUL format and the raw EXIDX format, by opportunistically using code from EHABIStackWalk to do EXIDX-unwinding on a per-stackframe basis, controlled by LUL. Then we don't need the EXIDX converter and save the startup overhead.
I hacked this up in a quick-and-dirty way, and got the following profile: [LUL profile with per-frame EHABIStackWalk fallback](https://perfht.ml/35wU49d)
It seems to work, and profiler initialization time is down to 130ms! The extra 50ms compared to just EHABIStackWalk (80ms) might be from converting Dwarf unwinding info of system libraries. These timings are preliminary, I'll need to collect some more info here.

For a proper implementation, we'll probably want to make a version of the EHABIStackWalk code that uses more LUL-isms such as using [TaggedUWords](https://searchfox.org/mozilla-central/rev/54f965e51e4f77866bec42737978d40d4c1fdfc5/tools/profiler/lul/LulMain.h#55-56) everywhere (to propagate whether values are valid), reading from a [stack image](https://searchfox.org/mozilla-central/rev/7908ce29657cfd623993046bd8e38664e1c0b28e/tools/profiler/core/platform.cpp#1919-1950) rather than from the raw stack, and using [LUL-style "address" terminology](https://searchfox.org/mozilla-central/rev/54f965e51e4f77866bec42737978d40d4c1fdfc5/tools/profiler/lul/LulMain.h#21-51).
(In reply to Julian Seward [:jseward] from comment #9)
> The EXIDX -> Internal form reader was removed some years ago in
> favour of simply using Jed's standalone EXIDX unwinder on arm32.
> But I guess we could reinstate it.

I investigated this some more. As Julian predicted, once I restored the EXIDX -> Internal form converter code that was removed in bug 1173847, stack walking libxul using LUL started working! I got beautiful full stacks for samples in libc.so, so the hand-off from Dwarf unwinding to EXIDX unwinding is working, too.
Compare: [EHABIStackWalk profile](https://perfht.ml/3baNPJg), [LUL profile with EXIDX reader reinstated](https://perfht.ml/2W7qLH4)

So, short term, I think switching 32-bit ARM away from EHABStackWalk and to LUL + EXIDX reader would be the fastest way to fix this bug and to get usable call stacks from libc.so on the Moto G5. This would make the EHABStackWalk code unused.
It should be noted that we're already using LUL for arm64 stackwalking; it's just 32-bit ARM that's not using LUL today.

However, LUL comes with a pretty hefty startup cost, and with some memory overhead: This change caused profiler initialization time to increase from ~80ms to ~400ms on my phone. This is already a problem on arm64 (and really in all places where we're using LUL today), so we'll want to have a general solution for it. But for 32 bit arm, we could instead go for another approach.
Proposal: We could turn LUL into a Frankenstein unwinder that supports both the LUL format and the raw EXIDX format, by opportunistically using code from EHABIStackWalk to do EXIDX-unwinding on a per-stackframe basis, controlled by LUL. Then we don't need the EXIDX converter and save the startup overhead.
I hacked this up in a quick-and-dirty way, and got the following profile: [LUL profile with per-frame EHABIStackWalk fallback](https://perfht.ml/35wU49d)
It seems to work, and profiler initialization time is down to 130ms! The extra 50ms compared to just EHABIStackWalk (80ms) might be from converting Dwarf unwinding info of system libraries. These timings are preliminary, I'll need to collect some more info here.

For a proper implementation, we'll probably want to make a version of the EHABIStackWalk code that uses more LUL-isms such as using [TaggedUWords](https://searchfox.org/mozilla-central/rev/54f965e51e4f77866bec42737978d40d4c1fdfc5/tools/profiler/lul/LulMain.h#55-56) everywhere (to propagate whether values are valid), reading from a [stack image](https://searchfox.org/mozilla-central/rev/7908ce29657cfd623993046bd8e38664e1c0b28e/tools/profiler/core/platform.cpp#1919-1950) rather than from the raw stack, and using [LUL-style "address" terminology](https://searchfox.org/mozilla-central/rev/54f965e51e4f77866bec42737978d40d4c1fdfc5/tools/profiler/lul/LulMain.h#21-51).
(In reply to Julian Seward [:jseward] from comment #9)
> The EXIDX -> Internal form reader was removed some years ago in
> favour of simply using Jed's standalone EXIDX unwinder on arm32.
> But I guess we could reinstate it.

I investigated this some more. As Julian predicted, once I restored the EXIDX -> Internal form converter code that was removed in bug 1173847, stack walking libxul using LUL started working! I got beautiful full stacks for samples in libc.so, so the hand-off from Dwarf unwinding to EXIDX unwinding is working, too.
Compare: [EHABIStackWalk profile](https://perfht.ml/3baNPJg), [LUL profile with EXIDX reader reinstated](https://perfht.ml/2W7qLH4)

So, short term, I think switching 32-bit ARM away from EHABStackWalk and to LUL + EXIDX reader would be the fastest way to fix this bug and to get usable call stacks from libc.so on the Moto G5. This would make the EHABStackWalk code unused.
It should be noted that we're already using LUL for arm64 stackwalking; it's just 32-bit ARM that's not using LUL today.

However, LUL comes with a pretty hefty startup cost, and with some memory overhead: This change caused profiler initialization time to increase from ~80ms to ~400ms on my phone. This is already a problem on arm64 (and really in all places where we're using LUL today), so we'll want to have a general solution for it. But for 32 bit arm, we could instead go for another approach.
Proposal: We could turn LUL into a Frankenstein unwinder that supports both the LUL format and the raw EXIDX format, by opportunistically using code from EHABIStackWalk to do EXIDX-unwinding on a per-stackframe basis, controlled by LUL. Then we don't need the EXIDX converter and save the startup overhead.
I hacked this up in a quick-and-dirty way, and got the following profile: [LUL profile with per-frame EHABIStackWalk fallback](https://perfht.ml/35wU49d)
It seems to work, and profiler initialization time is down to 130ms! The extra 50ms compared to just EHABIStackWalk (80ms) might be from converting Dwarf unwinding info of system libraries. These timings are preliminary, I'll need to collect some more info here.

For a proper implementation, we'll probably want to make a version of the EHABIStackWalk code that uses more LUL-isms, such as using [TaggedUWords](https://searchfox.org/mozilla-central/rev/54f965e51e4f77866bec42737978d40d4c1fdfc5/tools/profiler/lul/LulMain.h#55-56) everywhere (to propagate whether values are valid), reading from a [stack image](https://searchfox.org/mozilla-central/rev/7908ce29657cfd623993046bd8e38664e1c0b28e/tools/profiler/core/platform.cpp#1919-1950) rather than from the raw stack, and using [LUL-style "address" terminology](https://searchfox.org/mozilla-central/rev/54f965e51e4f77866bec42737978d40d4c1fdfc5/tools/profiler/lul/LulMain.h#21-51).
(In reply to Julian Seward [:jseward] from comment #9)
> The EXIDX -> Internal form reader was removed some years ago in
> favour of simply using Jed's standalone EXIDX unwinder on arm32.
> But I guess we could reinstate it.

I investigated this some more. As Julian predicted, once I restored the EXIDX -> Internal form converter code that was removed in bug 1173847, stack walking libxul using LUL started working! I got beautiful full stacks for samples in libc.so, so the hand-off from Dwarf unwinding to EXIDX unwinding is working, too.
Compare: [EHABIStackWalk profile](https://perfht.ml/3baNPJg), [LUL profile with EXIDX reader reinstated](https://perfht.ml/2W7qLH4)

So, short term, I think switching 32-bit ARM away from EHABStackWalk and to LUL + EXIDX reader would be the fastest way to fix this bug and to get usable call stacks from libc.so on the Moto G5. This would make the EHABStackWalk code unused.
It should be noted that we're already using LUL for arm64 stackwalking; it's just 32-bit ARM that's not using LUL today.

However, LUL comes with a pretty hefty [startup cost](https://bugzilla.mozilla.org/show_bug.cgi?id=1635810), and with some memory overhead: This change caused profiler initialization time to increase from ~80ms to ~400ms on my phone. This is already a problem on arm64 (and really in all places where we're using LUL today), so we'll want to have a general solution for it. But for 32 bit arm, we could instead go for another approach.
Proposal: We could turn LUL into a Frankenstein unwinder that supports both the LUL format and the raw EXIDX format, by opportunistically using code from EHABIStackWalk to do EXIDX-unwinding on a per-stackframe basis, controlled by LUL. Then we don't need the EXIDX converter and save the startup overhead.
I hacked this up in a quick-and-dirty way, and got the following profile: [LUL profile with per-frame EHABIStackWalk fallback](https://perfht.ml/35wU49d)
It seems to work, and profiler initialization time is down to 130ms! The extra 50ms compared to just EHABIStackWalk (80ms) might be from converting Dwarf unwinding info of system libraries. These timings are preliminary, I'll need to collect some more info here.

For a proper implementation, we'll probably want to make a version of the EHABIStackWalk code that uses more LUL-isms, such as using [TaggedUWords](https://searchfox.org/mozilla-central/rev/54f965e51e4f77866bec42737978d40d4c1fdfc5/tools/profiler/lul/LulMain.h#55-56) everywhere (to propagate whether values are valid), reading from a [stack image](https://searchfox.org/mozilla-central/rev/7908ce29657cfd623993046bd8e38664e1c0b28e/tools/profiler/core/platform.cpp#1919-1950) rather than from the raw stack, and using [LUL-style "address" terminology](https://searchfox.org/mozilla-central/rev/54f965e51e4f77866bec42737978d40d4c1fdfc5/tools/profiler/lul/LulMain.h#21-51).

Back to Bug 1529108 Comment 14