(In reply to Doug Thayer [:dthayer] (he/him) from comment #6)
> I would be most interested in finding all places where we initialize large buffers.

We could perhaps account for this by passing the size of the auto-initialized region as an argument to the out-of-line function, and then fiddling with that function to get the results we want (including doing proportional memory-bound busywork, or even calling out to different sub-functions to distinguish the cases in the profiler).

> This is just a stab in the dark, but my guess would be that the extra time is dominated by initializations of buffers larger than a cache line in size.

My intuition has always been that the stack is usually in L1, so I'd be a little surprised if the cache line boundary were relevant. But I could be wrong.
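A rough sketch of the size-passing idea, in case it helps make the suggestion concrete. Everything here is hypothetical: the function names, the stats counters, and the 64-byte threshold are assumptions made for illustration, not anything the compiler or an existing patch does. The out-of-line entry point takes the size of the region, does work proportional to that size, and dispatches to separate non-inlined sub-functions so each size class shows up as its own frame in a profile.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// All names below are hypothetical and exist only for this sketch.
#if defined(_MSC_VER)
#  define SKETCH_NOINLINE __declspec(noinline)
#else
#  define SKETCH_NOINLINE __attribute__((noinline))
#endif

// Assumption: 64-byte cache lines. The right threshold is an open question.
constexpr size_t kAssumedCacheLineSize = 64;

struct AutoInitStats {
  uint64_t calls;
  uint64_t bytes;
};

AutoInitStats gSmallStats{0, 0};
AutoInitStats gLargeStats{0, 0};

// Volatile stores keep the compiler from eliding the writes, so the cost
// stays proportional to the size of the region.
SKETCH_NOINLINE void AutoInitSmall(volatile unsigned char* region, size_t size) {
  gSmallStats.calls++;
  gSmallStats.bytes += size;
  for (size_t i = 0; i < size; ++i) {
    region[i] = 0;
  }
}

// Same work as AutoInitSmall, but a distinct symbol so that large regions
// are attributed to a separate frame by the profiler.
SKETCH_NOINLINE void AutoInitLarge(volatile unsigned char* region, size_t size) {
  gLargeStats.calls++;
  gLargeStats.bytes += size;
  for (size_t i = 0; i < size; ++i) {
    region[i] = 0;
  }
}

// Out-of-line entry point: the caller passes the size of the
// auto-initialized region and we pick a sub-function by size class.
SKETCH_NOINLINE void AutoInitOutOfLine(void* region, size_t size) {
  assert(region != nullptr || size == 0);
  auto* bytes = static_cast<volatile unsigned char*>(region);
  if (size > kAssumedCacheLineSize) {
    AutoInitLarge(bytes, size);
  } else {
    AutoInitSmall(bytes, size);
  }
}
```

A call site would look like `AutoInitOutOfLine(buffer, sizeof(buffer))`. More size classes could be added the same way if two buckets turn out to be too coarse, and the counters give a cheap cross-check on the profiler's attribution.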
Bug 1772353 Comment 10 Edit History
(In reply to Doug Thayer [:dthayer] (he/him) from comment #6)
> I would be most interested in finding all places where we initialize large buffers.

We could perhaps account for this by passing the size of the auto-initialized region as an argument to the out-of-line function, and then fiddling with that function to get the results we want (including doing proportional memory-bound busywork, or even calling out to different sub-functions to distinguish the cases in the profiler).

> This is just a stab in the dark, but my guess would be that the extra time is dominated by initializations of buffers larger than a cache line in size.

My intuition has always been that the entire stack is usually in L1, so I'd be a little surprised if the cache line boundary were relevant. But I could be wrong.