Figure out the root cause of bug 1671170
Categories
(Core :: Memory Allocator, task)
Tracking
Release | Tracking | Status |
---|---|---|
firefox85 | --- | fixed |
People
(Reporter: emilio, Assigned: emilio)
Attachments
(1 file)
That is, the cause of https://github.com/servo/rust-smallvec/issues/243. I can confirm that:
- The regression is still present on smallvec 1.5.1: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=0abf0aa8e985149d8eaa7859a3db7d5e1495ecd1&newProject=try&newRevision=0dcc86dec1f6eae6b467974cf516d65d33e301f3&framework=10
- The regression appears in https://github.com/servo/rust-smallvec/commit/bdfc429: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=470cb1b8a818c20b9838503068fa0e8ef0ac44ee&newProject=try&newRevision=a1232b0cee559ba85b65cdaee8779b16e95523ad&framework=10
- The regression seems, as expected, gone if I revert that commit on 1.5.1: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=0abf0aa8e985149d8eaa7859a3db7d5e1495ecd1&newProject=try&newRevision=db644c963ebb441d38bb3d514f35f39c39699b3d&framework=10
So somehow the change from malloc + memcpy to realloc made it slower, which is unexpected. My guess is that something in realloc is holding a global lock when it shouldn't, but that this doesn't happen in malloc somehow...
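For reference, the two growth strategies being compared can be sketched roughly as below. This is a minimal illustration written against `std::alloc`, not the actual smallvec code; the helper names (`grow_alloc_copy`, `grow_realloc`, `grow_and_read_back`) and the buffer sizes are assumptions made up for the example.

```rust
use std::alloc::{alloc, dealloc, realloc, Layout};
use std::ptr;

/// Old strategy: allocate a new block, copy the live bytes, free the old block.
///
/// Safety: `old` must come from `alloc` with `Layout::array::<u8>(old_cap)`,
/// with `len <= old_cap` and `0 < old_cap < new_cap`.
unsafe fn grow_alloc_copy(old: *mut u8, len: usize, old_cap: usize, new_cap: usize) -> *mut u8 {
    let old_layout = Layout::array::<u8>(old_cap).expect("capacity overflow");
    let new_layout = Layout::array::<u8>(new_cap).expect("capacity overflow");
    unsafe {
        let new = alloc(new_layout);
        assert!(!new.is_null(), "allocation failed");
        ptr::copy_nonoverlapping(old, new, len);
        dealloc(old, old_layout);
        new
    }
}

/// New strategy: ask the allocator to resize the existing block.
///
/// Safety: same requirements on `old`, `old_cap` and `new_cap` as above.
unsafe fn grow_realloc(old: *mut u8, old_cap: usize, new_cap: usize) -> *mut u8 {
    let old_layout = Layout::array::<u8>(old_cap).expect("capacity overflow");
    unsafe {
        let new = realloc(old, old_layout, new_cap);
        assert!(!new.is_null(), "reallocation failed");
        new
    }
}

/// Hypothetical driver: fill a 4-byte buffer, grow it to 64 bytes with the
/// chosen strategy, and return the bytes that survived the growth.
fn grow_and_read_back(use_realloc: bool) -> Vec<u8> {
    let (len, old_cap, new_cap) = (4usize, 4usize, 64usize);
    let old_layout = Layout::array::<u8>(old_cap).expect("capacity overflow");
    unsafe {
        let buf = alloc(old_layout);
        assert!(!buf.is_null(), "allocation failed");
        for i in 0..len {
            buf.add(i).write(i as u8 + 1);
        }
        let grown = if use_realloc {
            grow_realloc(buf, old_cap, new_cap)
        } else {
            grow_alloc_copy(buf, len, old_cap, new_cap)
        };
        let out = std::slice::from_raw_parts(grown, len).to_vec();
        dealloc(grown, Layout::array::<u8>(new_cap).expect("capacity overflow"));
        out
    }
}
```

Both paths preserve the contents; the difference is only in which allocator entry points get exercised, which is why a slowdown from the realloc variant points at the allocator rather than at smallvec itself.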
The fact that it was a Windows + Linux regression, but not Mac, makes me suspect PHC is somehow involved (PHC is not enabled by default on Mac). This will eventually compare trunk vs. 1.5.1 with PHC disabled:
Comment 1 (Assignee) • 5 years ago
Preliminary investigation does look like this is a PHC / jemalloc issue.
Comment 2 (Assignee) • 5 years ago
So if I understand correctly, PageAlloc has a mechanism to avoid multiple threads racing for an allocation doing slow stuff like getting stack-traces by incrementing the delay, but that's not done by PageRealloc. And every time you realloc something that PHC allocated, we get a new stack using the global mutex...
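To make the suspected mechanism concrete, here is a toy model of a shared countdown that lets only one caller take the slow path and pushes the counter back up for everyone else. This is not PHC's code: the `Sampler` type, its fields and the reset policy are all assumptions for illustration. A realloc path that skipped this check would take the slow path (stack trace under the global mutex) on every call.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical sampling gate: most calls take the fast path, and the call
/// that drives the countdown to zero takes the slow path.
struct Sampler {
    delay: AtomicI64,
    reset_to: i64,
}

impl Sampler {
    fn new(initial: i64, reset_to: i64) -> Self {
        Sampler { delay: AtomicI64::new(initial), reset_to }
    }

    /// Returns true when the caller should do the slow work.
    fn should_sample(&self) -> bool {
        // `fetch_sub` returns the previous value, so exactly one caller
        // observes the transition from 1 to 0.
        if self.delay.fetch_sub(1, Ordering::Relaxed) == 1 {
            // Raise the delay again so that concurrent callers fall back to
            // the fast path instead of also doing the slow work.
            self.delay.fetch_add(self.reset_to, Ordering::Relaxed);
            true
        } else {
            false
        }
    }
}
```

With `Sampler::new(3, 3)`, every third call to `should_sample` returns true and the rest return false.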
Comment 3 (Assignee) • 5 years ago
Comment 4 (Assignee) • 5 years ago
Nope, the regression is still there with PHC disabled, so my patch does ~nothing and my hypothesis was wrong.
Comment 5 (Assignee) • 5 years ago
Ok, second hypothesis: RallocGrowLarge is slow.
Comment 6 (Assignee) • 5 years ago
Ok, here's a better theory. This call should actually switch arenas for larger allocations. Otherwise we grow the thread-local arenas a lot.
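A toy model of the idea, for illustration only: the `Arena` enum, the `SMALL_MAX` cutoff and both function names are hypothetical and do not mirror the allocator's real types or thresholds. The point is just the difference between a realloc that keeps the block wherever it was first allocated and one that re-runs the size-based arena choice.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Arena {
    ThreadLocal,
    Main,
}

/// Hypothetical cutoff: sizes up to this go to the thread-local arena.
const SMALL_MAX: usize = 1024;

/// Size-based arena choice, as used for a fresh allocation in this model.
fn choose_arena(size: usize) -> Arena {
    if size <= SMALL_MAX {
        Arena::ThreadLocal
    } else {
        Arena::Main
    }
}

/// Realloc that keeps the block in its current arena regardless of the new
/// size, so a small block that keeps growing bloats the thread-local arena.
fn realloc_keeping_arena(current: Arena, _new_size: usize) -> Arena {
    current
}

/// Realloc that re-runs the size-based choice, so a block that has grown
/// past the cutoff moves to the main arena.
fn realloc_switching_arena(_current: Arena, new_size: usize) -> Arena {
    choose_arena(new_size)
}
```

In this model, a vector that starts at 64 bytes and is grown to 1 MiB stays in `Arena::ThreadLocal` with the first policy and ends up in `Arena::Main` with the second.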
Comment 7 (Assignee) • 5 years ago
Comment 8 (Assignee) • 5 years ago
The regression seems gone with that patch: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=0abf0aa8e985149d8eaa7859a3db7d5e1495ecd1&newProject=try&newRevision=a64a8412d04de4aee3d807343b928663a213562f
Comment 10 (bugherder) • 5 years ago