PodCopy "optimizes" memcpy to a loop of smaller memcpys
Categories
(Core :: MFBT, enhancement, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox140 | --- | fixed |
People
(Reporter: jruderman, Assigned: mgaudet)
References
(Blocks 1 open bug)
Details
(Whiteboard: [sp3])
Attachments
(2 files)
1.98 KB,
patch
|
Details | Diff | Splinter Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
Reporter | ||
Updated•12 years ago
|
Comment 1•12 years ago
|
||
![]() |
||
Comment 2•12 years ago
|
||
Reporter | ||
Comment 3•12 years ago
|
||
![]() |
||
Comment 4•12 years ago
|
||
![]() |
||
Comment 5•12 years ago
|
||
Comment 6•12 years ago
|
||
Updated•3 years ago
|
Assignee | ||
Comment 7•1 year ago
|
||
Assignee | ||
Comment 8•1 year ago
|
||
I think we should drop the POD copy heuristic and just use memcopy. I wandered into Bug 661957 yesterday, and then thought a bit about how things had changed in the last decade -- and surely memcopy has had a lot of optimization pressure. So I did some experiments:
Evidence:
- Doubling heuristic length to avoid memcopy (128 -> 256): Regression -- pretty uniformly slower
- Halving heuristic length (128 -> 64) : Mixed results -- Medium confidence improvements and regressions
- Dropping from 128 to 8 (16x lower) -- low confidence improvment of top level score, improvements a little better on subtests
- Removing the heuristic entirely brings a large number of 1-2% improvements with high confidence... but also brings 2 high confidence regressions on ChartJS in the 4% range -- the summary score ends up being a wash.
Based on this I suspect strongly we'd like to the equivalent patch to Jesse Ruderman's above (I will provide a phabricator revision shortly); however I would also like to understand a little more the regression:
Markus, would you be able to do a perf-compare on the provided patch, to see if you can find where the time has gone into in ChartJS (I suppose one other possibility is unfortunate moving of a GC into timed area).
In the mean time, I'll dispatch the rest of the OS -- it's possible this will be OS/Compiler dependent and we may need to #ifdef this on platform.
Assignee | ||
Comment 9•1 year ago
|
||
Alright, for my own notes here's the upcoming perf links:
You'll be able to find a performance comparison here once the tests are complete (ensure you select the right framework):
https://perf.compare/compare-results?baseRev=1701b0ea175efa4a486bb8a74b9a0e07605e7cb2&newRev=143793b3c147abd66619a654cd00ef20fb0c2074&baseRepo=try&newRepo=tryThe old comparison tool is still available at this URL:
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=1701b0ea175efa4a486bb8a74b9a0e07605e7cb2&newProject=try&newRevision=143793b3c147abd66619a654cd00ef20fb0c2074
Assignee | ||
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Assignee | ||
Comment 10•1 year ago
|
||
Huh. A re-test showed more regressions.
Comment 11•5 months ago
|
||
Pushed to try with updated patch:
Base revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=d9485fabe00f5b6033e94899fd7e653c412d65e0
Local revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=4e363c28dfa1c4c68f038ef50530e23e8f770297
I can run a comparison report on the PGO builds from these pushes once they're ready.
Assignee | ||
Comment 12•5 months ago
|
||
(Retriggered the tests for more comparability)
Updated•4 months ago
|
Assignee | ||
Comment 13•4 months ago
|
||
Basically perf seems to be a wash, and at this point we've talked about this for 12 years. We should land the patch, and if and only if we see actual regressions do we investigate further.
Also, this lets us undo a compiler workaround added in Bug 1863131.
Comment 14•4 months ago
|
||
Comment 15•4 months ago
|
||
bugherder |
Updated•4 months ago
|
Description
•