Open
Bug 933149
Opened 11 years ago
Updated 2 years ago
PodCopy "optimizes" memcpy to a loop of smaller memcpys
Categories
(Core :: MFBT, defect)
Core
MFBT
Tracking
()
NEW
People
(Reporter: jruderman, Unassigned)
References
Details
Attachments
(1 file)
1.98 KB,
patch
|
Details | Diff | Splinter Review |
http://hg.mozilla.org/mozilla-central/rev/30cabba00388 added PodCopy included a small-array path (for compilers that don't understand memcpy?) http://hg.mozilla.org/mozilla-central/rev/b36175bbda47 changed the small-array path from an "=" loop to a silly "memcpy" loop. http://hg.mozilla.org/mozilla-central/rev/de6afab8b383 moved PodCopy from jsutil.h to mfbt/PodOperations.h
Attachment #825165 -
Flags: review?(luke)
Reporter | ||
Updated•11 years ago
|
Attachment #825165 -
Flags: review?(jwalden+bmo)
Comment 1•11 years ago
|
||
I believe the idea was that the memcpy calls with constant arguments would be inlined by the compiler. Do you have data that suggests this isn't the case?
Comment 2•11 years ago
|
||
(In reply to :Ms2ger from comment #1) > I believe the idea was that the memcpy calls with constant arguments would > be inlined by the compiler. Do you have data that suggests this isn't the > case? In the interest of sharing: I too was confused by this path in PodOperations. GCC and Clang will both inline memcpy with constant arguments (assuming the constant is small enough). MSVC (2010 is the version I checked, I think; maybe they've made this smarter) *will not* when optimizing for size, even when the inlining is obviously beneficial: memcpy(&an_int, &some_other_int_sized_thing, 4); Two movs is shorter than the function call, but MSVC prefers the function call. The argument for the short-array path (in bugzilla comments or IRC, unfortunately not preserved in the code) is twofold: - The above; - For arrays of runtime length, the inline loop saves the call to memcpy, which can be expensive relative to the cost of the copy.
Reporter | ||
Comment 3•11 years ago
|
||
I don't understand your first argument. If MSVC [optimizing-for-size] never inlines even a constant-size memcpy, how can it be better to call memcpy in a loop rather than call memcpy once? For gcc and clang, we should just file bugs if they make bad decisions about whether to inline non-constant-size memcpy.
Comment 4•11 years ago
|
||
(In reply to Jesse Ruderman from comment #3) > I don't understand your first argument. If MSVC [optimizing-for-size] never > inlines even a constant-size memcpy, how can it be better to call memcpy in > a loop rather than call memcpy once? Oh, I see what's going on here. I assumed we were copying with operator=.
Comment 5•11 years ago
|
||
Comment on attachment 825165 [details] [diff] [review] patch: remove the small-array path That's a lot of snark there chief. The small length case was added because memcpy was a measurable slowdown on that did a lot of small (but not constant) length memcpys. The reason was that the memcpy becomes a libc call (PLT overhead) and the memcpy implementation spends extra operations preparing to do a large copy that don't make sense when there is little to copy). This was all measured with old gcc/msvc, though. Feel free to find some workloads that pound on small PodCopy and remeasure on modern gcc/msvc.
Attachment #825165 -
Flags: review?(luke)
Comment 6•11 years ago
|
||
Comment on attachment 825165 [details] [diff] [review] patch: remove the small-array path Review of attachment 825165 [details] [diff] [review]: ----------------------------------------------------------------- What luke said. Numbers, Mrs. Landingham.
Attachment #825165 -
Flags: review?(jwalden+bmo)
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•