Open Bug 933149 Opened 11 years ago Updated 2 years ago

PodCopy "optimizes" memcpy to a loop of smaller memcpys

Categories

(Core :: MFBT, defect)

defect

Tracking

()

People

(Reporter: jruderman, Unassigned)

References

Details

Attachments

(1 file)

http://hg.mozilla.org/mozilla-central/rev/30cabba00388
  added PodCopy
  included a small-array path (for compilers that don't understand memcpy?)

http://hg.mozilla.org/mozilla-central/rev/b36175bbda47
  changed the small-array path from an "=" loop to a silly "memcpy" loop.

http://hg.mozilla.org/mozilla-central/rev/de6afab8b383
  moved PodCopy from jsutil.h to mfbt/PodOperations.h
Attachment #825165 - Flags: review?(luke)
Attachment #825165 - Flags: review?(jwalden+bmo)
I believe the idea was that the memcpy calls with constant arguments would be inlined by the compiler. Do you have data that suggests this isn't the case?
(In reply to :Ms2ger from comment #1)
> I believe the idea was that the memcpy calls with constant arguments would
> be inlined by the compiler. Do you have data that suggests this isn't the
> case?

In the interest of sharing: I too was confused by this path in PodOperations.  GCC and Clang will both inline memcpy with constant arguments (assuming the constant is small enough).  MSVC (2010 is the version I checked, I think; maybe they've made this smarter) *will not* when optimizing for size, even when the inlining is obviously beneficial:

  memcpy(&an_int, &some_other_int_sized_thing, 4);

Two movs is shorter than the function call, but MSVC prefers the function call.

The argument for the short-array path (in bugzilla comments or IRC, unfortunately not preserved in the code) is twofold:

- The above;
- For arrays of runtime length, the inline loop saves the call to memcpy, which can be expensive relative to the cost of the copy.
I don't understand your first argument.  If MSVC [optimizing-for-size] never inlines even a constant-size memcpy, how can it be better to call memcpy in a loop rather than call memcpy once?

For gcc and clang, we should just file bugs if they make bad decisions about whether to inline non-constant-size memcpy.
(In reply to Jesse Ruderman from comment #3)
> I don't understand your first argument.  If MSVC [optimizing-for-size] never
> inlines even a constant-size memcpy, how can it be better to call memcpy in
> a loop rather than call memcpy once?

Oh, I see what's going on here.  I assumed we were copying with operator=.
Comment on attachment 825165 [details] [diff] [review]
patch: remove the small-array path

That's a lot of snark there chief.  The small length case was added because memcpy was a measurable slowdown on that did a lot of small (but not constant) length memcpys.  The reason was that the memcpy becomes a libc call (PLT overhead) and the memcpy implementation spends extra operations preparing to do a large copy that don't make sense when there is little to copy).

This was all measured with old gcc/msvc, though.  Feel free to find some workloads that pound on small PodCopy and remeasure on modern gcc/msvc.
Attachment #825165 - Flags: review?(luke)
Comment on attachment 825165 [details] [diff] [review]
patch: remove the small-array path

Review of attachment 825165 [details] [diff] [review]:
-----------------------------------------------------------------

What luke said.  Numbers, Mrs. Landingham.
Attachment #825165 - Flags: review?(jwalden+bmo)
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: