Closed Bug 723228 Opened 12 years ago Closed 11 years ago

nsTArray::AssignRange should use memcpy when possible

Categories

(Core :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla21

People

(Reporter: justin.lebar+bug, Assigned: rzvncj)

References

Details

(Keywords: perf, Whiteboard: [mentor=jlebar][lang=c++])

Attachments

(1 file, 6 obsolete files)

nsTArray::AssignRange (called from AppendElements and ReplaceElementsAt) is implemented as:

  template<class Item>
  void AssignRange(index_type start, size_type count,
                   const Item *values) {
    elem_type *iter = Elements() + start, *end = iter + count;
    for (; iter != end; ++iter, ++values) {
      elem_traits::Construct(iter, *values);
    }
  }

When elem_type has a non-trivial constructor, this is the best we can do.  Similarly, we can't do much if the types Item and elem_type are different.

However, it's often the case that we have a nsTArray of POD's, and we almost always use the same types Item and elem_type.  In that case, we can use memcpy.

In particular, RasterImage::AddSourceData() does

  mSourceData.AppendElements(aBuffer, aCount);

which can end up appending megabytes of data, for a large image.
Keywords: perf
This is a good first bug for someone comfortable with templates.  I can mentor.
Whiteboard: [mentor=jlebar]
Whiteboard: [mentor=jlebar] → [mentor=jlebar][lang=c++]
I assume you want a template parameter or something to indicate that we have plain old data?
I was thinking of using a traits class.  std::is_trivially_copyable, falling back to std::is_trivial, perhaps, if those are available on all our systems.

http://stackoverflow.com/questions/7624714/c-how-do-i-use-type-traits-to-determine-if-a-class-is-trivial
Assignee: nobody → sandervv
Attached patch Use memcpy() for POD structures (obsolete) — Splinter Review
Initial patch to use memcpy() when constructing plain-old-data structures.

I used std::tr1::is_pod() since that guarantees that the structure 1) has a standard layout and 2) is trivially copyable. I could also use is_trivially_copyable(), but that is not supported by the currently used CFLAGS (e.g. -std=gnu++0x).

I made a somewhat dirty hack by using sizeof() instead of typeof(). The latter is a gcc extension and does not exist in -std=gnu++0x (since it only exists in C). I could use typeid(), but that is a run-time check, which requires run-time type-checking. The problem with sizeof() is that it can happen that the size of a type equals the size of a different type (e.g. different order: {int, char} vs. {char, int}).

It was not possible to use dynamic_cast since that would require a polymorphic class (and POD structs are not polymorphic). This is done at run-time, which should be avoided, if possible.

My suggestions to "fix" sizeof() is 1) using typeid() on both types, or 2) document that AssignRange should not use two structs with equal sizeof(). The second suggestion should require a debug assertion which will use typeid() to verify that those two structs are not of a different type, while having the same size. That way, the assertion and typeid() can be optimized-out in official releases, and protect developers during the debug build from accidentally using two different types.

Although, i think it is not the case that two structs with different order, same sizeof() are actually used in the code base. This is an assumption, which I will look after.

I tested this patch on i686 and trunk, with a optimized build. I visited various websites and canvas demos. There were no memory corruptions / segfaults. My next step is to implement a proper test case and push the resulting patch to the try server. Do you have any comments or suggestions before sending it to the try server?
Attachment #595238 - Flags: review?(justin.lebar+bug)
Comment on attachment 595238 [details] [diff] [review]
Use memcpy() for POD structures

We use "feedback" for incomplete patches and "review" for complete patches, so changing the r? to f?.  I'll have a look now at the patch.
Attachment #595238 - Flags: review?(justin.lebar+bug) → feedback?(justin.lebar+bug)
Since we don't use RTTI in Firefox, typeof and dynamic_cast aren't going to work.  It's my fault for leading you down that path; I said to use typeof(), but I wasn't clear that I meant that metaphorically.  Sorry about that!

I think you may be able to get around the typeof issue by using template specialization.  That is, make a new method

   template<>
   void AssignRange<elem_type>(index_type start, size_type count,
                               const elem_type *values)

(I *think* that's the syntax) and use memcpy there, if elem_type is a POD.  You'll want to test with a printf in here to make sure it's actually being hit.

When you push to try, you'll probably discover that not all our compilers support <tr1/type_traits>.  (Actually, they all might support it, but we need to support Visual Studio back to version 2005.  When you're ready, we'll figure out how to push to try and trigger a VS2005 build.)

If we need to support compilers which don't have trl/type_traits, what we probably should do is wrap it in a new mozilla::TypeTraits.  That way, everyone who uses mozilla::TypeTraits doesn't have to figure out if trl/type_traits is supported by their compiler.

I'd like to see some measurement indicating whether this is a perf win.  You don't need to include this measurement code in your patch.  You could add code to TestTArray and measure how long the test takes (perhaps comment out the other tests).  (This has the added advantage that you don't have to re-compile all of Firefox when you change nsTArray -- just make -C xpcom/tests from your objdir and you should be good to go.)
Attachment #595238 - Flags: feedback?(justin.lebar+bug) → feedback+
http://zachsaw.blogspot.com/2010/11/fast-memcpy-for-large-blocks.html
It might be interesting to add such functionality, for larger blocks of data (ie: images of multiple megabytes) it will gain 50% performance. Unfortunately the link to the source is down, if anyone feels like this is a good idea, then I would be more than happy to help my friend (smvv) and implement this for him.
Of course there should be a one-time check with cpuid whether SSE2 is supported and probably use a function pointer that points to either the normal memcpy() or our custom SSE2 memcpy().
Most libc implementations of memcpy are smart enough to use SSE2 when available and the size is sufficient.
I am wondering if I can give this a try. I want to know if anyone is working on this.
This is the patch so far I was working on. The patch compiles properly, but it produced imho strange test results. The memcpy'd AssignRange version of Firefox produced the following results (it's measuring throughput of image decoding in kb/s):

smvv@multivac ~/moz/mozilla-central/obj-x86_64-unknown-linux-gnu $ grep decode decode.log | grep "[0-9]\+" -o | awk '{s+=$1} END {print "sum: "s, " lines: "NR, " average: "s/(NR)}'
sum: 2491896  lines: 197  average: 12649.2

And without the patch:

smvv@multivac ~/moz/mozilla-central/obj-x86_64-unknown-linux-gnu $ grep decode decode_normal.log | grep "[0-9]\+" -o | awk '{s+=$1} END {print "sum: "s, " lines: "NR, " average: "s/(NR)}'
sum: 511320  lines: 78  average: 6555.38

The strange thing is that decoding with memcpy has twice the amount of grep'd lines, while the average throughput is doubled as well. Doubling the average throughput is a good sign, but that does not guarantee that more function calls are good as well.

For the test case, I used the instructions listed at: https://bugzilla.mozilla.org/show_bug.cgi?id=715919#c2 I saved the bigpictures page (including all of its files) to a local hard disk (to reduce odd results due to network latency). I stripped the <script>-tags from the HTML page so it would focus more on decoding the images than rendering and running the scripts. And I used the given awk-scripts provided above to aggregate the data from the printf statement.

Unfortunately, I'm not able to solve this bug due to finishing my study. Feel free to take this bug from here on.
Sorry, I don't have time to work on all the tedious details now, but here's a hopefully elegant and efficient way to have this done:

#include <iostream>
#include <tr1/type_traits>

using namespace std;

template<bool a, bool b> 
struct algorithm_selector { 
    template<typename T> 
    static void implementation( T& object ) 
    { 
        cout << "Not a POD or not same type" << endl;
    } 
};

template<> 
struct algorithm_selector<true, true> { 
    template<typename T> 
    static void implementation( T& object )
    { 
        cout << "POD, same type" << endl;
    } 
};

template<typename T1, typename T2> void doStuff(const T1& t1, const T2& t2)
{
    algorithm_selector<std::tr1::is_pod<T1>::value,
                       std::tr1::is_same<T1, T2>::value>
        ::implementation(t1);
}

class NotPOD {
public:
    NotPOD(int) {}
};

int main()
{
    int i;
    NotPOD np(i);

    doStuff(i,  i );
    doStuff(np, i );
    doStuff(i,  np);
    doStuff(np, np);
}

The compiler won't let you get away with explicit specializations in a non-namespace scope (i.e. where the AssignRange() function is place now, i.e. inside a template class, using template parameters), so whoever does this needs to get the algorithm_selector struct outside the nsTArray class (which will present the new challenge of having to modify the signature of AssignRange to receive parameters that it could directly access before).
We now have IsPodType in js/public/TemplateLib.h, so this is actually feasible now.  We'd probably want to move that into mfbt/TypeTraits.h.
Attached patch Template specialization patch. (obsolete) — Splinter Review
How does this patch look? Haven't had time to test if it actually works :D, but it does compile.
Comment on attachment 704571 [details] [diff] [review]
Template specialization patch.

This looks like the right idea to me!  I'll push it to try so we can see the test results.

Note that I think we'd need to move the relevant bits from js/TemplateLib.h into mfbt/, since aiui the js:: namespace (not to be confused with the JS:: namespace) is private.
Attachment #704571 - Flags: feedback+
Attachment #595238 - Attachment is obsolete: true
Attachment #615078 - Attachment is obsolete: true
> This looks like the right idea to me!  I'll push it to try so we can see the
> test results.

Great, thanks for the prompt feedback!

> Note that I think we'd need to move the relevant bits from js/TemplateLib.h
> into mfbt/, since aiui the js:: namespace (not to be confused with the JS::
> namespace) is private.

That should be easy enough, but it might take a little while to find the time, with a full-time job and everything. I do hope to be able to work on it soon, though.
How's this?
Attachment #704571 - Attachment is obsolete: true
Comment on attachment 704944 [details] [diff] [review]
Template specialization patch (take 2).

This looks good to me!  We need Waldo to look at it for the JS and MFBT
changes.

>--- a/js/src/yarr/CheckedArithmetic.h	Tue Jan 22 09:41:07 2013 -0500
>+++ b/js/src/yarr/CheckedArithmetic.h	Tue Jan 22 19:24:03 2013 +0200
>  * There are two modes of operation:
>  *  - The default is Checked<T, CrashOnOverflow>, and crashes at the point
>  *    and overflow has occurred.
>  *  - The alternative is Checked<T, RecordOverflow>, which uses an additional
>- *    byte of storage to track whether an overflow has occurred, subsequent
>+*    byte of storage to track whether an overflow has occurred, subsequent
>  *    unchecked operations will crash if an overflow has occured

Looks like an accidental change.
Attachment #704944 - Flags: review+
Attachment #704944 - Flags: review?(jwalden+bmo)
> This looks good to me!  We need Waldo to look at it for the JS and MFBT
> changes.

Thanks!

> >- *    byte of storage to track whether an overflow has occurred, subsequent
> >+*    byte of storage to track whether an overflow has occurred, subsequent
> >  *    unchecked operations will crash if an overflow has occured
> 
> Looks like an accidental change.

You're right, sorry about that. I'll post the corrected patch in a moment.
Fixed the accidental line change.
Attachment #704944 - Attachment is obsolete: true
Attachment #704944 - Flags: review?(jwalden+bmo)
Attachment #705009 - Flags: review?(jwalden+bmo)
Comment on attachment 705009 [details] [diff] [review]
Template specialization patch (take 3).

Review of attachment 705009 [details] [diff] [review]:
-----------------------------------------------------------------

Razvan, this looks pretty reasonable.  One issue, tho -- TypeTraits.h is an incremental reimplementation of <type_traits>, so there are some names to change somewhat, and you shouldn't add mozilla::If there.  Plus there are some style nits.  Get those fixed and this should be in good shape.

Oh, would you mind setting the review flag to "?" and my username when you post a new patch?  I pay much closer attention to my review queue than I do to all bugmail (although it's a bit full at present, unfortunately, so I may not be totally prompt in getting to everything placed there :-\ ), so the best way to get something in front of me that I won't forget or overlook is to put it there.

::: js/public/HashTable.h
@@ +10,5 @@
>  
>  #include "mozilla/Attributes.h"
>  #include "mozilla/DebugOnly.h"
>  #include "mozilla/Util.h"
> +#include "mozilla/TypeTraits.h"

This should be mixed into this list alphabetically.

::: mfbt/TypeTraits.h
@@ +166,1 @@
>  {};

I have a patch in my queue that moves/adds IsSame and IsPointer, but you're probably going to get your thing done before I'll get mine done.  My patch has these comments in it -- please add them to yours, just before the relevant template structs, so that the headers fully document what they do:

+/**
+ * IsPointer determines whether a type is a pointer type (but not a pointer-to-
+ * member type).
+ *
+ * mozilla::IsPointer<struct S*>::value is true;
+ * mozilla::IsPointer<int**>::value is true;
+ * mozilla::IsPointer<void (*)(void)>::value is true;
+ * mozilla::IsPointer<int>::value is false;
+ * mozilla::IsPointer<struct S>::value is false.
+ */

+/**
+ * IsSame tests whether two types are the same type.
+ *
+ * mozilla::IsSame<int, int>::value is true;
+ * mozilla::IsSame<int*, int*>::value is true;
+ * mozilla::IsSame<int, unsigned int>::value is false;
+ * mozilla::IsSame<void, void>::value is true;
+ * mozilla::IsSame<const int, int>::value is false;
+ * mozilla::IsSame<struct S, struct S>::value is true.
+ */

Also please add this to the top of the file just above the |namespace mozilla|, to make the <type_traits> implementation similarity explicit:

+/*
+ * These traits are approximate copies of the traits and semantics from C++11's
+ * <type_traits> header.  Don't add traits not in that header!  When all
+ * platforms provide that header, we can convert all users and remove this one.
+ */

@@ +171,5 @@
>      typedef T Type;
>  };
>  
> +/* Boolean test for whether two types are the same. */
> +template <class T, class U> struct IsSameType {

TypeTraits.h emulates <type_traits> from C++11 except for not using namespace std and using mfbt's capitalization style (see mfbt/STYLE) rather than underscores.  So this should be renamed to IsSame, because in <type_traits> it was std::is_same.

Regarding styling, this should be:

template<typename T, typename U>
struct IsSame
{

@@ +174,5 @@
> +/* Boolean test for whether two types are the same. */
> +template <class T, class U> struct IsSameType {
> +    static const bool result = false;
> +};
> +template <class T> struct IsSameType<T,T> {

and

template<typename T>
struct IsSame<T, T>
{

@@ +182,5 @@
> +/*
> + * Traits class for identifying POD types. Until C++0x, there is no automatic
> + * way to detect PODs, so for the moment it is done manually.
> + */
> +template <class T> struct IsPodType                 { static const bool result = false; };

IsPod, because it's std::is_pod.

@@ +198,5 @@
> +template <> struct IsPodType<bool>                  { static const bool result = true; };
> +template <> struct IsPodType<float>                 { static const bool result = true; };
> +template <> struct IsPodType<double>                { static const bool result = true; };
> +template <> struct IsPodType<wchar_t>               { static const bool result = true; };
> +template <typename T> struct IsPodType<T *>         { static const bool result = true; };

T*, please.

@@ +201,5 @@
> +template <> struct IsPodType<wchar_t>               { static const bool result = true; };
> +template <typename T> struct IsPodType<T *>         { static const bool result = true; };
> +
> +template <bool cond, typename T, T v1, T v2> struct If        { static const T result = v1; };
> +template <typename T, T v1, T v2> struct If<false, T, v1, v2> { static const T result = v2; };

Don't add If, please, because it's not in <type_traits>.  Plus I don't think this patch actually uses this, either...

@@ +204,5 @@
> +template <bool cond, typename T, T v1, T v2> struct If        { static const T result = v1; };
> +template <typename T, T v1, T v2> struct If<false, T, v1, v2> { static const T result = v2; };
> +
> +template <class T> struct IsPointerType             { static const bool result = false; };
> +template <class T> struct IsPointerType<T *>        { static const bool result = true; };

IsPointer because it's std::is_pointer.

::: xpcom/glue/nsTArray.h
@@ +18,5 @@
>  #include "nscore.h"
>  #include "nsQuickSort.h"
>  #include "nsDebug.h"
>  #include "nsTraceRefcnt.h"
> +#include "mozilla/TypeTraits.h"

You should put this up top alphabetically amongst the other mozilla/* includes.

@@ +452,5 @@
>  
> +template<bool IsPod, bool IsSameType>
> +struct AssignRangeAlgorithm {
> +  template<class Item, class ElemType, class IndexType, class SizeType>
> +  static void implementation(ElemType* elements, IndexType start, 

You'll want to remove this trailing whitespace, and at the same point in the specialization below.

@@ +1327,5 @@
>    void AssignRange(index_type start, size_type count,
>                     const Item *values) {
> +    AssignRangeAlgorithm<mozilla::IsPodType<Item>::result,
> +                         mozilla::IsSameType<Item, elem_type>::result>
> +      ::implementation(Elements(), start, count, values); 

More trailing whitespace here to not add.

(I see more trailing whitespace in context, but we should probably clean that up in separate patches.  Still, we shouldn't add any in new patches.)
Attachment #705009 - Flags: review?(jwalden+bmo)
Here you go.
Attachment #705009 - Attachment is obsolete: true
Attachment #711239 - Flags: review?(jwalden+bmo)
Comment on attachment 711239 [details] [diff] [review]
Template specialization patch (take 4).

Review of attachment 711239 [details] [diff] [review]:
-----------------------------------------------------------------

Looks pretty good.  A last few fixes, then r?me and I'll push this for you.  Thanks!

::: js/public/TemplateLib.h
@@ -141,5 @@
>  template <bool cond, typename T, T v1, T v2> struct If        { static const T result = v1; };
>  template <typename T, T v1, T v2> struct If<false, T, v1, v2> { static const T result = v2; };
>  
>  template <class T> struct IsPointerType             { static const bool result = false; };
>  template <class T> struct IsPointerType<T *>        { static const bool result = true; };

If you're adding mozilla::IsPointer, you should definitely be removing this, and changing all users of it.

::: js/src/ds/LifoAlloc.h
@@ +11,5 @@
>  #include "mozilla/Attributes.h"
>  #include "mozilla/DebugOnly.h"
>  #include "mozilla/GuardObjects.h"
>  #include "mozilla/ASan.h"
> +#include "mozilla/TypeTraits.h"

This should be alphabetized (and so should the ASan.h entry as well, while you're here).

::: js/src/jsanalyze.h
@@ +18,5 @@
>  #include "ds/LifoAlloc.h"
>  #include "js/TemplateLib.h"
>  #include "vm/ScopeObject.h"
>  
> +#include "mozilla/TypeTraits.h"

mozilla/*.h #includes appear first in JS headers -- just after the #define jsanalyze_h___ (separated by a blank line from that and the #includes to follow, of course).

::: mfbt/TypeTraits.h
@@ +186,5 @@
> + * mozilla::IsSame<void, void>::value is true;
> + * mozilla::IsSame<const int, int>::value is false;
> + * mozilla::IsSame<struct S, struct S>::value is true.
> + */
> +template <class T, class U> struct IsSame

template<typename T, typename U>
struct IsSame
{

@@ +190,5 @@
> +template <class T, class U> struct IsSame
> +{
> +    static const bool result = false;
> +};
> +template <class T> struct IsSame<T,T>

template<typename T>
struct IsSame<T, T>
{

@@ +199,5 @@
> +/*
> + * Traits class for identifying POD types. Until C++0x, there is no automatic
> + * way to detect PODs, so for the moment it is done manually.
> + */
> +template <class T> struct IsPod                 { static const bool result = false; };

typename T, not class T

@@ +214,5 @@
> +template <> struct IsPod<unsigned long long>    { static const bool result = true; };
> +template <> struct IsPod<bool>                  { static const bool result = true; };
> +template <> struct IsPod<float>                 { static const bool result = true; };
> +template <> struct IsPod<double>                { static const bool result = true; };
> +template <> struct IsPod<wchar_t>               { static const bool result = true; };

Hmm.  If we're going to specialize for wchar_t here, that means this header needs to #include <wchar.h>.  Please add that before the opening |namespace mozilla| in this file.
Attachment #711239 - Flags: review?(jwalden+bmo)
Right.
Attachment #711239 - Attachment is obsolete: true
Attachment #711743 - Flags: review?(jwalden+bmo)
Comment on attachment 711743 [details] [diff] [review]
Template specialization patch (take 5).

Review of attachment 711743 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!  I'll push this shortly.
Attachment #711743 - Flags: review?(jwalden+bmo) → review+
> Looks good!  I'll push this shortly.

Thanks! Glad to have been able to help, and it's been a pleasure working with you.
I made a few more little style tweaks during a final once-over and pushed.  Thanks again!

https://hg.mozilla.org/integration/mozilla-inbound/rev/d49cff40b5c3

Just to note, usually when you attach patches you should include a summary in them, and you should set a username on them -- perhaps like so if you use Mercurial queues:

hg qref -m 'Bug 723228 - nsTArray::AssignRange should use memcpy when possible.  r=jlebar for the XPCOM changes, r=jwalden for js/mfbt changes' -u 'Razvan Cojocaru <rzvncj@gmail.com>'

If other people are pushing your patches regularly, they're going to want these things done before they push.  (A new, final version of the patch is helpful in such cases.)  I don't push too many things other than what I've written myself, and my workflow involves me changing patch summaries just before pushing, so it's not a big deal for me to do it this time.

If you use Mercurial queues, you can have your username associated with the patch automatically -- just add something like this to your ~/.hgrc:

[ui] 
username = Your Name <emailaddress@example.com>

[defaults]
qnew = -U
Assignee: sandervv → rzvncj
Status: NEW → ASSIGNED
> I made a few more little style tweaks during a final once-over and pushed. 
> Thanks again!

My pleasure.

> If other people are pushing your patches regularly, they're going to want
> these things done before they push.  (A new, final version of the patch is
> helpful in such cases.)  I don't push too many things other than what I've
> written myself, and my workflow involves me changing patch summaries just
> before pushing, so it's not a big deal for me to do it this time.
> 
> If you use Mercurial queues, you can have your username associated with the
> patch automatically -- just add something like this to your ~/.hgrc:

Fair enough, and duly noted. Thank you for your help.

I do have that stuff in my ~/.hgrc, but set up for my Xen patches, where they have a whole different thing about submitting patches, patch format and ownership. I'll make sure to stick to the Mozilla guidelines with my next patches :D
I'm not sure, but can you put those in the .hg/hgrc for your Mozilla repository, and have them override global settings?  I'd expect yes, but I can't remember if I've had occasion to test this.
http://hg.mozilla.org/integration/mozilla-inbound/rev/0d1a3f041a8f

The rest of one of my trees has some further type traits work, and when I rebased it I discovered a small issue: we were using the old |result| name for the bool in the traits class.  The name doesn't actually matter so long as you're consistent in all uses of it.  But <type_traits> and TypeTraits.h before now used only |value| for everything, so it should have been named |value|.  Sorry I didn't notice this before!
Target Milestone: --- → mozilla21
> I'm not sure, but can you put those in the .hg/hgrc for your Mozilla
> repository, and have them override global settings?  I'd expect yes, but I
> can't remember if I've had occasion to test this.

I would expect that it's possible, since that doesn't sound like something that would be hard to implement :), however I haven't tried anything like that either. The solution I was thinking about was much more mundane: simply have a hgrc-mozilla and a hgrc-xen file, and switch them when changing projects.
> as you're consistent in all uses of it.  But <type_traits> and TypeTraits.h
> before now used only |value| for everything, so it should have been named
> |value|.  Sorry I didn't notice this before!

I see, sorry I didn't notice it either. But all's well that ends well. :)
Depends on: 839852
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: