Closed Bug 1309171 Opened 6 years ago Closed 6 years ago

Reduce an unnecessary copy by using universal reference

Categories

(Core :: Audio/Video: GMP, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: JamesCheng, Assigned: JamesCheng)

Details

Attachments

(1 file)

Just found this API will make a copy twice.

http://searchfox.org/mozilla-central/rev/3e03a4064eb585d96f28023785a5c242969878a6/media/gmp-clearkey/0.1/gmp-task-utils-generated.h#1934

Since it is not gecko code...use std::forward directly instead.
Attachment #8799699 - Flags: review?(gsquelart)
Hi Gerald,

Since Chris is on business trip, could you please help to review it? 

Thanks.
Comment on attachment 8799699 [details]
Bug 1309171 - Reduce an unnecessary copy by universal reference.

https://reviewboard.mozilla.org/r/84836/#review83420

Thank you for looking into potential optimizations as you spot them.

::: media/gmp-clearkey/0.1/gmp-task-utils-generated.h:1932
(Diff revision 1)
>  template<typename Type, typename Method, typename... Args>
>  GMPTask*
> -WrapTaskRefCounted(Type* aType, Method aMethod, Args... args)
> +WrapTaskRefCounted(Type* aType, Method aMethod, Args&&... args)
>  {
> -  GMPTask* t = WrapTask(aType, aMethod, args...);
> +  GMPTask* t = WrapTask(aType, aMethod, std::forward<Args>(args)...);

I would normally agree with using forwarding references in general, but in this case:
- We are still calling WrapTask, which does a copy too, so it seems strange to change one but not the other.
- All the uses I can find only pass PODs (int32_t, raw pointers), which while being as efficient to pass by value or by reference, are actually more efficient to use directly as values instead of through a reference indirection. (Though they may all be optimized away, ending up with the same code?)

So I hesitate to grant r+ here, you'll need to convince me that this change is beneficial!
(Or check with Chris when he's back.)
I don't understand how std::forward works, so I'll let gerald review this.
Rank: 25
Priority: -- → P2
Hi Gerald,

I have no compelling reason to convince you...

I don't know where is the gmp_task_args_nm_X code generated from so I dare not modify those interface.

BTW, I want to make sure one thing, 

Is it always more efficient to "call by value" when passing "POD" or "primitive type"?

Even the sizeof(POD) or sizeof(long double) is larger then ```sizeof(pointer*) 4 or 8 byte```?

Is there any article to analyze the performance which may refer to the optimization on POD type?

Thank you.
Flags: needinfo?(gsquelart)
Comment on attachment 8799699 [details]
Bug 1309171 - Reduce an unnecessary copy by universal reference.

https://reviewboard.mozilla.org/r/84836/#review83662

(In reply to James Cheng[:JamesCheng] from comment #5)
> I have no compelling reason to convince you...
> I don't know where is the gmp_task_args_nm_X code generated from so I dare
> not modify those interface.

Yeah, there is a lot of code there! I wonder if it would be possible to convert to a variadic template.
But I'm not sure it's worth our time trying to hyper-optimize this clearkey code, which (I believe) is probably not used outside of our testing.

> BTW, I want to make sure one thing, 
> 
> Is it always more efficient to "call by value" when passing "POD" or
> "primitive type"?
> 
> Even the sizeof(POD) or sizeof(long double) is larger then
> ```sizeof(pointer*) 4 or 8 byte```?

My intuition was that regardless of the cost of passing itself (which should be equal for similarly-sized objects that can fit in a register), the extra cost of using references will come when *accessing* the target object.
E.g., on a 32-bit machine, 'void takeInt(int32_t i)' and 'void takeIntRef(int32_t& r)' both take a 32-bit value (the integer itself, or the pointer that carries the reference).
But consider the implementations:
'void takeInt(int32_t i) { doSomething(i); }'
'void takeIntRef(int32_t& r) { doSomething(r); }'
The 2nd one should be equivalent to:
'void takeIntPtr(int32_t* p) { doSomething(*p); }'
where we can see a need to dereference the pointer to access the value.

Also, there could be a cost coming from forcing the object to have an address when passing by reference.


but then, I'm guessing that the compiler may be able to optimize away the pointer work, especially given inline-able templates.
And as you have demonstrated in a timed example, pass-by-reference code may even be faster than pass-by-value!

In the end, real data should trump intuition, so I'm happy to r+ this change.
Well done!


> Is there any article to analyze the performance which may refer to the
> optimization on POD type?

After a short search I couldn't find one. But it would be interesting to know what others think!
Attachment #8799699 - Flags: review?(gsquelart) → review+
Thank you for discussing with this and reviewing.
Keywords: checkin-needed
Summary: Reduce an unnecessary copy by universal reference → Reduce an unnecessary copy by using universal reference
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/672aaae9d1eb
Reduce an unnecessary copy by universal reference. r=gerald
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/672aaae9d1eb
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
(Clearing needinfo, responded in comment 6)
Flags: needinfo?(gsquelart)
You need to log in before you can comment on or make changes to this bug.