Closed
Bug 1121860
Opened 11 years ago
Closed 2 months ago
Evaluate OpenCV performance with asm.js - operation min, max and inRange
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: kaku, Unassigned)
References
Details
Attachments
(3 files)
The min, max and inRange operations of asm.js version OpenCV are slow and share the same pattern. It is especially slow while dealing with uint8 and sint8 data type although these two data types get better oprtimization in native environment.
The benchmark table could is here:
https://docs.google.com/a/mozilla.com/spreadsheets/d/1w8gr1_q_dQclTaGyiMtAhlwj_5_HPVKbHWfwUszPU40/edit#gid=288104140
Reporter | ||
Comment 1•11 years ago
|
||
The runnable HTML/JS files and profiling file of cv::inRange operarion to different datatypes.
Reporter | ||
Comment 2•11 years ago
|
||
The runnable HTML/JS files and profiling file of cv::min operarion to different datatypes.
Reporter | ||
Comment 3•11 years ago
|
||
Hi Luke and Alon,
This is a new case.
Please take a look to the attached files.
Thanks.
Flags: needinfo?(luke)
Flags: needinfo?(azakai)
![]() |
||
Comment 4•11 years ago
|
||
For the u8 case, here is the hot function where roughly all the time is spent. Without spending too much time combing over this, one thing that is easy to see is that there is a lot of redundant |256 and &255 going on. In many cases, it appears Emscripten is able to tell "this cane from a HEAPU8 load" and avoid the mask, but it seems in other cases it misses this and pessimistically inserts the masks. It'd be nice to know if this was just an Emscripten problem or if this is something that would benefit from asm.js having smaller-than-32-bit integer support.
Flags: needinfo?(luke)
![]() |
||
Comment 5•11 years ago
|
||
While it would be nice to fix the problems mentioned in comment 4, I expect they only contribute to a % overhead, not the 30x factor slowdown compared to native reported in comment 0. I just remembered that, for byte stores, to deal with the x86 low/high byte-register-aliasing problem, we just fix the source register to %al. I wonder if massive, unnecessary spilling in an otherwise tight loop could explain the massive slowdown? If so, then we might finally have our use case to fix this problem properly.
Any estimate on how hard it would be to extend the regalloc to allow restricting an LAllocation to a subset of registers? I'm not going to even think of trying to take advantage of ah,bh,ch,dh for now ;)
Comment 6•11 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #5)
> I just remembered that, for byte stores,
> to deal with the x86 low/high byte-register-aliasing problem, we just fix
> the source register to %al. I wonder if massive, unnecessary spilling in an
> otherwise tight loop could explain the massive slowdown? If so, then we
> might finally have our use case to fix this problem properly.
x64 doesn't have this problem so it should be possible to compare x86 to x64 and see if there's a significant difference. If it's also slow on x64 there are other problems, if it's faster on x64 it could be this (or some other issue, like x64 having more registers).
Comment 7•11 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #4)
> For the u8 case, here is the hot function where roughly all the time is
> spent. Without spending too much time combing over this, one thing that is
> easy to see is that there is a lot of redundant |256 and &255 going on. In
> many cases, it appears Emscripten is able to tell "this cane from a HEAPU8
> load" and avoid the mask, but it seems in other cases it misses this and
> pessimistically inserts the masks.
The |256 doesn't look unnecessary, but the &255 does -- not from the input value, but from the user, which is a store to a HEAP8. Emscripten optimizes both ways, but in a part of the compiler which isn't in SSA form, so it's blinded by a temporary variable. Such temporary variables are normally eliminated, but that is blocked here because of memory dependencies.
> It'd be nice to know if this was just an
> Emscripten problem or if this is something that would benefit from asm.js
> having smaller-than-32-bit integer support.
The case here is something we can fix in Emscripten. There are situations where having narrower types would enable more, but they're not very common.
Comment 8•11 years ago
|
||
The &255 is
> i14 = i14 - (HEAPU8[36240 + ((i14 | 256) - (HEAPU8[i1 + i13 >> 0] | 0)) >> 0] | 0) & 255;
? Yes, I agree with the analysis. On that line it can't be removed, but the use is a store to HEAP8, so it could. But the best we could do is replace the & 255 with >>> 0 (or | 0 if we are ok with changing it to signed, which in this case is valid). Would that still be worth doing?
Flags: needinfo?(azakai)
![]() |
||
Comment 9•11 years ago
|
||
Probably not; I was mostly thinking backwards about the |256 and thinking that we had a general problem with byte arithmetic, but I don't think that's the case.
Mostly now I'm just curious to see if Kaku could compare performance of asm.js/native on 32-bit vs 64-bit (specifically grabbing a 64-bit build from http://nightly.mozilla.org) as Jan was suggesting to see if byte-stores are likely the problem.
Reporter | ||
Comment 10•11 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #9)
> Mostly now I'm just curious to see if Kaku could compare performance of
> asm.js/native on 32-bit vs 64-bit (specifically grabbing a 64-bit build from
> http://nightly.mozilla.org) as Jan was suggesting to see if byte-stores are
> likely the problem.
Sorry for my late reply. Actually, I did all the test with the Firefox built by myself on a 64bit Linux environment. So, I think my Firefox is already a 64bit version. I do also download the Nightly 64bit verison, and the performance result isalmost the same to the original one.
Do I misunderstand anything?
Flags: needinfo?(luke)
Comment 12•7 years ago
|
||
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INACTIVE
Updated•7 years ago
|
Status: RESOLVED → REOPENED
Resolution: INACTIVE → ---
Updated•3 years ago
|
Severity: normal → S3
Comment 13•2 months ago
|
||
ASM.js is replaced by WASM these days.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 2 months ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•