Closed Bug 1097958 Opened 11 years ago Closed 10 years ago

optimize Octane-zlib

Categories

(Core :: JavaScript Engine, defect)

RESOLVED DUPLICATE of bug 1124377

People

(Reporter: luke, Unassigned)

Details

With --turbo_asm on the new OSX 10.10 machines, v8 is faster than SM on Octane-zlib. Interestingly, the old OSX 10.6 machines show --turbo_asm to be slower than SM, and asmjs-apps-zlib-throughput shows SM and v8 matched. The FF builtin profiler shows that most of the time is in a handful of hot zlib asm.js functions, so it'd be good to profile these a bit more and see if there is any low-hanging fruit in the codegen.
The inner loop in longest_match in zlib:

  A = z + 1 | 0;
  if ((a[A >> 0] | 0) != (a[y + 1 >> 0] | 0)) {
    z = A;
    break
  }
  A = z + 2 | 0;
  if ((a[A >> 0] | 0) != (a[y + 2 >> 0] | 0)) {
    z = A;
    break
  }
  A = z + 3 | 0;
  if ((a[A >> 0] | 0) != (a[y + 3 >> 0] | 0)) {
    z = A;
    break
  }
  ...

really wants bug 986981. The assembly code has lots of adds. And copies for the adds.

Another thing is that at the bottom of the loop, the asm.js code has a bitwise '&' where the C++ code had a short-circuit '&&'. We improved this kind of code in Emscripten's backend quite a while ago, but Octane zlib was compiled before that.

There are a bunch of little things we could do better on the code in Octane's version, though it's not clear how much work we should do to optimize to that degree for code patterns that Emscripten doesn't use anymore.
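To illustrate the '&' versus '&&' point, here is a minimal sketch rather than the actual zlib or Emscripten output. The names compareConditionForms and cmp are hypothetical, and the evaluation counter stands in for the loads and compares a real loop condition would perform.

```javascript
// Hypothetical sketch: count how many comparisons each form of the
// loop condition evaluates when the first comparison already fails.
function compareConditionForms() {
  let evaluated = 0;
  // Returns 0 or 1 like an asm.js comparison coerced to int.
  const cmp = (x, y) => { evaluated++; return (x === y) | 0; };

  // Bitwise '&': both operands are always evaluated.
  evaluated = 0;
  const bitwise = cmp(1, 2) & cmp(3, 3);
  const bitwiseCount = evaluated;

  // Short-circuit '&&': the second operand is skipped once the first is 0.
  evaluated = 0;
  const shortCircuit = cmp(1, 2) && cmp(3, 3);
  const shortCircuitCount = evaluated;

  return { bitwise, bitwiseCount, shortCircuit, shortCircuitCount };
}
```

Both forms produce the same truth value here, but the bitwise form does the second comparison even when it cannot change the outcome.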
If I understand correctly this is similar to the issue being addressed in bug 915157 (Odin) and bug 1080477 (Ion). If octane-zlib is the same or similar to the Emscripten zlib benchmark then the results in bug 915157 comment 26 are very relevant and show a very useful performance improvement on this benchmark from de-hoisting the index offsets.

The Emscripten aggressive variable elimination did not significantly change this code. The "A = z + 1" is not de-hoisted into the first access "a[A >> 0]" because it has other uses. I can try some hand optimization to see if this helps, and if so then Emscripten might be able to use better heuristics.

The patches currently match very specific patterns that use masking to eliminate the bounds check, eg. b[i&m+c>>n], so would not help here. For Ion it might be possible to adjust the bounds check too and then aggressively hoist the bounds checks? For example, hoist a bounds check that z+8 is in bounds and y+7 is in bounds? So check that uint32_t(z)<(length-8).

(In reply to Dan Gohman [:sunfish] from comment #1)
> The inner loop in longest_match in zlib:
>
> A = z + 1 | 0;
> if ((a[A >> 0] | 0) != (a[y + 1 >> 0] | 0)) {
> z = A;
> break
> }
...
> really wants bug 986981. The assembly code has lots of adds. And copies for
> the adds.
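The hoisted bounds check suggested above can be sketched roughly as follows. This is an illustration in plain JavaScript, not what Odin or Ion actually emit. The names loadChecked and firstMismatch are hypothetical, and the sketch assumes offsets 1 through 8 are compared for both z and y, with an out-of-bounds load modelled as reading 0.

```javascript
// Per-access check, modelling the bounds check done for each heap load.
function loadChecked(heap, i) {
  return (i >>> 0) < heap.length ? heap[i] : 0;
}

// Hypothetical sketch: compare heap[z+1..z+8] against heap[y+1..y+8] and
// return the first mismatching index on the z side, or -1 if all 8 match.
function firstMismatch(heap, z, y) {
  const n = heap.length;
  // Hoisted check: one unsigned compare per base covers all 8 offsets,
  // in the spirit of uint32_t(z) < (length - 8).
  if (n > 8 && (z >>> 0) < n - 8 && (y >>> 0) < n - 8) {
    for (let k = 1; k <= 8; k++) {
      // No per-access check needed on this path.
      if (heap[z + k] !== heap[y + k]) return z + k;
    }
    return -1;
  }
  // Fallback: keep the per-access checks when the hoisted check fails.
  for (let k = 1; k <= 8; k++) {
    if (loadChecked(heap, z + k) !== loadChecked(heap, y + k)) return z + k;
  }
  return -1;
}
```

For example, with heap = Uint8Array.from([0,1,2,3,4,5,6,7,8, 0,1,2,3,9,5,6,7,8, 0,0]), firstMismatch(heap, 0, 9) takes the hoisted path and returns 4, while firstMismatch(heap, 0, 12) falls back to the per-access path.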
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE