Open Bug 1823931 Opened 1 year ago Updated 1 year ago

[...array] is slower than in V8

Categories

(Core :: JavaScript Engine, enhancement, P2)

enhancement

Tracking

()

People

(Reporter: mstange, Unassigned)

References

(Blocks 2 open bugs)

Details

In a comparison between Spidermonkey's BaselineInterpreter and V8's interpreter, the following function from TodoMVC-Preact stood out as being 30x slower:

    function getTodos() {
      return [...todos];
    }

SM: https://share.firefox.dev/3JYTzKP
V8: https://share.firefox.dev/3n6J1QR

Spidermonkey is running the self-hosted next() function, whereas V8 has a CreateArrayFromIterable bytecode op.

Spidermonkey bytecode:

00000:   2  NewArray 0                  # []
00005:   2  Zero                        # [] 0
00006:   2  GetGName "todos"            # [] 0 todos
00011:   2  Dup                         # [] 0 todos todos
00012:   2  Symbol 1                    # [] 0 todos todos Symbol.iterator
00014:   2  GetElem                     # [] 0 todos todos[Symbol.iterator]
00015:   2  Swap                        # [] 0 todos[Symbol.iterator] todos
00016:   2  CallIter 0                  # [] 0 todos[Symbol.iterator]()
00019:   2  CheckIsObj 3                # [] 0 todos[Symbol.iterator]()
00021:   2  Dup                         # [] 0 todos[Symbol.iterator]() todos[Symbol.iterator]()
00022:   2  GetProp "next"              # [] 0 todos[Symbol.iterator]() todos[Symbol.iterator]().next
00027:   2  Swap                        # [] 0 todos[Symbol.iterator]().next todos[Symbol.iterator]()
00028:   2  Pick 3                      # 0 todos[Symbol.iterator]().next todos[Symbol.iterator]() []
00030:   2  Pick 3                      # todos[Symbol.iterator]().next todos[Symbol.iterator]() [] 0
00032:   2  LoopHead (ic: 5, depthHint: 1) # todos[Symbol.iterator]().next todos[Symbol.iterator]() [] 0
00038:   2  DupAt 3                     # todos[Symbol.iterator]().next todos[Symbol.iterator]() [] 0 todos[Symbol.iterator]().next
00042:   2  DupAt 3                     # todos[Symbol.iterator]().next todos[Symbol.iterator]() [] 0 todos[Symbol.iterator]().next todos[Symbol.iterator]()
00046:   2  Call 0                      # todos[Symbol.iterator]().next todos[Symbol.iterator]() [] 0 todos[Symbol.iterator]().next()
00049:   2  CheckIsObj 0                # todos[Symbol.iterator]().next todos[Symbol.iterator]() [] 0 todos[Symbol.iterator]().next()
00051:   2  Dup                         # todos[Symbol.iterator]().next todos[Symbol.iterator]() [] 0 todos[Symbol.iterator]().next() todos[Symbol.iterator]().next()
00052:   2  GetProp "done"              # todos[Symbol.iterator]().next todos[Symbol.iterator]() [] 0 todos[Symbol.iterator]().next() todos[Symbol.iterator]().next().done
00057:   2  JumpIfTrue 78 (+21)         # todos[Symbol.iterator]().next todos[Symbol.iterator]() [] 0 todos[Symbol.iterator]().next()
00062:   2  JumpTarget (ic: 8)          # todos[Symbol.iterator]().next todos[Symbol.iterator]() [] 0 todos[Symbol.iterator]().next()
00067:   2  GetProp "value"             # todos[Symbol.iterator]().next todos[Symbol.iterator]() [] 0 todos[Symbol.iterator]().next().value
00072:   2  InitElemInc                 # todos[Symbol.iterator]().next todos[Symbol.iterator]() [...] (intermediate value)
00073:   2  Goto 32 (-41)               # todos[Symbol.iterator]().next todos[Symbol.iterator]() [...] (intermediate value)
00078:   2  JumpTarget (ic: 10)         # todos[Symbol.iterator]().next todos[Symbol.iterator]() [] 0 todos[Symbol.iterator]().next()
00083:   2  Pick 4                      # todos[Symbol.iterator]() [] 0 todos[Symbol.iterator]().next() todos[Symbol.iterator]().next
00085:   2  Pick 4                      # [] 0 todos[Symbol.iterator]().next() todos[Symbol.iterator]().next todos[Symbol.iterator]()
00087:   2  PopN 3                      # [] 0
00090:   2  Pop                         # []
00091:   3  Return                      #
00092:   3  RetRval                     # !!! UNREACHABLE !!!

V8 bytecode:

[generated bytecode for function: getTodos (0x160e0019a7cd <SharedFunctionInfo getTodos>)]
Bytecode length: 5
Parameter count 1
Register count 0
Frame size 0
Bytecode age: 0
         0x160e0019b74e @    0 : 21 00 00          LdaGlobal [0], [0]
         0x160e0019b751 @    3 : 7a                CreateArrayFromIterable
         0x160e0019b752 @    4 : a9                Return

JSC bytecode:

[   0] enter
[   1] get_scope          dst:loc4
[   3] mov                dst:loc5, src:loc4
[   6] check_traps
[   7] resolve_scope      dst:loc7, scope:loc4, var:0, resolveType:GlobalProperty, localScopeDepth:0
[  14] get_from_scope     dst:loc8, scope:loc7, var:0, getPutInfo:2048<ThrowIfNotFound|GlobalProperty|NotInitialization|NotStrictMode>, localScopeDepth:0, offset:0
[  22] mov                dst:loc7, src:loc8
[  25] spread             dst:loc6, argument:loc7
[  28] new_array_with_spread dst:loc6, argv:loc6, argc:1, bitVector:0
[  33] ret                value:loc6

In this test, we turned off Baseline/Ion for everything, even for self-hosted code. If self-hosted code runs in Baseline and we only force the interpreter for non-self-hosted code, the difference will probably be less pronounced.

We could maybe do something similar to OptimizeSpreadCall where we specifically handle the [...array] case (and not, say, [1, ...array]). It looks like that's what V8 is doing; they have a dedicated op in this case, but if I add another fixed element into the array (eg [1, ...todos]) then they generate something much closer to what we have:

 0 : 7a 00 00 25       CreateArrayLiteral [0], [0], #37
 4 : 26 fa             Star r1
 6 : 12 01             LdaConstant [1]
 8 : 26 fb             Star r0
10 : 13 02 01          LdaGlobal [2], [1]
13 : 26 f7             Star r4
15 : b1 f7 03 05       GetIterator r4, [3], [5]
19 : a0 07             JumpIfJSReceiver [7] (0x8a10825017c @ 26)
21 : 61 b5 00 fb 00    CallRuntime [ThrowSymbolIteratorInvalid], r0-r0
26 : 26 f8             Star r3
28 : 28 f8 03 07       LdaNamedProperty r3, [3], [7]
32 : 26 f9             Star r2
34 : 58 f9 f8 10       CallProperty0 r2, r3, [16]
38 : 26 f7             Star r4
40 : a0 07             JumpIfJSReceiver [7] (0x8a108250191 @ 47)
42 : 61 ad 00 f7 01    CallRuntime [ThrowIteratorResultNotAnObject], r4-r4
47 : 28 f7 04 12       LdaNamedProperty r4, [4], [18]
51 : 97 13             JumpIfToBooleanTrue [19] (0x8a1082501a8 @ 70)
53 : 28 f7 05 09       LdaNamedProperty r4, [5], [9]
57 : 31 fa fb 0e       StaInArrayLiteral r1, r0, [14]
61 : 25 fb             Ldar r0
63 : 4c 0d             Inc [13]
65 : 26 fb             Star r0
67 : 8a 21 00          JumpLoop [33], [0] (0x8a108250184 @ 34)
70 : 25 fa             Ldar r1
72 : aa                Return 

This is a medium-sized amount of work. Unless this function is really hot, or the pattern shows up elsewhere, I doubt this is our lowest-hanging fruit, but we should keep it on our radar.

JSC still handles { [1,...todos] } at a high level:

 [   0] enter
[   1] get_scope          dst:loc4
[   3] mov                dst:loc5, src:loc4
[   6] check_traps
[   7] mov                dst:loc6, src:Int32: 1(const0)
[  10] resolve_scope      dst:loc8, scope:loc4, var:0, resolveType:GlobalProperty, localScopeDepth:0
[  17] get_from_scope     dst:loc9, scope:loc8, var:0, getPutInfo:2048<ThrowIfNotFound|GlobalProperty|NotInitialization|NotStrictMode>, localScopeDepth:0, offset:0
[  25] mov                dst:loc8, src:loc9
[  28] spread             dst:loc7, argument:loc8
[  31] new_array_with_spread dst:loc6, argv:loc6, argc:2, bitVector:0
[  36] ret                value:Undefined(const1)

getTodos() is 1.1% of the updated TodoMVC-Preact benchmark. It's probably called 9 times in total (3 renders of the App component x 3 calls per App render)

Wow, preact is small. If it's called 9 times in total, it's unlikely to get out of the interpreter.. Maybe into blinterp, if we lower our thresholds.

If we decide to do anything here, we should go look at how JSC implements their spread op. Just looking at the bytecode makes it seem relatively generic, and one of our problems with optimizing spread calls / spread array initialization / various forms of destructuring is that we've never found a good abstraction that would let us avoid writing a bunch of bespoke implementations.

Severity: -- → N/A
Type: task → enhancement
Priority: -- → P2
You need to log in before you can comment on or make changes to this bug.