Open Bug 1520251 Opened 5 years ago Updated 2 years ago

Slow performance for accessing objects in an array with 7+ elements

Categories

(Core :: JavaScript Engine: JIT, defect, P3)

66 Branch
x86_64
Linux
defect

Tracking


Performance Impact low
Tracking Status
firefox66 --- fix-optional

People

(Reporter: witold.baryluk+mozilla, Unassigned)

Details

(Keywords: perf:responsiveness)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0

Steps to reproduce:

Hi,

I noticed something strange.

If I loop over the elements of an array with 7 elements, it is MUCH slower than looping over an array with 6 elements.

function go(a) {
  for (let i = 0.0; i < 100000000; i++) {
    // Note: this inner `let i` shadows the outer loop counter (block-scoped, so valid, but easy to misread).
    for (let i = a.length; i--; ) {
      const b = a[i].p;
    }
  }
}

// When I use more than 6 objects in Firefox, performance drops significantly.

// 60.3.0esr (64-bit) from Debian.
// and
// 66.0a1 (2018-12-27) (64-bit) from mozilla.org , firefox-66.0a1.en-US.linux-x86_64.tar.bz2
// In fact it is even slower in 66.0!

/*
"sometimes" below means sporadic: the timings appear to be extremely
sensitive to the content of the file. E.g. just changing comments can
deterministically trigger a change, as can reloading (Ctrl-R or
Shift-Ctrl-R). Maybe related to hash table randomization somewhere.
*/

/* Time below means the total time for the entire benchmark: 100M iterations
 * of the loop over the array. I would expect the numbers to climb linearly,
 * but they don't. */

function gent1() {
  return [
    // empty   // 70ms
    {p: 1.0},  // x[0] // 146ms, sometimes 140ms
    {p: 1.0},  // x[1] // 211ms
    {p: 1.0},  // x[2] // 279ms, sometimes 256ms
    {p: 1.0},  // x[3] // 349ms, sometimes 370ms or 385ms, or 325ms.
    {p: 1.0},  // x[4] // 370ms, sometimes 375ms, or 440ms or 420ms
    {p: 1.0},  // x[5] // 520ms, unstable timings!, sometimes 445ms.
    {p: 1.0},  // x[6] // 940ms. SLOW. but consistent. BUT SLOW.
    {p: 1.0},  // x[7] // 1050ms. consistent.
    {p: 1.0},  // x[8] // 1165ms. consistent.
  ];
}

/*
 This is a minimized case; it was discovered in code that was much more
 complex (300 lines, with complex loop bodies containing about 50
 instructions and helper function calls).
*/

const t1 = gent1();

for (let i = 0; i < 10; i++) {
  console.time("go-t1_"+i);
  go(t1);
  console.timeEnd("go-t1_"+i);
}
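Where console.time is unavailable (e.g. in a bare JS shell), a minimal sketch of an equivalent harness using Date.now(); the names and the reduced iteration count are illustrative, chosen so the sketch finishes quickly rather than matching the 100M-iteration benchmark above:

```javascript
// Illustrative timing harness; returns elapsed milliseconds.
function bench(label, fn, arr) {
  const t0 = Date.now();
  fn(arr);
  const t1 = Date.now();
  console.log(label + ": " + (t1 - t0) + "ms");
  return t1 - t0;
}

// Same loop shape as go() above, but with a reduced iteration count,
// and accumulating into a sum so the property loads are observable.
function goSmall(a) {
  let sum = 0;
  for (let n = 0; n < 1000; n++) {
    for (let i = a.length; i--; ) {
      sum += a[i].p;
    }
  }
  return sum;
}

const arr = [{p: 1.0}, {p: 1.0}, {p: 1.0}];
const elapsed = bench("goSmall", goSmall, arr);
```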

// Similar behavior occurs when the array is declared outside the function but is still returned via a function.

/*
let t0 = [
    {p: 1.0},
    {p: 1.0},
    {p: 1.0},
    {p: 1.0},
    {p: 1.0},
    {p: 1.0},
    {p: 1.0},
//    {p: 1.0},
  ];

function gent2() {
  return t0;
}

const t2 = gent2();

for (let i = 0; i < 10; i++) {
  console.time("go-t2_"+i);
  go(t2);
  console.timeEnd("go-t2_"+i);
}
*/

Actual results:

There is a sudden drop in performance when going from an array with 6 elements to an array with 7 elements.

Component: Untriaged → JavaScript Engine: JIT
OS: Unspecified → Linux
Product: Firefox → Core
Hardware: Unspecified → x86_64

This does not look like a case that Scalar Replacement would be able to handle, as the array is created before we OSR into the loop. So I do not know what would slow down the array access.

One hypothesis is that this could be a shape/object-group guard issue, whose cost increases with the number of element types being monitored, or that we simply generate an IC chain.

Flags: needinfo?(iireland)
Priority: -- → P2
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p3:responsiveness]

Could also be related to TYPE_FLAG_OBJECT_COUNT_LIMIT, which sets the TYPE_FLAG_ANYOBJECT flag in the type set, so that TypeSet::unknownObject returns true, which in turn can disable multiple optimizations. Inspecting the ion-graph should be an easy way to validate that hypothesis, because "unknown objects" lead to emitting different MIR nodes.
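To illustrate the mechanism described above — this is a simplified model for exposition, not SpiderMonkey's actual C++ type inference code — a type set records distinct object groups up to a limit, and past that limit it collapses to "any object", losing the precision the optimizer relies on:

```javascript
// Simplified model of a type set with an object-count limit.
// The real TYPE_FLAG_OBJECT_COUNT_LIMIT / TYPE_FLAG_ANYOBJECT logic
// lives in SpiderMonkey internals; this only sketches the collapse.
const OBJECT_COUNT_LIMIT = 7;

class TypeSetModel {
  constructor() {
    this.groups = new Set();
    this.anyObject = false;
  }
  addObjectGroup(group) {
    if (this.anyObject) return;
    this.groups.add(group);
    if (this.groups.size >= OBJECT_COUNT_LIMIT) {
      // Too many distinct groups: stop tracking them individually.
      this.groups.clear();
      this.anyObject = true;
    }
  }
  unknownObject() {
    return this.anyObject;
  }
}

const ts = new TypeSetModel();
for (let g = 0; g < 6; g++) ts.addObjectGroup("site" + g);
console.log(ts.unknownObject()); // → false: still precise with 6 groups
ts.addObjectGroup("site6");
console.log(ts.unknownObject()); // → true: collapsed once the 7th group appears
```

This matches the observed cliff: each object literal in the array contributes a distinct group, so the seventh element tips the set into the "unknown object" state.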

So, as soon as 7 objects are present, the generated MIR shows that we actually emit the element loads and property accesses (with six or fewer objects, they are optimized away):

Inner loop body MIR nodes with six objects:

45 boundscheck sub42 initializedlength30 I[-2147483648, 2147483647] : Int32
46 nop
47 goto block5

And with seven (or more) objects:

45 boundscheck sub42 initializedlength30 I[-2147483648, 2147483647] : Int32
46 spectremaskindex sub42 initializedlength30 I[-2147483648, 2147483647] : Int32
47 loadelement elements28 spectremaskindex46 Object
100 keepaliveobject phi24
48 guardshape loadelement47 Object
49 loadfixedslotandunbox guardshape48 Int32
50 typebarrier loadfixedslotandunbox49 Int32
51 nop
52 goto block5

After changing TYPE_FLAG_OBJECT_COUNT_LIMIT from 7 to 8, the results shifted accordingly: only when eight or more objects are present do we start loading the properties.

For the test case from comment #0, it's easy to work around the current limitation by ensuring that all objects are created from the same call site, e.g.

function gent1() {
  var o = p => ({p});
  return [
    o(1.0), o(1.0), o(1.0), o(1.0),
    o(1.0), o(1.0), o(1.0), o(1.0),
  ];
}

And for an additional speed boost, the loop condition can be changed to:

for (let i = a.length - 1; i >= 0; i--) {
   // <body>
}

because for (let i = a.length; i--; ) ... seems to be too tricky for range analysis to understand, so the bounds check currently doesn't get hoisted outside of the loop body.
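Both loop forms visit exactly the same indices; the performance difference comes only from whether range analysis can prove the bounds check redundant. A quick sketch verifying the equivalence (helper names are illustrative):

```javascript
// Collects the indices visited by the "tricky" form from the bug report:
// the loop condition `i--` both tests and decrements.
function indicesTricky(a) {
  const out = [];
  for (let i = a.length; i--; ) out.push(i);
  return out;
}

// Collects the indices visited by the range-analysis-friendly rewrite.
function indicesHoistable(a) {
  const out = [];
  for (let i = a.length - 1; i >= 0; i--) out.push(i);
  return out;
}

const sample = [10, 20, 30, 40];
console.log(indicesTricky(sample));    // → [3, 2, 1, 0]
console.log(indicesHoistable(sample)); // → [3, 2, 1, 0]
```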

Thanks for the analysis, anba!

The underlying issue is that we don't have special handling for object literals inside an array. Each |{p: 1.0}| has a different allocation site and is assigned a different group. We therefore hit TYPE_FLAG_OBJECT_COUNT_LIMIT very quickly when iterating over the elements of the array.

In terms of changing SpiderMonkey to make this code faster: increasing TYPE_FLAG_OBJECT_COUNT_LIMIT won't fly. In theory, if we see an array literal containing a bunch of object literals with the same shape, we could try to ensure that the objects all end up with the same group. In practice, my quick glance at the bytecode emitter indicates that it might be awkward to implement. I'm also not sure how common this kind of code is outside of microbenchmarks.

As a conservative upper bound, I added code to detect arrays full of object literals (whether or not the shape was the same), then loaded Facebook/Gmail/GDocs/CNN and puttered around for a few minutes. I ended up with ~100 hits, which is not low enough for me to want to close this bug but not high enough for me to want to mark it as a high priority.

(PS: Note that this is the exact opposite problem from bug 1517534. In this bug, we create too many object groups and pessimize. In that bug, we create too few object groups and get insufficiently precise results.)

Flags: needinfo?(iireland)
Priority: P2 → P3
Performance Impact: --- → P3
Whiteboard: [qf:p3:responsiveness]

The bug has a release status flag that shows some version of Firefox is affected, thus it will be considered confirmed.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Severity: normal → S3