Open Bug 1393712 Opened 7 years ago Updated 2 years ago

for-await-of with Async Generator is slow

Categories

(Core :: JavaScript Engine, enhancement, P2)

enhancement

Tracking

()

Performance Impact low
Tracking Status
firefox57 --- affected

People

(Reporter: arai, Unassigned)

References

(Depends on 5 open bugs)

Details

(Keywords: perf, Whiteboard: [js:perf])

from https://github.com/tc39/proposal-async-iteration/issues/112

--- code ---

async function* asyncRange(from, to) {
  for (let i = from; i < to; i++) {
    yield i;
  }
}

async function main() {
  const start = Date.now()
  // Prevent dead code elimination although I doubt
  // there's any yet
  let total = 0
  for await (const i of asyncRange(0, 550000)) {
    total += i
  }
  const end = Date.now()
  console.log(`Total: ${ total }`)
  console.log(`Time Taken: ${ end - start }`)
}

main()

--------

on JS shell, it takes ~5 seconds, and on browser, it takes ~15 seconds.

haven't checked yet but bug 1317481 might be related.
(In reply to Tooru Fujisawa [:arai] from comment #0)
> haven't checked yet but bug 1317481 might be related.

That, and all the various other Promise optimizations. Plus, importantly, optimizations for Generators. I added a dependency on bug 1317690 for that, but I'm pretty sure that even without Ion compilation there's more we can do to make Generators faster.
Depends on: 1317481, 1342037, 1317690
A big part of this is when have generators, we turn all bindings into heap allocated instead of using the frame. This is done to avoid copying stack on every yield. This is a problem when there are lets in a loop we are allocating a new lexical environment each iteration. I'm in the process of looking at a solution for this. One one of the ARES-6 benchmarks, something like 40% of runtime is eliminated by turning the let into a var.
when I profile the code on shell, I see CaptureCurrentStack takes some amount of time in several places.
while it's useful information for debugging, there might be some chance that internal-use-only promises don't need that information.
might be nice to take a look.
with default configuration, it takes 5 seconds, but with --no-async-stacks, it takes 3.5 seconds.
also, I see CreateIterResultObject. that currently uses DefineProperty,
but that could use fixed slots, like RegExp match result object?
Depends on: 1394682
(In reply to Tooru Fujisawa [:arai] from comment #3)
> when I profile the code on shell, I see CaptureCurrentStack takes some
> amount of time in several places.
> while it's useful information for debugging, there might be some chance that
> internal-use-only promises don't need that information.
> might be nice to take a look.

or perhaps we could share the captured stack object between 2 promises?
(In reply to Tooru Fujisawa [:arai] from comment #6)
> (In reply to Tooru Fujisawa [:arai] from comment #3)
> > when I profile the code on shell, I see CaptureCurrentStack takes some
> > amount of time in several places.
> > while it's useful information for debugging, there might be some chance that
> > internal-use-only promises don't need that information.
> > might be nice to take a look.
> 
> or perhaps we could share the captured stack object between 2 promises?

That seems worthwhile, yes.

In general, I wouldn't worry about the async stacks too much: they're not enabled in release builds of the browser, and eventually should be disabled in Nightly as well, as long as the debugger isn't open. See bug 1280819.

Ultimately, bug 1317481 is what we'll want here, I think.
I'm wondering if SpeciesConstructor called in js::OriginalPromiseThen can be optimized out.
that's taking 4% of the whole time, after bug 1394682 fix.
(In reply to Tooru Fujisawa [:arai] from comment #8)
> I'm wondering if SpeciesConstructor called in js::OriginalPromiseThen can be
> optimized out.

looks like bug 1386534 does that.
Depends on: 1386534
NewDenseCopiedArray in GeneratorObject::suspend also takes ~5% of whole time.
would it be possible to reuse the array between suspend calls?
Depends on: 1396499
surely bug 1317481 improves the performance
  3587 [ms] => 710 [ms]
Priority: -- → P2
Whiteboard: [js:perf][qf:p3]
Keywords: perf
somehow, current best performance after bug 1396499 and bug 1317481 is ~2200 [ms], not ~700 [ms].
I guess I should figure out when and why the performance gain reduced.

then, the another thing I noticed in profiler is, the allocation of array and function inside Promise job handling.
I'm not sure if it's possible to somehow reuse them or reduce the allocation tho...

also, it might be possible to inline JSOP_YIELD in baseline in some case, after fixing bug 1396499.
(In reply to Tooru Fujisawa [:arai] from comment #12)
> then, the another thing I noticed in profiler is, the allocation of array
> and function inside Promise job handling.
> I'm not sure if it's possible to somehow reuse them or reduce the allocation
> tho...

bug 1342044 seems to be related.
Depends on: 903457
(In reply to Tooru Fujisawa [:arai] from comment #12)
> also, it might be possible to inline JSOP_YIELD in baseline in some case,
> after fixing bug 1396499.

filed bug 1409231
(In reply to Tooru Fujisawa [:arai] from comment #12)
> somehow, current best performance after bug 1396499 and bug 1317481 is ~2200
> [ms], not ~700 [ms].
> I guess I should figure out when and why the performance gain reduced.

properly rebased and now the time is ~500 [ms] :)
with fixed version, the following code takes long:
  13.7%  AsyncGeneratorRequest::create
   8.2%  LexicalEnvironmentObject::create
   6.7%  CreatePromiseObjectWithoutResolutionFunctions
   6.3%  CreateIterResultObject
   6.2%  ResolvePromiseInternal
Depends on: 1410283
(In reply to Tooru Fujisawa [:arai] from comment #16)
> with fixed version, the following code takes long:
>   13.7%  AsyncGeneratorRequest::create

filed bug 1410283.
After bug 1317481 (and others), GetPropertyPure inside PromiseOptimizable is taking some amount of time (~5% of total).
it needs to check the property value, so we cannot use shape...
(In reply to Tooru Fujisawa [:arai] from comment #18)
> After bug 1317481 (and others), GetPropertyPure inside PromiseOptimizable is
> taking some amount of time (~5% of total).
> it needs to check the property value, so we cannot use shape...

filed bug 1410695.
Depends on: 1410695
with all current WIP patches applied, the performance difference between let+const vs var is the following:
  original (let+const): ~400 [ms]
  replaced them to var: ~310 [ms]
now the time taken by setFixedSlot looks somewhat significant, in some case.

28 [ms] (8.5%) is taken by CreateIterResultObject
20 [ms] (6.3%) is taken by AsyncGeneratorObject::createRequest, that does setFixedSlot (the request object itself is cached)
10 [ms] (3.6%) is taken by ResolvePromiseInternal that does NativeGetProperty
 9 [ms] (2.7%) is taken by ResolvePromise that does setFixedSlot.

one more thing I can think of is, if we use Async Generator from for-await-of, the IterResultObject is not exposed to user script, I think, and we could reuse it, instead of creating it every time,
or perhaps, just somehow avoid creating the IterResultObject and pass value/done props directly?
Depends on: 1412202
(In reply to Ted Campbell [:tcampbell] (PTO until Oct 30) from comment #2)
> A big part of this is when have generators, we turn all bindings into heap
> allocated instead of using the frame. This is done to avoid copying stack on
> every yield. This is a problem when there are lets in a loop we are
> allocating a new lexical environment each iteration. I'm in the process of
> looking at a solution for this. One one of the ARES-6 benchmarks, something
> like 40% of runtime is eliminated by turning the let into a var.

filed bug 1412202.
Depends on: 1412206
Performance Impact: --- → P3
Whiteboard: [js:perf][qf:p3] → [js:perf]
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.