Closed Bug 683471 Opened 13 years ago Closed 2 months ago

Consider pattern-matching setTimeout("foo()")

Categories

(Core :: JavaScript Engine, defect)

Other Branch
defect

Tracking


RESOLVED WONTFIX
Tracking Status
firefox9 - ---

People

(Reporter: igor, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

+++ This bug was initially created as a clone of Bug #674251 +++

We should investigate the TDHTML regression first found in bug 674251, which has now revealed itself again after the type-inference branch landing. See bug 674251 comment 58 and later.
One interesting question here is how using jemalloc on mac affects this, if at all.
And requesting tracking for Fx9 for this performance regression.
Keywords: regression
(In reply to Boris Zbarsky (:bz) from comment #2)
> And requesting tracking for Fx9 for this performance regression.

Can you post in this bug a clear listing of the regressions that you want to be required fixed for Fx9?
I'm not requiring a fix.

I'm requiring some understanding of the 2x performance regression on the scrolling Tdhtml test, at the very least...
We keep taking patches that significantly ding performance metrics. Bug 683631 just went past my inbox too. I think we should maintain a high standard for performance regressions and either back out the regressor or start working on a follow-up fix right away.
Bug 683631 is much more important than this one: it reflects a slowdown when webpages interact with the DOM, something which real webpages definitely do.  I want to understand and fix bug 683631 within the next two weeks (I suspect it is from not having getter/setter paths in JM for DOM accesses).

This bug is for a test suite that manufactures tens of thousands of short scripts (bug 674251 comment 55), and the regression is caused by hanging onto those short scripts a few seconds longer (bug 674251 comment 58).  Maybe this is something which real websites do, but it isn't what this benchmark is supposed to be testing.  It is bad for benchmarks testing X to actually be gated on Y, and I think the TDHTML tests should just be changed.
The setTimeout(string) code pattern is definitely something websites do, in spite of attempts to convince them otherwise.  This benchmark in particular was reduced from an actual site...

It may be worth changing the benchmark to be non-silly yes, but I'd still like to understand why the hit is test-specific.  That part is odd.  Maybe a matter of GC timing?
(In reply to Brian Hackett from comment #6)
> This bug is for a test suite that manufactures tens of thousands of short
> scripts (bug 674251 comment 55), and the regression is caused by hanging
> onto those short scripts a few seconds longer (bug 674251 comment 58). 

As Boris pointed out, the regression could be due to the number of loops over all JSScript instances done by the JM/TM code. Delaying script destruction until the GC could make that more visible. It should be possible to verify that with a try server run based on a revision prior to the TI landing, with cx->free(script) commented out in DestroyScript, effectively leaking the malloc'ed memory but avoiding the extra cost of the loops over all scripts.

If that does not show the regression, then we know that we have problems in JM/TM regarding script management. If the regression is on the scale observed with the TI landing, or like that from my initial attempts to land bug 674251, then the problem is in Mac allocation inefficiencies that using jemalloc would presumably address.
Results of TDHTML experiments on the try server for OSX Opt:

base (MC tip before TI): 309
base with cx->free(script) disabled: 312 (http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=231a07b34872)
base with DestroyScript() in EvaluateScript commented out: 322
Current MC tip: 322

So there is a small regression coming from leaking the script malloc memory, but the main regression comes from the dead scripts affecting live performance. So it looks like dead-script enumeration, or perhaps JIT data allocated for no-longer-reachable scripts, affects performance rather significantly.
The results above imply that TI and scripts-as-GC-things merely exposed a problem we had before. They are not responsible for the problem on their own and can at worst contribute a regression only on the scale of the malloc leak.
That's not how things work around here. If you expose it, you own it.
>So there is a small regression coming from leaking script malloc, 

What does that mean exactly? And is that from scripts-as-GCThings, or from TI?

>but the main regression comes from the dead scripts affecting live performance.

Is this from scripts-as-GCThings, or from TI?

Assuming it's true that this is mostly coming from looping over dead scripts, we have to fix this.
Summary: Investigate OSX-specific TDHTML regression due to releasing scripts only during GC → JM+TI: OSX-specific TDHTML regression due to releasing scripts only during GC
(In reply to David Mandelin from comment #12)
> >So there is a small regression coming from leaking script malloc, 
> 
> What does that mean exactly? And is that from scripts-as-GCThings, or from
> TI?

Both scripts-as-GCThings and TI require destroying scripts only during the GC. As such, either of these changes alone, or both combined, triggers the same regression.

> 
> >but the main regression comes from the dead scripts affecting live performance.
> 
> Is this from scripts-as-GCThings, or from TI?

Again, either of these patches exposes a problem that was previously hidden.

> Assuming it's true that this is mostly coming from looping over dead
> scripts, we have to fix this.

I suspect it is either the loops or hash tables bloated by the dead scripts that cause the regression, as all tests/experiments point in this direction.
(In reply to Igor Bukanov from comment #13)
> (In reply to David Mandelin from comment #12)
> > >So there is a small regression coming from leaking script malloc, 
> > 
> > What does that mean exactly? And is that from scripts-as-GCThings, or from
> > TI?
> 
> Both scripts-as-GCThings and TI require to destroy scripts only during the
> GC. As such any of this bugs alone or combined trigger the same regression.

Cool, thanks.
We took this regression and now this bug is unassigned and not moving.

We need a hero. Someone who knows how to use Shark.
What's the current status of this bug?
Brian says this will not affect perf in practice.
Why not?

Note that the Tdhtml testcases were based on actual websites at one point.
What is the website the testcase was based on?  The testcase basically schedules tons of timeouts at short intervals which do a puny amount of work when they fire.  So even if a website had behavior similar to this, I suspect the testcase has exaggerated that behavior to the point where it becomes pathological.
I think the website is gone at this point... but the behavior you describe is not uncommon for JS animations, actually.  :(
Is the issue here that these are string-based timeouts, and we're ending up with tons of JSScripts from all those strings?

And there are no other ways that we can end up with tons of JSScripts like that (e.g. lots of inline event handlers, etc)?

If so, I agree with not worrying about this for now and setting up something similar to the eval cache for setTimeout that maps strings to scripts.
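The eval-cache-style idea suggested above can be sketched as follows. This is a minimal JavaScript illustration, not SpiderMonkey code: in the engine the cache would hold JSScript objects rather than Function objects, and the names and eviction policy here are invented for the sketch. Note it only helps when the exact same source string is scheduled repeatedly.

```javascript
// Hypothetical sketch of a cache mapping setTimeout source strings to
// their compiled functions, so repeated setTimeout("foo()") calls reuse
// one compilation instead of producing a new short-lived script each time.
const timeoutScriptCache = new Map();
const CACHE_LIMIT = 256; // keep the cache bounded, like the eval cache

function compileTimeoutString(source) {
  let fn = timeoutScriptCache.get(source);
  if (!fn) {
    // In the engine this would produce a JSScript; Function() stands in here.
    fn = Function(source);
    if (timeoutScriptCache.size >= CACHE_LIMIT)
      timeoutScriptCache.clear(); // crude eviction, enough for the sketch
    timeoutScriptCache.set(source, fn);
  }
  return fn;
}
```

A second call with an identical string returns the same function object, so no new script is created; a string that differs by even one character (as in the animation test below) misses the cache.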
I like the idea in comment #21 a lot. Brian, is that feasible to implement quickly?
(In reply to Andreas Gal :gal from comment #22)
> I like the idea in comment #21 a lot. 

IIRC the main culprit for the slowdown came from the test doing:

function animation_function(x, y) {
    ...
    setTimeout("animation_function("+(x + x_step)+", "+(y + y_step)+")");
} 

The caching is not helpful in this case as the source changes at each setTimeout call.
I think instead of caching it may be worth trying to recognize timeout scripts of the form function_name(constant_parameters) and simply turn them into a function call.
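The recognizer suggested here could look roughly like the following. This is an illustrative sketch, not SpiderMonkey code; the function name and the restriction to numeric constant arguments are assumptions made for the example.

```javascript
// Match strings of the shape name(numeric constants) and turn them into
// a direct call to the named global function, skipping compilation.
const CALL_PATTERN = /^\s*([A-Za-z_$][\w$]*)\s*\(([^()]*)\)\s*;?\s*$/;

function tryDirectTimeoutCall(source, globals) {
  const m = CALL_PATTERN.exec(source);
  if (!m) return null; // not a simple call; fall back to compiling
  const name = m[1];
  const args = m[2].trim() === "" ? [] :
      m[2].split(",").map(s => Number(s));
  // Only handle numeric constant arguments; bail on anything else.
  if (args.some(Number.isNaN)) return null;
  const fn = globals[name];
  if (typeof fn !== "function") return null;
  return () => fn.apply(undefined, args);
}
```

For the animation test above, "animation_function(10, 20)" would be recognized and dispatched directly, while anything with nested calls or non-constant arguments falls through to the normal compile path.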
I don't think it would be a huge amount of work to either expand the eval cache, add another similar cache or pattern match setTimeout("foo()").  We really need a real website though that spends a measurable amount of time repeatedly compiling scripts and which would be helped by such changes.
(In reply to Brian Hackett from comment #25)
> We
> really need a real website though that spends a measurable amount of time
> repeatedly compiling scripts and which would be helped by such changes.

I do not see it this way. The Talos test shows that holding about 1000 JSScript instances at least until the GC has a measurable effect on performance. Many complex sites have more functions and corresponding script instances than that, so we should not dismiss this bug as an example of pathological code that does not exist on the real Web.

As Boris pointed out in bug 674251, this could be due to O(N^2) or similar scalability problems in our GC implementation, as we loop over all scripts in a few places, doing a non-trivial amount of work for each script. So we should check this first.
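The O(N^2) concern can be made concrete with a toy cost model (not engine code; the parameters are invented for illustration): if every GC walks every script that has not yet been destroyed, and scripts accumulate between GCs, the total sweep work grows quadratically in the number of scripts created.

```javascript
// Toy model: create totalScripts scripts, run a "GC" every gcEvery
// allocations, and charge one unit of work per script the GC touches.
// If dead scripts are never pruned from the list the GC walks, each
// sweep costs the full count so far, giving ~N^2/(2*K) total work.
function totalSweepWork(totalScripts, gcEvery) {
  let scriptsSoFar = 0, work = 0;
  for (let i = 1; i <= totalScripts; ++i) {
    ++scriptsSoFar;
    if (i % gcEvery === 0)
      work += scriptsSoFar; // the GC loop touches every script
  }
  return work;
}
```

Doubling the script count roughly quadruples the total sweep work, which is the scaling pattern to look for when profiling the Talos run.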
(In reply to Igor Bukanov from comment #26)
> (In reply to Brian Hackett from comment #25)
> > We
> > really need a real website though that spends a measurable amount of time
> > repeatedly compiling scripts and which would be helped by such changes.
> 
> I do not see it this way. The Talos test shows that holding about 1000
> JSScript instances at least until the GC has measurable effects on
> performance. Many complex sites has more functions and corresponding script
> instances than that so we should not ignore this bug as an example of
> pathological code that does not exist on the real Web.
> 
> As Boris pointed out in the bug 674251 this could be due to O(N^2) or
> similar scalability problems in our GC implementation as we loop over all
> scripts in a few places doing non-trivial amount of work for each script. So
> we should check this first.

OK. Back in comment 12, I wrote:

>Assuming it's true that this is mostly coming from looping over dead scripts, we 
>have to fix this.

Is it true that this is mostly from looping over dead scripts? If so, it seems like "change it so we don't loop over dead scripts" would take care of this.
(In reply to David Mandelin from comment #27)
> Is it true that this is mostly from looping over dead scripts? If so, it
> seems like "change it so we don't loop over dead scripts" would take care of
> this.

I do not know whether this is attributable to loops over dead scripts or not. In the shell I see no pathological behavior in the following test case, which tries to simulate the Talos test:

function foo(x) {
    return Function("return foo("+(x + 1)+")");
}

function test(N) {
    var test_start = Date.now();
    var f = foo;
    for (var i = 0; i != N; ++i)
        f = f(i)();
    var gc_start = Date.now();
    gc();
    gc();
    var end_time = Date.now();
    return [end_time - test_start, end_time - gc_start];
}

var first_run_time;
for (var i = 1; i <= 10; ++i) {
    var times = test(i * 10*1000);
    if (!first_run_time)
        first_run_time = times[0];
    print("gc_time="+times[1]+" full_time="+times[0]+" scaling="+(times[0]/first_run_time/i).toFixed(3)); 
}


The output shows O(1) behavior in the GC and linear scaling of the total time with the number of scripts:

gc_time=2 full_time=139 scaling=1.000
gc_time=4 full_time=272 scaling=0.978
gc_time=6 full_time=412 scaling=0.988
gc_time=2 full_time=545 scaling=0.980
gc_time=3 full_time=679 scaling=0.977
gc_time=5 full_time=815 scaling=0.977
gc_time=2 full_time=950 scaling=0.976
gc_time=3 full_time=1085 scaling=0.976
gc_time=5 full_time=1220 scaling=0.975
gc_time=1 full_time=1357 scaling=0.976
See comment 23 and following.

Besides it being unclear whether this would translate into any real-world wins, I'm not convinced that translating this into a function call really is straightforward.

Consider the concrete example in comment 23: I suppose we'd transform the setTimeout into something roughly like

setTimeout(animation_function, timeout, x + x_step, y + y_step);

How are we going to guarantee that eagerly evaluating the parameters won't produce different results from lazily evaluating them when the timeout fires? Even if we detect that x and y can't change, x_step and y_step certainly could.
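The eager-vs-lazy hazard described here can be shown in a few lines. This is a minimal illustration with timers simulated by a queue so the example stays synchronous; the helper names are invented.

```javascript
// Show that evaluating argument expressions at scheduling time (the
// proposed rewrite) can disagree with evaluating the string when the
// timer fires, if a free variable changes in between.
let x = 0, x_step = 1;
const queue = [];
function fakeSetTimeout(task) { queue.push(task); }
function runTimers() { while (queue.length) queue.shift()(); }

const results = {};

// Eager: the argument expression is evaluated now, while x_step === 1.
const eagerArg = x + x_step;
fakeSetTimeout(() => { results.eager = eagerArg; });

// Lazy (string timeout): the expression is evaluated when the timer fires.
fakeSetTimeout(() => { results.lazy = eval("x + x_step"); });

x_step = 10; // mutated between scheduling and firing
runTimers();
// results.eager is 1 but results.lazy is 10, so the rewrite is unsound
// whenever a free variable can change before the timeout fires.
```

This is exactly the x_step/y_step case from the animation test: the pattern-matched call would have to prove the arguments are constants, not merely constant-looking expressions.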

Obviously, I might just be missing some details of how setTimeout works.
Summary: JM+TI: OSX-specific TDHTML regression due to releasing scripts only during GC → Consider pattern-matching setTimeout("foo()")
Assignee: general → nobody
Severity: normal → S3

If I understand correctly, this issue has been investigated, and the conclusion seems to be that it would be unreasonable to handle these corner cases.

So maybe we should not keep this bug open; if this becomes an issue again, it would be rediscovered in our ongoing work on optimizing Firefox today.

Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → WONTFIX