
IonMonkey: Investigate perf of realtime JS raytracer

RESOLVED WORKSFORME

Status

Product: Core
Component: JavaScript Engine
Status: RESOLVED WORKSFORME
Opened: 7 years ago
Last modified: 7 months ago

People

(Reporter: dmandelin, Unassigned)

Tracking

(Blocks: 1 bug, {perf})

Version: Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(1 attachment)

(Reporter)

Description

7 years ago
See the URL. Chrome gets about 30fps on my machine. We don't go above 5fps.
(Reporter)

Comment 1

7 years ago
I forgot to say that jandem originally reported this, and also observed we use a lot of memory running it, which may be related to the slowness.
Keywords: perf
Comment 2

7 years ago
At least on Mac, a profile looks like this:

10% free() calls on the background GC thread.
20% in JM-generated jit code.
26% js_GC called from stubs::Interrupt; of this 20% is marking, including what
    looks like a bunch of arrays, and the rest is in js_GC itself.  Almost half
    the marking is JSScopeProperty::trace.
18% calling stubs::SetName.  About 2/3 of this is under js_GetMutableScope
    (mostly the malloc call and js_InitTitle and self time) and the other 1/3 in
    SetName itself.
 8% calling stubs::New.  Mostly self time and js_GetProperty.
 3% calling stubs::Call.
 2% calling stubs::NewInitObject.
 1% canvas drawImage calls.

I also did an L2 miss profile, since that seemed like an obvious thing to check:

22% of our L2 misses were on the background gc thread.  
32% of misses are in (not under) PL_DHashTableOperate called from
    GCGraphBuilder::NoteScriptChild under array_trace.
10% of misses are in ChangeTable, called from the above-mentioned
    PL_DHashTableOperate.
10% of misses are malloc calls under js_GetMutableScope from SetName.

Comment 3

7 years ago
Might be worth retesting, in lieu of bug 558451 comment 154.

Comment 4

7 years ago
err, "in view of".  And Gregor already did, and he said the profile looked a lot different.
Would be good to get the testcase attached to this bug so it won't go away...
On TM, the patch linked to from comment 3 made us go from 1.5fps to 2.5fps.  ;)

Once that merges to JM, should remeasure and reprofile.
I filed bug 592007 because the GC heuristics don't work right in the browser with the new Scope patch. There is also a file included that shows the GC times before and after the Scope patch.
I experience the same issue in the Linux 4.0b7pre build (rev: f5c0015afe0e).

Oprofile results:
http://pastebin.mozilla.org/795750
OK, I just reprofiled (on m-c tip).

52% of the time is spent in methodjit-generated code.
24% is spent under gc (called from js_NewFinalizableGCThing).
 8% is under stubs::NewObject but outside gc (js_GetPropertyHelper, self time
    for NewObject, etc)

The rest is a few things under 2% each (drawImage on the canvas, js_math_floor, stubs::SetElem<0>, RunTracer, stubs::NewInitObject, InitPropOrMethod, etc.).

Interestingly, with tracer on we're at 14fps for a bit before dropping to 7fps; I profiled the 7fps steady-state.
Duplicate of this bug: 602268
Created attachment 481918 [details]
Part of the problem

The raytracer creates canvas elements and uses them as image buffers. On the attached test case JM is 2x slower than Chrome (TM is 5x faster than Chrome, though). For 10 frames, filling the buffer (the loop body) is 55 ms faster in Chrome (25 ms vs our 80 ms).

Canvas elements are also used for textures, so fixing bug 594247 will help JM here.
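
To make the measured pattern concrete, here is a minimal sketch of the canvas-as-image-buffer idiom described above (hypothetical names and sizes, not the attached test case): grab the canvas's ImageData once and rewrite every pixel of it in a tight loop each frame. That per-pixel fill loop is the "loop body" being timed.

// Minimal sketch of the canvas-as-buffer pattern (hypothetical sizes/names,
// not the attached test case).
var canvas = document.createElement("canvas");
canvas.width = 256;
canvas.height = 256;
var ctx = canvas.getContext("2d");
var frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
var pixels = frame.data;  // 4 bytes per pixel (RGBA)

function fillBuffer(t) {
  // The hot loop: rewrite every pixel of the buffer each frame.
  for (var i = 0; i < pixels.length; i += 4) {
    var shade = (i + t) & 0xff;   // stand-in for the real shading math
    pixels[i]     = shade;  // R
    pixels[i + 1] = shade;  // G
    pixels[i + 2] = shade;  // B
    pixels[i + 3] = 255;    // A
  }
  ctx.putImageData(frame, 0, 0);
}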

Updated

7 years ago
Depends on: 594247

Updated

7 years ago
Attachment #481918 - Attachment is patch: false

Updated

7 years ago
Attachment #481918 - Attachment mime type: text/plain → text/html

Comment 12

6 years ago
I can still reproduce this after the fix for bug 594247:
Minefield 4.0b13pre/2011023: 12 fps (with hangs every few seconds)
Chrome 11.0.672.2          : 27 fps
(Reporter)

Comment 13

6 years ago
Rereprofiled on Win 7. Top stuff as a percentage of JS time:

 GC (mark/sweep/etc)                40%
 AddPropertyHelper/AddAtomProperty   9
 Object::init                        7
 MathCache::lookup                   5
 DOUBLE_IS_INT32                     3

It looks like nothing short of a new GC will make this run well.
Comment 14

Yes, this needs a generational GC.
We only perform per-compartment GCs, but the set of reachable objects is still more than 3.1 million.
We finalize about 4 million objects per GC, which accounts for about 30% of the total GC time.
Moving finalization to the background would help, but the marking is the expensive part here.
This is a very good example where we create a fixed set of 3149831 long-lived objects and afterwards only create short-lived objects. The set of long-lived objects stays at exactly 3149831 for the whole animation.
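
To illustrate the heap shape described above (an illustrative sketch only, not the raytracer's code; the object count is the one from this comment): a fixed long-lived working set is built once at startup, and every frame afterwards allocates only temporaries that die immediately. That is exactly the pattern a generational (nursery) collector handles cheaply, and the worst case for repeatedly marking the whole heap.

// Illustrative sketch of the allocation pattern, not the raytracer's code.
// Long-lived set: built once and kept reachable for the whole animation
// (3149831 matches the count measured above).
var scene = [];
for (var i = 0; i < 3149831; i++) {
  scene.push({ x: i, y: 2 * i, z: 0 });
}

function renderFrame() {
  // Per-frame temporaries: vectors, hit records, colors, and so on.
  // None of them survive the frame, so a generational GC could reclaim
  // them from the nursery instead of re-marking the 3.1M-object scene.
  var brightness = 0;
  for (var j = 0; j < scene.length; j++) {
    var v = { x: scene[j].x + 1, y: scene[j].y - 1, z: scene[j].z };  // short-lived
    brightness += v.x * 0.001;  // stand-in for shading work
  }
  return brightness;
}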
Blocks: 619558
No longer blocks: 619558
Depends on: 619558
Nothing earth-shatteringly new to report here.

Current Chrome Canary gets between 90 and 120 fps (though the rotation looks anything but smooth).

Current Nightly consistently gets above 50 fps, so that's great. However, it's stuttering horribly, so the fps counter might be a bit broken. Looking forward to seeing how GGC is going to fare here.
Summary: JM: Investigate perf of realtime JS raytracer → IonMonkey: Investigate perf of realtime JS raytracer
(Assignee)

Updated

3 years ago
Assignee: general → nobody

Comment 16

2 years ago
On Ultra:
Chrome 39: 6 fps
Nightly: 8 fps

On Low:
Chrome 39: 160 fps
Nightly: 200 fps (but slows down every 3~4s)
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WORKSFORME
The periodic slowdown might be interesting from a GC point of view. From comment #14, we shouldn't be able to GC anything long-lived (but that doesn't necessarily mean we won't try). Terrence, worth a look?
Flags: needinfo?(terrence)
I looked at the realtime raytracer relatively recently. The test creates hundreds of MiB of garbage per second in the tenured heap; we need to find out why the nursery is so lightly used on this test. This in itself isn't too bad, as the incremental GCs appear to be mostly working okay, at least on my machine. However, creating this much garbage consistently does expose a current problem: we seem to be releasing and immediately reacquiring hundreds of MiB of chunk addresses every second because we have a 30 MiB cap on empty chunks. We need to develop better limits here, but that is largely dependent on getting better control of our GC triggers.
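
For scale, a rough back-of-envelope sketch (all numbers here are hypothetical assumptions, not measurements from the test) of how per-pixel temporaries can reach the garbage rates mentioned above:

// Back-of-envelope arithmetic with made-up numbers; nothing here is measured.
var width = 640, height = 480;   // hypothetical buffer size
var fps = 30;                    // hypothetical frame rate
var tempsPerPixel = 2;           // e.g. a ray and a color object per pixel
var bytesPerObject = 32;         // rough per-object cost including slots

var bytesPerSecond = width * height * fps * tempsPerPixel * bytesPerObject;
console.log((bytesPerSecond / (1024 * 1024)).toFixed(0) + " MiB/s of short-lived allocation");
// ~560 MiB/s with these assumptions; if that volume lands in the tenured heap
// instead of the nursery, the GC has to sweep it every second.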
See Also: → bug 1301213
I split this off as bug 1301213.
Flags: needinfo?(terrence)