Closed Bug 754303 Opened 12 years ago Closed 12 years ago

Periodic extended GC pauses when running Coppercube Backyard WebGL demo

Categories

(Core :: Graphics: CanvasWebGL, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla15
Tracking Status
firefox14 --- fixed
firefox15 --- fixed

People

(Reporter: cers, Assigned: bjacob)

References

Details

(Whiteboard: [js:p2])

Attachments

(1 file, 1 obsolete file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:15.0) Gecko/15.0 Firefox/15.0a1
Build ID: 20120510145529

Steps to reproduce:

1) Go to http://www.ambiera.com/coppercube/demo.php?demo=backyard&mode=webgl
2) Click "Click here to start loading the demo"


Actual results:

After the demo has loaded, it runs smoothly but is regularly interrupted for extended periods by GC. It appears to be 2-5s pauses every 4-5s (that is, 4-5s after last pause)


Expected results:

Demo should run smoothly (or at least at a stable speed)
Assignee: nobody → general
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
QA Contact: untriaged → general
Depends on: 735099
I still get huge GC pauses on this, even with incremental GC enabled (though this is with a few other tabs open).  Almost all of the time is in "end callback", whatever that is. (I tried it on OSX.)

GC(T+3355.4) Total Time: 4166.2ms, Compartments Collected: 1, Total Compartments: 370, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 4122.6ms, Allocated: 217MB, +Chunks: 3, -Chunks: 0
    Slice: 0, Pause: 26.0 (When: 26.0ms, Reason: MAYBEGC): Mark: 22.9ms, Mark Roots: 12.9ms, Discard Code: 2.7ms
    Slice: 3, Pause: 4122.6 (When: 4272.3ms, Reason: REFRESH_FRAME): Mark: 1.0ms, Mark Other: 0.5ms, Finalize Start Callback: 0.4ms, Sweep: 173.6ms, Sweep Compartments: 7.5ms, Sweep Object: 158.7ms, Discard Code: 1.5ms, Discard Analysis: 6.0ms, Discard TI: 0.2ms, Sweep Types: 0.1ms, Finalize End Callback: 2.6ms, End Callback: 3948.2ms
    Totals: Mark: 41.1ms, Mark Roots: 12.9ms, Mark Other: 0.5ms, Finalize Start Callback: 0.4ms, Sweep: 173.6ms, Sweep Compartments: 7.5ms, Sweep Object: 158.7ms, Discard Code: 4.2ms, Discard Analysis: 6.0ms, Discard TI: 0.2ms, Sweep Types: 0.1ms, Finalize End Callback: 2.6ms, End Callback: 3948.2ms

GC(T+3344.7) Total Time: 2516.5ms, Compartments Collected: 1, Total Compartments: 364, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 2472.7ms, Allocated: 213MB, +Chunks: 21, -Chunks: 143
    Slice: 0, Pause: 26.8 (When: 26.8ms, Reason: MAYBEGC): Mark: 24.3ms, Mark Roots: 14.3ms, Discard Code: 2.1ms
    Slice: 3, Pause: 2472.7 (When: 2667.9ms, Reason: REFRESH_FRAME): Mark: 5.5ms, Mark Other: 2.7ms, Finalize Start Callback: 0.4ms, Sweep: 136.2ms, Sweep Compartments: 7.4ms, Sweep Object: 121.5ms, Discard Code: 1.0ms, Discard Analysis: 6.3ms, Discard TI: 0.2ms, Sweep Types: 0.1ms, Finalize End Callback: 2.6ms, End Callback: 2333.3ms
    Totals: Mark: 46.4ms, Mark Roots: 14.3ms, Mark Delayed: 0.2ms, Mark Other: 2.7ms, Finalize Start Callback: 0.4ms, Sweep: 136.2ms, Sweep Compartments: 7.4ms, Sweep Object: 121.5ms, Discard Code: 3.2ms, Discard Analysis: 6.3ms, Discard TI: 0.2ms, Sweep Types: 0.1ms, Finalize End Callback: 2.6ms, End Callback: 2333.3ms
Status: UNCONFIRMED → NEW
Ever confirmed: true
The "End Callback" is responsible for doing the deferred release of C++ refcounted objects. So we have a bunch of wrapped natives or something, and they all get collected, and we end up running a lot of C++ destructors, which takes a while.

Bug 743112 was designed to help here. I need to get back to it--it was pretty orange the last time I worked on it. However, we should probably try to understand why we're doing so much releasing. Is it because there are tons of objects, or is it a few objects that are very slow to be destroyed? Given how little time we spend in the mark and sweep phases, it seems likely to be the latter. It might not be a bad idea to profile this thing and see if any destructors show up.
CC times didn't seem that bad, so I would guess that there aren't a huge number of wrapped natives.
Depends on: 743112
No longer depends on: 735099
Whiteboard: [js:p2]
I used a profiling build and the new chrome hang functionality to get call stacks from the hangs lasting more than a second. They all look the same:

??? (in MSVCR100.dll)
nsTArray_base::ShiftData(unsigned int,unsigned int,unsigned int,unsigned int,unsigned int) (in xul.dll)
nsTArray::Entry,nsTArrayDefaultAllocator>::RemoveElementsAt(unsigned int,unsigned int) (in xul.dll)
nsTArray::Entry,nsTArrayDefaultAllocator>::RemoveElementAt(unsigned int) (in xul.dll)
nsTArray::Entry,nsTArrayDefaultAllocator>::RemoveElementSorted::Entry,mozilla::WebGLFastArray::Entry::Comparator>(mozilla::WebGLFastArray::Entry const &,mozilla::WebGLFastArray::Entry::Comparator const &) (in xul.dll)
mozilla::WebGLFastArray::RemoveElement(unsigned __int64) (in xul.dll)
mozilla::WebGLUniformLocation::Delete() (in xul.dll)
mozilla::WebGLRefCountedObject::DeleteOnce() (in xul.dll)
mozilla::WebGLUniformLocation::~WebGLUniformLocation() (in xul.dll)
mozilla::WebGLUniformLocation::`scalar deleting destructor'(unsigned int) (in xul.dll)
mozilla::WebGLUniformLocation::Release() (in xul.dll)
DoDeferredRelease (in xul.dll)
XPCJSRuntime::GCCallback(JSRuntime *,JSGCStatus) (in xul.dll)
Collect (in mozjs.dll)
js::GCSlice(JSRuntime *,js::JSGCInvocationKind,js::gcreason::Reason) (in mozjs.dll)
js::MaybeGC(JSContext *) (in mozjs.dll)
nsJSContext::ScriptEvaluated(bool) (in xul.dll)
nsCxPusher::Pop() (in xul.dll)
nsJSContext::CallEventHandler(nsISupports *,JSObject *,JSObject *,nsIArray *,nsIVariant * *) (in xul.dll)
nsGlobalWindow::RunTimeout(nsTimeout *) (in xul.dll)
nsGlobalWindow::TimerCallback(nsITimer *,void *) (in xul.dll)
nsTimerImpl::Fire() (in xul.dll)
nsTimerEvent::Run() (in xul.dll)
nsThread::ProcessNextEvent(bool,bool *) (in xul.dll)
mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) (in xul.dll)
MessageLoop::RunHandler() (in xul.dll)
MessageLoop::Run() (in xul.dll)
nsBaseAppShell::Run() (in xul.dll)
nsAppShell::Run() (in xul.dll)
nsAppStartup::Run() (in xul.dll)
XREMain::XRE_mainRun() (in xul.dll)
XREMain::XRE_main(int,char * * const,nsXREAppData const *) (in xul.dll)
XRE_main (in xul.dll)
wmain (in firefox.exe)
__tmainCRTStartup (in firefox.exe)
??? (in kernel32.dll)
??? (in ntdll.dll)
??? (in ntdll.dll)
Thanks, Vladan. This looks like a WebGL problem.
Assignee: general → nobody
Component: JavaScript Engine → Canvas: WebGL
QA Contact: general → canvas.webgl
WebGLUniformLocation objects can't be kept alive by the WebGL context or by other WebGL objects. So all the bookkeeping we're doing here, is not needed. Removing.
Attachment #626124 - Flags: review?(jmuizelaar)
Removed mMonotonicHandle as well.
Attachment #626124 - Attachment is obsolete: true
Attachment #626124 - Flags: review?(jmuizelaar)
Attachment #626126 - Flags: review?(jmuizelaar)
Attachment #626126 - Flags: review?(jmuizelaar) → review+
http://hg.mozilla.org/integration/mozilla-inbound/rev/5f14275bb276
Assignee: nobody → bjacob
Target Milestone: --- → mozilla15
Comment on attachment 626126 [details] [diff] [review]
WebGLUniformLocation doesn't need to be a WebGLRefCountedObject, i.e. all that work we're doing here is useless

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 707033
User impact if declined: Performance issue: Long GC pauses during certain WebGL animations (not all WebGL animations, just those that create a zillion WebGLUniformLocation's)
Testing completed (on m-c, etc.): m-i
Risk to taking this patch (and alternatives if risky): not risky at all. Just removal of useless code.
String or UUID changes made by this patch: none
Attachment #626126 - Flags: approval-mozilla-aurora?
OS: Mac OS X → All
Hardware: x86 → All
https://hg.mozilla.org/mozilla-central/rev/5f14275bb276
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Filed Bug 758404 about getting rid of the remaining arrays, in favor of MFBT LinkedList, for other WebGL object types. Has patch.
See Also: → 758404
Comment on attachment 626126 [details] [diff] [review]
WebGLUniformLocation doesn't need to be a WebGLRefCountedObject, i.e. all that work we're doing here is useless

[Triage Comment]
Low risk perf fix. Approved for Aurora 14.
Attachment #626126 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: