Closed
Bug 1204837
Opened 8 years ago
Closed 8 years ago
Serious performance regression in FxOS
Categories
(Firefox OS Graveyard :: Performance, defect, P1)
Tracking
(blocking-b2g:2.5+, firefox44 fixed)
Tracking | Status | |
---|---|---|
firefox44 | --- | fixed |
People
(Reporter: mlien, Assigned: cyu)
References
Details
(Keywords: dev-doc-complete, perf, regression, Whiteboard: [MemShrink:P1])
Attachments
(2 files, 2 obsolete files)
1.52 KB,
patch
|
khuey
:
review+
|
Details | Diff | Splinter Review |
3.43 KB,
patch
|
khuey
:
review+
nical
:
review+
|
Details | Diff | Splinter Review |
On Tue Sep. 15 2015, Raptor detected serious regressions in FxOS. Every application's cold launch time is regressed by more than 1.5 seconds and memory usage grows more than 7MB. This could point to an issue in the System or the platform. --- Device: Aries-kk Memory: 2048 Branch: master --- Gaia revision: 02a15ec81e0bece9 Previous Gaia revision: 47f0ac6cf4c4525c Gecko revision: 13d2e06d95f07770 Previous Gecko revision: 0baf29d2d17a19aa --- Performance numbers [1][2] on Raptor. [1] http://raptor.mozilla.org/#/dashboard/script/apps.js?device=aries&branch=master&memory=2048&series=coldlaunch.visuallyLoaded [2] http://raptor.mozilla.org/#/dashboard/script/apps-memory.js?device=aries&branch=master&memory=2048
Comment 1•8 years ago
|
||
This is blocker. Ting, could you help to test this on regression bot? Thanks.
blocking-b2g: 2.5? → 2.5+
Flags: needinfo?(janus926)
Flags: needinfo?(anygregor)
Comment 2•8 years ago
|
||
Is bisecting now, will update later. Kepp NI.
Assignee | ||
Comment 3•8 years ago
|
||
It's bug 1152046. It's supposedly backed out, but the backout was backed out so the patch is still on central.
Comment 5•8 years ago
|
||
We need to get bug 1166923 fixed.
Assignee | ||
Comment 6•8 years ago
|
||
Assignee: nobody → cyu
Attachment #8661735 -
Flags: review?(khuey)
Comment 7•8 years ago
|
||
Does that also explain the memory usage regression?
Comment 8•8 years ago
|
||
would the proposed patch (probably?) make the backout of 1152046 successful on b2g? Any idea why this dependence formed? - 1152046 was only on m-c for a short period.
Flags: needinfo?(cyu)
Comment 9•8 years ago
|
||
Note: I backed out bug 1152046 (again) locally, since I was insta-crashing with it in. That lets me run, but testing https://mozilla.github.com/webrtc-landing/pc_test.html crashes in the same place (jemalloc/internal/bitmap.h) when in e10s - but not in a non-e10s profile. So it's very touchy and machine/perf/allocation/? dependent.
Assignee | ||
Comment 10•8 years ago
|
||
(In reply to [:fabrice] Fabrice Desré from comment #7) > Does that also explain the memory usage regression? Yes, because content processes are forked from b2g.
Flags: needinfo?(cyu)
Assignee | ||
Comment 11•8 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #8) > would the proposed patch (probably?) make the backout of 1152046 successful > on b2g? Any idea why this dependence formed? - 1152046 was only on m-c for a > short period. This patch doesn't make the backout successful. It's a forward fix for the performance regression of bug 1152046 on b2g. 1152046 is still on m-c because the backout itself is backed out.
Updated•8 years ago
|
Attachment #8661735 -
Attachment is patch: true
Attachment #8661735 -
Flags: review?(khuey) → review+
Assignee | ||
Comment 12•8 years ago
|
||
url: https://hg.mozilla.org/integration/b2g-inbound/rev/99c11d836978425de6256aa5baf355843f39f400 changeset: 99c11d836978425de6256aa5baf355843f39f400 user: Cervantes Yu <cyu@mozilla.com> date: Thu Sep 17 14:36:06 2015 +0800 description: Bug 1204837: Make ClosingService thread Nuwa-friedly. r=khuey
Updated•8 years ago
|
Severity: major → blocker
Priority: -- → P1
Comment 13•8 years ago
|
||
Backout: https://hg.mozilla.org/integration/b2g-inbound/rev/2b23ccfe78d8
Assignee | ||
Comment 14•8 years ago
|
||
When we see mochitest shutdown timeout, the chrome process has: We have the main thread waiting for the IO thread: > Thread 1 (Thread 927.927): > #0 __futex_syscall3 () at bionic/libc/arch-arm/bionic/atomics_arm.S:183 > #1 0x4008e56c in __pthread_cond_timedwait_relative (cond=0x2f2d4, mutex=0x400c433c <gThreadListLock>, reltime=0x0) at bionic/libc/bionic/pthread.c:1477 > #2 0x4008e620 in __pthread_cond_timedwait (cond=0x2f2d4, mutex=0x400c433c <gThreadListLock>, abstime=0x0, clock=0) at bionic/libc/bionic/pthread.c:1500 > #3 0x4008e748 in pthread_join (thid=193200, ret_val=0x0) at bionic/libc/bionic/pthread.c:654 > #4 0x40f8d6f2 in base::Thread::Stop (this=this@entry=0x40441268) at ipc/chromium/src/base/thread.cc:122 > #5 0x40f9b794 in mozilla::ipc::BrowserProcessSubThread::~BrowserProcessSubThread (this=0x40441268, __in_chrg=<optimized out>) at ipc/glue/BrowserProcessSubThread.cpp:57 > #6 0x40f9b7e0 in mozilla::ipc::BrowserProcessSubThread::~BrowserProcessSubThread (this=0x40441268, __in_chrg=<optimized out>) at ipc/glue/BrowserProcessSubThread.cpp:63 > #7 0x40dca4d8 in mozilla::ShutdownXPCOM (aServMgr=<optimized out>) at xpcom/build/XPCOMInit.cpp:1027 > #8 0x423326fa in ScopedXPCOMStartup::~ScopedXPCOMStartup (this=0x4042f338, __in_chrg=<optimized out>) at toolkit/xre/nsAppRunner.cpp:1499 > #9 0x4233272a in mozilla::DefaultDelete<ScopedXPCOMStartup>::operator() (aPtr=0x4042f338, this=<optimized out>) at ../../dist/include/mozilla/UniquePtr.h:482 > #10 0x423377ca in reset (aPtr=0x0, this=0xbea4e7ec) at ../../dist/include/mozilla/UniquePtr.h:309 > #11 operator= (this=0xbea4e7ec) at ../../dist/include/mozilla/UniquePtr.h:279 > #12 XREMain::XRE_main (this=this@entry=0xbea4e7d8, argc=argc@entry=1, argv=argv@entry=0x4042f288, aAppData=aAppData@entry=0x2da88 <_ZL8sAppData>) at toolkit/xre/nsAppRunner.cpp:4410 > #13 0x42337962 in XRE_main (argc=1, argv=0x4042f288, aAppData=0x2da88 <_ZL8sAppData>, aFlags=<optimized out>) at toolkit/xre/nsAppRunner.cpp:4486 > #14 0x00010e26 in do_main (argc=argc@entry=1, argv=argv@entry=0x4042f288) at b2g/app/nsBrowserApp.cpp:167 > #15 0x00010f98 in b2g_main (argc=1, argv=<optimized out>) at b2g/app/nsBrowserApp.cpp:299 > #16 0x00010ca8 in RunProcesses (aReservedFds=..., argv=0xbea4fac4, argc=1) at b2g/app/B2GLoader.cpp:232 > #17 main (argc=1, argv=0xbea4fac4) at b2g/app/B2GLoader.cpp:297 > (gdb) fr 7 > #7 0x40dca4d8 in mozilla::ShutdownXPCOM (aServMgr=<optimized out>) at xpcom/build/XPCOMInit.cpp:1027 > 1027 delete sIOThread; Where the IO thread notifies that the thread that ImageBrigeParent runs on has channel error, but the thread was already shut down: > > Thread 4 (Thread 927.942): > #0 __futex_syscall3 () at bionic/libc/arch-arm/bionic/atomics_arm.S:183 > #1 0x4008e264 in _normal_lock (mutex=<optimized out>) at bionic/libc/bionic/pthread.c:951 > #2 pthread_mutex_lock (mutex=0x483ffeb8) at bionic/libc/bionic/pthread.c:1041 > #3 0x4002bbd0 in __wrap_pthread_mutex_lock (mtx=0x483ffeb8) at mozglue/build/Nuwa.cpp:1308 > #4 0x40f85ea6 in LockImpl::Lock (this=this@entry=0x483ffeb8) at ipc/chromium/src/base/lock_impl_posix.cc:41 > #5 0x40f8a348 in Acquire (this=0x483ffeb8) at ipc/chromium/src/base/lock.h:16 > #6 AutoLock (lock=..., this=0x44b9cc48) at ipc/chromium/src/base/lock.h:43 > #7 MessageLoop::PostTask_Helper (this=0x483ffdf0, from_here=..., task=<optimized out>, delay_ms=delay_ms@entry=0, nestable=nestable@entry=true) at ipc/chromium/src/base/message_loop.cc:326 > #8 0x40f8a38c in MessageLoop::PostTask (this=<optimized out>, from_here=..., task=<optimized out>) at ipc/chromium/src/base/message_loop.cc:267 > #9 0x40f9beb2 in mozilla::ipc::MessageChannel::PostErrorNotifyTask (this=0x48bf8b38) at ipc/glue/MessageChannel.cpp:1836 > #10 0x40f9eda8 in mozilla::ipc::ProcessLink::OnChannelError (this=0x47aee9e0) at ipc/glue/MessageLink.cpp:429 > #11 0x40f9261e in IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking (this=0x4a1e3800, fd=78) at ipc/chromium/src/chrome/common/ipc_channel_posix.cc:879 > #12 0x40f852ae in base::MessagePumpLibevent::OnLibeventNotification (fd=78, flags=<optimized out>, context=0x4a1e3800) at ipc/chromium/src/base/message_pump_libevent.cc:231 > #13 0x40f831e2 in event_process_active_single_queue (activeq=0x4042f2a8, base=0x40467180) at ipc/chromium/src/third_party/libevent/event.c:1350 > #14 event_process_active (base=<optimized out>) at ipc/chromium/src/third_party/libevent/event.c:1420 > #15 event_base_loop (base=0x40467180, flags=1) at ipc/chromium/src/third_party/libevent/event.c:1621 > #16 0x40f86b26 in base::MessagePumpLibevent::Run (this=0x4040c6e0, delegate=0x44b9cdf0) at ipc/chromium/src/base/message_pump_libevent.cc:357 > #17 0x40f87a78 in MessageLoop::RunInternal (this=this@entry=0x44b9cdf0) at ipc/chromium/src/base/message_loop.cc:234 > #18 0x40f87a92 in RunHandler (this=0x44b9cdf0) at ipc/chromium/src/base/message_loop.cc:227 > #19 MessageLoop::Run (this=this@entry=0x44b9cdf0) at ipc/chromium/src/base/message_loop.cc:201 > #20 0x40f8e2fc in base::Thread::ThreadMain (this=0x40441268) at ipc/chromium/src/base/thread.cc:170 > #21 0x40f8c2a4 in ThreadFunc (closure=<optimized out>) at ipc/chromium/src/base/platform_thread_posix.cc:39 > #22 0x4008ee2c in __thread_entry (func=0x40f8c29d <ThreadFunc(void*)>, arg=0x40441268, tls=<optimized out>) at bionic/libc/bionic/pthread.c:217 > #23 0x4008e97c in pthread_create (thread_out=<optimized out>, attr=0xbea4e640, start_routine=0x40f8c29d <ThreadFunc(void*)>, arg=0x40441268) at bionic/libc/bionic/pthread.c:357 > #24 0x00000000 in ?? () > Backtrace stopped: previous frame identical to this frame (corrupt stack?) > The channel listener is ImageBridgeParent: > #9 0x40f9beb2 in mozilla::ipc::MessageChannel::PostErrorNotifyTask (this=0x48bf8b38) at ipc/glue/MessageChannel.cpp:1836 > 1836 mWorkerLoop->PostTask(FROM_HERE, mChannelErrorTask); > (gdb) p *mListener.mRef.mPtr.mPtr > $3 = (mozilla::layers::ImageBridgeParent) { > <mozilla::layers::PImageBridgeParent> = { > <mozilla::ipc::IProtocol> = { > ... (omitted for brevity)
Assignee | ||
Comment 15•8 years ago
|
||
The ImageBridgeParent instance belongs to the preallocated process. It might be that the preallocated process was started too late so that its shutdown deadlocks the chrome process. I think we need to disable the preallocated process to stabilize the tests.
Assignee | ||
Comment 16•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=46c5a86b6c01
Assignee | ||
Comment 17•8 years ago
|
||
Let's see if bug 1168337 fixes this shutdown deadlock.
Updated•8 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 18•8 years ago
|
||
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #17) > Let's see if bug 1168337 fixes this shutdown deadlock. Still having mochitest shutdown hang. There must be another bug that prevents the shutdown process from completion.
Comment 19•8 years ago
|
||
Further effect of this bug appears to be that the preallocated process does not exist, which makes profiling app startup difficult.
Updated•8 years ago
|
Keywords: dev-doc-needed
Updated•8 years ago
|
Whiteboard: [MemShrink]
Comment 20•8 years ago
|
||
We need to get this fixed asap. It has been 3 weeks now. Whats the exact status here? I see backed out patches, dependent bugs and 'see also' bugs. What needs to be done to fix this?
Flags: needinfo?(anygregor) → needinfo?(cyu)
Updated•8 years ago
|
Whiteboard: [MemShrink] → [MemShrink:P1]
Assignee | ||
Comment 21•8 years ago
|
||
Bug 1152046 cannot be backed out because of other dependencies that lands after it. It has a forward fix to disable the feature on b2g and fix this regression, but landing it reveals another regression that snicked in after the main patch of 1152046 lands and breaks Nuwa (comment #14). I am working on a forward fix of comment #14 so we can land the 2 forward fixes (one for comment #14 of this bug and the other in comment #86 of bug 1152046).
Flags: needinfo?(cyu)
Assignee | ||
Comment 22•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc76f57a3482
Assignee | ||
Comment 23•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7d7d4dc6c4d3
Assignee | ||
Comment 24•8 years ago
|
||
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #23) > https://treeherder.mozilla.org/#/jobs?repo=try&revision=7d7d4dc6c4d3 I think bug 1127270 caused this regression in comment 14: the cloned ImageBridgeParent and CrossProcessCompositorParent actors doesn't have their OnChannelConnected() called and miss the references to the compositor thread. Call OnChannelConnected() when cloning the actors. This should fix the problem.
Assignee | ||
Comment 25•8 years ago
|
||
Attachment #8667870 -
Flags: review?(khuey)
Assignee | ||
Comment 26•8 years ago
|
||
Added comments and assertions.
Attachment #8667870 -
Attachment is obsolete: true
Attachment #8667870 -
Flags: review?(khuey)
Attachment #8668229 -
Flags: review?(khuey)
Assignee | ||
Updated•8 years ago
|
Attachment #8668229 -
Flags: review?(nical.bugzilla)
Assignee | ||
Comment 27•8 years ago
|
||
Remove the irrelevant hunk.
Attachment #8668229 -
Attachment is obsolete: true
Attachment #8668229 -
Flags: review?(nical.bugzilla)
Attachment #8668229 -
Flags: review?(khuey)
Assignee | ||
Updated•8 years ago
|
Attachment #8668233 -
Flags: review?(khuey)
Assignee | ||
Updated•8 years ago
|
Attachment #8668233 -
Flags: review?(nical.bugzilla)
Assignee | ||
Comment 28•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=279fb0a101e2
Updated•8 years ago
|
Attachment #8668233 -
Flags: review?(nical.bugzilla) → review+
Assignee | ||
Comment 29•8 years ago
|
||
https://hg.mozilla.org/integration/b2g-inbound/rev/90c9af6a9708445d2e96746c34ab419376e9bf69 Bug 1204837, Part 1: Make ClosingService thread Nuwa-friedly. r=khuey https://hg.mozilla.org/integration/b2g-inbound/rev/2d4f2af678ee9149baee5f3542950d8f38f66994 Bug 1204837, Part 2: Cloned ImageBridgeParent and CompositorParent should hold a reference to the compositor thread. r=nical
Assignee | ||
Comment 30•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=efd2737d3f80
Attachment #8668233 -
Flags: review?(khuey) → review+
Comment 31•8 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/90c9af6a9708 https://hg.mozilla.org/mozilla-central/rev/2d4f2af678ee
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox44:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → FxOS-S9 (16Oct)
Updated•7 years ago
|
Keywords: dev-doc-needed → dev-doc-complete
You need to log in
before you can comment on or make changes to this bug.
Description
•