Closed
Bug 1094010
Opened 10 years ago
Closed 9 years ago
[meta] Backlog for profiling on quad core device
Categories
(Firefox OS Graveyard :: Performance, defect, P2)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: sinker, Unassigned)
References
Details
Before we start to improve B2G for quad core devices, we need to do profiling to learn behavior of current B2G. This bug here to
1. define profiling use cases and scenarios,
2. do profiling,
3. investigate profiling data.
Reporter | ||
Updated•10 years ago
|
Summary: Profile B2G for Quad core platforms as baseline of improvements → Profile B2G for Quad core platforms as the baseline of improvements
Comment 1•10 years ago
|
||
As the requirement was raised from SMS app launch time, currently we think the test cases could be:
1. App launch time (template, sms, etc.)
2. UI benchmark which includes DOM manipulation and causes layout changes
Any other ideas?
Reporter | ||
Comment 2•10 years ago
|
||
We need to watch on start-up time, FPS of some animations, and response time of actions. For example,
- start-up time of pre-installed apps,
- FPS of scrolling of FB, twitter, and swipe of homescreen,
- response time of keyboard.
Not just SPS profiler, we may need to run TaskTracer and other tools to get different views of behavior.
Comment 3•10 years ago
|
||
I just talked to Jerry, cpu scheduling systrace captured does not relate to profiler labels, and it can record cpu frequency, loading, and idle so should be good to understand how busy cpu is on each core.
Reporter | ||
Updated•10 years ago
|
Assignee: nobody → kchen
Comment 4•10 years ago
|
||
15:53 <kanru`> TODO: define app launch end
15:57 <kanru`> TODO: profile template app
16:00 <kanru`> TODO: mark each stage of web page rendering: network kickoff, frame construction, layer construction/paint, etc
16:10 <kanru`> TOOD: find out if JAR file is partially unzip
Comment 5•10 years ago
|
||
Profile Messages app launch time which has reference-workload-medium installed.
Configuration:
version: 2.2 (trunk)
gecko: c44a46f049c3fda5bbf2c4371364e154942c250c
gaia: 2f077d05105f227839dbecb22cb5324f1321b934
Commands:
sps: $ ./profile.sh start -p b2g && ./profile.sh start -p Homescreen && ./profile.sh start -p [preallocated pid]; sleep 5; ./profile.sh capture
systrace: $ systrace.py --time=5 --buf-size=10240 -o mynewtrace.html sched load freq idle
Steps:
1) Exeucte sps and systrace commands from two terminals simutaneoulsy
2) Touch Messages app icon
Results:
Nexus-4
sps http://people.mozilla.org/~bgirard/cleopatra/#report=4bd449e91e5592dd8181d26f5dd04b81d0702fa6
systrace http://www.googledrive.com/host/0B0yKxuGl8-XATld5SkdmQ2FPZms
Nexus-5
sps http://people.mozilla.org/~bgirard/cleopatra/#report=9e925256793041c656146df19ad688cb54c3881d
systrace http://www.googledrive.com/host/0B0yKxuGl8-XANWZ2bklhQmExajA
Notes:
I added profiler label to OnStartRequest() and OnStopRequest() of nsJARChannel.
Comment 6•10 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #5)
> Nexus-5
> sps http://people.mozilla.org/~bgirard/cleopatra/#report=9e925256793041c656146df19ad688cb54c3881d
I checked the 1.5s after the first DispatchEvent.
Synchronous IPCs block SMS main thread because B2G main is busy:
1. 46ms [1524,1571] RecvShow
1a. SendPRenderFrameConstructor (bug 1085655)
1b. SendPLayerTransactionConstructor (bug 1085655)
1c. BrowserElementChild.js SendSyncMessage (bug 1003848)
2. 65ms [1583,1649] RecvUpdateDimensions, note this is not observed in bug 1074783.
2a. SendGetDPI
2c. SendGetDefaultScale
Long run task on B2G main thread:
1. 68ms [1463,1532] awf_launch() @ app_window_factory.js
2. 17ms [1533,1551] nsRefreshDriver::Tick
3. 12ms [1555,1568] awm_switchApp() @ app_window_manager.js
4. 71ms [1570,1642] nsRefereshDriver::Tick
5. 94ms [2309.2404] aw__handle_mozbrowserloadend @ app_window.js:883
Long run task on SMS main thread:
1. 238ms [1823,2062] s() @ gaia-header.js:1 (bug 1089920)
2. 67ms [2491,2559] nsRefreshDriver::Tick (Refresh8), delays PMobileMessageCursor::RecvNotifyResult
3. 95ms [2601,2697] nsRefreshDriver::Tick (Refresh9)
Some thoughts:
1. Can RecvLoadURL be the first IPC content process received, and is it possible to RecvCacheFileDescriptor when RecvLoadURL?
2. It seems nsJARChannel is unarchiving on main thread, if it is, can we move it out for the changing of 1)?
3. The time difference between nsViewManager::DispatchEvent for the touch down/up is ~110ms, is it real or because of latency?
4. Is it possible to move system app to a worker?
5. Any chances to lower the impact of nsRefreshDriver::Tick()?
Comment 7•10 years ago
|
||
http://alivedise.github.io/nephthys/#2e1e6fad7f99954e634cee6366d48723
task tracer data for template app
Comment 8•10 years ago
|
||
Ideally content process should start working as early as possible and never block by synchronous IPC, here're what I think we can do (reference to Nexus-4 SPS profile of comment 5):
1. Content process to RecvLoadURL() in the beginning
Currently B2G process knows what app to launch in Webapps.jsm doLaunch(), but SendLoadURL() is done in nsFrameLoader::ReallyStartLoadingInternal() when system app set iframe's src, which is 44ms later. This is a waste. I have traced nsJARChannel, the inflate is usually done off main thread, and parsing read data is done on the main thread, so we will make at least 3 cores busy from the beginning.
2. Move system app to a worker
Content process RecvLoadURL() 67ms later after B2G SendLoadURL(), since RecvShow() comes first and is blocked because B2G main thread is busy on system app and the painting. As system app could run on the main thread for a while, like [1603,1654] and [2854,2966], move it to a worker will have better responsiveness of B2G main thread.
3. Minimize the overhead of app launch animation
Actually there're two animations, one for closing homescreen and one for splash, and there're something else especially around status bar. [1702,1782] is a long painting for the splash.
4. Check why Refresh1 takes ~100ms, is there anything not related to Messages app and can be improved?
Any thoughts?
Comment 9•10 years ago
|
||
About 4., I think this is due to gaia header, I'm investigating using a simpler header for our first panel in bug 1089920.
Comment 10•10 years ago
|
||
I profiled also launching Contacts, Gallery and Music on Nexus-4 with the same configuration as comment 5, basically the same problems like Messages app:
Contacts: http://people.mozilla.org/~bgirard/cleopatra/#report=3b8e661b1c9ac31e48faac98993a4abde60b2a15
- runFontFit() [1565,1737].
Gallery: http://people.mozilla.org/~bgirard/cleopatra/#report=728451ecd3cd5f2cc93e4f5d494028c95947caf6
- B2G process main thread has a very long repaint [1177,1364] and [1381,1491] after system app launch Gallery.
- runFontFit() [1740,1912].
Music: http://people.mozilla.org/~bgirard/cleopatra/#report=cb3075d5d82466f883376f3c7bbb8cdf4c80f036
- PBackgroundIDBCursor::RecvResponse() in Music calls SendGetVolumes() [2460,2609] and is blocked since B2G is busy painting and handling mozbrowserloadend in system app.
- runFontFit() [2148,2260].
Comment 11•10 years ago
|
||
Ting-Yu, Can you file bugs for tangible improvements identified above.
Flags: needinfo?(tchou)
Comment 12•10 years ago
|
||
Ravi, Thinker let not only me but also Ting-Yuan and Kanru to collect different profiles, I'll file bugs later once we reach a concensus.
Flags: needinfo?(tchou)
Comment 13•10 years ago
|
||
http://alivedise.github.io/nephthys/#2dc7879403a7614abbd7fed6654aca40
New profile of template app with labels
Comment 14•10 years ago
|
||
I got similar results to comment #6 with Communications/contacts on a nexus 4.
http://people.mozilla.org/~bgirard/cleopatra/#report=9ec5296becf5a8c8d06f8f6cf7be5715e60a0e07
There are 958 samples (1ms) out of 1311ms, first touch event to loadstart, in the contacts app. Not sure why there are 353 samples missed in SPS. The first ~170ms are spent on waiting b2g, which was preparing and rendering something before informing the pre-allocated app:
AppWindow() -> aw_render() -> aw__render() -> publish('rendered') ->
IPDL::PBrowser::RecvShow
After that the content process spent ~130ms wait b2g in IPDL::PBrowser::RecvShow, mainly in IPDL::PBrowser::SendSyncMessage (59ms) and IPDL::PBrowser::SendPRenderFrameConstructor (55ms).
Not sure if the execution orders can be adjusted to save the 170ms or even 300ms launch time.
When counting from first touch event to loadend, it costs:
382ms on waiting b2g: initial wait (171ms), IPDL::XXX::RecvYYY (211ms)
193ms on nsRefreshDriver::Tick
150ms on gaia-header.js, mostly RunFontFit
212ms on nsInputStreamPump::OnInputStreamReady
It seems to me that it's worth investigation if the 382ms can be hidden by parallelization.
Comment 15•10 years ago
|
||
For what it's worth, here is the perf result, which provides a low-level, system-wise aspect of view. There is no outstanding function.
Samples: 3K of event 'cpu-clock:HG', Event count (approx.): 3150
30.03% 946 swapper [kernel.kallsyms] [k] msm_cpuidle_enter
0.92% 29 b2g [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.51% 16 b2g [kernel.kallsyms] [k] _raw_spin_unlock_irq
0.41% 13 Communications libmozglue.so [.] arena_malloc
0.41% 13 b2g libmozglue.so [.] arena_dalloc
0.41% 13 perf [kernel.kallsyms] [k] format_decode
0.38% 12 b2g libc.so [.] memset
0.35% 11 Communications [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.35% 11 Communications libxul.so [.] xpc::JSXrayTraits::resolveOwnProperty(JSContext*, js::Wrapper
0.35% 11 perf [kernel.kallsyms] [k] vsnprintf
0.32% 10 Communications libc.so [.] memset
0.32% 10 Communications libxul.so [.] JSCompartment::wrap(JSContext*, JS::MutableHandle<JSObject*>,
0.32% 10 Communications libxul.so [.] js::Shape::search(js::ExclusiveContext*, js::Shape*, jsid, js:
0.32% 10 Communications libxul.so [.] js::StringEqualsAscii(JSLinearString*, char const*)
0.32% 10 b2g [kernel.kallsyms] [k] do_page_fault
0.29% 9 Communications libmozglue.so [.] arena_dalloc
0.25% 8 (Preallocated a [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.25% 8 Communications libc.so [.] __memcpy_base
0.25% 8 Communications libxul.so [.] Interpret(JSContext*, js::RunState&)
0.25% 8 b2g libmozglue.so [.] arena_malloc
Reporter | ||
Comment 16•10 years ago
|
||
(In reply to Ting-Yuan Huang from comment #14)
> and IPDL::PBrowser::SendPRenderFrameConstructor (55ms).
>
There is a patch to improve SendPRenderFrameConstructor() that had been landed days ago.
Reporter | ||
Comment 17•10 years ago
|
||
(In reply to Ting-Yuan Huang from comment #14)
> I got similar results to comment #6 with Communications/contacts on a nexus
> 4.
>
> http://people.mozilla.org/~bgirard/cleopatra/
> #report=9ec5296becf5a8c8d06f8f6cf7be5715e60a0e07
If you look into the profile, you will find 2461ms ~ 2478ms is waiting for SendGetDPI(). I think it should be part of RecvUpdateDimensions. Then, we don't need another sync IPC.
Between 3746ms ~ 3951ms and more periods of 100+ms after then, they spend a lot of time at ContactManager to convert data. There is a potential improvement.
And, I find there are a lot of missed samples. It means CPU is busy at these periods. Some one should investigate these periods, maybe there is a chance to move these tasks to the periods that is less busy.
Reporter | ||
Comment 18•10 years ago
|
||
Gregor, could you give some comment on ContactManager?
Flags: needinfo?(anygregor)
Comment 19•10 years ago
|
||
Here's another profile, scrolling Pinterest on a Nexus-4 with the same gecko/gaia as comment 5:
Steps:
1. Open http://www.pinterest.com/all/ from Browser app, wait till it's loaded
2. Execute sps and systrace commands from two terminals simutaneously
3. Start scrolling down
Results:
sps: http://people.mozilla.org/~bgirard/cleopatra/#report=14d65851f3aeacc41f20b43aa860b82228b7aef0
systrace: http://www.googledrive.com/host/0B0yKxuGl8-XAMEd5a0JRNTNkZHc
Analysis:
1. Browser process is quite busy in the for loop of FrameLayerBuilder::PaintItems(), which I think could be parallelized on multi core. The 4th core is almost idle from systrace. ClientTiledLayerBuffer::ProgressiveUpdate() can also be a candidate of parallelization.
2. [3980,5840] browser process is running the site's javascript to load more tiles.
Comment 20•10 years ago
|
||
This is the profile of scrolling Messages with medium workload
systrace: http://people.mozilla.org/~kchen/bug1094010/message-scroll-systrace.html
CPU 2 and 3 are mostly idle. While the time is mostly spent on DoReflow and painting we could still see a lot of blurred text and blank on the screen.
I think it might be that the Message app has some optimization that aggressively throw away the the DOM node that is not visible.
Reporter | ||
Comment 21•10 years ago
|
||
(In reply to Kan-Ru Chen [:kanru] from comment #20)
> I think it might be that the Message app has some optimization that
> aggressively throw away the the DOM node that is not visible.
Please check this in further.
Comment 22•10 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #21)
> (In reply to Kan-Ru Chen [:kanru] from comment #20)
> > I think it might be that the Message app has some optimization that
> > aggressively throw away the the DOM node that is not visible.
>
> Please check this in further.
Only the photo is removed by img.src = "";
Reporter | ||
Comment 23•10 years ago
|
||
(In reply to Kan-Ru Chen [:kanru] from comment #22)
> Only the photo is removed by img.src = "";
This seems an useless optimization since gecko would check if it is visible to user. Right?!
Try to remove this optimization.
Comment 24•10 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #18)
> Gregor, could you give some comment on ContactManager?
What is the workload here? It depends on the number of contacts you have in your DB.
Flags: needinfo?(anygregor) → needinfo?(tlee)
Reporter | ||
Comment 25•10 years ago
|
||
Could you answer Gregor's question?
Flags: needinfo?(tlee) → needinfo?(thuang)
Comment 26•10 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #23)
> (In reply to Kan-Ru Chen [:kanru] from comment #22)
> > Only the photo is removed by img.src = "";
>
> This seems an useless optimization since gecko would check if it is visible
> to user. Right?!
Actually, not really :)
You can ask :tn for more information but basically on B2G it doesn't work like this.
(In reply to Kan-Ru Chen [:kanru] from comment #22)
> Only the photo is removed by img.src = "";
Where do you see this? The only place I see this is when we resize a picture, the image is not in the DOM. (utils.js)
(In reply to Kan-Ru Chen [:kanru] from comment #20)
> This is the profile of scrolling Messages with medium workload
>
> systrace:
> http://people.mozilla.org/~kchen/bug1094010/message-scroll-systrace.html
>
> CPU 2 and 3 are mostly idle. While the time is mostly spent on DoReflow and
> painting we could still see a lot of blurred text and blank on the screen.
>
> I think it might be that the Message app has some optimization that
> aggressively throw away the the DOM node that is not visible.
Nope we don't do this at all right now. We only have a plain DOM that's not changing once it's loaded and no message is sent/received.
However, for threads belonging to known contacts, we display an image which can be costly to display (and it's also using :nth-child CSS selectors BTW). Maybe this is the culprit here?
Flags: needinfo?(tnikkel)
Comment 27•10 years ago
|
||
We generally try to keep around decoded versions of visible images and those that are close to being visible by way of scrolling them into view. Setting img.src to "" will probably also discard the compressed source of the image from memory.
Flags: needinfo?(tnikkel)
Comment 28•10 years ago
|
||
I misread the code. The setImageURL in contacts/js/views/list.js only set img.dataset.src to ''. I'm not sure how the image is rendered. I think it will be reflected to img.src at some point.
Anyway, I think it only looks slow because the photo is not rendered until the scroll stops.
Reporter | ||
Comment 29•10 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #17)
> If you look into the profile, you will find 2461ms ~ 2478ms is waiting for
> SendGetDPI(). I think it should be part of RecvUpdateDimensions. Then, we
> don't need another sync IPC.
Bug 835679 is out there to fix this problem.
Comment 30•10 years ago
|
||
(In reply to Kan-Ru Chen [:kanru] from comment #28)
> I misread the code. The setImageURL in contacts/js/views/list.js only set
> img.dataset.src to ''. I'm not sure how the image is rendered. I think it
> will be reflected to img.src at some point.
>
> Anyway, I think it only looks slow because the photo is not rendered until
> the scroll stops.
The blank saw while scrolling is due to the lazy rendering of the DOM nodes. Only the first screenful of contacts is rendered. All other contacts are rendered when "onscreen".
Comment 31•10 years ago
|
||
(In reply to Kan-Ru Chen [:kanru] from comment #28)
> I misread the code. The setImageURL in contacts/js/views/list.js only set
> img.dataset.src to ''. I'm not sure how the image is rendered. I think it
> will be reflected to img.src at some point.
>
> Anyway, I think it only looks slow because the photo is not rendered until
> the scroll stops.
Ah, you're talking about the Contacts app here, not the Messages app. But comment 20 is about the Messages app.
(sorry Timothy, I didn't mean to ask NI, I just wanted to know your alias. But thanks for the answer!)
Comment 32•10 years ago
|
||
(In reply to Gregor Wagner [:gwagner] from comment #24)
> What is the workload here? It depends on the number of contacts you have in
> your DB.
It's "reference-workload-light" and there are 200 contacts.
Flags: needinfo?(thuang)
Reporter | ||
Updated•10 years ago
|
Flags: needinfo?(anygregor)
Comment 33•10 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #17)
> (In reply to Ting-Yuan Huang from comment #14)
> > I got similar results to comment #6 with Communications/contacts on a nexus
> > 4.
> >
> > http://people.mozilla.org/~bgirard/cleopatra/
> > #report=9ec5296becf5a8c8d06f8f6cf7be5715e60a0e07
>
> If you look into the profile, you will find 2461ms ~ 2478ms is waiting for
> SendGetDPI(). I think it should be part of RecvUpdateDimensions. Then, we
> don't need another sync IPC.
>
> Between 3746ms ~ 3951ms and more periods of 100+ms after then, they spend a
> lot of time at ContactManager to convert data. There is a potential
> improvement.
>
I assume thats not the very first start of the app where we have to upgrade the contactsDB after pushing the workload.
4sec is a long time for converting. We do have a cursor so we don't pause the main thread for too long. We also have to copy all data from the parent to the child and with the cursor we have many roundtrips.
In the child we have to create objects that we can hand out to content, which is also quite slow (see bug 783057)
Flags: needinfo?(anygregor)
Comment 34•10 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #19)
> 1. Browser process is quite busy in the for loop of
> FrameLayerBuilder::PaintItems(), which I think could be parallelized on
> multi core. The 4th core is almost idle from systrace.
> ClientTiledLayerBuffer::ProgressiveUpdate() can also be a candidate of
> parallelization.
I've just talked to BenWa about this, he reminded me that parallelize probably doesn't help if the paintings are memory bound, especially on device.
Reporter | ||
Updated•10 years ago
|
Blocks: AppStartup
Comment 35•10 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #8)
> 2. Move system app to a worker
> Content process RecvLoadURL() 67ms later after B2G SendLoadURL(), since
> RecvShow() comes first and is blocked because B2G main thread is busy on
> system app and the painting. As system app could run on the main thread for
> a while, like [1603,1654] and [2854,2966], move it to a worker will have
> better responsiveness of B2G main thread.
Alive & Kevin, how do you think about this? Is this reasonable?
Flags: needinfo?(kgrandon)
Flags: needinfo?(alive)
Comment 36•10 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #35)
> (In reply to Ting-Yu Chou [:ting] from comment #8)
> > 2. Move system app to a worker
> > Content process RecvLoadURL() 67ms later after B2G SendLoadURL(), since
> > RecvShow() comes first and is blocked because B2G main thread is busy on
> > system app and the painting. As system app could run on the main thread for
> > a while, like [1603,1654] and [2854,2966], move it to a worker will have
> > better responsiveness of B2G main thread.
>
> Alive & Kevin, how do you think about this? Is this reasonable?
It might be.. but much part of system app is still dealing with DOM manipulation. We cannot move the entire system app to worker at all.
Flags: needinfo?(alive)
Comment 37•10 years ago
|
||
We are talking about web workers here, correct? I'm not sure how you would move the entire system to a web worker, but you may be able to move a few pieces. Sorry, I'm lacking some context/knowledge here, so apologize for the lack of ideas/suggestions on this subject.
Flags: needinfo?(kgrandon)
Updated•10 years ago
|
No longer blocks: AppStartup
Comment 38•10 years ago
|
||
I just went through the Gaia polish bugs [1], here're the ones I think we could profile:
bug 802208,
bug 1077169,
bug 1077516,
bug 1090571,
bug 1112581,
bug 1112590,
bug 1115537,
bug 1131050,
bug 1137613,
bug 1139303
[1] https://wiki.mozilla.org/FirefoxOS/polish
Summary: Profile B2G for Quad core platforms as the baseline of improvements → [meta] Profile on quad core device
Comment 39•10 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #38)
> I just went through the Gaia polish bugs [1], here're the ones I think we
> could profile:
>
> bug 1115537,
Scratch this one, I misplaced it.
Updated•10 years ago
|
Comment 40•10 years ago
|
||
Can not reproduce bug 802208, the slide works smoothly.
No longer depends on: 802208
Comment 41•10 years ago
|
||
Copy from bug 1093564 comment 9.
1. While scrolling in one direction, it pauses for a second if you scroll reversely.
2. It takes some time to show the homescreen when you press home key at card view.
3. Unlock screen does not show homescreen smoothly.
4. Screen does not scroll smoothly with your continuous swipe.
5. Keyboard does not come up right away when you start input.
6. It's janky when you drag at Contacts app to search people by the first character.
7. Press button ">" or "<" to switch forward/backward to screens ususally not happen instantly, this can be seen easily from Settings.
8. Switching between alphabet/symbol keybards is slow.
9. Keyboard seems a bit late than my fingers, you can see the characters come up later if you random press some keys fast and then stop.
10. Dragging web page from Browser app is janky.
11. The animation of moving app out from card view is not smooth
12. Moving icon around while editing homescreen is janky.
Updated•10 years ago
|
Updated•9 years ago
|
Assignee: kchen → nobody
Status: NEW → RESOLVED
Closed: 9 years ago
tracking-b2g:
backlog → ---
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•