Closed Bug 960224 Opened 11 years ago Closed 11 years ago

[b2g][music] music regressed 50ms on datazilla cold launch time (Jan 15)

Categories

(Firefox OS Graveyard :: Performance, defect, P1)

ARM
All
defect

Tracking

(blocking-b2g:1.3+, firefox29 fixed, firefox30 fixed, firefox31 fixed, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed, b2g-v2.0 fixed)

RESOLVED FIXED
1.4 S4 (28mar)
blocking-b2g 1.3+
Tracking Status
firefox29 --- fixed
firefox30 --- fixed
firefox31 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: bkelly, Assigned: huseby)

References

Details

(Keywords: perf, regression, Whiteboard: [c=progress p=5 s= u=1.3] [b2gperf_regression])

Attachments

(2 files, 1 obsolete file)

Whiteboard: [c=automation p= s= u=] → [c=automation p= s= u=][b2gperf_regression]
Hema, please have someone on the Music team look into this.
Flags: needinfo?(hkoka)
Whiteboard: [c=automation p= s= u=][b2gperf_regression] → [c=progress p= s= u=] [b2gperf_regression]
Jim, could you please take a look. Thanks Hema
Assignee: nobody → squibblyflabbetydoo
Flags: needinfo?(hkoka)
The only gaia patch that touched the music app was bug 959659 - which focuses on compressing PNG images.
Skimming the gecko regression range, I only see two media bugs at a quick glance: * bug 957939 * bug 955911 Is the music app perf test making use of music files during the test? If so, what codecs are being used?
Flags: needinfo?(bkelly)
I don't have privileges to see bug 955911, but bug 957939 shouldn't affect anything, since it just moves some #defines around.
(In reply to Jason Smith [:jsmith] from comment #4) > Is the music app perf test making use of music files during the test? If so, > what codecs are being used? I believe there are music files loaded on the device via our reference workloads, but this test merely launches music without playing any files. I think the next step here would be to try to reproduce on a device locally. This can be done by running the following on an eng build before and after the regression range: sudo pip install b2gperf adb forward tcp:2828 tcp:2828 b2gperf --delay=10 Music Jason, is this something QA can do for us?
Flags: needinfo?(bkelly)
Keywords: qawanted
(In reply to Ben Kelly [:bkelly] from comment #6) > (In reply to Jason Smith [:jsmith] from comment #4) > > Is the music app perf test making use of music files during the test? If so, > > what codecs are being used? > > I believe there are music files loaded on the device via our reference > workloads, but this test merely launches music without playing any files. > > I think the next step here would be to try to reproduce on a device locally. > This can be done by running the following on an eng build before and after > the regression range: > > sudo pip install b2gperf > adb forward tcp:2828 tcp:2828 > b2gperf --delay=10 Music > > Jason, is this something QA can do for us? Yeah we can do that.
QA Contact: pbylenga
Here are my results running it locally which confirms the regression window set in comment 0. During the test, 500 music files are loaded onto the device which are distributed between 50 albums but they aren't utilized instead the app is launched and then closed during the test. 1/14/2014 Results for Music, cold_load_time: median:596, mean:603, std: 24, max:700, min:563, all:604,599,602,618,595,617,596,590,601,580,651,563,598,584,700,610,589,618,602,582,644,592,592,595,612,595,591,585,589,597 1/15/2014 Results for Music, cold_load_time: median:747, mean:762, std: 38, max:919, min:735, all:835,736,735,757,749,800,742,919,767,772,747,744,748,739,754,743,745,833,765,744,745,746,746,756,749,757,741,770,740,736 Environmental Variables Device: Buri ENG Master Mozilla RIL Build ID: 20140114040616 Gecko: http://hg.mozilla.org/mozilla-central/rev/34dddf6f7ec1 Gaia: 002cca258af8586859c6efb2dada2fcec36754a1 Platform Version: 29.0a1 Firmware Version: v1.2-device Environmental Variables Device: Buri ENG Master Mozilla RIL Build ID: 20140115041401 Gecko: http://hg.mozilla.org/mozilla-central/rev/81bced59e8b3 Gaia: 255a56ac67e5b28f1fc78307969cc83391c9652f Platform Version: 29.0a1 Firmware Version: v1.2-device
Keywords: qawanted
If we could bisect the regression window to find what change actually caused this, that'd be really helpful. Nothing listed in the commit range jumps out. My guess, since we're not actually loading music in the test, is something DOM-related.
(In reply to Jim Porter (:squib) from comment #9) > If we could bisect the regression window to find what change actually caused > this, that'd be really helpful. Nothing listed in the commit range jumps > out. My guess, since we're not actually loading music in the test, is > something DOM-related. Oh the QA side, that will be a bit difficult. We currently only have builds down to a day, so we can't do bisections down to a changeset. We are looking into this problem however. I think the best path forward at this point is to have a perf team member profile this & speculate a potential patch in the regression range that could cause this.
Dave, need you to look into this.
Status: NEW → ASSIGNED
Flags: needinfo?(dhuseby)
Priority: -- → P1
Ok I'm on it.
Flags: needinfo?(dhuseby)
Is a 50 ms perf regression bad enough to be a blocker? If so, please nom this to block.
I just tried both Gaia revisions on the v1.3 tip revision of b2g (and gecko). Here's what I found: v1.3 tip with Gaia revision 002cca2 Results for Music, cold_load_time: median:738, mean:747, std: 29, max:830, min:712 v1.3 tip with Gaia revision 255a56a Results for Music, cold_load_time: median:739, mean:740, std: 14, max:791, min:720 You'll notice there isn't any difference between the two and they are both equal to the latter, regressed, measurements listed in Comment 8. This tells me that the regression is caused in gecko somewhere. I'm going to bisect the gecko revision range and see if I can nail down the change that caused it.
I can confirm that the regression is in gecko. I'm almost done with the bisect and will have the offending changeset shortly.
Assignee: squibblyflabbetydoo → dhuseby
So the bisection result is: The first bad revision is: changeset: 163287:f8b39dc79dff parent: 163286:0420229d2b55 parent: 163240:34dddf6f7ec1 user: Wes Kocher <wkocher@mozilla.com> date: Mon Jan 13 20:55:10 2014 -0800 summary: Merge m-c to b2g-inbound Not all ancestors of this changeset have been checked. Use bisect --extend to continue the bisection from the common ancestor, f356b409d710. This is really common, the regression came in from a merge so I ran hg bisect --extend to continue the bisection from f356b409d710 to figure out which actual revision caused the regression.
After expanding the bisect to include the common ancestor, the offending revision is: The first bad revision is: changeset: 163282:9394e6901453 user: Michael Wu <mwu@mozilla.com> date: Tue Jan 14 11:12:36 2014 +0800 summary: Bug 958008 - Avoid copying images on Android/Gonk when there's nothing to optimize, r=jrmuizel For those that are interested, here's the bisect results: Gaia tip, Gecko v1.3 b6ac5d30404d (bad) Results for Music, cold_load_time: median:688, mean:721, std: 57, max:835, min:665 Gaia tip, Gecko v1.3 30e1a793fbf9 (good) Results for Music, cold_load_time: median:619, mean:640, std: 61, max:882, min:604 Gaia tip, Gecko v1.3 3a264db9f9e7 (bad) Results for Music, cold_load_time: median:780, mean:780, std: 24, max:870, min:706 Gaia tip, Gecko v1.3 01fb70a7cdbf (good) Results for Music, cold_load_time: median:619, mean:653, std: 75, max:900, min:597 Gaia tip, Gecko v1.3 c491d6b2ee25 (bad) Results for Music, cold_load_time: median:776, mean:786, std: 31, max:916, min:764 Gaia tip, Gecko v1.3 4c35eea82384 (bad) Results for Music, cold_load_time: median:780, mean:792, std: 41, max:978, min:763 Gaia tip, Gecko v1.3 f8b39dc79dff (bad) <--- first bad revision is a merge Results for Music, cold_load_time: median:770, mean:777, std: 23, max:871, min:760 Gaia tip, Gecko v1.3 d275eebfae04 (good) <--- hg bisect --extend Results for Music, cold_load_time: median:619, mean:635, std: 60, max:907, min:591 Gaia tip, Gecko v1.3 f97aabf5964f (good) Results for Music, cold_load_time: median:620, mean:630, std: 50, max:895, min:603 Gaia tip, Gecko v1.3 31742477de4d (bad) Results for Music, cold_load_time: median:776, mean:785, std: 34, max:958, min:758 Gaia tip, Gecko v1.3 9394e6901453 (bad) Results for Music, cold_load_time: median:779, mean:775, std: 37, max:874, min:607
I started the bisect with: hg bisect -g 34dddf6f7ec1 hg bisect -b 81bced59e8b3
When I look at the diff (hg diff --rev 9394e6901453^:9394e6901453) I don't see what the problem is. There's no apparent change that would affect the execution path of the Music app. Hrm...
Wow, that's really strange. Just to sanity check, I re-ran the b2gperf test on 9394e6901453 and the revision before that. Here's what I found: f97aabf5964f (this is the revision before 9394e6901453) Results for Music, cold_load_time: median:618, mean:643, std: 67, max:883, min:597 9394e6901453 Results for Music, cold_load_time: median:778, mean:778, std: 12, max:806, min:758 So it is definitely the addition of the gfxAndroidPlatform::OptimizeImage(gfxImageSurface *aSurface, gfxImageFormat format) method by :mwu that caused the regression. I'm investigating why. My hunch is that the album art loading uses this method. It is a virtual method and probably overloads either an empty base class method or a simpler base class method.
Hmmm that doesn't make sense. That patch prevents us from allocating and copying an image when there would be no change to the content or format of the image.
Depends on: 972075
No longer depends on: 972075
Depends on: 972075
No longer depends on: 972075
The datazilla numbers are stable again in the mid 700's and close to the historical numbers. I still don't have an answer for the reason for the change in numbers between the two revisions I found in the bisect. The current numbers I'm seeing are: Results for Music, cold_load_time: median:795, mean:790, std: 30, max:902, min:732 So this problem still exists. I'll take another look at it and see if I can't figure out what changed.
I just re-ran the whole regression over the weekend and found similar results to what I found before: Gecko: 34dddf6f7ec1 Gaia: b98e79e Results for Music, cold_load_time: median:627, mean:657, std: 117, max:1243, min:607 Gecko: 34dddf6f7ec1 Gaia: ac02962 Results for Music, cold_load_time: median:626, mean:694, std: 159, max:1242, min:608 Gecko: 81bced59e8b3 Gaia: b98e79e Results for Music, cold_load_time: median:794, mean:873, std: 253, max:1812, min:774 Gecko: 81bced59e8b3 Gaia: ac02962 Results for Music, cold_load_time: median:793, mean:832, std: 183, max:1816, min:770 The regression is in Gecko. hg update -C 81bced59e8b3 hg bisect --bad hg update -C 34dddf6f7ec1 hg bisect --good b6ac5d30404d Results for Music, cold_load_time: median:797, mean:872, std: 254, max:1813, min:691 hg bisect --bad 30e1a793fbf9 Results for Music, cold_load_time: median:621, mean:697, std: 173, max:1341, min:606 hg bisect --good 3a264db9f9e7 Results for Music, cold_load_time: median:784, mean:821, std: 123, max:1254, min:625 hg bisect --bad 01fb70a7cdbf Results for Music, cold_load_time: median:626, mean:681, std: 178, max:1356, min:614 hg bisect --good c491d6b2ee25 Results for Music, cold_load_time: median:781, mean:817, std: 114, max:1241, min:745 hg bisect --bad 4c35eea82384 Results for Music, cold_load_time: median:783, mean:824, std: 131, max:1362, min:771 hg bisect --bad f8b39dc79dff Results for Music, cold_load_time: median:780, mean:784, std: 39, max:882, min:626 hg bisect --bad The first bad revision is: changeset: 163287:f8b39dc79dff parent: 163286:0420229d2b55 parent: 163240:34dddf6f7ec1 user: Wes Kocher <wkocher@mozilla.com> date: Mon Jan 13 20:55:10 2014 -0800 summary: Merge m-c to b2g-inbound Not all ancestors of this changeset have been checked. Use bisect --extend to continue the bisection from the common ancestor, f356b409d710. f356b409d710 Results for Music, cold_load_time: median:624, mean:684, std: 167, max:1284, min:609 hg bisect --good f97aabf5964f Results for Music, cold_load_time: median:630, mean:697, std: 177, max:1345, min:604 hg bisect --good 31742477de4d Results for Music, cold_load_time: median:780, mean:814, std: 121, max:1283, min:745 hg bisect --bad 9394e6901453 Results for Music, cold_load_time: median:781, mean:807, std: 137, max:1344, min:613 hg bisect --bad The first bad revision is: changeset: 163282:9394e6901453 user: Michael Wu <mwu@mozilla.com> date: Tue Jan 14 11:12:36 2014 +0800 summary: Bug 958008 - Avoid copying images on Android/Gonk when there's nothing to optimize, r=jrmuizel
Blocks: 958008
Component: Gaia::Music → Graphics
Product: Firefox OS → Core
Version: unspecified → 28 Branch
Backout?
(In reply to Andreas Gal :gal from comment #24) > Backout? I just flagged bug 958008 for a backout - https://bugzilla.mozilla.org/show_bug.cgi?id=958008#c17.
here's the revision that mercurial thought was the culprit: http://hg.mozilla.org/mozilla-central/rev/9394e6901453
So the backout doesn't fix the regresssion. I just re-ran the b2gperf Music test on revisions before and after the backout and there is no change. Here's what I did: 0. cloned mozilla-b2g28_v1_3 1. tested revision 06c332d6de68, the revision just before the backout and got: Results for Music, cold_load_time: median:765, mean:774, std: 32, max:901, min:753 2. tested revision 155d55074aaa, the tip as of a few hours ago and got: Results for Music, cold_load_time: median:760, mean:759, std: 44, max:899, min:569 As you can see the backout only reduced the mean by 13ms which is insignificant and probably just random variation. I really think that mercurial's bisect tool doesn't work well when there are multiple paths between a common ancestor and a later merge point. I suspect that when it is considering which revisions to test, it merges all branches into a single timeline ordered by commit date. That will give erroneous errors because each path is a separate, isolated series of changes. So doing binary search on all revisions ordered by date is only mostly right, most of the time. To solve this bug we can try the git bisect tool on the git tree. Or we need to take a close look at the DAG for the regression range and see if we can debug mercurial to see where it chooses the wrong revisions during the bisect.
See comment 28. I ran some b2gperf tests yesterday after the back-out and found that it doesn't fix the regression. I really think the problem is mercurial's bisect tool here. I'm going to give this a try with the git bisect tool on the git tree and see if it comes up with the same result. That said, I think we should not do anything until I'm done with the git bisect.
Status: RESOLVED → REOPENED
Flags: needinfo?(jsmith)
Resolution: FIXED → ---
I'm also going to 1.4? this bug since it is a noticeable regression in launch performance.
blocking-b2g: --- → 1.4?
Okay, sounds good.
Flags: needinfo?(jsmith)
Dave, you have to do the builds yourself, we don't have b2g-inbound builds in this range?
Target Milestone: mozilla31 → ---
This really should block 1.3, not 1.4, because this regression happened in 1.3 timeframe. So moving to 1.3+.
blocking-b2g: 1.4? → 1.3+
Component: Graphics → Performance
Product: Core → Firefox OS
Version: 28 Branch → unspecified
Dave, see Milan's comment 32.
Flags: needinfo?(dhuseby)
Whiteboard: [c=progress p= s= u=] [b2gperf_regression] → [c=progress p= s= u=1.3] [b2gperf_regression]
So this is distressing. I think I have conclusive evidence that the topological sorting used in mercurial's bisect tool doesn't handle extending to a common ancestor correctly. I just re-ran the bisect using the good revision (34dddf6f7ec1) and the common ancestor (f356b409d710) identified in the previous bisect. The results are different. Here's the bisect log: hg update -C 81bced59e8b3 hg bisect --bad hg update -C f356b409d710 hg bisect --good 163401:53da7ae1a45f Results for Music, cold_load_time: median:592, mean:674, std: 171, max:1253, min:562 hg bisect --good 163355:f2089d89c778 Results for Music, cold_load_time: median:750, mean:781, std: 265, max:1749, min:626 hg bisect --bad 163333:f7fb8188c0ef Results for Music, cold_load_time: median:594, mean:626, std: 127, max:1266, min:578 hg bisect --good 163344:21977186a78e Results for Music, cold_load_time: median:594, mean:670, std: 173, max:1276, min:574 hg bisect --good 163349:b9a96956264c Results for Music, cold_load_time: median:651, mean:733, std: 167, max:1336, min:616 hg bisect --bad 163346:401b8593a881 Results for Music, cold_load_time: median:737, mean:716, std: 171, max:1312, min:568 hg bisect --bad 163345:705f43739079 Results for Music, cold_load_time: median:604, mean:687, std: 170, max:1271, min:569 hg bisect --good The first bad revision is: changeset: 163346:401b8593a881 user: Rick Eyre <rick.eyre@hotmail.com> date: Thu Dec 12 11:27:30 2013 -0500 summary: Bug 865407 - Part 4: Expose TextTrackCue::DisplayState to chrome JS. r=rillian, r=bz
Flags: needinfo?(dhuseby)
(In reply to Milan Sreckovic [:milan] from comment #32) > Dave, you have to do the builds yourself, we don't have b2g-inbound builds > in this range? I'm doing the bisecting on mozilla-central.
To install the correct versions of marionette and gaia-test, use this file and run: sudo pip install -Ir 960224-requirements.txt
running the bisect on the gecko.git repo gave me this: git bisect start git bisect bad 073cad9 git bisect good de8f18e 2213fc4 Results for Music, cold_load_time: median:598, mean:669, std: 177, max:1311, min:558 git bisect good 06f8cc3 Results for Music, cold_load_time: median:754, mean:800, std: 274, max:1836, min:634 git bisect bad a693e40 Results for Music, cold_load_time: median:592, mean:645, std: 174, max:1293, min:562 git bisect good 70ab9a1 Results for Music, cold_load_time: median:686, mean:726, std: 154, max:1257, min:613 git bisect bad 693f8fd Results for Music, cold_load_time: median:600, mean:682, std: 172, max:1255, min:567 git bisect good e345f14 Results for Music, cold_load_time: median:593, mean:684, std: 180, max:1350, min:572 git bisect good 56d2e24 Results for Music, cold_load_time: median:605, mean:698, std: 176, max:1342, min:569 git bisect good 70ab9a1568bac93b8b98b551e9dbe7c9e40d38ba is the first bad commit commit 70ab9a1568bac93b8b98b551e9dbe7c9e40d38ba Author: Rick Eyre <rick.eyre@hotmail.com> Date: Thu Dec 12 12:02:17 2013 -0500 Bug 865407 - Part 5: Move update display of cues code to TextTrackManager. r=bz, r=rillian :040000 040000 e6a44bbab150ef59298d5343c636748227a9ba08 745c34398b74ab555a5137e4471e2132c8536d76 M content :040000 040000 5c58dc3e0326bb156d338d92a80237aed4c092b7 5685e444901b47f8956517b01781337c3fc13884 M layout
So I'm pretty sure we can pinpoint the regression to the Part 4/Part 5 revisions from the Rick Eyre. This looks like another large change that is going to have to be picked apart to pinpoint the regression. Rick, can you help me break this up into discrete chunks that we can test separately to determine which of them is causing the 50 ms regression in Music launch times?
Flags: needinfo?(rick.eyre)
Sure. There are a few main things that are being added in Patch 5. 1) We're adding a new static ParserWrapper (sParserWrapper) to the TextTrackManager class. This might add a few ms possibly when the first TextTrackManager class is loaded. 2) We're aggregating all the TextTrackCues into one big nsTArray via copying them. The happens when GetAllActiveCues() is called in UpdateCueDisplay(). 3) The last one is that we're passing the TextTrackCues to JS within UpdateActiveCues(). I don't know who much impact that would have on performance, though. Also, that only happens if you actually have some TrackElements or have added tracks via JS on the web page. So this might not be the issue you're seeing. Roc has asked me to load the TextTrackManager lazily in bug 987076. So that will definitely help this since it won't be loaded for the majority of pages.
Flags: needinfo?(rick.eyre)
Rick, we need bug 987076 fixed or we have to back out the regressing patch here (or disable loading TextTrackManager). How complex is it to fix bug 987076?
(In reply to Andreas Gal :gal from comment #41) > Rick, we need bug 987076 fixed or we have to back out the regressing patch > here (or disable loading TextTrackManager). How complex is it to fix bug > 987076? I'm working on that right now. It shouldn't be too hard if the approach I'm using right now works. I'll get back ASAP with a patch. Should be in the next hour or so.
Awesome. Thanks!
No problem :). I've submitted a patch to bug 987076. Hopefully, it clears up these issues.
Let me run the numbers with the bug 987076 patch applied and see where the numbers are. I'll do that today.
Attached patch Bug_987076_2.patch (obsolete) — Splinter Review
backported m-c patch from Bug 987076 to b2g_28_v1_3
After applying the patch and running the b2gperf test, the numbers look like the regression is fixed: Results for Music, cold_load_time: median:568, mean:582, std: 36, max:733, min:556 I recommend that we apply the Bug 987076 patch to b2g_28_v1_3. The patch doesn't apply cleanly, there's one small manual change necessary, see attachment 8396874 [details] [diff] [review].
Flags: needinfo?(rick.eyre)
Looks good to me.
Flags: needinfo?(rick.eyre)
alright, I'll land it.
Still needs formal approval first ;) Also, don't forget to push to Aurora (v1.4) as well.
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.4 S4 (28mar)
This is a backport of the patch from Bug 987076 [Approval Request Comment] Bug caused by (feature/regressing bug #): 960224 also see 987076 User impact if declined: 50ms launch regression in Music app Testing completed: Risk to taking this patch (and alternatives if risky): String or UUID changes made by this patch:
Attachment #8396874 - Attachment is obsolete: true
Attachment #8397227 - Flags: review?(roc)
Attachment #8397227 - Flags: approval-mozilla-b2g28?
(In reply to Dave Huseby [:huseby] from comment #51) > Created attachment 8397227 [details] [diff] [review] > Bug_960224.patch lazy load HTMLMediaElement::mTextTrackManager > > This is a backport of the patch from Bug 987076 > NI Dae to help get all the needed comments inline. Dave, can you please follow the recently build guidelines linked here : https://wiki.mozilla.org/Release_Management/Uplift_rules#Guidelines_on_approval_comments and give information on testing/risk requested below ? Thanks ! > [Approval Request Comment] > Bug caused by (feature/regressing bug #): 960224 also see 987076 > User impact if declined: 50ms launch regression in Music app > Testing completed: > Risk to taking this patch (and alternatives if risky): > String or UUID changes made by this patch:
Flags: needinfo?(dhuseby)
Whiteboard: [c=progress p= s= u=1.3] [b2gperf_regression] → [c=progress p=5 s= u=1.3] [b2gperf_regression]
Comment on attachment 8397227 [details] [diff] [review] Bug_960224.patch lazy load HTMLMediaElement::mTextTrackManager [Approval Request Comment] Bug caused by (feature/regressing bug #): 960224 also see 987076 User impact if declined: 50ms regression in cold launch time for Music app. Testing completed: Both automated (try server) and manual testing. Risk to taking this patch (and alternatives if risky): Low as the it only shifts an allocation to happen at the time of first use rather than at time of initialization. String or UUID changes made by this patch: none
Flags: needinfo?(dhuseby)
done. see comment 54.
Flags: needinfo?(bbajaj)
Comment on attachment 8397227 [details] [diff] [review] Bug_960224.patch lazy load HTMLMediaElement::mTextTrackManager Thanks Dave ! Please make sure to have an eye on datazilla once this lands to verify its indeed fixed.
Attachment #8397227 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+
Flags: needinfo?(bbajaj)
Leaving a NI on :ryanvm to confirm this is backported on affected downstream branches with a=bajaj.
Flags: needinfo?(ryanvm)
uh, checkin needed? added.
Keywords: checkin-needed
Depends on: 987076
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: