Closed Bug 1132515 Opened 9 years ago Closed 9 years ago

Several applications regressing performance on 1/27/15

Categories

(Firefox OS Graveyard :: Performance, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(tracking-b2g:-, b2g-v2.2 affected, b2g-master affected)

RESOLVED WORKSFORME

People

(Reporter: Eli, Unassigned)

References

Details

(Keywords: perf, regression)

Attachments

(1 file)

13.49 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
See Also: → 1131893
Testing music app performance (moz-app-visually-complete) against the following known configurations:

light reference workload

Gaia last-known good (Gaia-G): 0f662dffef275994
Gecko last-known good (Gecko-G): cf429c0e9481
Gaia first-known bad (Gaia-B): 1d53fb0798429825
Gecko first-known bad (Gecko-B): ebb0f3cc77ac

--

Gaia-G, Gecko-G: 1221ms
Gaia-B, Gecko-G: 1183ms
Gaia-G, Gecko-B: 1460ms
Gaia-B, Gecko-B: 1432ms

---

This appears to be a regression in Gecko, which makes sense since this does affect several applications. Regression exists between cf429c0e9481 and ebb0f3cc77ac.
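The attribution above can be checked with a little arithmetic. The following is only a sketch using the four timings from this comment; the function name and dictionary layout are illustrative and not part of any test harness.

```python
# Startup times in ms, copied from the four Gaia/Gecko combinations above.
times = {
    ("gaia_good", "gecko_good"): 1221,
    ("gaia_bad", "gecko_good"): 1183,
    ("gaia_good", "gecko_bad"): 1460,
    ("gaia_bad", "gecko_bad"): 1432,
}


def average_effect(times, component):
    """Mean change (ms) from switching one component good -> bad,
    averaged over both states of the other component."""
    deltas = []
    for other in ("good", "bad"):
        if component == "gecko":
            good = times[(f"gaia_{other}", "gecko_good")]
            bad = times[(f"gaia_{other}", "gecko_bad")]
        else:
            good = times[("gaia_good", f"gecko_{other}")]
            bad = times[("gaia_bad", f"gecko_{other}")]
        deltas.append(bad - good)
    return sum(deltas) / len(deltas)


gecko_effect = average_effect(times, "gecko")
gaia_effect = average_effect(times, "gaia")
print(f"Gecko good -> bad: {gecko_effect:+.1f} ms")
print(f"Gaia good -> bad: {gaia_effect:+.1f} ms")
```

Switching Gecko accounts for a slowdown of a couple of hundred milliseconds, while switching Gaia moves the number slightly in the other direction, which is consistent with a Gecko-side regression.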
blocking-b2g: --- → 2.2+
Component: General → Performance
Blocks: AppStartup
QA Contact: bzumwalt
As was brought to my attention, the previously mentioned Gecko commits were git commits. Here are the hg commits:

Gecko last-known-good: ec65b21bb446
Gecko first-known-bad: 7cefd0f40212
Any update here?
As per XChat and discussion with Kevin, software issues are holding us back from doing a proper Gecko bisect. Could you take a look at this, Naoki?
Flags: needinfo?(nhirata.bugzilla)
QA Contact: bzumwalt
I won't be able to look at it today; will try to get to it soon.
(In reply to :Eli Perelman from comment #2)
> As was brought to my attention, the previously mentioned Gecko commits were
> git commits. Here are the hg commits:
> 
> Gecko last-known-good: ec65b21bb446
> Gecko first-known-bad: 7cefd0f40212

Gecko last-known-good: ec65b21bb446 = mozilla-central revision 225587 
Gecko first-known-bad: 7cefd0f40212 = mozilla-central revision 226017
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ec65b21bb446&tochange=7cefd0f40212

It's still a wide range; I think we want to try to bisect this.
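For a rough sense of how wide the range is, here is a small sketch that estimates the number of bisect steps from the two revision numbers quoted above. It assumes the revision numbers are contiguous across the range, which may not hold exactly for a repository with merges.

```python
import math

good_rev = 225587  # revision number quoted above for ec65b21bb446
bad_rev = 226017   # revision number quoted above for 7cefd0f40212

# Changesets after the last good one, up to and including the first known bad.
candidates = bad_rev - good_rev

# Each bisect step roughly halves the candidate set.
steps = math.ceil(math.log2(candidates))

print(f"{candidates} candidate changesets, about {steps} bisect steps")
```

Under that assumption, each step means one build, one flash, and one perf run, so the cost is set mostly by the per-step turnaround time.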
Flags: needinfo?(nhirata.bugzilla)
Edward is working on bisection.
QA Contact: edchen
Based on Edward's bisection, we can narrow the range to http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c0f88b376e33&tochange=c18776175a69

Could we investigate this range further?

--- Good ---
Build ID               20150124010209
Gaia Revision          987c76c002a716a6787bc0576f0dd0f25fae25e4
Gaia Date              2015-01-07 06:20:58
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/c0f88b376e33
Gecko Version          38.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150124.045030
Firmware Date          Sat Jan 24 04:50:41 EST 2015
Bootloader             L1TC100118D0

--- Bad ----

Serial: f0484002 (State: device)
Build ID               20150125010227
Gaia Revision          987c76c002a716a6787bc0576f0dd0f25fae25e4
Gaia Date              2015-01-07 06:20:58
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/c18776175a69
Gecko Version          38.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150125.042522
Firmware Date          Sun Jan 25 04:25:33 EST 2015
Bootloader             L1TC100118D0
Flags: needinfo?(tlee)
Flags: needinfo?(nhirata.bugzilla)
**** Please ignore comment 9. ****

The corrected range is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=fa91879c8428&tochange=38e4719e71af

--- Good ---

Build ID               20150126010231
Gaia Revision          987c76c002a716a6787bc0576f0dd0f25fae25e4
Gaia Date              2015-01-07 06:20:58
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/fa91879c8428
Gecko Version          38.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150126.043643
Firmware Date          Mon Jan 26 04:36:54 EST 2015
Bootloader             L1TC100118D0

--- Bad ---

Build ID               20150126160233
Gaia Revision          987c76c002a716a6787bc0576f0dd0f25fae25e4
Gaia Date              2015-01-07 06:20:58
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/38e4719e71af
Gecko Version          38.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150126.192457
Firmware Date          Mon Jan 26 19:25:08 EST 2015
Bootloader             L1TC100118D0
Thanks, Bobby.  I'll try to use hg bisect to narrow even further.
Report: http://goo.gl/ShZvnV
Here is the bisection report based on inbound builds (http://goo.gl/JJGFfx). We found that 4 apps (music, gallery, contacts and sms) are affected between Gecko c3a90afa2dee and 38e4719e71af. Please refer to the following link for the Gecko code that landed in that range.

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c3a90afa2dee&tochange=38e4719e71af
Thanks again.  I was blocked because I couldn't get the perf test to work on my linux box.  I found that it works on my mac, so I'll be working on this and hopefully will get results soon.
See Also: → 1140987
(In reply to Bobby Chien [:bchien] from comment #13)
> Report: http://goo.gl/ShZvnV
> Here is bisection report based on inbound build (http://goo.gl/JJGFfx). We
> found 4 apps (music, gallery, contacts and sms) are affected between Gecko
> c3a90afa2dee and 38e4719e71af. Please refer following link for Gecko code
> landed.
> 
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=c3a90afa2dee&tochange=38e4719e71af

Who will find out the exact changeset? We need to go to custom builds now. Nothing stands out in the regression range.
Flags: needinfo?(bchien)
(In reply to Gregor Wagner [:gwagner] from comment #15)
> (In reply to Bobby Chien [:bchien] from comment #13)
> > Report: http://goo.gl/ShZvnV
> > Here is bisection report based on inbound build (http://goo.gl/JJGFfx). We
> > found 4 apps (music, gallery, contacts and sms) are affected between Gecko
> > c3a90afa2dee and 38e4719e71af. Please refer following link for Gecko code
> > landed.
> > 
> > http://hg.mozilla.org/mozilla-central/
> > pushloghtml?fromchange=c3a90afa2dee&tochange=38e4719e71af
> 
> Who will find out the exact changeset? We need to go to custom builds now.
> Nothing stands out in the regression range.

Taipei QA is bisecting the changesets now. We need more time to find the problem.

Gregor, what do you mean by nothing standing out in the regression range? The report in comment 13 (http://goo.gl/ShZvnV) shows that something happened in this range. Could you elaborate?
Flags: needinfo?(tlee)
Flags: needinfo?(nhirata.bugzilla)
Flags: needinfo?(bchien)
Flags: needinfo?(anygregor)
(In reply to Bobby Chien [:bchien] from comment #16)
> Taipei QA is working on bisect the changeset now. Need more time to find out
> problem.
> 
> Gregor, what do you mean that nothing stands out in regression range? as
> report http://goo.gl/ShZvnV in comment 13. The data shows something happen
> in this range. Could you elaborate more?

I mean it's not obvious which changeset caused the regression within this window. Sometimes you get a hint just by looking at the range, but I can't see anything in there that looks suspicious.
Flags: needinfo?(anygregor)
(In reply to Bobby Chien [:bchien] from comment #13)
> Here is bisection report based on inbound build (http://goo.gl/JJGFfx). […]
> 
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c3a90afa2dee&tochange=38e4719e71af

I just got an email which mentions backing out my changes.  Bug 1112162, Bug 1112156, and Bug 1118344 are unlikely to be related.  So far we have no support for SIMD on ARM, and we haven't changed the way we align the stack on ARM.  So I highly doubt these would be responsible for such a regression.

If a better regression range highlights these patches more than others, I will take a deeper look.  Unfortunately, backing out these patches is unlikely to leave mozilla-inbound in a greenish shape :/
[Responses from Edward Chen]

The final report is finished, please click the link below.

http://bit.ly/1GhUuMO

Unfortunately, we could not identify a probable cause in this testing. Based on this experience, we offer two suggestions as topics for further discussion.

1. If possible, every changeset should go through a round of performance testing before it is merged to inbound.
2. The number of possible combinations made testing difficult, so we had to pin the Gaia version.  We think each test run should sync to the newest Gaia or Gecko; otherwise the results may not be fully meaningful.
Bobby, please remember that Dialer has the most stable performance profile (and it also shows the same regression), so maybe it makes sense to use the Dialer when doing such comparisons?
(In reply to Julien Wajsberg [:julienw] from comment #21)
> Bobby, please remember that Dialer has the the most stable performance
> profile (and it also shows the same regression), so maybe it makes sense to
> use the Dialer when doing such comparison?

Thanks, Julien. I found that Datazilla didn't display the Gecko version for Dialer. We'll look into it.
After discussion with the QA team, we have no findings from the bisection (comment 20). We used the same Flame configurations (master and v2.2, 319M, and light workload) in both labs, but we could not reproduce the regression when bisecting on our local machines. For now, we won't continue tracking this issue unless a new critical regression appears.

In the Mar. 24 results from the MT lab, master and v2.2 show similar numbers on Flame. However, the TPE lab shows some apps running higher than expected, which is strange. The results follow.

Please let us know if there is any lead we can follow up on.

-----
Configurations:
- Flame with 319M 
- Gaia: reference-workload-light

Current Results (Mar. 24, startup time in ms):
Datazilla           Contacts  Gallery   Music     SMS
Flame-master    [1]      860     1009    1312     1392
Flame-v2.2 (MT) [2]     1066     1008    1262     1390
Flame-v2.2 (TPE)[3]     1247     2439    3053     1295
Flame-v2.1               920      992     963     1301

Datazilla Reference Links:
[1] http://goo.gl/DX2ndS
[2] http://goo.gl/V06e3v
[3] http://goo.gl/Sw7427
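
To make the gap between the two labs easier to see, here is a small sketch that computes the TPE/MT ratio per app for the v2.2 rows of the table above. The 1.5x cutoff used to flag an app is an arbitrary assumption chosen for illustration, not a project threshold.

```python
apps = ["Contacts", "Gallery", "Music", "SMS"]
v22_mt = [1066, 1008, 1262, 1390]    # Flame-v2.2 (MT) row from the table
v22_tpe = [1247, 2439, 3053, 1295]   # Flame-v2.2 (TPE) row from the table

# Ratio above 1.0 means the TPE lab measured a slower result than the MT lab.
ratios = {app: tpe / mt for app, mt, tpe in zip(apps, v22_mt, v22_tpe)}

# Arbitrary illustrative cutoff: flag apps where TPE is over 1.5x MT.
outliers = [app for app, ratio in ratios.items() if ratio > 1.5]

for app, ratio in ratios.items():
    print(f"{app:<9} TPE/MT = {ratio:.2f}")
print("Flagged:", outliers)
```

With these numbers only Gallery and Music are flagged, while SMS is slightly faster in the TPE lab, so the lab difference is not uniform across apps.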
blocking-b2g: 2.2+ → ---
(In reply to Bobby Chien [:bchien] from comment #22)
> (In reply to Julien Wajsberg [:julienw] from comment #21)
> > Bobby, please remember that Dialer has the the most stable performance
> > profile (and it also shows the same regression), so maybe it makes sense to
> > use the Dialer when doing such comparison?
> 
> Thanks, Julien. I found datazill didn't display gecko version of dailer.
> Let's have some research.

Actually, it does, but it's "hidden" because the name of the app is too long (sigh), since it includes "communications/".

Same issue with contacts and yet you included it in the report :)

The easiest way to see the information in Datazilla is to open the devtools console and run this command:

  document.getElementById('app_replicate_gecko_revision')

You'll see that the "href" attribute links to hg, and both title and textContent properties have the gecko version.
(In reply to Bobby Chien [:bchien] from comment #23)
> As discussion with QA team, we have no findings in bisection (comment 20).
> We used same Flame configurations (master and v2.2, 319M, and
> light-workload) in both lab. But we can't bisect same regression issue from
> our local machine. For now, we won't continue effort to track issue until
> new critical regression issue appear. 


I'm very concerned about this.
It's obvious there is a global regression on January 27th. That we can't find the cause is deeply concerning :/

We _do_ see improvements in master in recent days. We're nearly at the same level as before January 27th (again, using the Dialer measurements, it's easy to tell). But we don't really know whether the initial regression is gone or whether the recovery is due to new improvements.
I do not understand how we don't have resolution on this. We have a changeset where performance is lower, and a changeset where performance is higher. We should be able to git bisect or hg bisect, flash each Gecko while keeping Gaia steady, run test-perf against the Music app as in comment 1, and find the regressing commit. Am I missing something as to why this isn't done?
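The procedure described here amounts to a binary search over an ordered list of changesets. Below is a minimal sketch of that search; `find_first_bad` and the `is_bad` callback are hypothetical names, and in practice the callback would build and flash that Gecko revision with Gaia pinned, run the perf test, and compare the result with a threshold. The demo replaces all of that with made-up numbers.

```python
from typing import Callable, Sequence


def find_first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad revision.

    Assumes revisions are ordered oldest to newest, the last one is known
    bad, and every revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: first bad index is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid        # first bad revision is mid or earlier
        else:
            lo = mid + 1    # first bad revision is after mid
    return revisions[lo]


# Demo with synthetic data: ten made-up revisions, regression lands at rev6.
revisions = [f"rev{i}" for i in range(10)]
fake_ms = {rev: (1221 if i < 6 else 1460) for i, rev in enumerate(revisions)}
tested = []


def is_bad(rev: str) -> bool:
    tested.append(rev)          # record how many builds would be flashed
    return fake_ms[rev] > 1340  # hypothetical threshold between good and bad


first_bad = find_first_bad(revisions, is_bad)
print(first_bad, "found after testing", len(tested), "revisions")
```

The search only works if each measurement cleanly separates good from bad, so a noisy perf test needs enough runs per revision before the callback returns a verdict.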
(In reply to :Eli Perelman from comment #26)
> I do not understand how we don't have resolution on this. We have a
> changeset where performance is lower, and a changeset where performance is
> higher. We should be able be to git bisect or hg bisect, flash each Gecko
> keeping Gaia steady, run test-perf against Music app as in comment 1 and
> find the regressing commit. Am I missing something as to why this isn't done?

Lack of time or prioritization? I've done this kind of bisection a few times in the past, but right now I've got my hands full with blockers and don't have time for this. I think that the lack of a dedicated performance team might be hurting us here.
(In reply to :Eli Perelman from comment #26)
> I do not understand how we don't have resolution on this. We have a
> changeset where performance is lower, and a changeset where performance is
> higher. We should be able be to git bisect or hg bisect, flash each Gecko
> keeping Gaia steady, run test-perf against Music app as in comment 1 and
> find the regressing commit. Am I missing something as to why this isn't done?

Eli, please review http://bit.ly/1GhUuMO first. QA bisected changeset by changeset, and the changeset range was based on the Jenkins automation report (http://goo.gl/ShZvnV from comment 13).
So that means 5 tries are clearly not enough to get a meaningful result.  In particular, I don't know whether you removed the first results, which we know are always bad.

Also, we don't know whether you used the average or the median, and we don't know the standard deviation. :/
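As an illustration of the points raised here, the sketch below drops the first run, then reports the mean, median, and sample standard deviation of the rest. The `summarize` helper and the sample values are made up for the example and are not taken from the report.

```python
import statistics


def summarize(samples_ms, drop_first=1):
    """Summarize perf samples after discarding the first warm-up run(s)."""
    kept = list(samples_ms)[drop_first:]
    if len(kept) < 2:
        raise ValueError("need at least two samples after dropping warm-up runs")
    return {
        "n": len(kept),
        "mean": statistics.mean(kept),
        "median": statistics.median(kept),
        "stdev": statistics.stdev(kept),  # sample standard deviation
    }


# Made-up startup times in ms; the first run is deliberately an outlier.
runs = [1900, 1230, 1210, 1250, 1225, 1215]
summary = summarize(runs)
print(summary)
```

Reporting the median alongside the standard deviation makes it possible to judge whether a difference between two builds is larger than the run-to-run noise.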
I ran a quick test again using the 1/26 pvt builds on my Flame, and the regression is reproducible. Video attached here: https://www.youtube.com/watch?v=r-nMxXpNXns. This result matches comment 13.

Build location:
- Last good build: http://goo.gl/0bbwtA
- First bad build: http://goo.gl/IcQbPE
[Tracking Requested - why for this release]:
There is no further research data, and performance numbers are back to normal on master. Closing this case as unconfirmed.
Status: NEW → UNCONFIRMED
Ever confirmed: false
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME