Closed Bug 915220 Opened 12 years ago Closed 12 years ago

FxOS: Clock 3s datazilla cold launch regression on v1-train

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: mlee, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [c=automation p=5 s=2013.10.04 u=][b2gperf_regression])

Attachments

(3 files)

FxOS Clock 3s Regression on v1-train.png 12 years ago Mike Lee [:mlee] 566.62 KB, image/png		Details
Profile of slow clock launch 12 years ago Ben Kelly [:bkelly, not reviewing] 1.36 MB, text/plain		Details
Profile of fast Clock launch 12 years ago Ben Kelly [:bkelly, not reviewing] 2.28 MB, text/plain		Details

Mike Lee [:mlee]

Reporter

Description

•

12 years ago

Attached image FxOS Clock 3s Regression on v1-train.png — Details

The FxOs Clock app is showing a 3s regression in startup time on v1-train in datazilla. https://datazilla.mozilla.org/b2g/?branch=v1-train&device=unagi&range=30&test=cold_load_time&app_list=browser,calendar,camera,clock,contacts,email%20FTU,fm_radio,gallery,messages,music,phone,settings,usage,video&app=browser&gaia_rev=52d4e499261a0a27&gecko_rev=700373b45b3478fc

Ben Kelly [:bkelly, not reviewing]

Comment 1

•

12 years ago

From the datazilla revisions this appears to have been caused by a gecko change. Looks like one of the following: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=0d9469252ef22f6b4b4bdbe7a7251618f8e65bc8 http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=42dc165a50260a9a7760888d95bf46ad805e8917 http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=d837833998e60b8109d1f41bcf68aeb4aaeff56f http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=114f90a483f7bd2c57921936a0617f6c62183b5d http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=ab12e80c870c9f9b422bb1d889605c8e4dca3f5d http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=4b468358c46dbc3b8b39dceac09a59e9026d8713 I guess its possible there was a gonk change in there as well, but I don't think datazilla reports that information.

Ben Kelly [:bkelly, not reviewing]

Comment 2

•

12 years ago

Check out the cool upward staircase pattern in the replicate times. Seems like each replicate run is leaking some resource which is slowing down the next run.

Mike Lee [:mlee]

Reporter

Comment 3

•

12 years ago

Kevin, need you take this one. What's causing this?

Assignee: nobody → kgrandon

Status: NEW → ASSIGNED

Mike Lee [:mlee]

Reporter

Updated

•

12 years ago

Component: General → Gaia::Clock

OS: Mac OS X → Gonk (Firefox OS)

Product: Testing → Boot2Gecko

Hardware: x86 → ARM

Dietrich Ayala (:dietrich)

Updated

•

12 years ago

Blocks: 915068

Ben Kelly [:bkelly, not reviewing]

Comment 4

•

12 years ago

Reproduced on my phone. Bisecting gecko now.

Assignee: kgrandon → bkelly

Ben Kelly [:bkelly, not reviewing]

Comment 5

•

12 years ago

Bisection shows this is caused by: http://hg.mozilla.org/releases/mozilla-b2g18/rev/e85db8ff0a7c

Blocks: 900221

[:fabrice] Fabrice Desré

Comment 6

•

12 years ago

Does that happen only for the clock app?

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 7

•

12 years ago

I wonder if purging the expired weak references on message dispatch is the problem here.

Ben Kelly [:bkelly, not reviewing]

Comment 8

•

12 years ago

(In reply to Fabrice Desré [:fabrice] from comment #6) > Does that happen only for the clock app? Datazilla suggests that it is only affecting the clock app, but that could be deceptive. The automation behind datazilla effectively listens for the mozbrowserloadend event and then kills the app. The conditions triggering the leak are occurring during this time window for only clock app. Its still possible that other apps hit this condition later in their processing.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 9

•

12 years ago

You keep referring to a leak ... what are we leaking?

Ben Kelly [:bkelly, not reviewing]

Comment 10

•

12 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #9) > You keep referring to a leak ... what are we leaking? Good point. I don't really know its a leak. I was jumping to that conclusion since the failure mode is that the launch time gets progressively slower on each run.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 11

•

12 years ago

Can you get this in a profiler? http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsFrameMessageManager.cpp#764 is the most suspicious bit of that set of patches IMO. Alternatively you could just remove line 764 and see what happens.

Ben Kelly [:bkelly, not reviewing]

Comment 12

•

12 years ago

Update: Removing the line suggest in comment 11 unfortunately did not help. Also, when I tried to run the profiler I ran into some other problems. First, it looks like this is another case where the regression only occurs when run from b2gperf. If I manually launch and kill clock repeatedly I do not see a ramp up in launch times. While on the face of it this seems similar to bug 892965, it feels a bit different. There we had a semi-constant time penalty due to conflicting with the launch of the pre-allocated process. Here we have a pretty clear ramp up in time. So I think this is probably a separate issue. The other fun part is that I can't seem to get b2gperf to run with the profiler. I'm not sure what the issue is here. I tried pausing b2gperf immediately after it relaunches b2g to enable profiling, but it seems like marionette fails. Maybe I can hack b2gperf to enable profiling when it does the relaunch.

Ben Kelly [:bkelly, not reviewing]

Comment 13

•

12 years ago

Attached file Profile of slow clock launch — Details

There is a nice large CPU spike in this profile. Unfortunately the symbols don't seem to a lot of sense to me. This is for mozilla-b2g18 at e85db8ff0a7c.

Ben Kelly [:bkelly, not reviewing]

Comment 14

•

12 years ago

Oh, and the way to profile with b2gperf is to use the --no-restart flag. So: b2gperf --delay=5 --no-restart Clock

Jason Smith [:jsmith]

Updated

•

12 years ago

blocking-b2g: --- → koi?

Ben Kelly [:bkelly, not reviewing]

Comment 15

•

12 years ago

In regards to the koi? nom, I was unsure about this myself since so far we only see the problem on b2g18. It has not manifested on mozilla-central. Hopefully if we can track down the underlying cause we can determine if we are at risk there as well.

Jason Smith [:jsmith]

Comment 16

•

12 years ago

Okay, I'll unnom until we confirm if it's going to reproduce on central as well.

blocking-b2g: koi? → ---

Ben Kelly [:bkelly, not reviewing]

Comment 17

•

12 years ago

Attached file Profile of fast Clock launch — Details

This profile shows a fast Clock launch using rev 119884:e85db8ff0a7c. From talking with BenWa, it sounds like the large "spike" I referred to previously is not necessarily a problem. It may be perfectly normal for there to be a deep call stack there. Comparing this profile to attachment 804638 [details], however, does show that some additional top level work in XPCOMUtils_QueryInterface() is performed in the slow case. Its unclear to me why this is showing up at the top level of the call stack at this point. Is it getting run from a finalizer?

Ben Kelly [:bkelly, not reviewing]

Comment 18

•

12 years ago

Also, in terms of why this might be showing up on b2g18, but not mozilla-central, it appears that more code was landed on m-c. In particular, bug 889984 attachment 772300 [details] [diff] [review] has some additions to DOMRequestHelper.jsm that are not present in b2g18. The comments from jlebar suggest that these changes are not needed any more, but perhaps they are masking a problem with the addWeakMessageListener() approach. I may try backing them out on m-c on the off chance I can reproduce this there. If I could, then the improved profiler may help track down the issue.

Mike Lee [:mlee]

Reporter

Updated

•

12 years ago

Whiteboard: [c= p= s= u=] → [c=automation p= s= u=]

Ben Kelly [:bkelly, not reviewing]

Comment 19

•

12 years ago

Nom'ing for leo to determine if there is anything we can do at this point. Should we continue investigating this? My opinion is that bug 900221 should be backed out on mozilla-b2g18 if we can. It clearly is causing a regression here, even if we don't understand what it is exactly. Also, the vendor reported problems in bug 900221 comment 22 where the device screen sometimes would not turn back on from sleep mode. I believe I have seen this after the clock slow down as well. If the mozilla-b2g18 ship has sailed, should this be marked as WONTFIX? Kyle, what do you think?

blocking-b2g: --- → leo?

Flags: needinfo?(khuey)

Ben Kelly [:bkelly, not reviewing]

Updated

•

12 years ago

Whiteboard: [c=automation p= s= u=] → [c=automation p=5 s= u=]

Ben Kelly [:bkelly, not reviewing]

Comment 20

•

12 years ago

Unassigning myself until release management can make a call about how to proceed.

Assignee: bkelly → nobody

Status: ASSIGNED → NEW

Dietrich Ayala (:dietrich)

Comment 21

•

12 years ago

Preet, if no OEMs are requiring this, and Leo is already shipped, what options do we have here?

Flags: needinfo?(praghunath)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 22

•

12 years ago

Well, if leo has sailed and this doesn't appear on trunk this then is WONTFIX. If leo hasn't sailed, then backing out bug 900221 is an option. It would be unfortunate to do that without understanding the regression, since bug 900221 (and bug 889984) fixed some pretty serious memory leaks, so we're going to be trading one performance problem for another.

Flags: needinfo?(khuey)

Preeti Raghunath(:Preeti)

Updated

•

12 years ago

Status: NEW → RESOLVED

Closed: 12 years ago

Flags: needinfo?(praghunath)

Resolution: --- → WONTFIX

Mike Lee [:mlee]

Reporter

Updated

•

12 years ago

Whiteboard: [c=automation p=5 s= u=] → [c=automation p=5 s=2013.10.04 u=]

Jason Smith [:jsmith]

Comment 23

•

12 years ago

Clearing nom because this has been wont fixed.

blocking-b2g: leo? → ---

Ben Kelly [:bkelly, not reviewing]

Updated

•

12 years ago

Whiteboard: [c=automation p=5 s=2013.10.04 u=] → [c=automation p=5 s=2013.10.04 u=][b2gperf_regression]

You need to log in before you can comment on or make changes to this bug.