Closed Bug 892965 Opened 6 years ago Closed 6 years ago

[b2g] datazilla shows cold boot times slower for m-c vs b2g18

Categories

(Firefox OS Graveyard :: General, defect, P1, major)

ARM
Gonk (Firefox OS)

Tracking

(blocking-b2g:koi+, firefox26 fixed, b2g-v1.2 fixed)

RESOLVED FIXED
1.2 C1(Sep27)

People

(Reporter: bkelly, Assigned: bkelly)

References

Details

(Keywords: perf, regression, Whiteboard: [c= u=1.2 p=5] QAregressexclude [b2gperf_regression])

Attachments

(4 files)

Switching datazilla from b2g18 to mozilla-central for "master" builds resulted in an across-the-board slowdown in cold boot times.  Depending on the app, slowdowns ranged from 100ms (FTU) to 500ms (contacts).

I don't see a way to link to a particular date range in datazilla, but look around July 11, 2013:

  https://datazilla.mozilla.org/b2g/

We need to investigate why load times are slower on mozilla-central.
Mike, how do you want to investigate this?
blocking-b2g: --- → koi+
Flags: needinfo?(mlee)
Dave,
You posted this to dev-b2g yesterday:

----- Original Message -----
> From: "Dave Hunt" <dhunt@mozilla.com>
> To: mozilla-dev-b2g@lists.mozilla.org
> Sent: Thursday, July 11, 2013 8:07:26 AM
> Subject: [b2g] We are now using mozilla-central builds on https://datazilla.mozilla.org/b2g/
> 
> All,
> 
> You may notice a spike in load times on
> https://datazilla.mozilla.org/b2g/ for Unagi. This is because we are now
> using a mozilla-central build of Gecko with the latest Gaia master
> applied. Previously we were using a mozilla-b2g18 build with the latest
> Gaia master applied.

Why is this change a justification for the load time spike?

> The v1-train results are still available by changing the value of 'Gaia
> Branch' to v1-train.
> 
> Soon we will be making the same change for the performance tests run
> against the Inari device, so you can expect a similar spike there.

Again, why is a spike expected?

Thanks,
Mike
Severity: normal → major
Flags: needinfo?(mlee) → needinfo?(dave.hunt)
Priority: -- → P1
Whiteboard: [c= ]
(In reply to Mike Lee [:mlee] from comment #2)
> Dave,
> You posted this to dev-b2g yesterday:
> 
> ----- Original Message -----
> > From: "Dave Hunt" <dhunt@mozilla.com>
> > To: mozilla-dev-b2g@lists.mozilla.org
> > Sent: Thursday, July 11, 2013 8:07:26 AM
> > Subject: [b2g] We are now using mozilla-central builds on https://datazilla.mozilla.org/b2g/
> > 
> > All,
> > 
> > You may notice a spike in load times on
> > https://datazilla.mozilla.org/b2g/ for Unagi. This is because we are now
> > using a mozilla-central build of Gecko with the latest Gaia master
> > applied. Previously we were using a mozilla-b2g18 build with the latest
> > Gaia master applied.
> 
> Why is this change a justification for the load time spike?

No justification, this was just an observation and an indication of what's changed.

> > The v1-train results are still available by changing the value of 'Gaia
> > Branch' to v1-train.
> > 
> > Soon we will be making the same change for the performance tests run
> > against the Inari device, so you can expect a similar spike there.
> 
> Again, why is a spike expected?

It's expected because a similar change on the Unagi builds caused an increase in load time. The exact cause certainly should be investigated.
Flags: needinfo?(dave.hunt)
I suppose I'll do some form of profiling for this. If you're reading this and were also interested in looking into this one, please don't let that stop you from doing so. Any help is appreciated :)
Assignee: nobody → kgrandon
I don't know if it's relevant, but the day-to-day variability is much higher for m-c vs b2g18.  I suppose that could just be the nature of more commits landing there, but it does seem a bit odd to me given the stability of the measurements prior to the switch.  Just an observation.
Status: NEW → ASSIGNED
Whiteboard: [c= ] → [c= u=1.2 p=4]
Kevin, any additional information on this issue?
Adding a screenshot of the regression because it might be hard to find on datazilla in the future.
The only thing I've really noticed in the m-c builds is the addition of IPDL calls. I would assume that these would be impacting performance a bit, but I'm not sure to what extent. I would love to find out why these are in the profile in m-c, but not b2g18.

Note: I'm using m-c and b2g18 unagi builds from 7/11 (the day after the regression appeared).
Gregor, Ben - Adding you guys to the CC in case you might know what would trigger these new IPDL calls in m-c. This build was taken before the settings migration to WebIDL (bug 889503), so I don't think that would be the case.

Are you familiar with any other APIs which may have migrated to WebIDL between b2g18 and m-c recently?

Thanks!
Well, IPDL != WebIDL, those are different animals. Can you link the b2g18 and mc cleopatra profiles here so I can take a peek?
(In reply to ben turner [:bent] from comment #11)
> Can you link the b2g18 and mc cleopatra profiles here so I can take a peek?

Hi Ben, thanks for the info. Here are the profiles:

m-c: http://people.mozilla.com/~bgirard/cleopatra/#report=5410dcc18a9da616bc1675c7f33cd92af310728d
b2g18: http://people.mozilla.com/~bgirard/cleopatra/#report=35fd848dc726fb0ad7f5c97c48b4befce499c9da

I can also upload the .sym files if you would prefer those.
Flags: needinfo?(bent.mozilla)
I haven't had a chance to really look deeply here but the IPDL difference you're seeing is likely not relevant. The SPS profiler (on b2g, at least for right now) just shows pseudoframes, not actual stack frames. It relies on manual markers in C++ code to generate C++ frames and some JS engine smarts to generate JS frames. The fact that you're seeing IPDL messages on m-c is due to the fact that on trunk we added stack markers for all IPDL messages. The same code was called (probably) in b2g18 but the profiler just wasn't smart enough to generate a frame for it.
Flags: needinfo?(bent.mozilla)
(In reply to Kevin Grandon :kgrandon from comment #9)
> Note: I'm using a m-c and b2g18 unagi build from 7/11 (The day after the
> regression appeared)

It's worth noting that the increase appeared on this date because we switched from using mozilla-b2g18 to mozilla-central builds. The regression is likely to have already been introduced in mozilla-central before then.
Hi Gabriele - 

If you have a minute I was wondering if you could give me a hand and take a look at these profiles. There was a performance regression from the b2g18 to m-c branch changeover, and I was wondering if there was anything that was readily apparent to you here. Thanks! 

m-c: http://people.mozilla.com/~bgirard/cleopatra/#report=5410dcc18a9da616bc1675c7f33cd92af310728d
b2g18: http://people.mozilla.com/~bgirard/cleopatra/#report=35fd848dc726fb0ad7f5c97c48b4befce499c9da
Flags: needinfo?(gsvelto)
I've had a look at the profiles and the only thing that seems to stick out is IndexedDB-related IPC. In the b2g18 profile it is completely absent, while in the m-c profile it seems to account for almost all of the difference in startup time. Is the application making use of IndexedDB on master but not on the v1-train branch?
Flags: needinfo?(gsvelto)
Are the differences in times for Paint and Layout relevant?  It appears that in b2g18 we spend more time painting than performing layout.  In m-c the trend is reversed with painting accounting for less time than layout.

Also, I noticed that in the tail to the right, b2g18 seems to perform most of the layout up front and then perform continuous repainting (mostly).  In m-c the layout and painting are interleaved.  Perhaps we have a change that is adversely impacting scheduling?

Finally, in m-c there is a Layout style flush that occurs very late and then some final painting occurs.

Again, not sure if any of this is helpful.  Just some observations.
(In reply to Gabriele Svelto [:gsvelto] from comment #16)
> Is the application making use of IndexedDB on master but not
> on the v1-train branch?

There should be no difference in usage with IndexedDB between the two branches.
Hi Taras,

I was told you might be a good person to talk to regarding performance.

We are seeing a regression in load times from 18 -> 26. It's possible that this could be related to IndexedDB performance. Was wondering if you were aware of anything here.

Thanks!
Flags: needinfo?(taras.mozilla)
IndexedDB also gained some pseudostack markers on trunk, so I bet IDB is a red herring just like the IPDL markers.
(In reply to ben turner [:bent] from comment #20)
> IndexedDB also gained some pseudostack markers on trunk, so I bet IDB is a
> red herring just like the IPDL markers.

I see; that confused me, as I knew the IPDL methods were now marked but not the IndexedDB stuff. Regarding the differences in paint & layout time pointed out by Ben in comment 17, those could be relevant, but we need more than two data points to be sure that the timing differences are not within the error margin.
(In reply to Kevin Grandon :kgrandon from comment #19)
> Hi Taras,
> 
> I was told you might be a good person to talk to regarding performance.
> 
> We are seeing a regression in load times from 18 -> 26. It's possible that
> this could be related to IndexedDB performance. Was wondering if you were
> aware of anything here.
> 
Mike's B2G perf team is a better fit for initial investigation  here.
Flags: needinfo?(taras.mozilla)
(In reply to Ben Kelly [:bkelly] from comment #17)
> Are the differences in times for Paint and Layout relevant?  

Looking over the profiles, it looks like we reduced elapsed time in Paint and Layout within the refresh driver interval:

b2g18: nsRefreshDriver::Notify 196 ms / 5.4%
m-c: nsRefreshDriver::Tick 111 ms / 4.8%

Are there other indicators that Rendering issues are at work here?
I intend to work on this as my top priority once I finish my current bug.
Assignee: kgrandon → bkelly
Whiteboard: [c= u=1.2 p=4] → [c= u=1.2 p=5]
FYI, I'm now actively investigating this.  Sorry for the delay.
Hmm, my initial attempt to reproduce this on my buri had mixed results.  I went back to the following revisions around July 11, 2013, when this was reported:

  b2g18:  119850:58a4e9af311f
  m-c:    143399:fabc7deeeb21

Loading contacts app with 1000 contacts (reference-workload-heavy) I see the following cold_load_times:

  b2g18:  1118ms
  m-c:    1187ms

These are mean values over 10 app launches.  Median values were similar, so minimal skew.

So only ~70ms difference with this test.  I will continue investigating with other revisions.  Maybe I jumped too far back and missed the regression.  (That would be nice for narrowing the range!)
It's also possible this is related to gonk-level changes.

Dave, do you know if datazilla/b2gperf devices have different gonk bits as well as different gecko?
Flags: needinfo?(dave.hunt)
I spoke to Clint and he thought b2gperf/datazilla was probably running a full ./flash.sh which would include the gonk bits.  Therefore I'm retesting with those changes as well to see if that will reproduce the regression.
Flags: needinfo?(dave.hunt)
Yes, we do run ./flash.sh before running the perf tests. This is run for every nightly build we get, and between those we simply flash gaia on top when there are changes in the repo.
Thanks Dave.  Can you describe where you get the nightly builds from?  Do you know how they are built?

I'm wondering if these are something different from a ./build.sh using our b2g repo.
Flags: needinfo?(dave.hunt)
Status update:

I cannot reproduce this on hamachi.  I switched to running the nightly builds exactly as datazilla does and got the same gecko/gaia revisions.  The results I got were:

  b2g v1.0.1 (6565d) + gaia master (acf26): 657ms
  b2g v1.0.1 (6565d) + gaia master (2cff):  619ms
  m-c (58177) + gaia master (acf26):        591ms
  m-c (58177) + gaia master (2cff):         580ms

If anything, these results are the opposite of datazilla, showing a slight improvement both from moving from b2g18 to m-c and from moving gaia master from rev acf26 to rev 2cff.

The numbers above are again mean values calculated over 10 runs on each configuration.

I am going to switch to testing on an unagi that mlee provided to me.
Sorry, I forgot to mention in comment 33 that the times were measured for gallery, not contacts.  So they are not directly comparable to the values in comment 26.

I switched to gallery for testing since it was affected by the regression on datazilla, but generally provides more stable load time results.
Unagi gallery results:

  b2g18 (7e9bf5) + gaia master (da5424): 668ms
  b2g18 (7e9bf5) + gaia master (7d7355): 663ms
  m-c (545af1) + gaia master (da5424):   677ms
  m-c (545af1) + gaia master (7d7355):   690ms

Virtually no change.  I'm going to add b2gperf back in now.  Perhaps it's an interaction with marionette.  Dave Hunt said he might also be able to try reproducing.
I've been able to replicate this again using the two builds it was originally reported from. This is using the gallery app with no reference workload. I installed the latest version of b2gperf (0.7) and ran the following commands after flashing:

$ adb forward tcp:2828 tcp:2828
$ b2gperf --delay=5 Gallery

LAST GOOD
Build: https://pvtbuilds.mozilla.org/pub/mozilla.org/b2g/nightly/mozilla-b2g18-unagi-eng/2013/07/2013-07-10-07-02-04/
Results: avg:614, max:720, min:538, all:584,576,586,606,590,613,617,674,631,647,662,627,597,708,612,548,666,587,538,673,580,650,688,573,720,579,549,578,538,632

FIRST BAD
Build: https://pvtbuilds.mozilla.org/pub/mozilla.org/b2g/nightly/mozilla-central-unagi-eng/2013/07/2013-07-11-03-02-02/
Results: avg:720, max:899, min:600, all:728,600,621,618,648,785,803,872,899,668,752,874,664,868,672,674,686,828,638,722,625,701,823,662,678,814,670,653,740,636

The average closely matches the results in DataZilla (604 vs 729)
(In reply to Dave Hunt (:davehunt) from comment #36)
> I've been able to replicate this again using the two builds it was
> originally reported from. This is using the gallery app with no reference
> workload. I installed the latest version of b2gperf (0.7) and ran the
> following commands after flashing:
> 
> $ adb forward tcp:2828 tcp:2828
> $ b2gperf --delay=5 Gallery

Did you flash a particular gaia/master on top of the images you linked?
Ok, reproduced using the same images Dave used with b2gperf.  Now I'm trying to understand how this differs from my previous tests.

I thought perhaps it was taking longer to clean up the closed app which was then interfering with the next run, but that does not appear to be the case.  Using a --delay=20 does not improve the numbers at all.  I observed with top -m 3 -d 1 that the CPU was basically going back to idle.

Next I will try to profile to determine if marionette is taking up significant resources.
(In reply to Ben Kelly [:bkelly] from comment #37)
> Did you flash a particular gaia/master on top of the images you linked?

Nope, just the build.
(In reply to Ben Kelly [:bkelly] from comment #38)
> I thought perhaps it was taking longer to clean up the closed app which was
> then interfering with the next run, but that does not appear to be the case.
> Using a --delay=20 does not improve the numbers at all.  I observed with top
> -m 3 -d 1 that the CPU was basically going back to idle.

I might have spoken too quickly here.  Running with longer delays on my buri this morning seems to avoid the regression:

  b2g18 --delay= 5: 650ms
  m-c   --delay= 5: 960ms
  m-c   --delay= 5: 840ms
  m-c   --delay= 6: 652ms
  m-c   --delay= 6: 638ms
  m-c   --delay=10: 669ms

From talking with gsvelto it sounds like this could be related to changes in the pre-allocated process.  Apparently in b2g18 the pre-allocated process is launched after 3 seconds, but in m-c it is now launched after 5 seconds.

Will continue investigating to see if this is the issue.
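For context on how that delay is wired up, here is a rough sketch only; the method names come from this investigation, while the pref name (dom.ipc.processPrelaunch.delayMs), the member names, and the body are assumptions for illustration rather than the actual code on either branch:

  // Sketch: schedule the pre-allocated process after a configurable delay.
  // Pref name, members, and fallback value are illustrative assumptions.
  void
  PreallocatedProcessManager::AllocateAfterDelay()
  {
    if (!mEnabled || mPreallocatedAppProcess) {
      return;  // prelaunch disabled, or we already have a spare process
    }
    uint32_t delayMs =
      mozilla::Preferences::GetUint("dom.ipc.processPrelaunch.delayMs",
                                    5000 /* illustrative fallback */);
    // After the delay we still wait for an idle notification before actually
    // creating the process (AllocateOnIdle -> AllocateNow).
    MessageLoop::current()->PostDelayedTask(
      FROM_HERE,
      NewRunnableMethod(this, &PreallocatedProcessManager::AllocateOnIdle),
      delayMs);
  }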
Apparently the 5 second prelaunch delay was backported to b2g18, so it's not that simple.

If shutdown timing changed, however, it could still cause the prelaunch to interfere with the test.  I'll continue investigating.
We have experimented with increasing the delay before too, see bug 878295
Some additional data points:

  m-c, preLaunch=4s, b2gperf --delay=5:  660ms
  m-c, preLaunch=6s, b2gperf --delay=5:  1004ms

This suggests to me that in b2g18 we are getting the pre-allocated process launched before the next test run, but in m-c the pre-allocated process is not getting launched in time.
Profiling while b2gperf is running may be tricky since processes are coming and going so often.  (As far as I know the process still needs to exist when the ./profile.sh capture is run.)

Therefore I've been adding some debugging statements to get a sense of what's going on.  In particular, I added timestamp debug statements in these dom/ipc methods:

  ContentClient::RecvPBrowserConstructor
  ContentClient::FirstIdle
  PreallocatedProcessManager::AllocateAfterDelay
  PreallocatedProcessManager::AllocateOnIdle
  PreallocatedProcessManager::AllocateNow

This is essentially the call flow leading up to the creation of the pre-allocated process.  I believe the ContentClient methods are executed in the client process (gallery in this case) while the rest are in the parent process.  (Please correct if this is wrong!)
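For reference, the timestamp debug statements were roughly of this shape; this is a reconstruction for illustration (printf_stderr() and PR_Now() are assumed choices, not necessarily what the actual debug patch used), shown for one child-side method (the dom/ipc class is ContentChild) and one parent-side method:

  // Illustrative reconstruction of the timing instrumentation, not the patch.
  // printf_stderr() output ends up in logcat on b2g.
  static void
  LogPerfTimestamp(const char* aWhere)
  {
    printf_stderr("PERF %llu %s\n",
                  static_cast<unsigned long long>(PR_Now() / 1000),  // usec -> msec
                  aWhere);
  }

  bool
  ContentChild::RecvPBrowserConstructor(/* arguments unchanged */)
  {
    LogPerfTimestamp("ContentChild::RecvPBrowserConstructor");
    // ... existing handling ...
    return true;
  }

  void
  PreallocatedProcessManager::AllocateNow()
  {
    LogPerfTimestamp("PreallocatedProcessManager::AllocateNow");
    // ... existing process creation ...
  }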

On mozilla-central there is a lot more latency between ContentClient::RecvPBrowserConstructor and ContentClient::FirstIdle compared to b2g18:

  b2g18 time to FirstIdle:  ~20ms to ~800ms
  m-c time to FirstIdle:    ~800ms to > 1500ms

This then causes PreallocatedProcessManager::AllocateNow to execute closer to when the app launch request is made on m-c.  When AllocateNow() occurs less than ~750ms to ~1000ms before the launch request, the app launch shows a regression in launch time.  When AllocateNow() fired more than ~1000ms before the next app launch, the app launch time was quite good.

Interestingly, b2g18 does not provide marionette debug output for the app launch.  I don't know if this is because it's just not logging the information or if marionette actually has a different message flow on b2g18.

This could be significant because the IPC postIdleTask() that is used to fire ContentClient::FirstIdle() mainly looks to see if the IPC queue is empty or not.  It does not look for idle CPU.  If marionette is sending a lot of IPC during this time it could delay the notification of FirstIdle which in turn would delay the creation of the pre-allocated process.

I'm next going to investigate any differences in marionette and why we are using the IPC postIdleTask() here.
The debug output that appears on m-c but is missing from b2g18 is:

I/Gecko   (  753): 1378995091771        Marionette      DEBUG   Got request: executeWithCallback, data: {"to":"0","specialPowers":false,"session":"6-b2g","scriptTimeout":null,"newSandbox":true,"args":[],"type":"executeAsyncScript","value":"launch_app(\"Gallery\")"}, id: {d70ab407-5278-467b-991a-2ceae560f815}
Malini informs me that marionette is part of the b2g image, so the difference in behavior makes sense.
With the Contacts app we often don't see the ContentClient::FirstIdle() call at all when running with b2gperf because the app is getting killed first.  Manually opening the contacts app seems to call FirstIdle() only after a long delay.

This suggests to me it's not a marionette interaction causing the problem.

I do see a lot of hwc logcat spam before the delayed FirstIdle.  Have there been changes to compositing that would increase IPC message traffic?
Dietrich, I think most evidence at the moment suggests this is an artifact of the test configuration and pre-allocated process configuration interacting poorly.  Yes, the pre-allocated process launches slightly later now, but that is not likely to actually impact the user's experience under normal use.  When I was manually launching apps I could not reproduce the regression.

Therefore it's not clear to me this should really be a high priority for addressing real app launch regressions.  It's only going to make the datazilla graph look better.  Definitely nice to have, but perhaps not as critical as other fires at the moment.

My current plan was to continue investigating another day to see if I can detect anything clearly broken to fix in m-c.  Assuming I don't find any smoking gun, I was going to make this depend on bug 878295 to adjust the b2gperf delay so it is less likely to conflict with the pre-allocated process launch.

I was also going to write a bug to consider adding a "pre-allocated process launched and idle" event that b2gperf could listen for instead of using a fixed timeout.  I can see arguments for and against that, though.

Does this plan sound good?  Or should we punt to the delay adjustment immediately and work on other urgent problems?

Thanks.
Flags: needinfo?(dietrich)
Depends on: 878295
Another possible approach would be to essentially remove the IPC postIdleTask() call that precedes sending FirstIdle() from ContentClient to ContentParent.

On the one hand this might increase work while the app is loading, but on the other it might provide a more consistent experience for the user (and the tests).
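A rough sketch of what that might look like, assuming the child currently posts FirstIdle via MessageLoop::current()->PostIdleTask(); the surrounding code is paraphrased rather than copied from the tree:

  // Sketch only -- paraphrased ContentChild code, not an actual patch.
  bool
  ContentChild::RecvPBrowserConstructor(/* arguments unchanged */)
  {
    // ... existing PBrowser setup ...
    static bool sFirstIdlePosted = false;
    if (!sFirstIdlePosted) {
      sFirstIdlePosted = true;
      // Before: only notify the parent once the Chromium IPC loop looks idle.
      //   MessageLoop::current()->PostIdleTask(FROM_HERE,
      //                                        NewRunnableFunction(FirstIdle));
      // Proposed: notify right away so the parent's 5 second prelaunch timer
      // starts at a predictable point after app launch.
      SendFirstIdle();
    }
    return true;
  }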

My gut feeling, though, is that the launch time of the next app is generally not as important as the app I'm looking at right now.
(In reply to ben turner [:bent] (needinfo? encouraged) from comment #13)
> The fact that you're seeing IPDL messages on m-c is
> due to the fact that on trunk we added stack markers for all IPDL messages.
> The same code was called (probably) in b2g18 but the profiler just wasn't
> smart enough to generate a frame for it.

Increased IPC traffic would actually be consistent with the results I'm seeing where postIdleTask() is taking longer to run.  From what I can tell postIdleTask() will not actually fire until the IPC queue is empty.

Ben, do we have any tools to introspect the IPC queue?  Task volume and task source at a particular snapshot in time would be the main things I'm looking for.

Thanks.
Flags: needinfo?(bent.mozilla)
You could try the tool in bug 853864, though it currently introduces significant slowdowns all on its own.
Flags: needinfo?(bent.mozilla)
Oh, also, you could try setting the MOZ_IPC_MESSAGE_LOG variable, it makes IPC veeeery chatty.
I added some PROFILER_MARKs to ContentChild.cpp in order to investigate what's happening in the profiler.
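Roughly what those markers look like, as an illustrative sketch (PROFILER_MARKER from GeckoProfiler.h is assumed to be the marker macro; the exact spelling on that revision may differ):

  // Illustrative sketch of the added markers, not the exact patch.
  #include "GeckoProfiler.h"  // assumed home of PROFILER_MARKER

  void
  ContentChild::FirstIdle()
  {
    // Makes the gap between RecvPBrowserConstructor and FirstIdle visible
    // in the Cleopatra timeline.
    PROFILER_MARKER("ContentChild::FirstIdle");
    SendFirstIdle();
  }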

I caught two cases by manually launching contacts in mozilla-central.

  1) ContentChild::FirstIdle() is executed immediately.
  2) ContentChild::FirstIdle() is delayed many seconds.

It appears that case (2) occurs when IPDL::PBrowser::RecvShow is executed prior to the FirstIdle postIdleTask().  This leads to other IPC messaging like:

  IPDL::PBrowser::RecvLoadURL
  IPDL::PContent::RecvPBrowserConstructor
  IPDL::PBrowser::RecvCacheFileDescriptor

To my naive eye this all looks like legitimate work related to launching an app.  The launch of the pre-allocated process is designed to wait for an idle state, so this appears to be working as designed.

If anything, the fast path where we think the phone is "idle" before the app really starts launching is a bug.

Do we want to maintain the idle detection before starting the 5 second timer?  There is another postIdleTask() after the timer completes.

Also, :bent suggests that postIdleTask() is a poor mechanism for really detecting idleness.
Attached file "Fast path" profile
Profile showing "fast path" where there is no delay to ContentChild::FirstIdle.
Attached file "Slow path" profile
This profile shows the case where ContentChild::FirstIdle is delayed.
It seems like the options here are:

1) Accept that the pre-allocated process will sometimes launch well after our configured 5 second delay and sometimes close to it.  This would suggest we just update the b2gperf/datazilla configuration and do nothing to gecko.  Or perhaps we create a "ready for next app launch" event that marionette can listen for.

2) Remove the IPC postIdleTask() used to initiate the 5 second delay.  This would make us start the timer consistently, immediately after the app launch.  There is another postIdleTask() after the timer completes, so the pre-allocated process could still be delayed beyond the 5 seconds if there is IPC traffic in progress.

3) Remove all uses of postIdleTask() since it's not a reliable way of determining if the device is truly idle.  We would then simply depend on the current app launch to complete its work within 5 seconds.

Since I was not here for the original design of this feature I'm not sure which approach to pursue.  I can see arguments for and against each approach.  Any guidance or direction would be helpful.  Thanks!
Flags: needinfo?(jonas)
Flags: needinfo?(fabrice)
Flags: needinfo?(bent.mozilla)
(In reply to Ben Kelly [:bkelly] from comment #53)
> To my naive eye this all looks like legitimate work related to launching an
> app.  The launch of the pre-allocated process is designed to wait for an
> idle state, so this appears to be working as designed.
> 
> If anything, the fast path where we think the phone is "idle" before the app
> really starts launching is a bug.

Yeah, nice catch!

> Do we want to maintain the idle detection before starting the 5 second
> timer?  There is another postIdleTask() after the timer completes.
> 
> Also, :bent suggests that postIdleTask() is poor mechanism for really
> detecting idle work.

Yeah, this was best-effort code we put together last year and there are probably better ways to deal with it. For one, postIdleTask() is made visible to chrome code via nsIMessageLoop, which should really go away IMHO; instead I was considering adding similar functionality to our regular event loop (i.e. not the Chromium IPC one), which should work better. This might also help here but should be tested nevertheless.
I spoke with Jonas and Dave Hylands about this at Oslo.

First, I think there is consensus that we should increase the timeout in the b2gperf/datazilla tests in the short term.  We need more consistent timing numbers that represent the primary case of launching an app with the pre-allocated process.

In the long term, the Nuwa template process (bug 771765) will likely replace the pre-allocated process completely.  This would in effect make this timing concern a non-issue.

If it appears that bug 771765 will keep the pre-forked process, even with the template approach, then we may want to explore ways to shield our tests from the associated variability.  For example, we could emit an event when the pre-forked or pre-allocated process is ready that marionette could then listen for.
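A hypothetical shape for such an event, purely for illustration (the topic string and the hook point are invented here, not something that exists in the tree):

  // Hypothetical -- "preallocated-process-ready" is an invented topic.
  #include "mozilla/Services.h"
  #include "nsIObserverService.h"

  void
  PreallocatedProcessManager::AllocateNow()
  {
    // ... create the pre-allocated ContentParent as today ...
    nsCOMPtr<nsIObserverService> obs = mozilla::services::GetObserverService();
    if (obs) {
      // Marionette (chrome JS) could wait on this instead of a fixed delay.
      obs->NotifyObservers(nullptr, "preallocated-process-ready", nullptr);
    }
  }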

So I will pursue the b2gperf delay increase in bug 878295 and mark this bug fixed once that goes live.  I will also ask on bug 771765 if the pre-forked process will stay or not, and write a follow-up bug if necessary.
Depends on: Nuwa
Flags: needinfo?(jonas)
Flags: needinfo?(fabrice)
Flags: needinfo?(dietrich)
Flags: needinfo?(bent.mozilla)
Jon Griffin changed the delay to 10 seconds in datazilla for us.  I'll monitor to see how it affects the results.

Note that unfortunately unagi is no longer running on master, so we only have inari to look at.  The inari results were less affected by the timing changes, but should still show some improvement.
I've had a bit of time to think about this, and there are a couple of historical things that have changed which may be impacting things.

1 - When the whole wait-for-idle/delay thing was done, there were still many apps which ran "in-process".
2 - Lots of work has happened on master to move stuff off the main thread.

Since pretty much everything is run OOP now, in conjunction with (2) above, the idle in 1.2 is probably happening much sooner than it was in 1.1.

What I was really trying to achieve was a "system" idle (i.e. the kernel's idle process is running).

What we've actually got is that the main thread in the b2g process is idle.
(In reply to Dave Hylands [:dhylands] from comment #60)
> 1 - When the whole wait-for-idle/delay thing was done, there were still many
> processes which were run "in-process".
> 2 - Lots of work has happened on master to move stuff off the main thread.
> 
> Since pretty much everything is run OOP now, in conjuntion with (2) above,
> the idle in 1.2 is probably happening much sooner than it was in 1.1.

Actually, I was seeing 1.2 take longer to reach "idle".  This makes sense with the OOP changes, though, since the "idle" check is really a check for an empty IPC task queue.
The datazilla graphs are showing an across-the-board reduction in times as expected after the config changes in comment 59:

  http://mzl.la/16uah8s

Looking at a stable app like email, it appears to be back to the value produced under b2g18.  Some apps still show regressions, but these appear to be app-specific, due to ongoing work (e.g. clock).

As planned in comment 58, I am marking this fixed.  The long-term solution is the Nuwa template process work.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Product: Core → Boot2Gecko
Whiteboard: [c= u=1.2 p=5] → [c= u=1.2 p=5] QAregressexclude
Target Milestone: --- → 1.2 C1(Sep27)
Whiteboard: [c= u=1.2 p=5] QAregressexclude → [c= u=1.2 p=5] QAregressexclude [b2gperf_regression]