Closed Bug 520837 Opened 15 years ago Closed 15 years ago

Investigate Ts regression from Aero Peek landing

Categories

(Firefox :: General, defect, P1)

x86
Windows 7
defect

Tracking

()

RESOLVED FIXED
Firefox 3.6b1
Tracking Status
status1.9.2 --- beta1-fixed

People

(Reporter: robarnold, Assigned: jimm)

References

(Depends on 1 open bug, )

Details

(Keywords: perf, regression, Whiteboard: [ignore comment 4-5])

Attachments

(1 file)

3.1% Ts regression from the landing of bug 474056. <tabbrowser.xml> changes appear to not be at fault. browser.js:delayedStartup calls Win7Features.onOpenWindow which in turn instantiates the nsIWinTaskbar service which attempts to create the ITaskbarList4 interface which fails because it is not present on XP.

So it seems that delayedStartup is not delayed enough to avoid affecting Ts.
Flags: blocking-firefox3.6?
Why does it take 3% of our total startup time to fail to instantiate this service?  That seems generous!  How long is the total time to instantiate ITaskbarList4 on your system?

Just moving it until later to get after the Ts measurement probably isn't really right, unless we move it to after session restore is complete.  (Which seems reasonable to me, all things considered.)
(In reply to comment #1)
> Just moving it until later to get after the Ts measurement probably isn't
> really right, unless we move it to after session restore is complete.  (Which
> seems reasonable to me, all things considered.)

http://mxr.mozilla.org/mozilla-central/ident?i=NOTIFY_WINDOWS_RESTORED
I don't think the regression blocks beta, but certainly blocks final.
Flags: blocking-firefox3.6? → blocking-firefox3.6+
Priority: -- → P2
An 11% Ts regression on Windows XP makes this a bigger deal, bumping to P1.

http://graphs.mozilla.org/graph.html#tests=[{%22machine%22:55,%22test%22:16,%22branch%22:1},{%22machine%22:56,%22test%22:16,%22branch%22:1},{%22machine%22:57,%22test%22:16,%22branch%22:1},{%22machine%22:58,%22test%22:16,%22branch%22:1},{%22machine%22:71,%22test%22:16,%22branch%22:1},{%22machine%22:74,%22test%22:16,%22branch%22:1},{%22machine%22:75,%22test%22:16,%22branch%22:1},{%22machine%22:76,%22test%22:16,%22branch%22:1},{%22machine%22:108,%22test%22:16,%22branch%22:1},{%22machine%22:109,%22test%22:16,%22branch%22:1},{%22machine%22:110,%22test%22:16,%22branch%22:1},{%22machine%22:111,%22test%22:16,%22branch%22:1},{%22machine%22:112,%22test%22:16,%22branch%22:1},{%22machine%22:113,%22test%22:16,%22branch%22:1},{%22machine%22:114,%22test%22:16,%22branch%22:1},{%22machine%22:115,%22test%22:16,%22branch%22:1},{%22machine%22:116,%22test%22:16,%22branch%22:1},{%22machine%22:117,%22test%22:16,%22branch%22:1},{%22machine%22:118,%22test%22:16,%22branch%22:1},{%22machine%22:119,%22test%22:16,%22branch%22:1},{%22machine%22:120,%22test%22:16,%22branch%22:1},{%22machine%22:121,%22test%22:16,%22branch%22:1},{%22machine%22:122,%22test%22:16,%22branch%22:1},{%22machine%22:123,%22test%22:16,%22branch%22:1},{%22machine%22:124,%22test%22:16,%22branch%22:1},{%22%20machine%22:125,%22test%22:16,%22branch%22:1},{%22machine%22:126,%22test%22:16,%22branch%22:1},{%22machine%22:127,%22test%22:16,%22branch%22:1},{%22machine%22:240,%22test%22:16,%22branch%22:1},{%22machine%22:241,%22test%22:16,%22branch%22:1}]&sel=1254832680,1255005480
Priority: P2 → P1
Hm, to be clear, that seems to have happened from jumplists (bug 518666) not just Aero Peek (bug 474056, which this is set to block)
(In reply to comment #2)
> (In reply to comment #1)
> > Just moving it until later to get after the Ts measurement probably isn't
> > really right, unless we move it to after session restore is complete.  (Which
> > seems reasonable to me, all things considered.)
> 
> http://mxr.mozilla.org/mozilla-central/ident?i=NOTIFY_WINDOWS_RESTORED

Note that the windows-restored notification is sent before the pages have loaded. So if Ts tested session restore, this would still be in the startup path.
on other OSes that are not Seven, would probably be better to avoid Cu.import at all if possible. Especially for jumplists looks like we import the module just to discover we are not on Seven and bail out.
i filed Bug 521008 for the jumplists Ts regression because the regression happened at different times, the codes are not so deep related, Rob did not work on that feature and does not know that code deeply, and for comment 7 that instead was implemented for Aeropeek.
(please ignore comments 4 and 5, I the jumplist regression is being discussed in bug 521008 as Marco points out)

Joe proposes that we not block the beta on this, but I'd like to make sure we know what's causing the regression, at least, before we ship the feature. This should stay as a P1 until we've got some good leads/understanding.
Whiteboard: [ignore comment 4-5]
This doesn't appear to be an issue with the instantiation of nsIWinTaskbar, jump lists currently source the jumplist jsm, instantiate nsIWinTaskbar and checks the available property in 3 msec. Aero peek instantiates nsIWinTaskbar and manually checks available before sourcing it's jsm which takes 1 msec.
(In reply to comment #10)
> This doesn't appear to be an issue with the instantiation of nsIWinTaskbar,
> jump lists currently source the jumplist jsm, instantiate nsIWinTaskbar and
> checks the available property in 3 msec. Aero peek instantiates nsIWinTaskbar
> and manually checks available before sourcing it's jsm which takes 1 msec.

After doing quite a few test runs, aero is taking less than 1 msec to do it's check. Jump list's import is on average taking around 3 msec. I'll have to update the jump list import code to do the same direct check to shave a couple milliseconds off the init.

Overall though it appears the Ts regression is something other than the init code. Also, jump lists init before aero, so it's not some sort of caching issue.
Additional testing on an image where things are slowed down a bit. I might have been wrong about caching, it doesn't seem to matter which patch initiates the interface first - first instantiation takes a little longer, while the secondary takes little time.

Overall I still don't see the major regression that showed up when aero landed, but maybe my workstation is faster than our tinderbox hardware to the extent that 3-5 ms difference locally translates to 30-40 on tinderbox.

I've also tested the the patch I posted in bug 518666 for jump lists which improves init time for the module. We should land that to see how it effects trunk Ts. There should be some left over Ts gains we can get after the bad module path fix.
One final note, I was curious how much of this might be com's CoCreateInstance on ITaskbarList4. I shunted WinTaskbar() with a return and timed the getService call, the lag was still there.

Maybe this is something in xpcom or in the widget registration? (That's about as far as i got tonight.) I don't think this should block though, it's not a windows thing, so clearly we can track it down and fix it.
Getting mixed results (60-290! msec) on the getService call in browser.js (which calls cocreateinstance in the ctor) on XP in vmware after cold starts of the image, which mimics what talos does. Subsequent calls result in much lower times (around 20-30 msec.).  This may be an issue with registry caching and interface lookups where the interface is not registered. Using a getversion check drops times in all cases on vmware down to zero.

I'm going to check this in as an experiment. If it doesn't address the regression, I'll back it out. If it does, we'll spin off a bug to investigate further and look at other cases where we call cocreateinstance on non-supported interfaces.
Assignee: tellrob → jmathies
Attachment #405521 - Flags: review?(tellrob)
Comment on attachment 405521 [details] [diff] [review]
[checked in]getversion possible fix

Seems like a good experiment. This isn't how COM should be done but OTOH, the version check matches our supported configs. This doesn't address the performance hit on Windows 7 but hopefully the followup bug will.
Attachment #405521 - Flags: review?(tellrob) → review+
(In reply to comment #15)
> (From update of attachment 405521 [details] [diff] [review])
> Seems like a good experiment. This isn't how COM should be done but OTOH, the
> version check matches our supported configs. This doesn't address the
> performance hit on Windows 7 but hopefully the followup bug will.

If it does fix the talos Ts problem, I would say this patch should be pulled. We may be dealing with something here that isn't related to our software.
Attachment #405521 - Attachment description: getversion possible fix → [checked in]getversion possible fix
Talos machines reported a 3.5-5.3% improvement on XP! nice work!
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Depends on: 521571
Attachment #405521 - Flags: approval1.9.2?
not yet!
Comment on attachment 405521 [details] [diff] [review]
[checked in]getversion possible fix

(removed flag because it's blocking.)

1.9.2 tree is currently burning a bit, will check this in later today.
Attachment #405521 - Flags: approval1.9.2?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: