Closed Bug 617503 Opened 11 years ago Closed 10 years ago

XPCShell tests are incredibly slow on Windows 7

Categories

(Testing :: XPCShell Harness, defect)

x86
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: cmtalbert, Assigned: wlach)

References

Details

(Whiteboard: [buildfaster:p1])

Attachments

(9 files, 1 obsolete file)

The problem is well documented by Armen: http://armenzg.blogspot.com/2010/12/xpcshell-case-on-windows.html.

I've tried to profile this today, and after a couple of false starts, I finally got somewhere, but I don't have anything really useful. xpcshell runs by starting the xpcshell binary over and over for each test.  The arguments handed to the xpcshell binary are extremely arcane and I've attached a cmd file which will work on windows (you'll need to replace my paths with yours).

Profiling python, we spend all our time in the xpcshell binary.  Profiling the xpcshell binary using AQtime I don't see any smoking guns.  Our top functions are our entry points, nothing surprising there.  Then dump figures pretty highly in the output, but nothing really accounts for the massive amount of time we spend on windows versus linux.  I'm trying to figure out linux profiling now.  But, I should definitely hand this off to someone who's more familiar with profiling on windows.

Jim, do you have any advice?

AQtime Performance Profiler
Session information 

Started: 12/7/2010 5:27:34 PM 
Ended: 12/7/2010 5:27:37 PM 
Execution time: 00:00:02:185 

Routines analyzed: 660 
Routines executed: 246 

Total number of routine calls: 990 
Total value of 'Elapsed Time':  1.31 

Performance hot spots 
Win32 Thread 2440 


Worst performance (body only): 

main  1.13  
ProcessArgs  0.14  
Dump  0.03  
ProcessFile  0.01  
Load  0.00  
CallGetService  0.00  
Process  0.00  
ContextCallback  0.00  
JSAutoByteString::JSAutoByteString  0.00  
js_free  0.00  

Worst performance (with children): 

main  1.31  
ProcessArgs  0.18  
Dump  0.03  
Process  0.01  
ProcessFile  0.01  
Load  0.00  
CallGetService  0.00  
ContextCallback  0.00  
nsGetServiceByContractID::operator()  0.00  
JSAutoByteString::JSAutoByteString  0.00
don't forget to change the paths to reflect your system.  And yes the order of ", ' quotes are important as are the \ versus \\.
(In reply to comment #1)
> Created attachment 496042 [details]
> command file to launch xpcshell
> 
> don't forget to change the paths to reflect your system.  And yes the order of
> ", ' quotes are important as are the \ versus \\.

Sorry for bug spam.  Forgot to mention that to get the set of directories I'm using here, do the following.
1. Build windows debug build
2. cd objectdir
3. make stage-xpcshell
(this way you get that <objdir>/dist/test-package-stage/xpcshell directory with all the tests in it).
Ok, I was staring at this a little more and it suddenly clicked that while "ProcessArgs" is the routine that does all the work in xpcshell.cpp, the profiler was trying to tell me that we spend the lion's share of our time in main.

So, I went debugging through main with the good old "GetTickCount" windows API (number of milliseconds since computer restarted, probably good enough for my purposes).  And I found that we spend the lion's share of our time in NS_InitXPCOM2, right in the middle of main.  I'm starting to think that either AQtime doesn't do sub-module timing or that it isn't reading my pdb files correctly and therefore it missed all the stuff that is happening as a result of this call.
Updating summary. They're slow on all Windows, but they're painfully slow on Windows 7.
Summary: XPCShell tests are incredibly slow on windows → XPCShell tests are incredibly slow on Windows 7
(In reply to comment #3)
> Created attachment 496049 [details] [diff] [review]
> some hacky profiling
> 
> Ok, I was staring at this a little more and it suddenly clicked that while
> "ProcessArgs" is the routine that does all the work in xpcshell.cpp, the
> profiler was trying to tell me that we spend the lion's share of our time in
> main.
> 
> So, I went debugging through main with the good old "GetTickCount" windows API
> (number of milliseconds since computer restarted, probably good enough for my
> purposes).  And I found that we spend the lion's share of our time in
> NS_InitXPCOM2, right in the middle of main.  I'm starting to think that either
> AQtime doesn't do sub-module timing or that it isn't reading my pdb files
> correctly and therefore it missed all the stuff that is happening as a result
> of this call.

Did you add all the right dlls to the modules list independent of the exe? I'm not sure what the xpshell harness links to but I'd guess you would have to include xpcom.dll at least.
depends answers that for me:

xul.dll
xpcom.dll
mozalloc.dll
mozjs.dll
nspr4.dll
Attached file top results
This is what I'm seeing on your example test case. Clearly xpcom init and shutdown take up a big chunk of time, but js::GCHelperThread::threadLoop seems to trap the most cycles.
Running with a simple |-e "quit(0);"| produces similar results, .4 seconds spent in that gc thread loop.
Attached file test results w/nspr
I didn't have nspr4.dll added. _PR_MD_WAIT_CV, which gets called from that loop via PR_WaitCondVar. I wonder if we could change that timeout value for shell tests? (I have no idea.)
Is that the main thread only? _PR_MD_WAIT_CV is just waiting for some condvar, it's probably not the culprit directly. What's the stack at that point? It looks like it's just waiting for the GC, which is for some reason being painfully slow. I think we'd need a profile of the GC thread, not the main thread, for that time period.

Is this debug-only?
(In reply to comment #10)
> Is that the main thread only? _PR_MD_WAIT_CV is just waiting for some condvar,
> it's probably not the culprit directly. What's the stack at that point?

The gc thread is sitting in WaitForSingleObject.

> It
> looks like it's just waiting for the GC, which is for some reason being
> painfully slow. I think we'd need a profile of the GC thread, not the main
> thread, for that time period.

I'll attach that.

> 
> Is this debug-only?

yes.
(In reply to comment #10)
> Is this debug-only?

Per Armen's blog post (linked in comment 0), xpcshell tests on Windows 7 take far longer than on any other OS (including Win2k3) in both opt and debug builds.
(Win7 opt xpcshell averages out at 73 minutes vs. Win2k3 debug xpcshell at 46!)
Attached file release all threads
I wish I could post the entire summary, but it can't be saved out.

Started: 12/8/2010 12:51:53 PM 
Ended: 12/8/2010 12:51:57 PM 
Execution time: 00:00:04:102 

Routines analyzed: 104487 
Routines executed: 2109 

Total number of routine calls: 675672 
Total value of 'Elapsed Time' (all threads):  1.36 s 

All threads, worst performance (body only): 

WaitForSingleObject  0.59  
GetQueuedCompletionStatus  0.39  
WaitForMultipleObjects  0.14  
nsSocketTransportService::Poll  0.13  
HeapAlloc  0.02  
HeapFree  0.01  
nsSocketTransportService::Init  0.01  
LoadLibraryW  0.01  
GetShortPathNameW  0.01  
ShortcutResolver::Init  0.01  

Generally, every thread spends most of it's time sitting around waiting to do something.
I ran a full test run locally on a release build for fun: 33 minutes to complete. One thing I noticed was that networking related tests took an inordinate amount of time. So I blew away two areas that use our little http daemon, netwerk and services/sync and re-ran the full test suite.  Total run time was about 10 minutes. (That didn't include all the areas that use that daemon, like mozapps/extensions.) I think this might have something networking.
Also, between those two runs, the difference in the number of tests run was only about 150.
FWIW, I've noticed that any tests that seems to fire up the http server takes forever on my windows 7 machine, but haven't had any time to look into it.
Making a connection is really slow, on average > 1 second per. connection made. That doesn't seem right.
So the problem I see here (locally) is that the os is giving priority to the IPv6 address when we do our lookup on 'localhost'. Every connection attempt we make first fails on address |::1| then retries on |127.0.0.1|. I fixed this on my local system by adding 'localhost' to the IPV6 exclusion list in modules/libpref/src/init/all.js. (network.dns.ipv4OnlyDomains)

With this set, an xpcshell test run for me shrinks from 33 minutes to 7. 

I've pushed a patch to try server to test this theory (927a0742f068). We'll see how the numbers come out. 

If this is the issue, we need to figure out how to address it. We can probably do this through machine config, although that might be a pain and doesn't help people with local runs. We could also add 'localhost' to the exclusion list although I'm not sure we want that exposed to firefox users. AFAICT our tests use the default prefs in the module, so maybe we need to make some changes so tests modify this pref before they run.
Awesome detective work Jim!
Excluding localhost definitely improved things:

Rev3 WINNT 6.1 tryserver debug test xpcshell: 111m -> 67m
Rev3 WINNT 6.1 tryserver opt test xpcshell: 74m -> 50m

These numbers still seem too high though, Win7 should be able to match XP. So I think I'll get a hold of a slave and do some local debugging.
(In reply to comment #21)
> Excluding localhost definitely improved things:
> 
> Rev3 WINNT 6.1 tryserver debug test xpcshell: 111m -> 67m
> Rev3 WINNT 6.1 tryserver opt test xpcshell: 74m -> 50m
> 
> These numbers still seem too high though, Win7 should be able to match XP. So I
> think I'll get a hold of a slave and do some local debugging.
Could we maybe at least get this issue fixed now, and work on whatever else next?  This is a substantial win on its own.
Neil suggested on #developers that we could simply fix the tests to use 127.0.0.1 instead of localhost, but that's a bit of a tall order:
http://mxr.mozilla.org/mozilla-central/search?string=localhost&find=test_&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central
Attached patch head.js patchSplinter Review
I think this might do it assuming head.js gets sourced into every xpcshell test.
Comment on attachment 496833 [details] [diff] [review]
head.js patch

Yes it does. You misspelled "disabled" in your dump text. I don't think there's any need to mention the bug number, that's what hg blame is for.
Attachment #496833 - Flags: review+
Please also brace your if and indent the let.  I know it's legal, but it'd be easy to accidentally mess that up in the future.
sheesh, tough crowd. :) 

I need to run this locally to confirm it works. waiting on a reconfig/clobber build. Will kick off a review here is a bit.
(In reply to comment #27)
> sheesh, tough crowd. :) 
> 
> I need to run this locally to confirm it works. waiting on a reconfig/clobber
> build. Will kick off a review here is a bit.

Oh! I already have r+! neat.
Attached patch head.js patch v.2 (obsolete) — Splinter Review
This can land anytime, since it's tests-only.
Assignee: nobody → jmathies
But probably only after a good try server run since we are trying to freeze for beta 8 ;)
http://hg.mozilla.org/mozilla-central/rev/05a7b9970a40

I had to make a small change - skip setting the pref in xpcshell child processes. This failed causing timeouts on ipc unit tests.

I'm going to keep this bug open until I have a chance to do more debugging on a slave next week.
Attachment #496842 - Attachment is obsolete: true
That patch saved about 20 minutes on both opt and debug builds. I hope to do better though. Locally I can run these in 7 minutes, so 50 / 70 minutes on a mini is still way way too long.
Depends on: 618824
Testing on a tinderbox slave hasn't turned up much. One thing for sure is that these machines run tests really really slow. For example, netwerk/unit/test_simple.js runs on these machines take around 500 msec to complete, while here on my laptop (w/cpu half as fast as tinderbox) I can complete the same test in around 70msec, and locally on my main system I can run it in about 15.

There are no obvious networking issues like the dns problem I already found. Nothing seems to be taking up excessive cpu time either, and the firewall is off so that's not in the way. Most of the time spent seems to be in executing script. I did a little low level socket debugging which didn't turn up any obvious overhead in connecting, reading, and writing data either.
Do the tinderbox slaves have an ipv6 stack enabled?
(In reply to comment #35)
> Is that on a VM?

Not sure, from what I see via remote desktop, it's a mac mini running bootcamp. CPU is dual core 2.26GHz cpu.

(In reply to comment #36)
> Do the tinderbox slaves have an ipv6 stack enabled?

It's enabled for the adapter, but unconfigured. Disabling it doesn't have any impact on test_simple.js performance.
jimm's fix improved things a lot.

I now have more recent data with the new averages that point out that xpcshell and reftests run slower on Win7 than on WinXP (check out the graph).
http://armenzg.blogspot.com/2011/01/reftests-and-xpcshell-test-suites-run.html

reftests and xpcshell are now the top worst for Win7 taking an average between 35 and 40 minutes (xpcshell used to take more than 70 minutes on average).
On Windows XP these suites take less than 25 mins on average.

To see these two bad test suites compared to the other test suites: the next worst suite is mochitest-other at around 25 mins on average (not significantly slower than on XP).
I have also experienced slow localhost resolution on Win7 64-bit (with using 127.0.0.1 being snappy).

This appears (from searching google with "windows 7 localhost slow") to be the interaction with the Windows Firewall and IPv6 DNS resolution.

Adding "127.0.0.1 localhost" to the hosts file fixes the issue for some people [http://superuser.com/questions/65049/firefox-and-chrome-slow-on-localhost-known-fix-doesnt-work-on-windows-7].

Others have found they also need to disable the Windows Firewall and IPv6 support [http://cubicspot.blogspot.com/2010/07/fixing-slow-apache-on-localhost-under.html].
Assignee: jmathies → wlachance
Heya, I'm going to take a bit of look at this (as part of getting up to speed on various test frameworks, etc.)

I've started by adding timing information when an xpcshell test passes (need to commit that...) and creating a spreadsheet of relative test timings on a Windows 7 build slave vs. my dev box. I'll attach that in a second. So far the main interesting result is that the geolocation test takes 120 seconds on Win7 vs. only ~6 seconds on my Linux machine. There's largish differences in other tests that are no doubt worth looking into, but this is the only one that jumps out.
Sorted in ascending order (largest difference against Win7 at bottom). Note that these timings are not "fair", the machine architectures, environments, etc. are all different. So only large deviations are, in fact, interesting.
Ok, so I went through the rest of the relatively slow tests (those more than 0.5sec slower than my Linux machine). The majority of them are either places related (~223 secs or ~4 mins) or addons related (~100 secs). I filed a bug for one of the places tests that stood out (bug 675363) and, upon inspection, it just looks like disk i/o is just really slow on the build machines for some reason.

There's also "unit/test_nsIProcess.js" which is slow (>13 secs slower than my Linux machine) because it forks 1000 processes. I don't know if that's really necessary (I guess it's some kind of stress test?), but it's a pretty isolated issue rather than a systematic one.

The remaining "long tail" of tests (i.e. those with less than 0.5sec times) only account for about 66 secs, which may not be worth the effort of optimizing at this point.
Filed bug 676412 to deal with nsIProcess.js.
Whiteboard: [buildfaster:p1]
Talked about this during our GoFaster meeting. The feeling is that we've exhausted general solutions to this bug. Remaining problems seem to be fairly specific. Resolving this as WONTFIX.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.