test_frecency.js really slow when running under win7 build slaves

Status: RESOLVED INVALID
Component: Toolkit :: Places
(Reporter: wlach, Unassigned)
(Whiteboard: [buildfaster:p1])
Attachments: (2 attachments)

In the course of looking into bug 617503, I found a huge difference (about 15 seconds) in the execution speed of test_frecency.js between the Windows build slave and my Linux box. A quick check revealed that this problem did not occur on my Win7 VM either: just in the slave environment. The problem seems to be the addVisit API, which can take more than 100 ms to complete.

Upon further digging using ProcessMonitor (http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx), it looks like the main cause of that is some writes to disk that occur when adding a visit to a specific URL. Specifically, sometimes when a place is added, there's a call to FlushBuffersFile which can take up to 80 milliseconds in this environment (under my Windows VM, it never seems to take more than 8-10 milliseconds).

Things to look into:

1. Why is this call so much slower on the Windows build slaves? Memory doesn't seem to have anything to do with it. For experimentation, I gave my VM only 1gig of memory (the slaves have 2) and things were still just as fast on it as before.

2. Why are we reading/writing a whole bunch of sqlite stuff to disk on windows when we add a visit? Shouldn't sqlite only do this periodically? A quick strace of the same test on Linux reveals that *nothing* is written/read from disk during this operation.

I'll add some attachments shortly to help reproduce the issue.
Created attachment 549500 [details]
CSV file of Process Monitor CSV output during addVisit call in XPCShell

Do a search for "FlushBuffersFile" to find the offending call
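Rather than searching the export by hand, the offending calls can be pulled out programmatically. A sketch (not part of the original report; the column names `Operation`, `Path`, and `Duration` are assumed to match Process Monitor's default CSV layout):

```python
import csv
import io

def flush_calls(procmon_csv_text, min_duration=0.0):
    """Return (path, duration_seconds) rows whose Operation is
    FlushBuffersFile, from a Process Monitor CSV export."""
    rows = []
    for row in csv.DictReader(io.StringIO(procmon_csv_text)):
        if row.get("Operation") == "FlushBuffersFile":
            duration = float(row.get("Duration", 0) or 0)
            if duration >= min_duration:
                rows.append((row.get("Path", ""), duration))
    return rows

# Synthetic two-line export, shaped like the attached capture
# (an 80 ms flush next to a cheap cached read):
sample = (
    "Operation,Path,Duration\n"
    "FlushBuffersFile,places.sqlite,0.080\n"
    "ReadFile,places.sqlite-wal,0.001\n"
)
print(flush_calls(sample))  # -> [('places.sqlite', 0.08)]
```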
(In reply to comment #1)
> Created attachment 549500 [details]
> CSV file of Process Monitor CSV output during addVisit call in XPCShell
> 
> Do a search for "FlushBuffersFile" to find the offending call

I should add this was taken on the Win7 build slave.
Created attachment 549503 [details] [diff] [review]
Throwaway patch to add extra logging/timing to frecency test
Adding catlee to the CC list. Any thoughts on why the win7 slaves are so slow?
Those FlushBuffersFile calls are due to WAL checkpoints, copying from places.sqlite-wal into places.sqlite. This may happen between those addVisit calls because checkpoints are automatic: when the WAL journal reaches a certain size, a checkpoint operation is automatically initiated.
On Linux it's possible this happens differently, or just at different times; the checkpoints should certainly cause fsync() calls there too.
Any other write may hit the filesystem cache (writes to the journal are not fsynced).
Now the real question is why the checkpoint operation is so slow on those slaves.
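The auto-checkpoint behaviour described above can be reproduced with stock SQLite, outside Places entirely. A minimal sketch using Python's sqlite3 binding (the tiny `wal_autocheckpoint` value is just to make checkpoints frequent; it is not what Places uses):

```python
import os
import sqlite3
import tempfile

# Pages accumulate in db.sqlite-wal; once the WAL reaches
# wal_autocheckpoint pages, SQLite checkpoints them back into
# db.sqlite. On Windows that checkpoint is where FlushFileBuffers
# (seen as FlushBuffersFile in Process Monitor) gets called.
with tempfile.TemporaryDirectory() as d:
    db_path = os.path.join(d, "db.sqlite")
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA wal_autocheckpoint=4")  # checkpoint every 4 pages
    conn.execute("CREATE TABLE visits (url TEXT, ts INTEGER)")
    for i in range(100):
        conn.execute("INSERT INTO visits VALUES (?, ?)", (f"http://x/{i}", i))
        conn.commit()
    # A manual checkpoint reports (busy, wal_pages, checkpointed_pages):
    result = conn.execute("PRAGMA wal_checkpoint(FULL)").fetchone()
    print(result)
    conn.close()
```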
Just did some analysis of xpcshell tests that take >500 msec longer on the Windows build slave than on my Linux machine; a large number of them seem to be Places-related. The difference totals 223 seconds (~4 min) over the run of tests.
This just seems to indicate IO on those boxes is tremendously slow, and it would be nice to figure out whether the reason is old/slow disks, controller drivers, an incorrect UDMA mode, or whatever else may cause such a problem. Any of our tests may cause a WAL checkpoint with its associated fsync() call, and honestly I'd expect this to be better than it was before, when any change could have caused an fsync().
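That per-test comparison can be automated once both runs' timings are in hand. A hedged sketch (the test names and numbers are hypothetical; parsing real xpcshell logs into these dicts is left out):

```python
def slow_on_windows(linux_times, windows_times, threshold=0.5):
    """Given {test_name: seconds} dicts for two platforms, return tests
    at least `threshold` seconds slower on Windows, sorted by how much
    slower, plus the total extra time across those tests."""
    deltas = {
        name: windows_times[name] - linux_times[name]
        for name in linux_times.keys() & windows_times.keys()
        if windows_times[name] - linux_times[name] >= threshold
    }
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return ranked, sum(deltas.values())

# Hypothetical numbers, shaped like the observation above:
linux = {"test_frecency.js": 12.0, "test_history.js": 1.0}
win7 = {"test_frecency.js": 21.0, "test_history.js": 1.2}
ranked, total = slow_on_windows(linux, win7)
print(ranked, total)  # only test_frecency.js clears the 500 ms threshold
```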
(In reply to comment #6)
> Just did some analysis of xpcshell tests that take >500 msec more on the
> windows build slave than on my Linux machine, a large number of them seem to
> be places related. 223 seconds (= ~4 min) difference total over the run of
> tests.

How long do these xpcshell tests take on the linux talos slaves?

Let's at least /try/ to control for hardware here. :)
Notice also bug 676521, I may be wrong, but something on windows storage is different from others...
I don't have access to a bona fide Linux talos slave to test right now, but on our tools test machines (which have the same hardware specs and should be configured similarly, afaik) the frecency test takes about 12 seconds to complete, a good bit faster than the 21 seconds I was seeing on the Win7 slave.

I did the same test a few times on a win64 tools test machine (again, similar config: Win7 on a Mac mini running Boot Camp); the results were again about 21 seconds (20.5 and 22.91, to be precise).

So it does indeed seem like we may have some kind of issue with the configuration of the Win7 test slaves. One person on our team (bc) thinks that an incorrectly configured partitioning scheme may be the culprit. We should probably at least look into that.
(In reply to comment #10)

> So it does indeed seem like we may have some kind of issue with the
> configuration of the win7 test slaves. One person on our time (bc) thinks
> that an incorrectly configured partitioning scheme may be the culprit. We
> should probably at least look into that.

Where is this conversation taking place? I'd like to join in, because I'm just not coming to the same conclusions you are based on the data I see here.

I've just been throwing some ideas out about how the disk and its partitions are configured may affect performance of Windows. There is nothing concrete. It is just speculation at this point.
OK, fair enough. From my perspective, I see this as "Windows is slow", not that there's anything specifically wrong with the talos-r3-w7 machine config. 

Is there an easy test case I could try on various hardware?
Here's some real data on how long it takes to run xpcshell tests on various platforms:

+-------------------------------------------------+------+
| name                                            | e    |
+-------------------------------------------------+------+
| mozilla-central_snowleopard_test-xpcshell       |  504 | 
| mozilla-central_leopard_test-xpcshell           |  669 | 
| mozilla-central_fedora64_test-xpcshell          | 1010 | 
| mozilla-central_fedora_test-xpcshell            | 1016 | 
| mozilla-central_snowleopard-debug_test-xpcshell | 1333 | 
| mozilla-central_xp_test-xpcshell                | 1524 | 
| mozilla-central_fedora64-debug_test-xpcshell    | 1656 | 
| mozilla-central_fedora-debug_test-xpcshell      | 1659 | 
| mozilla-central_leopard-o-debug_test-xpcshell   | 1875 | 
| mozilla-central_win7_test-xpcshell              | 2323 | 
| mozilla-central_xp-debug_test-xpcshell          | 2376 | 
| mozilla-central_w764_test-xpcshell              | 2396 | 
| mozilla-central_win7-debug_test-xpcshell        | 3085 | 
+-------------------------------------------------+------+

where e is number of seconds to run just the test. this excludes setup/teardown time.
zandr: probably the easiest thing to do is to run the places xpcshell tests.

On a test machine, this might go something like:

* Open a command prompt
* Get a mozilla environment: c:\mozilla-build\start-l10n.bat
* Create a directory to hold the tests and cd into it
* get the tests: wget ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/firefox-8.0a1.en-US.win32.tests.zip
* get the build: wget ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/firefox-8.0a1.en-US.win32.zip
* Unzip all that stuff.
* Copy the contents of the bin/ directory (from the tests.zip) into the firefox/ directory.
* cd into the xpcshell/ directory
* Run the xpcshell places tests manually and time the results: time python runxpcshelltests.py --manifest=tests/all-test-dirs.list --test-path=places ../firefox/xpcshell.exe
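Single-shot wall-clock numbers like these bounce around from run to run; a small timing wrapper (a hypothetical helper, not part of the harness) makes repeated comparisons across machines more honest:

```python
import statistics
import subprocess
import sys
import time

def time_command(argv, runs=3):
    """Run `argv` several times; return (min, mean) wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.monotonic() - start)
    return min(samples), statistics.mean(samples)

# Trivial example command; on a slave the argv would be the
# runxpcshelltests.py invocation from the steps above.
fastest, mean = time_command([sys.executable, "-c", "pass"], runs=2)
print(f"min={fastest:.3f}s mean={mean:.3f}s")
```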
I just found a trick that increases the speed of the places tests a crazy amount on our tools mac mini:

Open up the device manager, right click on the hard disk (fujitsu something something). Select policies and check "turn off windows write cache buffer flushing". Click "ok".

After doing this, running the places tests went down from 7m40s to 1m51.528s (point of comparison: my fancy new Linux laptop takes 3m20s). 

Of course this feature is typically enabled for a reason (we're subject to data loss should there be a power outage) but this may not be a big deal for our test slaves. Feedback welcome!
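Since everything here points at flush/fsync latency, a stand-alone microbenchmark makes the slave-vs-VM comparison reproducible without involving Places at all. A sketch (absolute numbers will vary wildly with the disk and write-cache policy; the worst case is what matters, since the report saw FlushBuffersFile take up to 80 ms on the slaves):

```python
import os
import tempfile
import time

def fsync_latencies(iterations=20, payload=4096):
    """Append a small buffer and fsync repeatedly; return the per-call
    fsync latencies in milliseconds."""
    latencies = []
    with tempfile.NamedTemporaryFile() as f:
        for _ in range(iterations):
            f.write(b"\0" * payload)
            f.flush()
            start = time.monotonic()
            os.fsync(f.fileno())
            latencies.append((time.monotonic() - start) * 1000.0)
    return latencies

lat = fsync_latencies()
print(f"max={max(lat):.2f}ms mean={sum(lat) / len(lat):.2f}ms")
```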
Data loss on power failure is not a concern for the talos pool, for a host of reasons. Testing in a configuration that users don't use is not something I'm in a position to evaluate. A six-minute win is not to be taken lightly, however.
(In reply to William Lachance from comment #16)
> Open up the device manager, right click on the hard disk (fujitsu something
> something). Select policies and check "turn off windows write cache buffer
> flushing". Click "ok".

This practically disables fsync() calls, which is what we found to be slow here; it doesn't try to figure out why fsyncs are slower than they should be. I suspect disabling fsync may increase the risk of global OS corruption, though, rather than just our profile corruption (which would not be a big deal).

As it is, this may indeed just be a "Windows IO is f***ing slow on Boot Camp" issue; on my box the same test takes less than 1 second, but it's clearly not a Mac mini! I see a lot of reports on the net about slow disk performance in Boot Camp. Did someone check the drivers and UDMA mode of the controller used by these installs? Can it be improved?

(In reply to Marco Bonardo [:mak] (Away 8-14 Aug) from comment #18)
> (In reply to William Lachance from comment #16)
> > Open up the device manager, right click on the hard disk (fujitsu something
> > something). Select policies and check "turn off windows write cache buffer
> > flushing". Click "ok".
> 
> This practically disables fsync() calls, that is what is found being slow
> here, doesn't try to figure out why fsyncs are slower than they should. I
> suspect disabling fsync may increase risk of global OS corruption though,
> rather than just our profile corruption (that would not be a big deal).
> 
> As it is, may just indeed be a "Windows IO is f***ing slow on bootcamp"
> issue, on my box the same test takes less than 1 second, but it's clearly
> not a mac mini! I see a lot of reports on the net about slow disk
> performance in bootcamp. Did someone check the drivers and udma mode of the
> controller used by these installs? Can it be improved?

It wouldn't surprise me if we found Apple to have chosen a config that purposely kept win perf in check.
I just checked the settings of the IDE controller on the tools box, DMA is allegedly enabled.

I did some searching on google and couldn't really find much useful information on bootcamp and slow disk i/o (some people were complaining about problems with a beta version in 2006 running windows xp, with no real resolution).

On irc it was suggested that maybe we ask about this on serverfault. What's the model of this machine?
Macmini3,1: 2.26GHz C2D with 2GB ram and a 160GB 5400rpm drive.
Just filed a query on serverfault. http://serverfault.com/questions/298166/slow-disk-i-o-especially-flushing-to-disk-with-mac-mini-running-win7-via-bootca

Hey, it's worth a try!
At bc's suggestion I tried two additional disk-related settings on our tools test machine based on this document: http://technet.microsoft.com/en-us/library/cc785435%28WS.10%29.aspx

1. Set disablelastaccess to 1. This seemed to improve times on test_frecency.js only negligibly (maybe a 1-second difference, averaged over three tries).
2. (Separately) set memoryusage to 1. This seemed to make no difference whatsoever.
(In reply to William Lachance from comment #16)
> Open up the device manager, right click on the hard disk (fujitsu something
> something). Select policies and check "turn off windows write cache buffer
> flushing". Click "ok".
We don't want to do this because this same pool does our performance testing.
(In reply to Shawn Wilsher :sdwilsh from comment #24)
> (In reply to William Lachance from comment #16)
> > Open up the device manager, right click on the hard disk (fujitsu something
> > something). Select policies and check "turn off windows write cache buffer
> > flushing". Click "ok".
> We don't want to do this because this same pool does our performance testing.

I assume the rationale for that case is that we want to be able to compare relative performance numbers. But in that case could we *never* change the hardware/configuration? Surely at some point it will become necessary to do so.

During the ateam weekly meeting, anode suggested that maybe we could have half the machines in the old configuration and half in a new configuration as a transitional measure (so we could still compare relative results for a period of time). I'm interested in hearing whether this would be feasible. I think the performance difference makes it worth considering this sort of thing, at the very least.
(In reply to William Lachance from comment #25)
> (In reply to Shawn Wilsher :sdwilsh from comment #24)
> > (In reply to William Lachance from comment #16)
> > > Open up the device manager, right click on the hard disk (fujitsu something
> > > something). Select policies and check "turn off windows write cache buffer
> > > flushing". Click "ok".
> > We don't want to do this because this same pool does our performance testing.
> 
> I assume the rationale for that case is that we want to be able to compare
> relative performance numbers. 

No, I think the rationale is that we don't want our performance testing environment to be too different from what users actually run.

I'll admit that Windows on Apple hardware is already a bit of an odd duck, and there's work afoot to fix that. But flipping a switch that makes a perf/reliability tradeoff that no user would make could only cause trouble down the road.

This is basically masking a file I/O perf issue. And it's a big switch: it would also mask any other place where we suddenly started doing a lot of file I/O. Regressions would slip through.
(In reply to Zandr Milewski [:zandr] from comment #26)
> No, I think the rationale is that we don't want our performance testing
> environment to be too different from what users actually run.
Correct.
Whiteboard: [buildfaster:p1]
Ok, unless we're missing some way in which these machines are obviously misconfigured (I doubt it at this point), I guess that there's no easy fix to this problem. I'd say the options are:

1. Change the places tests to somehow do less flushing to disk during xpcshell tests (not sure whether this is worth doing: I guess it depends on the effort required). test_frecency is just the slowest of the lot: virtually everything that touches Places seems to be much slower on these machines.

2. Do nothing and just live with this performance issue until we start running the windows tests on newer/different hardware (I hear there are plans for this, but no idea what the progress is).

Thoughts?
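For option 1, the usual SQLite knobs for trading durability for speed are `PRAGMA synchronous` and the journal mode. A generic sqlite3 sketch of what a test-only profile might relax (this is not the mozStorage API Places actually uses, and these settings would never be appropriate for user data):

```python
import sqlite3

# Test-profile-only durability tradeoffs (generic sqlite3, not
# mozStorage). synchronous=OFF skips fsync on ordinary commits;
# journal_mode=MEMORY keeps the journal out of the filesystem.
conn = sqlite3.connect(":memory:")  # a file path in a real test profile
conn.execute("PRAGMA synchronous=OFF")
mode = conn.execute("PRAGMA journal_mode=MEMORY").fetchone()[0]
conn.execute("CREATE TABLE t (x)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
print(mode)  # -> memory
```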
We talked a bit more about this at our GoFaster meeting, and I think our conclusion is that there's no real good specific solution to this. Other strategies like changing the hardware we use for Windows testing or parallelizing XPCShell tests may help, but they're out of scope for this bug.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → INVALID