Open Bug 1425572 Opened 3 years ago Updated 3 months ago

Consider flushing I/O as part of running Talos tests

Categories

(Testing :: Talos, enhancement, P3)

enhancement

Tracking

(Not tracked)

People

(Reporter: gps, Unassigned)

Details

I performed a Try push that effectively disables fsync in SQLite databases used by Firefox. The results were interesting.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=8577f7af1caf16525ab9c8db608c2da62c491cb9&framework=1&showOnlyImportant=1&selectedTimeRange=172800

We observe a significant (~50 MB / 17%) reduction in I/O during tp5n nonmain_normal_fileio. This tells me that disabling fsync() results in fewer I/O operations reaching the filesystem, which is to be expected.
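The bug doesn't say exactly how the Try push disabled fsync. One common way to get that effect is SQLite's `PRAGMA synchronous = OFF`, which makes SQLite continue without syncing once it has handed data to the operating system; whether the Try push used this pragma is an assumption. A minimal sketch with Python's built-in `sqlite3` module (the helper names are hypothetical):

```python
import sqlite3


def open_without_fsync(path: str) -> sqlite3.Connection:
    """Open a SQLite database with synchronous=OFF (hypothetical helper).

    With synchronous=OFF, SQLite does not call fsync() at commit points,
    so committed data may sit in the OS page cache until something else
    flushes it.
    """
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA synchronous = OFF")
    return conn


def synchronous_level(conn: sqlite3.Connection) -> int:
    # SQLite reports 0 = OFF, 1 = NORMAL, 2 = FULL, 3 = EXTRA.
    return conn.execute("PRAGMA synchronous").fetchone()[0]
```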

More interesting, I think, is that we see significant responsiveness regressions in, e.g., tp6_facebook pgo 1_thread e10s.

What I think is happening is this: when we disable fsync() in SQLite, a bunch of writes are buffered in the operating system's filesystem cache. Then some other process (likely one of Firefox's) triggers an fsync(), which forces a flush of pending writes. Since more data must be flushed, this fsync() takes longer than it would if SQLite were issuing its own fsync() calls. Something is waiting on that I/O operation to complete, and this extra waiting is what increases the Talos responsiveness numbers.
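To make the "the fsync() pays for earlier buffered writes" idea concrete, here is a hedged sketch that writes data without syncing and then times a single fsync(). The helper is hypothetical; absolute timings depend entirely on the OS, filesystem, and disk. Note that fsync() is only guaranteed to flush the file it is called on; whether an fsync() on one file also pushes out other files' dirty data depends on the filesystem.

```python
import os
import time


def time_fsync_after_writes(path: str, nbytes: int, chunk: int = 1 << 16) -> float:
    """Write nbytes to path without syncing, then time one fsync() (hypothetical helper).

    The measured duration includes writing out this file's dirty pages that
    are still in the OS cache, so it tends to grow with nbytes.
    """
    buf = b"\0" * chunk
    with open(path, "wb") as f:
        remaining = nbytes
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(buf[:n])
            remaining -= n
        f.flush()  # moves Python's userspace buffer into the kernel; not a disk flush
        start = time.perf_counter()
        os.fsync(f.fileno())  # asks the kernel to write this file's dirty pages to storage
        return time.perf_counter() - start
```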

Since writes aren't guaranteed to reach the disk until an fsync() forces them (and Firefox coincidentally issues many fsync() calls during normal operation), what we could be seeing is I/O from previous tests "bleeding over" into subsequent tests. For example, if Talos tests A and B run in the same process, A could incur a lot of write I/O. Test B then performs an fsync(), which flushes data left over from test A. In other words, test B is measuring remnants of test A, and B's measurements may be "contaminated."

I think we should look into:

* Forcing an I/O flush between Talos tests/subtests
* Measuring the I/O that occurs while flushing after a measured test, in addition to the I/O during the test itself. This will help us separate incurred from deferred I/O and paint a better picture of overall I/O patterns.
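Both items can be sketched as a small harness wrapper: flush before the test, snapshot a write counter, run the test, snapshot again, flush, and snapshot a third time. This is a minimal sketch under stated assumptions, not how Talos works today. `run_with_flush` and `IOSplit` are hypothetical names, `bytes_written` is assumed to be a monotonically increasing system-wide counter of bytes written to storage (e.g. derived from `/proc/diskstats` on Linux), and the default `os.sync` is Unix-only.

```python
import os
from dataclasses import dataclass
from typing import Callable


@dataclass
class IOSplit:
    incurred: int  # bytes written to storage while the test body ran
    deferred: int  # bytes written during the forced flush afterwards


def run_with_flush(
    test: Callable[[], None],
    bytes_written: Callable[[], int],
    flush: Callable[[], None] = os.sync,
) -> IOSplit:
    """Run one (sub)test between two forced flushes (hypothetical harness helper).

    The counter is injected so the wrapper is not tied to one platform's
    I/O statistics.
    """
    flush()  # push out leftovers from earlier tests so they aren't charged to this one
    before = bytes_written()
    test()
    after_test = bytes_written()
    flush()  # force this test's buffered writes out now
    after_flush = bytes_written()
    return IOSplit(incurred=after_test - before, deferred=after_flush - after_test)
```

The flush before the test is what keeps test A's leftover writes out of test B's `incurred` number; the flush after it turns B's own buffered writes into an explicit `deferred` figure instead of letting them land in the next test.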

Forcing an I/O flush between tests/subtests could be difficult. If there is process separation, a system-wide flush such as sync() would work. However, if Firefox is still running, doing this right would require some mechanism within Firefox itself to force-flush any pending writes (which may be queued behind timers, etc.).
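For the process-separated case, a minimal sketch, assuming a Unix host and that each test can be launched as its own command (`run_isolated` and the command lists are hypothetical). Once a process has exited, its unflushed writes live only in the OS page cache, so a system-wide `os.sync()` between runs pushes them out before the next test starts. It does not address the harder case above, where Firefox itself keeps running and holds writes queued internally.

```python
import os
import subprocess
from typing import List, Sequence


def run_isolated(tests: Sequence[Sequence[str]]) -> List[int]:
    """Run each test command in its own process, syncing in between (hypothetical).

    Returns the exit code of each command, in order.
    """
    codes = []
    for argv in tests:
        os.sync()  # flush dirty pages left by any previous test (Unix only)
        proc = subprocess.run(list(argv))
        codes.append(proc.returncode)
    os.sync()  # flush whatever the last test left behind
    return codes
```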

:acreskey, is this something you've looked into regarding reducing noise in page load tests? The bug is about Talos, but the majority of our page load tests have since moved to Raptor. Perhaps this is worth investigating further?

Flags: needinfo?(acreskey)
Priority: -- → P3

Thanks, Dave. SQLite I/O flushing is not something I've looked at, but I found in Bug 1589356 that there can be heavy file I/O while the tests are running.
Because these results are old, I've kicked off a new run of the above using Raptor pageload.

I also find it interesting that on Android we disable synchronous I/O, so I've kicked off a test to validate that choice from 2011.

I didn't get the same results as in comment 1 when I disabled synchronous storage (toolkit.storage.synchronous).
At least, not on pageload.
Desktop:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=965bdcb397b6344e3b22890ec2854304c337abfd&framework=10&selectedTimeRange=604800
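To reproduce the same pref flip on a local profile outside Try, a hedged sketch: Firefox reads `user.js` from the profile directory at startup, so appending a `user_pref` line there overrides the pref. The helper name is hypothetical, and the 0/1/2 to OFF/NORMAL/FULL mapping is an assumption about how mozStorage translates `toolkit.storage.synchronous` into SQLite's `PRAGMA synchronous`, not something stated in this bug.

```python
from pathlib import Path


def set_storage_synchronous(profile_dir: str, level: int) -> Path:
    """Append toolkit.storage.synchronous to a profile's user.js (hypothetical helper).

    Assumed mapping: 0 = OFF, 1 = NORMAL, 2 = FULL.
    """
    if level not in (0, 1, 2):
        raise ValueError("expected 0 (OFF), 1 (NORMAL) or 2 (FULL)")
    user_js = Path(profile_dir) / "user.js"
    with user_js.open("a", encoding="utf-8") as f:
        f.write(f'user_pref("toolkit.storage.synchronous", {level});\n')
    return user_js
```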

On Android it's already disabled, so here I enabled it.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=9c19d16b1802c7a7e0503d8909f2c1bd742ee31d&framework=10&selectedTimeRange=604800
I don't see a clear pattern; it might regress some sites and maybe help one.
I think a baseline commit to compare against would give a better view.

Flags: needinfo?(acreskey)