1272060 - Use low I/O, volatile storage for Places when running mochitests

Reporter

Description

•

8 years ago

Let's use the preference from bug 1272025 to drastically lower Places SQLite related I/O when running mochitests.

I'll write up in the commit message why I think this is safe to do.

Gregory Szorc [:gps]

Reporter

Comment 1

•

8 years ago

Attached file MozReview Request: Bug 1272060 - Put Places in low-I/O, volatile storage mode during tests — Details

By default, Places opens a SQLite database using a disk-based
WAL journal. When commits are performed, the SQLite files are
flushed to the filesystem - effectively waiting on the disk to
acknowledge writes. This can be very slow, especially on
spinning disks.

This commit changes the test environment to put Places/SQLite
in low-I/O, volatile storage mode. In this mode, the journal
will reside in memory and disk writes are only written to the
OS - we don't wait for data to be flushed to disk. This
drastically lowers the amount of I/O required for SQLite
operations.

The I/O required by Places/SQLite when running tests can
be significant. Tests like reftests, mochitests, and WPT
effectively load hundreds or even thousands of pages in
rapid order. Each page load has the potential to generate
I/O via Places recording the history visit. In effect,
these tests are stress testing Places/SQLite.

In bug 1271035, we recorded ~50 GB of requested I/O
related to Places when running reftests on Windows.
In automation, 10+ GB disk I/O was incurred during job
execution. Disabling Places wholesale in reftests elimited
most of this I/O, significantly speeding up test execution.

While this commit reduces the robustness of Places/SQLite
storage, I don't think it is relevant for tests. This is
because our tests are verifying that Places works, not that
SQLite writes are durable. The change to use more volatile
SQLite storage shouldn't make a difference to the behavior
of Places for purposes of testing Firefox. So the net effect
is we make Places/SQLite significantly faster without
sacrificing Places test coverage.

TODO document improvements measured in automation.

Review commit: https://reviewboard.mozilla.org/r/52009/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/52009/

Attachment #8751400 - Flags: review?(mak77)

Attachment #8751400 - Flags: review?(jmaher)

Gregory Szorc [:gps]

Reporter

Comment 2

•

8 years ago

Comment on attachment 8751400 [details]
MozReview Request: Bug 1272060 - Put Places in low-I/O, volatile storage mode during tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52009/diff/1-2/

Joel Maher ( :jmaher ) (UTC -8)

Comment 3

•

8 years ago

Comment on attachment 8751400 [details]
MozReview Request: Bug 1272060 - Put Places in low-I/O, volatile storage mode during tests

https://reviewboard.mozilla.org/r/52009/#review48851

while I like this, I do wonder:
1) how much time this actually saves (I suspect a decent amount)
2) will the create lack of coverage on the places database- right now we probably hit a lot of odd edge cases, with this patch I would imagine less

Does this make sense for talos?
Should we write a unittest that stresses the places db?

Attachment #8751400 - Flags: review?(jmaher) → review+

Gregory Szorc [:gps]

Reporter

Comment 4

•

8 years ago

https://reviewboard.mozilla.org/r/52009/#review48851

I'll add the job time numbers once I get them. I only have data for Linux so far. It doesn't change much on Linux because our I/O there is pretty decent. It should result in measurable savings on OS X and Windows because spinning disks.

I argue this doesn't really cause a lack of coverage on the Places database (see commit message). But I have flagged Marco for review just to be sure.

I think we should *not* change Talos (at least the Talos tests measuring page load performance) because Places can have an impact on Talos performance and we want Talos to be testing what we ship to users. To clarify: Talos doesn't inherit prefs_general.js, right? I thought it had its own preferences list. But I could be wrong...

Marco Bonardo [:mak]

Comment 5

•

8 years ago

I don't think this should happen for every mochitest harness, it should only happen for those not explicitly testing Places. That is, I think it should not happen for mochitest-browser, and should not happen for mochitests in places/ folders. The reason is that mochitest-browser is closer to the real UI usage and can sometimes uncover concurrency bugs that only exist when we are in WAL mode (intermittents often uncovered those bugs). I care a less about other mochitests , even if that is still true, cause there are less interesting code paths hit. In general, we should always test what we ship to the users, not something else.

The other problem I have is that we should not allow final users to set this pref, it will be very tempting for users or "perf add-ons" to set this pref, and then we'd have to deal with broken dbs reports caused by this non standard setting, and we may even not know the pref is set. Which alternatives do we have to make this a build time options that only applied to tinderbox builds but not to Nightly/Aurora/Beta/Release?

Joel Maher ( :jmaher ) (UTC -8)

Comment 6

•

8 years ago

talos has its own preference definitions, so this change won't affect it.  I am fine leaving this alone in talos, thanks for the thoughts here.

::mak, is there a way that we could identify the list of tests to not have this pref?  We could set it in the mochitests (ideally in something like head.js for browser-chrome), or since we do run-by-dir, we can find a way to either fix this in the manifest files (.ini files) to set a pref or have the harness determine if we are in this list of directories and change the prefs.

Marco Bonardo [:mak]

Comment 7

•

8 years ago

once we have a way to ignore the pref in user-released builds, surely we could use head files for exclusions.
As I said, any tests in toolkit/components/places/ and browser/components/places and any mochitest-browser test in browser/ (and subfolders) should test the real thing we ship

Gregory Szorc [:gps]

Reporter

Comment 8

•

8 years ago

Release engineering is now using "build promotion" which means we take normal builds and ship them to users. So e.g. a PGO build on central becomes Nightly. This means we can't simply have a build time flag control the behavior: we need something that can be set at run time.

We have undocumented prefs that enable unwanted behavior. "security.turn_off_all_security_so_that_viruses_can_take_over_this_computer" comes to mind. Perhaps we could rename the pref to "places.allow_database_to_lose_data_and_become_corrupt" or something. And we can have AMO screen for extensions setting that pref as an extra guard against people getting this set.

As for per-directory and per-suite options, that makes sense. I'll see what I can cook up.

Gregory Szorc [:gps]

Reporter

Comment 9

•

8 years ago

Comment on attachment 8751400 [details]
MozReview Request: Bug 1272060 - Put Places in low-I/O, volatile storage mode during tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52009/diff/2-3/

Attachment #8751400 - Attachment description: MozReview Request: Bug 1272060 - Put Places in low-I/O, volatile storage mode during tests; r?jmaher, mak → MozReview Request: Bug 1272060 - Put Places in low-I/O, volatile storage mode during tests

Gregory Szorc [:gps]

Reporter

Comment 10

•

8 years ago

I /think/ the OS X 10.10 opt bc7 job at https://treeherder.mozilla.org/#/jobs?repo=try&revision=977aef8d14c758edfd3641c7073bcd709bb566c3 increased the frequency of some intermittents of tests touching Places. That Linux ASAN mda intermittent is also kinda wonky. At first glance it doesn't appear related to Places. But the failure rate is pretty high...

Marco Bonardo [:mak]

Comment 11

•

8 years ago

(In reply to Gregory Szorc [:gps] from comment #8)
> Release engineering is now using "build promotion" which means we take
> normal builds and ship them to users. So e.g. a PGO build on central becomes
> Nightly. This means we can't simply have a build time flag control the
> behavior: we need something that can be set at run time.

Could we have something that is set only when tests run instead, like an env variable?
So when we read the pref, if the env var is not set we ignore it.
I don't care that much about the fact we already have footguns, 2 wrongs don't make a right :)

Gregory Szorc [:gps]

Reporter

Comment 12

•

8 years ago

Yes, making the pref only work if an env variable is set would be fine and easy to do. I don't think it will stop people from footgunning themselves. But it is better than nothing.

(not currently active) Ted Mielczarek

Comment 13

•

8 years ago

At some point in the past I remember we looked into why xpcshell tests were so slow on Windows (bug 617503), and determined that it was mostly disk I/O. Unfortunately we probably can't turn this off for the xpcshell tests that are actually testing Places.

Maybe we could try getting faster disks in our test machines?

Marco Bonardo [:mak]

Comment 14

•

8 years ago

If you can build something like a statistics of the tests that are doing more I/O, it could have 2 advantages:
1. we could look into them and maybe figure out alternatives to reduce it
2. maybe by looking into those, we could find perf improvements that would turn to the final users

Something like the Orange Factor for I/O, even a very simple hall of shame.

Marco Bonardo [:mak]

Comment 15

•

8 years ago

Comment on attachment 8751400 [details]
MozReview Request: Bug 1272060 - Put Places in low-I/O, volatile storage mode during tests

https://reviewboard.mozilla.org/r/52009/#review50302

I'm ok with the idea, but we should exclude some tests harnesses (comment 5) that are critical for Places functionality.
plus we need the new ENV var for this to be effective.

Attachment #8751400 - Flags: review?(mak77)

Gregory Szorc [:gps]

Reporter

Comment 16

•

7 years ago

I'm not actively working on this.

Assignee: gps → nobody

Status: ASSIGNED → NEW

Marco Bonardo [:mak]

Comment 17

•

7 years ago

fwiw, last year perf improvements and now bug 1356220 should have helped here, by increasing the wal autocheckpoint value to 2MB, that is unlikely to be hit by tests. So while we'll write yet, we likely won't fsync that much and some pages may be written less times to the db. We could also expose a pref to further increase the autocheckpoint in tests if it helps, and that pref could indeed be used from prefs_general.

We should re-run the `mach reftest` test with Sysinternals Process Monitor and check how bad the current situation is yet, and maybe measure if a 10MB autocheckpoint would make it even better.
I'll see if I can find time to do those couple measurements in the next weeks.

Flags: needinfo?(mak77)

Gregory Szorc [:gps]

Reporter

Comment 18

•

7 years ago

(In reply to Marco Bonardo [::mak] from comment #17)
> We should re-run the `mach reftest` test with Sysinternals Process Monitor
> and check how bad the current situation is yet, and maybe measure if a 10MB
> autocheckpoint would make it even better.
> I'll see if I can find time to do those couple measurements in the next
> weeks.

Just don't do it with reftests: we disabled places in reftests in bug 1271035 because places isn't relevant to reftests.

Marco Bonardo [:mak]

Comment 19

•

7 years ago

didn't find the time, but it's still a valid concern. Moving to Places backlog.
considered the improvementa and the fact one can disable global history in browsers, the issue is workaroundable.
Also, if we end up not using places.database.disableDurability (and we are currently not using it) we may evaluate removing it.

Component: Mochitest → Places

Flags: needinfo?(mak77)

Priority: -- → P3

Product: Testing → Toolkit

Version: Version 3 → unspecified

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Marco Bonardo [:mak]

Comment 20

•

7 months ago

I think it's worth investigating disabling history for certain tests subsets, rather than overall.

Status: NEW → RESOLVED

Closed: 7 months ago

Resolution: --- → INCOMPLETE