50+ GB of read I/O in places.sqlite shortly after starting Nightly




2 years ago
2 months ago


(Reporter: gps, Unassigned)



Firefox Tracking Flags

(Not tracked)




2 years ago
I've been playing around with procmon on Windows, seeing what Firefox processes are doing.

On my Windows 10 home PC tonight, I recorded 57,614,111,020 read bytes over 1,758,247 read operations to places.sqlite starting within a few seconds of startup. There were also 167,968,768 write bytes to places.sqlite and 175,018,232 write bytes to places.sqlite-wal. My places.sqlite is ~62,000,000 bytes.

The I/O pattern looks like we read the first ~20MB of places.sqlite in 64kb chunks hundreds of times over and over. There are chunks of writes, but they occur after several read passes.

I have no clue what is going on. I looked at the source history of toolkit/components/places and didn't see any schema changes recently. Perhaps Places was doing a vacuum or something. Even then, the ratio of read amount of file size is 930:1! That feels quite excessive.

My machine has an SSD, so I didn't really notice anything really intensive. And most of these read operations I reckon were serviced by the page cache. But still, I measured 16.2s of time spent in system level I/O function calls for places.sqlite. And there were >100MB of write I/O (keep in mind SQLite write I/O requires a disk flush - it doesn't just hand data over to the OS).

I have the PML trace saved. I have relative offsets within xul.dll of the stacks. So I should be able to figure out what is triggering this. Running the 2016-05-17 64-bit Nightly from https://hg.mozilla.org/mozilla-central/rev/a884b96685aa13b65601feddb24e5f85ba861561.
mozStorage logs using MOZ_LOG as "mozStorage"; you can see all the SQL that's happening which should be helpful.  On linux, I do the following, but you can use the updated env variables and the appropriate windows idioms:
NSPR_LOG_MODULES=timestamp,mozStorage:5 firefox-nightly/firefox |& tee /tmp/mainlog

Since only a subset of the database is being read and Places VACUUMs and so we do expect the database tables to be clumped in terms of file offset ordering, the most likely explanation is something is running inefficient table scan queries.  Any chance you're running any of the new test pilot extensions (because they're new and therefore easier to blame? :)

I have some tooling that can automatically parse the storage logs[1]and run EXPLAIN on them so we can maybe figure things out without too much difficulty if there's no obvious smoking gun in the SQL

1: Which depends on an unlanded patch of mine I need to fix up.  So some assembly required, but I can do that.
(In reply to Andrew Sutherland [:asuth] from comment #1)
> the most likely explanation is something is running inefficient
> table scan queries.

The awesomebar does, it always did cause we don't have a fulltext index and never had resources to implement one. So if you search for something that doesn't match, we may end up reading the whole db contents.
But I don't know if this was the case, I think the nsprlog is a good way to check that.

The last schema migration was to v32, that landed on 28th of April.
FWIW, we are working on the database size and efficiency along with the other awesomebar improvements, those changes should also help reducing these numbers (things like bug 889561, bug 1209027, and the recently landed bug 1262887)
do you have Clear history on Shutdown enabled by chance? cause in some cases we know clear history can generate a huge -wal journal (for my 100MB db I've see the wal growing up to 4GB) and then that journal must be merged (bug 871908)
Keywords: perf
Priority: -- → P3

Comment 4

2 months ago
(In reply to Marco Bonardo [::mak] from comment #3)
> do you have Clear history on Shutdown enabled by chance?
Flags: needinfo?(gps)

Comment 5

2 months ago
No I do not have clear history on shutdown enabled.
Flags: needinfo?(gps)
You need to log in before you can comment on or make changes to this bug.