1361091 - Can we avoid places main thread IO before the first window is shown?

Reporter

Description

•

7 years ago

See this startup profile on the quantum reference hardware: https://perfht.ml/2p1y7ce

During early startup, we spend 100ms calling mozilla::storage::Service::OpenUnsharedDatabase from mozilla::places::Database::InitFaviconsDatabaseFile and 43ms calling it from mozilla::places::Database::InitDatabaseFile.

We also spend 56ms in mozilla::storage::Connection::Close() called from mozilla::places::Database::InitFaviconsDatabaseFile. Do we really need to close this database (which flushes to disk) during startup?

If making all of this IO async is difficult (as bug 975979 seems to indicate), could we delay some of this to after the first window is shown?

Marco Bonardo [:mak]

Comment 1

•

7 years ago

Making init async is impossible, atm. We didn't even finish converting all the APIs to be async.

Afaict, this is not a bug in Places, it's a bug in its consumer, the browser.

Delaying means delaying components using the bookmarks or history services, that includes things like migrating data from other browsers (included automigration) that we currently do before showing the first window, on purpose, and importing our own default bookmarks. And then the docshell will init history as soon as it needs to use global history but I suppose at this point we have a window.
It is non-trivial, probably one could take the stacks initing the services one by one, and see what is needed to delay them. I can't exclude that some of those stacks are easily fixable, especially if the only requirement is to wait for the first window.

(In reply to Florian Quèze [:florian] [:flo] from comment #0)
> See this startup profile on the quantum reference hardware:
> https://perfht.ml/2p1y7ce
> 
> During early startup, we spend 100ms calling
> mozilla::storage::Service::OpenUnsharedDatabase from
> mozilla::places::Database::InitFaviconsDatabaseFile and 43ms calling it from
> mozilla::places::Database::InitDatabaseFile.

This is measured on a new profile, where we must create the schema yet, right?
On an existing profile it's taking far less, since it just has to open the db and check the schema version.

> We also spend 56ms in mozilla::storage::Connection::Close() called from
> mozilla::places::Database::InitFaviconsDatabaseFile. Do we really need to
> close this database (which flushes to disk) during startup?

We could potentially delay this, the reason to close the connection is just that we don't need it anymore after the db has been created. Again, this happens only on a new profile.

Component: Places → General

Product: Toolkit → Firefox

Florian Quèze [:florian]

Reporter

Comment 2

•

7 years ago

(In reply to Marco Bonardo [::mak] from comment #1)

> > During early startup, we spend 100ms calling
> > mozilla::storage::Service::OpenUnsharedDatabase from
> > mozilla::places::Database::InitFaviconsDatabaseFile and 43ms calling it from
> > mozilla::places::Database::InitDatabaseFile.
> 
> This is measured on a new profile, where we must create the schema yet,
> right?
> On an existing profile it's taking far less, since it just has to open the
> db and check the schema version.

Unfortunately, no. I haven't figured out yet how to run the profiler on a new profile (because I need... well, the profiler to be installed ;-)).

I just saw another startup profile this morning with this, so it seems to happen at each startup.

> > We also spend 56ms in mozilla::storage::Connection::Close() called from
> > mozilla::places::Database::InitFaviconsDatabaseFile. Do we really need to
> > close this database (which flushes to disk) during startup?
> 
> We could potentially delay this, the reason to close the connection is just
> that we don't need it anymore after the db has been created. Again, this
> happens only on a new profile.

Is there any way to close asynchronously a database that has been opened synchronously? If not, could we at least run this from an idle callback?

Marco Bonardo [:mak]

Comment 3

•

7 years ago

(In reply to Florian Quèze [:florian] [:flo] from comment #2)
> > This is measured on a new profile, where we must create the schema yet,
> > right?
> > On an existing profile it's taking far less, since it just has to open the
> > db and check the schema version.
> 
> Unfortunately, no. I haven't figured out yet how to run the profiler on a
> new profile (because I need... well, the profiler to be installed ;-)).
> 
> I just saw another startup profile this morning with this, so it seems to
> happen at each startup.

After looking at the code, you are right, we indeed must open the database to check that it's not corrupt.
We could maybe avoid this and rely on ATTACH DATABASE failures to detect a problem.
Looks like we will also need to try to remove the right db on case of corruption, so that needs some additional work but I think it's worth it.

> > > We also spend 56ms in mozilla::storage::Connection::Close() called from
> > > mozilla::places::Database::InitFaviconsDatabaseFile. Do we really need to
> > > close this database (which flushes to disk) during startup?
> > 
> > We could potentially delay this, the reason to close the connection is just
> > that we don't need it anymore after the db has been created. Again, this
> > happens only on a new profile.
> 
> Is there any way to close asynchronously a database that has been opened
> synchronously? If not, could we at least run this from an idle callback?

Yes, we can asyncClose a synchronous connection.

So, I can probably fix this for existing profiles, fixing it for new profiles would be quite more complex. I'll file a separate bug for the existing profiles case.

Marco Bonardo [:mak]

Updated

•

7 years ago

Depends on: 1361322

(no longer active)

Comment 4

•

7 years ago

(In reply to Florian Quèze [:florian] [:flo] from comment #2)
> (In reply to Marco Bonardo [::mak] from comment #1)
> 
> > > During early startup, we spend 100ms calling
> > > mozilla::storage::Service::OpenUnsharedDatabase from
> > > mozilla::places::Database::InitFaviconsDatabaseFile and 43ms calling it from
> > > mozilla::places::Database::InitDatabaseFile.
> > 
> > This is measured on a new profile, where we must create the schema yet,
> > right?
> > On an existing profile it's taking far less, since it just has to open the
> > db and check the schema version.
> 
> Unfortunately, no. I haven't figured out yet how to run the profiler on a
> new profile (because I need... well, the profiler to be installed ;-)).

I think I maybe mentioned this to mstange at some point...  If there is no bug on file for it, we should get one on file.  We should make it possible to capture and dump profiles somewhere without having the extension installed.

Markus Stange [:mstange]

Comment 5

•

7 years ago

You can start the profiler using the env var MOZ_STARTUP_PROFILING=1, then install the add-on and capture a profile right away, and that profile should contain the whole startup including the time before the add-on was installed.

Jean Gong :jgong

Updated

•

7 years ago

Whiteboard: [qf] → [qf:investigate]

Florian Quèze [:florian]

Reporter

Comment 6

•

7 years ago

I now have a first-startup profile (ie. we are creating the profile during this session) : https://perfht.ml/2sg0mZz

Marco Bonardo [:mak]

Comment 7

•

7 years ago

Most of those 130ms on Close are spent doing a checkpoint operation that causes an fsync, moving that fsync on a separate thread sounds good. AsyncClose also has a cost in this case, since it has to spawn a new thread and post to it (with all of its lockings), but should almost always be cheaper than an fsync.

The problem is that from my previous testing just using AsyncClose here completely breaks Places, because the next ATTACH DATABASE query fails (closing opens a transaction and disallows ATTACH calls). What's left to try is to delay the closing to a later time (after the ATTACH call) and see if that works.

Flags: needinfo?(florian)

Marco Bonardo [:mak]

Updated

•

7 years ago

Flags: needinfo?(florian)

Marco Bonardo [:mak]

Comment 8

•

7 years ago

It would be interesting to measure again on the reference hardware, after we handled the most obvious things:
1. Favicons database now is not opened/closed just to check if it's corrupt
2. The database connection is delayed as far as possible, it is initialized when the first consumer needs to run a query

Depends on: 1371677

Firefox Bug Husbandry Bot

Updated

•

7 years ago

Keywords: perf

Firefox Bug Husbandry Bot

Updated

•

6 years ago

Whiteboard: [qf:investigate] → [qf:investigate][fxperf]

Alex Thayer [:alexical] (she/her)

Comment 9

•

6 years ago

On reference hardware:

Startup profile on first run: https://perfht.ml/2Ex8OL4
Startup profile on second run: https://perfht.ml/2EB4VEH

In summary, it looks like on first run we're down from 400ms to 200ms, and on subsequent runs we've largely eliminated the cost (down to 6ms)?

Priority: -- → P3

Florian Quèze [:florian]

Reporter

Comment 10

•

6 years ago

(In reply to Doug Thayer [:dthayer] from comment #9)
> On reference hardware:
> 
> Startup profile on first run: https://perfht.ml/2Ex8OL4
> Startup profile on second run: https://perfht.ml/2EB4VEH

What do you mean with "first run" and "second run", is it cold vs warm startup, or is it when creating a profile and then when re-using it?

> In summary, it looks like on first run we're down from 400ms to 200ms, and
> on subsequent runs we've largely eliminated the cost (down to 6ms)?

Measuring main thread IO in 'ms' rarely makes sense, because it tends to take a random amount of time, depending mostly on what the system is doing in the background.

Alex Thayer [:alexical] (she/her)

Comment 11

•

6 years ago

(In reply to Florian Quèze [:florian] from comment #10)
> What do you mean with "first run" and "second run", is it cold vs warm
> startup, or is it when creating a profile and then when re-using it?

Sorry, good point. What I meant is creating a profile vs reusing it. Warm in both cases, so I can do a cold start for a better comparison.

> Measuring main thread IO in 'ms' rarely makes sense, because it tends to
> take a random amount of time, depending mostly on what the system is doing
> in the background.

That's certainly fair - is there an efficient measure or a method that you generally use for determining the severity of main thread IO? Just eyeballing what calls are actually made and using intuition?

Marco Bonardo [:mak]

Comment 12

•

6 years ago

Btw, the profiles show more or less what we already know, the db creation is on main thread, we pay the cost of the fsync at the end of the creation, unfortunately we can't do anything about this without lots of work (that we are doing, but it's quite demanding).

Florian Quèze [:florian]

Reporter

Comment 13

•

6 years ago

(In reply to Doug Thayer [:dthayer] from comment #11)

> > Measuring main thread IO in 'ms' rarely makes sense, because it tends to
> > take a random amount of time, depending mostly on what the system is doing
> > in the background.
> 
> That's certainly fair - is there an efficient measure or a method that you
> generally use for determining the severity of main thread IO? Just
> eyeballing what calls are actually made and using intuition?

Kinda, yes. All main thread IO is bad. The more often it is being done, the worse it is. Doing main thread IO twice is worse than doing it once (ie. if (file.exists()) file.read(); is worse than try { file.read() }, or reading 2 small files synchronously is worse than reading a single file synchronously containing the same amount of data).

I would like to put in place automated tests with a whitelist to ensure we don't introduce more main thread IO, but it's not been done yet.

Florian Quèze [:florian]

Reporter

Comment 14

•

6 years ago

(In reply to Marco Bonardo [::mak] from comment #12)
> Btw, the profiles show more or less what we already know, the db creation is
> on main thread, we pay the cost of the fsync at the end of the creation,
> unfortunately we can't do anything about this without lots of work (that we
> are doing, but it's quite demanding).

Isn't this bug WFM at this point? I thought we had already delayed initializing Places until after first paint. Or is the behavior different when creating a new profile?

Marco Bonardo [:mak]

Comment 15

•

6 years ago

don't we have the talos xperf test?
I just fear its whitelist is not up-to-date :(

Marco Bonardo [:mak]

Comment 16

•

6 years ago

(In reply to Florian Quèze [:florian] from comment #14)
> Isn't this bug WFM at this point? I thought we had already delayed
> initializing Places until after first paint. Or is the behavior different
> when creating a new profile?

The behavior is more or less the same, it depends if someone runs Places code before the first paint. For example the new policies will do IIRC, because they want to touch bookmarks before the ui is shown.

I also expect this to be WFM, but we need a test checking that is true and preventing regressions. Not sure in which shape.
We can also delay some things further after the UI is shown, like the star button, some things in Activity Stream (<link> icons and link coloring). Ideally just opening Firefox should not initialize the Places DB. I have a patch that asserts whe nthe db is initialized, printing out the js stack too. iirc it was hit by a few things after the ui startup, and some of those can be avoided.

Florian Quèze [:florian]

Reporter

Comment 17

•

6 years ago

Marking as fxperf:p3 for now to get it off the triage list, but I'm not sure this bug is actionable. Is the next step to write a test? Is it a duplicate of an existing meta bug about making Places I/O async?

Whiteboard: [qf:investigate][fxperf] → [qf:investigate][fxperf:p3]

Marco Bonardo [:mak]

Comment 18

•

6 years ago

FWIW, Bug 975979 is the one about async I/O in Places.

Mike Conley (:mconley) (:⚙️)

Updated

•

5 years ago

Whiteboard: [qf:investigate][fxperf:p3] → [qf:responsiveness:p2][fxperf]

:Felipe Gomes (needinfo for replies!)

Comment 19

•

5 years ago

Is this a good representation of this bug?

The general problem initially reported here, sync IO due to favicons, has been fixed by the favicons work
It looks like this bug is probably WFM
The pending task to do was to write a test to prevent regressions

Florian is already planning to create a test to protect against adding more main-thread-IO during startup (see the beginning of this effort here: bug 1529000 -- I don't know if there's already a bug for the proper test).

So do you agree on closing this bug as WORKSFORME and let the follow-up test work happen somewhere else? If the test is a whitelist then we don't need to explicitly think about blacklisting what was mentioned here

Flags: needinfo?(florian)

Florian Quèze [:florian]

Reporter

Comment 20

•

5 years ago

https://perfht.ml/2Itg9NV shows where Places I/O is currently triggered; that's after first paint, triggered by resource://activity-stream/lib/LinksCache.jsm

This was fixed in bug 1371710 and test coverage was added in bug 1373387.

Status: NEW → RESOLVED

Closed: 5 years ago

Flags: needinfo?(florian)

Resolution: --- → DUPLICATE

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

2 years ago

Performance Impact: --- → P2

Keywords: perf:responsiveness

Whiteboard: [qf:responsiveness:p2][fxperf] → [fxperf]