Avoid main-thread IO for permissions.sqlite

NEW
Unassigned

Status

()

defect
6 years ago
4 months ago

People

(Reporter: rvitillo, Unassigned)

Tracking

(Depends on 2 bugs, Blocks 4 bugs, {main-thread-io})

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [fxperf:p3], )

It seems that "permissions.sqlite" is a top main-thread IO offender (see URL). Would it be feasible to move some of the disk IO off the main thread?
Blocks: 572459
No longer blocks: 649889
Not easily without a complete rewrite of the permissions API (nsIPermissionManager).

We could of course use one of the hacks we've used for other similar services (delay loading with sync I/O fallback, async "initialize service" API, etc.).
Is this stuff on the qf/photon-perf radar (maybe in the form of dupes of this bug) ?
Flags: needinfo?(florian)
Flags: needinfo?(ehsan)
How up to date is this bug?  It was filed three years ago...  The permission manager uses async statements to talk to the DB after initialization <https://searchfox.org/mozilla-central/rev/507743376d1ba753d14ab6b9305b7c6358570be8/extensions/cookie/nsPermissionManager.cpp#1554>, and bug 1337056 is on file to make its initialization asynchronous but it's hard to believe that it's initialization is a top offender these days.
Flags: needinfo?(ehsan)
(In reply to :Gijs from comment #3)
> Is this stuff on the qf/photon-perf radar (maybe in the form of dupes of
> this bug) ?

No, and per comment 4, it seems this could be resolved as WORKSFORME.
Flags: needinfo?(florian)
I'm not sure what bug 1337056 did, but mDBConn is not an asynchronous connection, and indeed all of the opening and schema creation are synchronous and doing main-thread IO. See http://searchfox.org/mozilla-central/rev/20963d7269b1b14d455f47bc0260d0653015bf84/extensions/cookie/nsPermissionManager.cpp#970
It seems even worse, at the end of initDB there's a call to nsPermissionManager::Read() that reads the whole db contents synchronously. Either this is lefacy code and bug 1337056 switched us to a new store, or this is far from being WFM.
CC'ing mystor who was asking me about this recently.
See also: https://bugzilla.mozilla.org/show_bug.cgi?id=1363541 specifically with regard to the startup of the permission manager.
Depends on: 1363541
(In reply to Marco Bonardo [::mak] from comment #7)
> It seems even worse, at the end of initDB there's a call to
> nsPermissionManager::Read() that reads the whole db contents synchronously.
> Either this is lefacy code and bug 1337056 switched us to a new store, or
> this is far from being WFM.

That code changed permission manager initialization in the content process, not in the chrome process. Initialization in the chrome process is still synchronous main thread IO. bug 1363541 is tracking trying to make that at least partially async as well, but I don't think I will have time to fix it before 57.

In my startup tests this sync I/O usually took under 10ms, and happens during startup, so it was decided that it wasn't important enough to make [qf:p1], but we can reevaluate if you have information showing it having a larger impact.
Have you measured the 10ms on the reference hardware?
Flags: needinfo?(michael)
(In reply to Marco Bonardo [::mak] from comment #7)
> It seems even worse, at the end of initDB there's a call to
> nsPermissionManager::Read() that reads the whole db contents synchronously.
> Either this is lefacy code and bug 1337056 switched us to a new store, or
> this is far from being WFM.

Bug 1337056 had nothing to do with the database access here, FWIW.  If you are looking for a bug to blame for not doing what needed to be done for proper async DB access, you should really blame bug 724878.  :P
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #11)
> Have you measured the 10ms on the reference hardware?

I think I saw that number in a profile which you showed me from the reference hardware.

I would be open to adding a telemetry probe around how long initializing nsPermissionManager takes for a version or so to see if it's a big problem in the "real world". Do you think that's a good idea?
Flags: needinfo?(michael) → needinfo?(ehsan)
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #12)
> Bug 1337056 had nothing to do with the database access here, FWIW.  If you
> are looking for a bug to blame

I'm not trying to blame any bug, at a maximum I should blame myself as the Storage owner :( It's just that the title of that bug is very misleading, since initialization in the chrome process is synchronous, and indeed you were also confused by it in comment  Thanks :mystor for clarifying where we are!
(In reply to Marco Bonardo [::mak] from comment #14)
> (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from
> comment #12)
> > Bug 1337056 had nothing to do with the database access here, FWIW.  If you
> > are looking for a bug to blame
> 
> I'm not trying to blame any bug, at a maximum I should blame myself as the
> Storage owner

I meant blame as in hg/git blame, not as in English blame.  Sorry for the misunderstanding!  :-)  My point was that bug 1337056's goal wasn't trying to make the DB access async, it was trying to make the IPC layer async...  Bug 724878 however _was_ trying to make the DB access async but it seemed like it failed to do so for the reasons you mentioned in comment 6.  (FWIW I certainly was under the misguided impression that merely calling the async APIs is all you need to do... this may be worth sending a note about to dev-platform/firefox-dev about.  I'm not sure how many others may have the wrong expectation about this API...)

(In reply to Michael Layzell [:mystor] from comment #13)
> (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from
> comment #11)
> > Have you measured the 10ms on the reference hardware?
> 
> I think I saw that number in a profile which you showed me from the
> reference hardware.

Yeah I do remember us looking at it.  Unfortunately I don't remember where the profile came from.  :-/

> I would be open to adding a telemetry probe around how long initializing
> nsPermissionManager takes for a version or so to see if it's a big problem
> in the "real world". Do you think that's a good idea?

I think adding a telemetry around it is definitely a good idea.  Doing some more local measurements is also a good idea.

Florian, any chance you could look in the slow startup profiles you have captured recently to look for traces of permission manager initialization?
Flags: needinfo?(ehsan) → needinfo?(florian)
(In reply to Ehsan from comment #15)

> Florian, any chance you could look in the slow startup profiles you have
> captured recently to look for traces of permission manager initialization?

Here is a startup profile of the current nightly on the reference hardware:
https://perfht.ml/2ti5Tez

It shows nsPermissionManager::InitDB taking 152ms on the main thread, that's 12.9% of the time before first paint.

Note: this is neither a cold startup nor a warm startup; it was the first startup after an update (so most system libraries are already in disk cache)
Flags: needinfo?(florian)
Whiteboard: [fxperf]
Whiteboard: [fxperf] → [fxperf:p3]
Blocks: 1543203
You need to log in before you can comment on or make changes to this bug.