[email] Consider/try optimizing back-end start-up so databases come up fast, protocol JS load and init are deferred

RESOLVED FIXED

Status

Firefox OS
Gaia::E-Mail
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: asuth, Assigned: jrburke)

Tracking

unspecified
x86_64
Linux
Dependency tree / graph

Firefox Tracking Flags

(b2g18 fixed, b2g18-v1.0.1 fixed)

Details

Attachments

(1 attachment)

(Reporter)

Description

6 years ago
In order to get messages to the UI as fast as possible, we want the back-end to only initialize what is required to get cached messages out of the database.

This would probably consist of:
- Break startup into 2 phases: offline and online.  By default, all constructors for accounts just bring the account up into offline mode.  Only MailStorage instances get created, ideally, no protocol stuff.
- Do not require() any online/protocol stuff until online start.  This implies changing to CommonJS-style synchronous-looking require() invocations only triggered at runtime, at least for the offline stuff.  Note that we would not use the regex scanning/dynamic script figuring-out stuff for production, but we would explicitly pre-load the script tags and not invoke our offline startup until they are loaded so there would be no performance hits to this.
- Do not even add the <script> tags for any online/protocol stuff until offline startup has fully completed.
Blocks: 817095
(Reporter)

Comment 1

5 years ago
James is currently helping out with this, particularly in the area of investigating us cutting our dependency tree into the requisite sub-trees so that we can still concatenate/minify the initial load group and potentially then load the dep sub-trees in optimized blobs too (if the numbers are better that way for load time and latency/responsiveness.)
Assignee: nobody → jrburke
Status: NEW → ASSIGNED
(Assignee)

Comment 2

5 years ago
Here is an initial shot at breaking out the activesync code to load in a secondary layer, based on a brief discussion with asuth:

https://github.com/jrburke/gaia-email-libs-and-more/compare/master...bug814271-emaildefer

I am not doing a pull request yet, because the big change would be reworking accountcommon.js to do some setup lazily/async, and I am unsure on the impact for that. I put a change in accountcommon.js that shows what would need to be done as far as loading the activesync code.

CAREFUL: if you pull down this code and run the make command to copy things to gaia, it will alter your gaia repo, and since accountcommon.js is broken at the moment so bad things will happen.

alameda.js is swapped in for almond.js because almond cannot do dynamic loading. So, there is an addition of alameda.js, removal of almond.js and a new activesync.js build layer that is loaded dynamically. Sizes (using a quick script to remove comments and extraneous empty multilines):

8632 almond.js
33829 alameda.js
138710 activesync.js

So, an addition of 25K up front to delay 138K later, with the option to delay load more as the modules are identified.

The main trick is working out how accountcommon.js could be reworked fully to async load the activesync.js build layer.

This change is only in this gaia-email-libs-and-more repo. If accountcommon.js is modified to work then no change should be needed in the gaia repo.
(Reporter)

Comment 3

5 years ago
The incremental alameda cost will be offset by chucking q.js; un-shrunk it's 40941 bytes.  It's ballast that quasi-accidentally gets pulled in by the logging framework which in one spot generates a promise for its test-framework interaction.  We can replace q.js with a NOP file, but there are likely easy ways to properly eliminate the dep, like have the test framework push in the deferred generator, etc.

accountcommon has 3 things going on inside it:
- CompositeAccount class, which is really only needed by IMAP/SMTP
- new account logic/support and account re-creation on database rev: Configurators, Autoconfigurator, recreateAccount, tryToManuallRecereateAccount
- account loading as used by MailUniverse._loadAccount: COMPOSITE_ACCOUNT_TYPE_TO_CLASS, accountTypeToClass.

The root entry points for the logic are all in mailuniverse.js's MailUniverse.  tryToCreateAccount and _loadAccount are really the keys.  tryToCreateAccount is already a fully async process, so it's straightforward.  _loadAccount is currently synchronous, but could be made asynchronous without too much hassle.  As alluded to on IRC, we can pick to load only the default (or until I fix that, first account) account, then load the others later once we have at least started the sync for that first account.  Because the front-end's list of folders is a list that can be dynamically updated, it's not the end of the world to the front-end if it doesn't find out about the other accounts for a second or two.

(We could also tell the front-end about them and make sure to trigger account load if we get any requests for them first, but that's got a lot more surface area, so it seems best to defer.)
(Reporter)

Comment 4

5 years ago
And I think we should probably split out those 3 different aspects of accountcommon.  I think it's less confusing to have the file split into parts than to have 1 big file that has a lot of deferred mechanics going on inside it.  Better to have the MailUniverse deal with most of that.  (And then eventually the accounts themselves when we split those into offline and online phases.)
(Assignee)

Comment 5

5 years ago
Here is my first shot at breaking up accountcommon and switching the APIs async:

https://github.com/jrburke/gaia-email-libs-and-more/commit/6803cf19eb1fc9c4301beea7f3732334c377c01b

Notable points:

* accountcommon dynamically loads "configurator" modules for each account type now: "imap+smtp", "activesync" and "fake".
* Each configurator is targeted as a build layer. So, not only is activesync loaded on-demand, so are the other types. This can be changed easily in the build config.
* alameda needs some config info for the dynamic loading, see changes in end.js, and note the async require to start module execution. alameda does not support sync require at the top level like almond does, due to the possibility something may need to be dynamically loaded.
* I needed to move the window.process definition in shim-sham outside the define block, otherwise the browserify version of events would throw. Not sure of the cause of that, but since that events module does not have a direct dependency on shim-sham, I can see where that window.process has to happen right away.

I have not run the tests, but running on unagi with a fake and a gmail account works. So, activesync not tried. Also, I'm not sure if the install-gaia did the concatenation work, at first glance it may not have. I did not explicitly set DEBUG=1.

Give a shout if this looks bad, has problems. Tomorrow I will try to figure out running the tests in the email-libs repo and see about confirming the gaia-concatenated version works.
(Reporter)

Comment 6

5 years ago
(In reply to James Burke [:jrburke] from comment #5)
> I have not run the tests, but running on unagi with a fake and a gmail
> account works. So, activesync not tried. Also, I'm not sure if the
> install-gaia did the concatenation work, at first glance it may not have. I
> did not explicitly set DEBUG=1.

The optimization process only leaves artifacts around in the .zip files it produces.  You can find ours in profile/webapps/email.gaiamobile.org/application.zip after doing "make profile" or "make NOFTU=1 profile" if you're tired of the FTU app.
(Reporter)

Comment 7

5 years ago
No new test regressions except for some minor autoconfig fallout from a bad 'this' which I've called out on the pull request.

This is looking very good.  I'm especially psyched that I see there's a lot more stuff this framework will let us move out into other layers if we do the 'online' variants and such and it doesn't look all that hard to do that!
(Assignee)

Comment 8

5 years ago
Fixed "this" issue here:
https://github.com/jrburke/gaia-email-libs-and-more/commit/6573ee6f222753d3f43ca51dd628d3681ea71a08

I was going to fiddle with running the unit tests, but if you would prefer to see a squashed commit and a pull request since you ran the tests, I can focus on that route.
(Assignee)

Comment 9

5 years ago
One other small update, this one stubs out q for the copy over to gaia:
https://github.com/jrburke/gaia-email-libs-and-more/commit/7c8ca893116231b48b3b3f255beafb0403f18d68
(Reporter)

Comment 10

5 years ago
(In reply to James Burke [:jrburke] from comment #8)
> I was going to fiddle with running the unit tests, but if you would prefer
> to see a squashed commit and a pull request since you ran the tests, I can
> focus on that route.

gaia-email-libs-and-more does not require squashed commits; only gaia requires that.

I was travelling today and need to check-in with people tomorrow about where we are priority-wise for performance work, but should be able to follow up much more.  If you have some spare cycles to do a profiler run on a device to see how this improves startup for an IMAP-only or activesync-only case, that would be great.  I want to land this on 'master' soon, but want to make sure we have numbers that don't show any horrible surprises.
(Assignee)

Comment 11

5 years ago
Got profiling set up, and I think I'm starting to understand the output. Here are some captures if anyone wants to check. These captures are for a fresh, first start of the email app, so no accounts enabled. Done on an unagi device.

Before changes (using email app from BRANCH=v1-train as of mid-yesterday):
http://people.mozilla.com/~bgirard/cleopatra/#report=81efc32d776f1173cabbdcf52c0e7a23ebce7f6f
Approximate select range: 7802-8166


Changes from my branch:
http://people.mozilla.com/~bgirard/cleopatra/#report=ef4d56bb03a17db99e89d9fc033c3d04c6215e05
Approximate select range: 4193-4483

Done with 10 ms sampling. I had trouble with the 1ms sampling patch, it seemed that I could not time it well to include the EvalString costs at the beginning, maybe a result of waiting for "stabilizing"?

Before: 93 ticks a JS:EvaluateString (evaluating JS), after it is 71 ticks.
Before: 28 ticks for running modules, after 41.

The Timer::Fire sections that looks like almond code (callDep) in the first and alameda in the second (waitForDep), but I believe this is the actual calling of the module factory functions, not actual holdups in almond/alameda, they just are the things visible at the 10ms sampling.


After setting up one imap gmail account. I set up the account, then did ./profile start to restart the device in capture mode. I then opened the email app:

BRANCH=v1-train
http://people.mozilla.com/~bgirard/cleopatra/#report=404d6e17e8838c833e05f3e704d4b6222ba2f9b9

96 EvaluateString
31 RunScript
43 RunScript MessageListCard.prototype.postInsert, folderSlice.oncomplete

Branch code:
http://people.mozilla.com/~bgirard/cleopatra/#report=b2ecbd892316d4be51f7ef95dc8f1267d4e1f3e2


73 EvaluateString
67 RunScript
41 RunScript MessageListCard.prototype.postInsert, folderSlice.oncomplete

This one was harder to read though. I could not tell where the dynamic load came in. But hopefully this gives a flavor for the change.
(Assignee)

Comment 12

5 years ago
After talking with asuth, realized I was doing the comparison wrong, had the wrong gaia baseline to compare against. I updated to latest gaia and email-libs masters, merged with my branch and then did these tests. I also turned on "Time to load" in the dev section:

Summary: Faster time to loads and overall an improvement.

First time load: from 3188 to 2370
With one imap: from 2887 to 2350


Master
---

First time (no account):
http://people.mozilla.com/~bgirard/cleopatra/#report=b8855bed8fe363757d47d266a157651c81a58a40

Time to load: 3188
1303-1711
EvaluateString: 135 (includes the RunScript at 40)


IMAP account, fresh load (after profile.sh start reboot):
http://people.mozilla.com/~bgirard/cleopatra/#report=96ae136b5ef02ea090776788d5bc8922844245e3

Time to load: 2887
665-1204
EvaluateString: 131 (includes the RunScript at 37)
handleMessage: 44 (pushCard, folderSlice.oncomplete)

Branch:
---------

First time (no account):
http://people.mozilla.com/~bgirard/cleopatra/#report=1845b54ba807b6a7ad0bdbb225c7228ca317410c

Time to load: 2370
735-1052
EvaluateString: 76
RunScript (separate) 42

IMAP account, fresh load (after profile.sh start reboot): profile_881_E-Mail
http://people.mozilla.com/~bgirard/cleopatra/#report=45be0d8dd307f9ceeacad866d3737bffed8d11f9

Time to load: 2350
749-1090
EvaluateString: 75
RunScript: (separate) 62 (includes new dynamic load of imap account)
handleMessage: 41 (pushCard, folderSlice.oncomplete)
(Assignee)

Comment 13

5 years ago
Created attachment 716949 [details] [review]
https://github.com/mozilla-b2g/gaia-email-libs-and-more/pull/128

Pull request as discussed in IRC, defers loading of email protocol objects until needed.
Attachment #716949 - Flags: review?(bugmail)
(Reporter)

Comment 14

5 years ago
Comment on attachment 716949 [details] [review]
https://github.com/mozilla-b2g/gaia-email-libs-and-more/pull/128

This reliably improved startup for me by ~450ms per the time-to-live display.  gaia-email-libs-and-more tests were green, manual smoke testing was clean, and travis was happy (not that we have much in the way of unit tests inside gaia proper).

landed in gaia-email-libs-and-more/master:
https://github.com/mozilla-b2g/gaia-email-libs-and-more/pull/128

landed in gaia/master:
https://github.com/mozilla-b2g/gaia/pull/8256


The next step in this line of effort is to add additional phases for offline/online startup as proposed in comment 0.  I think it makes sense to leave this bug open for that work since this is all of a single lineage of work and the granularity that we would want uplift would be for both bits.
Attachment #716949 - Flags: review?(bugmail) → review+
(Assignee)

Comment 15

5 years ago
On deferring loading of the protocol modules, and just starting up using the local info:

I have started a branch, but not done with it yet. I'm sharing it now in the current state in case there is feedback that I need to adjust the approach:

https://github.com/jrburke/gaia-email-libs-and-more/compare/onlinedelay

I started by dynamically loading the composer module and its dependencies only when needed. A "mailchew-strings" module was created to hold any updates to the mailchew locale strings, and then mailchew now checks for it when it does load up. Looking at the dependencies, mailbridge was accessing both composer and mailchew, but now exposed mailchew as a property on composer, so that "composer" can be turned into a build layer.

While this is not directly offloading the email protocol work, I did it as a first part in consolidating access to the *chew modules, and it looks like dynamically loading the composer layer saves around 100ms (this time is saved in the time period after first load but before messages start showing up).

The next part of the patch is to see if it still makes sense to consolidate the *chew modules together, and then look at the account type modules to offload the email protocol work. Looking briefly at the composite type, perhaps dynamically load the protocol stuff as part of the folder and jobs connection acquisition calls. But that was just a quick first glance, may not have that right yet.
(Reporter)

Comment 16

5 years ago
ImapAccount.__folderDemandsConnection/_makeConnection seems like the logical place to lazy-load the e-mail stuff.

ActiveSync is a little weirder since it doesn't maintain a persistent connection, but does need to do some initial setup/sanity checking work with the server before it determines that it is 'account.conn.connected'.  (Which is why everyone checks that and then call account.conn.connect() if it's not connected.)  For lazy-loading purposes, this suggests that we want to slightly change up that idiom so we can defer creating the $activesync.Connection.  Like tracked 'connected' on the Account itself and put connect on there too which would create the conn, etc.

Jim, thoughts on how to do that for ActiveSync in a way that's conceptually clean but lets us defer loading all the $activesync/$wbxml modules as long as possible?
Flags: needinfo?(squibblyflabbetydoo)
(Assignee)

Comment 17

5 years ago
Pushed some more code, splitting out imap and smtp code from the composite/configurator code, and did some other layering for items not needed up front:

https://github.com/jrburke/gaia-email-libs-and-more/compare/onlinedelay

Before doing any changes for that branch, I was seeing mail show up in the 28xx ms range, it is now 24xx ms range for a single IMAP account configured. If I use uglifyjs on the scripts, it goes to low 23xx.

With this most recent change, occasionally a load will trigger the imap/probe layer, but I have not worked through the heuristic for that. The one time I did see it though, the "first messages" was still in the 24xx range.

Next I will look at separating out the code for activesync, which should be the last of the "split out protocol loading" part.

I also need to confirm tests are working too, so this is still considered very much a work in progress.

I am seeing "first load" times in 9xx ms range, so I want to do some profiling to see what else is happening 1400ms range between first load and messages showing up, to see if there is anything else that could be sped up.
(Reporter)

Comment 18

5 years ago
This looks good.  The reason you are probably seeing things happen at load time is because of enqueued per-account operations.  Specifically, if you look at the _loadAccount code, it runs through the list of mutations and calls _queueAccountOp to reschedule various operations.  These operations may then decide to bring the account online, etc.

If you look at _queueAccountOp, it does check if the account should be running operations/server operations yet.  (Server operations need us to be online, non-server operations do not.)  It may make a lot of sense to allow the account to be in a state where it will not yet run its server operations.  We probably do not want to have 'this.online' be false, however, since that could interfere with the refresh operation that we do want to trigger.  When we do flip whatever flag we might add, we would then want to call _resumeOpProcessingForAccount.
(Assignee)

Comment 19

5 years ago
First pass at dynamic loading of activesync protocol stuff here:

https://github.com/jrburke/gaia-email-libs-and-more/commit/6c4271f87ed8095e5910787368bad7d739d87710

(part of the previously linked "onlinedelay" branch in my fork)

Notes about it:

1) mailapi/activesync/account has a withConnection method that handles the previous .connected checking and then also delays initializiation of the .conn object until first call.

2) mailapi/activesync/folder is inlining a copy of FolderHierarchy from activesync/codepages. It may make more sense to modify the source activesync code and pull that object out as a separate module, so that folder can just depend on it directly. The copy exists because it was hard to async load all uses of it inside mailapi/activesync/account.

3) for mailapi/activesync folder, account and jobs, all use a "lazyInit" function wrapper that does the async loading of the protocol stuff. They also trigger an account.withConnection() call too.

Still lots more testing to be done on it, but basic account setup, usage seems to work.

One of the initial paths I looked at was to just dynamically load the "folder" module along with the protocol stuff, but ActiveSyncAccount wanted $asfolder.ActiveSyncFolderSyncer for the $mailslice.FolderStorage creation in its constructor, so I gave up on that path.

This change has a similar performance benefit to the imap changes, even seems a bit better, in the 2300 range for many cold starts, after initial mail sync.
The commit in Comment 14 landed in v1.0.1 as part of the Big Uplift from Bug 851124. Will land to v1-train tomorrow.

I'd suggest that we close this bug and open a new one for the remaining work. I don't like having part of bugs landed. What do you think ?
(Reporter)

Comment 21

5 years ago
Right, yes, let's spin up a follow-up for reasons of tracking by cloning this bug.
ok then I mark it fixed and let you clone this bug.
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
status-b2g18-v1.0.1: --- → fixed
Resolution: --- → FIXED
(Assignee)

Updated

5 years ago
Blocks: 852710
Landed in v1-train as part of Bug 851124 big uplift.
status-b2g18: --- → fixed

Comment 24

5 years ago
Posting to clear needinfo, since I talked about this on IRC a while back.
Flags: needinfo?(squibblyflabbetydoo)
Attachment mime type: text/plain → text/x-github-pull-request
You need to log in before you can comment on or make changes to this bug.