Closed Bug 887889 Opened 11 years ago Closed 6 years ago

Use Sqlite.jsm in ContentPrefs

Categories

(Firefox :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 61
Tracking Status
firefox61 --- fixed

People

(Reporter: mak, Assigned: alexical)

References

(Blocks 2 open bugs)

Details

(Keywords: main-thread-io, Whiteboard: [fxperf:p1])

Attachments

(7 files, 2 obsolete files)

if possible we should move ContentPrefs to the async only connection, this bug is to track the needed work
or use Sqlite.jsm
Summary: Use mozIStorageAsyncConnection in ContentPrefs → Use Sqlite.jsm in ContentPrefs
Here is a newer profile of content pref initialization now that bug 886907 is fixed: https://perfht.ml/2v7AA7n
Whiteboard: [fxperf]
Marking this as P2 since it's sync IO and I think it should be doable in a reasonable time frame, looking at the code. Marco, is there any surprising difficulty here that I'm missing? I can't see what if anything depends on this being synchronous.
Flags: needinfo?(mak77)
Priority: -- → P2
We removed the old ContentPrefs service that had a synchronous API, and replaced it with an async API, thus I don't expect surprising difficulties. But as bug 888784 teached us, it may not be trivial as it looks.
The added difficulty is that some of this code is not very modern (no promises, lots of callbacks). Doing some first cleanup passes to modernize the code may help.
Flags: needinfo?(mak77)
Resetting priority to what it was (team switching to whiteboard priority system.)
Priority: P2 → --
Whiteboard: [fxperf] → [fxperf:p2]
Assignee: nobody → dothayer
Status: NEW → ASSIGNED
Whiteboard: [fxperf:p2] → [fxperf:p1]
Hey Marco, sorry to add to your work, but let me know if there's anyone else who I can offload these review requests to if you're swamped!
Comment on attachment 8958235 [details]
Bug 887889 - Fix temporary leak in Sqlite.jsm transactions

https://reviewboard.mozilla.org/r/227180/#review232988

This got bitrotted by bug 1442353.
Comment on attachment 8958235 [details]
Bug 887889 - Fix temporary leak in Sqlite.jsm transactions

https://reviewboard.mozilla.org/r/227180/#review234112

::: toolkit/modules/Sqlite.jsm:631
(Diff revision 1)
>  
>            return result;
>          } finally {
>            this._hasInProgressTransaction = false;
> +          // Ensure we don't hold onto anything for too long by keeping the timeout alive
> +          clearTimeout(timeoutHandle);

I'm not sure this is correct after bug 1442353, because now a timer can cover multiple transactions, and thus it should probably not be cleared when one transaction is done.

We could maybe do this clearTimeout when the connection is closing?
Attachment #8958235 - Flags: review?(mak77)
Comment on attachment 8958233 [details]
Bug 887889 - Migrate ContentPrefService2 to Sqlite.jsm

https://reviewboard.mozilla.org/r/227160/#review234130

it's not trivial to review this with these many changes, and because the original code is not particularly clean. It's likely this may take a few rounds.

::: toolkit/components/contentprefs/ContentPrefService2.js:12
(Diff revision 1)
>  ChromeUtils.import("resource://gre/modules/ContentPrefUtils.jsm");
>  ChromeUtils.import("resource://gre/modules/ContentPrefStore.jsm");
> +ChromeUtils.defineModuleGetter(this, "OS",
> +                               "resource://gre/modules/osfile.jsm");
> +ChromeUtils.defineModuleGetter(this, "FileUtils",
> +                               "resource://gre/modules/FileUtils.jsm");

This doesn't appear to be used?

::: toolkit/components/contentprefs/ContentPrefService2.js:109
(Diff revision 1)
> +    if (this._connPromise) {
> +      return this._connPromise;
> +    }
> +
> +    // Don't use sessionstore-windows-restored from xpcshell, since it
> +    // won't fire.

I'm not sure I get the whole picture around sessionstore-windows-restore, shouldn't the connection (and Sqlite.jsm) be initialized lazily when the first API in need of it is invoked?
Since all the APIs are async, it should be possible to wait for the connection.

Imo, if this causes us to initialize Sqlite.jsm earlier, then we should either:
1. figure out which consumer is doing that and whether can can avoid that on its side
2. store somewhere that we can't use the service yet, and make the connection getter throw

Clearly 1 is the preferred path, since it doesn't make much sense to have a service that sometimes works, sometimes doesn't.

::: toolkit/components/contentprefs/ContentPrefService2.js:169
(Diff revision 1)
>              cbHandleResult(callback, new ContentPref(pbGroup, pbName, pbVal));
>            }
>          }
>          cbHandleCompletion(callback, reason);
>        },
> -      onError: function onError(nsresult) {
> +      onError: (nsresult) => {

If you rerwite these, please be consistent with the style of a single argument in this file, either it takes parenthesis or it doesn't.

In any case, shorthands (like "onError(nsresult) {") would even be shorter

::: toolkit/components/contentprefs/ContentPrefService2.js:799
(Diff revision 1)
>      let gotRow = false;
> -    this._dbConnection.executeAsync(stmts, stmts.length, {
> -      handleResult: function handleResult(results) {
> +    let { onRow, onError } = callbacks;
> +    await conn.executeTransaction(async () => {
> +      for (let {sql, params} of stmts) {
>          try {
> -          let row = null;
> +          await conn.execute(sql, params, row => {

could likely use executeCached if the sql doesn't contain a LIKE. Maybe the _stmt helper could add a .cachable property to each object and we could check that here?

Fwiw, most of this indirection is due to the old API being worst than Sqlite.jsm, if we should rewrite it today likely it would be quite different...

::: toolkit/components/contentprefs/ContentPrefService2.js:979
(Diff revision 1)
> +        done = true;
> +      });
> +      Services.tm.spinEventLoopUntil(() => done);
> +      if (error) {
> +        throw error;
> +      }

to avoid spinning, you could also send another notification to the observer service, and have the test wait for it after sending "test:db".

::: toolkit/components/contentprefs/ContentPrefService2.js:1078
(Diff revision 1)
> -          this._dbMigrate(dbConnection, version, this._dbVersion);
> -        } catch (ex) {
> -          Cu.reportError("error migrating DB: " + ex + "; backing up and recreating");
> -          dbConnection = this._dbBackUpAndRecreate(dbFile, dbConnection);
> -        }
> +      conn = await Sqlite.openConnection({ path });
> +      Sqlite.shutdown.addBlocker(
> +        "Closing ContentPrefService2 connection.",
> +        () => conn.close());
> +    } catch (e) {
> +      Cu.reportError(e);

you should be able to check e.status == Cr.NS_ERROR_FILE_CORRUPTED; to check for corruption and replace the db. in the other cases we should probably just bailout and not try to replace the db.
See https://hg.mozilla.org/mozilla-central/rev/8f3552d61627
Attachment #8958233 - Flags: review?(mak77)
Comment on attachment 8958234 [details]
Bug 887889 - Asyncify cps2 tests

https://reviewboard.mozilla.org/r/227178/#review234144

I love all of this, modern code!

::: toolkit/components/contentprefs/tests/unit_cps2/head.js:18
(Diff revision 1)
> -function runAsyncTests(tests, dontResetBefore = false) {
> -  do_test_pending();
> -
>    cps = Cc["@mozilla.org/content-pref/service;1"].
>          getService(Ci.nsIContentPrefService2);
> +})();

not sure what's the point of creating an init() function and calling it immediately, rather than just invoking do_get_profile() and setting cps...

::: toolkit/components/contentprefs/tests/unit_cps2/test_observers.js:13
(Diff revision 1)
> -  var allTests = [];
> -  for (var i = 0; i < tests.length; i++) {
> -    // Generate two wrappers of each test function that invoke the original test with an
> -    // appropriate privacy context.
> -    /* eslint-disable no-eval */
> -    var pub = eval("var f = function* " + tests[i].name + "() { yield tests[" + i + "](privateLoadContext); }; f");
> +  await reset();
> +});
> +
> +function raceWithTimeout(promise) {
> +  return Promise.race([promise, new Promise(resolve => {
> +    do_timeout(50, () => resolve(null));

This timeout is very short, I/O is unpredictable, especially in automation... I'm also not sure why we need this timeout when before it didn't exist.
Did you add tests that some observers are not wrongly invoked, or is this a behavioral change?
Usually when we use timeouts with I/O we don't use anything smaller than 1 second or 2.
Attachment #8958234 - Flags: review?(mak77)
Comment on attachment 8958233 [details]
Bug 887889 - Migrate ContentPrefService2 to Sqlite.jsm

https://reviewboard.mozilla.org/r/227160/#review234130

> If you rerwite these, please be consistent with the style of a single argument in this file, either it takes parenthesis or it doesn't.
> 
> In any case, shorthands (like "onError(nsresult) {") would even be shorter

Woops on the parenthesis, but regarding the lambdas, I settled on that convention not for brevity but because a number of these callbacks reference `this`, which I stopped binding at the call site since that seemed like a slightly unexpected practice. Not all of them do but I just wanted to be consistent.
ok, my fault cause I didn't notice those bind()
Comment on attachment 8958233 [details]
Bug 887889 - Migrate ContentPrefService2 to Sqlite.jsm

https://reviewboard.mozilla.org/r/227160/#review234130

> I'm not sure I get the whole picture around sessionstore-windows-restore, shouldn't the connection (and Sqlite.jsm) be initialized lazily when the first API in need of it is invoked?
> Since all the APIs are async, it should be possible to wait for the connection.
> 
> Imo, if this causes us to initialize Sqlite.jsm earlier, then we should either:
> 1. figure out which consumer is doing that and whether can can avoid that on its side
> 2. store somewhere that we can't use the service yet, and make the connection getter throw
> 
> Clearly 1 is the preferred path, since it doesn't make much sense to have a service that sometimes works, sometimes doesn't.

It's browser-fullZoom.js, starting from here: https://searchfox.org/mozilla-central/rev/3abf6fa7e2a6d9a7bfb88796141b0f012e68c2db/browser/base/content/browser-fullZoom.js#214

The comment above that line doesn't seem to make all that much since anymore, since it's not avoiding a cps roundtrip. I couldn't sort out a better way to delay that than just delaying retrieving the connection inside CPS2. It seems like we do need to eventually read and apply the global value, and it seems like the logic to delay it would amount to roughly the same thing just moved inside browser-fullZoom.js, no?
(In reply to Doug Thayer [:dthayer] from comment #16)
> It seems like we do need to eventually read and apply the global
> value, and it seems like the logic to delay it would amount to roughly the
> same thing just moved inside browser-fullZoom.js, no?

Probably yes, in any case it seems better to delay the consumer code rather than adding a hack inside the service to artificially delay handling the first call to it.
The reason is that it's the consumer that must have a clear definition of when it is being initialized and when it needs to do work. ContentPrefs is just a service used to store and return data.

Drew originally worked on the first CPS rewrite and may be of help for the call, and florian is working on startup perf and eventually may be of help to find a better spot for full zoom init.
Looking deeper at it, I'm not sure why we need to get a zoom level at all for about:blank. Looking at the history but it's looking to me like this might be a bug.
(In reply to Doug Thayer [:dthayer] from comment #18)
> Looking deeper at it, I'm not sure why we need to get a zoom level at all
> for about:blank. Looking at the history but it's looking to me like this
> might be a bug.

Yes. Content prefs should be on a principal basis, I think? We should just avoid storing or fetching preferences for null principals (as they'll never re-occur anyway). about:blank should have a null principal or, when opened by a website, the codebase principal for that website.
Comment on attachment 8958234 [details]
Bug 887889 - Asyncify cps2 tests

https://reviewboard.mozilla.org/r/227178/#review234144

> This timeout is very short, I/O is unpredictable, especially in automation... I'm also not sure why we need this timeout when before it didn't exist.
> Did you add tests that some observers are not wrongly invoked, or is this a behavioral change?
> Usually when we use timeouts with I/O we don't use anything smaller than 1 second or 2.

The old code relied on the fact that continuations are called synchronously when calling next(). Async/await doesn't really have a way to make up for this. The consuming code looked like this:

    // yield - continuation will be called in the onDone callback, but before observers are notified
    yield cps.removeByDomain("bogus", context, makeCallback());
    // We've got our continuation - execution inside cps2 is waiting on us to call yield again before notifying observers
    let args = yield on("Removed", ["foo", null, "bar"]);
    // The on(...) function will listen to observer notifications, and do roughly: executeSoon(() => next()).
    // Since we can rely on observers being called synchronously after we yield, we know that args will
    // now be in the final state, which in our case should be empty (we should have received no notifications)
    observerArgsOK(args.foo, []);
    observerArgsOK(args.null, []);
    observerArgsOK(args.bar, []);

The key here is that we use the raceWithTimeout function only when we want to ensure that we _lose_ the race. To get the same behaviour without the synchronous interleaving of code enabled by yield, I just opted to have us call on(...) before performing an action, and then make sure that our promise of a notification times out. Since we don't want to wait for a whole second in every test for it to time out, I set a small timeout. If bugs were introduced - the should probably manifest by just racing with executeSoon, but I figured the 50ms was just a cheap way to get a little bit of extra confidence.

Does that make sense? I can see how it's a bit confusing, and I'm open to more readable ways of coding the expectation that we _don't_ want to get a notification, if you have any.
Attachment #8958235 - Attachment is obsolete: true
(Also - Marco, if you're really swamped right now and don't mind passing it off, Gijs offered to review these patches.)

I forgot to mention - some change in between now and when I posted the initial patch seems to have made both browser_startup.js failure (regarding initializing Sqlite.jsm too early), and the shutdown leak with executeTransaction go away. So I've removed the Sqlite.jsm patch. I'm going to mull over why that went away, because it doesn't seem like it should have, but I just wanted to clarify why the changes for both of those issues were no longer in the patches.
I can review this tomorrow, no worries.
Comment on attachment 8958233 [details]
Bug 887889 - Migrate ContentPrefService2 to Sqlite.jsm

https://reviewboard.mozilla.org/r/227160/#review236038

nit: now that we don't have legacy add-ons, it may be worth filing a followup to cleanup the code a bit:
1. modernize the methods definitions to avoid CPS2_ labels
2. Rename to just ContentPrefService
3. Rename unit_cps2 to just unit
4. rename various instances around from cps2 to just cps

::: commit-message-a6a32:6
(Diff revision 2)
> +Bug 887889 - Migrate ContentPrefService2 to Sqlite.jsm r?mak
> +
> +Some notes:
> +
> +- We delay initializing the connection until we see a
> +  sessionstore-windows-restored notification. This is to get the

this is no more the case right? Then the commit message should be updated.

::: commit-message-a6a32:16
(Diff revision 2)
> +- I kept the xpcom-shutdown observer around even though it's not
> +  doing much and it could be satisfied by doing a little more work
> +  in the Sqlite.shutdown blocker. I wasn't sure which to use since
> +  it seems like the Sqlite.shutdown blocker is intended to be
> +  used to cleanup connection-related things. Thoughts on this are
> +  welcome.

yes sqlite.shutdown is mostly about "Sqlite.jsm is going away, please cleanup your connection-related stuff".

Anyway, xpcom-shutdown is surely wrong for any service storing things in the profile, those services should instead use the async shutdown clients like profileBeforeChange, profileChangeTeardown or quitApplicationGranted.
Since in this case the shutdown is really trivial (just cleaning up references and observers) you can just change xpcom-shutdown for profile-before-change here, that is a trivial workaround to the problem.

::: toolkit/components/contentprefs/ContentPrefService2.js:749
(Diff revision 2)
>     * statement is cached, one is created and cached.
>     *
>     * @param sql  The SQL query string.
>     * @return     The cached, possibly new, statement.
>     */
> -  _stmt: function CPS2__stmt(sql) {
> +  _stmt: function CPS2__stmt(sql, cachable) {

nit: since the most common case seems to be cachable: true, you could provide it as default value and simplify most of the callers.
One day Sqlite.jsm will just cache all the statements, and automatically forget the ones that have not been used in a while...

::: toolkit/components/contentprefs/ContentPrefService2.js:955
(Diff revision 2)
>    /**
>     * Removes all state from the service.  Used by tests.
>     *
>     * @param callback  A function that will be called when done.
>     */
> -  _reset: function CPS2__reset(callback) {
> +  _reset: async function CPS2__reset(callback) {

nit: async _reset(callback) {

::: toolkit/components/contentprefs/ContentPrefService2.js:1023
(Diff revision 2)
> -  // specific migration methods) must be careful not to call any method
> -  // of the service that assumes the database connection has already been
> -  // initialized, since it won't be initialized until at the end of _dbInit.
> -
> -  _dbInit: function ContentPrefService__dbInit() {
> -    var dbFile = Services.dirsvc.get("ProfD", Ci.nsIFile);
> +    if (this._debugLog) {
> +      Services.console.logStringMessage("ContentPrefService2: " + aMessage);
> +    }
> +  },
> +
> +  _getConnection: async function CPS2__getConnection(aAttemptNum = 0) {

nit: async _getConnection(attemptNum = 0) {

::: toolkit/components/contentprefs/ContentPrefService2.js:1082
(Diff revision 2)
> -      dbConnection.executeSimpleSQL("PRAGMA synchronous = OFF");
> +      await conn.execute("PRAGMA synchronous = OFF");
>  
> -    this._dbConnection = dbConnection;
> +    return conn;
>    },
>  
> -  _dbCreate: function ContentPrefService__dbCreate(aDBFile) {
> +  _failover: async function CPS2__failover(aConn, aPath) {

nit: ditto, if you pretty-much rewrite a function please modernize its definition/code style, in the end you are already changing most of the blame in the function, so no reason to retain the old name/label/xpcom conventions.
Attachment #8958233 - Flags: review?(mak77) → review+
Comment on attachment 8958234 [details]
Bug 887889 - Asyncify cps2 tests

https://reviewboard.mozilla.org/r/227178/#review236056

::: toolkit/components/contentprefs/tests/unit_cps2/test_observers.js:13
(Diff revision 2)
> -  var allTests = [];
> -  for (var i = 0; i < tests.length; i++) {
> -    // Generate two wrappers of each test function that invoke the original test with an
> -    // appropriate privacy context.
> -    /* eslint-disable no-eval */
> -    var pub = eval("var f = function* " + tests[i].name + "() { yield tests[" + i + "](privateLoadContext); }; f");
> +  await reset();
> +});
> +
> +function raceWithTimeout(promise) {
> +  return Promise.race([promise, new Promise(resolve => {
> +    do_timeout(50, () => resolve(null));

ok, I see your point, but still, 50ms means this test could have frequent false PASS, because there's no way I/O on our tinderboxes takes a so small time.

In lack of a better solution I propose to set this timing to a meaningful value, that in any case keeps the total run time of this test below 5 seconds.
I'd not set this below 300 or 400ms to avoid false PASSes.
Attachment #8958234 - Flags: review?(mak77) → review+
So, the state of Sqlite.jsm in browser_startup.js is a bit unclear. Could you clarify it please?
I think in the end it is not a regression, in the sense we were already starting up mozStorage and the connection before, but browser_startup can't detect that. On the other side it can detect Sqlite.jsm.
Considered that, we could ask Florian for a temporary pass-through, and have a bug filed to investigate how to make browser-fullzoom not do I/O at all on startup. Though, that should be fixed shortly after.
(In reply to Doug Thayer [:dthayer] from comment #20)
> If bugs were
> introduced - the should probably manifest by just racing with executeSoon,
> but I figured the 50ms was just a cheap way to get a little bit of extra
> confidence.
> 
> Does that make sense? I can see how it's a bit confusing, and I'm open to
> more readable ways of coding the expectation that we _don't_ want to get a
> notification, if you have any.

If executeSoon isn't giving enough confidence, you can use idle callbacks instead (Services.tm.idleDispatchToMainThread). That should be faster than 50ms on a fast machine, and more reliable than a 50ms timeout on a machine under heavy load.
(In reply to Marco Bonardo [::mak] from comment #28)
> I think in the end it is not a regression, in the sense we were already
> starting up mozStorage and the connection before, but browser_startup can't
> detect that. On the other side it can detect Sqlite.jsm.

I stand corrected (thanks Florian) browser_Startup detects mozStorage, but it's not blacklisted and the test doesn't complain, on the other side Sqlite.jsm is blacklisted.
(In reply to Marco Bonardo [::mak] from comment #27)
> In lack of a better solution I propose to set this timing to a meaningful
> value, that in any case keeps the total run time of this test below 5
> seconds.
> I'd not set this below 300 or 400ms to avoid false PASSes.

Hmm, but we're not waiting on IO, and I'm not anticipating ever having to wait on IO (we've already awaited for the onDone notification when using this). Really, racing with Promise.resolve() should be enough, but I just wanted to add a buffer that could survive us calling notifications in an executeSoon for any reason. I do like Florian's suggestion of using Services.tm.idleDispatchToMainThread.

If you still want to add it, to anticipate potential cases where we _will_ be waiting on IO in between sending onDone and sending notifications, I'm okay with that - I just wanted to make sure we're on the same page about it.
(In reply to Marco Bonardo [::mak] from comment #30)
> I stand corrected (thanks Florian) browser_Startup detects mozStorage, but
> it's not blacklisted and the test doesn't complain, on the other side
> Sqlite.jsm is blacklisted.

So I'm a bit unclear - would you still prefer to execute on your suggestion in Comment #28?
(In reply to Doug Thayer [:dthayer] from comment #31)
> Hmm, but we're not waiting on IO, and I'm not anticipating ever having to
> wait on IO 

Ok, that wasn't clear, I'm fine with Florian suggestions.

(In reply to Doug Thayer [:dthayer] from comment #32)
> So I'm a bit unclear - would you still prefer to execute on your suggestion
> in Comment #28?

I still don't know if, with the null principal trick, we still call into contentPrefs in browser_startup.js. If so did you figure out why?
I asked Florian on IRC, he said that for a main-thread I/O win like this, we could even take a Sqlite.jsm hit in browser_startup (move it a bit earlier), provided we start investigating in a follow-up bug why we can't delay this zoom fetching.
(In reply to Marco Bonardo [::mak] from comment #33)
> I still don't know if, with the null principal trick, we still call into
> contentPrefs in browser_startup.js. If so did you figure out why?
> I asked Florian on IRC, he said that for a main-thread I/O win like this, we
> could even take a Sqlite.jsm hit in browser_startup (move it a bit earlier),
> provided we start investigating in a follow-up bug why we can't delay this
> zoom fetching.

The null principal trick makes browser_startup.js pass just fine (at least locally - but I don't have any reason to believe it would be different in automation.)
Ok, that's fine.
Comment on attachment 8961629 [details]
Bug 887889 - Don't load browser zoom pref for blank page

https://reviewboard.mozilla.org/r/230496/#review236230

::: commit-message-ad8d2:4
(Diff revision 1)
> +Bug 887889 - Don't load browser zoom pref for blank page r?mak
> +
> +This is (still - ignore the comment in the bug) breaking the
> +browser_startup.js test, because it causes us to load Sqlite.jsm

this comment needs a small update
Attachment #8961629 - Flags: review?(mak77) → review+
Turns out it does fix the browser_startup.js failure, but it has some incidental failures in other tests. I think it's probably best to do the work for this in a follow-up, so I'm just going to put up a patch relaxing browser_startup.js for now and file a follow-up.
Blocks: 1448944
Attachment #8961629 - Attachment is obsolete: true
Comment on attachment 8962417 [details]
Bug 887889 - Relax Sqlite.jsm restrictions in browser_startup.js

https://reviewboard.mozilla.org/r/231270/#review236694
Attachment #8962417 - Flags: review?(florian) → review+
Running into shutdown leaks on try that are killing me. I think I've narrowed it down to two things: the timeout promise and the .jsm boundary. If I implement the executeTransaction method inside ContentPrefService2.js _and_ change the timeout mechanism, the failures go away. Unfortunately it's slow going because I can only reproduce the failures on try (though it is 100% reproducible).
jsm boundaries may matter, if you store an object or an array created in a window in your jsm. Usually to solve those we clone before storing them, for example https://searchfox.org/mozilla-central/source/toolkit/components/places/PlacesTransactions.jsm#741

It's strange that these leaks only appear for CPS2, when a lot of other components use Sqlite.jsm without showing any leak.
Hey dthayer,

I struggled with shutdown leaks in bug 888784 too (see bug 888784 comment 68).

What ended up working for me was ensuring transactions ran in a separate function scope instead of using an arrow function which would enclose all of the variables within the scope of the caller of executeTransaction.

See bug 888784 comment 83.

I've not looked at your patch, so I don't know if my experience in that bug is even applicable, but wanted to bring it up in case it's useful as a road for experimentation.
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #45)
> Hey dthayer,
> 
> I struggled with shutdown leaks in bug 888784 too (see bug 888784 comment
> 68).
> 
> What ended up working for me was ensuring transactions ran in a separate
> function scope instead of using an arrow function which would enclose all of
> the variables within the scope of the caller of executeTransaction.
> 
> See bug 888784 comment 83.
> 
> I've not looked at your patch, so I don't know if my experience in that bug
> is even applicable, but wanted to bring it up in case it's useful as a road
> for experimentation.

Yeah - unfortunately I made the same transformation to mine that you made to yours and it didn't resolve it. The only thing that's resolved it is pulling the implementation of executeTransaction into ContentPrefService2.js :/
Comment on attachment 8964377 [details]
Bug 887889 - Clean up ContentPrefServiceChild on xpcom-shutdown

https://reviewboard.mozilla.org/r/233086/#review239318

::: toolkit/components/contentprefs/ContentPrefServiceChild.jsm:65
(Diff revision 1)
>  
>    init() {
>      Services.cpmm.addMessageListener("ContentPrefs:HandleResult", this);
>      Services.cpmm.addMessageListener("ContentPrefs:HandleError", this);
>      Services.cpmm.addMessageListener("ContentPrefs:HandleCompletion", this);
> +    Services.obs.addObserver(this, "xpcom-shutdown");

can we do this earlier, like on profile-before-change, or is necessary to act so late?
Attachment #8964377 - Flags: review?(mak77) → review+
Comment on attachment 8964376 [details]
Bug 887889 - Limit scope of executeTransaction closure

https://reviewboard.mozilla.org/r/233084/#review239328

I wonder if the problem here was using the onRow handler inside a transaction. Usually only multiple writes should go inside transactions, that should be as short lived as possible. This _execStmts thing is just a bit overengineered.
Maybe one day we'll fix this service API to use promises instead of the triple-callback API, and then it would be trivial to just remove _execStmts and use raw Sqlite.jsm executes all around. But it may be more complex to use from cpp :(

Moreover, wrapping multiple SELECTs in a transaction is not a great thing to do, as well as wrapping a single write statement. Currently _get and getByName are doing the former... could we add an option to skip the transaction for those, or alternatively, check if there is only 1 stmt in the array or if all the stmts in the array begin with SELECT, and just skip the transaction in that case.

::: toolkit/components/contentprefs/ContentPrefService2.js:798
(Diff revision 1)
> -                onRow(row);
> +          callbacks.onError(e);
> -              } catch (e) {
> +        } catch (e) {
> -                Cu.reportError(e);
> +          Cu.reportError(e);
> -              }
> +        }
> +      } else {
> +        throw e;

The old method didn't seem to throw at all... why is this one rethrowing if there's no onError callback? Is this a case we may hit in reality? Should it rather just console.assert?

::: toolkit/components/contentprefs/ContentPrefService2.js:802
(Diff revision 1)
> +      } else {
> +        throw e;
> -            }
> +      }
> -          });
> -        } catch (e) {
> +    }
> +
> +    if (rows) {

if (rows && callbacks.onRow) {
Attachment #8964376 - Flags: review?(mak77) → review+
Attachment #8969710 - Flags: review?(mak77) → review?(masayuki)
Comment on attachment 8969711 [details]
Bug 887889 - Fix intermittent failure in browser_bug419612.js

https://reviewboard.mozilla.org/r/238504/#review244632

::: browser/base/content/browser-fullZoom.js:264
(Diff revision 1)
>    // Setting & Pref Manipulation
>  
>    /**
>     * Reduces the zoom level of the page in the current browser.
>     */
> -  reduce: function FullZoom_reduce() {
> +  reduce: async function FullZoom_reduce() {

nit: async reduce() {

::: browser/base/content/browser-fullZoom.js:274
(Diff revision 1)
>    },
>  
>    /**
>     * Enlarges the zoom level of the page in the current browser.
>     */
> -  enlarge: function FullZoom_enlarge() {
> +  enlarge: async function FullZoom_enlarge() {

nit: ditto
Attachment #8969711 - Flags: review?(mak77) → review+
I honestly don't know who can review the editor/spellcheck code, I tentatively forwarded to Masayuki-san that IIRC owns editor
You could try ehsan if Masayuki-san doesn't feel comfortable reviewing it.
Attachment #8969710 - Flags: review?(masayuki) → review?(ehsan)
Comment on attachment 8969710 [details]
Bug 887889 - Fix leak in RemoteSpellCheckingEngineChild

https://reviewboard.mozilla.org/r/238502/#review245462

Thanks, your patch makes perfect sense and it does look that your change just got unlucky to run into this.  I have a suggestion for not managing the memory manually below, which should be easy to address, feel free to flag me again if you need another review pass but otherwise I don't need to look at it again.  :-)

Sorry about the delay!

::: extensions/spellcheck/hunspell/glue/RemoteSpellCheckEngineChild.h:31
(Diff revision 1)
>    RefPtr<GenericPromise> SetCurrentDictionaryFromList(
>                             const nsTArray<nsString>& aList);
>  
>  private:
>    mozSpellChecker *mOwner;
> +  nsTArray<MozPromiseHolder<GenericPromise>*> mResponsePromises;

I'm not a huge fan of storing raw pointers in this array and managing the memory behind them manually.  How about instead making this an nsTArray<UniquePtr<MozPromiseHolder<GenericPromise>>>?

::: extensions/spellcheck/hunspell/glue/RemoteSpellCheckEngineChild.cpp:25
(Diff revision 1)
> +
> +  // ensure we don't leak any promise holders for which we haven't yet
> +  // received responses
> +  for (MozPromiseHolder<GenericPromise>* promiseHolder : mResponsePromises) {
> +    promiseHolder->RejectIfExists(NS_ERROR_ABORT, __func__);
> +    delete promiseHolder;

then this delete would go away

::: extensions/spellcheck/hunspell/glue/RemoteSpellCheckEngineChild.cpp:34
(Diff revision 1)
>  RefPtr<GenericPromise>
>  RemoteSpellcheckEngineChild::SetCurrentDictionaryFromList(
>    const nsTArray<nsString>& aList)
>  {
>    MozPromiseHolder<GenericPromise>* promiseHolder =
>      new MozPromiseHolder<GenericPromise>();

and you would make a UniquePtr here

::: extensions/spellcheck/hunspell/glue/RemoteSpellCheckEngineChild.cpp:38
(Diff revision 1)
>    MozPromiseHolder<GenericPromise>* promiseHolder =
>      new MozPromiseHolder<GenericPromise>();
>    if (!SendSetDictionaryFromList(
>           aList,
>           reinterpret_cast<intptr_t>(promiseHolder))) {
>      delete promiseHolder;

and this would go away

::: extensions/spellcheck/hunspell/glue/RemoteSpellCheckEngineChild.cpp:41
(Diff revision 1)
>           aList,
>           reinterpret_cast<intptr_t>(promiseHolder))) {
>      delete promiseHolder;
>      return GenericPromise::CreateAndReject(NS_ERROR_FAILURE, __func__);
>    }
> +  mResponsePromises.AppendElement(promiseHolder);

and you'd move here

::: extensions/spellcheck/hunspell/glue/RemoteSpellCheckEngineChild.cpp:60
(Diff revision 1)
>      promiseHolder->RejectIfExists(NS_ERROR_NOT_AVAILABLE, __func__);
>    } else {
>      promiseHolder->ResolveIfExists(true, __func__);
>    }
> +  mResponsePromises.RemoveElement(promiseHolder);
>    delete promiseHolder;

and then you'd probably need to rewrite this bit as a loop over the array with a call to RemoveElementAt().

I think this would help make the code less error prone.
Attachment #8969710 - Flags: review?(ehsan) → review+
(In reply to Marco Bonardo [::mak] (Away 23 Apr - 1 May) from comment #52)
> can we do this earlier, like on profile-before-change, or is necessary to
> act so late?

Unfortunately, and I still am not sure I understand why, if I move this to profile-before-change it still leaks :/
Pushed by dothayer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/29598c587f7f
Migrate ContentPrefService2 to Sqlite.jsm r=mak
https://hg.mozilla.org/integration/autoland/rev/744a7b9fe0ab
Asyncify cps2 tests r=mak
https://hg.mozilla.org/integration/autoland/rev/e7f2bc2cb13c
Relax Sqlite.jsm restrictions in browser_startup.js r=florian
https://hg.mozilla.org/integration/autoland/rev/7e4a87170569
Limit scope of executeTransaction closure r=mak
https://hg.mozilla.org/integration/autoland/rev/6480e870dd73
Clean up ContentPrefServiceChild on xpcom-shutdown r=mak
https://hg.mozilla.org/integration/autoland/rev/fad1b59c9f55
Fix leak in RemoteSpellCheckingEngineChild r=Ehsan
https://hg.mozilla.org/integration/autoland/rev/20e6e6da280d
Fix intermittent failure in browser_bug419612.js r=mak
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: