Last Comment Bug 662989 - Avoid crashing if a DB connection isn't asyncClose()d when it should be
: Avoid crashing if a DB connection isn't asyncClose()d when it should be
Status: VERIFIED FIXED
[inbound]
: crash
Product: Toolkit
Classification: Components
Component: Storage (show other bugs)
: unspecified
: x86 All
: -- critical (vote)
: mozilla8
Assigned To: Nicholas Nethercote [:njn]
:
: Marco Bonardo [::mak]
Mentors:
: 670953 (view as bug list)
Depends on:
Blocks: 670953
  Show dependency treegraph
 
Reported: 2011-06-08 17:55 PDT by Nicholas Nethercote [:njn]
Modified: 2011-08-23 07:21 PDT (History)
15 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed


Attachments
patch (against m-i 70732:96fac50b57ed) (2.00 KB, patch)
2011-06-08 21:34 PDT, Nicholas Nethercote [:njn]
no flags Details | Diff | Splinter Review
better patch (against 70732:96fac50b57ed) (7.12 KB, patch)
2011-06-09 00:05 PDT, Nicholas Nethercote [:njn]
mak77: review+
Details | Diff | Splinter Review
better patch, v2 (7.13 KB, patch)
2011-06-19 19:55 PDT, Nicholas Nethercote [:njn]
sdwilsh: review-
Details | Diff | Splinter Review
patch, v3, with test (10.48 KB, patch)
2011-07-14 21:55 PDT, Nicholas Nethercote [:njn]
no flags Details | Diff | Splinter Review
patch, v4 (10.97 KB, patch)
2011-07-24 17:52 PDT, Nicholas Nethercote [:njn]
mak77: review+
christian: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Nicholas Nethercote [:njn] 2011-06-08 17:55:47 PDT
From bug 654573 comment 59:

>> AFAICT the real underlying problem is that some JS code that uses an async
>> DB connection fails to call asyncClose() on it before it becomes garbage. 
>
> Yes.  Which is not to say that it's reasonable for storage to be causing a
> crash or endangering the heap when this sequence of events occurs; storage
> shouldn't do that, it's a bug.

The problem is that if a connection on which async transactions were executed is not explicitly closed with asyncClose(), its destructor will run Close() instead, which will bail out early (because of the async-ness), so Connection::internalClose() isn't run, and the memory reporters for the connection are not unregistered.  The end result is that memory reporters are left pointing to a Connection that has been freed, which is a sure recipe for a subsequent hard-to-debug crash (eg. bug 654573, bug 660213).

To make things more robust, we just need to ensure that the memory reporters are always freed when ~Close() is run.
Comment 1 Nicholas Nethercote [:njn] 2011-06-08 21:34:00 PDT
Created attachment 538168 [details] [diff] [review]
patch (against m-i 70732:96fac50b57ed)

This patch unregisters the memory reporters for a Connection in ~Connection() if they haven't already been unregistered.

I re-ran the stress test from bug 654573 (having removed the corresponding fix to aboutPermissions.js from that bug):

- Open about:memory?verbose and about:permissions
- Hold down F5 to trigger constant reloads
- Click back and forth between the two tabs, once per second

Without this patch, this causes a crash every time within a few seconds.  With this patch, I couldn't reproduce the crash.  However, I did see the SQLite memory usage go up a lot.  In a 30 second run, I got this warning 107 times:

  WARNING: NS_ENSURE_TRUE(!(mAsyncExecutionThread)) failed: file /home/njn/moz/mi1/storage/src/mozStorageConnection.cpp, line 797

which means that 107 connections to places.sqlite were not asyncClosed as they should be.

Before doing my stress-test, the relevant parts of about:memory looked like this:

├───4,192,088 B (09.10%) -- storage
│   └──4,192,088 B (09.10%) -- sqlite
│      ├──1,110,936 B (02.41%) -- places.sqlite
│      │  ├────960,616 B (02.09%) -- cache-used
│      │  ├─────95,440 B (00.21%) -- schema-used
│      │  └─────54,880 B (00.12%) -- stmt-used
│      ├────865,448 B (01.88%) -- other

After 30 seconds, they looked like this:

├───24,052,136 B (22.35%) -- storage
│   └──24,052,136 B (22.35%) -- sqlite
│      ├──20,537,184 B (19.09%) -- other
│      ├───1,221,296 B (01.13%) -- places.sqlite
│      │   ├──1,059,664 B (00.98%) -- cache-used
│      │   ├─────95,440 B (00.09%) -- schema-used
│      │   └─────66,192 B (00.06%) -- stmt-used

(The reason the extra memory is all in the storage/sqlite/other bucket is because the memory reporters are unregistered for each destroyed Connection even though the SQLite connection is still open.)

107 connections led to a ~20,000,000 byte increase in memory usage, which is about 187,000 bytes per unclosed connection.
Comment 2 Nicholas Nethercote [:njn] 2011-06-09 00:05:44 PDT
Created attachment 538197 [details] [diff] [review]
better patch (against 70732:96fac50b57ed)

This patch takes an alternative approach -- it avoids the crashes, but also
keeps the memory reporters alive, but appends "-LEAKED" to their name so
it's obvious from looking at about:memory if any connections have leaked!

For example:

├──10,088,872 B (12.59%) -- storage
│  └──10,088,872 B (12.59%) -- sqlite
│     ├───4,552,864 B (05.68%) -- places.sqlite-LEAKED
│     │   ├──3,977,792 B (04.96%) -- cache-used
│     │   ├────575,072 B (00.72%) -- schema-used
│     │   └──────────0 B (00.00%) -- stmt-used

This is better -- the leak is findable in optimized builds, even from 
reports from normal users.  You don't have to spot the warning message in 
stderr (but that's still there).

This required a few changes:

- We detect in ~Connection() if the connection has been leaked (due to a
  missing asyncClose) and set StorageMemoryReporter::mHasLeaked.  "-LEAKED"
  is then appended onto the name of the connection as given by the memory
  reporter.

- StorageMemoryReporter no longer has a reference to the Connection, but
  rather a sqlite3*.  This means that the crashes are avoided when
  Connection is destroyed, because it's communicating directly with the
  sqlite connection, rather than going through the Connection.

  (Nb: up until now we've been passing a Connection& as the first argument
  to sqlite3_db_status().  I have no idea how that even compiled, let alone
  worked.  This patch passes an sqlite3* instead.)

- I also tweaked the descriptions on the memory reporters.

Note that I'm very new to Mozilla strings and am completely uncertain if my
usage of nsCString is appropriate;  thorough checking would be appreciated.
Comment 3 Shawn Wilsher :sdwilsh 2011-06-11 14:56:29 PDT
(In reply to comment #2)
>   (Nb: up until now we've been passing a Connection& as the first argument
>   to sqlite3_db_status().  I have no idea how that even compiled, let alone
>   worked.  This patch passes an sqlite3* instead.)
C++ type operator on the Connection class.
Comment 4 Marco Bonardo [::mak] 2011-06-17 08:38:38 PDT
Comment on attachment 538197 [details] [diff] [review]
better patch (against 70732:96fac50b57ed)

Review of attachment 538197 [details] [diff] [review]:
-----------------------------------------------------------------

It looks good to me.

Shawn, since this is still a transition and I'm not yet officially a peer, would you mind checking if I missed something?

::: storage/src/mozStorageConnection.cpp
@@ +357,5 @@
>    }
>  
>    NS_IMETHOD GetPath(char **memoryPath)
>    {
>      nsCString path;

Why not a nsCAutoString? I'd expect its buffer to be sufficient most of the time.

@@ +363,5 @@
>      path.AppendLiteral("explicit/storage/sqlite/");
> +    path.Append(mFileName);
> +
> +    if (mHasLeaked) {
> +      path.AppendLiteral("-LEAKED");

fwiw, -LEAKED sounds scary to me (And I guess to users), looks like everything there leaked, not just a connection. What about -HAS-LEAKS?

@@ +607,5 @@
>        break;
>    }
>  
> +  nsRefPtr<StorageMemoryReporter> reporter;
> +  nsCString filename = this->getFilename();

nsCAutoString here as well

@@ +766,5 @@
>  #endif
>  
>    for (PRUint32 i = 0; i < mMemoryReporters.Length(); i++) {
>      (void)::NS_UnregisterMemoryReporter(mMemoryReporters[i]);
> +    mMemoryReporters[i] = NULL;

nsnull

::: storage/src/mozStorageConnection.h
@@ +203,5 @@
>  
>    sqlite3 *mDBConn;
>    nsCOMPtr<nsIFile> mDatabaseFile;
>  
> +  nsTArray<nsCOMPtr<StorageMemoryReporter> > mMemoryReporters;

Shouldn't this rather be a nsRefPtr since not anymore an interface?
Comment 5 Shawn Wilsher :sdwilsh 2011-06-18 09:26:41 PDT
(In reply to comment #4)
> Shawn, since this is still a transition and I'm not yet officially a peer,
> would you mind checking if I missed something?
If you need me to do reviews, just flag me still.  I should start doing them again this weekend.
Comment 6 Nicholas Nethercote [:njn] 2011-06-19 19:54:20 PDT
(In reply to comment #4)
> >    NS_IMETHOD GetPath(char **memoryPath)
> >    {
> >      nsCString path;
> 
> Why not a nsCAutoString? I'd expect its buffer to be sufficient most of the
> time.

"Most of the time" makes me nervous.  And that code pre-dates my patch.


> @@ +363,5 @@
> >      path.AppendLiteral("explicit/storage/sqlite/");
> > +    path.Append(mFileName);
> > +
> > +    if (mHasLeaked) {
> > +      path.AppendLiteral("-LEAKED");
> 
> fwiw, -LEAKED sounds scary to me (And I guess to users), looks like
> everything there leaked, not just a connection. What about -HAS-LEAKS?

The entire connection has leaked, and this memory reporter is describing the connection, so I think it's a reasonable name.  It's supposed to sound scary :)


> > +  nsRefPtr<StorageMemoryReporter> reporter;
> > +  nsCString filename = this->getFilename();
> 
> nsCAutoString here as well

Again, I'd rather not make assumptions about the length of the filenames.


> > +    mMemoryReporters[i] = NULL;
> 
> nsnull

I changed it.


> > +  nsTArray<nsCOMPtr<StorageMemoryReporter> > mMemoryReporters;
> 
> Shouldn't this rather be a nsRefPtr since not anymore an interface?

I'm not sure.  StorageMemoryReporter inherits from nsIMemoryReporter...
Comment 7 Nicholas Nethercote [:njn] 2011-06-19 19:55:21 PDT
Created attachment 540380 [details] [diff] [review]
better patch, v2

For sdwilsh to review.
Comment 8 Josh Matthews [:jdm] (on vacation until Dec 5) 2011-06-19 20:29:01 PDT
(In reply to comment #6)
> (In reply to comment #4)
> > > +  nsTArray<nsCOMPtr<StorageMemoryReporter> > mMemoryReporters;
> > 
> > Shouldn't this rather be a nsRefPtr since not anymore an interface?
> 
> I'm not sure.  StorageMemoryReporter inherits from nsIMemoryReporter...

When the template type is a concrete class, you should always use nsRefPtr.
Comment 9 Shawn Wilsher :sdwilsh 2011-06-22 21:05:23 PDT
Comment on attachment 540380 [details] [diff] [review]
better patch, v2

Review of attachment 540380 [details] [diff] [review]:
-----------------------------------------------------------------

This code feels like it should be very easy to write a test for, so I'd like to see that before giving it r+ (and I should have a faster review turn-around time moving forward here).

::: storage/src/mozStorageConnection.cpp
@@ +363,5 @@
>      path.AppendLiteral("explicit/storage/sqlite/");
> +    path.Append(mFileName);
> +
> +    if (mHasLeaked) {
> +      path.AppendLiteral("-LEAKED");

I think I'd prefer to not do "-LEAKED" but rather " (LEAKED)".  Thoughts?

@@ +424,5 @@
>      return convertResultCode(rc);
>    }
> +
> +  // We call this when we know we've leaked a connection.
> +  void hasLeaked()

hasLeaked sounds like it is querying status.  How about setWillLeak?

@@ +819,2 @@
>      MutexAutoLock lockedScope(sharedAsyncExecutionMutex);
>      NS_ENSURE_FALSE(mAsyncExecutionThread, NS_ERROR_UNEXPECTED);

Because this error isn't very telling in it's current state, let's add a local variable to help make the debug output better:
bool asyncCloseWasCalled = !mAsyncExecutionThread;
NS_ENSURE_TRUE(asyncCloseWasCalled, NS_ERROR_UNEXPECTED);

::: storage/src/mozStorageConnection.h
@@ +203,5 @@
>  
>    sqlite3 *mDBConn;
>    nsCOMPtr<nsIFile> mDatabaseFile;
>  
> +  nsTArray<nsCOMPtr<StorageMemoryReporter> > mMemoryReporters;

s/nsCOMPtr/nsRefPtr/
Comment 10 Nicholas Nethercote [:njn] 2011-06-22 21:09:20 PDT
(In reply to comment #9)
> 
> I think I'd prefer to not do "-LEAKED" but rather " (LEAKED)".  Thoughts?

No other memory reporters have a space in them.  But I guess it doesn't matter, and we shouldn't even be seeing this in practice.  I'll update.

> hasLeaked sounds like it is querying status.  How about setWillLeak?

setWillLeak sounds funny to me.  I originally had markAsLeaked(), I'll go back to that.

Thanks for reviewing, I'll do the test and make the changes.
Comment 11 Shawn Wilsher :sdwilsh 2011-06-22 21:32:56 PDT
(In reply to comment #10)
> No other memory reporters have a space in them.  But I guess it doesn't matter,
> and we shouldn't even be seeing this in practice.  I'll update.
Yeah, but other memory reporters aren't indicating state like this either AFIAK, right?

> setWillLeak sounds funny to me.  I originally had markAsLeaked(), I'll go back
> to that.
markAsLeaked sounds better than my suggestion (:
Comment 12 Marco Bonardo [::mak] 2011-06-23 03:26:26 PDT
(In reply to comment #6)
> > @@ +363,5 @@
> > >      path.AppendLiteral("explicit/storage/sqlite/");
> > > +    path.Append(mFileName);
> > > +
> > > +    if (mHasLeaked) {
> > > +      path.AppendLiteral("-LEAKED");
> > 
> > fwiw, -LEAKED sounds scary to me (And I guess to users), looks like
> > everything there leaked, not just a connection. What about -HAS-LEAKS?
> 
> The entire connection has leaked, and this memory reporter is describing the
> connection, so I think it's a reasonable name.  It's supposed to sound scary
> :)

Yes, I meant that to me sounds possible there are multiple connections, and maybe some of them leaked and some not. Btw, I don't mind :)

Regarding the autostrings, they should avoid the heap allocation if the string is shorter and not be much more expensive when longer (they'll just allocate on the heap the rare database names longer than 64 chars). So I'm not sure what is preventing not using them, apart their advantages.
Comment 13 Nicholas Nethercote [:njn] 2011-07-14 21:55:54 PDT
Created attachment 546094 [details] [diff] [review]
patch, v3, with test

Well, that wasn't much fun, but it's done.

I ended up going with "foo-LEAKED" because I couldn't bear "foo (LEAKED)" since it was a different style to every other memory reporter.  Hope you don't mind.  Hopefully it'll never be seen by anyone :)
Comment 14 Marco Bonardo [::mak] 2011-07-20 15:00:00 PDT
Comment on attachment 546094 [details] [diff] [review]
patch, v3, with test

Review of attachment 546094 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/aboutmemory/tests/chrome/test_asyncClose_leak.xul
@@ +22,5 @@
> +  const Ci = Components.interfaces;
> +  const Cu = Components.utils;
> +
> +  var db = Cc["@mozilla.org/browser/nav-history-service;1"].
> +           getService(Ci.nsPIPlacesDatabase).DBConnection.clone(true);

having a test in another component depending on Places is a bit strange esp. since one may compile without it, I suppose it may be better to create a db for this test, it should just be matter of opening a connection to a new database.

@@ +28,5 @@
> +  // A statement, the exact form doesn't matter, it just has to be
> +  // asynchronous.
> +  var stmt = db.createAsyncStatement(
> +               "SELECT SUM(visit_count) AS count " +
> +               "FROM moz_places WHERE rev_host = :rev_host");

and as a query to simplify things you may select from sqlite_master table that always exists

@@ +42,5 @@
> +    }
> +  });
> +  stmt.finalize();
> +  //db.asyncClose();
> +  stmt = null;

this commented out line looks bogus as it is, I know why it is there, but a comment would clarify that for everyone.
Comment 15 Nicholas Nethercote [:njn] 2011-07-21 20:12:12 PDT
> I suppose it may be better to create a db
> for this test, it should just be matter of opening a connection to a new
> database.

How do I do that?
Comment 16 Marco Bonardo [::mak] 2011-07-22 06:41:00 PDT
just make a nsIFile and storage.openDatabase(file) it
Comment 17 Nicholas Nethercote [:njn] 2011-07-24 17:52:34 PDT
Created attachment 548063 [details] [diff] [review]
patch, v4

This patch address mak's comments on the previous patch.
Comment 18 Mozilla RelEng Bot 2011-07-24 22:00:23 PDT
Try run for e1c8cbe45980 is complete.
Detailed breakdown of the results available here:
    http://tbpl.mozilla.org/?tree=Try&rev=e1c8cbe45980
Results:
    success: 128
    warnings: 16
    failure: 2
Total buildrequests: 146
Comment 19 Nicholas Nethercote [:njn] 2011-07-24 22:17:11 PDT
Sigh, I got a try failure on the Linux64 opt build, which I cannot reproduce locally.  Presumably the GC is not running in time and so the DB isn't being collected, which means the leak doesn't occur in time.
Comment 20 Marco Bonardo [::mak] 2011-07-26 07:16:15 PDT
Comment on attachment 548063 [details] [diff] [review]
patch, v4

Review of attachment 548063 [details] [diff] [review]:
-----------------------------------------------------------------

it looks fine to me, you should clearly resolve the random failures in the test before pushing, but that's obvious.

::: storage/src/mozStorageConnection.cpp
@@ +469,5 @@
> +  // with asyncClose(), the connection is about to leak and it's too late to do
> +  // anything about it.  So we mark the memory reporters accordingly so that
> +  // the leak will be obvious in about:memory.
> +  for (PRUint32 i = 0; i < mMemoryReporters.Length(); i++) {
> +    if (mMemoryReporters[i] != nsnull) {

any specific reasons you can't do if (mMemoryReporters[i]) ? I think the general good practice is to avoid nsnull or boolean comparisons

@@ +820,3 @@
>      MutexAutoLock lockedScope(sharedAsyncExecutionMutex);
> +    bool asyncCloseWasCalled = !mAsyncExecutionThread;
> +    NS_ENSURE_TRUE(asyncCloseWasCalled, NS_ERROR_UNEXPECTED);

I think there is no need of a warning here, based on recent discussion about removing useless warnings (benjamin would like to remove NS_ENSURE_ usage in new patches), I'd rather
if (mAsyncExecutionThread)
  return NS_ERROR_UNEXPECTED;
and better explain in the comment above that this may fail if asyncClosed has not been called before, rather than adding another bool assignment.

::: toolkit/components/aboutmemory/tests/chrome/test_asyncClose_leak.xul
@@ +24,5 @@
> +
> +  // Make a fake DB file.
> +  var file = Cc["@mozilla.org/file/directory_service;1"].
> +             getService(Ci.nsIProperties).
> +             get("TmpD", Ci.nsIFile);

you may create a file in ProfD I guess that should be removed at the end of the test harness run, not sure if creating in tmp is fine on all platforms, esp because that file may stick there and never be removed across test runs.

@@ +26,5 @@
> +  var file = Cc["@mozilla.org/file/directory_service;1"].
> +             getService(Ci.nsIProperties).
> +             get("TmpD", Ci.nsIFile);
> +  file.append("asyncClose_leak-fake-DB-tmp.sqlite");
> +  file.createUnique(Ci.nsIFile.NORMAL_FILE_TYPE, 0666);

and then you wouldn't need this createUnique thing (lust let sqlite create the file and harness cleanup, bonus points if the test deletes the db before finishing, that is usually a good choice)

@@ +29,5 @@
> +  file.append("asyncClose_leak-fake-DB-tmp.sqlite");
> +  file.createUnique(Ci.nsIFile.NORMAL_FILE_TYPE, 0666);
> +
> +  var service = Cc["@mozilla.org/storage/service;1"].
> +                getService(Ci.mozIStorageService);

nit: s/service/storage/ since service is so generic I have no idea which service is

@@ +44,5 @@
> +    },
> +    handleCompletion: function(aReason) {
> +    }
> +  });
> +  stmt.finalize();

nit: try {} around execution and put finalize() in a finally {} (people copies from our tests, let's give them good practices)
Comment 21 Nicholas Nethercote [:njn] 2011-08-03 17:26:31 PDT
We're still getting occasional crashes, see bug 654573 comment 76.
Comment 22 Nicholas Nethercote [:njn] 2011-08-04 16:16:38 PDT
*** Bug 670953 has been marked as a duplicate of this bug. ***
Comment 23 Nicholas Nethercote [:njn] 2011-08-04 16:20:07 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/e820bee09d57
Comment 24 Marco Bonardo [::mak] 2011-08-05 09:03:43 PDT
http://hg.mozilla.org/mozilla-central/rev/e820bee09d57
Comment 25 Scoobidiver (away) 2011-08-09 03:03:50 PDT
With combined signatures, it is #3 top browser crasher in 7.0a2.
Comment 26 Nicholas Nethercote [:njn] 2011-08-09 03:22:18 PDT
(In reply to Scoobidiver from comment #25)
> With combined signatures, it is #3 top browser crasher in 7.0a2.

Ugh.  If we want to fix it in FF7, either we land this patch, which tolerates the non-asyncClosed connections -- i.e. avoids the crash but the leak still occurs.  Alternatively, we find the connection that isn't being asyncClosed;  that's harder but would truly fix the problem.
Comment 27 Shawn Wilsher :sdwilsh 2011-08-09 03:27:53 PDT
Leaking is better than crashing.  We should fix the crash, and then we can try to find the bad connection after the fact to plug the leak.
Comment 28 christian 2011-08-09 14:47:50 PDT
Comment on attachment 548063 [details] [diff] [review]
patch, v4

We decided we want this for aurora rather than hunt down every case where they are mismatched. Can we make sure there is a followup for finding all the mismatched calls?
Comment 29 Nicholas Nethercote [:njn] 2011-08-09 18:26:17 PDT
http://hg.mozilla.org/releases/mozilla-aurora/rev/d05ba21b434b

> Can we make sure there is a followup for finding all
> the mismatched calls?

Bug 677809.
Comment 30 Vlad [QA] 2011-08-19 05:24:38 PDT
How can this be verified?
Thanks.
Comment 31 Nicholas Nethercote [:njn] 2011-08-19 15:35:09 PDT
(In reply to Vlad from comment #30)
> How can this be verified?

Run the chrome test that landed with the patch.
Comment 32 Vlad [QA] 2011-08-23 07:21:52 PDT
Based on today's TBPL, this test is passing. http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Aurora/1313761786.1313766031.6313.gz&fulltext=1

Setting resolution to Verified Fixed.

Note You need to log in before you can comment on or make changes to this bug.