Closed
Bug 662989
Opened 13 years ago
Closed 13 years ago
Avoid crashing if a DB connection isn't asyncClose()d when it should be
Categories
(Core :: SQLite and Embedded Database Bindings, defect)
Tracking
()
VERIFIED
FIXED
mozilla8
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
(Keywords: crash, Whiteboard: [inbound])
Crash Data
Attachments
(1 file, 4 obsolete files)
10.97 KB,
patch
|
mak
:
review+
christian
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
From bug 654573 comment 59: >> AFAICT the real underlying problem is that some JS code that uses an async >> DB connection fails to call asyncClose() on it before it becomes garbage. > > Yes. Which is not to say that it's reasonable for storage to be causing a > crash or endangering the heap when this sequence of events occurs; storage > shouldn't do that, it's a bug. The problem is that if a connection on which async transactions were executed is not explicitly closed with asyncClose(), its destructor will run Close() instead, which will bail out early (because of the async-ness), so Connection::internalClose() isn't run, and the memory reporters for the connection are not unregistered. The end result is that memory reporters are left pointing to a Connection that has been freed, which is a sure recipe for a subsequent hard-to-debug crash (eg. bug 654573, bug 660213). To make things more robust, we just need to ensure that the memory reporters are always freed when ~Close() is run.
Assignee | ||
Comment 1•13 years ago
|
||
This patch unregisters the memory reporters for a Connection in ~Connection() if they haven't already been unregistered. I re-ran the stress test from bug 654573 (having removed the corresponding fix to aboutPermissions.js from that bug): - Open about:memory?verbose and about:permissions - Hold down F5 to trigger constant reloads - Click back and forth between the two tabs, once per second Without this patch, this causes a crash every time within a few seconds. With this patch, I couldn't reproduce the crash. However, I did see the SQLite memory usage go up a lot. In a 30 second run, I got this warning 107 times: WARNING: NS_ENSURE_TRUE(!(mAsyncExecutionThread)) failed: file /home/njn/moz/mi1/storage/src/mozStorageConnection.cpp, line 797 which means that 107 connections to places.sqlite were not asyncClosed as they should be. Before doing my stress-test, the relevant parts of about:memory looked like this: ├───4,192,088 B (09.10%) -- storage │ └──4,192,088 B (09.10%) -- sqlite │ ├──1,110,936 B (02.41%) -- places.sqlite │ │ ├────960,616 B (02.09%) -- cache-used │ │ ├─────95,440 B (00.21%) -- schema-used │ │ └─────54,880 B (00.12%) -- stmt-used │ ├────865,448 B (01.88%) -- other After 30 seconds, they looked like this: ├───24,052,136 B (22.35%) -- storage │ └──24,052,136 B (22.35%) -- sqlite │ ├──20,537,184 B (19.09%) -- other │ ├───1,221,296 B (01.13%) -- places.sqlite │ │ ├──1,059,664 B (00.98%) -- cache-used │ │ ├─────95,440 B (00.09%) -- schema-used │ │ └─────66,192 B (00.06%) -- stmt-used (The reason the extra memory is all in the storage/sqlite/other bucket is because the memory reporters are unregistered for each destroyed Connection even though the SQLite connection is still open.) 107 connections led to a ~20,000,000 byte increase in memory usage, which is about 187,000 bytes per unclosed connection.
Attachment #538168 -
Flags: review?(mak77)
Assignee | ||
Comment 2•13 years ago
|
||
This patch takes an alternative approach -- it avoids the crashes, but also keeps the memory reporters alive, but appends "-LEAKED" to their name so it's obvious from looking at about:memory if any connections have leaked! For example: ├──10,088,872 B (12.59%) -- storage │ └──10,088,872 B (12.59%) -- sqlite │ ├───4,552,864 B (05.68%) -- places.sqlite-LEAKED │ │ ├──3,977,792 B (04.96%) -- cache-used │ │ ├────575,072 B (00.72%) -- schema-used │ │ └──────────0 B (00.00%) -- stmt-used This is better -- the leak is findable in optimized builds, even from reports from normal users. You don't have to spot the warning message in stderr (but that's still there). This required a few changes: - We detect in ~Connection() if the connection has been leaked (due to a missing asyncClose) and set StorageMemoryReporter::mHasLeaked. "-LEAKED" is then appended onto the name of the connection as given by the memory reporter. - StorageMemoryReporter no longer has a reference to the Connection, but rather a sqlite3*. This means that the crashes are avoided when Connection is destroyed, because it's communicating directly with the sqlite connection, rather than going through the Connection. (Nb: up until now we've been passing a Connection& as the first argument to sqlite3_db_status(). I have no idea how that even compiled, let alone worked. This patch passes an sqlite3* instead.) - I also tweaked the descriptions on the memory reporters. Note that I'm very new to Mozilla strings and am completely uncertain if my usage of nsCString is appropriate; thorough checking would be appreciated.
Attachment #538168 -
Attachment is obsolete: true
Attachment #538168 -
Flags: review?(mak77)
Attachment #538197 -
Flags: review?(mak77)
Comment 3•13 years ago
|
||
(In reply to comment #2) > (Nb: up until now we've been passing a Connection& as the first argument > to sqlite3_db_status(). I have no idea how that even compiled, let alone > worked. This patch passes an sqlite3* instead.) C++ type operator on the Connection class.
Comment 4•13 years ago
|
||
Comment on attachment 538197 [details] [diff] [review] better patch (against 70732:96fac50b57ed) Review of attachment 538197 [details] [diff] [review]: ----------------------------------------------------------------- It looks good to me. Shawn, since this is still a transition and I'm not yet officially a peer, would you mind checking if I missed something? ::: storage/src/mozStorageConnection.cpp @@ +357,5 @@ > } > > NS_IMETHOD GetPath(char **memoryPath) > { > nsCString path; Why not a nsCAutoString? I'd expect its buffer to be sufficient most of the time. @@ +363,5 @@ > path.AppendLiteral("explicit/storage/sqlite/"); > + path.Append(mFileName); > + > + if (mHasLeaked) { > + path.AppendLiteral("-LEAKED"); fwiw, -LEAKED sounds scary to me (And I guess to users), looks like everything there leaked, not just a connection. What about -HAS-LEAKS? @@ +607,5 @@ > break; > } > > + nsRefPtr<StorageMemoryReporter> reporter; > + nsCString filename = this->getFilename(); nsCAutoString here as well @@ +766,5 @@ > #endif > > for (PRUint32 i = 0; i < mMemoryReporters.Length(); i++) { > (void)::NS_UnregisterMemoryReporter(mMemoryReporters[i]); > + mMemoryReporters[i] = NULL; nsnull ::: storage/src/mozStorageConnection.h @@ +203,5 @@ > > sqlite3 *mDBConn; > nsCOMPtr<nsIFile> mDatabaseFile; > > + nsTArray<nsCOMPtr<StorageMemoryReporter> > mMemoryReporters; Shouldn't this rather be a nsRefPtr since not anymore an interface?
Attachment #538197 -
Flags: review?(mak77) → review+
Comment 5•13 years ago
|
||
(In reply to comment #4) > Shawn, since this is still a transition and I'm not yet officially a peer, > would you mind checking if I missed something? If you need me to do reviews, just flag me still. I should start doing them again this weekend.
Assignee | ||
Comment 6•13 years ago
|
||
(In reply to comment #4) > > NS_IMETHOD GetPath(char **memoryPath) > > { > > nsCString path; > > Why not a nsCAutoString? I'd expect its buffer to be sufficient most of the > time. "Most of the time" makes me nervous. And that code pre-dates my patch. > @@ +363,5 @@ > > path.AppendLiteral("explicit/storage/sqlite/"); > > + path.Append(mFileName); > > + > > + if (mHasLeaked) { > > + path.AppendLiteral("-LEAKED"); > > fwiw, -LEAKED sounds scary to me (And I guess to users), looks like > everything there leaked, not just a connection. What about -HAS-LEAKS? The entire connection has leaked, and this memory reporter is describing the connection, so I think it's a reasonable name. It's supposed to sound scary :) > > + nsRefPtr<StorageMemoryReporter> reporter; > > + nsCString filename = this->getFilename(); > > nsCAutoString here as well Again, I'd rather not make assumptions about the length of the filenames. > > + mMemoryReporters[i] = NULL; > > nsnull I changed it. > > + nsTArray<nsCOMPtr<StorageMemoryReporter> > mMemoryReporters; > > Shouldn't this rather be a nsRefPtr since not anymore an interface? I'm not sure. StorageMemoryReporter inherits from nsIMemoryReporter...
Assignee | ||
Comment 7•13 years ago
|
||
For sdwilsh to review.
Attachment #538197 -
Attachment is obsolete: true
Attachment #540380 -
Flags: review?(sdwilsh)
Comment 8•13 years ago
|
||
(In reply to comment #6) > (In reply to comment #4) > > > + nsTArray<nsCOMPtr<StorageMemoryReporter> > mMemoryReporters; > > > > Shouldn't this rather be a nsRefPtr since not anymore an interface? > > I'm not sure. StorageMemoryReporter inherits from nsIMemoryReporter... When the template type is a concrete class, you should always use nsRefPtr.
Comment 9•13 years ago
|
||
Comment on attachment 540380 [details] [diff] [review] better patch, v2 Review of attachment 540380 [details] [diff] [review]: ----------------------------------------------------------------- This code feels like it should be very easy to write a test for, so I'd like to see that before giving it r+ (and I should have a faster review turn-around time moving forward here). ::: storage/src/mozStorageConnection.cpp @@ +363,5 @@ > path.AppendLiteral("explicit/storage/sqlite/"); > + path.Append(mFileName); > + > + if (mHasLeaked) { > + path.AppendLiteral("-LEAKED"); I think I'd prefer to not do "-LEAKED" but rather " (LEAKED)". Thoughts? @@ +424,5 @@ > return convertResultCode(rc); > } > + > + // We call this when we know we've leaked a connection. > + void hasLeaked() hasLeaked sounds like it is querying status. How about setWillLeak? @@ +819,2 @@ > MutexAutoLock lockedScope(sharedAsyncExecutionMutex); > NS_ENSURE_FALSE(mAsyncExecutionThread, NS_ERROR_UNEXPECTED); Because this error isn't very telling in it's current state, let's add a local variable to help make the debug output better: bool asyncCloseWasCalled = !mAsyncExecutionThread; NS_ENSURE_TRUE(asyncCloseWasCalled, NS_ERROR_UNEXPECTED); ::: storage/src/mozStorageConnection.h @@ +203,5 @@ > > sqlite3 *mDBConn; > nsCOMPtr<nsIFile> mDatabaseFile; > > + nsTArray<nsCOMPtr<StorageMemoryReporter> > mMemoryReporters; s/nsCOMPtr/nsRefPtr/
Attachment #540380 -
Flags: review?(sdwilsh) → review-
Assignee | ||
Comment 10•13 years ago
|
||
(In reply to comment #9) > > I think I'd prefer to not do "-LEAKED" but rather " (LEAKED)". Thoughts? No other memory reporters have a space in them. But I guess it doesn't matter, and we shouldn't even be seeing this in practice. I'll update. > hasLeaked sounds like it is querying status. How about setWillLeak? setWillLeak sounds funny to me. I originally had markAsLeaked(), I'll go back to that. Thanks for reviewing, I'll do the test and make the changes.
Comment 11•13 years ago
|
||
(In reply to comment #10) > No other memory reporters have a space in them. But I guess it doesn't matter, > and we shouldn't even be seeing this in practice. I'll update. Yeah, but other memory reporters aren't indicating state like this either AFIAK, right? > setWillLeak sounds funny to me. I originally had markAsLeaked(), I'll go back > to that. markAsLeaked sounds better than my suggestion (:
Comment 12•13 years ago
|
||
(In reply to comment #6) > > @@ +363,5 @@ > > > path.AppendLiteral("explicit/storage/sqlite/"); > > > + path.Append(mFileName); > > > + > > > + if (mHasLeaked) { > > > + path.AppendLiteral("-LEAKED"); > > > > fwiw, -LEAKED sounds scary to me (And I guess to users), looks like > > everything there leaked, not just a connection. What about -HAS-LEAKS? > > The entire connection has leaked, and this memory reporter is describing the > connection, so I think it's a reasonable name. It's supposed to sound scary > :) Yes, I meant that to me sounds possible there are multiple connections, and maybe some of them leaked and some not. Btw, I don't mind :) Regarding the autostrings, they should avoid the heap allocation if the string is shorter and not be much more expensive when longer (they'll just allocate on the heap the rare database names longer than 64 chars). So I'm not sure what is preventing not using them, apart their advantages.
Assignee | ||
Comment 13•13 years ago
|
||
Well, that wasn't much fun, but it's done. I ended up going with "foo-LEAKED" because I couldn't bear "foo (LEAKED)" since it was a different style to every other memory reporter. Hope you don't mind. Hopefully it'll never be seen by anyone :)
Attachment #540380 -
Attachment is obsolete: true
Attachment #546094 -
Flags: review?(sdwilsh)
Comment 14•13 years ago
|
||
Comment on attachment 546094 [details] [diff] [review] patch, v3, with test Review of attachment 546094 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/aboutmemory/tests/chrome/test_asyncClose_leak.xul @@ +22,5 @@ > + const Ci = Components.interfaces; > + const Cu = Components.utils; > + > + var db = Cc["@mozilla.org/browser/nav-history-service;1"]. > + getService(Ci.nsPIPlacesDatabase).DBConnection.clone(true); having a test in another component depending on Places is a bit strange esp. since one may compile without it, I suppose it may be better to create a db for this test, it should just be matter of opening a connection to a new database. @@ +28,5 @@ > + // A statement, the exact form doesn't matter, it just has to be > + // asynchronous. > + var stmt = db.createAsyncStatement( > + "SELECT SUM(visit_count) AS count " + > + "FROM moz_places WHERE rev_host = :rev_host"); and as a query to simplify things you may select from sqlite_master table that always exists @@ +42,5 @@ > + } > + }); > + stmt.finalize(); > + //db.asyncClose(); > + stmt = null; this commented out line looks bogus as it is, I know why it is there, but a comment would clarify that for everyone.
Assignee | ||
Comment 15•13 years ago
|
||
> I suppose it may be better to create a db
> for this test, it should just be matter of opening a connection to a new
> database.
How do I do that?
Comment 16•13 years ago
|
||
just make a nsIFile and storage.openDatabase(file) it
Assignee | ||
Comment 17•13 years ago
|
||
This patch address mak's comments on the previous patch.
Attachment #546094 -
Attachment is obsolete: true
Attachment #546094 -
Flags: review?(sdwilsh)
Attachment #548063 -
Flags: review?(mak77)
Comment 18•13 years ago
|
||
Try run for e1c8cbe45980 is complete. Detailed breakdown of the results available here: http://tbpl.mozilla.org/?tree=Try&rev=e1c8cbe45980 Results: success: 128 warnings: 16 failure: 2 Total buildrequests: 146
Assignee | ||
Comment 19•13 years ago
|
||
Sigh, I got a try failure on the Linux64 opt build, which I cannot reproduce locally. Presumably the GC is not running in time and so the DB isn't being collected, which means the leak doesn't occur in time.
Comment 20•13 years ago
|
||
Comment on attachment 548063 [details] [diff] [review] patch, v4 Review of attachment 548063 [details] [diff] [review]: ----------------------------------------------------------------- it looks fine to me, you should clearly resolve the random failures in the test before pushing, but that's obvious. ::: storage/src/mozStorageConnection.cpp @@ +469,5 @@ > + // with asyncClose(), the connection is about to leak and it's too late to do > + // anything about it. So we mark the memory reporters accordingly so that > + // the leak will be obvious in about:memory. > + for (PRUint32 i = 0; i < mMemoryReporters.Length(); i++) { > + if (mMemoryReporters[i] != nsnull) { any specific reasons you can't do if (mMemoryReporters[i]) ? I think the general good practice is to avoid nsnull or boolean comparisons @@ +820,3 @@ > MutexAutoLock lockedScope(sharedAsyncExecutionMutex); > + bool asyncCloseWasCalled = !mAsyncExecutionThread; > + NS_ENSURE_TRUE(asyncCloseWasCalled, NS_ERROR_UNEXPECTED); I think there is no need of a warning here, based on recent discussion about removing useless warnings (benjamin would like to remove NS_ENSURE_ usage in new patches), I'd rather if (mAsyncExecutionThread) return NS_ERROR_UNEXPECTED; and better explain in the comment above that this may fail if asyncClosed has not been called before, rather than adding another bool assignment. ::: toolkit/components/aboutmemory/tests/chrome/test_asyncClose_leak.xul @@ +24,5 @@ > + > + // Make a fake DB file. > + var file = Cc["@mozilla.org/file/directory_service;1"]. > + getService(Ci.nsIProperties). > + get("TmpD", Ci.nsIFile); you may create a file in ProfD I guess that should be removed at the end of the test harness run, not sure if creating in tmp is fine on all platforms, esp because that file may stick there and never be removed across test runs. @@ +26,5 @@ > + var file = Cc["@mozilla.org/file/directory_service;1"]. > + getService(Ci.nsIProperties). > + get("TmpD", Ci.nsIFile); > + file.append("asyncClose_leak-fake-DB-tmp.sqlite"); > + file.createUnique(Ci.nsIFile.NORMAL_FILE_TYPE, 0666); and then you wouldn't need this createUnique thing (lust let sqlite create the file and harness cleanup, bonus points if the test deletes the db before finishing, that is usually a good choice) @@ +29,5 @@ > + file.append("asyncClose_leak-fake-DB-tmp.sqlite"); > + file.createUnique(Ci.nsIFile.NORMAL_FILE_TYPE, 0666); > + > + var service = Cc["@mozilla.org/storage/service;1"]. > + getService(Ci.mozIStorageService); nit: s/service/storage/ since service is so generic I have no idea which service is @@ +44,5 @@ > + }, > + handleCompletion: function(aReason) { > + } > + }); > + stmt.finalize(); nit: try {} around execution and put finalize() in a finally {} (people copies from our tests, let's give them good practices)
Attachment #548063 -
Flags: review?(mak77) → review+
Assignee | ||
Comment 21•13 years ago
|
||
We're still getting occasional crashes, see bug 654573 comment 76.
Crash Signature: [@ mozilla::storage::StorageMemoryReporter::GetPath ]
[@ @0x0 | mozilla::storage::StorageMemoryReporter::GetPath ]
[@ mozilla::storage::Connection::getFilename() ]
[@ mozilla::storage::Connection::getFilename ]
Assignee | ||
Comment 23•13 years ago
|
||
http://hg.mozilla.org/integration/mozilla-inbound/rev/e820bee09d57
Whiteboard: [inbound]
Comment 24•13 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/e820bee09d57
Target Milestone: --- → mozilla8
Updated•13 years ago
|
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 25•13 years ago
|
||
With combined signatures, it is #3 top browser crasher in 7.0a2.
Assignee | ||
Comment 26•13 years ago
|
||
(In reply to Scoobidiver from comment #25) > With combined signatures, it is #3 top browser crasher in 7.0a2. Ugh. If we want to fix it in FF7, either we land this patch, which tolerates the non-asyncClosed connections -- i.e. avoids the crash but the leak still occurs. Alternatively, we find the connection that isn't being asyncClosed; that's harder but would truly fix the problem.
Comment 27•13 years ago
|
||
Leaking is better than crashing. We should fix the crash, and then we can try to find the bad connection after the fact to plug the leak.
Comment 28•13 years ago
|
||
Comment on attachment 548063 [details] [diff] [review] patch, v4 We decided we want this for aurora rather than hunt down every case where they are mismatched. Can we make sure there is a followup for finding all the mismatched calls?
Attachment #548063 -
Flags: approval-mozilla-aurora+
Assignee | ||
Comment 29•13 years ago
|
||
http://hg.mozilla.org/releases/mozilla-aurora/rev/d05ba21b434b > Can we make sure there is a followup for finding all > the mismatched calls? Bug 677809.
status-firefox7:
--- → fixed
Comment 30•13 years ago
|
||
How can this be verified? Thanks.
Assignee | ||
Comment 31•13 years ago
|
||
(In reply to Vlad from comment #30) > How can this be verified? Run the chrome test that landed with the patch.
Comment 32•13 years ago
|
||
Based on today's TBPL, this test is passing. http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Aurora/1313761786.1313766031.6313.gz&fulltext=1 Setting resolution to Verified Fixed.
Status: RESOLVED → VERIFIED
Updated•8 days ago
|
Product: Toolkit → Core
You need to log in
before you can comment on or make changes to this bug.
Description
•