723248 - add support for closing inactive databases (folders)

Assignee

Description

•

12 years ago

Attached patch implement last use time for msg db's — Details — Splinter Review

Because nsIMsgFolder caches open db's, it's possible to end up with db's open that aren't being used. We try to avoid this by nsIMsgFolder.msgDatabase = null; but we also leave special folders open, and sometimes db's get left open inadvertently, and some extensions don't know that they should be doing this. Keeping track of the last use time for a db and all the objects that come from the db is daunting, but it occurred to me that we can do well enough by tracking Commit and Open and a few other methods, and that the perfect is definitely the enemy of the good in this case. The attached patch instruments the db's last use time, and allows some other client code to close inactive db's. I'm still experimenting with the client code to close inactive db's, but I want to get the backend code started.

Attachment #593559 - Flags: superreview?(mbanner)

Attachment #593559 - Flags: review?(neil)

David :Bienvenu

Assignee

Comment 1

•

12 years ago

wsmwk, when finished, this should help with some memory bloat issues, as well as file handle limit issues.

Status: NEW → ASSIGNED

Andrew Sutherland [:asuth] (he/him)

Comment 2

•

12 years ago

An alternative to using the timestamp might be to use a 'generation' identifier that lives in a global.  While the overhead of the PR_Now is likely only to be noticable in GetMsgHdrForKey, the use of the generation could aid in debugging since it could provide more useful info.

For example, we could reserve generation id's ending in 1 for autosync.  When autosync is doing something on the event loop, it boosts the generation id and then that is associated with the autosync.  When we go to expire the database, we can see that autosync is complicit in the database being left open by logging the generation identifier.  Autosync would boost the generation when it is done to something that does not blame it.

Updating the generation otherwise could be handled by the 'cron' process that handles retirement.  It could update the generation to the current timestamp times 10 or something like that.

This mechanism could potentially also be used to point blame at extensions, although causing the tagging to happen reliably is non-trivial without some help from the extension (or a framework it uses) or platform.

David :Bienvenu

Assignee

Comment 3

•

12 years ago

Thx, a generation identifier is an interesting idea. I hadn't intended to run the cron process frequently - I'm doing it every minute right now.  I'm really trying to solve the problem of gradual bloat, not the problem of an extension deciding to open the db for every folder in a tight loop - those extensions are beyond hope :-) I wasn't focused on blame, either, since I know most of the culprits, and there are definitely advantages to leaving db's open for a brief amount of time, under the theory that if the user does something that involves a particular folder, they're more likely to do something with that folder again in the not too distant future. And I'm trying to lessen the burden on client and extension code, and having them cooperate with the generation mechanism could increase it.

My first idea for closing db's was to have two thresholds - a db idle time pref in seconds, and a max number of db's open. Every time through the cron process, we close db's that have been idle more than the limit, and then if more db's are still open than the max, we close the ones that have been idle the longest (with perhaps a minimum idle time).

I'm not convinced that the PR_Now call in GetMsgHdrForKey will be noticeable, but I probably don't need it either - setting the last use time when client code accesses nsIMsgFolder.msgDatabase occurred to me later and handles the cases I was worrying about fairly well. I could use your idea of a global but keep the timestamp and make GetMsgHdrForKey set the global value in the db.

I was also planning on hooking up the code that figures out which db's to close to the memory pressure notifications.

Andrew Sutherland [:asuth] (he/him)

Comment 4

•

12 years ago

Yeah, noticeable in this case was "anywhere on the radar"; I don't expect it to add up to anything meaningful.  If anything, I'm more concerned about the overhead of responding to well-meaning bug reporters who see Thunderbird being slow, run strace, and then report that gettimeofday() is probably the problem because it gets called a zillion times.  (Although apparently on x86_64 on linux there is a magic VDSO fast-path that can be used that just reads stuff out of allowed kernel space rather than touching glibc and issuing a syscall.)

neil@parkwaycc.co.uk

Comment 5

•

12 years ago

Comment on attachment 593559 [details] [diff] [review]
implement last use time for msg db's

>+  /// an enumerator to iterate over the open dbs.
>+  readonly attribute nsISimpleEnumerator openDBs;
This doesn't seem to be return the open dbs in a useful way. Slightly more useful would be an nsIArray, since you could at least count them, and you get to avoid the ugly function-that-returns-an-xpcom-array syntax, which would otherwise have been my preference.

Attachment #593559 - Flags: review?(neil) → review+

David :Bienvenu

Assignee

Comment 6

•

12 years ago

Attached patch cumulative wip — Details — Splinter Review

I think I will change to an nsIArray, but this is a snapshot of what I have so far.

David :Bienvenu

Assignee

Comment 7

•

12 years ago

I switched to an nsIArray but that made the cycle collector really unhappy and crashy. I used strong refs when calling AppendElement, and it doesn't seem like the object that made it crash had been deleted. I'll do a clobber build, but it's surprising.

David :Bienvenu

Assignee

Comment 8

•

12 years ago

Attached patch nsIArray approach that crashes in cycle collector (obsolete) — Details — Splinter Review

not sure what I'm doing wrong, but this crashes in the cycle collector after a while.

I added the deletes of the db arrays in the js module just to get the crash to occur faster, which it seems to.

David :Bienvenu

Assignee

Comment 9

•

12 years ago

Ugh, it seems as if we return an array of xpcom objects to js, we need to have the participants implement nsXPCOMCycleCollectionParticipant?

this code in nsCycleCollector:

    nsXPCOMCycleCollectionParticipant *cp;
    ToParticipant(child, &cp);

seems to cause problems when the QI fails, because

static inline void
ToParticipant(nsISupports *s, nsXPCOMCycleCollectionParticipant **cp)
{
    // We use QI to move from an nsISupports to an
    // nsXPCOMCycleCollectionParticipant, which is a per-class singleton helper
    // object that implements traversal and unlinking logic for the nsISupports
    // in question.
    CallQueryInterface(s, cp);

doesn't null out *cp, so if CallQI fails, we're left with garbage in cp.

I must be missing something, because that would seem like a rather large problem. Is our QI supposed to null out the out param before it returns an error? nsMsgDatabase implements it by hand, and perhaps the helper functions do it for you...I'll poke around a bit.

David :Bienvenu

Assignee

Comment 10

•

12 years ago

Andrew, I've noticed that the cycle collector will crash if the QI method of an object returns an error but doesn't null the out pointer. I can fix the offending code on my end to null the out pointer on error, but the cycle collector might want to protect against this as well. I'm cc'ing you since you seem to have been active in nsCycleCollector.cpp recently. I'm also wondering if it's possible that some of the other crash bugs in nsCycleCollector might be related to this issue.

Andrew McCreight [:mccr8]

Comment 11

•

12 years ago

Yes, that does sound really bad.  Basically, the QIs in the CC should probably check for an error and treat that as though a null was return?  I'm not very familiar with QIs, but I do recall that is the sort of behavior that's supposed to happen.  Thanks for letting me know, I'll file a new bug about it.

David :Bienvenu

Assignee

Comment 12

•

12 years ago

(In reply to Andrew McCreight [:mccr8] from comment #11)
> Yes, that does sound really bad.  Basically, the QIs in the CC should
> probably check for an error and treat that as though a null was return? 
Andrew M, thx for looking at this. Yes, static inline void
ToParticipant(nsISupports *s, nsXPCOMCycleCollectionParticipant **cp)

should check the rv of CallQueryInterface, and set cp to null if it's failure. Or just initialize *cp to nsnull before calling QI. 

I did a quick look through the mailnews and mozilla code, and the mailnews code is all OK (once I fixed nsMsgDatabase.cpp) and almost all the mozilla code looks fine, though there are a few suspicious things:

nsXPCWrappedJS::QueryInterface return an error without nulling the out param if !isValid().

AudioSession::QueryInterface() fails to null the out param on error.

same for ReadbackManagerD3D10::QueryInterface and COMMessageFilter::QueryInterface

nsXTFElementWrapper::QueryInterface is a little suspicious

(I don't know if any of these ever cause any actual problems, and the mailnews class that was broken is ancient code).

David :Bienvenu

Assignee

Comment 13

•

12 years ago

Attached patch fix for crash in cycle collector — Details — Splinter Review

carrying forward Neil's r+, requesting sr for db changes, and r+ for mail changes.

I've put the code that does the actual aging in a Thunderbird js file loaded by the TB 3 pane code. I imagine SeaMonkey will want to be able to age open db's as well, and perhaps it would just like to use the same code, but I'd like to shake out any issues in TB-land.

Attachment #599290 - Attachment is obsolete: true

Attachment #599389 - Flags: superreview?(mbanner)

Attachment #599389 - Flags: review+

Andrew McCreight [:mccr8]

Comment 14

•

12 years ago

(In reply to David :Bienvenu from comment #12)
> (I don't know if any of these ever cause any actual problems, and the
> mailnews class that was broken is ancient code).

I talked to jst, and he said that code that returns an error should null out what it returns.  But maybe we should just check anyways...

David :Bienvenu

Assignee

Comment 15

•

12 years ago

(In reply to Andrew McCreight [:mccr8] from comment #14)

> I talked to jst, and he said that code that returns an error should null out
> what it returns.  But maybe we should just check anyways...

It definitely should, and almost everyone does. But the cycle collector might want to be a bit defensive about this since it may have to handle arbitrary xpcom objects, including 3rd party binary components.

I don't know enough about the cycle collector to know when we'll get into trouble with bad QI implementations. I know my case was with an nsIArray, and it's probably unlikely that things like audiosessions end up in nsArrays exposed to js.

Mark Banner (:standard8)

Comment 16

•

12 years ago

(In reply to David :Bienvenu from comment #15)
> (In reply to Andrew McCreight [:mccr8] from comment #14)
> 
> > I talked to jst, and he said that code that returns an error should null out
> > what it returns.  But maybe we should just check anyways...
> 
> It definitely should, and almost everyone does.

My understanding of xpcom rules is that if you are returning/throwing an error code, you don't to touch the return/out parameter and it is up to the caller to check the error value. Exceptions to the rules can happen, but it has to be documented on the interface, and it isn't for QueryInterface...

David :Bienvenu

Assignee

Comment 17

•

12 years ago

(In reply to Mark Banner (:standard8) from comment #16)
> My understanding of xpcom rules is that if you are returning/throwing an
> error code, you don't to touch the return/out parameter and it is up to the
> caller to check the error value. Exceptions to the rules can happen, but it
> has to be documented on the interface, and it isn't for QueryInterface...

I think there's a pre-existing exception to that rule for QI, and the example code about how to write a good QI impl nulls the out param, and calls out the fact that it's doing so. And, all the NS_IMPL macros null the out param before doing anything else.

Benjamin Smedberg

Comment 18

•

12 years ago

There has been disagreement about what to do with outparams in failure cases. The current common practice is:

1) you may change an outparam to null and fail
2) you must not set an interface outparam to nonnull and then fail
3) some interfaces document that outparams are always nulled, and I believe that xpconnect will always null them

Wayne Mery (:wsmwk)

Updated

•

12 years ago

Blocks: 347837

Keywords: footprint, perf

Summary: add support for closing inactive databases → add support for closing inactive databases (folders)

Wayne Mery (:wsmwk)

Updated

•

12 years ago

Blocks: 722183

Mark Banner (:standard8)

Comment 19

•

12 years ago

Comment on attachment 593559 [details] [diff] [review]
implement last use time for msg db's

Assuming this patch is obsolete.

Attachment #593559 - Flags: superreview?(mbanner)

Mark Banner (:standard8)

Comment 20

•

12 years ago

Comment on attachment 599389 [details] [diff] [review]
fix for crash in cycle collector

Review of attachment 599389 [details] [diff] [review]:
-----------------------------------------------------------------

sr=me with the comments addressed. Not sure if we might want to land this after the merge to allow a bit more testing on it.

::: mail/base/modules/msgDBCacheManager.js
@@ +1,2 @@
> +/* ***** BEGIN LICENSE BLOCK *****
> + *   Version: MPL 2.0

The MPL 2 license block should just be three lines:

http://www.mozilla.org/MPL/headers/

::: mailnews/base/prefs/content/amUtils.js
@@ +75,2 @@
>          var directoryAlreadyUsed =
> +          messengerBundle.getFormattedString(

Is this a different bug or something? (it was bitrotted, and it doesn't seem related to this bug).

Attachment #599389 - Flags: superreview?(mbanner) → superreview+

Wayne Mery (:wsmwk)

Comment 21

•

12 years ago

I like the idea of landing after the merge. We can always decide push it into v14 at a later date.

David :Bienvenu

Assignee

Comment 22

•

12 years ago

(In reply to Wayne Mery (:wsmwk) from comment #21)
> I like the idea of landing after the merge. We can always decide push it
> into v14 at a later date.

yes, that's fine with me.

Wayne Mery (:wsmwk)

Updated

•

12 years ago

OS: Windows 7 → All

Hardware: x86_64 → All

David :Bienvenu

Assignee

Comment 23

•

12 years ago

fixed on trunk - http://hg.mozilla.org/comm-central/rev/83fd926bf1f8

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → Thunderbird 15.0

:aceman

Comment 24

•

12 years ago

What is considered an idle DB here? Will this close all the relevant .msf files after the timeout?

Blocks: 231606

David :Bienvenu

Assignee

Comment 25

•

12 years ago

(In reply to :aceman from comment #24)
> What is considered an idle DB here? Will this close all the relevant .msf
> files after the timeout?

in the patch, you'll see two new prefs that you can use to control this:

+// How long should we leave idle db's open, in seconds.
+pref("mail.db.idle_limit", 3000);
+// How many db's should we leave open? LRU db's will be closed first
+pref("mail.db.max_open", 30);

Serge Gautherie (:sgautherie)

Updated

•

12 years ago

Blocks: TB2SM

Wayne Mery (:wsmwk)

Comment 26

•

12 years ago

user with 231 open db at http://forums.mozillazine.org/viewtopic.php?f=39&t=2459825&p=11943887#p11943887

David :Bienvenu

Assignee

Updated

•

12 years ago

Depends on: 750080

Philip Chee

Updated

•

12 years ago

Blocks: 770559

Wayne Mery (:wsmwk)

Updated

•

12 years ago

Blocks: 526568

Wayne Mery (:wsmwk)

Updated

•

12 years ago

Blocks: 671613

Philip Chee

Updated

•

12 years ago

No longer blocks: TB2SM

Wayne Mery (:wsmwk)

Updated

•

12 years ago

Blocks: 495911

Jonathan Kamens

Comment 27

•

12 years ago

This has an impact which I suspect was not anticipated and which breaks at least some code (including my add-on, Send Later 3, which has 25,000 users).

See:

https://groups.google.com/d/topic/mozilla.dev.apps.thunderbird/_WnlEWM5Y0o/discussion

Wayne Mery (:wsmwk)

Comment 28

•

12 years ago

(In reply to Jonathan Kamens from comment #27)
> This has an impact which I suspect was not anticipated and which breaks at
> least some code (including my add-on, Send Later 3, which has 25,000 users).
> 
> See:
> 
> https://groups.google.com/d/topic/mozilla.dev.apps.thunderbird/_WnlEWM5Y0o/
> discussion

"I think the logic in nsMsgLocalMailFolder::UpdateFolder is
probably wrong, and a FolderLoaded event should be generated
when a folder is opened, not just when it is updated"

file a bug?

Mark Banner (:standard8)

Comment 29

•

12 years ago

Yes, please file a new bug for new issues. Commenting on old bugs will likely mean your comment will be lost over time.

:aceman

Comment 30

•

12 years ago

I think he filed bug 787557 for it.

Depends on: 787557

Wayne Mery (:wsmwk)

Updated

•

12 years ago

Depends on: 793455

Wayne Mery (:wsmwk)

Updated

•

11 years ago

Depends on: 794401

Wayne Mery (:wsmwk)

Updated

•

11 years ago

Depends on: 876548

:aceman

Updated

•

9 years ago

Depends on: 1135310

Wayne Mery (:wsmwk)

Comment 31

•

9 years ago

(In reply to Andrew McCreight [:mccr8] from comment #11)
> Yes, that does sound really bad.  Basically, the QIs in the CC should
> probably check for an error and treat that as though a null was return?  I'm
> not very familiar with QIs, but I do recall that is the sort of behavior
> that's supposed to happen.  Thanks for letting me know, I'll file a new bug
> about it.

Was such a bug filed?  Or was it decided no bug was needed?  (comment 14, etc)

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 32

•

9 years ago

(In reply to Wayne Mery (:wsmwk) from comment #31)
> Was such a bug filed?  Or was it decided no bug was needed?  (comment 14,
> etc)

Oh, no, I'd totally forgotten about that.  Thanks for the reminder!  I filed bug 1135773 for making the CC more robust, and bug 1135772 for fixing some of the broken QIs that were pointed out.

Flags: needinfo?(continuation)

implement last use time for msg db's 12 years ago David :Bienvenu 16.80 KB, patch	neil : review+	Details \| Diff \| Splinter Review
cumulative wip 12 years ago David :Bienvenu 27.31 KB, patch		Details \| Diff \| Splinter Review
nsIArray approach that crashes in cycle collector 12 years ago David :Bienvenu 18.09 KB, patch		Details \| Diff \| Splinter Review
fix for crash in cycle collector 12 years ago David :Bienvenu 18.49 KB, patch	Bienvenu : review+ standard8 : superreview+	Details \| Diff \| Splinter Review