Closed Bug 450494 Opened 16 years ago Closed 16 years ago

Global Database functionality should go in MailNews core

Categories

(MailNews Core :: Database, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0b1

People

(Reporter: asuth, Assigned: asuth)

References

()

Details

Attachments

(6 files, 1 obsolete file)

The 'gloda' extension provides global database functionality that should be in the core.

The current gloda mercurial repository can be found here:
http://hg.mozilla.org/users/bugmail_asutherland.org/index.cgi/gloda/

I suppose a reasonable place to put it might be in mailnews/db/global or something along those lines?

My current plan is to add a bunch of unit tests and then try and get reviews happening.
Flags: blocking-thunderbird3?
mailnews/db/global sounds like a reasonable place to me.
Definitely a 3.0 blocker, do we want to get this in before beta 1 as well?
Flags: blocking-thunderbird3?
Flags: blocking-thunderbird3.0b1?
Flags: blocking-thunderbird3+
I think we want this in before beta 1.  What is not clear to me is how "on" we want it (for beta 1); the switch would be to enable/disable event-driven indexing.

As long as we have user interface exposing the functionality, I think we should set it to "on" given the relative importance of the functionality and the desire to test it as much as possible.  If we have the UI, which would presumably be dependent on gloda*, I might go so far as to say we should not provide a way to turn it off.

I'm not sure what to do about SeaMonkey.  I doubt it wants the 'super search bar' (clarkbw's proposal to make the search text box giant, bumping all the reply/forward/etc. buttons to the message), so they would not be dependent on gloda.  Do we just not hook gloda in at all, support the enable/disable pref for when they do hook it in, and leave it up to SeaMonkey?

(side note: dmose also thinks mailnews/db/global is also good)

* Anything gloda can do, cross-folder search conceivably can also do.  So that could be hooked up, but the code/effort required for the fallback case would probably be much more than the gloda case, and negatively impact gloda-based development.
(In reply to comment #3)
> I'm not sure what to do about SeaMonkey.

I'd go for having at least all the backend built in no matter how much it is used in UI or not.
Flags: blocking-thunderbird3.0b1? → blocking-thunderbird3.0b1+
Priority: -- → P1
(In reply to comment #3)
> As long as we have user interface exposing the functionality, I think we should
> set it to "on" given the relative importance of the functionality and the
> desire to test it as much as possible.

There is not yet any such UI blocking beta 1, correct?  Should there be?  If so, what?
(In reply to comment #5)
> There is not yet any such UI blocking beta 1, correct?  Should there be?  If
> so, what?

The candidate UI to go in the core, as I understand has been proposed is:

- 'Super search bar'/exptoolbar.  All toolbar buttons except for "get mail" and "write" are relocated, and the search bar becomes giant.  The search bar is enabled with auto-completion and is used to formulate gloda queries which populate a gloda-backed nsIMsgDBView.  (Alternatively, gloda could cram messages into a cross-folder search view.)

- 'Message Reader' (http://clarkbw.net/designs/msg-reader/) The toolbar buttons are relocated into the message preview area (both at the top and the bottom of the message).  The headers become part of the same scrolling region as the message body itself.

The 'message reader' is dependent on layout bug 80713, which is required to have the header and body scroll together as one.  Everything else is a question of (Thunderbird) leg-work.  Most of the 'super search bar' exists in a primitive fashion, the message reader has fairly thorough HTML prototypes (as linked).

I do not know whether said UI should block beta 1.  Given that it constitutes a fairly radical change, I think it would be great to get it in for beta 1 for many reasons.  I fear it is a 'stretch' goal (not 'in the bag'), so I guess I'll try and do a quick push to determine feasibility.
The UI I speak of for the 'message reader' is apparently bug 449691, which is blocking b1.  It is also currently not gloda-dependent, so I'm not marking it as a blocker or dependency.
Target Milestone: --- → Thunderbird 3.0b1
Whiteboard: [needs review bienvenu] [needs review dmose]
In addition to applying this patch, please check out https://hg.mozilla.org/users/bugmail_asutherland.org/gloda/ to mailnews/db/global.  (ex: "cd mailnews/db; hg clone http://hg.mozilla.org/users/bugmail_asutherland.org/index.cgi/gloda/ global").  (The gloda repository can currently be used as an extension or built as part of the tree.)  Please indicate the revision/changeset in use when providing review comments (via "hg head").

There is still a little more to do for the unit tests, but I believe the current level of documentation should be more than sufficient, especially when starting from modules/gloda.js.  (The asynchronous nature of the indexing complicates things.)
Attachment #335946 - Flags: superreview?
Attachment #335946 - Flags: review?
Comment on attachment 335946 [details] [diff] [review]
Link checked-out gloda into the build process.

Hope the bugzilla upgrade fixes the requestee oddness...
Attachment #335946 - Flags: superreview?(bienvenu)
Attachment #335946 - Flags: superreview?
Attachment #335946 - Flags: review?(dmose)
Attachment #335946 - Flags: review?(bienvenu)
Attachment #335946 - Flags: review?
Minor changes were made to the POP3 fake server allow gloda testing to feed it synthesized messages rather than having the messages read from a file.  This also addresses two apparent syntactic errors and removes directory service-ish debug that shouldn't actually provide any debugging insight.
Attachment #336694 - Flags: review?(bienvenu)
Attachment #336694 - Flags: review?(bienvenu) → review+
(In reply to comment #6)
> 
> - 'Message Reader' (http://clarkbw.net/designs/msg-reader/) The toolbar buttons
> are relocated into the message preview area (both at the top and the bottom of
> the message).  The headers become part of the same scrolling region as the
> message body itself.

How does this relate to gloda?  (FWIW, I suspect a bunch, though perhaps not all, of the message reader work will make B1).
(In reply to comment #11)
> How does this relate to gloda?  (FWIW, I suspect a bunch, though perhaps not
> all, of the message reader work will make B1).

Super search bar depends on the message reader to be a UI possibility.  It gets rid of all the buttons on the toolbar, so it needs the message reader's buttons.
(In reply to comment #10)
> Created an attachment (id=336694) [details]
> ...removes directory service-ish
> debug that shouldn't actually provide any debugging insight.

-          } else {
-            dump("Wants directory: "+prop+"\n");

Andrew I don't see why you feel the need to delete this. If you're getting it a lot, then maybe something is requesting a directory more than it should be (or we need to fix it). However I see it as a useful tool because if obtaining the directory is buried in code somewhere, it can be very hard to realise that the directory isn't being obtained because we haven't provided a test for it at xpcshell level.
(In reply to comment #13)
> we need to fix it). However I see it as a useful tool because if obtaining the
> directory is buried in code somewhere, it can be very hard to realise that the
> directory isn't being obtained because we haven't provided a test for it at
> xpcshell level.

So, I thought I was the one who added it in a fit of debugging.  I realized this was not the case when it showed up in the patch, but I left it in because it seemed like it might be more confusing to someone who didn't understand how nsIDirectoryServiceProvider and family work.

For example, I was experiencing a problem where popstate.dat could not be found.  I saw that output, saw the source of the dump, saw that it was throwing an exception, and assumed that I needed to be providing another directory path in the handler.  Of course, it turns out that the exception is actually a normal thing to do and allows the built-in logic to just build off the single directory path we do provide.  So the debug there led me down the wrong path.

My concern can easily be mitigated by adding a comment to the code with the reinstatement of the debug.  Does that sound good?
I've started looking through/running the code. It was pulled yesterday:

changeset:   87:5f1106c3cae4
tag:         tip
user:        Andrew Sutherland <asutherland@asutherland.org>
date:        Wed Sep 03 02:09:58 2008 -0700
summary:     indexing unit test actually works!

I apologize if there are answers to these questions written down somewhere I haven't seen...

I don't see the gloda code using the idle service to do background indexing on idle - am I missing that? Is that planned, or does it not seem like the right thing to do?

Should I be installing the expmess xpi  to see the status of indexing? Is there a newer version of it than 0.0-m1?

I'm seeing lots of assertions about the tree row count getting out of sync - those are a sign of not sending the right notifications to the tree control.

I see this assertion when picking the "index everything" menu item:


 	xpcom_core.dll!NS_DebugBreak_P(unsigned int aSeverity=0x00000001, const char * aStr=0x05719678, const char * aExpr=0x05719668, const char * aFile=0x05719628, int aLine=0x0000012e)  Line 359 + 0xc bytes	C++
 	mime.dll!MimeObject_parse_eof(MimeObject * obj=0x064878a8, int abort_p=0x00000000)  Line 302 + 0x25 bytes	C++
 	mime.dll!MimeContainer_parse_eof(MimeObject * object=0x064878a8, int abort_p=0x00000000)  Line 129 + 0xe bytes	C++
 	mime.dll!MimeContainer_finalize(MimeObject * object=0x064878a8)  Line 98 + 0x10 bytes	C++
 	mime.dll!MimeUntypedText_finalize(MimeObject * object=0x064878a8)  Line 117 + 0xa bytes	C++
 	mime.dll!mime_free(MimeObject * object=0x064878a8)  Line 323 + 0xe bytes	C++
 	mime.dll!MimeContainer_finalize(MimeObject * object=0x09690148)  Line 109 + 0x9 bytes	C++
 	mime.dll!MimeMessage_finalize(MimeObject * object=0x09690148)  Line 126 + 0xa bytes	C++
 	mime.dll!mime_free(MimeObject * object=0x09690148)  Line 323 + 0xe bytes	C++
 	mime.dll!mime_display_stream_complete(_nsMIMESession * stream=0x09690238)  Line 989 + 0x9 bytes	C++
 	mime.dll!nsStreamConverter::OnStopRequest(nsIRequest * request=0x0968a37c, nsISupports * ctxt=0x09689f90, unsigned int status=0x00000000)  Line 1049 + 0xf bytes	C++

as well as this one, quite frequently:


 	xpcom_core.dll!NS_DebugBreak_P(unsigned int aSeverity=0x00000001, const char * aStr=0x0571f198, const char * aExpr=0x0571f164, const char * aFile=0x0571f124, int aLine=0x00000113)  Line 359 + 0xc bytes	C++
 	mime.dll!nsSMimeVerificationListener::Release()  Line 275 + 0x6c bytes	C++
 	pipnss.dll!nsCOMPtr<nsISMimeVerificationListener>::~nsCOMPtr<nsISMimeVerificationListener>()  Line 526	C++
 	pipnss.dll!nsSMimeVerificationJob::~nsSMimeVerificationJob()  Line 88 + 0x32 bytes	C++
 	pipnss.dll!nsSMimeVerificationJob::`scalar deleting destructor'()  + 0xf bytes	C++
 	pipnss.dll!nsCertVerificationThread::Run()  Line 150 + 0x20 bytes	C++
 	pipnss.dll!nsPSMBackgroundThread::nsThreadRunner(void * arg=0x02aa0c80)  Line 45	C++

Is this comment obsolete? You're not using setTimeout, or this._messenger, in indexer.js

    // we need this for setTimeout... what to do about this?
    this._messenger = Cc["@mozilla.org/messenger;1"].
                        createInstance(Ci.nsIMessenger);
    
    this._timer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);

do we expect this pref to live permanently? If so, it's worth putting in mailnews.js so that it shows up in about:config and has a default.
    
    let branch = prefService.getBranch("mailnews.database.global.indexer");

might add a comment referring to the NS_ERROR define you're catching here:

      catch ( e if e.result == 0xc1f30001) {
        // this means that we need to pend on the update.
        this._log.debug("Pending on folder load...");


we generally put spaces around the '=', at least in c++ - collection.js
    let iWrite=0;
    for (let iRead=0; iRead < items.length; iRead++) {

datamodel.js:

            let values = this[attrDef.boundName];
            for (let iValue=0; iValue < values.length; iValue++)

GlodaConversation.prototype = {

Conversations might want have an attribute for the number of unread messages, though it's easily calculated, obviously.

I'm going through the js files one by one - there's a lot to grok!
(In reply to comment #14)
> My concern can easily be mitigated by adding a comment to the code with the
> reinstatement of the debug.  Does that sound good?

Yes, maybe a clearer debug statement as well - I think it'd just be good to highlight if we are getting things wrong.
(In reply to comment #15)
> changeset:   87:5f1106c3cae4

> I don't see the gloda code using the idle service to do background indexing on
> idle - am I missing that? Is that planned, or does it not seem like the right
> thing to do?

Correct, the idle service is not currently used.  The initial implementation just used a timer interval to schedule things, and it appears to work reasonably well.  Which is, no one who has tried gloda thus far has complained about a lack of responsiveness.  In practice, the introduction of the body fetching has actually slowed things down even more.

My original plan was to use the idle service to control the parameters of the timer-based indexing.  Those variables being: the number of milliseconds between re-scheduling, and the number of messages processed before we stop processing pending another timer interrupt (captured using a token bucket-style implementation).  The idea was that when the machine was idle, we could throttle up our indexing.

I think this general idea is the right idea, because we *do* want to index when the user is around, otherwise we lose the ability to keep up with new messages / changed messages.  At least, for new/changed messages.  When we are just trying to get gloda up to speed, that part is not as important, but we still want to get up to speed as quickly as possible.

There's also some forthcoming interaction with Emre's automatic IMAP offline-fetching stuff that needs to happen.  I haven't entirely thought it through, but it seems like, to a large extent, we could just play follow-the-leader on the offline fetching... but there are a large number of cases that need to be dealt with.

Your thoughts/input are greatly appreciated on this one.

> Should I be installing the expmess xpi  to see the status of indexing? Is there
> a newer version of it than 0.0-m1?

Yes, expmess is the only way to see what is happening.  The XPI is very old and probably will break things.  I would install the extension directly from hg.  (So, check out https://hg.mozilla.org/users/bugmail_asutherland.org/expmess/, then create a file "expmess@mozillamessaging.com" in your profile's extensions directory that provides the path to the checked-out directory.)

> I'm seeing lots of assertions about the tree row count getting out of sync -
> those are a sign of not sending the right notifications to the tree control.

Yes.  I think the hg version of expmess improves that, but I'm not sure it's entirely fixed.

> I see this assertion when picking the "index everything" menu item:
> 
>      xpcom_core.dll!NS_DebugBreak_P(unsigned int aSeverity=0x00000001, const
> char * aStr=0x05719678, const char * aExpr=0x05719668, const char *
> aFile=0x05719628, int aLine=0x0000012e)  Line 359 + 0xc bytes    C++
>      mime.dll!MimeObject_parse_eof(MimeObject * obj=0x064878a8, int
> abort_p=0x00000000)  Line 302 + 0x25 bytes    C++
>      mime.dll!MimeContainer_parse_eof(MimeObject * object=0x064878a8, int
> abort_p=0x00000000)  Line 129 + 0xe bytes    C++

I'm not familiar with that one.  What's the actual assertion text?  What type of message was it trying to process?

> as well as this one, quite frequently:
> 
pipnss.dll!nsCOMPtr<nsISMimeVerificationListener>::~nsCOMPtr<nsISMimeVerificationListener>()
>  Line 526    C++
>      pipnss.dll!nsSMimeVerificationJob::~nsSMimeVerificationJob()  Line 88 +
> 0x32 bytes    C++

I don't think I've actually seen this assertion either, but I had noticed that the SMime stuff was getting involved in an undesirable fashion not too long ago.  It apparently feeds events to the lock icon on the message pane, and may be involved in the deadlocks I had noticed a-ways back.

> Is this comment obsolete? You're not using setTimeout, or this._messenger, in
> indexer.js
> 
>     // we need this for setTimeout... what to do about this?

Yes, it's obsolete and went away sometime yesterday in a clean-up/update pass.  _messenger (which oddly inherited the comment) also went away... I think that was detritus from the mime-body-fetching before it entirely migrated to mimemsg.js.

> do we expect this pref to live permanently? If so, it's worth putting in
> mailnews.js so that it shows up in about:config and has a default.
> 
>     let branch = prefService.getBranch("mailnews.database.global.indexer");

I'm not sure.  I think Thunderbird will always want the event-driven indexing active, but it sounds like SeaMonkey may not want it on by default?  I created the pref as a better alternative to having chrome reach in to turn that on, but without any deep rationale.
 
> datamodel.js:
> 
> GlodaConversation.prototype = {
> 
> Conversations might want have an attribute for the number of unread messages,
> though it's easily calculated, obviously.

What specific use-case are you imagining here?  We can easily create a getter that performs this tally (by forcing the messages to be loaded and then tallying).  If the goal is just to be able to query for conversations with unread messages, the message attribute schema lets us find conversations with at least one unread message (although the query mechanism does not yet expose that).  But if we want to query conversations based on their number of un-read, or ordering them, then it definitely makes sense to add the attribute explicitly and take the complexity hit to maintain it.  (We're already part-way there with the popularity for contacts and our desire to track first message date/last message date for the conversation, though.)
 
> I'm going through the js files one by one - there's a lot to grok!

I do get paid by the line.
(In reply to comment #15)
> as well as this one, quite frequently:
> 
>      xpcom_core.dll!NS_DebugBreak_P(unsigned int aSeverity=0x00000001, const
> char * aStr=0x0571f198, const char * aExpr=0x0571f164, const char *
> aFile=0x0571f124, int aLine=0x00000113)  Line 359 + 0xc bytes    C++
>      mime.dll!nsSMimeVerificationListener::Release()  Line 275 + 0x6c bytes   
pipnss.dll!nsCOMPtr<nsISMimeVerificationListener>::~nsCOMPtr<nsISMimeVerificationListener>()
>  Line 526    C++
>      pipnss.dll!nsSMimeVerificationJob::~nsSMimeVerificationJob()  Line 88 +
> 0x32 bytes    C++
>      pipnss.dll!nsSMimeVerificationJob::`scalar deleting destructor'()  + 0xf
> bytes    C++
>      pipnss.dll!nsCertVerificationThread::Run()  Line 150 + 0x20 bytes    C++
>      pipnss.dll!nsPSMBackgroundThread::nsThreadRunner(void * arg=0x02aa0c80) 
> Line 45    C++

I believe this is the assertion described by bug 379661.  So, while the problem is not confined to gloda, I think I still need to investigate/deal with the SMIME interaction.
yes, I probably just saw it because gloda streams all my messages through libmime, the weakest link :-)
OK, I've updated to use the expmess repository.

Re using the idle service to know when to throttle up the indexing, I think that's a good plan. My brief experience with gloda and expmess is that indexing while I'm doing stuff is pretty painful, so much so that I would really throttle down the amount of stuff that's done when not idle. When new messages arrive, I would put them on a queue of messages to be indexed, and not index them right away. Similarly, if the user selects 100 messages and deletes them, we'd really like to avoid doing all the gloda housekeeping right away (it looks like you have infrastructure in place to do all this, so it's probably just a matter of being less aggressive while not idle). Also (I apologize for not fully groking the code :-) ), can/do you do the body indexing separate from the basic adding of the header to gloda? As you say, streaming the message through libmime is pretty cpu intensive.

I realize there are tradeoffs between having the gloda data up to date, and keeping the UI performant, and there's always the shutdown case. It might be simplest just to remember which folders have pending operations, and sync those on the next run, instead of building full-blown persistance of your job queues. 

I'm on Windows, on a rather old box, but I really did feel the pain when new mail came in and they started getting indexed while I was reading and deleting them...

I haven't seen the row count assertions since updating expmess.

>I'm not familiar with that one.  What's the actual assertion text?  What type
> of message was it trying to process?
I'll save it next time I see it - "already parse eof/eom" was one of them, iirc.

>> Conversations might want have an attribute for the number of unread messages,
>> though it's easily calculated, obviously.

> What specific use-case are you imagining here? 
Nothing specific - as a user, I find the unread count in a thread to be a very interesting bit of information, often, but I'm not sure if that translates into the need to be able to query based on that. And it's a royal pain when that count gets wrong :-)
(In reply to comment #20)

Just want to chime in that I'd like to keep the UI (future expmess systems) up to date as possible.  If that means we only index the headers and then queue up the body or throttle the indexing or a combination of those.  It's very important for our user experience to keep the index as updated as possible even while the user is performing other actions; however it's probably more important for headers than the body.
(In reply to comment #20)
> My brief experience with gloda and expmess is that indexing
> while I'm doing stuff is pretty painful, so much so that I would really
> throttle down the amount of stuff that's done when not idle.

Seems like chunking up the work into a finer granularity might be another way to address this.
(In reply to comment #21)
> Just want to chime in that I'd like to keep the UI (future expmess systems) up
> to date as possible.  If that means we only index the headers and then queue up
> the body or throttle the indexing or a combination of those.  It's very
> important for our user experience to keep the index as updated as possible even
> while the user is performing other actions; however it's probably more
> important for headers than the body.

Because some gloda plugins may alter important attributes of the message as indexed, gloda really needs to have all of the headers available when indexing.  Without doing streaming, all we have is what the nsIMsgDBHdr provides to us, which is only a subset of the headers.  When we stream the body, we get all the headers and the body (and potential attachment information).

We could try and have the gloda plugins register the headers they might want to see and force Thunderbird to fetch those during its nsIMsgDBHdr population.  We could also try streaming just the headers.  But I think both of those are probably premature optimization.

I think we need some numbers.  It probably doesn't help that gloda is also currently logging a rather large amount of debug to stdout (via log4moz via dump)... that could alter performance too.

One optimization that would also probably not be premature would be to move towards using the async storage API.  The transaction commit happens from the main thread right now, which will never be cheap (fsync), not to mention the rest of the synchronous queries going on.
Here's the assertion I see - "obj already parsed" is the string, I think.

 	xpcom_core.dll!NS_DebugBreak_P(unsigned int aSeverity=0x00000001, const char * aStr=0x02ae9678, const char * aExpr=0x02ae9668, const char * aFile=0x02ae9628, int aLine=0x0000012e)  Line 359 + 0xc bytes	C++
>	mime.dll!MimeObject_parse_eof(MimeObject * obj=0x0c9c04d0, int abort_p=0x00000000)  Line 302 + 0x25 bytes	C++
 	mime.dll!MimeContainer_parse_eof(MimeObject * object=0x0c9c04d0, int abort_p=0x00000000)  Line 129 + 0xe bytes	C++
 	mime.dll!MimeContainer_finalize(MimeObject * object=0x0c9c04d0)  Line 98 + 0x10 bytes	C++
 	mime.dll!MimeUntypedText_finalize(MimeObject * object=0x0c9c04d0)  Line 117 + 0xa bytes	C++
 	mime.dll!mime_free(MimeObject * object=0x0c9c04d0)  Line 323 + 0xe bytes	C++
 	mime.dll!MimeContainer_finalize(MimeObject * object=0x0c9bc058)  Line 109 + 0x9 bytes	C++
 	mime.dll!MimeMessage_finalize(MimeObject * object=0x0c9bc058)  Line 126 + 0xa bytes	C++
 	mime.dll!mime_free(MimeObject * object=0x0c9bc058)  Line 323 + 0xe bytes	C++
 	mime.dll!mime_display_stream_complete(_nsMIMESession * stream=0x0c9bc128)  Line 989 + 0x9 bytes	C++
 	mime.dll!nsStreamConverter::OnStopRequest(nsIRequest * request=0x0c9b6b1c, nsISupports * ctxt=0x0c9b67e0, unsigned int status=0x00000000)  Line 1049 + 0xf bytes	C++
 	msgbsutl.dll!nsMsgProtocol::OnStopRequest(nsIRequest * request=0x0c9bc6e8, nsISupports * ctxt=0x0c9b67e0, unsigned int aStatus=0x00000000)  Line 389 + 0x5a bytes	C++
3.0b1 flag is going away in favour of 3.0 flag and milestone combination.
Flags: blocking-thunderbird3.0b1+
  md5HashString: function gloda_utils_md5hash(aString) {
    let converter = Cc["@mozilla.org/intl/scriptableunicodeconverter"].
                    createInstance(Ci.nsIScriptableUnicodeConverter);
    let trash = {};
    converter.charset = "UTF-8";
    let emailArr = converter.convertToByteArray(aString, trash);

    let hasher = Cc['@mozilla.org/security/hash;1'].

"emailArr" seems an odd var name here, if this is a generic utility function :-)


Some of this code is commented out - does that make it inconsistent with the comment about 'infinity'?

    // However, if this state is a leaf node (end == 'infinity'), then 'end'
    //  isn't describing an edge at all and we want to avoid accounting for it.
    let delta;
    /*
    if (state.end != this._infinity)
      //delta = index - end + 1;
      delta = end - (index - state.length); 
    else */
    delta = index - state.length - end + 1;

There are a lot of dump statements, e.g.,

dump("considering " + this._str.slice(stringStart, stringEnd) + " with pos " + 
     stringStart + ":" + stringEnd + " with state " +
     aState.start + ":" + aState.end +
     "(patFirst:" + patternFirst + " patLast: " + patternLast +
     " delta: " + aDelta + ")\n");

  if (patternFirst >= stringStart) {
    if (!(stringStart in aPresence)) {
dump("  adding! (patternFirst: " + patternFirst + ")\n");
      aPresence[stringStart] = true;
      aResults.push(this._offsetsToItems[mid*3+2]);
    }
  }
  else {
dump("  disregarding because pattern is not contained. (patternLast:" +
  patternLast + " aPatLength: " + aPatLength + " stringstart: " +
  stringStart + "\n"); 
  }

these should either be turned into logging, or removed, or turned into calls to a function which calls dump based on a global bool, defaulted to false.
I think you also need this patch to get gloda to build if you build from the top - I had to build from mailnews/db otherwise, to get gloda to build. I haven't finished checking that this will fix it, but I suspect it will.
I was never seeing anything from expmess when I clicked on a message, other than the frowny face :-( This fixes it for me. I suspect we were getting an ldap directory, which throws an exception when this is called.  Andrew, let me know if you want me to just land this in your repo, or if you want to fix it another way...
Attachment #338137 - Flags: review?(bugmail)
Comment on attachment 338137 [details] [diff] [review]
get expmess working if  exception is thrown getting card for e-mail address - checked in

LDAP, eh?  Feel free to land it in the gloda repo.
Attachment #338137 - Flags: review?(bugmail) → review+
(In reply to comment #29)
> (From update of attachment 338137 [details] [diff] [review])
> LDAP, eh?  Feel free to land it in the gloda repo.

Yes, not all address book types can support this function at the moment.
Comment on attachment 338137 [details] [diff] [review]
get expmess working if  exception is thrown getting card for e-mail address - checked in

finally coerced hg into letting me push
Attachment #338137 - Attachment description: get expmess working if exception is thrown getting card for e-mail address → get expmess working if exception is thrown getting card for e-mail address - checked in
One other issue I've run into - gloda seems to update folders on startup, and I run into an assertion because we're in the middle of classifying new mail as junk or not junk, because the ui is getting new mail at the same time. I've hit this for my pop3 inbox, and the assertion is in nsMsgLocalMailFolder::SpamFilterClassifyMessages. This is a potentially bad assertion because if we get out of sync with our classification requests, it can break things like new mail notification and junk mail processing. 

There's no particular reason for the indexer to update local folders, since unlike IMAP, it doesn't really do anything interesting as far as the indexer is concerned. I guess you're doing it to make sure the .msf is up to date...which is a bit of an edge case, but I can see why you'd want to do that. But consider the imap case, where update folder will issue a select on the imap server and download new messages; does the indexer want to do that? As you've pointed out before, you'll need to sync with Emre's auto update stuff as well...

You could actually just call GetDatabaseWithReparse for local mail folders. Or I could add a new method to nsIMsgFolder that does what you want for both kinds of folders. UpdateFolder is really meant more for opening folders in the UI, since it kinda assumes that's what's going on (hence the running of the junk mail controls). Or I could treat UpdateFolder with a null msg window as meaning it's not coming from the UI...
we were generating an error before because the base class returns an error. Gloda & friends wants this not to error out.
Attachment #338704 - Flags: superreview?(neil)
Attachment #338704 - Flags: review?(bugmail)
(In reply to comment #32)
> One other issue I've run into - gloda seems to update folders on startup, and I
> run into an assertion because we're in the middle of classifying new mail as
> junk or not junk, because the ui is getting new mail at the same time. I've hit

Right, I initially did this for local folders because sometimes in my testing (because of me, generally), the .msf files did not yet exist.  I wanted to use GetDatabaseWithReparse, but wanted to use nsIMsgFolder and avoid a downcast.  It seemed like a nice freebie that the IMAP state should be up-to-date to boot.  (After all, what's the point of a global index if you have to click on things to get the results to be accurate...)

Since Emre's code will be a given, I think I can rely on it to make sure the IMAP is sufficiently up-to-date.  That just leaves avoiding dying when the msf doesn't exist yet.  If that means an instanceof check and a call to GetDatabaseWithReparse, that works for me.
Attachment #338704 - Flags: review?(bugmail) → review+
Comment on attachment 338704 [details] [diff] [review]
make seach view return a view type - checked in

Patch looks good; applies and makes the problem go away.  Danke.
Attachment #338704 - Flags: superreview?(neil) → superreview+
Comment on attachment 338704 [details] [diff] [review]
make seach view return a view type - checked in

Basically this function is the inverse of the switch in CreateBareDBView, right?
Comment on attachment 338704 [details] [diff] [review]
make seach view return a view type - checked in

yes, it's the same kind of thing...
Attachment #338704 - Attachment description: make seach view return a view type → make seach view return a view type - checked in
Alias: gloda
* I shut Thunderbird down while it was in the middle of downloading my
  giant IMAP Trash folder the first time.  Saw the following debug
  output:

2008-09-17 19:11:14     gloda.indexer   INFO    Shutting Down
2008-09-17 19:11:14     gloda.indexer   INFO    Event-Driven Indexing is now false
2008-09-17 19:11:14     gloda.datastore INFO    Closing pending transaction out for shutdown.
2008-09-17 19:11:14     gloda.datastore INFO    Closing async connection
WARNING: sqlite3_close failed. There are probably outstanding statements!: file /Users/dmose/s/gloda/src/mozilla/storage/src/mozStorageConnection.cpp, line 236
************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "Component returned failure code: 0x8052000e (NS_ERROR_FILE_IS_LOCKED) [mozIStorageConnection.close]"  nsresult: "0x8052000e (NS_ERROR_FILE_IS_LOCKED)"  location: "JS frame :: file:///Users/dmose/s/gloda/obj-tb/mozilla/dist/ShredderDebug.app/Contents/MacOS/modules/datastore.js :: gloda_ds_shutdown :: line 493"  data: no]
************************************************************
2008-09-17 19:11:14     gloda.datastore INFO    Closing async connection
************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "Component returned failure code: 0xc1f30001 (NS_ERROR_NOT_INITIALIZED) [mozIStorageConnection.close]"  nsresult: "0xc1f30001 (NS_ERROR_NOT_INITIALIZED)"  location: "JS frame :: file:///Users/dmose/s/gloda/obj-tb/mozilla/dist/ShredderDebug.app/Contents/MacOS/modules/datastore.js :: gloda_ds_shutdown :: line 493"  data: no]
************************************************************

* Doing it a second time, I ended up with a few of these as well:

###!!! ASSERTION: Still pending events!: 'mPendingEvents.Count() == 0', file /Users/dmose/s/gloda/src/mozilla/storage/src/mozStorageEvents.cpp, line 462

Both of these were running without the IMAP changes from Emre, FWIW.
per dmose's request I have created a wiki page to track the various issues raised by reviewers here:
https://wiki.mozilla.org/User:Andrew_Sutherland/MailNews/GlobalDatabase/Review

It is accessible off of the URL link on the bug as well.

I have no problem with me maintaining this page as a derived status indicator from the bug here.  Alternatively, if the reviewers would prefer, we can move much of the review talk there with.  (One can 'watch' the page and enable e-mail notification on watch changes to know something has changed.)
Whiteboard: [needs review bienvenu] [needs review dmose] → [under review bienvenu] [under review dmose]
Depends on: 456272
Attachment #337873 - Attachment is patch: true
Due to not quite being ready, this is being moved out to beta 2.
Target Milestone: Thunderbird 3.0b1 → Thunderbird 3.0b2
I think we still want a pref to be able to disable gloda, before we land this. I think a hidden pref is OK for nightlies but Bryan's thoughts would be helpful.
bug 455978 has the pref design (or possible lack of...)
this is what I landed to turn on gloda in the build system - changeset:   1007:7d8fe0015267
Attachment #346312 - Flags: superreview+
Attachment #346312 - Flags: review+
marking fixed - we should have a follow up bug for turning it on by default, if that's OK (otherwise, we could re-open this)
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
> we should have a follow up bug for turning it on by default

What do I need to do to "turn it on"?

Set a pref or enable a build option? This pref?
+// Should the indexer be enabled?
+pref("mailnews.database.global.indexer.enabled", false);

BTW: Can you please change the comment so that it's not repeating the pref name? For example, I don't know what "indexer" means? Does it mean enabling gloda? Or the full-text indexing?

Either way, I want to enable gloda, but not full-text indexing. How do I do that?
(In reply to comment #46)
> BTW: Can you please change the comment so that it's not repeating the pref
> name? For example, I don't know what "indexer" means? Does it mean enabling
> gloda? Or the full-text indexing?

Indexing results in the creation of the gloda representation of the message.  If you don't enable indexing, gloda is there, it just won't process any messages (or address book cards).  The 'gloda representation of messages' includes storing the subject, textual body of a message, and names of attachments in a full-text search capable manner.

Indexing only happens on messages that are offline (so local and IMAP offline)
 
> Either way, I want to enable gloda, but not full-text indexing. How do I do
> that?

You can't right now.
(In reply to comment #46)
> 
> Either way, I want to enable gloda, but not full-text indexing. How do I do
> that?

Why?  What's the use case?
I have lots of mail and:
* I don't want to have it all in /one/ huge database file, together with important metadata.
* I don't want another copy of it - it's a waste of space in my case, as my server is on the local network.
* I can use the TB UI to search on the server when I need full-text search.
* There can be privacy implications of having another copy of stuff that the user is not aware of (e.g. has deleted it in another client and assumes it's gone).


Having a 2GB cache file in my home folder makes quite a few things harder, e.g. backup, rsync, fulltext search across my home folder using grep etc..

E.g. today, TB kept crashing and I deleted all non-important data from my profile.

In fact, I'd recommend that you put the fulltext cache into a separate DB file (on disk). You'd still use the IDs from the metadata DB, so it shouldn't be too much of a problem. That would allow easy deletion, exclusion from backup/sync, recovery, etc..

I think the pref to turn this on/off should even be user-visible, as it's a tradeoff and it doesn't work for some people. As timeless said in Barcelona, you don't want that on an EeePC or Nokia, and he was assured that such people  could turn full-text indexing off.


BTW, I am assuming that gloda is just a cache and will never ever be the primary storage for any important user data, e.g. address books, otherwise all the problems above would be far worse, plus many other serious problems.
Note that we currently also have a per-folder and per-account pref to control "offline" storage.
Is there a special reason why we now have those additional menuitems in context and tools menus that 1) are non-understandable for normal users, 2) sound like they're for gloda testing/debugging only and 3) are just confusing (I'm used to "delete message" being the last item on the message context menu)?
I see those in the current state (gloda isn't even turned on, right?) in my SeaMonkey builds and I don't think we should go and deliver any pre-release of either of our products with those confusing menuitems (btw, a menuitem that doesn't do anything visible and has no feedback at all is bad as well).
(In reply to comment #49)
> Having a 2GB cache file in my home folder makes quite a few things harder, e.g.
> backup, rsync, fulltext search across my home folder using grep etc..

Is gloda's database growing to 2 GB for you, or is this a made-up number?  How much mail are we talking about?

What alternate location would you propose for the file so this does not pose a problem to your use cases, while also not compromising the privacy use case you also cite above?
 
> In fact, I'd recommend that you put the fulltext cache into a separate DB file
> (on disk). You'd still use the IDs from the metadata DB, so it shouldn't be too
> much of a problem. That would allow easy deletion, exclusion from backup/sync,
> recovery, etc..

I'm not sure I see the benefit from separating them.  The user either wants full-text search or they don't.  If they want full-text search, why would they delete the full-text search database?  In order to rebuild the full-text search index, we'd need to re-index everything again, so they might have just as well have deleted both databases.

And if they don't want to back-up the full-text index, why would they want to back up the meta-data index?

This does suggest that perhaps the database name should be something like "global-message-index-you-can-delete-me-but-it-will-take-a-while-to-regenerate-me.sqlite".

> BTW, I am assuming that gloda is just a cache and will never ever be the
> primary storage for any important user data, e.g. address books, otherwise all
> the problems above would be far worse, plus many other serious problems.

"Gloda is a index" is the motto.
> Is gloda's database growing to 2 GB for you, or is this a made-up number?

The number is made up.
(I have not enabled it yet, because I don't want full-text indexing.)

> How much mail are we talking about?

10 years of mail :).
Not including spam, but including attachments (which I know you don't index), it's about 1.5 GB (surprisingly little, actually).

Including all spam, it's some 100 GB.

> What alternate location would you propose for the file so this does
> not pose a problem to your use cases, while also not compromising
> the privacy use case you also cite above?

Good question. There isn't really a place for user-specific caches or non-primary application data, at least on Unix, and it's a real pain.

I think there's no better place than /home/.../.thunderbird/ for gloda.

> "Gloda is a index" is the motto.

Perfect! :-)

BTW: It's not visible in my comments, but I am quite excited about gloda. The papers sound great, a sound design. I am itching to try it out.
(In reply to comment #53)
> Not including spam, but including attachments (which I know you don't index),
> it's about 1.5 GB (surprisingly little, actually).
> 
> Including all spam, it's some 100 GB.
Why do you keep so much, just to train spam filters? Or out of historical/statistical interest?
Sound like you're only concern is to exclude the spam folder from the index. 
1.5 GB including attachments should result in no more than a couple of dozen or hundred MB.

I don't keep spam, but I'd want to exclude my server-side-filtered almost-certain-spam folders.
Maybe I'd also exclude my trash folders, although others argued that they need to find accidentally deleted mail as well.
Attachment #335946 - Attachment is obsolete: true
Attachment #335946 - Flags: superreview?(bienvenu)
Attachment #335946 - Flags: review?(dmose)
Attachment #335946 - Flags: review?(bienvenu)
Whiteboard: [under review bienvenu] [under review dmose]
> > Including all spam, it's some 100 GB.
> Why do you keep so much, just to train spam filters?

Because there might still be ham underneath it. I don't trust spam filters. I'll only delete it once I'm sure there wasn't a mail from a friend underneath it. It's currently gzipped, not even on the IMAP server.

> I'd want to exclude my server-side-filtered almost-certain-spam folders.

Yes, please.

Going back to the original question: We currently have the "offline" storage pref per account and per folder. I think we need the same for gloda full-text indexing, maybe even reusing the same UI checkbox.
removing the nickname, as it makes searching for gloda bugs harder.
Alias: gloda
not sure about 2gb. i know that historically w/ some of my old mail boxes a mailbox index (this should have been headers only, since the mailbox was imap) was about 10% the cost of the mailbox. My gmail mailbox is about 5gb atm. My non gmail mailbox was 3.5gb (6.8million messages) when i took a backup of it in November.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: