Closed Bug 456326 Opened 16 years ago Closed 16 years ago

"ASSERTION: Some other operation is in progress" when synchronizing mails

Categories

(MailNews Core :: Backend, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0a3

People

(Reporter: gkw, Assigned: bugmil.ebirol)

Details

(Keywords: assertion)

Attachments

(3 files, 7 obsolete files)

###!!! ASSERTION: Some other operation is in progress: 'PR_FALSE', file /Users/skywalker/comm-central/mailnews/db/msgdb/src/nsMailDatabase.cpp, line 177

I get this when synchronizing IMAP mails, on the latest compiled TB debug build on 10.5.5.
The cause of the assertion is that someone is holding the lock on the folder. It seems that the lock is acquired by either compaction or offline download. I'm guessing the latter is causing it.

I recommend moving to MailNews Core: Database, but I'm not an expert here.
(In reply to comment #1)
> I recommend moving to MailNews Core: Database, but I'm not an expert here.

Ditto me. It seems to be related to the IMAP synchronization code when I was testing it though...

This assertion was mentioned by dmose here too:
https://bugzilla.mozilla.org/show_bug.cgi?id=455963#c10
This really belongs in IMAP, but putting it in mailnews core/backend for now because we've got the right tracking milestones there.  Here is some debug spew from a trunk build:

*** Skipping folder imap://dmose2@mail.mac.com/INBOX [Offline: no]
*** Skipping folder imap://dmose2@mail.mac.com/Sent Messages [Offline: no]
*** Skipping folder imap://dmose2@mail.mac.com/Trash [Offline: no]
in msgsDeleted
encoded message id = 2583B86B-DAD6-41C1-827A-338CC7869CA4%40mozilla.com
getting support file path = /Users/dmose/Library/Thunderbird/Profiles/7gmvijow.default/ImapMail/mail.mozilla.com/INBOX.mozmsgs/2583B86B-DAD6-41C1-827A-338CC7869CA4%40mozilla.com.mozeml
Downloading 5 messages for folder imap://dmosedale@mail.mozilla.com/INBOX
Downloading 1 messages for folder imap://dmosedale@mail.mozilla.com/INBOX
WARNING: nsImapMockChannel::SetContentCharset() not implemented: file /Users/dmose/s/comm-central-trunk/src/mailnews/imap/src/nsImapProtocol.cpp, line 8651
WARNING: nsImapMockChannel::SetContentCharset() not implemented: file /Users/dmose/s/comm-central-trunk/src/mailnews/imap/src/nsImapProtocol.cpp, line 8651
Downloading 1 messages for folder imap://dmosedale@mail.mozilla.com/INBOX
in msgsMoveCopyCompleted
encoded message id = 01c91cd0%249d069780%24128ee4d5%40tevvg
getting support file path = /Users/dmose/Library/Thunderbird/Profiles/7gmvijow.default/ImapMail/mail.mozilla.com/INBOX.mozmsgs/01c91cd0%249d069780%24128ee4d5%40tevvg.mozeml
encoded message id = 20080922162344.1167DB8141%40dm-mail01.mozilla.org
getting support file path = /Users/dmose/Library/Thunderbird/Profiles/7gmvijow.default/ImapMail/mail.mozilla.com/INBOX.mozmsgs/20080922162344.1167DB8141%40dm-mail01.mozilla.org.mozeml
encoded message id = 20080922160926.0C141B80E8%40dm-mail01.mozilla.org
getting support file path = /Users/dmose/Library/Thunderbird/Profiles/7gmvijow.default/ImapMail/mail.mozilla.com/INBOX.mozmsgs/20080922160926.0C141B80E8%40dm-mail01.mozilla.org.mozeml
moveCopyCompleted move = true
###!!! ASSERTION: Some other operation is in progress: 'PR_FALSE', file /Users/dmose/s/comm-central-trunk/src/mailnews/db/msgdb/src/nsMailDatabase.cpp, line 177
*** Folder imap://dmosedale@mail.mozilla.com/INBOX is syncd


And here's some debug spew from a trunk build with the new gloda indexer patches:

Downloading 1 messages for folder imap://dmosedale%40mozilla.com@mail.mozilla.com/INBOX
Downloading 3 messages for folder imap://dmosedale%40mozilla.com@mail.mozilla.com/INBOX
###!!! ASSERTION: Some other operation is in progress: 'PR_FALSE', file /Users/dmose/s/gloda/src/mailnews/db/msgdb/src/nsMailDatabase.cpp, line 177
Assignee: nobody → bugmil.ebirol
Component: General → Backend
Product: Thunderbird → MailNews Core
QA Contact: general → backend
Target Milestone: --- → Thunderbird 3.0b1
This is how to reproduce it:

- Move a large message from one folder to another in the same imap account -
make sure that the message body large enough so download will take 20-25
seconds.
- Select the destination folder that wait until the headers are updated. 
- Delete another message in the same folder. 
The assertion will show up on the console.

Like Joshua put it at #1, delete operation can't acquire the database lock
during background download and gives this assertion. This is true because all
delete/move operations are converted into pseudo-offline operations with bug
435153, means that, tb needs to create fake headers prior to send the delete
request (url) to the imap protocol layer. Since the database is not available
for writing, delete operation fails. This is also true for compacting. 

I believe copy is different since imap copy command is sent before the database
header creation, and imap protocol layer makes the copy request (copy url) is
queued until the download request (download url) is completed.

IDLE shouldn't be a problem neither since I don't think the imap server would
send us an IDLE command in the middle of a message transfer on the same folder.
But I need to confirm that.

Couple ideas to solve this problem:

1) Do not download immediately if the message size is larger than the default
group size.
2) Cancel background download operations when tb comes back from idle.
3) As I have suggested at
https://bugzilla.mozilla.org/show_bug.cgi?id=435153#c39 while implementing the
pseudo-offline feature, let the same manager/queue coordinates the
pseudo-offline and background download requests.
4) Make pseudo-offline aware of auto-sync, so it can wait until the background
op is completed.
5) Change the database locking system for better
6) if IDLE poses a problem then use the aging mechanism that we removed from 
435153's patch.

I think (1) should be implemented, and very easy to do it. (2) sounds like a
good idea but imap protocol layer doesn't provide us any mechanism to cancel
on-going url operations. A hack would be to close the session. (3) It is too
much effort at this point to implement it again. (4) Looks like
low-hanging-fruit approach. (5) Don't think it is even feasible. (6) Same as
#3.

Thoughts?
>- Select the destination folder that wait until the headers are updated. 

should be 

Select the destination folder **then** wait until the headers are updated.
Is this a dup? I could have sworn I explained the cause of this in another bug, and you were going to fix at least one case of it by avoiding grabbing the semaphore until after the select had happened using the new progress listener stuff?
Are you referring to our IRC discussion about folders not getting updating their contents? IIRC, you said that each download operation puts the connection into "Select" state first, and if imap layer finds out that there are new pending headers on the server, tries to add them into the database. But since auto-sync acquired the semaphore before select, operation fails and folder can't update its content. 

I believe it was rkent or wayne who brought up this issue, but not sure if a bug filed for it or not. Can you remember the bug number?

I think both are related but not the same I guess.
It could have been just an irc discussion, which is why it got lost track of, I guess. But what I said was that the imap layer finds there are headers that need to be deleted from the database, and that tries to grab the semaphore. Downloading headers doesn't grab the offline store sempahore.

I was just looking at this code, along with your example above:

nsAutoSyncManager::OnDownloadQChanged

  if (NS_SUCCEEDED(rv) && nsAutoSyncState::stCompletedIdle == state && !excluded)
  {
    // add this folder into the priority queue - if state == stCompletedIdle shouldn't be
    // in the priority queue
    autoSyncStateObj->SetState(nsAutoSyncState::stReadyToDownload);
    ScheduleFolderForOfflineDownload(autoSyncStateObj);
    
    if (mDownloadModel == dmParallel || !DoesQContainAnySiblingOf(mPriorityQ, autoSyncStateObj))
    {
      rv = DownloadMessagesForOffline(autoSyncStateObj);
      if (NS_FAILED(rv))
        HandleDownloadErrorFor(autoSyncStateObj);
    }
  }

It looks to me like this code will attempt to download message bodies even if we're not in the idle state. I don't think state == stCompletedIdle really refers to the user being idle, but I could be confused. Am I misunderstanding this code?
OK, the code that gets the stream and checks the semaphore and asserts should all not be called for imap. I'll attach a fix in a minute. We're not touching the offline store in the delete case, so there's no need for it to try to grab the semaphore.
Attached patch proposed fix (obsolete) — Splinter Review
imap db's don't need to grab the semaphore in this case, since deleting an imap message doesn't touch the offline store/local mailbox, unlike deleting a local message.
Attachment #339851 - Flags: superreview?(neil)
Attachment #339851 - Flags: review?(bugzilla)
>It looks to me like this code will attempt to download message bodies even if
>we're not in the idle state.

Correct, it does download first group of messages. It was our decision at the
meeting in San Fran that we download the first group before idle to make the
most recent messages available immediately.
By first group of messages, do you mean up to 50k of messages, or do you mean all the messages whose headers we just downloaded? (which would make the first run case a bit painful...)
Up to 50k. But, in case where the highest priority message in the queue is larger than the group size, and not excluded by the strategy, we don't mind, and we proceed with download of this single message.

Comment #4, item #1 suggests to change this behavior.
(In reply to comment #13)
> Up to 50k. But, in case where the highest priority message in the queue is
> larger than the group size, and not excluded by the strategy, we don't mind,
> and we proceed with download of this single message.
> 
> Comment #4, item #1 suggests to change this behavior.

In that case, that's an excellent idea, thx.
Comment on attachment 339851 [details] [diff] [review]
proposed fix

>+// we override this to avoid our parent class (nsMailDatabase)'s 
>+// grabbing of the folder semaphore, and bailing on failure.
>+NS_IMETHODIMP nsImapMailDatabase::DeleteMessages(nsTArray<nsMsgKey>* nsMsgKeys, nsIDBChangeListener *instigator)
>+{
>+  return nsMsgDatabase::DeleteMessages(nsMsgKeys, instigator);
> }
> 
> PRBool nsImapMailDatabase::SetHdrFlag(nsIMsgDBHdr *msgHdr, PRBool bSet, MsgFlags flag)
> {
>   return nsMsgDatabase::SetHdrFlag(msgHdr, bSet, flag);
> }
> 
> // override so nsMailDatabase methods that deal with m_folderStream are *not* called
Rather than adding a new comment, I'd prefer if you moved the existing comment before these two functions that it happens to refer to ;-)
Attachment #339851 - Flags: superreview?(neil) → superreview+
I'm happy to move the old comment up...
Attachment #339851 - Flags: review?(bugzilla) → review+
Attachment #339903 - Flags: review?(bienvenu)
Comment on attachment 339903 [details] [diff] [review]
First cut: Limits the size of the first group of messages

+  NS_ENSURE_ARG_POINTER(aTotalSize);
+  
+  *aTotalSize = 0U;

that can just be *aTotalSize = 0; 

and

+  if (aSizeLimit > 0 && aSizeLimit < totalSize)
+    return NS_ERROR_FAILURE;

can just be if (aSizeLimit && ...
r=bienvenu with those nits fixed.
Attachment #339903 - Flags: superreview?(dmose)
Attachment #339903 - Flags: review?(bienvenu)
Attachment #339903 - Flags: review+
this is what I'm going to check in - it turns out the comment Neil wanted me to move really did apply to the two methods that came after it, so I left it there, and added new comments to the methods above.
Attachment #339851 - Attachment is obsolete: true
marking fixed - Emre's patch is a good thing, but it's really only tangentially related to this bug, I think.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Attached patch Rev 1 (obsolete) — Splinter Review
Sorry for the trouble David, the one you have reviewed is the wrong patch that I have mentioned this morning at IRC.

This one fixes your nits, adds more comments, and adds more defensive code to some nsAutoSyncManager methods.
Attachment #339903 - Attachment is obsolete: true
Attachment #340029 - Flags: superreview?(bienvenu)
Attachment #340029 - Flags: review?(bienvenu)
Attachment #339903 - Flags: superreview?(dmose)
Comment on attachment 340029 [details] [diff] [review]
Rev 1

I'm not sure why you need both to pass in a size limit, and return the out size - couldn't you just pass in the size limit, and require getNextGroupOfMessages not to return more messages than will fit in that size? If none fit, return an empty array or an error? In other words, getNextGroupOfMessages should enforce the limit.

+  mGroupSize = (aGroupSize == 0 ? kDefaultGroupSize : aGroupSize);

this would look nicer as 

mGroupSize = aGroupSize ? aGroupSize : kDefaultGroupSize;

Not sure I buy this change:

-  mFolderStrategyImpl = aFolderStrategy;
+  if (aFolderStrategy)
+    mFolderStrategyImpl = aFolderStrategy;

What if the caller wants to make you release your ref to a previous mFolderStrategy? With this change, they couldn't do that.
>I'm not sure why you need both to pass in a size limit, and return the out size

I am passing the suggested-group-size to getNextGroupOfMessages as a parameter now, since it is only called by nsAutoSyncManager. In the current code we get this value by calling nsAutoSyncManager->GetGroupSize() method in the getNextGroupOfMessages.

>couldn't you just pass in the size limit, and require getNextGroupOfMessages
>not to return more messages than will fit in that size? If none fit, return an
>empty array or an error? In other words, getNextGroupOfMessages should enforce
>the limit.

I just wanted to give the control to the caller.

>What if the caller wants to make you release your ref to a previous
>mFolderStrategy? With this change, they couldn't do that.

Good point, not sure why they might want to do that but if they do, better be flexible.
 > I just wanted to give the control to the caller.

The caller is saying "don't give me more than XX" - that's where the control is, I would say.
I prefer to do this decision in nsAutoSyncManager::DownloadMessagesForOffline instead of nsAutoSyncState::GetNextGroupOfMessages. I thought you agree with me since you have already r+ this change.
Then why pass in the size requested? You weren't doing that in the previous patch. That's what makes the method seem weird.
Attached patch Rev 1.2 (obsolete) — Splinter Review
Attachment #340029 - Attachment is obsolete: true
Attachment #340041 - Flags: superreview?(bienvenu)
Attachment #340041 - Flags: review?(bienvenu)
Attachment #340029 - Flags: superreview?(bienvenu)
Attachment #340029 - Flags: review?(bienvenu)
Attachment #340041 - Flags: superreview?(bienvenu) → superreview?(dmose)
Actually only difference is "how GetNextGroupOfMessages getting the GroupSize".
In the original auto-sync code GetNextGroupOfMessages was getting it from
nsAutoSyncManager service's GetGroupSize() method. Now instead of making a
method call in GetNextGroupOfMessages we pass this value as an argument.

GroupSize is a suggested one, and it doesn't guarantee that the size of the
message in the group exceed this size. This is true for example if there is
only 1 message waiting to be downloaded and it is larger than suggested group
size. So, caller doesn't know exact size of the group. Second argument returns
this value to the caller in order to help her to make a decision based on its
place in the flow etc...

Sorry if argument names creates a confusion.
Attachment #340041 - Flags: review?(bienvenu) → review+
Comment on attachment 340041 [details] [diff] [review]
Rev 1.2

ok, then should aGroupSizeLimit be called "suggestedGroupSizeLimit" ? And GetNextGroupOfMessages is free to ignore it?
Attached patch Rev 2 (obsolete) — Splinter Review
>ok, then should aGroupSizeLimit be called "suggestedGroupSizeLimit" ? 

Sure, done. I am calling the second parameter aActualGroupSize, now to emphasize the difference.

>And GetNextGroupOfMessages is free to ignore it?

It is not free but if the next message that will be downloaded is larger than the given value, it has to download it anyway....
Attachment #340041 - Attachment is obsolete: true
Attachment #340041 - Flags: superreview?(dmose)
Attached patch Rev 3: post bug 455812 (obsolete) — Splinter Review
Summary of changes; 
- Limits the first, non-idle time download with default group size
- More comments
- Some defensive code

This patch is the result of the parallel effort to improve the robustness of the auto-sync code. This patch does not directly  address to the stated problem.
Attachment #340085 - Attachment is obsolete: true
Attachment #340149 - Flags: superreview?(dmose)
Attachment #340149 - Flags: review?(bienvenu)
Comment on attachment 340149 [details] [diff] [review]
Rev 3: post bug 455812 

some nits:

technically, you should rev the uuid of the interface, since you're changing it - in this case, the interface is so new that it's not needed, but it's a good habit to get into.

I just spotted this:

   if ( NS_FAILED(rv) || (NS_SUCCEEDED(rv) && 0 == count) )
     return NS_ERROR_FAILURE; 

so this is really just:

if (NS_FAILED(rv) || !count)

I don't know why, but I just find this weirdly unreadable:

+  PRUint32 totalSize = 0U;

compared to
PRUint32 totalSize = 0;

+        if ( 0 == (*aActualGroupSize) && msgSize >= aSuggestedGroupSizeLimit) 

if (!*aActualGroupSize ...)
Attachment #340149 - Flags: review?(bienvenu) → review+
Attached patch Rev 3.1 (obsolete) — Splinter Review
fixes the nits
Attachment #340192 - Flags: superreview+
Comment on attachment 340192 [details] [diff] [review]
Rev 3.1

Looks good; sr=dmose with a couple of nits fixed.

>+  nsIMutableArray getNextGroupOfMessages(in unsigned long aSuggestedGroupSizeLimit, 
>+                                        out unsigned long aActualGroupSize);

There's an indentation error here.

> {  
>   if (!aAutoSyncStateObj)
>     return NS_ERROR_INVALID_ARG;
>   
>   PRInt32 count;
>   nsresult rv = aAutoSyncStateObj->GetPendingMessageCount(&count);
>-  if ( NS_FAILED(rv) || (NS_SUCCEEDED(rv) && 0 == count) )
>+  // TODO: right thing to do here is to return an error to the caller saying that do not
>+  // try again. Not sure how to do it using nserror mechanism.
>+  // Note that we can't return success in case of 0 == count here since
>+  // we only remove the object from the queue in the OnDownloadCompleted method
>+  if (NS_FAILED(rv) || !count)
>     return NS_ERROR_FAILURE; 

Seems to me that the two cases here are somewhat different.  Getting an nsresult error code seems like it might want to return NS_ERROR_UNEXPECTED since it shouldn't really ever happen, whereas presumably the !count could happen because of a race where if that's possible).  Maybe NS_ERROR_NOT_AVAILABLE in that case?  Anyway, no need to fix this now, I'm just reacting to the comment, and if you agree you're welcome to update it.

>@@ -832,7 +851,7 @@ NS_IMETHODIMP nsAutoSyncManager::GetGrou
> }
> NS_IMETHODIMP nsAutoSyncManager::SetGroupSize(PRUint32 aGroupSize)
> {
>-  mGroupSize = aGroupSize;
>+  mGroupSize = aGroupSize ? aGroupSize : kDefaultGroupSize;
>   return NS_OK;

The current nsIAutoSyncManager.groupSize doxygen comments wouldn't leave me expecting this behavior at all (or knowing how to use the attribute).  Please update the doxygen comments to describe how the attribute actually works, including that the setter will interpret "0" to mean "use some reasonable default".
>presumably the !count could happen because of a race where if that's possible)

That should have read "presumably the !count could happen because of a race where one thread finishes the downloads while another thread is calling this method (if that's possible)".
>Getting an nsresult error code seems like it might want to return 
>NS_ERROR_UNEXPECTED since it shouldn't really ever happen, whereas presumably >the !count could happen because of a race where if that's possible).  Maybe
>NS_ERROR_NOT_AVAILABLE in that case?

Good idea! Because of the time constraint we have, I am going to file a new bug (bug 457342) for this improvement, submit a patch, and mark it milestone beta2. Thanks for the review.
Attachment #340192 - Attachment is obsolete: true
Whiteboard: [check-in needed]
Attachment #340149 - Attachment is obsolete: true
Attachment #340149 - Flags: superreview?(dmose)
Dan/David, are we taking this patch for beta 1? This bug isn't actually blocking but should it have been?
Whiteboard: [check-in needed] → [checkin needed - pre or post b1?]
we're not blocking for the remaining issue that this patch addresses; it's a nice to have, not a blocking issue. But I believe Dan said it was ok to take this patch on IRC, iirc.
Mark, this is the patch that we have discussed to move it to another bug or not. We didn't move it because Dan is convinced to land it with beta1 and accepted to review.
Attachment #340675 - Attachment description: Rev 4: fixes nits → [checked in] Rev 4: fixes nits
Comment on attachment 340675 [details] [diff] [review]
[checked in] Rev 4: fixes nits

Ok, I didn't see all those discussions. Landed patch, changeset id 449:77affe25fcd4.
Whiteboard: [checkin needed - pre or post b1?]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: