Closed Bug 457342 Opened 13 years ago Closed 13 years ago

Handle group element count == 0 case in nsAutoSyncManager::DownloadMessagesForOffline

Categories

(MailNews Core :: Backend, defect)

defect
Not set
minor

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0b1

People

(Reporter: bugmil.ebirol, Assigned: bugmil.ebirol)

Details

Attachments

(1 file, 4 obsolete files)

This is an minor improvement to handle unexpected cases in nsAutoSyncManager.
Flags: blocking-thunderbird3?
Biggest question is whether it is safe to remove an element while iterating nsCOMArray....
To reproduce:
- Move a large message from a folder to another on the same imap server
- Go to the destination folder, and select the message to display

Selecting the folder puts the folder into the download q, but since the message is large auto-sync manager postpones the download until the idle time. The user explicitly downloads the message by displaying. The next time auto-sync manager asks for messages to be downloaded from auto-sync state object of the folder, it returns nothing since all messages are literally downloaded. 

Without this patch, auto-sync manager interprets this situation as an error and 
the destination folder stays in the download q until something it has something to download. 

This bug doesn't affect synchronization of other folders, including the folders belong to the same imap server - HandleDownloadErrorFor() method deals with that properly. Auto-sync listeners receives contiguous download errors. 

This bug should be fixed before we implement bug 257942
Attached patch Rev 1 (obsolete) — Splinter Review
This patch also deals (gracefully) with the case where the user moves the messages while there are in the download queue.
Attachment #340678 - Attachment is obsolete: true
Attachment #341130 - Flags: superreview?(dmose)
Attachment #341130 - Flags: review?(bienvenu)
If you handled this case in DownloadMessages, then you wouldn't have to treat a non-error case as a special kind of error. Is there any reason not to handle that case in DownloadMessages?
Well, this occurs in the case where the caller thinks that the given folder has pending messages waiting to be downloaded, but in fact, it has not. So, from  DownloadMessagesForOffline() method's perspective this is an actual error, and naturally it should communicate this to the caller. For me, it is the caller's responsibility to recover from this error, since it drives the downloads, and the download queue.

If we handle this error in DownloadMessagesForOffline() method, first of, we overloads the logic of this function unnecessarily, second, we change the work flow of the application in a hidden way: the caller makes a request to DownloadMessagesForOffline() to download folderA, but to recover from this error, DownloadMessagesForOffline() switches to folderB without letting the caller know about it.
OK, thx for the explanation, that makes sense, my idea won't work.

But to be clear, I really don't like treating this as an error with an error code. DownloadMessagesForOffline is called in six places; with your patch, we'll check one of those places for the special error code, but the other five don't check, and will call HandleDownloadErrorFor. And that will call NOTIFY_LISTENERS...if the user can make this happen by innocently deleting a message, it doesn't seem like an error worthy of notifying listeners about. Maybe the one place you're checking is the one place where the situation can occur, but I can't really prove that to myself.

A different option is to have an out param in DownloadMessagesForOffline, and fill it in with the number/size of messages we're going to download. In the case where that's 0, the code can continue on to the next folder.
>I really don't like treating this as an error with an error code.

This is an actual error because the success criteria for DownloadMessagesForOffline is to put the download request URL into the URL queue. When this error case happens, DownloadMessagesForOffline cannot meet this criteria (because there is nothing to put into the queue),  and returns an error code explaining the situation. 

I understand that you don't consider "no message to download" case as an error case, and I agree that this is not an error for the user, but it is an error for nsAutoSyncManager since the folder is still hanging in the priority queue with an empty list. 

> Maybe the one place you're checking is the one place where the situation can occur

No, you are right, this is not the only place it can occur. _But_ this is the only place where we want to recover from this error, because this is the only place we initiate a download sequence for given account. At other places, DownloadMessagesForOffline is called because of the chaining and there are not good places to deal with this problem. Instead of trying to fix it, we simply skip the problem folder, and continue with the next in chain. HandleDownloadErrorFor method delivers this functionality.  They are not good places because the initiation point is the only spot that we catch all the problem folders. 

> it doesn't seem like an error worthy of notifying listeners about

if the folder is in the priority queue, it is guaranteed that the listener got a "folder added into the priority queue" notification. To complete the circle, we should notify them when we remove the folder from the queue as well. As you might notice, to recover from this error - to prevent the folder hanging in the queue for no reason -  we simple remove it. The notification belongs to that action.

Hope that makes sense.
Attachment #341130 - Flags: review?(bienvenu) → review+
Comment on attachment 341130 [details] [diff] [review]
Rev 1

> Instead of trying to fix it, we simply
> skip the problem folder, and continue with the next in chain.
> HandleDownloadErrorFor method delivers this functionality.

It looks to me like HandleDownloadErrorFor will retry the folder that didn't have any messages, even though we're 100% sure there aren't any messages to download and the retry will fail. Then that will fail and we'll advance to the next folder, but it's odd to try something we know will fail...

I'm sorry it seems like I'm fixated on the word "error". I agree that it's a situation that this particular caller of DownloadMessagesForOffline needs to handle;

It's not clear to me why changing the way you return this information from an rv to an out parameter will break the notification stuff since you're doing that notification here, if I understand correctly:

+        if (folder)
+          NOTIFY_LISTENERS(OnDownloadCompleted, (folder));

so in this case, you'll just send an OnDownloadCompleted and OnFolderRemovedFromQ, but in all the other cases, you'll send an OnDownloadError for the same situation, right?

But if you really want this to be return via rv, you should add a comment to the top of nsAutoSyncManager::DownloadMessagesForOffline to the effect that it will return NS_ERROR_NOT_AVAILABLE if there are no messages to download.
> it doesn't seem like an error worthy of notifying listeners about

Back to your comment #6, I didn't realize that you were referring to OnDownloadError notification in the HandleDwonloadErrorFor. This notification is used for diagnostic purposes only and it is turned off by default. It will be _deprecated_ with the bug #257942. I used this notification to test autosync after we removed all the DEBUG symbols from nsAutoSyncManager; this is how I found out about this bug in the first place. 

>It looks to me like HandleDownloadErrorFor will retry the folder that didn't
>have any messages, even though we're 100% sure there aren't any messages to
>download and the retry will fail. Then that will fail and we'll advance to the
>next folder, but it's odd to try something we know will fail...

HandleDownloadErrorFor never retries the problematic folder, it makes sure that the next time we try to download the same folder, we start from the first message that could not be downloaded. We try the same group of messages max 3 times, then we continue with the next group, but this is not happening in this method. Retrying is done by chaining mechanism, in OnDownloadCompleted. In case of an error HandleDownloadErrorFor's role is to switch to the next folder in order to not break the chain. Given that this problem only occurs when there is more than one sibling folder in the queue, and  when the folder in question is not the first one, it's not a big deal to handle it this way. To be exact, the real cost of this approach is to call HandleDownloadErrorFor (N-L+1) extra times, where N is the number of the sibling folders (folders belong to the same imap account) in the queue, and L is the position of the problem folder (relative to its siblings) in the queue. I would explain this approach as "ignoring a none critical problem until the execution reaches to the point where we can deal with it properly".

>It's not clear to me why changing the way you return this information from an
>rv to an out parameter will break the notification stuff

This is not what I mean. How we communicate this problem with the caller of DownloadMessagesForOffline is orthogonal. What I am saying is, if I understand your suggestion correctly, returning the number of pending messages in an out arg instead of returning an error code in order to eliminate the need to call into HandleDownloadErrorFor, doesn't solve our problem since we still need to i) move to the next folder in the chain, ii) remove the folder in question from the queue.

Another possibility is to change the signature of HandleDownloadErrorFor to pass the error code, and deal with it in this method. I didn't prefer this before because i) wasn't sure how nsCOMArray (priority queue) would react if we remove an element from it while iterating through (see Observe), ii) didn't want to implement this special case in a general error handling routine. Since I know now that the first one is not a problem anymore (I am using in this patch anyway), I am ok with moving this recovery logic into HandleDownloadErrorFor if it looks like a better (non confusing) solution..
pinging for sr
A quote from emre over IRC:

the problem with this bug is, when happens, it prevents auto-sync manager from downloading messages of a particular account. It is easy for beta tester to think that gloda doesn't index properly as a result. It might cause false bugs filed against gloda or auto-sync manager

Based on this, marking as blocking beta 1.
Flags: blocking-thunderbird3? → blocking-thunderbird3+
Whiteboard: [needs sr dmose]
well, the inbox of a particular account, which for a lot of users, is the same thing...
fwiw, emre suspects this is why i was cursing at autosync after lunch today.
Attachment #341130 - Flags: superreview?(dmose) → superreview?(neil)
Comment on attachment 341130 [details] [diff] [review]
Rev 1

Based on dmose suggestion, requesting sr from Neil.
Whiteboard: [needs sr dmose] → [needs sr Neil]
Attachment #341130 - Flags: superreview?(neil) → superreview?(dmose)
Whiteboard: [needs sr Neil] → [needs sr dmose]
Attachment #341130 - Flags: superreview?(dmose) → superreview?(bugzilla)
Whiteboard: [needs sr dmose] → [needs sr standard8]
Attachment #341130 - Flags: superreview?(bugzilla) → superreview-
Comment on attachment 341130 [details] [diff] [review]
Rev 1

>+        if (folder)
>+          NOTIFY_LISTENERS(OnDownloadCompleted, (folder));
>+        

nit: on all the "blank" lines in this patch, you've got unnecessary whitespace.

>+        autoSyncStateObj->SetState(nsAutoSyncState::stCompletedIdle);
>+        
>+        rv = mPriorityQ.RemoveObject(autoSyncStateObj);

As discussed on irc, I think this is risky. Especially in the case queue = mPriorityQ.

Additionally nsCOMArray isn't returning an nsresult here, its returning a boolean.

>+            rv = queue->AppendObject(nextFolderToDownload);

Ditto here with nsCOMArray/nsresult - its a boolean.

>+  NS_ENSURE_SUCCESS(rv, rv);
>+  
>+  if (!count)
>+    return NS_ERROR_NOT_AVAILABLE;

I think you should mention this return possibility (and why) at the start of the function, as David mentioned in comment 8.
Whiteboard: [needs sr standard8] → [needs new patch - will try to land tomorrow]
It turns out that removing this part from the iteration is not trivial without using a secondary looping mechanism or 'goto'. It seems better to implement the approach that I have mentioned at the end of comment 9, but this requires a small flow change in the code that needs to be tested thoroughly. 

I am not going to rush this patch. I am hoping to convince david to land a tested patch after code freeze.
Attached patch Alternative 1 (obsolete) — Splinter Review
This one removes the folders after the iteration is over. 

Under test, work in progress.
Attachment #341130 - Attachment is obsolete: true
Attached patch Rev 3 (obsolete) — Splinter Review
Same as the previous one with more documentation about auto-sync policy, and with some corrections in the existing documentation.

Tested on two different setups with and without gloda enabled. 

Asking for review and permission to land into b1.
Attachment #350214 - Attachment is obsolete: true
Attachment #350250 - Flags: superreview?(bugzilla)
Attachment #350250 - Flags: review+
Attachment #350250 - Flags: superreview?(bugzilla) → superreview+
Comment on attachment 350250 [details] [diff] [review]
Rev 3

>+  // there are pending messages but the cumulative size is zero:
>+  // treat as special case
>+  // 
>+  // Note that although it shouldn't happen, we know that sometimes 
>+  // imap servers manifest messages as zero length. By returning 
>+  // NS_ERROR_NOT_AVAILABLE we cause this folder to be removed from 
>+  // the priority queue temporarily (until the next idle or next update) 
>+  // in an effort to prevent it blocking other folders of the same account 
>+  // being synced. 
>+  if (!totalSize)
>+    return NS_ERROR_NOT_AVAILABLE;
>+  

Please fix all the whitespace issues mentioned by http://beaufour.dk/jst-review/?patch=350250&file= (see http://beaufour.dk/jst-review/ for more info) and check you haven't got whitespace on blank lines where you've added them (there are several places in this patch).

>-nsresult nsAutoSyncManager::HandleDownloadErrorFor(nsIAutoSyncState *aAutoSyncStateObj)
>+nsresult nsAutoSyncManager::HandleDownloadErrorFor(nsIAutoSyncState *aAutoSyncStateObj, nsresult error)

The new error parameter can be a const.
Whiteboard: [needs new patch - will try to land tomorrow] → [has reviews][needs comments addressing]
Attached patch Rev 4Splinter Review
Nits are addressed.
Attachment #350250 - Attachment is obsolete: true
Whiteboard: [has reviews][needs comments addressing] → [has reviews][needs comments addressing][checkin needed]
Keywords: checkin-needed
Checked in: http://hg.mozilla.org/comm-central/rev/00858c123ca4
Status: NEW → RESOLVED
Closed: 13 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [has reviews][needs comments addressing][checkin needed]
You need to log in before you can comment on or make changes to this bug.