Bug 1566717 Comment 58 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

kris, I was able to read the logs using "less" and search for things. What I see is as you describe: I see one complete fetch of all the messages that works fine. Then you do the copy again and this time there is a tcp timeout error. This will occur if the server takes more than 100 seconds to respond to anything tb sends to it. It looks like the server sends a 64k packet of message data and tb ACKs it, but the server doesn't send another packet for over 100 seconds. Tb then closes the tcp connection, opens a new connection and requests the data again, and eventually the same thing happens (tb detects a server timeout). Tb only does 2 retries so nothing more happens.
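
To make the timeout-and-retry sequence described above concrete, here is a minimal Python sketch. It is not tb's actual code: `fetch_with_retries` and `make_stalling_server` are hypothetical names, the "server" is simulated in memory rather than over a real connection, and the retry limit is a parameter because I am only assuming how tb counts attempts.

```python
def fetch_with_retries(fetch, max_retries=2):
    """Call fetch() until it succeeds or the retries are used up.

    fetch is a hypothetical callable standing in for "open a new
    connection and request the message data". It raises TimeoutError
    when the server stays silent longer than the tcp timeout.
    Returns (data, attempts); data is None if every attempt timed out.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return fetch(), attempts
        except TimeoutError:
            # Give up once the first attempt plus max_retries have all failed.
            if attempts > max_retries:
                return None, attempts


def make_stalling_server(stalls):
    """Build a fake fetch() that times out `stalls` times, then answers."""
    remaining = [stalls]

    def fetch():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TimeoutError("no data from server within the timeout")
        return "message data"

    return fetch


if __name__ == "__main__":
    # Server stalls once, then responds: the first retry succeeds.
    print(fetch_with_retries(make_stalling_server(1)))
    # Server never recovers: the client gives up after the last retry.
    print(fetch_with_retries(make_stalling_server(10)))
```

If the server never resumes sending, every attempt ends the same way and the copy stops, which matches what the log shows.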

I can't tell exactly what is happening since you need something like wireshark to see all the tcp transactions. I tried to duplicate this on my system by slowing down the network interface some but never saw a "timeout". I did see one other problem: with a slow network all the messages are fetched OK but it takes more than 10 minutes. When the fetch completes, tb then does an imap "check" command but the "stream" got "closed" and the check fails. (Tb does an imap check on a 10 minute interval.) This causes tb to retry the fetches and it fails again due to the "check". This also results in "blank" messages in LFs like you see. But the blank messages are really duplicates due to the retry. (I suspect your blanks are duplicates due to retries too but I'm not 100% sure.) I can fix this by changing the hardcoded imap check interval from 10 minutes to something bigger, but that's not a good solution so I need to look closer at this too.

So you might try your copy again with mailnews.tcptimeout set to something bigger (use the config editor in advanced settings). Maybe 1000 seconds would be OK. Be sure to restart tb if you change this so it takes effect. Then again, I'm not 100% sure your server will ever answer or respond again so this might not help.
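
As an alternative to the config editor, the same pref can be set from a user.js file in the profile folder, which is read at startup. The small Python sketch below only builds the line that would go into that file; `format_user_pref` is a hypothetical helper name, and 1000 is just the value suggested above, not a tested recommendation.

```python
import json


def format_user_pref(name, value):
    """Return one user.js line that sets pref `name` to `value`.

    json.dumps gives the right literal for the common pref types:
    strings are quoted, ints stay bare, booleans become true/false.
    """
    return "user_pref({}, {});".format(json.dumps(name), json.dumps(value))


if __name__ == "__main__":
    # Line to add to user.js in the profile folder.
    print(format_user_pref("mailnews.tcptimeout", 1000))
```

Removing the line from user.js and resetting the pref in the config editor puts it back to the default.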

Also, I have made a modified ESR 68 build that you can run and it addresses the issue described in comment 52 so that if the source folder has offline store it is used rather than having to fetch it from the server. It is here:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&collapsedPushes=525525&revision=8e3022b16b5eb4b314546b11393bf9455f6f5455&selectedJob=287405213
Click on the green "B" next to Windows 2012 and you will see items appear below. Then click on "Job details" and there are various items you can select. I think the one you want is called "target.installer.exe". This will install the 68 version with my small change. If you run this you should set mailnews.tcptimeout back to the default of 100. Here's a direct link to target.installer.exe: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ZhVdaIpISRKv3qgXzGnvmQ/runs/0/artifacts/public/build/install/sea/target.installer.exe Please let me know if this works; I would greatly appreciate it.
kris, I was able to read the logs using "less" and search for things. What I see is as you describe: I see one complete fetch of all the messages that works fine. Then you do the copy again and this time there is a tcp timeout error. This will occur if the server takes more than 100 seconds to respond to anything tb sends to it. It looks like the server sends a 64k packet of message data and tb ACKs it, but the server doesn't send another packet for over 100 seconds. Tb then closes the tcp connection, opens a new connection and requests the data again, and eventually the same thing happens (tb detects a server timeout). Tb only does 1 retry so nothing more happens.

I can't tell exactly what is happening since you need something like wireshark to see all the tcp transactions. I tried to duplicate this on my system by slowing down the network interface some but never saw a "timeout". I did see one other problem: with a slow network all the messages are fetched OK but it takes more than 10 minutes. When the fetch completes, tb then does an imap "check" command but the "stream" got "closed" and the check fails. (Tb does an imap check on a 10 minute interval.) This causes tb to retry the fetches and it fails again due to the "check". This also results in "blank" messages in LFs like you see. But the blank messages are really duplicates due to the retry. (I suspect your blanks are duplicates due to retries too but I'm not 100% sure.) I can fix this by changing the hardcoded imap check interval from 10 minutes to something bigger, but that's not a good solution so I need to look closer at this too.

So you might try your copy again with mailnews.tcptimeout set to something bigger (use the config editor in advanced settings). Maybe 1000 seconds would be OK. Be sure to restart tb if you change this so it takes effect. Then again, I'm not 100% sure your server will ever answer or respond again so this might not help.

Also, I have made a modified ESR 68 build that you can run and it addresses the issue described in comment 52 so that if the source folder has offline store it is used rather than having to fetch it from the server. It is here:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&collapsedPushes=525525&revision=8e3022b16b5eb4b314546b11393bf9455f6f5455&selectedJob=287405213
Click on the green "B" next to Windows 2012 and you will see items appear below. Then click on "Job details" and there are various items you can select. I think the one you want is called "target.installer.exe". This will install the 68 version with my small change. If you run this you should set mailnews.tcptimeout back to the default of 100. Here's a direct link to target.installer.exe: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ZhVdaIpISRKv3qgXzGnvmQ/runs/0/artifacts/public/build/install/sea/target.installer.exe Please let me know if this works; I would greatly appreciate it.
