Open Bug 442908 Opened 16 years ago Updated 9 years ago

tb hangs when trying to attach a document if last document was attached from network smb-share drive that is now unreachable

Categories

(Thunderbird :: Message Compose Window, defect)

x86
Windows XP
defect
Not set
critical

Tracking

(thunderbird-esr17 affected)

Tracking Status
thunderbird-esr17 --- affected

People

(Reporter: jochen.zimmermann, Unassigned)

References

Details

(Keywords: hang, stackwanted, Whiteboard: dupme)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (compatible; Konqueror/3.5) KHTML/3.5.9 (like Gecko)
Build Identifier: 20080421

if a pc or especially laptop is used in an environment with smb-shares, and people send mails with attachments from these shares and try to attach documents without access to this network, thunderbird still tries to access the share and doesn't respond for 30 seconds or longer.

the main problem here is that the target is saved across sessions. if the user tries to close the window because he thinks the program crashed, windows will offer to terminate tb because it doesn't respond, and at the next start tb still tries to attach from the unreachable share.

that leads to really confused users and possible data loss.

Reproducible: Always

Steps to Reproduce:
1. get into a network with smb-shares
2. attach a document from one of the shares
3. unplug the network (or switch to a network where the server used above is not reachable)
4. try to attach a document


Expected Results:  
a solution would be to reset the saved folder after tb is closed, if it was a network-folder.

another solution could be to make sure that tb doesn't become completely unresponsive during this time, so the user sees that there is still something happening.
Summary: tb hangs when trying to attach a document if last document was attached from smb-share and is now unreachable → tb hangs when trying to attach a document if last document was attached from smb-share that is now unreachable
jochen, please test using trunk build ftp://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-1.9.1/
Whiteboard: closeme 2009-08-10
i'm on vacation until first week of august, can't test it earlier cause i have just linux-machines myself.
RESO INCO due to lack of response to last question. If you feel this change was made in error, please respond to this bug with your reasons why.
Status: UNCONFIRMED → RESOLVED
Closed: 15 years ago
Resolution: --- → INCOMPLETE
tested it with 3.0b4pre-en, problem still exists, seems to be even worse, with 2.0.* it hung about 2 minutes, with this version i killed it after 4 or 5 minutes because it was still hanging.

will try the debug-thing from the comment above now.

and it is *really* easy to reproduce, just attach a file from any windows-share, disconnect the nic and try again to attach a file.
Resolution: INCOMPLETE → FIXED
Status: RESOLVED → UNCONFIRMED
Resolution: FIXED → ---
windbg seems to need quite some time, i'm away from office now, so i will leave it over night and upload the outcome tomorrow.
Attached file debug log
correction, it just finished a minute before.
Whiteboard: closeme 2009-08-10
Attachment #395581 - Attachment mime type: text/x-log → text/plain
How long does next command take after pull off network in your environment?
  At command prompt.
     (Make shared file available. Mount as network drive if required)
     (Access to folder, files, and plug off network)
     NET VIEW
     NET VIEW \\<server_name>
     NET USE \\<server_name>\<share_name>

(In reply to comment #0)
> Steps to Reproduce:
> 1. get into a network with smb-shares
> 2. attach a document from one of the shares
> 3. plug off the network (or switch to a network, where the above used server is
> not reachable
> 4. try to attach a document

Does above mean "plug off the network while you are composing a mail"?
(attach operation was executed twice on a compose mail, one before plug-off, one after plug-off)

What is exact operation at step 4?
  File/Attach/Files?
  Drag&Drop of file from MS Windows Explorer to Tb's compose window?

(In reply to comment #5)
> it hung about 2 minutes, with this version
> i killed it after 4 or 5 minutes because it was still hanging.

No way to cancel send operation? (e.g. by "Cancel" at error dialog, ...)
If so, it sounds like either "retry forever (timeout detected)" or "wait forever (timeout is ignored or not detected)".

If File/Attach/Files is used, Tb's action at step 4 is to display the file picker dialog. If the last used directory is not available, Tb falls back to another directory (desktop, working directory, MRU directory etc.) and sooner or later displays the folder picker dialog for that fallback directory.
When I renamed the directory saved in mail.compose.attach.dir, both Tb 2.0.0.23 and Tb 3 (trunk nightly) pre-selected the working directory (the program directory of Tb in my environment) in the folder picker dialog.

Do you set "working directory of Tb" to smb shared directory?
(Property of Tb's short cut, "Working folder") 

Do you enable auto-save?
If auto-save is enabled, Tb currently tries to read the file attached at step 2. The "retry forever" or "wait forever" in your test is possibly a phenomenon of the draft save (so it can also occur upon Send Later or Send, if the attachment file has not been read yet).

By the way, Tb keeps directory names in next entries.
> browser.download.dir
> browser.download.downloadDir 
> messenger.save.dir
> mail.compose.attach.dir
If the network is not available, manually changing them to an existing local directory is a simple workaround for "long time to display file picker", although it's not applicable to your case.
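To find out which of these entries currently point at a network share, something like the following could be run against a copy of the profile's prefs.js. This is only a sketch: it assumes the directories are stored as plain `user_pref("name", "value");` lines with backslashes doubled, and the function name `find_network_dir_prefs` is made up for illustration. It only reports; it does not modify anything.

```python
import re

# Pref names taken from the list above.
DIR_PREFS = (
    "browser.download.dir",
    "browser.download.downloadDir",
    "messenger.save.dir",
    "mail.compose.attach.dir",
)

# Assumed line format: user_pref("name", "string value");
PREF_LINE = re.compile(r'^user_pref\("([^"]+)",\s*"((?:[^"\\]|\\.)*)"\);$')


def find_network_dir_prefs(prefs_text):
    """Return {pref name: path} for directory prefs whose value is a UNC path."""
    hits = {}
    for line in prefs_text.splitlines():
        match = PREF_LINE.match(line.strip())
        if not match or match.group(1) not in DIR_PREFS:
            continue
        # Undo the doubled backslashes used inside the quoted value.
        value = match.group(2).replace("\\\\", "\\")
        if value.startswith("\\\\"):  # \\server\share\...
            hits[match.group(1)] = value
    return hits
```

Any pref reported this way could then be changed to a local directory by hand, for example in the Config Editor. Mapped drive letters (X:) are not detected by this check.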
> How long does next command take after pull off network in your environment?
>   At command prompt.
>      (Make shared file available. Mout as network drive if required)
>      (Access to folder, files, and plug off network)
>      NET VIEW
>      NET VIEW \\<server_name>
>      NET USE \\<server_name>\<share_name>

'net view' and 'net view \\server name' take quite some time when disconnected, 'net use' takes no time at all.

> Does above mean "plug off the network while you are composing a mail"?
> (attach operation was executed twice on a compose mail, one before plug-off,
> one after plug-off)

originally it happened when one of our people with laptops took his laptop home and tried to attach a file when he had last attached a file from the samba-server in the office.

so unplugging is just a fast way around; unplugging is faster than walking to another network to test it there.

so you can either
- plug the computer to another network
- unplug it before opening thunderbird
- unplug it while thunderbird is open
- unplug it while message composing window is open

> What is exact operation at step 4?
>   File/Attach/Files?
>   Drag&Drop of file from MS Windows Explorer to Tb's compose window?

file/attach/files.

> No way to cancel send operation? (e.g. by "Cancel" at error dialog, ...)
> If so, it sounds "retry forever(timeout detected)", "waiting forever(timeout
> is ignored or is not detected).

at first there is no error dialog, thunderbird just freezes for some time, which mostly leads people to hit the close-button and windows telling about a not reacting application and forcing it to close.

if you are patient enough to wait (which is something that many users are not) you get the error dialog that the share could not be found or you don't have rights to access it. and then the attach-window falls back to the desktop.

> If File/Attach/Files, Tb's action at step 4 is displaying of file picker
> dialog, and if last used directory is not available, Tb falls back to a
> directory(desktop, working directory, MRU directory etc.) and displays folder
> picker dialog for the fallen back directory sooner or later.
> When I renamed directory which is saved in mail.compose.attach.dir, both Tb
> 2.0.0.23 and Tb 3(trunk nightly) pre-selected working directory(program
> directory of Tb in my environment) at folder picker dialog.

yes, if it is a local directory there is no problem.

> Do you set "working directory of Tb" to smb shared directory?
> (Property of Tb's short cut, "Working folder") 

no, it's just the mail.compose.attach.dir-variable. i did not test the other variables with network-shares, but it should be exactly the same thing there.

> Do you enable auto-save?
> If auto-save is enabled, Tb currently tries to read attached file at step 2.
> "retry forever" or "wait forever" in your test is possibly phenomenon when
> draft save(so, it can occur upon send later or send, if attachment file is not
> read yet.)

auto-save is enabled, but i don't think it has to do with that. disabling auto-save doesn't change it. what you write here could happen when i attach a file from a network-share, then unplug and try to save or send, but that's something i did not try.

----

so, if i get that right, the problem is that thunderbird does not check if the directory saved in the variables is a local directory or a network share, and if it is a network share it just tries to access it before checking if it is available at all?

which seems to be generally a problem with windows and network-shares, but when someone uses thunderbird and it hangs here, the problem will mostly be seen as a thunderbird-problem, though.

so the question is if there is any way to check if a saved path is local or on the net, and if it is on the net to check if it is available, without looking to users like thunderbird has crashed.
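One possible shape of such a check, as a rough sketch in Python rather than Thunderbird code: treat the saved path as "on the net" if it is a UNC path, run the availability probe in a background thread, and fall back to a local directory if the probe has not answered within a short deadline. The names `is_unc_path`, `probe_with_timeout` and `pick_start_dir` and the 2 second default are made up for illustration. Note that this does not shorten the OS timeout; the probe thread still waits, only the caller does not.

```python
import os
import threading


def is_unc_path(path):
    """True for \\\\server\\share style paths; mapped drive letters are not detected."""
    p = path.replace("/", "\\")
    if p.upper().startswith("\\\\?\\UNC\\"):
        return True  # long-path form of a UNC path
    if p.startswith("\\\\?\\") or p.startswith("\\\\.\\"):
        return False  # local long-path or device prefix
    return p.startswith("\\\\") and len(p) > 2


def probe_with_timeout(path, timeout_s, probe=os.path.isdir):
    """Run probe(path) in a daemon thread; report False if it does not finish in time."""
    result = []

    def worker():
        try:
            result.append(bool(probe(path)))
        except OSError:
            result.append(False)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)  # the caller waits at most timeout_s
    return bool(result) and result[0]


def pick_start_dir(saved_dir, fallback_dir, timeout_s=2.0, probe=os.path.isdir):
    """Choose the directory the file picker should open in."""
    if not is_unc_path(saved_dir):
        # Local path: a direct check is assumed to return quickly.
        return saved_dir if probe(saved_dir) else fallback_dir
    reachable = probe_with_timeout(saved_dir, timeout_s, probe)
    return saved_dir if reachable else fallback_dir
```

To also catch shares mounted as network drives, a Windows implementation would need an extra check on the drive letter (for example via GetDriveType), which is left out here.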
Will "specify IP address of file share server in LMHOSTS" reduce time to get timeout? 
> C:\WINDOWS\system32\drivers\etc\LMHOSTS
> Copy lmhosts.sam(not AmiPro file :-) to LMHOSTS.TXT, and edit it by text editor.
> Then rename LMHOSTS.TXT to LMHOSTS.

Will "use as network drive instead of direct use of UNC" reduce time to get timeout? 
> At command prompt, NET USE X: \\server\sharename
> At MS Windows Explorer, Tool, "Assign network drive" or something
(In reply to comment #9)
> if you are patient enough to wait (which is something that many users are not)
> you get the error dialog that the share could not be found or you don't have
> rights to access it. and then the attach-window falls back to the desktop.

Is the time from (a) "file/attach after unplugging the network" to (b) "error dialog that the share could not be found or you don't have rights to access it" the same as the time "NET VIEW \\server" takes for an unavailable server?
Or is the time from (a) to (b) far greater than the time for "NET VIEW \\unavailable_server"?
What is the accurate time from (a) to (b), and the time for "NET VIEW \\unavailable_server"?
Is the time "NET VIEW \\unavailable_server" takes very different at office and home? Or similar?
How about IE or MS Outlook Express?
  1. IE(File/Open): specify \\<server>\<sharename>\xxx.yyy, view it 
     OE(Insert/Attachment) : File picker dialog, select smb shared file 
  2. plug off the network
  3. IE(File/Open): specify \\<server>\<sharename>\xxx.yyy, try to view
     OE(Insert/Attachment) : File picker dialog appears
Is the time to detect the timeout in these applications (IE, OE) far shorter than in Tb?
(In reply to comment #9)
> so the question is if there is any way to check if a saved path is local or on
> the net.

The following conditions are available for the "if" statement in BAT on MS Win-XP.
> if [not] errorlevel number command [else expression]
> if [not] string1==string2 command [else expression]
> if [not] exist FileName command [else expression]
I think the ordinary API for an existence check works at file level, similar to the above "if" of BAT. So, if the last accessed attach directory is \\<server>\<sharename>\<subdir>, an ordinary existence check can be done only on \\<server>\<sharename>(\<subdir>). It'll produce a similar process to "NET VIEW \\<server>\<sharename>" or "DIR \\<server>\<sharename>\<subdir>".

Do you know API to know existence of already known smb servers only or already known shared directory only, without invoking new "NET VIEW \\<server>" like process?
(In reply to comment #13)
> Do you know API to know existence of already known smb servers only or already
> known shared directory only, without invoking new "NET VIEW \\<server>" like
> process?

hm, i don't know of anything like that. what of course doesn't mean there is nothing like that.

will answer to the other comments tomorrow when i'm in the office again.
i'll start with the timings first. they are all in a row, no time between.

4x opening attachment-window in tb:

1x ~40 seconds, 2x ~20 seconds, 1x 2 minutes

net view:
1x 36 seconds, 3x 0 seconds, 1x ~20 seconds

with lmhosts:
1x 40 seconds, 4x 0 seconds, 1x 20 seconds

so now with ie8 and lmhosts deleted again:
1x 14 seconds, 2x 0 seconds, 1x 7 seconds, 1x 0 seconds, 1x 20 seconds, 1x 0 seconds

while the next try with thunderbird was at 2 minutes 20 and the next 2 tries at 20 seconds again.

another detail with thunderbird, if i cancel the attachment-window (after popping up defaulted to home-directory) it hangs for about 14 or 15 seconds again every time.

so at least ie seems to do something different, the timings here were definitely shorter. a rather common value seems to be 20 seconds, which is reached by all 3, but while it seems to be the maximum with ie it is the minimum with tb. and tb is the only one that gets spikes up to 2 minutes and more.

made some tries with network drives too, seems not to make any difference.
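For anyone who wants to repeat such measurements without a stopwatch, a small helper along these lines could time several consecutive existence checks on the unreachable path. It is a sketch only: `time_probe` is a made-up name, and `os.path.isdir` is just one possible stand-in for whatever call the application really makes, so the numbers need not match what Thunderbird or IE show.

```python
import os
import time


def time_probe(path, repeats=4, probe=os.path.isdir, clock=time.perf_counter):
    """Return the duration in seconds of each of `repeats` consecutive probes."""
    timings = []
    for _ in range(repeats):
        start = clock()
        try:
            probe(path)
        except OSError:
            pass  # an error still counts as "the check returned"
        timings.append(clock() - start)
    return timings
```

Usage would be something like `time_probe(r"\\server\share\dir")` right after unplugging the network, where the path is a placeholder for the share that was last used.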
(In reply to comment #15)
> and tb is the only one that gets spikes up to 2 minutes and more.

Tb retries when error? Or Tb fails to detect error return code then waits for timeout of request to OS?
Can you check Process Monitor log?
> http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
  Start Process Monitor. Filter=Process Name contains thunderbird.exe.
  Stop capture, start Tb, disable auto-save, attach smb file.
  Start capture, plug off network, attach again.
  => falls back to local directory, stop capture.
  (If related path is found, add "path contains ***" to reduce number of logs.)
will try that in the end of next week, no time today and i'm away tomorrow for a week, somewhere where (hopefully) no computers will be around for some kilometres ;).
Severity: normal → critical
hm. seems like procmon and kaspersky don't go well together, after starting procmon the system freezes completely, which seems to be a known problem. tried to deactivate and close kaspersky, but still same result.

any other suggestions? filemon for example works without any problems.
I don't know that it's changed (even doubt that it has) but you might try v3.1
 ftp://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-1.9.2/
Summary: tb hangs when trying to attach a document if last document was attached from smb-share that is now unreachable → tb hangs when trying to attach a document if last document was attached from network smb-share drive that is now unreachable
Whiteboard: dupme
The following appear to be duplicates of this.

Bug 435986 - Attachments dialog does not correctly recognize all canceling of connect to network file share dialog boxes and hangs up for a long time in some situations

Bug 466130 - Thunderbird hang when network share no longer available
Status: UNCONFIRMED → NEW
Ever confirmed: true
WADA, are this bug and bug 435986 duplicates?
Flags: needinfo?(m-wada)
(In reply to Thomas D. from comment #24)
> WADA, are this bug and bug 435986 duplicates?

IIUC, these two bugs are:

- This bug, Tb on Win, Fileshare=SMB
  problem :
  It takes pretty long in OS to detect previously accessed shared directory is not
  reachable any more.
  It is "OS is waiting for timeout", Tb is simply waiting for return code to API request.
  After a while, "resource is not available" is returned to API call by Tb,
  then Tb falls back to local resource such as MRU directory, Working directory etc.

- That bug. Tb on Mac OS X, Fileshare=afp
  problem :
  It takes pretty long in OS to detect previously accessed shared directory is not
  reachable any more.
  It is "OS is waiting for timeout", Tb is simply waiting for return code to API request.
  "resource is not available" is not returned to API call by Tb, so Tb waits forever.
  Or Tb fails to detect "resource is not available", tb requests same resource forever.
  According to that bug, if "Cancel" operation is done by user at somewhere,
  (I can't know where/how by bug comments)
  and "File Attach operation at composition window" is requested,
  Tb falls back to local resource such as MRU directory, Working directory etc.

Even though the starting point, "mail.compose.attach.dir == shared directory which is not reachable any more", is exactly the same in these two bugs, the phenomenon is different (one is "it takes pretty long" and the other is "as if never ending"), and it looks to me like the cause of "as if never ending" is in the OS and/or AFP, as written in bug 435986 comment #19.

"Decision on DUP or Not-DUP" is all up to Tb developers.
  If cause of these two bugs is inappropriate action by Tb or bad action by Tb
  when "mail.compose.attach.dir == shared directory which is not reachable any more",
  these two bugs are absolutely same problem, and are obviously DUP.

Same solution in Tb may be possible for these two bugs:
  Never request resource in mail.compose.attach.dir
  if "mail.compose.attach.dir == shared directory".
However, I don't think "DUPing these two bugs merely by same solution is possible" is appropriate action in software development.
Flags: needinfo?(m-wada)
Just met this problem on Geckozone forums, with TB17. Reinitialization of mail.compose.attach.dir fixes the issue.

Very problematic because this is just impossible to be solved by the user himself without our help.

See: http://forums.mozfr.org/viewtopic.php?f=4&t=114955
A recent posting on SUMO.
https://support.mozilla.org/en-US/questions/1061216