Open Bug 1081896 Opened 10 years ago Updated 4 years ago

Thunderbird 31.1.2 for Windows intermittent loses profiles stored on DFS network share on Windows Server

Categories

(MailNews Core :: Networking, defect)

Platform: x86_64, Windows Server 2008
Type: defect
Priority: Not set
Severity: critical

Tracking

(Not tracked)

Status: UNCONFIRMED

People

(Reporter: enz-coast, Unassigned)

Details

(Keywords: dataloss, Whiteboard: [dupeme])

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0
Build ID: 20140923175406

Steps to reproduce:

Sometimes our users (Windows 7 64-bit workstations) lose their profiles. We use folder redirection via GPO to redirect the user profile folder to a server path (\\servername\profiles\user\) and the users' "Application Data" folder to a DFS namespace (\\domainname\dfs\user\). In the last 10 months we've lost 172 profiles to this bug. All previous versions of Thunderbird, including the newest (we always deploy the newest release versions to our clients), showed the same behaviour.

The data is still there, but on every startup of Thunderbird you get the "New E-Mail Address" wizard. To fix it, we had to manually delete the "Thunderbird" folder under "Appdata" and then recover it via backup software. Then we have to wait until all DFS servers have synced, and then Thunderbird runs flawlessly until the error returns (days, weeks or months later).

We renamed all profiles using the Profile Manager from default to the username, but that did not help at all. We use Lightning and Lookout as add-ons on a few computers, but the computers without add-ons are also experiencing the bug.
The problem also occurs on:
-Windows 7 32-bit as client OS
-Windows Server 2012 configured as DFS server
-Windows Server 2003 configured as DFS server
Thunderbird accesses its profile directory via "%APPDATA%\Thunderbird\profiles.ini".
What is the value of the environment variable "APPDATA"? (run SET at a Command Prompt)
What is set in Tb's profiles.ini file? (check its content in a text editor)

> Sometimes our users ( Windows 7 64 BIT Workstations ) lose their profiles.
> We use folder-redirection via GPO to redirect the user profile-folder to a serverpath ( \\servername\profiles\user\ ) 

What exactly do you mean by "lose their profiles"?
Is the Thunderbird profile directory on each PC "\\servername\profiles\user\"?
(Note: Tb's default profile directory location is "%APPDATA%\Thunderbird\Profiles", and the default profile directory name is "%APPDATA%\Thunderbird\Profiles\<random_string>.<profile_name>".)
Does "lose their profiles" mean that all directories/files under the server resource "\\servername\profiles\user\" are deleted by Thunderbird while Thunderbird is running or when Thunderbird's shutdown has finished?
Or that the prefs.js file under Tb's profile directory (where account definitions are held) was deleted by Thunderbird while Thunderbird was running or when its shutdown had finished?

> The data is still there but on every startup of Thunderbird you receive the "New E-Mail Address" wizard.
> To fix it, we had to manually delete the "Thunderbird" folder under "Appdata" and then we had to recover it via backup software. 

Where exactly is all the data still present? Does that mean a backup made by your backup software is available?
The "New E-Mail Address" wizard indicates either "prefs.js under Tb's profile directory was not found, so a new one was generated" or "account definition entries were not found in prefs.js under Tb's profile directory".
Is Tb's profile directory placed under "%APPDATA%\Thunderbird\Profiles"?

If the prefs.js file under Tb's profile directory is deleted by Thunderbird while Thunderbird is running, or while Thunderbird is shutting down, this can be seen in a Process Monitor log (search Google for Process Monitor for MS Windows).
   filter: Path does not contain \prefs: Exclude; Path does not end with .js: Exclude.
IIUC, Tb does the following when it saves prefs.js:
   1. Write the current preference data to a prefs-N.js file.
   2. Rename prefs-N.js to prefs.js in replace mode (the OS removes the existing prefs.js before the rename).
So the window in which prefs.js can be lost through deletion by Thunderbird is very small.
   If step 2 was not executed, the prefs-N.js files are left behind.
   So if prefs.js is lost, the loss is produced by the OS while it executes "rename prefs-N.js to prefs.js with replace mode".
   I cannot imagine any cause other than a power failure while the OS is executing "rename prefs-N.js to prefs.js with replace mode".
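The save sequence described above is the common write-then-rename pattern. A minimal Python sketch of it, assuming a local profile directory; the function name `save_prefs` and the fixed temporary name `prefs-1.js` are illustrative only, not Thunderbird's actual implementation:

```python
import os


def save_prefs(profile_dir: str, lines: list[str], tmp_name: str = "prefs-1.js") -> str:
    """Write-then-rename sketch (hypothetical helper): never truncate prefs.js in place."""
    target = os.path.join(profile_dir, "prefs.js")
    tmp_path = os.path.join(profile_dir, tmp_name)
    # Step 1: write the complete new content to a temporary file and flush it to disk.
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
        f.flush()
        os.fsync(f.fileno())
    # Step 2: rename the temporary file over prefs.js.
    # os.replace overwrites an existing destination on both POSIX and Windows.
    os.replace(tmp_path, target)
    return target
```

If the process dies between the two steps, the old prefs.js is untouched and prefs-1.js is left behind, which is why leftover prefs-N.js files point at an interrupted step 2. Whether the final rename behaves atomically on an SMB/DFS share depends on the server side, not on this code.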
(In reply to enz-coast from comment #1)
> -Windows Server 2012 configured as DFS Server
> -Windows Server 2003 configured as DFS Server

DFS Namespaces and DFS Replication Overview
> http://technet.microsoft.com/en-us/library/jj127250.aspx
Distributed File System (DFS) in Windows Server 2003 Service Pack 1
> http://technet.microsoft.com/en-us/library/cc736868%28v=ws.10%29.aspx

Is DFS on Windows Server 2003 still supported? Have the bugs in DFS on Windows Server 2003 already been fixed?
> http://en.wikipedia.org/wiki/Distributed_File_System_%28Microsoft%29
> Distributed File System (Microsoft)
> If a server fails, the client can select a different server transparently to the user. 
> One major caveat regarding this flexibility is that currently-open files will potentially become unusable,
> as open files cannot be failed-over.

According to the document above, if a file server fails while a file is open, that open file is not failed over.
Was no file server failure involved in your case?
Your bug summary.
> Thunderbird 31.1.2 for Windows intermittent loses profiles stored on DFS share on 2008R2 Server
Your comment #1.
> -Windows Server 2012 configured as DFS Server
> -Windows Server 2003 configured as DFS Server

What is your actual environment?
(In reply to WADA from comment #5)
> Your bug summary.
> > Thunderbird 31.1.2 for Windows intermittent loses profiles stored on DFS share on 2008R2 Server
> Your comment #1.
> > -Windows Server 2012 configured as DFS Server
> > -Windows Server 2003 configured as DFS Server
> 
> What is your actual environment?

Our actual Environment is 12x Windows 2008 R2 Standard Servers running different services for ~500 Clients on Windows 7 Professional 64BIT. We installed Win7 Prof 32BIT, Win2012 Server and Win2003 Server for testing purposes to check if the error we encounter is present there, too - and yes - it is, but after we proved that, we removed the test clients and servers from our environment.
When I say - "users lose their profiles" I mean the following events:

-The user is working flawlessly in Thunderbird
-He closes the application for one of 1000 reasons
-He starts the application again
-His addons are gone
-His e-mail accounts are gone
-All of his settings are gone
-He is welcomed by the "New E-Mail Address Wizard"

After all of the above has happened, the data itself is still physically present on our server in the users' directories. It appears that nothing has been deleted, as far as we can tell.

The profiles of our users are stored like this:
\\servername1\profiles$\%username%

The appdata\thunderbird directory is stored like this:
\\domain-name\dfs-namespace-name\userdata\%username%\appdata\thunderbird\

No changes have been made to Thunderbird's profile directory except the folder redirection of the whole AppData folder from the local hard drive (default) to a folder on a DFS namespace / DFS share (our configuration).

The servers that are part of our DFS namespace are not showing any event log entries that indicate replication problems, downtimes or anything like that. That was our first guess, and we monitored the event logs constantly over a period of 2 weeks, but the loss of user profiles continued and nothing showed up in the logs. The network was our second guess, so we continuously measured throughput and availability of all network devices in our environment. No downtimes, no timeouts, no bandwidth bottlenecks on that end. Profile size was our next guess, so we analyzed all profiles that were lost, but found nothing suspicious. Profiles range from 104 megabytes up to 6.8 gigabytes, but we have smaller and larger profiles in use that have not been affected yet.

The problem on our side is that we cannot reproduce the error on a specific machine; it happens (as far as we know) completely at random. Some users have experienced the bug a dozen times, some only once, and some not at all so far.
(In reply to enz-coast from comment #7)
> When I say - "users lose their profiles" I mean the following events:
> 
> -The user is working flawlessly in Thunderbird
> -He closes the application for one of 1000 reasons
> -He starts the application again
> -His addons are gone
> -His e-mail accounts are gone
> -All of his settings are gone
> -He is welcomed by the "New E-Mail Address Wizard"
> 
> After all of the above things have happened, the data itself is still
> physically present on our server in the users directories. 
> It appears that nothing has been deleted as far as we can tell.

This sounds like loss of the prefs.js file.
Has really nothing been deleted from each user's Tb profile directory on the server?
If prefs.js was not deleted, what does it contain? Is prefs.js an empty (zero-length) file?

> -His addons are gone
This indicates loss of the following entry:
> user_pref("extensions.xpiState", "{ ... }");

> -His e-mail accounts are gone
> -He is welcomed by the "New E-Mail Address Wizard"
This indicates loss of the following entry:
> user_pref("mail.accountmanager.accounts", "account1,account2, ..., accountN");

> -All of his settings are gone
This indicates loss of other entries in prefs.js.

I can't imagine a case where only part of the data in prefs.js is lost. I can only imagine prefs.js being lost entirely, or prefs.js becoming an empty (null) file.
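The distinction above (whole file lost, file emptied, or only some entries missing) can be checked mechanically on a recovered prefs.js. A hedged Python sketch; `classify_prefs` is a hypothetical helper, the two pref names are the ones quoted in this comment, and which prefs actually exist depends on the Thunderbird version and setup, so a "partial" result is only a hint:

```python
import os
import re

# Entries named in the comment above; which ones exist varies by version and setup.
KEY_PREFS = ("mail.accountmanager.accounts", "extensions.xpiState")


def classify_prefs(path: str) -> str:
    """Classify a prefs.js as 'missing', 'empty', 'partial' or 'ok'."""
    if not os.path.exists(path):
        return "missing"
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    # A zero-length file, or one holding only NUL bytes/whitespace, counts as empty.
    if not text.strip("\x00 \t\r\n"):
        return "empty"
    present = [name for name in KEY_PREFS
               if re.search(r'user_pref\("' + re.escape(name) + r'"', text)]
    return "ok" if len(present) == len(KEY_PREFS) else "partial"
```

Comparing the result for the backed-up copy and the copy left behind after a loss would show directly which of the cases above actually occurs.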

If the problem is loss of the prefs.js file or loss of its content, the following may be a way to catch it.
  Start Tb every day from a BAT or script file that copies prefs.js before Tb starts and after Tb terminates:
     Copy prefs.js to <username>-<timestamp>-prefs.js
     Call thunderbird.exe
     After thunderbird.exe ends, copy prefs.js to <username>-<timestamp>-prefs.js
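The wrapper described above could look like this minimal Python sketch instead of a BAT file. `run_with_prefs_snapshots` is a hypothetical name, the paths and command are supplied by the caller, and a before/after tag is added to the suggested `<username>-<timestamp>-prefs.js` name so both copies survive if they fall within the same second. If Thunderbird is already running, a second thunderbird.exe may hand off to the existing instance and exit at once, so the "after" copy would then be taken too early.

```python
import shutil
import subprocess
import time
from pathlib import Path


def run_with_prefs_snapshots(prefs: Path, backup_dir: Path, user: str,
                             cmd: list[str]) -> list[Path]:
    """Copy prefs.js before and after running cmd; return the snapshot paths."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    snapshots: list[Path] = []

    def snapshot(tag: str) -> None:
        if prefs.exists():  # nothing to copy if prefs.js is already gone
            stamp = time.strftime("%Y%m%d-%H%M%S")
            dest = backup_dir / f"{user}-{stamp}-{tag}-prefs.js"
            shutil.copy2(prefs, dest)  # copy2 also keeps the file's timestamps
            snapshots.append(dest)

    snapshot("before")
    subprocess.run(cmd)  # blocks until the launched program exits
    snapshot("after")
    return snapshots
```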

If prefs.js somehow becomes an empty file, it may be a caching issue in DFS (caching issues in SMB2 are already known).
Note: when saving prefs.js, Tb uses "rename prefs-N.js to prefs.js, replacing the previous prefs.js".
FYI,
the following documents were found by a Google search for "smb2 cache issues".
> http://support.microsoft.com/kb/2646563
> http://serverfault.com/questions/475678/weird-issue-on-2008-2008r2-shares-using-smb2
IIRC, the hotfix referred to in these documents was "disable the SMB2 directory cache by default" or "force-disable the SMB2 directory cache", in order to avoid problems in the then-new SMB2.
(In reply to enz-coast from comment #7)
> The profiles of our users are stored like this:
> \\servername1\profiles$\%username%

For Tb's profile directory, this looks to me like an ordinary SMB2 file share (neither SMB1 nor SMB3).
When the next profile is lost, I'll provide you with the 2 versions of the prefs file:

-The one from before the loss, taken from our daily backups.
-The one that is left there after the profile is lost.

It could take 1-3 days to stumble across a lost profile, so please excuse the delay. After the next lost profile, I'll test your idea regarding the SMB2 cache, to be sure whether or not this is our problem.
One question I've had in mind for months now is why Firefox is not affected by this. We have used Firefox alongside Thunderbird for quite some time now, but Firefox has never shown such behaviour. That is what made us believe that Thunderbird has to be the problem and not our server architecture or the client computers' configuration. Please correct us if our assumption is wrong.
(In reply to enz-coast from comment #12)

A big difference between Thunderbird and Firefox is:
   Thunderbird has many mail folders, and Tb repeatedly opens and closes the xxx.msf/xxx files for those mail folders in the profile directory.
   Many of the open xxx.msf files are closed when Tb shuts down, and prefs.js is updated during that same shutdown.
I believe the number of files under the profile directory that must be closed at termination is far smaller in Firefox than in Tb.
FYI,
bug 545650 is a problem observed in Mozilla that was caused by a bug in SMB2.
(In reply to WADA from comment #14)
> FYI.
> Bug 545650 is problem observed in Mozilla due to bug of SMB2.

but that was fixed long ago.
Summary: Thunderbird 31.1.2 for Windows intermittent loses profiles stored on DFS share on 2008R2 Server → Thunderbird 31.1.2 for Windows intermittent loses profiles stored on DFS network share on 2008R2 Server
Severity: normal → critical
Keywords: dataloss
(In reply to enz-coast from comment #11)
> When the next profile is lost, i´ll provide you with the 2 versions of the
> prefs file.
> 
> -The one before the lose - taken out of our daily backups.
> -The one which is left there after the profile is lost.

enz-coast,
Do you still see this problem?
And were you able to get this data?
Flags: needinfo?(enz-coast)
OS: Windows 7 → Windows Server 2008
sounds a bit like bug 597329
The error is still present in the newest release and the newest beta version, and has been present in all versions since I filed this bug report. Since nothing was fixed and no solution arrived here, I forgot about the problem, and we worked around it with a per-user batch file that copies prefs.js away once per day; if something fails, we copy it back and the problem is fixed. At the moment we have 3273 client computers running Thunderbird and we lose between 17 and 23 prefs files a day. Nasty stuff, but you get used to it after you've recovered the first 1000 prefs files :-(
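The restore half of that workaround can be sketched as follows. This is a hedged Python illustration, not the reporter's batch file: `restore_if_broken`, `looks_healthy` and the `*prefs.js` backup naming are assumptions, a prefs.js is treated as broken when it no longer defines `mail.accountmanager.accounts` (the entry whose loss triggers the wizard, per the discussion above), and it should only run while Thunderbird is closed.

```python
import shutil
from pathlib import Path
from typing import Optional

# Assumed health check: the account list entry is what the wizard depends on.
ACCOUNTS_PREF = 'user_pref("mail.accountmanager.accounts"'


def looks_healthy(path: Path) -> bool:
    """Return True if the file exists and still defines the account list."""
    if not path.is_file():
        return False
    return ACCOUNTS_PREF in path.read_text(encoding="utf-8", errors="replace")


def restore_if_broken(prefs: Path, backup_dir: Path) -> Optional[Path]:
    """If prefs.js looks broken, copy back the newest healthy backup.

    Returns the backup that was restored, or None if nothing was done.
    """
    if looks_healthy(prefs):
        return None
    # Newest backups first, ordered by modification time.
    candidates = sorted(backup_dir.glob("*prefs.js"),
                        key=lambda p: p.stat().st_mtime, reverse=True)
    for candidate in candidates:
        if looks_healthy(candidate):
            shutil.copy2(candidate, prefs)
            return candidate
    return None
```

Skipping unhealthy backups matters here: a daily copy taken after a loss would otherwise overwrite the good state with the broken one.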
Flags: needinfo?(enz-coast)
Do you see this also with version 60?

Oddly enough, I might be able to test this in a month.
Flags: needinfo?(enz-coast)

Wayne,

Were you ever able to test this? Also is there a component that this can be reassigned to?

Thanks,

Lee

Flags: needinfo?(vseerror)

I am not sure what type of lower-level network protocol DFS uses.

But the problem sounds very much like problems I observed with CIFS/Samba, where short reads occur when the server is swamped by many clients: a system call returns fewer octets than requested, but the clients (i.e., TBs) fail to repeat the system call until all the data has been obtained.
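A short read is not an error by itself; the bug is a caller that treats the partial result as the whole file. A minimal Python sketch of the retry loop described above, using the low-level `os.read`, which, like POSIX `read(2)`, may return fewer bytes than requested. `read_fully` is a hypothetical name, and this illustrates the technique only; it is not Thunderbird code.

```python
import os


def read_fully(fd: int, size: int) -> bytes:
    """Read exactly `size` bytes from fd unless end of file comes first.

    A single os.read call may return fewer bytes than requested (a short
    read), so the call is repeated until all the data is obtained or EOF.
    """
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # EOF: the file is shorter than requested
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```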

However, I thought that problem was observed only for clients on Linux and OS X.
Windows clients should be immune to that issue, IMHO. (The Win32 API for file I/O automagically repeats the call under suitable conditions, so only clients on Linux and OS X are affected.)
So the problem may be elsewhere. Hmm...
On the other hand, DFS may have an issue where file metadata is not returned completely because of a short read or some other quirk.

It would be interesting to see network capture of packets when the failure occurs and have them analyzed by DFS experts.

(In reply to Lee Goolsby from comment #20)

> Wayne,
>
> Were you ever able to test this?

Unfortunately no. Unlikely at this point.

> Also is there a component that this can be reassigned to?

Thanks for asking. You can assign it to MailNews Core > Networking

Flags: needinfo?(vseerror)
Flags: needinfo?(leergoolsby)
Flags: needinfo?(enz-coast)
Summary: Thunderbird 31.1.2 for Windows intermittent loses profiles stored on DFS network share on 2008R2 Server → Thunderbird 31.1.2 for Windows intermittent loses profiles stored on DFS network share on Windows Server
Whiteboard: [dupeme]
Component: Untriaged → Networking
Flags: needinfo?(leergoolsby)
Product: Thunderbird → MailNews Core
Version: 31 Branch → 31